Commit Graph

  • fdbfb460ed whisper : add OpenVINO init with state (#2464) Sandro Hanea 2024-10-08 19:08:00 +0200
  • ebca09a3d1 release : v1.7.1 Georgi Gerganov 2024-10-07 13:06:48 +0300
  • 9f346d0084 vulkan : retry allocation with fallback flags (#2451) SRHMorris 2024-10-06 08:34:20 +0100
  • 6a94163b91 release : v1.7.0 Georgi Gerganov 2024-10-05 16:43:26 +0300
  • 8a35b58c4f scripts : bench v3-turbo Georgi Gerganov 2024-10-05 16:22:53 +0300
  • 1789abca84 whisper : remove mel leftover constants (396089f) Georgi Gerganov 2024-10-05 16:13:03 +0300
  • 847f94fdeb whisper : zero-out the KV cache upon clear (#2445) Georgi Gerganov 2024-10-05 15:22:17 +0300
  • 6e40108a59 objc : fix build Georgi Gerganov 2024-10-05 15:18:50 +0300
  • 1ba185f4af metal : zero-init buffer contexts (#0) Georgi Gerganov 2024-10-05 14:33:54 +0300
  • 396089f3cf whisper : revert mel-related changes (#0) Georgi Gerganov 2024-10-05 14:29:45 +0300
  • 941912467d whisper : adapt to latest ggml (skip) (#0) Georgi Gerganov 2024-10-05 13:14:03 +0300
  • 0b1b094a67 ggml : fix typo in example usage ggml_gallocr_new (ggml/984) Daniel Bevenius 2024-10-04 15:46:18 +0200
  • 40e52a76b9 ggml : fixes after sync (ggml/983) Diego Devesa 2024-10-04 08:41:40 +0200
  • cf977670e6 ggml-backend : add device and backend reg interfaces (llama/9707) Diego Devesa 2024-10-03 21:25:11 +0300
  • df2c364de7 Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711) Ouadie EL FAROUKI 2024-10-03 07:50:44 +0100
  • 1acfadb721 ggml-backend : add device and backend reg interfaces (llama/9707) Diego Devesa 2024-10-03 01:49:47 +0200
  • ea642144d2 Initial cmake support of SYCL for AMD GPUs (llama/9658) Alberto Cabrera Pérez 2024-10-02 13:57:18 +0100
  • 282a8654c4 vulkan : do not use tensor->extra (llama/9407) Radoslav Gerganov 2024-10-02 13:49:16 +0300
  • 936cf3beb7 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) Johannes Gäßler 2024-10-03 17:29:59 +0200
  • bc92c2f8f0 ggml: refactor cross entropy loss CPU impl. (ggml/976) Johannes Gäßler 2024-10-02 15:32:39 +0200
  • f7d55e0614 scripts : sync ggml-backend.cpp Georgi Gerganov 2024-10-05 13:09:36 +0300
  • f62a546e03 whisper : fix excessive memory usage (#2443) Georgi Gerganov 2024-10-05 12:36:40 +0300
  • 2944cb72d9 examples : update dr_wav.h to newer version (#2449) Rahul Vadhyar 2024-10-04 13:34:51 +0530
  • ccc2547210 talk-llama : sync llama.cpp Georgi Gerganov 2024-10-02 15:14:46 +0300
  • 162a455402 metal : reduce command encoding overhead (llama/9698) Georgi Gerganov 2024-10-02 15:12:16 +0300
  • ff2cb0811f sync : ggml Georgi Gerganov 2024-10-02 15:11:43 +0300
  • 5e9d6baa48 test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +0200
  • 845f8d663e vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +0200
  • 31fdf05fda ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +0300
  • 0ac6666cd2 ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +0200
  • 6c91da80b8 ggml : define missing HWCAP flags (llama/9684) Georgi Gerganov 2024-09-29 21:18:23 +0300
  • c245168ba3 ggml : add run-time detection of neon, i8mm and sve (llama/9331) Dan Johansson 2024-09-28 14:06:16 +0200
  • 280fee8fa0 Enable use to the rebar feature to upload buffers to the device. (llama/9251) Markus Tavenrath 2024-09-28 12:05:05 +0200
  • 78b4c1c25f mtgpu: enable VMM (llama/9597) R0CKSTAR 2024-09-26 09:27:40 +0800
  • 1edea2eb4b ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217) Charles Xu 2024-09-25 15:12:20 +0200
  • 96808786b7 cann: fix crash when llama-bench is running on multiple cann devices (llama/9627) Dou Xinpeng 2024-09-25 11:30:38 +0800
  • bb57ecb85e CUDA: remove bad assert (ggml/972) Johannes Gäßler 2024-09-29 19:56:17 +0200
  • abdb73c7cc vulkan : multithread pipeline creation (ggml/963) Jeff Bolz 2024-09-29 11:50:17 -0500
  • 391e548a43 vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) Jeff Bolz 2024-09-27 02:58:01 -0500
  • 2a29afd4c6 vulkan : argsort barriers must be under uniform control flow (ggml/951) Salvatore Mesoraca 2024-09-26 08:59:42 +0200
  • 5963004ff9 ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969) Georgi Gerganov 2024-09-24 13:23:59 +0300
  • ede1718f6d server : ffmpeg overwrite leftover temp file (#2431) gilbertgong 2024-10-02 05:06:40 -0700
  • 2ef717b293 whisper : add large-v3-turbo (#2440) Georgi Gerganov 2024-10-01 15:57:06 +0300
  • 8feb375fbd tests : remove test-backend-ops (#2434) Georgi Gerganov 2024-09-27 11:48:33 +0300
  • 69339af2d1 ci : disable failing CUDA and Java builds Georgi Gerganov 2024-09-25 10:03:34 +0300
  • 0d2e2aed80 readme : fix references to download-ggml-model.sh (#2427) Hugo 2024-09-24 20:07:51 +0200
  • 451e9ee92c make : remove "talk" target until updated Georgi Gerganov 2024-09-24 14:15:09 +0300
  • 1133ac98a8 ggml : add ggml-cpu-impl.h (skip) (#0) Georgi Gerganov 2024-09-24 13:27:33 +0300
  • 76d27eec9a sync : ggml Georgi Gerganov 2024-09-24 13:23:04 +0300
  • fe18c29ab8 talk-llama : sync llama.cpp Georgi Gerganov 2024-09-24 13:22:55 +0300
  • 234f9bd320 ggml : add AVX512DQ requirement for AVX512 builds (llama/9622) Eric Zhang 2024-09-24 16:03:21 +0800
  • 3b183cfae7 log : add CONT level for continuing previous log entry (llama/9610) Georgi Gerganov 2024-09-24 10:15:35 +0300
  • 02285dff81 threads: fix msvc build without openmp (llama/9615) Max Krasnyansky 2024-09-23 21:18:48 -0700
  • 2fc1d20f9e cuda: add q8_0->f32 cpy operation (llama/9571) Ivan 2024-09-24 03:14:24 +0300
  • 08e8414f27 threads: improve ggml_barrier scaling with large number of threads (llama/9598) Max Krasnyansky 2024-09-23 11:42:43 -0700
  • 05c6139625 ggml : AVX512 gemm for Q4_0_8_8 (llama/9532) Srihari-mcw 2024-09-23 19:36:38 +0530
  • 896c41ef30 metal : use F32 prec for K*Q in vec FA (llama/9595) Georgi Gerganov 2024-09-23 11:27:47 +0300
  • c36ddc43c6 Revert "[SYCL] fallback mmvq (ggml/9088)" (llama/9579) Akarshan Biswas 2024-09-23 08:58:06 +0530
  • 13f41af43e musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526) R0CKSTAR 2024-09-22 22:55:49 +0800
  • 3fc5306b82 Fix merge error in #9454 (llama/9589) Molly Sophia 2024-09-22 21:26:50 +0800
  • adf2474b10 CUDA: enable Gemma FA for HIP/Pascal (llama/9581) Johannes Gäßler 2024-09-22 09:34:52 +0200
  • 008816a257 RWKV v6: RWKV_WKV op CUDA implementation (llama/9454) Molly Sophia 2024-09-22 10:29:12 +0800
  • 33e5a6612e ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573) slaren 2024-09-21 14:24:23 +0200
  • f0a7d65b3d Update CUDA graph on scale change plus clear nodes/params (llama/9550) agray3 2024-09-21 01:41:07 +0100
  • 54e5095765 examples : adapt to ggml.h changes (ggml/0) Georgi Gerganov 2024-09-20 21:50:16 +0300
  • 34291099fb ggml : refactoring (llama/#0) Georgi Gerganov 2024-09-20 21:24:06 +0300
  • d245d7aec7 ggml : fix builds (llama/0) Georgi Gerganov 2024-09-20 20:12:52 +0300
  • d661283e68 ggml : fix trailing whitespace (llama/0) Georgi Gerganov 2024-09-20 19:13:02 +0300
  • c0761c95f5 CUDA: fix sum.cu compilation for CUDA < 11.7 (llama/9562) Johannes Gäßler 2024-09-20 18:35:35 +0200
  • 138e20b697 ggml : fix n_threads_cur initialization with one thread (llama/9538) slaren 2024-09-18 19:13:08 +0200
  • a8d9abfa22 threadpool : skip polling for unused threads (llama/9461) Max Krasnyansky 2024-09-17 01:19:46 -0700
  • 195afd6dc1 ggml : link MATH_LIBRARY not by its full path (llama/9339) Michael Podvitskiy 2024-09-16 13:06:50 +0200
  • 1fd78999e8 cmake : do not hide GGML options + rename option (llama/9465) Georgi Gerganov 2024-09-16 10:27:50 +0300
  • 374e9e0c5e ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422) Eve 2024-09-16 06:48:24 +0000
  • a2cb5b4183 metal : handle zero-sized allocs (llama/9466) Georgi Gerganov 2024-09-16 09:05:56 +0300
  • 288ae5176e common : reimplement logging (llama/9418) Georgi Gerganov 2024-09-15 20:46:12 +0300
  • d868122a5a cmake : correct order of sycl flags (llama/9497) Michael Podvitskiy 2024-09-15 18:55:52 +0200
  • 2ba25fb122 cmake : try to fix sycl+intel build (llama/9487) Michael Podvitskiy 2024-09-15 09:06:38 +0200
  • 4f4687cb74 ggml : ggml_type_name return "NONE" for invalid values (llama/9458) Yuri Khrustalev 2024-09-14 05:54:37 -0400
  • 66b00fad0d cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463) Georgi Gerganov 2024-09-14 10:55:05 +0300
  • c6cc8d16c3 cann: Add host buffer type for Ascend NPU (llama/9406) Dou Xinpeng 2024-09-12 19:46:43 +0800
  • 3f8f8a78a2 riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442) Ahmad Tameem 2024-09-12 16:24:31 +0500
  • 3e47686919 cann: Fix error when running a non-exist op (llama/9424) Xinpeng Dou 2024-09-12 09:02:35 +0800
  • a53b69a003 CUDA: fix --split-mode row race condition (llama/9413) Johannes Gäßler 2024-09-11 10:22:40 +0200
  • d1c9b47360 musa: remove Clang builtins mapping (llama/9421) R0CKSTAR 2024-09-11 09:46:55 +0800
  • 32f659861a sycl : update support conditions (llama/9394) Alberto Cabrera Pérez 2024-09-11 01:53:42 +0100
  • a785232bf9 metal : fix compile warning with GGML_METAL_NDEBUG (llama/0) Georgi Gerganov 2024-09-10 10:17:03 +0300
  • 0677293503 rpc : fix segfault with nkvo (llama/9389) Radoslav Gerganov 2024-09-09 18:40:10 +0300
  • 1fbdb813c0 ggml : vector length agnostic SVE support (llama/9290) Prashant Vithule 2024-09-09 21:07:18 +0530
  • 67725ac8f3 CUDA: fix variable name conflict for Windows build (llama/9382) Johannes Gäßler 2024-09-09 14:22:53 +0200
  • dac89af357 Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118) Markus Tavenrath 2024-09-08 21:43:48 +0200
  • 26225f1fb0 cuda : fix FA Q src index (1 -> 0) (llama/9374) Georgi Gerganov 2024-09-08 22:01:02 +0300
  • 3468983315 add check malloc result on device (llama/9346) Neo Zhang Jianyu 2024-09-08 19:05:29 +0800
  • c7515b0995 ggml/examples: add backend support for numerical optimization (ggml/949) Johannes Gäßler 2024-09-20 14:36:38 +0200
  • 253ce30004 examples : add null threadpool args where needed (ggml/0) Georgi Gerganov 2024-09-08 11:10:43 +0300
  • 03a6fae484 metal : update support condition for im2col + fix warning (llama/0) Georgi Gerganov 2024-09-08 09:57:57 +0300
  • d37fd275fd ggml : always check bounds on get_rows operations (llama/9354) slaren 2024-09-07 20:23:07 +0200
  • 195877fd72 ggml : fix missing cpu_set_t on emscripten (llama/9336) Xuan Son Nguyen 2024-09-07 12:01:34 +0200
  • 9e715e1b96 Improve Vulkan shader build system (llama/9239) Markus Tavenrath 2024-09-06 08:56:17 +0200
  • 6f5514b6e2 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151) compilade 2024-09-05 21:48:47 -0400