Commit Graph

  • a753a82462 vulkan: get the first command buffer submitted sooner (llama/10499) Jeff Bolz 2024-11-29 00:18:02 -0600
  • 276b08d8f0 ggml : remove redundant copyright notice + update authors Georgi Gerganov 2024-11-28 20:46:40 +0200
  • 4ca1e72fe0 ggml : fix row condition for i8mm kernels (llama/10561) Georgi Gerganov 2024-11-28 14:56:37 +0200
  • 16a66f103f cmake : fix ARM feature detection (llama/10543) Georgi Gerganov 2024-11-28 14:56:23 +0200
  • 330273901f ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) Shupei Fan 2024-11-28 20:52:03 +0800
  • 42099a9342 kompute : improve backend to pass test_backend_ops (llama/10542) Sergio López 2024-11-28 12:51:38 +0100
  • 90dd5fca9c CANN: Fix SOC_TYPE compile bug (llama/10519) leo-pony 2024-11-28 15:25:24 +0800
  • 2490f2a7f8 CANN: ROPE operator optimization (llama/10540) Chenguang Li 2024-11-28 14:24:46 +0800
  • 230e985633 Add some minimal optimizations for CDNA (llama/10498) uvos 2024-11-27 17:10:08 +0100
  • ae24083f23 metal : fix group_norm support condition (llama/0) Georgi Gerganov 2024-11-27 11:22:14 +0200
  • 6463e36369 vulkan: define all quant data structures in types.comp (llama/10440) Jeff Bolz 2024-11-27 01:32:54 -0600
  • b3301f7d82 vulkan: Handle GPUs with less shared memory (llama/10468) Jeff Bolz 2024-11-27 01:30:27 -0600
  • ab5d4d93ec vulkan: further optimize q5_k mul_mat_vec (llama/10479) Jeff Bolz 2024-11-27 01:21:59 -0600
  • 2d6e9dd723 vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506) Jeff Bolz 2024-11-27 01:08:54 -0600
  • 2f16e51553 vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459) Jeff Bolz 2024-11-27 01:00:50 -0600
  • 0f0994902f mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516) R0CKSTAR 2024-11-27 00:00:41 +0800
  • 5e1fcc1780 vulkan: fix group_norm (llama/10496) Jeff Bolz 2024-11-26 09:45:05 -0600
  • 48f421de23 cmake : enable warnings in llama (llama/10474) Georgi Gerganov 2024-11-26 14:18:08 +0200
  • e7afb2b991 ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487) Charles Xu 2024-11-26 12:37:05 +0100
  • 9a5ef7b169 CANN: Improve the Inferencing Performance for Ascend NPU Device (llama/10454) Shanshan Shen 2024-11-26 18:08:37 +0800
  • 453cc0fcf1 CANN: RoPE and CONCAT operator optimization (llama/10488) Chenguang Li 2024-11-26 17:31:05 +0800
  • 78dfec6bc5 vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484) Junil Kim 2024-11-26 10:47:20 +0900
  • f6d518fc4c metal : enable mat-vec kernels for bs <= 4 (llama/10491) Georgi Gerganov 2024-11-25 21:49:31 +0200
  • ac33379a35 llama : accept a list of devices to use to offload a model (llama/10497) Diego Devesa 2024-11-25 19:30:06 +0100
  • 77e3e4a090 ggml : add support for dynamic loading of backends (llama/10469) Diego Devesa 2024-11-25 15:13:39 +0100
  • b840bb09be metal : minor code formatting Georgi Gerganov 2024-11-25 15:08:04 +0200
  • 8b1c1c30a7 ggml : do not use ARM features not included in the build (llama/10457) Diego Devesa 2024-11-23 14:41:12 +0100
  • 4b81335f75 CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216) leo-pony 2024-11-22 14:07:20 +0800
  • 2a4b5c9d7e cuda : optimize argmax (llama/10441) Diego Devesa 2024-11-21 18:18:50 +0100
  • 04662748aa vulkan: predicate max operation in soft_max shaders/soft_max (llama/10437) Jeff Bolz 2024-11-20 13:47:36 -0600
  • a117279e13 vulkan: copy iq4_nl LUT into shared memory (llama/10409) Jeff Bolz 2024-11-20 01:40:18 -0600
  • bbb292ed38 vulkan: further optimize mul_mat_vec using larger loads (llama/10387) Jeff Bolz 2024-11-20 01:11:00 -0600
  • 95e8901e71 add cmake rvv support (llama/10411) haopeng 2024-11-20 04:10:31 +0800
  • 4af9626702 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +0800
  • c52d1035de feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +0100
  • 5773a14980 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +0100
  • 6939147c47 Do not include arm_neon.h when compiling CUDA code (ggml/1028) Frankie Robertson 2024-11-26 15:50:26 +0200
  • 98f9916c9f ggml-opt: fix data corruption (ggml/1022) Johannes Gäßler 2024-11-20 14:56:04 +0100
  • 021eef1000 ruby : Add low-level methods to transcribe (#2585) KITAITI Makoto 2024-11-28 17:33:07 +0900
  • a9d06ce151 models : add q8_0 models to download-ggml-model.sh (#2589) Michael Rienstra 2024-11-28 00:31:54 -0800
  • 8c6a9b8bb6 ruby : Follow source tree change (#2580) KITAITI Makoto 2024-11-22 00:04:29 +0900
  • 37c88027e1 whisper : use backend registry (#0) Georgi Gerganov 2024-11-20 15:32:34 +0200
  • 9db070a3c5 ggml/sched : do not skip views in pre-assignments slaren 2024-11-20 13:25:08 +0100
  • 7fd8d9c220 whisper : adapt to new ggml (wip) Georgi Gerganov 2024-11-19 19:09:07 +0200
  • 06e059b8f8 talk-llama : sync llama.cpp Georgi Gerganov 2024-11-19 19:08:57 +0200
  • c9f49d5f9d sync : ggml Georgi Gerganov 2024-11-19 19:04:21 +0200
  • f4c1d7df39 ggml : sync resolve (skip) (#0) Georgi Gerganov 2024-11-19 19:03:47 +0200
  • 339b8e559c Add required ggml-base and backend libs to cmake pkg (llama/10407) bandoti 2024-11-19 12:10:30 -0400
  • 5f6d6919b4 cuda : fix CUDA_FLAGS not being applied (llama/10403) Diego Devesa 2024-11-19 14:29:38 +0100
  • 8ee767732f sycl : Add option to set the SYCL architecture for all targets (llama/10266) Romain Biessy 2024-11-19 09:02:23 +0100
  • 45f1f9144f vulkan: Optimize soft_max (llama/10301) Jeff Bolz 2024-11-19 01:25:17 -0600
  • 53589c8f12 sycl: Revert MUL_MAT_OP support changes (llama/10385) Alberto Cabrera Pérez 2024-11-19 00:50:04 +0000
  • 7ac2f17fac cuda : only use native when supported by cmake (llama/10389) Diego Devesa 2024-11-18 18:43:40 +0100
  • 48862c7b27 vulkan: remove use of null initializer (llama/10372) Jeff Bolz 2024-11-18 08:28:42 -0600
  • 44f7d9f4e3 metal : fix offset integer overflows in im2col (ggml/1015) Plamen Minev 2024-11-18 15:02:27 +0200
  • fd12302587 Vulkan: Fix device info output format specifiers (llama/10366) 0cc4m 2024-11-18 11:02:43 +0100
  • f80bef4630 metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) PAB 2024-11-18 10:02:49 +0100
  • 161b443514 CUDA: fix MMV kernel being used for FP16 src1 (llama/10357) Johannes Gäßler 2024-11-17 23:20:42 +0100
  • ef7fbe1c66 CMake: fix typo in comment [no ci] (llama/10360) Johannes Gäßler 2024-11-17 12:59:38 +0100
  • 0879d3599e llama : only use default buffer types for the KV cache (llama/10358) Diego Devesa 2024-11-17 12:25:45 +0100
  • 2a444dc5bd metal : refactor kernel args into structs (llama/10238) Georgi Gerganov 2024-11-17 11:23:01 +0200
  • 45cf1634dc ggml : fix undefined reference to 'getcpu' (llama/10354) FirstTimeEZ 2024-11-17 21:39:22 +1300
  • dcb2922d1d CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318) Johannes Gäßler 2024-11-17 09:09:55 +0100
  • 3c5c751174 CMake: default to -arch=native for CUDA build (llama/10320) Johannes Gäßler 2024-11-17 09:06:34 +0100
  • 24ad19d0e9 ggml : fix possible buffer use after free in sched reserve (llama/9930) Diego Devesa 2024-11-17 07:31:17 +0100
  • bd574b05af ggml : inttypes.h -> cinttypes (llama/0) Georgi Gerganov 2024-11-16 23:40:39 +0200
  • 7e0eafcb1e ggml : adapt AMX to tensor->grad removal (llama/0) Georgi Gerganov 2024-11-16 21:38:01 +0200
  • 75670ae673 ggml : fix compile warnings (llama/0) Georgi Gerganov 2024-11-16 21:32:41 +0200
  • d4fcdf602b llamafile : fix include path (llama/0) Georgi Gerganov 2024-11-16 17:58:56 +0200
  • 1bebb1a116 vulkan: Optimize some mat-vec mul quant shaders (llama/10296) Jeff Bolz 2024-11-16 00:26:57 -0600
  • ee437cde59 ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324) Dan Johansson 2024-11-16 01:53:37 +0100
  • c1506d38cf Make updates to fix issues with clang-cl builds while using AVX512 flags (llama/10314) Srihari-mcw 2024-11-16 02:57:00 +0530
  • c9541741e6 ggml: new optimization interface (ggml/988) Johannes Gäßler 2024-11-16 13:49:35 +0100
  • 6a55015dc4 ggml : remove duplicated sources from the last sync (ggml/1017) Georgi Gerganov 2024-11-15 23:52:31 +0200
  • 7e86030d4d ggml : fix some build issues slaren 2024-11-15 20:20:54 +0100
  • 401fbea326 sync : leftovers (ggml/0) Georgi Gerganov 2024-11-15 21:43:41 +0200
  • 44d1cbdfe9 cmake : restore CMakeLists.txt (llama/10256) Georgi Gerganov 2024-11-15 21:35:51 +0200
  • 3216efef2e AVX BF16 and single scale quant optimizations (llama/10212) Eve 2024-11-15 11:47:58 +0000
  • 2c0484ebf7 sycl: Use syclcompat::dp4a (llama/10267) Romain Biessy 2024-11-15 04:09:12 +0100
  • 3298916e5e backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) Charles Xu 2024-11-15 01:28:50 +0100
  • 746bf2596f ggml : build backends as libraries (llama/10256) Diego Devesa 2024-11-14 18:04:35 +0100
  • 5f7e094ccb scripts : update sync Georgi Gerganov 2024-11-19 18:59:18 +0200
  • 6266a9f9e5 release : v1.7.2 Georgi Gerganov 2024-11-19 18:54:22 +0200
  • d24f981fb2 sycl: fix example build (#2570) Stefan Sydow 2024-11-18 13:57:23 +0100
  • 01d3bd7d5c ci : use local ggml in Android build (#2567) Georgi Gerganov 2024-11-16 20:45:41 +0200
  • bb12cd9b77 ggml : tmp workaround for whisper.cpp (skip) (#2565) Georgi Gerganov 2024-11-16 20:19:02 +0200
  • f02b40bcb4 update : readme Georgi Gerganov 2024-11-15 16:00:10 +0200
  • 83ac2842bd scripts : fix sync path Georgi Gerganov 2024-11-15 15:24:09 +0200
  • c4e95fb74d whisper.swiftui : switch Mac dest to Mac (Designed for iPad) (#2562) Jhen-Jie Hong 2024-11-15 21:21:53 +0800
  • e23721f3fb cmake : fix ppc64 check (#0) Georgi Gerganov 2024-11-15 09:04:34 +0200
  • c0a9f8ef85 whisper : include ggml-cpu.h (#0) Georgi Gerganov 2024-11-15 11:01:47 +0200
  • 6477b84eb6 build : fixes Georgi Gerganov 2024-11-15 09:07:53 +0200
  • 24d706774d talk-llama : sync llama.cpp Georgi Gerganov 2024-11-15 08:41:06 +0200
  • 5089ab2d6a whisper : fix build (#0) Georgi Gerganov 2024-11-15 08:40:47 +0200
  • bdbb906817 sync : ggml Georgi Gerganov 2024-11-15 08:40:34 +0200
  • fa2ebd336e sycl : Fixes to broken builds and test-backend-ops (llama/10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +0000
  • 21b01a21b6 vulkan: Optimize contiguous copies (llama/10254) Jeff Bolz 2024-11-13 00:58:57 -0600
  • b54ce5edc5 vulkan: Throttle the number of shader compiles during the build step. (llama/10222) Jeff Bolz 2024-11-11 11:13:51 -0600
  • 26a31b78e9 metal : more precise Q*K in FA vec kernel (llama/10247) Georgi Gerganov 2024-11-11 08:39:13 +0200
  • 14d13c5f9f vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (llama/10226) Jeff Bolz 2024-11-10 05:37:56 -0600