whisper.cpp/ggml/src
Eve 374e9e0c5e ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)
* squashed

Re-add my IQ4_NL sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

Have ggml_vec_dot_q4_0 process two blocks per loop iteration on AVX.

Tried out an F16C ggml_vec_dot_iq4_nl, but it's not really faster; as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue.

* shuffle

* remove the F16C iq4_nl path, as I can't make it faster than before
2024-09-24 19:45:08 +03:00
Name                | Last commit                                                                                                              | Date
ggml-cann           | cann : fix doxy (ggml/0)                                                                                                 | 2024-09-02 15:24:50 +03:00
ggml-cuda           | CUDA: fix --split-mode row race condition (llama/9413)                                                                   | 2024-09-24 19:45:08 +03:00
ggml-sycl           | Fix DMMV dequantization (llama/9279)                                                                                     | 2024-09-24 19:45:08 +03:00
kompute-shaders     | whisper : reorganize source code + improve CMake (#2256)                                                                 | 2024-06-26 19:34:09 +03:00
vulkan-shaders      | Improve Vulkan shader build system (llama/9239)                                                                          | 2024-09-24 19:45:08 +03:00
CMakeLists.txt      | cmake : correct order of sycl flags (llama/9497)                                                                         | 2024-09-24 19:45:08 +03:00
ggml-aarch64.c      | ggml : AVX2 support for Q4_0_8_8 (llama/8713)                                                                            | 2024-09-24 19:45:08 +03:00
ggml-aarch64.h      | ggml : add ggml-aarch64 (ggml/0)                                                                                         | 2024-08-08 22:48:46 +03:00
ggml-alloc.c        | ggml : reduce hash table reset cost (llama/8698)                                                                         | 2024-08-08 22:48:46 +03:00
ggml-backend-impl.h | ggml/examples: add backend support for numerical optimization (ggml/949)                                                 | 2024-09-24 19:45:08 +03:00
ggml-backend.c      | ggml/examples: add backend support for numerical optimization (ggml/949)                                                 | 2024-09-24 19:45:08 +03:00
ggml-blas.cpp       | ggml : reduce hash table reset cost (llama/8698)                                                                         | 2024-08-08 22:48:46 +03:00
ggml-cann.cpp       | cann: Add host buffer type for Ascend NPU (llama/9406)                                                                   | 2024-09-24 19:45:08 +03:00
ggml-common.h       | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151)                                                   | 2024-09-24 19:45:08 +03:00
ggml-cuda.cu        | rpc : fix segfault with nkvo (llama/9389)                                                                                | 2024-09-24 19:45:08 +03:00
ggml-impl.h         | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151)                                                   | 2024-09-24 19:45:08 +03:00
ggml-kompute.cpp    | ggml/examples: add backend support for numerical optimization (ggml/949)                                                 | 2024-09-24 19:45:08 +03:00
ggml-metal.m        | metal : handle zero-sized allocs (llama/9466)                                                                            | 2024-09-24 19:45:08 +03:00
ggml-metal.metal    | metal : separate scale and mask from QKT in FA kernel (llama/9189)                                                       | 2024-08-28 13:22:20 +03:00
ggml-quants.c       | ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)                                                                 | 2024-09-24 19:45:08 +03:00
ggml-quants.h       | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151)                                                   | 2024-09-24 19:45:08 +03:00
ggml-rpc.cpp        | rpc : fix segfault with nkvo (llama/9389)                                                                                | 2024-09-24 19:45:08 +03:00
ggml-sycl.cpp       | sycl : update support conditions (llama/9394)                                                                            | 2024-09-24 19:45:08 +03:00
ggml-vulkan.cpp     | Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118)| 2024-09-24 19:45:08 +03:00
ggml.c              | ggml : ggml_type_name return "NONE" for invalid values (llama/9458)                                                      | 2024-09-24 19:45:08 +03:00
sgemm.cpp           | whisper : reorganize source code + improve CMake (#2256)                                                                 | 2024-06-26 19:34:09 +03:00
sgemm.h             | whisper : reorganize source code + improve CMake (#2256)                                                                 | 2024-06-26 19:34:09 +03:00