Diego Devesa
3daeacad24
ggml : move AMX to the CPU backend (llama/10570)
ggml : automatic selection of best CPU backend (llama/10606)
2024-12-08 20:14:35 +02:00
Adrien Gallouët
4b7e059e15
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-08 20:14:35 +02:00
Georgi Gerganov
3623bd58f2
ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562)
ggml-ci
2024-12-08 20:14:35 +02:00
Shupei Fan
cb847c20a7
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)
2024-12-08 20:14:35 +02:00
Georgi Gerganov
276b08d8f0
ggml : remove redundant copyright notice + update authors
2024-12-08 20:14:35 +02:00
Georgi Gerganov
4ca1e72fe0
ggml : fix row condition for i8mm kernels (llama/10561)
ggml-ci
2024-12-08 20:14:35 +02:00
Georgi Gerganov
16a66f103f
cmake : fix ARM feature detection (llama/10543)
ggml-ci
2024-12-08 20:14:35 +02:00
Shupei Fan
330273901f
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-12-08 20:14:35 +02:00
Charles Xu
e7afb2b991
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487)
* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check
2024-12-08 20:14:35 +02:00
Diego Devesa
77e3e4a090
ggml : add support for dynamic loading of backends (llama/10469)
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-08 20:14:35 +02:00
Diego Devesa
8b1c1c30a7
ggml : do not use ARM features not included in the build (llama/10457)
2024-12-08 20:14:35 +02:00
haopeng
95e8901e71
cmake : add RVV (RISC-V Vector) support (llama/10411)
2024-12-08 20:14:35 +02:00
FirstTimeEZ
45cf1634dc
ggml : fix undefined reference to 'getcpu' (llama/10354)
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-20 21:00:08 +02:00
Georgi Gerganov
d4fcdf602b
llamafile : fix include path (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Dan Johansson
ee437cde59
ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324)
2024-11-20 21:00:08 +02:00
Srihari-mcw
c1506d38cf
Fix clang-cl build issues when using AVX512 flags (llama/10314)
2024-11-20 21:00:08 +02:00
Johannes Gäßler
c9541741e6
ggml: new optimization interface (ggml/988)
* ggml: new optimization interface
remove test2.c, test3.c
store AdamW params in tensor
move grads from tensor to graph
* avoid segfault upon API misuse
* add ggml-opt.h to public headers
* remove dependence of ggml-opt.cpp on ggml-cpu.h
2024-11-20 21:00:08 +02:00
Georgi Gerganov
401fbea326
sync : leftovers (ggml/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Eve
3216efef2e
AVX BF16 and single scale quant optimizations (llama/10212)
* use 128-bit loads (I've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg (text generation), +5% pp (prompt processing) compared to master
* slower f16c version, kept for reference
* 256-bit version, also slow. I tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
2024-11-20 21:00:08 +02:00
Charles Xu
3298916e5e
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921)
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-20 21:00:08 +02:00
Diego Devesa
746bf2596f
ggml : build backends as libraries (llama/10256)
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-20 21:00:08 +02:00