whisper.cpp


slaren 1dce94cf26 ggml : mul_mat_id use the same tensor for all the experts (llama/6387)		2024-04-07 16:15:57 +03:00

* ggml : update mul_mat_id to use the same tensor for all the experts
* update cuda
* minor
* update metal
* update test-backend-ops
* fix cuda
* Update ggml-metal.m

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update convert.py
* update convert-hf-to-gguf.py
* update convert.py for mixtral hf models
* Update convert-hf-to-gguf.py

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* cuda : support non-pow-2 number of experts
* allow quantize to work for split and merged experts models in the same way
* cleanup + disable mmap automatically with split tensors models
* update imatrix
* test-backend-ops : test qwen argsort
* update grok model loading
* llama : add merged experts tensors to the grok tensor map
* minor
* gguf : bump version
* fix quantizing of merged experts
* convert-hf-to-gguf.py : update grok (untested)
* make linter happy
* cuda/argsort : use shared memory instead of pool memory
* convert : fix grok tensor names
* metal : add support for non-pow-2 argsort
* llama : more loader cleanup, better error checking
* cuda : fix warning
* llama : still use mmap for loading old models, but copy the data to a host buffer
* add review note
* llama : remove ffn tensor counting + add sanity check

ggml-ci

* convert : fix handling of n_experts == None

ggml-ci

* imatrix : fix ncall counters
* llama : produce error if imatrix size does not match
* quantize : terminate on errors + trace logs

ggml-ci

* metal : pad shared memory to 16 bytes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
acc.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
acc.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
alibi.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
alibi.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
arange.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
arange.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
argsort.cu	ggml : mul_mat_id use the same tensor for all the experts (llama/6387)	2024-04-07 16:15:57 +03:00
argsort.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
binbcast.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
binbcast.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
clamp.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
clamp.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
common.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
concat.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
concat.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
convert.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
convert.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
cpy.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
cpy.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
dequantize.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
diagmask.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
diagmask.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
dmmv.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
dmmv.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
getrows.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
getrows.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
im2col.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
im2col.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
mmq.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
mmq.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
mmvq.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
mmvq.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
norm.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
norm.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
pad.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
pad.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
pool2d.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
pool2d.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
quantize.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
quantize.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
rope.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
rope.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
scale.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
scale.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
softmax.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
softmax.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
sumrows.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
sumrows.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
tsembd.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
tsembd.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
unary.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
unary.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
upscale.cu	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
upscale.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00
vecdotq.cuh	sync : ggml (#2001)	2024-03-27 18:55:10 +02:00