whisper.cpp

Author	SHA1	Message	Date
Karthick	2f2841bfce	whisper : add single-timestamp logic (#2629) * Fix hallucinations during silence When the predicted tokens end with a single timestamp, the entire 30-second segment should be considered done, to avoid hallucinations in the remaining part of the segment. This behaviour is on par with OpenAI's Whisper. Refer to the logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py * Accept review comments related to formatting. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-17 19:07:08 +02:00
crummyh	09a1b61218	readme : fix typo (#2637)	2024-12-17 19:05:35 +02:00
Georgi Gerganov	94e7da1ff2	cmake : fix "amd64" processor string (#2638)	2024-12-17 18:34:32 +02:00
gn64	c4aed6831e	vulkan : fix soft_max.comp division by zero (#2633) This change prevents a division by zero error when p.KY is 0.	2024-12-16 12:34:38 +02:00
Georgi Gerganov	199579652e	common : add cstdio header	2024-12-16 08:57:04 +02:00
Georgi Gerganov	d17e7139d8	stream : update build instructions	2024-12-15 21:55:36 +02:00
Thamster	6a52eaea74	android : fix build and ci (#2624) * Add missing CMakeLists.txt include for ggml-cpu needed by whisper.android * attempt to re-enable CI for JNI android --------- Co-authored-by: Your Name <you@example.com>	2024-12-14 17:25:53 +02:00
Michael Rienstra	6aa1d7b892	models : fix typo in download-ggml-model.sh (#2623) Introduced in #2589	2024-12-12 18:02:00 +02:00
KITAITI Makoto	262e865a70	ruby : Sync whisper.cpp and model download feature (#2617) * Use C++17 * Add test for Pathname of model * Make Whisper::Context#initialize accept Pathname * Add shorthand for pre-converted models * Update documents * Add headings to API section in README [skip ci] * Remove unused function * Don't care about no longer included file * Cosmetic fix * Use conditional GET when fetching model files	2024-12-09 13:17:50 +02:00
Georgi Gerganov	ed733e85a1	scripts : update to new build system	2024-12-09 11:30:16 +02:00
Georgi Gerganov	5980b1ae77	devops : add cmake	2024-12-08 23:09:26 +02:00
Georgi Gerganov	0415a66044	devops : update make commands	2024-12-08 23:07:29 +02:00
Georgi Gerganov	7d134e3737	ggml : remove old files (skip) (#0)	2024-12-08 23:04:26 +02:00
Georgi Gerganov	9df53b357e	ggml : sync remnants (skip) (#0)	2024-12-08 22:48:25 +02:00
Georgi Gerganov	b2115b4d9b	scripts : remove amx from sync	2024-12-08 22:48:14 +02:00
Georgi Gerganov	0164427dd5	ci : disable freeBSD builds [no ci]	2024-12-08 20:14:35 +02:00
Georgi Gerganov	627b11c78a	readme : update build instructions	2024-12-08 20:14:35 +02:00
Georgi Gerganov	472464453d	ci : disable CUDA and Android builds	2024-12-08 20:14:35 +02:00
Georgi Gerganov	11dddfbc9e	ci : disable Obj-C build + fixes	2024-12-08 20:14:35 +02:00
Georgi Gerganov	384e214cc7	make : shim cmake	2024-12-08 20:14:35 +02:00
Georgi Gerganov	f2c680f893	talk-llama : sync llama.cpp	2024-12-08 20:14:35 +02:00
Georgi Gerganov	fbe66da0e5	sync : ggml	2024-12-08 20:14:35 +02:00
Diego Devesa	a815940e0e	ggml : add predefined list of CPU backend variants to build (llama/10626) * ggml : add predefined list of CPU backend variants to build * update CPU dockerfiles	2024-12-08 20:14:35 +02:00
Diego Devesa	904e307bce	ggml-cpu : fix HWCAP2_I8MM value (llama/10646)	2024-12-08 20:14:35 +02:00
Jeff Bolz	491ec076b4	vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (llama/10642)	2024-12-08 20:14:35 +02:00
Nicolò Scipione	966433fdf2	SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (llama/10584) * [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend Move to compile-time backend selection to avoid latency at run time. Apply it to all MKL GEMM calls, only for the NVIDIA backend. Signed-off-by: nscipione <nicolo.scipione@codeplay.com> * Formatting * Address PR comments to increase readability --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2024-12-08 20:14:35 +02:00
Frankie Robertson	6f1ba9d82d	Avoid using __fp16 on ARM with old nvcc (llama/10616)	2024-12-08 20:14:35 +02:00
Jeff Bolz	015ecd0001	vulkan: optimize and reenable split_k (llama/10637) Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.	2024-12-08 20:14:35 +02:00
PAB	b7c64a4352	ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) * implemented cpu kernel * add i32 test cases in test-backend-ops * typedef `ggml_metal_kargs_set` * implemented `kernel_set` * memcpy	2024-12-08 20:14:35 +02:00
PAB	7895d39508	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) * ggml_pad_reflect_1d defined in header * implemented on CPU * called the forward pass * impl Metal kernel * added Metal kernel * added OP_PAD_REFLECT_1D in test-backend-ops.cpp * add test-pad-reflect-1d test case * test case support multiple backend	2024-12-08 20:14:35 +02:00
Georgi Gerganov	22616f00f9	files : remove make artifacts	2024-12-08 20:14:35 +02:00
Georgi Gerganov	02c6fcbc2c	common : fix compile warning ggml-ci	2024-12-08 20:14:35 +02:00
Diego Devesa	3daeacad24	ggml : move AMX to the CPU backend (llama/10570) ggml : automatic selection of best CPU backend (llama/10606)	2024-12-08 20:14:35 +02:00
Georgi Gerganov	4d73962da4	metal : small-batch mat-mul kernels (llama/10581) * metal : small-batch mat-mul kernels ggml-ci * metal : add rest of types ggml-ci * metal : final adjustments ggml-ci * metal : add comments ggml-ci	2024-12-08 20:14:35 +02:00
Akarshan Biswas	068812650e	SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579) * Switched to GGML_LOG * Fix missing semicolon	2024-12-08 20:14:35 +02:00
Adrien Gallouët	4b7e059e15	ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2024-12-08 20:14:35 +02:00
Eve	30e35d7271	vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536) * subgroup 64 version with subgroup add. 15% faster scalable version tested for subgroup sizes 16-128 * check for subgroup multiple of 16 and greater than 16 * subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45) * force 16 sequential threads per block * make 16 subgroup size a constant	2024-12-08 20:14:35 +02:00
Georgi Gerganov	3623bd58f2	ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562) ggml-ci	2024-12-08 20:14:35 +02:00
Shupei Fan	cb847c20a7	ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)	2024-12-08 20:14:35 +02:00
Alberto Cabrera Pérez	964b154a2a	sycl : offload of get_rows set to 0 (llama/10432)	2024-12-08 20:14:35 +02:00
Alberto Cabrera Pérez	d7c2a04bce	sycl : Reroute permuted mul_mats through oneMKL (llama/10408) This PR fixes the failing MUL_MAT tests for the sycl backend.	2024-12-08 20:14:35 +02:00
Chenguang Li	2bb4ca9cba	CANN: RoPE operator optimization (llama/10563) * [cann] RoPE operator optimization * [CANN]Code Formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-12-08 20:14:35 +02:00
Jeff Bolz	a753a82462	vulkan: get the first command buffer submitted sooner (llama/10499) This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space. With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.	2024-12-08 20:14:35 +02:00
Georgi Gerganov	276b08d8f0	ggml : remove redundant copyright notice + update authors	2024-12-08 20:14:35 +02:00
Georgi Gerganov	4ca1e72fe0	ggml : fix row condition for i8mm kernels (llama/10561) ggml-ci	2024-12-08 20:14:35 +02:00
Georgi Gerganov	16a66f103f	cmake : fix ARM feature detection (llama/10543) ggml-ci	2024-12-08 20:14:35 +02:00
Shupei Fan	330273901f	ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) * ggml-cpu: support IQ4_NL_4_4 by runtime repack * ggml-cpu: add __ARM_FEATURE_DOTPROD guard	2024-12-08 20:14:35 +02:00
Sergio López	42099a9342	kompute : improve backend to pass test_backend_ops (llama/10542) * kompute: op_unary: reject unsupported parameters Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: softmax: implement ALiBi support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: rope: implement neox and phi3 support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_q4_k permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_f16 permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_q6_k permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> --------- Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-12-08 20:14:35 +02:00
leo-pony	90dd5fca9c	CANN: Fix SOC_TYPE compile bug (llama/10519) * CANN: Fix the build failure on Ascend310P in two cases: 1) SOC_TYPE specified manually; 2) some unusual compile environments * Update the CANN backend News content: Support F16 and F32 data type models for Ascend 310P NPU. * fix CANN compile failure: the assert in Ascend kernel functions is not supported on some CANN versions	2024-12-08 20:14:35 +02:00
Chenguang Li	2490f2a7f8	CANN: ROPE operator optimization (llama/10540) * [cann] ROPE operator optimization Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-12-08 20:14:35 +02:00

1947 Commits