whisper.cpp

Author	SHA1	Message	Date
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997 ) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]\|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
Georgi Gerganov	2948c740a2	sync : ggml (#2001 ) * sync : update scripts * sync : ggml * talk-llama : sync llama.cpp * make : WHISPER_CUBLAS -> WHISPER_CUDA * ci : try to fix sycl build * talk-llama : fix make build	2024-03-27 18:55:10 +02:00
Georgi Gerganov	1558ec5a16	whisper : improve handling of prompts (#1981 ) * whisper : improve handling of prompts * whisper : add whisper_token_count helper	2024-03-25 14:48:19 +02:00
Sanchit Gandhi	fff24a0148	whisper : improve support for distil-large-v3 (#1982 )	2024-03-21 18:53:30 +02:00
denersc	741abb162c	whisper : token-level timestamps with DTW (#1485 ) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 18:25:26 +02:00
Georgi Gerganov	725350d4ea	whisper : set outputs from conv graph (#1959 )	2024-03-16 17:30:55 +02:00
Josh Bleecher Snyder	a56f435fd4	whisper : document whisper_batch.n_seq_id (#1942 ) To prevent other people from attempting to remove it, as I did.	2024-03-10 16:55:22 +02:00
Josh Bleecher Snyder	ec166499d8	whisper : improve beam search candidate diversity (#1947 ) As of #1486, whisper.cpp uses a unified KV cache with KQ masking. As a result, depending on their location in the batch, identical sequences in a batch can have slightly different outputs due to floating point rounding errors during reduction. See the discussion in #1941 for more details. The beam search code used "has identical sum of log probabilities" as a shorthand for "is an identical token sequence". However, per above, identical tokens do not necessarily result in identical probabilities. Instead, explicitly compare on sequences. This is linear in cost when they are identical, but the lengths are always small and the comparisons are cheap. This increases diversity during beam search. This improves output quality for some short samples I've been working with, at no detectable performance cost. I haven't checked against larger corpuses. Fixes #1941	2024-03-10 16:54:43 +02:00
Josh Bleecher Snyder	2852e1af55	whisper : make beam candidate sort more stable (#1943 ) All else being otherwise equal, this encourages the beam candidate selection to re-use the same decoder, which slightly reduces the cache size. I wouldn't expect it to make much of a performance difference, but it helps when debug printing the cache and beam. Added as part of understanding #1941.	2024-03-09 18:50:03 +02:00
Georgi Gerganov	ed76818700	whisper : fix compute helper return (ggml/750)	2024-03-08 11:38:32 +02:00
zhouwg	897412b5b6	whisper : fix typo (#1925 )	2024-03-05 17:06:31 +02:00
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863 ) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	65faae0b6a	build : update CBLAS flags + fix unused var warning (#0 )	2024-02-19 14:44:46 +02:00
Georgi Gerganov	e3c5e2cba8	whisper : fix external encoder (#1860 )	2024-02-12 19:53:51 +02:00
slaren	1d3270cc8f	ggml-alloc : v3 (ggml/727) * ggml-alloc v3 ggml-ci * fix ci ggml-ci * whisper : check for backend buffer allocation failures * whisper : avoid leaks when initialization fails * cleanup ggml-ci * style fixes ggml-ci	2024-02-12 09:31:11 +02:00
Michael Podvitskiy	f75e1197f1	ggml : add abort_callback for cpu backend (ggml/725) * a way to use abort_callback with the cpu backend * whisper update	2024-02-10 09:55:46 +02:00
Didzis Gosko	0f80e5a80a	whisper : expose CUDA device setting in public API (#1840 ) * Makefile : allow to override CUDA_ARCH_FLAG * whisper : allow to select GPU (CUDA) device from public API	2024-02-09 17:27:47 +02:00
Georgi Gerganov	d839dd0242	examples : adapt to metal API	2024-01-14 00:11:45 +02:00
Georgi Gerganov	519f8e8684	whisper : load the model into multiple buffers of max size 1GB (#1763 )	2024-01-13 17:47:40 +02:00
Georgi Gerganov	6b01e3fedd	whisper : fix segment length with params.no_timestamps == true	2024-01-12 13:37:38 +02:00
Georgi Gerganov	29f78392c1	main : add cli option to disable system prints (#1740 )	2024-01-08 16:41:28 +02:00
Georgi Gerganov	668ffc9b23	whispser : reset the "batched" timings (#1721 )	2024-01-04 13:38:39 +02:00
Finn Voorhees	a3d0aa73d1	ggml : add error handling to graph_compute (#1714 )	2024-01-03 15:39:43 +02:00
bobqianic	37a709f655	whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG (#1681 )	2023-12-23 12:02:58 +00:00
Georgi Gerganov	3a5302108d	sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677 ) * sync : ggml * sync : llama.cpp * talk-llama : fix obsolete param * ggml-alloc : fix ggml_tallocr_is_own * talk.wasm : update to new ggml * ggml : fix type punning in ggml_scale * ggml : cuda jetson + arm quants warnings	2023-12-22 17:53:39 +02:00
Georgi Gerganov	29511d33c7	whisper : more debug messages + fix fallback logic	2023-12-08 13:43:12 +02:00
Georgi Gerganov	afce6fa113	sync : ggml (new ops, new backend, etc) (#1602 ) * sync : ggml (new ops, new backend, etc) * whisper : remove obsolete broadcasting code * ggml : remove backend self-registers + fix ggml_concat + n_task logic * metal : fix assert * metal : print resource path * whisper : fix bug if metal init fails	2023-12-07 22:27:19 +02:00
Georgi Gerganov	0ba365f958	metal : add backend function to check device family support (#1547 )	2023-11-24 12:37:08 +02:00
Georgi Gerganov	ffdb5c4735	whisper : fix typo	2023-11-24 09:45:10 +02:00
bradmit	34f70b3a56	whisper : add whisper_lang_str_full (#1546 ) * Update whisper.h add whisper_lang_fullstr to retrieve the full language name * Update whisper.cpp add whisper_lang_fullstr to return the full language name * fullstr -> str_full --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-24 09:33:13 +02:00
Georgi Gerganov	146169ec38	bench : pass memcpy threads from cli	2023-11-21 22:27:22 +02:00
Georgi Gerganov	9befab5ab9	bench : multi-thread memcpy (#1534 )	2023-11-21 22:07:30 +02:00
Georgi Gerganov	8159a9ab99	whisper : reuse whisper_decode_with_state (#1521 )	2023-11-20 13:16:11 +02:00
sandrohanea	46cc26d1b9	whisper : fix with_state methods to use the correct state (#1519 ) Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>	2023-11-19 11:25:30 +02:00
Georgi Gerganov	f784f9fa12	whisper : fix overriding the audio context	2023-11-19 10:32:32 +02:00
Georgi Gerganov	848e54f3ad	bench : fix memcpy bench size	2023-11-16 10:59:32 +02:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
Georgi Gerganov	b6c5f49b78	whisper : add batched decoding (#1486 ) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove oboslete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes	2023-11-15 16:12:52 +02:00
Evan Jones	3e5c7feeff	whisper : add grammar-based sampling (#1229 ) * whisper : add grammar-based sampling * build : fix after master merge * command : fix exception when recognizing the command * whisper : fine-tuning grammar functionality * command : grammar-related improvements - option to read grammar from file - add sample grammars for colors and chess moves - fine-tune the performance further * grammars : add assistant + update comments * command : enable beam-search, add "no_timestamps", add "context", add p * whisper : remove comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-13 10:51:34 +02:00
Georgi Gerganov	3172006a24	ggml : fix some compile warnings	2023-11-12 16:36:20 +02:00
Georgi Gerganov	b0502836b8	whisper : add full CUDA and Metal offloading (#1472 ) * whisper : migrate to ggml-backend * whisper : fix logit reading * whisper : fix tensor allocation during load * whisper : fix beam-search with CUDA * whisper : free backends + fix compile warning * whisper : print when CUDA is enabled * whisper : fix CoreML * make : clean-up * talk : fix compile warning * whisper : support ggml_conv with CUDA and Metal (#1473) * ggml : add CUDA support for ggml_conv * whisper : remove ggml_repeat for conv bias + single backend * cuda : fix im2col kernel * metal : add im2col support + mul mat-vec f16 x f16 * bench-all : add q4 models * whisper : clean-up * quantize-all : fix * ggml : im2col opts * whisper : avoid whisper_model_data wrapper * whisper : add note that ggml_mul_mat_pad does not work with CUDA * whisper : factor out graph compute in common function * whisper : fixes * whisper : fix UB with measure buffers * whisper : try to fix the parallel whisper_state functionality (#1479) * whisper : try to fix the parallel whisper_state functionality * whisper : fix multi-state Metal * whisper : free backend instances in whisper_state	2023-11-12 15:31:08 +02:00
Ben Nortier	ec7a6f04f9	whisper : return with error from whisper_encode_internal and whisper_decode_internal when abort callback is true (#1456 ) Co-authored-by: Ben Nortier <ben@bjnortier.com>	2023-11-10 13:51:16 +02:00
Xiao-Yong Jin	0de8582f65	coreml : use the correct `n_mel` value (#1458 )	2023-11-08 20:01:41 +00:00
Ben Nortier	baeb733691	whisper : reset mel time when resetting timings (#1452 ) Co-authored-by: Ben Nortier <ben@bjnortier.com>	2023-11-08 15:52:23 +02:00
Georgi Gerganov	2cdfc4e025	whisper : add support for large v3 (#1444 ) * whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme	2023-11-07 15:30:18 +02:00
Ben Nortier	11b503055e	whisper : reset ctx->t_start_us when calling whisper_reset_timings() (#1434 ) Co-authored-by: Ben Nortier <ben@bjnortier.com>	2023-11-07 11:04:32 +02:00
Georgi Gerganov	0c91aef2d8	whisper : add missing about callback initializers	2023-11-07 10:49:51 +02:00
Jhen-Jie Hong	0463028bc2	whisper : add context param to disable gpu (#1293 ) * whisper : check state->ctx_metal not null * whisper : add whisper_context_params { use_gpu } * whisper : new API with params & deprecate old API * examples : use no-gpu param && whisper_init_from_file_with_params * whisper.objc : enable metal & disable on simulator * whisper.swiftui, metal : enable metal & support load default.metallib * whisper.android : use new API * bindings : use new API * addon.node : fix build & test * bindings : updata java binding * bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java * metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load * metal : move bundle var into block * metal : use SWIFT_PACKAGE instead of GGML_SWIFT * style : minor updates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-06 11:04:24 +02:00
Georgi Gerganov	39cfad0dee	whisper : add support for new distilled Whisper models (#1424 ) * whisper : add support for new distilled Whisper models * whisper : print log when using distilled models	2023-11-05 19:43:45 +02:00
Georgi Gerganov	f96e1c5b78	sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422 ) * sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) * metal : allow env metal variable to override resource path (#1415) * Allow env variable to override resource path * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * sync : restore common / main from `master` * sync : restore whisper from `master` * talk-llama : update to latest llama.cpp * ruby : fix build * ggml : fix 32-bit ARM build * ggml : fix MIN / MAX macro collisions + update ios bindings * ggml : fix ifdefs and MIN / MAX again * exampels : fix Obj-C and Swift examples * ggml : fix 32-bit ARM compatibility * ggml : one more attempt to fix 32-bit ARM compat * whisper : fix support for larger graphs --------- Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>	2023-11-03 21:35:05 +02:00

225 Commits