Commit Graph

243 Commits

Author SHA1 Message Date
Georgi Gerganov
02fc147a0b examples : adapt to new ggml_concat (ggml/0) 2024-06-16 18:19:48 +03:00
Borislav Stanimirov
b29b3b2924
whisper : use ggml-cuda in mel calc, set appropriate device (#2236)
* whisper : use ggml-cuda in mel calc, set appropriate device

* whisper : forbid cuda mel calc on devices with compute < 600, workaround for #2230
2024-06-13 13:16:07 +03:00
Georgi Gerganov
420b6abc54
cuda : fix HIPBLAS build (#2234) 2024-06-11 19:14:38 +03:00
Borislav Stanimirov
20c542c713
whisper : auto-grow working areas for mel_calc_cuda (#2227)
* whisper : auto-grow working areas for mel_calc_cuda, fixes #2226

* whisper : only calculate mel spectrogram on GPU if audio is <= 5 min
2024-06-10 21:51:32 +03:00
Georgi Gerganov
c2bdb960cd
whisper : free whisper_mel instances (#2220) 2024-06-10 11:00:15 +03:00
Georgi Gerganov
87acd6d629
whisper : whisper_state/backend fixes (#2217)
* whisper : fixes

* ci : WHISPER_CUBLAS -> WHISPER_CUDA
2024-06-06 18:51:36 +03:00
Borislav Stanimirov
f842d31171
whisper : calculate mel spectrogram directly into a ggml_tensor (#2208)
* whisper : calculate mel spectrogram directly into a ggml_tensor

* whisper : remove unused temp buffer from state

* whisper : fix not initializing wstate.embd_enc
2024-06-06 16:20:46 +03:00
Borislav Stanimirov
ffef323c4c
whisper : add CUDA-specific computation of mel spectrograms (#2206)
* whisper : use polymorphic class to calculate mel spectrogram

* whisper : add cuda-specific mel spectrogram calculation

* whisper : conditionally compile cufftGetErrorString to avoid warnings

* build : add new files to makefile

* ruby : add new files to conf script

* build : fix typo in makefile

* whisper : suppress cub warning for deprecated C++ std in whisper-mel-cuda
2024-06-04 09:32:23 +03:00
Borislav Stanimirov
af5833e298
whisper : remove speed_up and phase_vocoder* functions (#2198)
* whisper : fix cast warning

* whisper : remove phase_vocoder functions, ref #2195

* whisper : remove speed_up from whisper_full_params, closes #2195
2024-05-31 11:37:29 +03:00
Borislav Stanimirov
e130b66642
whisper : use global cache for sin/cos vals and Hann window (#2194)
- also rename Hanning to Hann, as it's named after Julius von Hann per Wikipedia
2024-05-29 19:09:21 +03:00
Georgi Gerganov
05042a782d
Revert "whisper : remove extra backend instance (huh?)" (#2182)
This reverts commit 4caa64b73ed4c0e71097c865b0f6a9c136b007c6.
2024-05-27 10:20:25 +03:00
Georgi Gerganov
7094ea5e75
whisper : use flash attention (#2152)
* whisper : use flash attention in the encoder

* whisper : add kv_pad

* whisper : remove extra backend instance (huh?)

* whisper : use FA for cross-attention

* whisper : use FA for self-attention

* whisper : simplify encoder FA

* whisper : add flash_attn runtime parameter

* scripts : add bench log

* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00
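The flash_attn runtime parameter mentioned in the last bullet is exposed through the context parameters. A minimal usage sketch, assuming the field is named flash_attn in whisper_context_params (the other calls are the existing public API):

```cpp
#include "whisper.h"

// assumed field name: flash_attn in whisper_context_params (added by this PR)
struct whisper_context * init_with_flash_attn(const char * model_path) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu    = true;
    cparams.flash_attn = true; // enable the FA path in the encoder/decoder graphs
    return whisper_init_from_file_with_params(model_path, cparams);
}
```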
thewh1teagle
d8356a1cc2
whisper : fix model path encoding in windows (#2086)
* fix: model path encoding in windows

* fix: convert model path to wide string only for MSVC compiler
2024-05-14 09:43:41 +03:00
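A sketch of the MSVC-only wide-string conversion described above, converting the UTF-8 path with MultiByteToWideChar and opening it with _wfopen (the helper name is illustrative, not the function used in the PR):

```cpp
#ifdef _MSC_VER
#include <windows.h>

#include <cstdio>
#include <vector>

// illustrative helper: open a UTF-8 encoded model path on Windows/MSVC
static FILE * fopen_utf8(const char * path_utf8, const wchar_t * mode) {
    const int n = MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, nullptr, 0);
    std::vector<wchar_t> path_w(n);
    MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, path_w.data(), n);
    return _wfopen(path_w.data(), mode);
}
#endif
```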
Georgi Gerganov
2b434c449e
whisper : switch back to F32 mask (#0) 2024-05-13 14:43:43 +03:00
Georgi Gerganov
2c81e6fd51 whisper : remove old flash attn code (#0) 2024-05-13 11:02:26 +03:00
goldwaving
22b6598cc9
Remove unnecessary memory reallocation in fft (#2080)
fft_out needs to be twice the frame_size, not the frame_step.  It is resized in fft() anyway, but this change prevents an unnecessary reallocation.

n_fft must match the mel filter size, so it is best not to calculate it from the frame_size.

We only need to get the magnitudes for half the spectrum since the other half is a mirror and not used in the mel filter loop later.
2024-04-28 18:36:12 +01:00
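A minimal sketch of the half-spectrum magnitude step described in the entry above, with illustrative names (fft_out holding interleaved re/im pairs, as implied by "twice the frame_size"):

```cpp
#include <vector>

// fft_out holds frame_size complex values as interleaved (re, im) pairs,
// i.e. 2*frame_size floats; for real input the upper half of the spectrum
// mirrors the lower half, so only frame_size/2 + 1 bins are needed.
std::vector<float> half_spectrum_magnitudes(int frame_size, const std::vector<float> & fft_out) {
    const int n_fft = frame_size/2 + 1; // matches the mel filter size
    std::vector<float> mag(n_fft);
    for (int j = 0; j < n_fft; ++j) {
        const float re = fft_out[2*j + 0];
        const float im = fft_out[2*j + 1];
        mag[j] = re*re + im*im; // squared magnitude fed to the mel filter loop
    }
    return mag;
}
```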
Georgi Gerganov
7f85e1d7fd
whisper : more prominent log message for sub-1s audio (#2065) 2024-04-24 14:46:06 +03:00
Brad Murray
5275074d37
whisper : fix DTW memory access (#2012)
* Fix DTW memory access

* Memory fix - Apply changes from denersc
2024-04-09 18:38:19 +03:00
ulatekh
c8eeb93a6a
whisper : suppress tokens with a regex (#1997)
* Allow a regular expression to describe tokens to suppress.

Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by https://github.com/openai/whisper/discussions/1041

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Blind change to fix Java test.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 18:27:28 +03:00
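Conceptually, the suppression step resolves each vocabulary token to its text and masks the logits of tokens matching the user-supplied pattern (the --suppress-tokens-re flag shown above). A hedged sketch of that idea, not the literal implementation in the PR, using std::regex with a stand-in vocabulary lookup:

```cpp
#include <cmath>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Conceptual sketch only: mask the logits of tokens whose text matches the
// user-supplied pattern. token_text(id) stands in for the real vocabulary
// lookup (e.g. whisper_token_to_str).
void suppress_tokens_re(std::vector<float> & logits,
                        const std::string & pattern,
                        const std::function<std::string(int)> & token_text) {
    const std::regex re(pattern);
    for (size_t id = 0; id < logits.size(); ++id) {
        if (std::regex_match(token_text((int) id), re)) {
            logits[id] = -INFINITY; // suppressed: token can no longer be sampled
        }
    }
}
```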
Georgi Gerganov
2948c740a2
sync : ggml (#2001)
* sync : update scripts

* sync : ggml

* talk-llama : sync llama.cpp

* make : WHISPER_CUBLAS -> WHISPER_CUDA

* ci : try to fix sycl build

* talk-llama : fix make build
2024-03-27 18:55:10 +02:00
Georgi Gerganov
1558ec5a16
whisper : improve handling of prompts (#1981)
* whisper : improve handling of prompts

* whisper : add whisper_token_count helper
2024-03-25 14:48:19 +02:00
Sanchit Gandhi
fff24a0148
whisper : improve support for distil-large-v3 (#1982) 2024-03-21 18:53:30 +02:00
denersc
741abb162c
whisper : token-level timestamps with DTW (#1485)
* whisper.cpp: impl dtw algo

* WIP: producing and placing DTW timestamps on tokens

* Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false.

* Fix mistake causing incorrect alignment of dtw timestamps

* implement N_TOP_MOST and CUSTOM alignment heads setting

* whisper: fix typo on alignment heads enum

* Fix issues related to changes in whisper.cpp

* Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function

* decoder: save cross QKs only if requested

* Calling median filter with ggml_map_custom1

* Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads

* Copying cross QKs from decoder backend correctly

* dtw: cleanup

* Fix incorrect n_frames passed to dtw when near end of audio

* Fix aheads_masks_init for backend != CPU

* whisper : minor style

* main : add dtw (wip)

* whisper: fix invalid memory access in aheads_masks_init

* main : add dtw (cont)

* whisper : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 18:25:26 +02:00
Georgi Gerganov
725350d4ea
whisper : set outputs from conv graph (#1959) 2024-03-16 17:30:55 +02:00
Josh Bleecher Snyder
a56f435fd4
whisper : document whisper_batch.n_seq_id (#1942)
To prevent other people from attempting to remove it, as I did.
2024-03-10 16:55:22 +02:00
Josh Bleecher Snyder
ec166499d8
whisper : improve beam search candidate diversity (#1947)
As of #1486, whisper.cpp uses a unified KV cache with KQ masking.
As a result, depending on their location in the batch,
identical sequences in a batch can have slightly different outputs
due to floating point rounding errors during reduction.
See the discussion in #1941 for more details.

The beam search code used "has identical sum of log probabilities"
as a shorthand for "is an identical token sequence". However, per above,
identical tokens do not necessarily result in identical probabilities.

Instead, explicitly compare on sequences.
This is linear in cost when they are identical,
but the lengths are always small and the comparisons are cheap.

This increases diversity during beam search.

This improves output quality for some short samples I've been working
with, at no detectable performance cost.
I haven't checked against larger corpuses.

Fixes #1941
2024-03-10 16:54:43 +02:00
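A sketch of the "compare the sequences themselves" idea from the entry above, with illustrative types (the real code works on whisper's internal decoder/sequence structs):

```cpp
#include <vector>

// Illustrative candidate type; the real code works on whisper's internals.
struct beam_candidate {
    std::vector<int> tokens;       // token ids decoded so far
    double           sum_logprobs; // no longer used as an identity proxy
};

// Two candidates are duplicates only if their token sequences are identical.
// Comparing sums of log-probabilities is not a reliable shorthand: identical
// sequences at different batch positions can differ by floating point
// rounding in the unified KV cache (see #1941).
bool is_duplicate(const beam_candidate & a, const beam_candidate & b) {
    return a.tokens == b.tokens; // linear in length, but sequences are short
}
```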
Josh Bleecher Snyder
2852e1af55
whisper : make beam candidate sort more stable (#1943)
All else being otherwise equal, this encourages the beam candidate
selection to re-use the same decoder, which slightly
reduces the cache size.

I wouldn't expect it to make much of a performance difference,
but it helps when debug printing the cache and beam.

Added as part of understanding #1941.
2024-03-09 18:50:03 +02:00
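One way to read "make the sort more stable" is a tie-break on the decoder index, so that equal-scoring candidates prefer re-using the same decoder; a hedged sketch under that assumption, with illustrative field names:

```cpp
#include <algorithm>
#include <vector>

struct beam_entry {
    int    decoder_idx; // which decoder produced the candidate (illustrative)
    double score;       // e.g. accumulated log-probability
};

// Sort by score and break ties by decoder index, so that all else being equal
// the selection keeps coming back to the same decoder (smaller cache, easier
// to read when debug-printing the cache and beams).
void sort_candidates(std::vector<beam_entry> & cands) {
    std::sort(cands.begin(), cands.end(),
              [](const beam_entry & a, const beam_entry & b) {
                  if (a.score != b.score) {
                      return a.score > b.score;
                  }
                  return a.decoder_idx < b.decoder_idx;
              });
}
```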
Georgi Gerganov
ed76818700
whisper : fix compute helper return (ggml/750) 2024-03-08 11:38:32 +02:00
zhouwg
897412b5b6
whisper : fix typo (#1925) 2024-03-05 17:06:31 +02:00
Abhilash Majumder
a0ddd8392c
whisper : add SYCL support (#1863)
* add changes from llama upstream

* add sycl abstraction

* add sycl build

* update cmake

* add sycl build config

* fix bug

* fix bug

* refactor build

* fix bug

* update build

* call build

* use sycl header

* add examples

* add target

* fix typecast in quant.c

* readd fp16 and readme

* fix quant typecast

* add sample

* add readme

* remove cxx file check
2024-02-23 09:22:24 +02:00
Georgi Gerganov
65faae0b6a
build : update CBLAS flags + fix unused var warning (#0) 2024-02-19 14:44:46 +02:00
Georgi Gerganov
e3c5e2cba8
whisper : fix external encoder (#1860) 2024-02-12 19:53:51 +02:00
slaren
1d3270cc8f
ggml-alloc : v3 (ggml/727)
* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci
2024-02-12 09:31:11 +02:00
Michael Podvitskiy
f75e1197f1
ggml : add abort_callback for cpu backend (ggml/725)
* a way to use abort_callback with the cpu backend

* whisper update
2024-02-10 09:55:46 +02:00
Didzis Gosko
0f80e5a80a
whisper : expose CUDA device setting in public API (#1840)
* Makefile : allow to override CUDA_ARCH_FLAG

* whisper : allow to select GPU (CUDA) device from public API
2024-02-09 17:27:47 +02:00
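A minimal usage sketch for selecting the GPU from the public API, assuming the context-params field added by this PR is named gpu_device and sits alongside the existing use_gpu flag:

```cpp
#include "whisper.h"

// assumed field name: gpu_device in whisper_context_params (added by this PR)
struct whisper_context * init_on_device(const char * model_path, int device) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu    = true;
    cparams.gpu_device = device; // which CUDA device to run on
    return whisper_init_from_file_with_params(model_path, cparams);
}
```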
Georgi Gerganov
d839dd0242
examples : adapt to metal API 2024-01-14 00:11:45 +02:00
Georgi Gerganov
519f8e8684
whisper : load the model into multiple buffers of max size 1GB (#1763) 2024-01-13 17:47:40 +02:00
Georgi Gerganov
6b01e3fedd
whisper : fix segment length with params.no_timestamps == true 2024-01-12 13:37:38 +02:00
Georgi Gerganov
29f78392c1
main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00
Georgi Gerganov
668ffc9b23
whisper : reset the "batched" timings (#1721) 2024-01-04 13:38:39 +02:00
Finn Voorhees
a3d0aa73d1
ggml : add error handling to graph_compute (#1714) 2024-01-03 15:39:43 +02:00
bobqianic
37a709f655
whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG (#1681) 2023-12-23 12:02:58 +00:00
Georgi Gerganov
3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)
* sync : ggml

* sync : llama.cpp

* talk-llama : fix obsolete param

* ggml-alloc : fix ggml_tallocr_is_own

* talk.wasm : update to new ggml

* ggml : fix type punning in ggml_scale

* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Georgi Gerganov
29511d33c7
whisper : more debug messages + fix fallback logic 2023-12-08 13:43:12 +02:00
Georgi Gerganov
afce6fa113
sync : ggml (new ops, new backend, etc) (#1602)
* sync : ggml (new ops, new backend, etc)

* whisper : remove obsolete broadcasting code

* ggml : remove backend self-registers + fix ggml_concat + n_task logic

* metal : fix assert

* metal : print resource path

* whisper : fix bug if metal init fails
2023-12-07 22:27:19 +02:00
Georgi Gerganov
0ba365f958
metal : add backend function to check device family support (#1547) 2023-11-24 12:37:08 +02:00
Georgi Gerganov
ffdb5c4735
whisper : fix typo 2023-11-24 09:45:10 +02:00
bradmit
34f70b3a56
whisper : add whisper_lang_str_full (#1546)
* Update whisper.h

add whisper_lang_fullstr to retrieve the full language name

* Update whisper.cpp

add whisper_lang_fullstr to return the full language name

* fullstr -> str_full

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-24 09:33:13 +02:00
Georgi Gerganov
146169ec38
bench : pass memcpy threads from cli 2023-11-21 22:27:22 +02:00
Georgi Gerganov
9befab5ab9
bench : multi-thread memcpy (#1534) 2023-11-21 22:07:30 +02:00