Commit Graph

233 Commits

Author SHA1 Message Date
Xiao-Yong Jin
0de8582f65
coreml : use the correct n_mel value (#1458) 2023-11-08 20:01:41 +00:00
Ben Nortier
baeb733691
whisper : reset mel time when resetting timings (#1452)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-08 15:52:23 +02:00
Georgi Gerganov
2cdfc4e025
whisper : add support for large v3 (#1444)
* whisper : add support for large v3

* bench : fix build + fix go bindings

* bench : fix n_mels

* models : update readme
2023-11-07 15:30:18 +02:00
Ben Nortier
11b503055e
whisper : reset ctx->t_start_us when calling whisper_reset_timings() (#1434)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-07 11:04:32 +02:00
Georgi Gerganov
0c91aef2d8
whisper : add missing about callback initializers 2023-11-07 10:49:51 +02:00
Jhen-Jie Hong
0463028bc2
whisper : add context param to disable gpu (#1293)
* whisper : check state->ctx_metal not null

* whisper : add whisper_context_params { use_gpu }

* whisper : new API with params & deprecate old API

* examples : use no-gpu param && whisper_init_from_file_with_params

* whisper.objc : enable metal & disable on simulator

* whisper.swiftui, metal : enable metal & support load default.metallib

* whisper.android : use new API

* bindings : use new API

* addon.node : fix build & test

* bindings : update java binding

* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java

* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load

* metal : move bundle var into block

* metal : use SWIFT_PACKAGE instead of GGML_SWIFT

* style : minor updates

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-06 11:04:24 +02:00
Georgi Gerganov
39cfad0dee
whisper : add support for new distilled Whisper models (#1424)
* whisper : add support for new distilled Whisper models

* whisper : print log when using distilled models
2023-11-05 19:43:45 +02:00
Georgi Gerganov
f96e1c5b78
sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)

* metal : allow env metal variable to override resource path (#1415)

* Allow env variable to override resource path

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* sync : restore common / main from `master`

* sync : restore whisper from `master`

* talk-llama : update to latest llama.cpp

* ruby : fix build

* ggml : fix 32-bit ARM build

* ggml : fix MIN / MAX macro collisions + update ios bindings

* ggml : fix ifdefs and MIN / MAX again

* examples : fix Obj-C and Swift examples

* ggml : fix 32-bit ARM compatibility

* ggml : one more attempt to fix 32-bit ARM compat

* whisper : fix support for larger graphs

---------

Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
mkiol
940cdb1396
whisper : abort callback improvements (#1345)
* whisper : initialize abort_callback to null

* whisper : add example how to use abort_callback
2023-10-08 17:22:24 +03:00
mkiol
2f668c330e
whisper : add abort callback (#1335) 2023-10-04 11:57:55 +03:00
Didzis Gosko
4037705531
whisper : add missing speaker turn API function for whisper_state (#1330) 2023-10-03 22:55:48 +03:00
Georgi Gerganov
951a119926
whisper : increase tokenizer buffer (close #1259) 2023-09-15 21:11:43 +03:00
Georgi Gerganov
b8432f28f4
metal : add F32 support + update bench output 2023-09-15 13:56:08 +03:00
Georgi Gerganov
93935980f8
whisper : Metal and ggml-alloc support (#1270)
* metal : init

* whisper : factor out graph builds

* whisper : allocate encoder and decoder using ggml-alloc

* whisper : ggml-alloc is now supported

* whisper : CoreML support ggml-alloc

* build : fix ggml-alloc

* ios : update submodule

* extra : update sync-ggml.sh script to also sync ggml-alloc

* ci : see if this is causing the crash

* whisper : refactor ggml-alloc init

* whisper.android : try to fix build

* whisper : initial Metal version

* ci : try to debug vmem issue

* metal : decoder works on GPU!

* metal : add multi-decoder support

* ggml : fix ggml_nbytes (probably temp solution)

* metal : run "cross" step on the GPU

* whisper : remove ggml_repeat in the encoder

* whisper : offload the Encoder to Metal

* ggml : use simpler ggml_bytes() implementation

* ggml-alloc : try to make CI happy by reducing vram to 128GB

* whisper : add whisper_allocr to wrap ggml_allocr

* whisper : factor out alloc init in a function

* cmake : update to support Metal build

* whisper : add <functional> header

* objc : fix build (no Metal yet)

* ios : add Metal support

* swiftui : fix build

* metal : speed-up KQ multiplication

* metal : sync latest llama.cpp kernels

* readme : add Metal info

* ios : update submodule

* coreml : add code to toggle Core ML config (CPU, ANE, GPU)

* bench : fix timings by running a pre-heat

* bench : start benching the decoder

* whisper : add ggml_mul_mat_pad

* bench : fix uninitialized vars

* whisper : add comment for disabling mul-mat padding

* whisper : add description of ggml_mul_mat_pad

* whisper : clean-up ggml_mul_mat_pad

* metal : remove the "concurrent" flag

* bench : variable n_past

* ios : update SPM package
2023-09-15 12:18:18 +03:00
Georgi Gerganov
3fec2119e6
whisper : fix bench regression + fix performance when using CPU BLAS (#1275)
* whisper : fix bench regression

* ggml : use sched_yield when using BLAS + add comment
2023-09-12 13:54:04 +03:00
bobqianic
9b14418863
whisper : faster beam_search sampling via reduced KV cache copies (#1243)
* Faster `beam_search` sampling

Refine the KV cache update logic for more intelligent and efficient updating.

* Faster `whisper_sample_token_topk`

* Update whisper.cpp

* Update whisper.cpp

* Update whisper.cpp

* Reduce `memory allocation`

* Add `pointer swapping`

* Fixed some bugs

* Update whisper.cpp

* Apply suggestions from code review

* Updated the logic for determining `two-copy`

* Updated the logic for determining `two-copy` v2

* whisper : add debug logs + coding style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-10 16:04:27 +03:00
Georgi Gerganov
59a3d0cb57
ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220)
* ggml : sync (ggml-alloc, GPU, eps, etc.)

* ggml : fix build

* wasm : fix build
2023-09-05 13:54:40 +03:00
ChangSeok Oh
8e30bf3c02
ggml : fix compilation errors incurred by -Werror (#1227)
The -Werror warning option turns all warnings into errors. This PR makes
the compiler happy to build ggml.c and whisper.cpp with the stricter option.
2023-08-30 22:09:15 +03:00
Przemysław Pawełczyk
601c2d2181
ggml : detect SSSE3 (#1211)
* ggml : add ggml_cpu_has_ssse3

* whisper : show SSSE3 in system info

* make : detect SSSE3 via cpuinfo
2023-08-27 21:36:41 +03:00
Georgi Gerganov
b5bb5c85d4
whisper : allow whisper_full from mel spectrogram - no audio (#1214)
Co-authored-by: jbrough <jamie1612@gmail.com>
2023-08-27 20:02:57 +03:00
bobqianic
7e54df414e
whisper : significantly improve the inference quality (#1148)
* Fix MSVC compile error C3688

Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC.

* Significantly improve inference quality

In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.

* Significantly improve inference quality

At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.

* Addressed a few minor issues

Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`.

* Significantly improve inference quality 

Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information.

* Add annotation and performance improvement

* Calculate FFT only when fft_in are not all zero

* Some minor performance improvement

* Fixed a bug impacting inference quality

* The first version after all the analysis is completed.

* Fix some bugs and add debug mode

* Fixed several bugs

* Temporarily disable speed-up mode and add debug mode.

* Add debug mode

* Disable speed-up mode and add debug mode

* Fix CI error (#1)

* Fix error

* Fix error

* Fixed several bugs including [BLANK_AUDIO] problem

* Remove Hard-coded hann window

* Some Final Fix (#2)

* Fix error

* Fix error

* Probably the last commit

* Probably the last commit

* whisper : minor coding style changes

* whisper : remove debug from public API

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-27 19:51:33 +03:00
Fangjun Kuang
aad2dad38a
whisper : minor fixes (#1154) 2023-08-27 19:02:00 +03:00
Alexandr Graschenkov
c84cf87261
whisper : add precalculated values of sin/cos for speeding up FFT (#1142)
* Add sin/cos precalculated values to speedup FFT

* Update whisper.cpp

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>

* Update whisper.cpp

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>
2023-08-25 15:51:14 +03:00
Evan Martin
fabf79fc67
whisper : expose API to let user control log output (#1060)
* expose api to let user control log output

Add
  whisper_set_log_callback()
that lets user set a callback for log messages.

Change all the
  fprintf(stderr, ...)
to call via the above.

* whisper : add <cstdarg>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 18:58:25 +03:00
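The pattern this commit describes (a settable callback that replaces direct `fprintf(stderr, ...)` calls) can be sketched roughly as follows. This is a minimal illustration, not the code from whisper.cpp: the names `log_callback`, `set_log_callback`, `log_printf` and `append_to_string`, the `user_data` parameter, and the 1024-byte buffer are all assumptions made for the example, and the real `whisper_set_log_callback()` signature may differ.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical callback type: receives one formatted message plus an opaque
// pointer supplied by the caller. The real whisper.cpp signature may differ.
using log_callback = void (*)(const char * text, void * user_data);

// Default sink: keep the old behaviour of writing to stderr.
static void default_log(const char * text, void * /*user_data*/) {
    std::fputs(text, stderr);
}

static log_callback g_log_callback  = default_log;
static void *       g_log_user_data = nullptr;

// Install a user callback; passing nullptr restores the stderr default.
void set_log_callback(log_callback cb, void * user_data) {
    g_log_callback  = cb ? cb : default_log;
    g_log_user_data = user_data;
}

// printf-style helper that stands in for direct fprintf(stderr, ...) calls.
void log_printf(const char * fmt, ...) {
    char buf[1024]; // assumed size; longer messages are truncated
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    g_log_callback(buf, g_log_user_data);
}

// Example sink that appends every message to a std::string.
void append_to_string(const char * text, void * user_data) {
    static_cast<std::string *>(user_data)->append(text);
}
```

A caller that wants to capture output would pass `append_to_string` together with a pointer to its own `std::string`, then restore the default by passing `nullptr`.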
Hrishikesh Barman
925915ae37
whisper : move progress calculation out of whisper.cpp (#1081)
Previously, `progress_step` was hardcoded into whisper.cpp, so bindings
could only observe progress at that step even though the progress
callback was being called at every iteration.

With this change, whisper.cpp reports progress at a finer granularity,
and bindings/implementations can define their own progress step.
2023-07-25 18:53:34 +03:00
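With the step no longer fixed inside the library, the caller decides how often to surface an update. The sketch below shows one way a binding might throttle a per-iteration callback. The class `progress_throttle` and the helper `count_reports` are hypothetical names introduced for illustration, progress is assumed to be an integer percentage from 0 to 100, and `step` is assumed to be positive.

```cpp
// Hypothetical helper a binding could keep next to its progress callback.
class progress_throttle {
public:
    explicit progress_throttle(int step) : step_(step), prev_(0) {}

    // Returns true when `progress` (0..100) has advanced by at least one
    // full step since the last reported value.
    bool should_report(int progress) {
        if (progress < prev_ + step_) {
            return false;
        }
        prev_ = progress - progress % step_; // snap to the last multiple crossed
        return true;
    }

private:
    int step_;
    int prev_;
};

// Simulates a library that invokes the callback for every percent and
// counts how many updates the caller actually surfaces.
int count_reports(int step) {
    progress_throttle throttle(step);
    int n = 0;
    for (int progress = 0; progress <= 100; ++progress) {
        if (throttle.should_report(progress)) {
            ++n;
        }
    }
    return n;
}
```

Snapping `prev_` to a multiple of the step keeps the reporting points stable even if the library skips some percentages.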
Georgi Gerganov
4774d2feb0
whisper : minor OpenVINO refactoring (#1037)
Hopefully I didn't break something - haven't tested
2023-07-04 20:28:27 +03:00
Ryan Metcalfe
62b81276e0
whisper : add OpenVINO support (#1037)
* openvino: use OpenVINO encoder inference

* openvino: add python script for OpenVINO model generation

* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error

* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures

* cmake: Add openvino-encoder as separate object target

* whisper : minor style fixes

* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 15:56:11 +03:00
Akash Mahajan
c8d0f5fe98
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058)
* add HuggingFace mirror to download  ggml model

* support tdrz via simple hack overriding solm tokens

* fix incorrect translate/transcribe token_ids that are not static const

* add apollo 13 sample for tdrz demo

* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token

* extend whisper_segment with speaker_turn_next field and save in json output

* fix failing go build

* slipped in some python syntax whoops

* whisper : finalize tinydiarize support (add flag + fixes)

* whisper : tdrz support for word-level timestamps (respect max_len)

* java : try to fix tests after adding tdrz_enable flag

* main : remove TODO leftover

* java : fix params order list after adding "tdrz_enable"

* whisper : fix solm and add nosp token

* main : print tinydiarize help

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 09:45:00 +03:00
Georgi Gerganov
d6509bf78d
ggml : sync latest repo (mostly refactoring changes) 2023-07-02 21:46:09 +03:00
Georgi Gerganov
72deb41eb2
whisper : split_on_word no longer trims (#1046) 2023-06-25 23:51:01 +03:00
Philippe Normand
44cb044e66
whisper : fix build with -Werror=undef (#1045) 2023-06-25 15:30:39 +03:00
Georgi Gerganov
5feb0dffba
ggml : sync latest ggml lib 2023-06-25 14:30:44 +03:00
Nicholas Albion
d7c936b44a
Feature/java bindings2 (#944)
* Java needs to call `whisper_full_default_params_by_ref()`, returning struct by val does not seem to work.
* added convenience methods to WhisperFullParams
* Remove unused WhisperJavaParams
2023-05-29 09:38:58 +10:00
Elkana Bardugo
56a87ba45d
whisper : fix hebrew language code (#935) 2023-05-20 18:17:54 +03:00
Georgi Gerganov
fd01209d09
coreml : support quantized model files 2023-05-14 18:09:44 +03:00
Georgi Gerganov
e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
CRD716
b806420873
whisper : add detect-language mode (#853)
* add detectlanguage flag

* renaming and help

* no idea why that last one didn't commit

* run language detection if dl is set

* help message fix

* various fixes

* fix quitting

* fix language being english on print
2023-05-02 19:51:52 +03:00
Georgi Gerganov
d375d73b2e
bench : improve benchmarks 2023-05-01 14:44:39 +03:00
Georgi Gerganov
7765770f89
whisper : add memory sizes for Q8_0 (close #846) 2023-05-01 10:03:56 +03:00
Georgi Gerganov
c94c469592
whisper : fix quantize bug (#842)
* whisper : debug

* whisper : fix bug during quantization
2023-04-30 22:50:04 +03:00
Georgi Gerganov
794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
5fd1bdd7fc
whisper : add GPU support via cuBLAS (#834)
* make : add WHISPER_CUBLAS

* make : fix CUBLAS build

* whisper : disable Flash Attention + adjust memory buffers

* whisper : remove old commented code

* readme : add cuBLAS instructions

* cmake : add WHISPER_CUBLAS option

* gitignore : ignore build-cublas
2023-04-30 12:14:33 +03:00
Thijs Raymakers
6108d3cc58
whisper : use correct seek_end when offset is used (#833)
Whenever an `offset_ms` is provided, the value of `seek_end` is
calculated incorrectly. This causes Whisper to keep transcribing
after the end of the file.

The current behavior looks like
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
[00:34:51.000 --> 00:34:52.000]   ***
[00:34:52.000 --> 00:34:53.000]   ***
[00:34:53.000 --> 00:34:54.000]   ***
[00:34:55.000 --> 00:34:56.000]   ***
...
```

The expected behavior should be
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
- end of program -
```

This commit changes the calculation of the `seek_end` variable to
only add `seek_start` if a custom `duration_ms` is provided.
Otherwise, it defaults to the end of the file.

Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>
2023-04-29 18:55:37 +03:00
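The rule stated in the commit message can be written as a small stand-alone function. This is a sketch under assumptions rather than the actual patch: the function names are hypothetical, seek positions are assumed to be counted in 10 ms frames (so milliseconds are divided by 10), and a `duration_ms` of 0 is assumed to mean "no custom duration".

```cpp
// Assumption: positions are measured in 10 ms frames.
int compute_seek_start(int offset_ms) {
    return offset_ms / 10;
}

// n_frames is the length of the whole input, in the same 10 ms frames.
int compute_seek_end(int offset_ms, int duration_ms, int n_frames) {
    if (duration_ms == 0) {
        // No custom duration: stop at the end of the file. Adding
        // seek_start here would point past the end of the audio.
        return n_frames;
    }
    // Custom duration: it is relative to the offset, so add seek_start.
    return compute_seek_start(offset_ms) + duration_ms / 10;
}
```

For a 90 second input (9000 frames) and a 60 second offset, the end position stays at 9000 when no duration is given, and becomes 6500 when a 5 second duration is requested.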
Georgi Gerganov
3efb81dec6
build : add WHISPER_COREML_ALLOW_FALLBACK to make / CMake (#812) 2023-04-29 10:55:24 +03:00
Canis Lupus
94a7cd2a07
whisper : allow non-CoreML fallback when Core ML cannot be loaded (#812)
if the Core ML model cannot be loaded, continue without Core ML instead of
returning. This allows a single build to transcribe using Core ML models
where available, and regular models when not.
2023-04-29 10:49:02 +03:00
Georgi Gerganov
3e82ff4747
whisper : fix bug from previous commit 2023-04-29 10:42:14 +03:00
Georgi Gerganov
b5bd2f43c5
whisper : avoid designated initializers 2023-04-29 10:36:50 +03:00
AsukaMinato
94aa56f19e
minor : improve C++ and Python style (#768)
* use some STL functions

* use self.field than setattr, use pathlib.Path

* recover some format

* const some iter

* Keep the original

* 2 space
2023-04-29 10:06:25 +03:00
Georgi Gerganov
5108b30e6d
whisper : pad audio instead of spectrogram (#579)
Also, fallback only if more temperatures are available and if we are
at least 3 seconds before the end of the audio
2023-04-15 17:19:19 +03:00
Georgi Gerganov
f19e23fbd1
whisper : restore decoder temperature fallbacks
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
2023-04-15 16:12:55 +03:00
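The "5 fallbacks versus 2" figure follows from simple arithmetic, which the sketch below makes explicit. It assumes decoding starts at temperature 0 and that each fallback raises the temperature by a positive `step` until the next value would exceed 1.0. The function name `fallback_temperatures` is hypothetical.

```cpp
#include <vector>

// Lists the temperatures tried after the initial attempt at T = 0.
// Assumption: step > 0 and the schedule is capped at 1.0.
std::vector<double> fallback_temperatures(double step) {
    std::vector<double> temps;
    // Multiply instead of accumulating to limit floating-point drift;
    // the small epsilon keeps a final value of 1.0 inside the range.
    for (int i = 1; i * step <= 1.0 + 1e-6; ++i) {
        temps.push_back(i * step);
    }
    return temps;
}
```

A step of 0.2 gives 0.2, 0.4, 0.6, 0.8 and 1.0 (five fallbacks), while a step of 0.4 gives only 0.4 and 0.8 (two fallbacks).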
Georgi Gerganov
3dead611bb
whisper : slightly faster Log Mel computation + n-1 FFT threads (#568) 2023-04-15 14:18:46 +03:00
Georgi Gerganov
5e47e223bd
whisper : add Core ML support (#566)
* coreml : use Core ML encoder inference

* coreml : simplify whisper_encode + log messages

* whisper : resolve rebase conflicts

* coreml : add scripts for CoreML model generation

* bench-all : recognize COREML flag
2023-04-15 13:21:27 +03:00
Maximiliano Levi
794ff3074a
whisper : do not launch log_mel threads when n_thread is 1 (#763) 2023-04-14 22:35:34 +03:00
AfryMask
7e2afa4384
whisper : fix the bug related to word splitting errors in the "tokenize" function. (#760)
Co-authored-by: AfryMask <afrymask@gmail.com>
2023-04-14 20:35:03 +03:00
Bader-eddine Ouaich
2c856fb9e5
whisper : fix potential memory leaks (#740)
* fix potential memory leak if whisper_init_state failed

* fix potential memory leak if gpt2_init failed
2023-04-14 20:05:56 +03:00
Georgi Gerganov
514cd04452
whisper : fix bug in prompt processing (close #705)
Was dereferencing a dangling pointer
2023-04-14 19:17:07 +03:00
Georgi Gerganov
69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
  tested)
2023-04-10 22:28:54 +03:00
pajowu
0a2d1210bc
whisper : add progress callback (#600) 2023-03-30 20:29:29 +03:00
Jhen-Jie Hong
eefed45e37
whisper : add initial_prompt param (#645) 2023-03-29 23:23:23 +03:00
Georgi Gerganov
42c6855103
whisper : bump "large" scratch buffer even more (close #671) 2023-03-28 10:50:49 +03:00
Georgi Gerganov
0be9cd3497
whisper : increase scratch buffers after recent change (#671)
Should fix the error:

ggml_new_tensor_impl: not enough space in the scratch memory
2023-03-28 10:36:16 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk.llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov
8e361d90d7
whisper : disable fallbacks until the performance is improved (#588) 2023-03-22 22:34:39 +02:00
sandrohanea
d4fa0d92ad
fixed language auto-detection for state provided processing (#627)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-03-22 21:47:09 +02:00
Leo Moll
8fcd1a3b32
main : provide option for creating JSON output (#615)
* examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614)

* main : remove leftovers

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-22 21:37:36 +02:00
Georgi Gerganov
1beff6f66d
models : change HF hosting from dataset to model 2023-03-22 20:44:56 +02:00
Takeshi Inoue
09e9068007
whisper.android : support benchmark for Android example. (#542)
* whisper.android: Support benchmark for Android example.

* whisper.android: update screenshot in README.

* update: Make text selectable for copy & paste.

* Update whisper.h to restore API name

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper.android: Restore original API names.

---------

Co-authored-by: tinoue <tinoue@xevo.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-07 21:36:30 +02:00
sandrohanea
59fdcd19c8
whisper : add whisper_state + default state on the whisper_context (#523)
* Added whisper state + default state on the whisper_context

* Fixed some examples and bindings

* Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state

* Fixed comments

* whisper : reuse kv_cache_free() and fix compiler warnings

* whisper : clean-up the API comments

---------

Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-05 21:42:19 +02:00
Georgi Gerganov
478289a4b3
whisper : set no_context == true by default (#537) 2023-03-05 20:53:43 +02:00
Georgi Gerganov
373043cabe
whisper : zero-initialize some more context variables
Just in case
2023-02-21 19:00:42 +02:00
Finn Voorhees
fb4d0d470f
whisper : fix uninitialized exp_n_audio_ctx 2023-02-21 18:58:08 +02:00
Georgi Gerganov
0d229163bb
whisper : add API for applying custom logits filters during decoding 2023-02-19 18:35:01 +02:00
Georgi Gerganov
a94897bcde
whisper : by default disable non-speech tokens suppression (#473)
This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..
2023-02-15 21:48:49 +02:00
shikokuchuo
0336161b7d
whisper : fix signedness compiler warning (#506) 2023-02-15 19:08:25 +02:00
shibukazu
cfc06bf8df
whisper : suppress non-speech-related token outputs (#473)
* add non-speech-token suppression

* add suppress non-speech_tokens param
2023-02-08 09:05:34 +02:00
sandrohanea
2bfe0ebc0f
whisper : fixed Beam Search Strategy and exposed whisper_pcm_to_mel_phase_vocoder (#474)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-02-08 09:01:47 +02:00
boolemancer
4dd7119deb
whisper : only trim if split_on_word is true (#476) 2023-02-08 08:43:23 +02:00
kamranjon
a1c1583cc7
whisper : add whisper_full_lang_id() for getting the context lang (#461) 2023-02-05 14:46:26 +02:00
Matija Pevec
d012b5c7e4
whisper : add "split_on_word" flag when using "max_len" option (#455)
* Update whisper.cpp

* fix: trim function

* feat: added flag to split on word

* fix: arguments for main
2023-02-05 14:44:23 +02:00
Georgi Gerganov
f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
Georgi Gerganov
291980369c
whisper : suppress task tokens (#442) 2023-02-04 09:03:14 +02:00
Georgi Gerganov
b992f3709e
whisper : do not provide past prompt when n_max_text_ctx == 0 2023-01-25 20:01:00 +02:00
Georgi Gerganov
b5ddb16ec7
whisper : condition timestamps to be monotonically increasing (#425) 2023-01-23 20:48:26 +02:00
fitzsim
ae16c21e9c
whisper : PPC64 big-endian support (#398)
* ggml : set cache line size to 128 on POWER9

* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov
78f166174f
whisper : fix condition for providing past prompt (critical)
This bug has been present since v1.1.0.

Effectively, the past transcribed text wasn't being used for following
transcriptions, which likely significantly reduces the transcription
quality.

Likely related to #419
2023-01-22 10:47:01 +02:00
Georgi Gerganov
21c569ba4a
whisper : extend information in whisper_print_timings() 2023-01-19 18:50:33 +02:00
Georgi Gerganov
1a91c19af9
whisper : perform entropy check only when we have at least 32 tokens (#412) 2023-01-18 22:52:18 +02:00
Georgi Gerganov
a6cf6f4c4a
bench : minor fixes 2023-01-18 21:40:10 +02:00
Georgi Gerganov
1ccb8a46a5
bench : fix Windows linkage by moving ggml benches in whisper lib .. 2023-01-18 21:19:50 +02:00
Georgi Gerganov
8088a977af
whisper : fix possible uninitialized variables (#291) 2023-01-16 21:44:40 +02:00
Georgi Gerganov
00ea21668b
whisper : account speed_up flag for short audio (close #405) 2023-01-15 12:42:15 +02:00
Georgi Gerganov
8de452c18b
Improve decoding (#291)
* whisper : prepare infra for new decoding strategies

* whisper : apply logit filters and compute logprobs

* whisper : add whisper_get_logits()

* whisper : separate self and cross attention memory

Initial step needed for supporting parallel decoders

* whisper : move probs_id buffer to whisper_context

* whisper : refactor kv cache into separate struct

* whisper : move self-attention kv cache to whisper_decoder

* whisper : wip decoding parameters + strategies

* whisper : wip decoding parameters + strategies (part 2)

* whisper : wip decoding parameters + strategies (part 3)

* whisper : wip decoding parameters + strategies (part 4)

* whisper : fix prompt_past update to not include prompt_init

* whisper : temperature + best_of support

* whisper : support for compression_ratio_threshold

We actually use entropy, but it is similar

* command : fix example to use logits instead of obsolete probs

* whisper : handle empty sequence ranking

* whisper : add WHISPER_DEBUG + diagnostic prints + new main args

* whisper : minor fixes

* whisper : add beam-search support

* whisper : bug fix when there is no previous context

* whisper : add comments

* stream : disable temperature fallback

For real-time processing, we always want a single decoder running at T=0

* whisper.swiftui : update example - fix paths + add empty folders
2023-01-15 11:29:57 +02:00
Georgi Gerganov
4ef3398e8f
ggml : remove obsolete zeroing + comment fixes (#390) 2023-01-08 20:21:03 +02:00
boolemancer
08dc705a69
whisper : fix sample_to_timestamp calculation with 64 bit precision to avoid overflow (#388)
* Do calculation with 64 bit precision to avoid overflow

* Update whisper.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 15:08:45 +02:00
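The overflow this commit guards against can be illustrated with a short sketch. It assumes 16 kHz audio and timestamps expressed in 10 ms units; the constant name `kSampleRate` is hypothetical, and the function body is an illustration of the idea rather than a copy of the patch.

```cpp
#include <cstdint>

// Assumptions for this sketch: 16 kHz audio, timestamps in 10 ms units.
constexpr int kSampleRate = 16000;

// Widening to 64 bits before multiplying keeps 100 * i_sample from
// overflowing a 32-bit int, which would otherwise happen after roughly
// 21.5 million samples (about 22 minutes of 16 kHz audio).
int64_t sample_to_timestamp(int i_sample) {
    return (int64_t{100} * i_sample) / kSampleRate;
}
```

For example, sample 32,000,000 (2000 seconds into the audio) maps to timestamp 200000, although the intermediate product of 3.2 billion does not fit in a signed 32-bit integer.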
Syahmi Azhar
1512545149
whisper : add loader class to allow loading from buffer and others (#353)
* whisper : add loader to allow loading from other than file

* whisper : rename whisper_init to whisper_init_from_file

* whisper : add whisper_init_from_buffer

* android : Delete local.properties

* android : load models directly from assets

* whisper : adding <stddef.h> needed for size_t + code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 13:03:33 +02:00
Georgi Gerganov
65fdcbbbbb
whisper : revert accidental MB change 2023-01-07 16:18:21 +02:00
Georgi Gerganov
d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll 2023-01-07 16:16:42 +02:00
Abitofevrything
a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons
1944e7c33e
whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Georgi Gerganov
ad2a4ffa03
whisper : do not use F16 tensors when in F32 mode (#369) 2023-01-05 22:56:25 +02:00