* whisper : check state->ctx_metal not null
* whisper : add whisper_context_params { use_gpu }
* whisper : new API with params & deprecate old API (usage sketch after this list)
* examples : use no-gpu param and whisper_init_from_file_with_params
* whisper.objc : enable metal & disable on simulator
* whisper.swiftui, metal : enable metal & support load default.metallib
* whisper.android : use new API
* bindings : use new API
* addon.node : fix build & test
* bindings : update Java binding
* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java
* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load
* metal : move bundle var into block
* metal : use SWIFT_PACKAGE instead of GGML_SWIFT
* style : minor updates
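For reference, a minimal sketch of the new initialization path (function and field names as introduced by this change; error handling mostly elided):
```
#include "whisper.h"

int main(void) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = false; // e.g. what the examples' new no-gpu option sets

    struct whisper_context * ctx = whisper_init_from_file_with_params("ggml-base.en.bin", cparams);
    if (ctx == NULL) {
        return 1;
    }

    // ... run inference via whisper_full() ...

    whisper_free(ctx);
    return 0;
}
```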
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)
* metal : allow env metal variable to override resource path (#1415)
* Allow env variable to override resource path
* Update ggml-metal.m
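Conceptually, the override looks like this (a hedged sketch; the actual check lives in ggml-metal.m and reads the GGML_METAL_PATH_RESOURCES variable):
```
#include <stdlib.h>

// prefer the env override, otherwise fall back to the bundle's resource path
static const char * metal_resource_path(const char * bundle_path) {
    const char * env_path = getenv("GGML_METAL_PATH_RESOURCES");
    return env_path ? env_path : bundle_path;
}
```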
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : restore common / main from `master`
* sync : restore whisper from `master`
* talk-llama : update to latest llama.cpp
* ruby : fix build
* ggml : fix 32-bit ARM build
* ggml : fix MIN / MAX macro collisions + update ios bindings (guard sketch after this list)
* ggml : fix ifdefs and MIN / MAX again
* examples : fix Obj-C and Swift examples
* ggml : fix 32-bit ARM compatibility
* ggml : one more attempt to fix 32-bit ARM compat
* whisper : fix support for larger graphs
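The usual shape of such a collision fix, for context (a sketch of the common guard pattern, not the exact diff):
```
// guard the helpers so they don't collide with macros pulled in by
// platform headers (e.g. windows.h defines min/max-style macros)
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif
```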
---------
Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
* metal : init
* whisper : factor out graph builds
* whisper : allocate encoder and decoder using ggml-alloc
* whisper : ggml-alloc is now supported
* whisper : CoreML support ggml-alloc
* build : fix ggml-alloc
* ios : update submodule
* extra : update sync-ggml.sh script to also sync ggml-alloc
* ci : see if this is causing the crash
* whisper : refactor ggml-alloc init
* whisper.android : try to fix build
* whisper : initial Metal version
* ci : try to debug vmem issue
* metal : decoder works on GPU!
* metal : add multi-decoder support
* ggml : fix ggml_nbytes (probably temp solution)
* metal : run "cross" step on the GPU
* whisper : remove ggml_repeat in the encoder
* whisper : offload the Encoder to Metal
* ggml : use simpler ggml_nbytes() implementation
* ggml-alloc : try to make CI happy by reducing vram to 128GB
* whisper : add whisper_allocr to wrap ggml_allocr (sketch after this list)
* whisper : factor out alloc init in a function
* cmake : update to support Metal build
* whisper : add <functional> header
* objc : fix build (no Metal yet)
* ios : add Metal support
* swiftui : fix build
* metal : speed-up KQ multiplication
* metal : sync latest llama.cpp kernels
* readme : add Metal info
* ios : update submodule
* coreml : add code to toggle Core ML config (CPU, ANE, GPU)
* bench : fix timings by running a pre-heat
* bench : start benching the decoder
* whisper : add ggml_mul_mat_pad
* bench : fix uninitialized vars
* whisper : add comment for disabling mul-mat padding
* whisper : add description of ggml_mul_mat_pad
* whisper : clean-up ggml_mul_mat_pad
* metal : remove the "concurrent" flag
* bench : variable n_past
* ios : update SPM package
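A condensed, hedged sketch of the whisper_allocr wrapper and its measure-then-allocate init (modeled on whisper.cpp of this era; details may differ):
```
#include <cstdint>
#include <functional>
#include <vector>

#include "ggml.h"
#include "ggml-alloc.h"

struct whisper_allocr {
    ggml_allocr * alloc = nullptr;

    std::vector<uint8_t> meta; // metadata for the graph's ggml_context
    std::vector<uint8_t> data; // the actual tensor buffer
};

static void whisper_allocr_graph_init(whisper_allocr & allocr, std::function<ggml_cgraph *()> && get_graph) {
    const int tensor_alignment = 32;

    allocr.meta.resize(ggml_tensor_overhead()*GGML_MAX_NODES + ggml_graph_overhead());

    // first pass: build the graph once in measure mode to find the required size
    allocr.alloc = ggml_allocr_new_measure(tensor_alignment);
    const size_t alloc_size = ggml_allocr_alloc_graph(allocr.alloc, get_graph()) + tensor_alignment;
    ggml_allocr_free(allocr.alloc);

    // second pass: back the allocator with a real buffer of that size
    allocr.data.resize(alloc_size);
    allocr.alloc = ggml_allocr_new(allocr.data.data(), allocr.data.size(), tensor_alignment);
}
```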
* Fix MSVC compile error C3688
Rather than simply using `add_compile_options(/utf-8)` to address MSVC compile error C3688, a better approach is to handle it in a way that avoids passing `/utf-8` to NVCC.
* Significantly improve inference quality
In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.
* Significantly improve inference quality
At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.
* Addressed a few minor issues
Fixed `fft_out` growing without bound. Resolved the regression caused by using `break` instead of `fft_in[j] = 0`.
* Significantly improve inference quality
Thanks for your patience, everyone. It's finally sorted out. The right side of the FFT spectrum is now flipped over onto the left, the amplitudes at corresponding positions on the left and right are added together (with the left spectrum shifted by one position), and the average is taken. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information. (A sketch follows this list.)
* Add annotation and performance improvement
* Calculate FFT only when fft_in is not all zeros
* Some minor performance improvement
* Fixed a bug impacting inference quality
* The first version after all the analysis is completed.
* Fix some bugs and add debug mode
* Fixed several bugs
* Temporarily disable speed-up mode and add debug mode.
* Add debug mode
* Disable speed-up mode and add debug mode
* Fix CI error (#1)
* Fix error
* Fix error
* Fixed several bugs including [BLANK_AUDIO] problem
* Remove hard-coded Hann window
* Some Final Fix (#2)
* Fix error
* Fix error
* Probably the last commit
* Probably the last commit
* whisper : minor coding style changes
* whisper : remove debug from public API
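A hedged sketch of the folding described above (the exact in-place layout inside log_mel_spectrogram_worker_thread may differ):
```
// fold the symmetric right half of a real-input power spectrum onto the
// left half and average the mirrored bins, keeping bin 0 instead of
// discarding it (per the description above)
static void fold_spectrum(float * fft_out, int fft_size) {
    for (int j = 1; j < fft_size / 2; j++) {
        fft_out[j] = 0.5f * (fft_out[j] + fft_out[fft_size - j]);
    }
}
```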
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* expose API to let the user control log output
Add `whisper_set_log_callback()`, which lets the user set a callback for
log messages, and route all `fprintf(stderr, ...)` calls through it
(usage sketch below).
* whisper : add <cstdarg>
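A hedged usage sketch (assuming, per this change, that the callback receives the formatted log line):
```
#include <stdio.h>
#include "whisper.h"

static void my_log(const char * line) {
    // route whisper's log output wherever the application wants
    fprintf(stdout, "[whisper] %s", line);
}

int main(void) {
    whisper_set_log_callback(my_log);
    // ... init context, run inference; log lines now go through my_log ...
    return 0;
}
```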
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
The current `progress_step` was hardcoded into whisper.cpp. As a result,
bindings could only observe progress at that step, even though the progress
callback was being invoked at every iteration.
With this change, whisper.cpp reports progress with finer granularity, and
bindings/implementations can define their own progress step (sketch below).
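For example, a binding can now observe every invocation and apply its own step (a minimal sketch; the callback signature and field names are from whisper_full_params):
```
#include <stdio.h>
#include "whisper.h"

static void on_progress(struct whisper_context * ctx, struct whisper_state * state,
                        int progress, void * user_data) {
    (void) ctx; (void) state;
    const int step = *(const int *) user_data; // binding-defined granularity
    if (progress % step == 0) {
        printf("progress: %3d%%\n", progress);
    }
}

static void set_progress_reporting(struct whisper_full_params * wparams, int * step) {
    wparams->progress_callback           = on_progress;
    wparams->progress_callback_user_data = step;
}
```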
* add HuggingFace mirror to download ggml model
* support tdrz via simple hack overriding solm tokens
* fix incorrect translate/transcribe token_ids that are not static const
* add apollo 13 sample for tdrz demo
* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token
* extend whisper_segment with speaker_turn_next field and save in json output
* fix failing go build
* slipped in some python syntax whoops
* whisper : finalize tinydiarize support (add flag + fixes; usage sketch after this list)
* whisper : tdrz support for word-level timestamps (respect max_len)
* java : try to fix tests after adding tdrz_enable flag
* main : remove TODO leftover
* java : fix params order list after adding "tdrz_enable"
* whisper : fix solm and add nosp token
* main : print tinydiarize help
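For context, a hedged usage sketch of the flag and the new segment field (names follow whisper.h of this era):
```
#include <stdio.h>
#include "whisper.h"

// enable tinydiarize before calling whisper_full():
//   wparams.tdrz_enable = true;

static void print_with_speaker_turns(struct whisper_context * ctx) {
    for (int i = 0; i < whisper_full_n_segments(ctx); i++) {
        printf("%s%s\n",
            whisper_full_get_segment_text(ctx, i),
            whisper_full_get_segment_speaker_turn_next(ctx, i) ? " [SPEAKER_TURN]" : "");
    }
}
```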
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Java needs to call `whisper_full_default_params_by_ref()`; returning the struct by value does not seem to work (sketch after this list).
* added convenience methods to WhisperFullParams
* Remove unused WhisperJavaParams
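A hedged sketch of such a by-ref wrapper (the actual whisper.cpp implementation may differ; the caller is responsible for freeing the result):
```
#include <stdlib.h>
#include "whisper.h"

struct whisper_full_params * whisper_full_default_params_by_ref(enum whisper_full_sampling_strategy strategy) {
    struct whisper_full_params params = whisper_full_default_params(strategy);

    // heap copy that JNA/JNI can hold as a plain pointer
    struct whisper_full_params * result =
        (struct whisper_full_params *) malloc(sizeof(struct whisper_full_params));
    *result = params;
    return result;
}
```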
* add detectlanguage flag
* renaming and help
* no idea why that last one didn't commit
* run language detection if dl is set
* help message fix
* various fixes
* fix quitting
* fix language being English on print
Whenever an `offset_ms` is provided, the value of `seek_end` is
calculated incorrectly. This causes Whisper to keep transcribing
after the end of the file.
The current behavior looks like
```
[00:34:40.000 --> 00:34:47.000] This is an example audio file.
[00:34:47.000 --> 00:34:49.000] The text has been redacted
[00:34:49.000 --> 00:34:51.000] This is the end of the audio.
[00:34:51.000 --> 00:34:52.000] ***
[00:34:52.000 --> 00:34:53.000] ***
[00:34:53.000 --> 00:34:54.000] ***
[00:34:55.000 --> 00:34:56.000] ***
...
```
The expected behavior should be
```
[00:34:40.000 --> 00:34:47.000] This is an example audio file.
[00:34:47.000 --> 00:34:49.000] The text has been redacted
[00:34:49.000 --> 00:34:51.000] This is the end of the audio.
- end of program -
```
This commit changes the calculation of the `seek_end` variable to
only add `seek_start` if a custom `duration_ms` is provided.
Otherwise, it defaults to the end of the file.
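The corrected computation, sketched as a fragment of the surrounding function (names follow whisper.cpp; one mel frame is 10 ms):
```
const int seek_start = params.offset_ms/10;

// default to the end of the file; only add seek_start when the caller
// provided an explicit duration
const int seek_end = params.duration_ms == 0 ?
    whisper_n_len_from_state(state) :
    seek_start + params.duration_ms/10;
```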
Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>
If the Core ML model cannot be loaded, continue without Core ML instead of
returning. This allows a single build to transcribe using Core ML models
where available, and regular models when not (sketch below).
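A hedged sketch of the new behavior (the real check sits in whisper.cpp's Core ML init path; the header path is assumed):
```
#ifdef WHISPER_USE_COREML
#include <stdio.h>
#include "coreml/whisper-encoder.h"

// returns the Core ML context on success, or NULL to signal "fall back
// to the regular ggml model" instead of aborting initialization
static struct whisper_coreml_context * try_init_coreml(const char * path_coreml) {
    struct whisper_coreml_context * ctx = whisper_coreml_init(path_coreml);
    if (ctx == NULL) {
        fprintf(stderr, "%s: failed to load Core ML model from '%s', falling back\n",
                __func__, path_coreml);
    }
    return ctx;
}
#endif
```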
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.
However, now there are even more complaints about wrong decodings and
repetition.
So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4, i.e. from a maximum of 5 fallbacks to a maximum
of 2.
Also, the stream example now has fallbacks enabled by default.
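In terms of defaults, this corresponds roughly to (a hedged sketch; field names are from whisper_full_params):
```
struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

wparams.greedy.best_of        = 2;    // down from 5
wparams.beam_search.beam_size = 2;    // when using WHISPER_SAMPLING_BEAM_SEARCH
wparams.temperature_inc       = 0.4f; // bigger step => at most 2 fallbacks instead of 5
```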
close #471 #477 #508 #612 #719 #731
* examples : provide option for also exporting as a JSON file (ggerganov/whisper.cpp#614)
* main : remove leftovers
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* whisper.android: Support benchmark for Android example.
* whisper.android: update screenshot in README.
* update: Make text selectable for copy & paste.
* Update whisper.h to restore API name
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* whisper.android: Restore original API names.
---------
Co-authored-by: tinoue <tinoue@xevo.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Added whisper state + default state on the whisper_context
* Fixed some examples and bindings
* Fixed whisper_n_len (which was used in some bindings) and added whisper_n_len_from_state
* Fixed comments
* whisper : reuse kv_cache_free() and fix compiler warnings
* whisper : clean-up the API comments
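The resulting API split, sketched (functions added by this change; error handling elided):
```
#include "whisper.h"

int main(void) {
    // one context (the model weights) can now serve multiple independent states
    struct whisper_context * ctx = whisper_init_from_file_no_state("ggml-base.en.bin");
    struct whisper_state * state = whisper_init_state(ctx);

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // whisper_full_with_state(ctx, state, wparams, pcm, n_samples);

    whisper_free_state(state);
    whisper_free(ctx);
    return 0;
}
```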
---------
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This bug has been present since v1.1.0.
Effectively, the past transcribed text wasn't being used for subsequent
transcriptions, which likely reduced transcription quality significantly.
Likely related to #419
* Do calculation with 64-bit precision to avoid overflow (see the sketch below)
* Update whisper.cpp
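The class of fix, illustrated with a hedged, hypothetical example (the actual expression in whisper.cpp differs):
```
#include <cstdint>

// promote to 64-bit before multiplying so that large sample counts
// don't overflow a 32-bit int
static int64_t samples_to_ms(int n_samples, int sample_rate) {
    return (int64_t) n_samples * 1000 / sample_rate;
}
```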
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Improve WASM performance:
On a MacBook M1 Pro, I observe 25% faster inference in Firefox and 35% faster in Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; add lookup table for f16 -> f32 conversions (sketch after this list)
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions
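A hedged sketch of the lookup-table idea named above: a 65536-entry table indexed by the raw f16 bits, filled once at startup (ggml's real table and conversion helpers differ in detail):
```
#include <cstdint>
#include <cstring>

static float table_f16_f32[1 << 16]; // one entry per possible f16 bit pattern

// reference bit-twiddling conversion, used only to fill the table
static float f16_bits_to_f32(uint16_t h) {
    const uint32_t sign = (uint32_t) (h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t mant =  h        & 0x3ff;
    uint32_t bits;

    if (exp == 0x1f) {
        bits = sign | 0x7f800000u | (mant << 13);         // inf / NaN
    } else if (exp != 0) {
        bits = sign | ((exp + 112) << 23) | (mant << 13); // normal: rebias 15 -> 127
    } else if (mant != 0) {
        exp = 113;                                        // subnormal: renormalize
        while ((mant & 0x400) == 0) { mant <<= 1; exp--; }
        bits = sign | (exp << 23) | ((mant & 0x3ff) << 13);
    } else {
        bits = sign;                                      // +/- zero
    }

    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        table_f16_f32[i] = f16_bits_to_f32((uint16_t) i);
    }
}

// hot path: the conversion becomes a single table load
static inline float f16_to_f32(uint16_t h) {
    return table_f16_f32[h];
}
```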
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>