Andy Maloney
dd6d582977
whisper : use range-based for loops for readability
2023-01-05 21:20:44 +02:00
Georgi Gerganov
d51c5eb906
ggml : define MIN / MAX only if not defined (minor)
2023-01-05 21:16:52 +02:00
Georgi Gerganov
d97e6005e9
whisper : add whisper_n_audio_ctx and check for invalid audio_ctx
...
closes #344
2022-12-31 09:57:19 +02:00
Georgi Gerganov
68daf6e487
whisper : avoid some memory allocations
2022-12-30 13:43:48 +02:00
Georgi Gerganov
ac521a566e
ggml : simplify the SIMD code ( #324 )
...
* ggml : simplify the SIMD code
* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Andy Maloney
543bd5627e
whisper : use emplace_back in place of push_back ( #319 )
...
This avoids potential construction of temporaries.
2022-12-23 11:07:19 +02:00
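To illustrate the emplace_back commit above, here is a minimal sketch of the difference. The `Segment` type and `build_segments` function are hypothetical and not taken from whisper.cpp; they only show the general pattern.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type, for illustration only (not from whisper.cpp).
struct Segment {
    int id;
    std::string text;
    Segment(int id, std::string text) : id(id), text(std::move(text)) {}
};

std::vector<Segment> build_segments() {
    std::vector<Segment> out;

    // push_back: a temporary Segment is constructed first,
    // then moved into the vector's storage.
    out.push_back(Segment(0, "hello"));

    // emplace_back: the constructor arguments are forwarded and the
    // Segment is constructed directly inside the vector's storage.
    out.emplace_back(1, "world");

    return out;
}
```

Both calls leave the vector in the same state; the emplace_back form just skips the intermediate object.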
Andy Maloney
62fee9a9cc
whisper : fix mem leak on failure to load model ( #318 )
2022-12-23 11:06:17 +02:00
Andy Maloney
fa463313ad
minor : small code cleanups ( #302 )
...
* Small code cleanups
- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""
* minor : switch case always break
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-22 17:06:19 +02:00
Georgi Gerganov
501a6b455c
minor : flag "ARM FMA" -> "ARM_FMA"
2022-12-22 16:47:54 +02:00
Kevin Brothaler
e1432dd91a
Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
...
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
Andy Maloney
42c6730732
whisper : use nullptr (C++11) instead of NULL macro ( #299 )
2022-12-22 16:35:18 +02:00
Georgi Gerganov
99da1e5cc8
cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings
2022-12-19 20:45:08 +02:00
Matheus de Sousa
8e3f129b4d
minor : resolve some of the warnings when compiling with clang/clang++ ( #294 )
...
* Resolve some of the warnings when compiling with clang/clang++
Mostly nit stuff that clang catches when compiling with -Wall -Wextra
-pedantic.
- Fix comparisons between signed and unsigned integers.
- Pass a constant reference (const&) instead of copying each time.
* minor : normalize coding style
* minor : fix warning
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-19 20:19:01 +02:00
Georgi Gerganov
fba10a4c68
whisper : language auto-detect ( #59 )
2022-12-17 18:49:44 +02:00
Georgi Gerganov
6a69e3ae27
command : adding guided mode
2022-12-16 19:38:18 +02:00
Georgi Gerganov
bf69b669a0
whisper : add whisper_tokenize()
...
Tokenizes a string into a list of vocabulary tokens
2022-12-16 19:38:18 +02:00
Georgi Gerganov
6a7c82501e
whisper : improve decoding strategy ( #244 )
...
- Clear past prompt when there is very short audio left for processing.
My observation is that in these cases the decoding tends to repeat and
hallucinate stuff and I think this is induced by the existing prompt
- When we fail to sample timestamp token, retry by clearing the past
prompt. If it fails again, then we advance the window by 1 second
2022-12-16 18:34:35 +02:00
Georgi Gerganov
124c718c73
whisper : fix UB when reading buffer of length 0 bytes ( #265 )
2022-12-13 23:14:47 +02:00
Roland Rabien
e70d47baab
Remove C++20 requirement ( #257 )
...
* Remove C++20 requirement
* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
bert hubert
d1da35de06
Fix potential bug when reading model data into a small-string-optimized (SSO) string, which could lead to memory corruption. In an SSO string, you can't write data to &str[0] and expect it to work well.
...
Also added a small wrapper function to more safely read model data without having to get the sizeof right. I tested this on tiny, base and large models, there was no change in behaviour.
2022-12-10 16:20:48 +02:00
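The idea in the commit above can be sketched as follows, assuming the model file stores each string as a 32-bit length followed by its bytes. That layout and the helper names `read_pod` and `read_prefixed_string` are assumptions for illustration, not the code from the commit.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: reads exactly sizeof(T) bytes into dest, so the
// caller cannot pass a mismatched size by accident.
template <typename T>
static void read_pod(std::istream & in, T & dest) {
    in.read(reinterpret_cast<char *>(&dest), sizeof(T));
}

// Reads a length-prefixed string through an intermediate buffer instead of
// writing through &str[0] of a string that was never sized for the data.
static std::string read_prefixed_string(std::istream & in) {
    std::uint32_t len = 0;
    read_pod(in, len);

    std::vector<char> buf(len);
    if (len > 0) {
        in.read(buf.data(), len);
    }
    return std::string(buf.begin(), buf.end());
}
```

The intermediate buffer owns exactly `len` bytes, so the read can never run past the string's small internal storage.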
Georgi Gerganov
603f97ba11
whisper : minor improvement in decoding strategy ( #244 )
...
Do not allow for text segments to go beyond end of audio.
This partially mitigates some issues when the last audio window is 1-2
seconds just before the end of the audio file and the decoding spirals
into a repetition of the last transcribed phrase.
2022-12-10 13:38:26 +02:00
Georgi Gerganov
f8ec718b76
ggml : add F16C CPU flag check
2022-12-06 21:56:56 +02:00
Georgi Gerganov
78d13257be
Try to improve the token sampling strategy ( #193 )
...
* whisper : try to improve the token sampling strategy
- Add the "max_initial_timestamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past
* whisper : fix the max initial timestamp logic + fallback decoding
2022-12-02 21:51:50 +02:00
Georgi Gerganov
4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation
2022-11-27 20:42:45 +02:00
Georgi Gerganov
e266cb0723
whisper.objc : add real-time processing ( #97 )
...
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
Georgi Gerganov
c207eed431
whisper.objc : fix build warnings
2022-11-26 16:27:04 +02:00
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing ( close #86 )
2022-11-25 23:07:42 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2022-11-25 19:28:04 +02:00
Georgi Gerganov
128aaadb93
whisper : improve printfs
2022-11-24 17:54:16 +02:00
katsu560
83456076f0
add AVX support
2022-11-23 22:16:33 +02:00
Georgi Gerganov
49706a658a
minor : update a few prints + fix buttons in whisper.wasm
2022-11-23 17:19:21 +02:00
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
...
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
...
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly ( #155 )
...
* talk : initial real-time transcription in the browser
* talk : polishing the UI
* talk : ready for beta testing
* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2022-11-20 21:22:41 +02:00
greeshmay
2ba66360c9
fix: free ggml_context ( close #149 ) ( #150 )
...
* fix: free ggml_context
* ggml : free the model's contexts in whisper_free()
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring ( #49 , #120 )
...
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov
02dfd5b8c3
whisper : fix extra memory usage after recent processor changes
...
Had increased the memory buffer to the size of the model and forgot to
bring it down.
2022-11-02 18:31:18 +02:00
Georgi Gerganov
57fb46f307
main : add option for word-level timestamps (very experimental)
2022-10-30 17:06:57 +02:00
Georgi Gerganov
eba62e0fa1
close #113 : fix struct whisper_token_data
2022-10-30 08:23:52 +02:00
Georgi Gerganov
014a119052
minor : fix multiple definitions of to_timestamp()
2022-10-29 19:37:19 +03:00
Georgi Gerganov
dec40be58f
parallel : print time of audio boundaries + fix timings
2022-10-29 19:37:19 +03:00
Georgi Gerganov
0b2dc3c82c
parallel : working
2022-10-29 19:37:19 +03:00
Georgi Gerganov
85d6e1e1e7
main : fix sampling time + add max_context parameter
2022-10-29 19:37:19 +03:00
Georgi Gerganov
72e9cdd6bf
parallel : adding tool for parallel transformer inference
2022-10-29 19:37:19 +03:00
Borislav Stanimirov
c565c569e7
Define WHISPER_BUILD so as to export symbols on Windows
2022-10-29 13:23:09 +03:00
Georgi Gerganov
34bb3ab0cf
ggml : add system info functions
2022-10-25 20:53:48 +03:00
Georgi Gerganov
5f7e9fa2dc
ref #68 , #79 : fix segment time output
2022-10-23 13:30:30 +03:00
Georgi Gerganov
7affd309d3
whisper : add new-segment callback
...
Can be used to process new segments as they are being generated.
Sample usage in main, for printing the resulting segments during the
inference.
2022-10-22 21:17:21 +03:00
Georgi Gerganov
31ff0c6a1f
wip : experimental color coding of tokens based on probabilities
2022-10-22 21:17:21 +03:00
Georgi Gerganov
8d15a1c635
ci : fix and re-enable tests (2nd try)
2022-10-21 15:57:20 +03:00
Georgi Gerganov
692aa0784f
Revert "ci : fix and re-enable tests"
...
This reverts commit 80aefc9514.
2022-10-21 15:36:19 +03:00
Georgi Gerganov
80aefc9514
ci : fix and re-enable tests
2022-10-21 15:27:30 +03:00
Georgi Gerganov
7eeef0358a
ref #52 : improve greedy sampling strategy
...
Force timestamp token to be sampled if the probability sum over all
timestamp tokens is above the probability of any other token
2022-10-18 19:48:15 +03:00
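The rule described in the commit above can be sketched roughly as follows. The function name, the flat probability vector, and the assumption that every token id at or above `token_beg` is a timestamp token are illustrative choices, not the actual whisper.cpp implementation.

```cpp
#include <vector>

// Hypothetical helper: picks a token id from a probability vector.
// Assumption: ids in [token_beg, probs.size()) are timestamp tokens,
// all lower ids are ordinary (non-timestamp) tokens.
int sample_greedy_with_timestamp_rule(const std::vector<float> & probs, int token_beg) {
    int   best_other   = -1;    // most probable non-timestamp token
    float best_other_p = -1.0f;
    int   best_ts      = -1;    // most probable timestamp token
    float best_ts_p    = -1.0f;
    float sum_ts       = 0.0f;  // total probability mass on timestamps

    for (int i = 0; i < (int) probs.size(); ++i) {
        if (i >= token_beg) {
            sum_ts += probs[i];
            if (probs[i] > best_ts_p) { best_ts_p = probs[i]; best_ts = i; }
        } else {
            if (probs[i] > best_other_p) { best_other_p = probs[i]; best_other = i; }
        }
    }

    // Force a timestamp when the summed timestamp probability exceeds
    // the probability of every individual non-timestamp token.
    if (best_ts >= 0 && sum_ts > best_other_p) {
        return best_ts;
    }
    return best_other;
}
```

For example, with probabilities {0.4, 0.1, 0.2, 0.3} and `token_beg = 2`, the timestamp mass is 0.5, which beats the best ordinary token (0.4), so the most probable timestamp token is returned even though no single timestamp is the overall maximum.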
Georgi Gerganov
e30cf83158
ref #57 , #62 , #63 : remove unions in C-api + remove designated initializers
...
We are not ready for designated initializers - many compilers do not
support this C++ feature yet, so removing its non-trivial usages.
2022-10-18 18:17:24 +03:00
Georgi Gerganov
d6b84b2a23
ref #62 : fix build for some compilers
...
For some reason, new versions of GCC panic when the struct type is not
specified explicitly
2022-10-18 10:57:03 +03:00
Georgi Gerganov
b4a3875b2c
Revert recent sampling change
...
It does not actually help and seems to produce worse results on some of
the samples
2022-10-18 08:26:16 +03:00
Georgi Gerganov
cf67bfffa0
Fix EOT token handling
...
If it is the end of the audio, pick all sampled tokens.
Otherwise, print error message.
2022-10-18 00:53:06 +03:00
Georgi Gerganov
d14823582d
Try to improve the sampling strategy a bit
...
It still fails sometimes when it does not sample a timestamp token for
the entire segment. We now print a message in such cases
2022-10-18 00:12:51 +03:00
Georgi Gerganov
20d8e7a309
Fix memory sizes
2022-10-18 00:12:51 +03:00
Georgi Gerganov
72d967bce4
Use Accelerate framework on Apple silicon
...
Huge performance improvement in the Encoder (almost x2 on MacBook M1 Pro)
Also various extra optimizations:
- Multi-threaded NORM operator
- Faster GELU via F16 cast
2022-10-18 00:12:51 +03:00
Georgi Gerganov
0ad085f5e8
ref #48 : clear results at the start of whisper_full
...
This way, even if the input audio is empty, the previous results will be
removed.
2022-10-15 09:55:28 +03:00
0/0
b799226973
check if spectrogram length is <100 before doing anything else
...
fixes #39
2022-10-12 07:32:42 +03:00
Borislav Stanimirov
0b45d25151
Building with MSVC
2022-10-11 21:40:46 +03:00
Georgi Gerganov
63b6786767
Minor
2022-10-10 22:06:27 +03:00
lnyan
4bbb8a587b
Add MinGW support
2022-10-09 22:26:37 +08:00
Georgi Gerganov
2ca8cc77b2
ref #17 : print whisper logs to stderr
...
Only the transcribed/translated text is printed to stdout.
This way, one can redirect the result to a file.
2022-10-08 17:28:06 +03:00
Georgi Gerganov
8c7c018893
ref #17 : add options to output result to file
...
Support for:
- plain text
- VTT
- SRT
2022-10-08 17:22:22 +03:00
Georgi Gerganov
b43b36e006
Update tests
2022-10-08 11:43:42 +03:00
Georgi Gerganov
2f069335ab
Adding sanitizer tests
2022-10-08 11:43:42 +03:00
Georgi Gerganov
332c9d77fe
whisper : fix bug in token sampling logic
...
Could overflow buffer
2022-10-08 09:02:41 +03:00
Georgi Gerganov
481cd685d5
ref #10 : option to keep context in "stream" example
...
Seems the results become worse when we keep the context, so by default
this is not enabled
2022-10-07 22:30:44 +03:00
Georgi Gerganov
7787b878e1
ref #16 , #22 : add "offset" argument
...
Allows processing of the input audio to start at some offset from the
beginning. Useful for splitting a long job into multiple tasks.
2022-10-07 22:00:40 +03:00
Georgi Gerganov
167324584b
wip : rpi4 support
2022-10-05 23:03:46 +03:00
Georgi Gerganov
ce1fe95902
wip : improve makefile
2022-10-05 23:03:46 +03:00
Georgi Gerganov
6814cc9b02
Improve result printing
2022-10-04 23:18:15 +03:00
Georgi Gerganov
eba33adadd
Extend C-style API with full inference methods
2022-10-04 23:18:15 +03:00
Georgi Gerganov
6b77124e01
Initial C-style interface for whisper.cpp
2022-10-04 23:18:15 +03:00