docs : make model options / model install methods clearer (#1806)

* Make models more "discoverable"

* Clean up code block language identifiers

* make 3 options clearer

* undo Prettier formatter change

* docs: `$` shell prompt, consistently

* docs: minor changes
Michael Rienstra 2024-01-26 07:39:54 -08:00 committed by GitHub
parent 1cf679dec4
commit 4bbb60efce
GPG Key ID: B5690EEEBB952194
6 changed files with 136 additions and 112 deletions

README.md

@@ -36,7 +36,7 @@ Supported platforms:
 - [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
 The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
-The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.
+The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
 Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
 As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
@@ -61,22 +61,22 @@ Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder
-The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
-intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
-the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
+The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
 ## Quick start
-First clone the repository.
-Then, download one of the Whisper models converted in [ggml format](models). For example:
+First clone the repository:
+```bash
+git clone https://github.com/ggerganov/whisper.cpp.git
+```
+Then, download one of the Whisper [models](models/README.md) converted in [`ggml` format](#ggml-format). For example:
 ```bash
 bash ./models/download-ggml-model.sh base.en
 ```
-If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).
 Now build the [main](examples/main) example and transcribe an audio file like this:
 ```bash
@@ -91,7 +91,7 @@ make
 For a quick demo, simply run `make base.en`:
-```java
+```text
 $ make base.en
 cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
@@ -207,7 +207,7 @@ For detailed usage instructions, run: `./main -h`
 Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
 For example, you can use `ffmpeg` like this:
-```java
+```bash
 ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
 ```
@@ -240,7 +240,7 @@ make large-v3
 ## Memory usage
 | Model | Disk | Mem |
-| --- | --- | --- |
+| ------ | ------- | ------- |
 | tiny | 75 MiB | ~273 MB |
 | base | 142 MiB | ~388 MB |
 | small | 466 MiB | ~852 MB |
@@ -304,8 +304,8 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 - Run the examples as usual. For example:
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ...
@@ -333,7 +333,8 @@ This can result in significant speedup in encoder performance. Here are the inst
 - First, setup python virtual env. and install python dependencies. Python 3.10 is recommended.
 Windows:
-```
+```powershell
 cd models
 python -m venv openvino_conv_env
 openvino_conv_env\Scripts\activate
@@ -342,7 +343,8 @@ This can result in significant speedup in encoder performance. Here are the inst
 ```
 Linux and macOS:
-```
+```bash
 cd models
 python3 -m venv openvino_conv_env
 source openvino_conv_env/bin/activate
@@ -356,7 +358,7 @@ This can result in significant speedup in encoder performance. Here are the inst
 python convert-whisper-to-openvino.py --model base.en
 ```
-This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as ggml models, as that
+This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as `ggml` models, as that
 is the default location that the OpenVINO extension will search at runtime.
 - Build `whisper.cpp` with OpenVINO support:
@@ -366,24 +368,28 @@ This can result in significant speedup in encoder performance. Here are the inst
 After downloading & extracting package onto your development system, set up required environment by sourcing setupvars script. For example:
 Linux:
 ```bash
 source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
 ```
 Windows (cmd):
-```
+```powershell
 C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
 ```
 And then build the project using cmake:
 ```bash
 cmake -B build -DWHISPER_OPENVINO=1
 cmake --build build -j --config Release
 ```
 - Run the examples as usual. For example:
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ...
@@ -434,7 +440,6 @@ cmake -B build -DWHISPER_CLBLAST=ON
 cmake --build build -j --config Release
 ```
 Run all the examples as usual.
 ## BLAS CPU support via OpenBLAS
@@ -452,10 +457,12 @@ WHISPER_OPENBLAS=1 make -j
 ## Docker
 ### Prerequisites
-* Docker must be installed and running on your system.
-* Create a folder to store big models & intermediate files (ex. /whisper/models)
+- Docker must be installed and running on your system.
+- Create a folder to store big models & intermediate files (ex. /whisper/models)
 ### Images
 We have two Docker images available for this project:
 1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
@@ -491,7 +498,7 @@ in about half a minute on a MacBook M1 Pro, using `medium.en` model:
 <details>
 <summary>Expand to see the result</summary>
-```java
+```text
 $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
 whisper_init_from_file: loading model from 'models/ggml-medium.en.bin'
@@ -563,6 +570,7 @@ whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per
 whisper_print_timings: decode time = 13090.93 ms / 549 runs ( 23.85 ms per run)
 whisper_print_timings: total time = 32733.52 ms
 ```
 </details>
 ## Real-time audio input example
@@ -571,7 +579,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
-```java
+```bash
 make stream
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
@@ -583,7 +591,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
 to highlight words with high or low confidence:
-```java
+```bash
 ./main -m models/ggml-base.en.bin -f samples/gb0.wav --print-colors
 ```
@@ -593,8 +601,8 @@ to highlight words with high or low confidence:
 For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -617,8 +625,8 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -688,7 +696,7 @@ This requires to have `ffmpeg` installed.
 Here are a few *"typical"* examples:
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
 source ./samples/jfk.wav.wts
 ffplay ./samples/jfk.wav.mp4
@@ -698,7 +706,7 @@ https://user-images.githubusercontent.com/1991296/199337465-dbee4b5e-9aeb-48a3-b
 ---
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/mm0.wav -owts
 source ./samples/mm0.wav.wts
 ffplay ./samples/mm0.wav.mp4
@@ -708,7 +716,7 @@ https://user-images.githubusercontent.com/1991296/199337504-cc8fd233-0cb7-4920-9
 ---
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -owts
 source ./samples/gb0.wav.wts
 ffplay ./samples/gb0.wav.mp4
@@ -722,7 +730,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
 Use the [extra/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/extra/bench-wts.sh) script to generate a video in the following format:
-```java
+```bash
 ./extra/bench-wts.sh samples/jfk.wav
 ffplay ./samples/jfk.wav.all.mp4
 ```
@@ -751,8 +759,7 @@ It is written in python with the intention of being easy to modify and extend fo
 It outputs a csv file with the results of the benchmarking.
-## ggml format
+## `ggml` format
 The original models are converted to a custom binary format. This allows to pack everything needed into a single file:
@@ -767,28 +774,27 @@ or manually from here:
 - https://huggingface.co/ggerganov/whisper.cpp
 - https://ggml.ggerganov.com
-For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or the README
-in [models](models).
+For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).
 ## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)
-- [X] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
-- [X] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
+- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
+- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
 - React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
-- [X] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
-- [X] Java:
+- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
+- [x] Java:
 - [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
-- [X] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
-- [X] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
+- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
+- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
 - [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
-- [X] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
+- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
 - [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
 - [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
-- [X] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
+- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
 - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
 - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
-- [X] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
-- [X] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
+- [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
+- [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
 ## Examples
@@ -796,7 +802,7 @@ There are various examples of using the library for different projects in the [e
 Some of the examples are even ported to run in the browser using WebAssembly. Check them out!
 | Example | Web | Description |
-| --- | --- | --- |
+| --------------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
 | [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
 | [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
 | [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
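The README context above notes that the `main` example currently runs only with 16-bit WAV files and gives an `ffmpeg` one-liner for converting input. A minimal sketch of wrapping that exact command so several files can be converted at once follows; the function name `to_whisper_wav` and the convention of writing `<name>.wav` next to each input are illustrative assumptions, not part of whisper.cpp.

```bash
# Hypothetical helper (not part of whisper.cpp): convert audio files into
# the 16 kHz, mono, 16-bit PCM WAV input that the README says `main` needs.
# The ffmpeg flags are the ones from the README one-liner.
to_whisper_wav() {
    local in out
    for in in "$@"; do
        # input.mp3 -> input.wav (assumed naming convention)
        out="${in%.*}.wav"
        # Never let ffmpeg read and write the same .wav file.
        if [ "$in" = "$out" ]; then
            out="${in%.*}.16k.wav"
        fi
        # -ar 16000: 16 kHz sample rate, -ac 1: mono,
        # -c:a pcm_s16le: 16-bit little-endian PCM samples
        ffmpeg -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$out" || return 1
    done
}
```

For example, `to_whisper_wav input.mp3` writes `input.wav`, which can then be passed to `./main -m models/ggml-base.en.bin -f input.wav`. Like the README command, the sketch does not pass `-y`, so `ffmpeg` may stop to ask before overwriting an existing output file.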

@@ -41,7 +41,7 @@ make publish-npm
 ## Sample run
-```java
+```text
 $ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
 whisper_model_load: loading model from 'whisper.bin'

@@ -4,7 +4,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The `stream` tool samples the audio every half a second and runs the transcription continously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
-```java
+```bash
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
@@ -14,7 +14,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 Setting the `--step` argument to `0` enables the sliding window mode:
-```java
+```bash
 ./stream -m ./models/ggml-small.en.bin -t 6 --step 0 --length 30000 -vth 0.6
 ```

@@ -11,11 +11,11 @@ https://user-images.githubusercontent.com/1991296/204126266-ce4177c6-6eca-4bd9-b
 ## Usage
-```java
+```bash
 git clone https://github.com/ggerganov/whisper.cpp
 open whisper.cpp/examples/whisper.objc/whisper.objc.xcodeproj/
-// If you don't want to convert a Core ML model, you can skip this step by create dummy model
+# if you don't want to convert a Core ML model, you can skip this step by create dummy model
 mkdir models/ggml-base.en-encoder.mlmodelc
 ```

@@ -1,19 +1,16 @@
-## Whisper model files in custom ggml format
-The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
+## Whisper model files in custom `ggml` format
+The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30)
 are converted to custom `ggml` format in order to be able to load them in C/C++.
 Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
-You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
-or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
-Currently, they are hosted on the following locations:
-- https://huggingface.co/ggerganov/whisper.cpp
-- https://ggml.ggerganov.com
-Sample download:
-```java
+There are three ways to obtain `ggml` models:
+### 1. Use [download-ggml-model.sh](download-ggml-model.sh) to download pre-converted models
+Example download:
+```text
 $ ./download-ggml-model.sh base.en
 Downloading ggml model base.en ...
 models/ggml-base.en.bin 100%[=============================================>] 141.11M 5.41MB/s in 22s
@@ -23,35 +20,46 @@ You can now use it like this:
 $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ```
-To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
-The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper
-Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
-```
+### 2. Manually download pre-converted models
+`ggml` models are available from the following locations:
+- https://huggingface.co/ggerganov/whisper.cpp/tree/main
+- https://ggml.ggerganov.com
+### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)
+Download one of the [models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30) and generate the `ggml` files using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+Example conversion, assuming the original PyTorch files have been downloaded into `~/.cache/whisper`. Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
+```bash
 mkdir models/whisper-medium
 python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
 mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
 rmdir models/whisper-medium
 ```
-A third option to obtain the model files is to download them from Hugging Face:
-https://huggingface.co/ggerganov/whisper.cpp/tree/main
 ## Available models
 | Model | Disk | SHA |
-| --- | --- | --- |
+| ------------- | ------- | ------------------------------------------ |
 | tiny | 75 MiB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
 | tiny.en | 75 MiB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
 | base | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
 | base.en | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
 | small | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
 | small.en | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| small.en-tdrz | 465 MiB | `b6c6e7e89af1a35c08e6de56b66ca6a02a2fdfa1` |
 | medium | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
 | medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
 | large-v1 | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
 | large-v2 | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| large-v2-q5_0 | 1.1 GiB | `00e39f2196344e901b3a2bd5814807a769bd1630` |
 | large-v3 | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+| large-v3-q5_0 | 1.1 GiB | `e6e2ed78495d403bef4b7cff42ef4aaadcfea8de` |
+Models are multilingual unless the model name includes `.en`. Models ending in `-q5_0` are [quantized](../README.md#quantization). Models ending in `-tdrz` support local diarization (marking of speaker turns) using [tinydiarize](https://github.com/akashmjn/tinydiarize). More information about models is available [upstream (openai/whisper)](https://github.com/openai/whisper#available-models-and-languages). The list above is a subset of the models supported by the [download-ggml-model.sh](download-ggml-model.sh) script, but many more are available at https://huggingface.co/ggerganov/whisper.cpp/tree/main and elsewhere.
 ## Model files for testing purposes
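The updated `Available models` table above lists a 40-hex-digit value in its `SHA` column for each file. Those values have the length of SHA-1 digests, but the README does not name the algorithm, so the sketch below assumes SHA-1. It checks a downloaded model against the table entry with `sha1sum` from GNU coreutils, falling back to `shasum -a 1` where `sha1sum` is not installed (for example on some macOS setups); `verify_model` is a hypothetical name.

```bash
# Hypothetical helper: compare a model file's digest with the value from the
# "Available models" table. Assumes the table's SHA column is SHA-1.
verify_model() {
    local file="$1" expected="$2" actual
    if command -v sha1sum >/dev/null 2>&1; then
        actual="$(sha1sum "$file" | cut -d ' ' -f 1)"
    else
        # Fallback for systems that ship Perl's shasum instead of coreutils
        actual="$(shasum -a 1 "$file" | cut -d ' ' -f 1)"
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got ${actual:-nothing}, expected $expected)" >&2
        return 1
    fi
}
```

Usage with the `base.en` row of the table: `verify_model models/ggml-base.en.bin 137c40403d78fd54d454da0f9bd998f78703390c`. A missing file yields an empty digest and is reported as a mismatch.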

@@ -9,6 +9,9 @@
 src="https://huggingface.co/ggerganov/whisper.cpp"
 pfx="resolve/main/ggml"
+BOLD="\033[1m"
+RESET='\033[0m'
 # get the path of this script
 get_script_path() {
 if [ -x "$(command -v realpath)" ]; then
@@ -22,17 +25,17 @@ get_script_path() {
 models_path="${2:-$(get_script_path)}"
 # Whisper models
-models="tiny.en
-tiny
+models="tiny
+tiny.en
 tiny-q5_1
 tiny.en-q5_1
-base.en
 base
+base.en
 base-q5_1
 base.en-q5_1
+small
 small.en
 small.en-tdrz
-small
 small-q5_1
 small.en-q5_1
 medium
@@ -41,14 +44,21 @@ medium-q5_0
 medium.en-q5_0
 large-v1
 large-v2
+large-v2-q5_0
 large-v3
 large-v3-q5_0"
 # list available models
 list_models() {
 printf "\n"
-printf " Available models:"
+printf "Available models:"
+model_class=""
 for model in $models; do
+this_model_class="${model%%[.-]*}"
+if [ "$this_model_class" != "$model_class" ]; then
+printf "\n "
+model_class=$this_model_class
+fi
 printf " %s" "$model"
 done
 printf "\n\n"
@@ -57,6 +67,8 @@ list_models() {
 if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
 printf "Usage: %s <model> [models_path]\n" "$0"
 list_models
+printf "___________________________________________________________\n"
+printf "${BOLD}.en${RESET} = english-only ${BOLD}-q5_[01]${RESET} = quantized ${BOLD}-tdrz${RESET} = tinydiarize\n"
 exit 1
 fi
@@ -98,14 +110,12 @@ else
 exit 1
 fi
 if [ $? -ne 0 ]; then
 printf "Failed to download ggml model %s \n" "$model"
 printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
 exit 1
 fi
 printf "Done! Model '%s' saved in '%s/ggml-%s.bin'\n" "$model" "$models_path" "$model"
 printf "You can now use it like this:\n\n"
 printf " $ ./main -m %s/ggml-%s.bin -f samples/jfk.wav\n" "$models_path" "$model"
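The reworked `list_models` in the script hunks above puts each model size on its own line. `${model%%[.-]*}` removes the longest suffix that begins with `.` or `-`, so `tiny.en-q5_1` and `tiny-q5_1` both reduce to `tiny`, and a new line is started whenever that prefix differs from the previous model's. The function below is a standalone sketch of the same technique; its name `group_models` and the `class:` label it prints are illustrative additions, not part of the script.

```bash
# Standalone sketch of the grouping technique used by list_models:
# print the models of each size class on their own line.
group_models() {
    local model model_class="" this_model_class
    for model in "$@"; do
        # Strip the longest suffix starting with "." or "-":
        # "base.en-q5_1" -> "base", "large-v3" -> "large".
        this_model_class="${model%%[.-]*}"
        if [ "$this_model_class" != "$model_class" ]; then
            # New class: end the previous line (if any) and add a label.
            if [ -n "$model_class" ]; then
                printf "\n"
            fi
            printf "%s:" "$this_model_class"
            model_class=$this_model_class
        fi
        printf " %s" "$model"
    done
    printf "\n"
}
```

Because each model is compared only with the one before it, an unsorted list prints the same class on several lines (for example, `group_models tiny base tiny.en` prints `tiny:` twice). This is presumably why the commit also reorders the `models` variable so that all variants of a size are adjacent.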