tests : add script to benchmark whisper.cpp on LibriSpeech corpus (#2999)
ggerganov merged 3 commits into ggml-org:master
Conversation
LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
@@ -0,0 +1,25 @@
Code in this directory is adapted from OpenAI Whisper project
(https://github.com/openai/whisper) and carries the following
copyright and license.
As I mentioned in LICENSE, the normalizer implementation in the
tests/normalizer/ subfolder was ported from upstream.

- We need this to get a comparable WER score. See this notebook about how OpenAI evaluates their speech recognition models.
- The reason I committed these files to this repository is to minimize the dependencies we need to run the benchmark script. `pip install openai-whisper` requires the full PyTorch libraries, so it's heavy.
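For reference, a minimal sketch of how a WER score is typically computed: word-level edit distance divided by the number of reference words. This is an illustration of the metric, not the exact code shipped in this PR, and it assumes transcripts are already normalized (lowercased, punctuation stripped) before comparison:

```python
# Minimal WER sketch: word-level Levenshtein distance over the
# reference length. Assumes both strings are pre-normalized.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```

This is why a shared normalizer matters: two implementations that tokenize or normalize differently will report WER scores that cannot be compared.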
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt

Check out `eval.mk` for more details.
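To make the setup concrete, here is a hedged sketch of how a per-sample `whisper-cli` invocation with those flags could be built from Python. The binary path and file paths are placeholders, not the actual values used by `eval.mk`:

```python
import subprocess

# Hypothetical helper mirroring WHISPER_FLAGS from eval.mk.
# The binary location and both paths are placeholder assumptions.
def build_cli_command(model: str, wav: str) -> list:
    return [
        "./build/bin/whisper-cli",
        "--no-prints",      # suppress progress/debug output
        "--threads", "8",
        "--language", "en",
        "--output-txt",     # write the transcript to <wav>.txt
        "--model", model,
        "--file", wav,
    ]

cmd = build_cli_command("models/ggml-base.en.bin", "sample.wav")
# subprocess.run(cmd, check=True)  # uncomment with real paths
```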
This README file describes how to perform the benchmark tests.
Confirmed to work on Ubuntu 24.04 and Amazon Linux 2023.
Feedback from Daniel Bevenius. This adds a short code example showing how to prepare the `whisper-cli` command, to make the initial setup step a little bit clearer. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in the Makefile, let users set up the Python environment themselves. This is better since users may have their own preferred workflow/toolkit. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
ggerganov
left a comment
This is really great!
There are two things that we can improve on:

- The dataset seems to contain only relatively short speech segments. I think it would be good to have a dataset with a bit longer samples (i.e. a few minutes) in order to exercise the rolling-window transcription that Whisper does.
- The current implementation loads and unloads the entire model for each sample. This is very inefficient. Instead, it should utilize `whisper-server`: start it once and send all the samples via HTTP requests. This will make the benchmark much faster.

For now we can merge and improve on these later.
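The `whisper-server` suggestion could look roughly like the sketch below: building a multipart upload request for a locally running server. The `/inference` endpoint and `file` field follow the whisper.cpp server example, but treat them, along with the host/port, as assumptions to verify against the server's README:

```python
import urllib.request
import uuid

# Hedged sketch: build (but do not send) a multipart/form-data POST
# that uploads one WAV file to a running whisper-server instance.
# Endpoint name, field name, and host are assumptions.
def build_inference_request(wav_bytes: bytes,
                            host: str = "http://127.0.0.1:8080"):
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="a.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + wav_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{host}/inference",
        data=body,
        headers={"Content-Type":
                 f"multipart/form-data; boundary={boundary}"},
    )

req = build_inference_request(b"RIFF....WAVE")  # dummy bytes, not real audio
# urllib.request.urlopen(req)  # only with a server actually running
```

Loading the model once on the server side amortizes startup cost across all samples, which is the inefficiency the review points out.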
@ggerganov @danbev Thank you! I'm glad that it helps this project.
…ml-org#2999) * tests : add script to benchmark whisper.cpp on LibriSpeech corpus LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * Document how to prepare `whisper-cli` and model files Feedback from Daniel Bevenius. This adds a short code example how to prepare the `whisper-cli` command, to make the initial setup step a little bit clearer. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : Simplify how to set up Python environment Based on a feedback from Georgi Gerganov. Instead of setting up a virtual environment in Makefile, let users set up the Python environment. This is better since users may have their own preferred workflow/toolkit. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> --------- Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
* ggerganov/master: (25 commits) examples : add HEAPU8 to exported runtime methods (ggml-org#3062) ruby : make Ruby bindings installed with build options (ggml-org#3056) whisper : add no_context parameter to whisper_params (ggml-org#3045) examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (ggml-org#3038) ruby: use CMake in build process (ggml-org#3043) docs : update README.md to note newer nvidia gpus (ggml-org#3031) addon.node : support max_context api for addon.node (ggml-org#3025) whisper : reduce delta_min from 1000ms to 100ms (ggml-org#3028) docs : document how to use 'WHISPER_FFMPEG' build option (ggml-org#3029) docs : fix README.md (ggml-org#3024) xcf : use check for visionos build version (ggml-org#3021) ruby : fix types of arguments for rb_get_kwargs in ruby_whisper_params.c (ggml-org#3022) ruby : Update uri.rb (ggml-org#3016) models : fix dead link to models in readme (ggml-org#3006) ruby : change homepage URI in Ruby gemspec (ggml-org#3007) tests : add script to benchmark whisper.cpp on LibriSpeech corpus (ggml-org#2999) whisper : fix "bench-all outputs an invalid result on larger models" (ggml-org#3002) rename : ggerganov -> ggml-org (ggml-org#3005) examples : update server.py to match github pages app [no ci] (ggml-org#3004) whisper.wasm : fix unknown language issue (ggml-org#3000) ...