vamsin07 posts vanilla Whisper fine-tune starter for FLEURS
TL;DR
- The MIT-licensed repo ships a training script, FLEURS evaluation, W&B sweep configs, and Kubernetes manifests for Nautilus deployment.
- Default training uses AdamW with a 0.3 encoder/decoder learning-rate ratio, a cosine schedule with a 150-step warmup, and a maximum of 6 epochs.
- The base Whisper model reportedly fine-tunes on a local GPU in 30 to 60 minutes; the repo currently shows 0 stars, 0 forks, and 3 commits.
There is a small GitHub repo worth flagging even though it has zero stars: vamsin07/whisper-simple-finetune, a clean starter for vanilla fine-tuning of OpenAI's Whisper on the FLEURS multilingual speech-recognition benchmark. The interesting part is not the code itself but the shape of the on-ramp: the repo ships a training script optimized for validation WER (word error rate), an evaluation script against the FLEURS test set, W&B sweep configs, and Kubernetes manifests for Nautilus deployment.
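Since both training and evaluation hinge on WER, a minimal sketch of the metric may help. This is not the repo's evaluation code: the function name `word_error_rate` and the Spanish example strings are hypothetical, and real pipelines usually normalize text (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; not taken from the repo.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution plus one deletion over four words.
print(word_error_rate("el gato come pescado", "el pato come"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.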
The default recipe is documented rather than hidden: AdamW with separate encoder and decoder learning rates at a 0.3 default ratio, a cosine schedule with a 150-step warmup, gradient accumulation and checkpointing to save memory, early stopping on validation WER, and a maximum of 6 epochs for larger runs. Inference is greedy decoding with `no_repeat_ngram_size=3` and `repetition_penalty=1.2` to keep the model from stuttering. Both base and large-v3 Whisper variants are supported, audio is resampled to 16 kHz with language-specific tokenization, and the quickstart is a Spanish demo. The README claims that the base model fine-tunes on a local GPU in 30 to 60 minutes.
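To make the learning-rate part of that recipe concrete, here is a rough sketch in plain Python. Only the 150-step warmup and the 0.3 ratio come from the write-up. The peak decoder rate, the total step count, the linear shape of the warmup, and reading the ratio as encoder rate divided by decoder rate are all assumptions for illustration, not values confirmed by the repo.

```python
import math

WARMUP_STEPS = 150       # from the write-up
ENC_DEC_LR_RATIO = 0.3   # from the write-up; read here as encoder_lr / decoder_lr (assumption)
DECODER_PEAK_LR = 1e-5   # hypothetical value, not from the repo
TOTAL_STEPS = 3000       # hypothetical run length, not from the repo


def lr_scale(step: int,
             warmup_steps: int = WARMUP_STEPS,
             total_steps: int = TOTAL_STEPS) -> float:
    """Multiplier in [0, 1]: linear warmup (assumed), then cosine decay to zero."""
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def learning_rates(step: int) -> tuple[float, float]:
    """Return (encoder_lr, decoder_lr) at a given optimizer step."""
    scale = lr_scale(step)
    decoder_lr = DECODER_PEAK_LR * scale
    encoder_lr = DECODER_PEAK_LR * ENC_DEC_LR_RATIO * scale
    return encoder_lr, decoder_lr


for step in (0, 75, 150, 1500, 3000):
    enc, dec = learning_rates(step)
    print(f"step {step:>4}: encoder_lr={enc:.2e} decoder_lr={dec:.2e}")
```

Under these assumptions both rates peak at step 150, at about 3e-6 for the encoder and 1e-5 for the decoder, and then decay together. In a PyTorch setup the same idea would typically be expressed as two optimizer parameter groups sharing one scheduler.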
For a student or a small lab getting into speech, that is a lower barrier than assembling the pieces yourself. Whisper fine-tuning tutorials exist in scattered notebooks, but a repo that also handles the Nautilus containerization side, under an MIT license, is unusual, and code you can lift and adapt is more useful than a blog post.
The honest caveat is what the README leaves out. There are no published FLEURS WER results for the default recipe, so you cannot yet tell whether these hyperparameters actually beat Whisper's zero-shot baseline on the languages you care about. With three commits, zero stars, and no external eyes on the code, take this as a template to read and adapt rather than a validated recipe. Also missing are the delta between the base and large-v3 defaults and the disk footprint for a full FLEURS run on the larger model. Still, for the target audience of laptop and campus-server users who want a working starting point rather than a leaderboard entry, this is the kind of small, generous piece of open-source work the ecosystem quietly needs more of.
Website: vamsin07.github.io/buzzasr-docs/ Tokenizers: github.com/vamsin07/mul... SFT models: github.com/vamsin07/whi...
Originally reported by github.com
Original headline: GitHub - vamsin07/whisper-simple-finetune: Vanilla Whisper fine-tuning on FLEURS — clean starter repo with Nautilus deployment · GitHub