So I recently discovered that you can now autogenerate pretty good subtitles for films offline at home using open source software. This was really exciting for me, as I watch a lot of old or obscure films for which subtitles aren't available either on the DVD or online, and I often watch while on a noisy exercise machine, so I really need subtitles for intelligibility. OpenAI (of ChatGPT fame) makes their Whisper speech-to-text engine available as open source, and there's now open source tooling that lets you run it offline at home. I'm using whisper.cpp:
https://github.com/ggerganov/whisper.cpp. It's much easier to install on Linux than on Windows, but there are installation instructions for Windows too. Once you've got it set up, you just feed it appropriately formatted audio (a 16 kHz, 16-bit wav file) and it spits out subtitles. You can even tell it to output in .srt format!
I've watched about six or seven films with these auto subtitles at this point. The quality is surprisingly good: certainly not perfect, but I'd say better than 95% correct. When it has errors, they're mostly of the "misheard words" variety. It will also occasionally get "stuck" on a line of dialog or a musical cue and keep repeating it for a few seconds, but you can mitigate that with settings. The only real problem I haven't solved yet is that subtitles sometimes precede their dialog by a few seconds, but they stay on the screen until the dialog actually happens, so there's no confusion.
Once I got everything set up, it's basically just a little three-line script to generate subtitles for a film and drop them in the film directory where JRiver can pick them up. It's like magic!
Here's my (quite crude) script; I pass it the path to the film as a parameter. The ffmpeg line is there because whisper.cpp needs a specifically formatted wav file as input. The parameters passed to whisper.cpp tell it to output in srt format and also significantly reduce the repetition problem (by shrinking the context window, which slightly hurts accuracy but dramatically cuts down on the occasional repeating):
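#!/bin/bash
# Convert the film's audio to the 16 kHz, 16-bit PCM wav that whisper.cpp expects (-y overwrites a leftover file from an earlier run)
ffmpeg -y -i "$1" -ar 16000 -ac 2 -c:a pcm_s16le /tmp/audio.wav
# Transcribe; -osrt writes /tmp/audio.wav.srt, and --max-context / -et are the settings that reduce the repeating
/path/to/whisper.cpp/main -m /path/to/downloaded/whisper/model.bin -t 1 --max-context 8 -et 2.8 -osrt -f /tmp/audio.wav
# Drop the subtitles next to the film
cp /tmp/audio.wav.srt "$1.srt"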
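For anyone who wants something a bit less crude, here's a rough sketch of the same three steps with a few safety nets: it stops on the first error, works in its own temporary directory instead of a fixed /tmp/audio.wav, and cleans up after itself. The WHISPER_BIN and WHISPER_MODEL variable names and the two function names are just things I made up for illustration, and the default paths are the same placeholders as above, so adjust them for your own install. I haven't tuned anything; the ffmpeg and whisper.cpp options are exactly the ones from the script above.

```bash
#!/bin/bash
# Sketch only: a more defensive take on the script above.
# WHISPER_BIN and WHISPER_MODEL are hypothetical variable names; the defaults
# are the same placeholder paths used in the original script.
set -eu

WHISPER_BIN="${WHISPER_BIN:-/path/to/whisper.cpp/main}"
WHISPER_MODEL="${WHISPER_MODEL:-/path/to/downloaded/whisper/model.bin}"

# Where the subtitles end up: the film's full name with .srt appended,
# the same naming the original script uses.
srt_path_for() {
    printf '%s.srt\n' "$1"
}

generate_subtitles() {
    film="$1"
    # Private scratch directory, so two runs at the same time
    # don't share a single /tmp/audio.wav.
    workdir="$(mktemp -d)"
    trap 'rm -rf "$workdir"' EXIT

    # Same conversion as the original: 16 kHz, 16-bit PCM wav.
    ffmpeg -nostdin -i "$film" -ar 16000 -ac 2 -c:a pcm_s16le "$workdir/audio.wav"

    # Same whisper.cpp options as the original; -osrt writes audio.wav.srt
    # next to the input wav.
    "$WHISPER_BIN" -m "$WHISPER_MODEL" -t 1 --max-context 8 -et 2.8 -osrt \
        -f "$workdir/audio.wav"

    cp "$workdir/audio.wav.srt" "$(srt_path_for "$film")"
}

if [ "$#" -ge 1 ]; then
    generate_subtitles "$1"
else
    echo "usage: $0 /path/to/film" >&2
fi
```

You'd run it the same way as the short version, passing the path to the film as the only argument. Because it only ever writes inside its own temporary directory, you can also kick off several films in parallel if your machine can take it.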
I hope someone else might find this as exciting as I did. Note that it will go much faster if you have a beefy video card and build whisper.cpp with appropriate acceleration options.
Also, I'm not sure I posted this in the right forum, but I figured the script was Linux-centric so I dropped it here. Feel free to move the post somewhere else if it's in the wrong spot!