How to use Whisper to get high-quality, accurate subtitles on any video in four easy steps

OpenAI's Whisper is a recent deep-learning speech recognition model. The largest Whisper models work amazingly well in all major languages: better than most human-written subtitles you'll find on Netflix (which often don't match the audio), and better than YouTube's auto-generated subtitles too. Unlike YouTube's auto-subtitles, Whisper also produces full, correct punctuation. I've tried it and been astonished by it.

But how do you actually use it? Follow these steps:

  1. Download the audio of the video you want to watch. For YouTube, you can use a site like https://www.keepvid.to/: scroll down to "Audio Only" and click "download" (watch out for ads; if an ad pops up, close the tab and try again). For other services such as Netflix, you can use FlixGrab+. In FlixGrab+, before you download a video, click the gear icon, make sure you're downloading the lowest-quality video track, and check only the audio track you want (select stereo audio at the highest bitrate). The download is saved to your Videos folder.
  2. Replicate uses GitHub login, so create a GitHub account at https://github.com/join if you don't have one. Then sign in at https://replicate.com/signin?next=/m1guelpf/whisper-subtitles and visit the Replicate page https://replicate.com/m1guelpf/whisper-subtitles. Under audio_path, upload your audio/video file. For model_name, pick "large". For format, choose "srt". Click Submit. It may take a while to start and to run (expect it to take roughly as long as the runtime of the show). When it's done, copy the contents of the "subtitles" text field and save them to a file named temp.srt using Notepad or any other text editor. (Note: Replicate gives you limited free credits, and you will have to buy more if you keep using it after that, but they are very cheap. You can also set up the large Whisper model on your own machine; it needs a GPU with about 10 GB of VRAM, but that's more complicated; see this guide. Use the command-line whisper tool and pass --model large.)
  3. Run that .srt file through the https://subtitletools.com/srt-cleaner tool to add the missing newlines. Uncheck all the boxes, then click "Clean". Download the result.
  4. Install the NekoCap extension for Chrome or Firefox. Go to the video you want to watch. On YouTube, use the NekoCap bar underneath the video title; on Netflix, click the NekoCap cat icon in the play bar. Choose "Editor → Load → Load from file", select your cleaned-up .srt file, and click Load. NekoCap supports YouTube, Netflix, and some other services (unfortunately not Disney+, Hulu, HBO, or Apple TV at this time). On Netflix, I recommend using it together with the Language Reactor extension so you can pause the video without the playback bar popping up and covering the NekoCap subtitles.
  5. Alternatively, if NekoCap does not support the site you want to watch, you can download the video again at a higher quality with FlixGrab+, then use https://animebook.github.io/ to watch the video file locally together with the generated .srt subtitle file.
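For the local route mentioned in the note to step 2, the transcription can also be scripted instead of run on Replicate. The sketch below is a minimal example, assuming the openai-whisper Python package and ffmpeg are installed: it loads the large model with `whisper.load_model`, calls `model.transcribe`, and writes the returned segments as an SRT file with a blank line between cues. The script name `whisper_srt.py` and the helper names are hypothetical, chosen for illustration.

```python
"""Minimal sketch: transcribe audio locally with Whisper and save it as SRT.

Assumes the openai-whisper package and ffmpeg are installed. The script name
and helper names are hypothetical, for illustration only.
"""
import sys


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from Whisper-style segments (dicts with start, end, text)."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        # One cue = number line, timing line, text line; cues are joined by a blank line.
        cues.append(f"{number}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(audio_path: str, srt_path: str, model_name: str = "large") -> None:
    """Run Whisper on audio_path and write the result to srt_path."""
    import whisper  # imported here so the formatting helpers work without Whisper

    model = whisper.load_model(model_name)  # on a GPU, "large" needs roughly 10 GB of VRAM
    result = model.transcribe(audio_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python whisper_srt.py episode.m4a temp.srt
    transcribe_to_srt(sys.argv[1], sys.argv[2])
```

Run it as `python whisper_srt.py episode.m4a temp.srt`. Because this sketch writes the cue line breaks itself, its output should not need the cleaning in step 3.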
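It isn't stated exactly what the cleaner in step 3 repairs. Assuming the problem is that the pasted "subtitles" text lost its line breaks, so that each cue's number, timing and text run together, a rough offline equivalent could look like the sketch below. `restore_srt_newlines` is a hypothetical helper, not how subtitletools.com works: a regular expression finds each `<number> <start> --> <end>` header, and everything up to the next header becomes that cue's text.

```python
import re

# Hypothetical helper. Matches one cue header: the cue number, whitespace,
# then the "start --> end" timing, with optional whitespace around the arrow.
CUE_HEADER = re.compile(
    r"(\d+)\s+(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})\s*"
)


def restore_srt_newlines(flat_text: str) -> str:
    """Rebuild SRT line breaks, assuming number, timing and text ran together."""
    headers = list(CUE_HEADER.finditer(flat_text))
    cues = []
    for i, header in enumerate(headers):
        # A cue's text runs until the next header, or to the end of the input.
        text_end = headers[i + 1].start() if i + 1 < len(headers) else len(flat_text)
        text = " ".join(flat_text[header.end():text_end].split())
        number, start, end = header.groups()
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    # Separate cues with one blank line, as the SRT format expects.
    return "\n".join(cues)
```

Because whitespace inside each cue is collapsed, multi-line cues come out as a single line. If your pasted output doesn't match this assumption, stick with the online cleaner.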

That's it! You will now be watching your video with the most accurate subtitles that current technology has to offer. Enjoy!

For a sample of the quality, here is the Whisper-generated subtitle file for the French dub of the first episode of BoJack Horseman on Netflix: https://pastebin.com/7PMyg4CZ

I watched the episode with it, and although there are still errors and omitted words here and there, and some subtitles have misaligned timestamps, the difference in accuracy between these and the Netflix subtitles is like night and day! I recommend keeping both the Whisper subtitles and the human-written subtitles on at the same time: they tend to make mistakes in different places, so having both at hand makes it easier to decipher the words.

I expect and hope that someone will streamline this process into a simple end-to-end tool, and eventually even make it possible to stream the audio through Whisper in real time while watching the video. For now, though, this is the simplest method I could find.

submitted by /u/ChiaraStellata

from Language Learning https://ift.tt/7jWwtfh
via Learn Online English Speaking
