OpenAI - Whisper to generate missing subtitles for videos


Posted

Hi, Emby's subtitle feature has come a long way since I started using MB3 and I think the next step is now possible. 

The OpenAI group has released 'Whisper', a Python module (a C++ implementation is also available) that can be run locally to generate subtitles in 99 different languages for any file containing audio (English is currently the most efficient from a resource perspective). It is an open-source model that could be integrated (maybe as a plugin) across many OSes and would allow generating subtitles for media that currently lacks them. Without hardware support it can't run in real time, but it could run as a scheduled task during a server's slow periods.

https://arstechnica.com/information-technology/2022/09/new-ai-model-from-openai-automatically-recognizes-speech-and-translates-to-english/ for an overview

https://github.com/openai/whisper for the project's GitHub repository.
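
For anyone who wants to try this outside Emby, here is a minimal sketch of turning Whisper output into an .srt file. It assumes the `openai-whisper` Python package and ffmpeg are installed, and that `transcribe` returns a `segments` list with `start`, `end` and `text` fields. The helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are made up for illustration and are not part of Whisper or Emby.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, model_name: str = "base") -> str:
    """Run Whisper on a media file and return SRT text.

    Assumption: `pip install openai-whisper` and ffmpeg on the PATH.
    The import is lazy so the formatting helpers work without Whisper.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    return segments_to_srt(result["segments"])
```

Larger models are slower but usually more accurate, so the model name is left as a parameter; on a server without a GPU this is the kind of job that would run in the background rather than during playback.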

Posted
I have followed OpenAI's projects, many of which are mind-blowing. I'm not sure about this one. By now it is well known what can be achieved by training transformer models on gigantic datasets. In that respect, Google's speech recognition and audio transcription are more impressive, as they work in real time with minimal resources.

Someone has extracted that functionality to make it work standalone, but it's a kind of hack of course. Still interesting.

What would be kind of a holy grail in that area would be to:

  • Recognize speech
  • Recognize speakers
  • Create subtitles from recognized speech
  • Translate subtitles to another language
  • Filter out the original voices from the audio
  • Use text-to-speech to let the actors speak in a different language

A while ago, I made an experiment regarding the last point:

https://user-images.githubusercontent.com/4985349/136570224-88a65ced-bb98-49fa-bcd9-1e766f90af26.mp4
https://gist.github.com/softworkz/3425fc196f5c7eac9e842a655c7e1e5c

It's the same language and this version mixes the original and the TTS voices for comparison.

 

Though, all of these things are experimental - only the Google ASR would be production ready, but it's not free to use (it works in the browser, but you are not even able to copy the transcribed text in any way).

And generally: features that do not work on all platforms, require excessive hardware, require installing frameworks like PyTorch, require massive data downloads, require manual installation and intervention, and in the end cannot even work in real time as part of Emby's media delivery pipelines are not a great match for integration into the Emby core server (or Emby's ffmpeg), as they wouldn't reach the masses with all those preconditions.

It might be a nice idea for an Emby plugin, though. 

Also, I'm sure that at some point there will be approaches to audio transcription that are handier and more universal to integrate.

  • 1 month later...
Posted

So there is a C++ version that is working on making subtitle generation real time: https://github.com/ggerganov/whisper.cpp However, even before real-time use is feasible everywhere (and since transcoding isn't real time everywhere and never will be, I don't think this should be a blocker), it would be nice to have for offline scanning and building .srt subs for media that has no match in any online database. Emby already allows best-fit matching for downloaded subs; that causes issues when less knowledgeable people use it, but it can improve things, and this could help in a similar way when there isn't even a close match online.
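
The offline scan could start out as simply as the sketch below, which walks a library folder and lists videos that have no subtitle file next to them. The extension sets and function names are assumptions for illustration, not Emby's actual logic, and embedded subtitle streams inside the container are not detected.

```python
from pathlib import Path

# Assumed extension lists; adjust for your own library.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi", ".mov", ".m4v"}
SUBTITLE_EXTENSIONS = {".srt", ".ass", ".vtt", ".sub"}


def has_sidecar_subtitle(video: Path) -> bool:
    """True if a subtitle file named like the video sits next to it.

    Matches both `Movie.srt` and language-tagged names like `Movie.en.srt`.
    """
    for candidate in video.parent.iterdir():
        if candidate.suffix.lower() not in SUBTITLE_EXTENSIONS:
            continue
        if candidate.stem == video.stem or candidate.name.startswith(video.stem + "."):
            return True
    return False


def find_videos_without_subtitles(library_root: Path) -> list[Path]:
    """Recursively collect video files that have no sidecar subtitle."""
    missing = []
    for path in sorted(library_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
            if not has_sidecar_subtitle(path):
                missing.append(path)
    return missing
```

A scheduled task would then feed each returned path to the transcription step during the server's quiet hours.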

  • 2 weeks later...
Posted (edited)

They have gotten whisper.cpp to produce a transcription that also highlights the word currently being spoken, for a karaoke mode that would be REALLY nice for Emby's music function with lyrics.

https://github.com/ggerganov/whisper.cpp#karaoke-style-movie-generation-experimental

This would allow something similar to the sing along mode on discs or even what Amazon Music does with lyrics! 

Edited by Baenwort
  • 10 months later...
Posted

@softworkz you wanted a project for a large dataset? How about this? ;-)

  • 1 year later...
Przemek
Posted

Hello, any news on implementing Whisper in Emby? It would be great to be able to generate subtitles for movies/series.
I also have Bazarr on my server and there's a Whisper provider. I tried installing the whisper-asr-webservice container but completely failed to make it work. I get a message that it cannot generate Polish subtitles from English or other-language audio.
Regards.

Posted

I don't have any news but it's definitely a neat idea. Someone did start playing around with it recently: 

 

GrimReaper
Posted

Also:

 

Przemek
Posted

Great that something is happening. I hope there will be a plugin with support for more languages soon.

Regards.

Neminem
Posted (edited)

The issue with Bazarr and Whisper is that the language model only translates any language to English, and NOT any language to any language.

I use Bazarr and Whisper to create English subs if I can't find them online.

And then use Lingarr to create subs in other languages.

GitHub - lingarr-translate/lingarr: Lingarr is an application that supports both local and SaaS translation services to translate subtitle files into a user-specified target language. With automated translation options, Lingarr simplifies the subtitle translation process. 

I don't think SubtitleCreator will help you to create pl subs, because it also uses whisper.
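
The second step of that workflow (English .srt in, target-language .srt out) can be sketched roughly as follows. This is not Lingarr's code or API. The `translate` argument is a hypothetical callable standing in for whatever translation service is configured, and the parsing assumes a plain, well-formed SRT file. For the first step, Whisper's Python `transcribe` call accepts `task="translate"`, which always targets English.

```python
import re
from typing import Callable

# A cue's timing line looks like "00:00:01,000 --> 00:00:03,500".
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the dialogue lines of an SRT file.

    Cue numbers, timing lines and blank separators are copied unchanged.
    `translate` is a placeholder for a real translation backend.
    """
    out = []
    previous_blank = True  # the start of the file behaves like the start of a cue
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")
            previous_blank = True
        elif previous_blank and stripped.isdigit():
            out.append(stripped)  # cue index, only valid right after a blank line
            previous_blank = False
        elif TIMING_LINE.match(stripped):
            out.append(stripped)
            previous_blank = False
        else:
            out.append(translate(stripped))  # dialogue text
            previous_blank = False
    return "\n".join(out) + "\n"
```

Translating line by line keeps the timing intact, but it also means the translator sees each line without context, which is one reason odd word choices show up in machine-translated subs.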

Edited by Neminem
Przemek
Posted

Thank you @Neminem. I didn't know about Lingarr. Maybe I could try to use Whisper to generate English subs and then make Polish subs with Lingarr.

Regards

neik
Posted

@Neminem, thank you for the input.
How long have you been doing that, and how well does it work in your experience?

Neminem
Posted (edited)

Well, I have been using Lingarr for about 2 months, and it works great.

Before that I translated manually via Bazarr; Lingarr just does it automatically the way I have set it up.

 

That is, if you can live with the translator's inability to pick the right word in some cases.

It's far from perfect, but good enough to be useful for my wife, who is half deaf, to enjoy.

Example:

Sinkhole is translated into vaskehul.

Translated back to English, that's "wash hole".

She chuckles when she reads some of the translations 😂🤣

 

Edited by Neminem
Adding
Przemek
Posted

Hi, I tried Lingarr today on Unraid, but there was no app in the CA store, so I used an alternative install from Docker Hub. I think that's why I have issues. I don't see Radarr movies and I don't see any subtitles on series.

Neminem
Posted

Ohh yeah you need Radarr and Sonarr to make this work.

19 hours ago, Przemek said:

I have also bazarr on my server and there's whisper provider.

But how did you get Bazarr to work without Radarr and Sonarr, as Bazarr also requires them?

Neminem
Posted

By the way, the "arr" in the name gives you a hint, right?

Przemek
Posted
30 minutes ago, Neminem said:

Ohh yeah you need Radarr and Sonarr to make this work.

But how did you get Bazarr to work without Radarr and Sonarr, as Bazarr also requires them?

I have Sonarr and Radarr as well. I tried Whisper but it doesn't make Polish subs. I'm in contact with the developer; maybe he will help me solve my issues.

Can you tell me which translation provider is the best choice? I also installed LibreTranslate, but as I remember from helping to translate an app in Weblate, Google or Microsoft was the best.

BillOatman
Posted

SubtitleEdit will also translate subtitles.

Przemek
Posted

I finally got Lingarr working. I can now generate Polish subs for every movie/series in my library that is missing them.

Regards.
