
LiveTV translation / on demand subtitles Plugin using Whisper



Posted (edited)

So I've spent a lot of the weekend using OpenAI's open source Whisper. This is a live transcription/translation tool.

I was also watching some foreign-language live TV that my friend couldn't understand, and I thought to myself: why shouldn't there be live subtitles/closed captioning?

I whipped together a proof of concept python script that:

1) Grabs the video feed from Emby for restreaming and generates a URL like this:
'http://ip/emby/Videos/video/stream?Static=True&StartTimeTicks=12279100000&api_key=api'

2) Feeds that URL into ffmpeg, which takes 5-second chunks of audio

3) Feeds them into Whisper, which spits out raw text for each 5-second chunk (translation usually takes 0.5-0.8 seconds)

4) Feeds that back into ffmpeg to overlay the translation as subtitles, achieving live transcription/translation of whatever is playing. I am using the "draw" feature to draw the subtitles on, as ffmpeg does not take a dynamically generated file as input for subtitles.
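
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the script from the post: the helper names `stream_url` and `chunk_command`, the `audio%05d.wav` output pattern, and the mono 16 kHz WAV format are all assumptions. Only the URL layout and the 5-second chunk length come from the post. Ticks are assumed to be 100-nanosecond units (10,000,000 per second), which is worth checking against your own server.

```python
from urllib.parse import urlencode

TICKS_PER_SECOND = 10_000_000  # assumption: one tick is 100 ns


def stream_url(host: str, video_id: str, api_key: str, start_seconds: float = 0.0) -> str:
    """Build a static stream URL shaped like the one quoted in step 1."""
    query = urlencode({
        "Static": "True",
        "StartTimeTicks": int(round(start_seconds * TICKS_PER_SECOND)),
        "api_key": api_key,
    })
    return f"http://{host}/emby/Videos/{video_id}/stream?{query}"


def chunk_command(url: str, out_dir: str, chunk_seconds: int = 5) -> list[str]:
    """Return an ffmpeg argv that drops the video and writes fixed-length audio chunks."""
    return [
        "ffmpeg", "-i", url,
        "-vn",                        # ignore the video stream
        "-ac", "1", "-ar", "16000",   # mono, 16 kHz (hypothetical choice for speech input)
        "-f", "segment", "-segment_time", str(chunk_seconds),
        f"{out_dir}/audio%05d.wav",   # hypothetical output naming
    ]
```

The returned list can be handed to `subprocess.run(...)` or `subprocess.Popen(...)`; building it as a list rather than a single string avoids quoting problems with the `&` characters in the URL.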

I'm still working on the timing of how long to display the subtitles, but it works well!

For now, in the proof of concept, I have to consume the resulting feed from VLC.

So the model and the performance are easily attainable. One could trade translation precision for speed by using a smaller, faster model; however, on my 3090 even the Large model is performant enough for live transcription and translation. I haven't tested this on my 1070 yet. There's no reason this couldn't also be used to generate your own subtitles on the fly as you need them.
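
The "performant enough" condition boils down to each chunk being processed faster than it plays. A rough sketch of that check, assuming the `openai-whisper` Python package (`whisper.load_model` and `model.transcribe(..., task="translate")`); the helper names and the "small" model size are illustrative assumptions, not what the post used:

```python
import time


def keeps_up(processing_seconds: float, chunk_seconds: float = 5.0) -> bool:
    """A live pipeline only holds if a chunk is processed faster than it plays."""
    return processing_seconds < chunk_seconds


def translate_chunk(model, wav_path: str) -> tuple[str, float]:
    """Translate one audio chunk to English and report how long it took.

    `model` is expected to behave like the object returned by
    whisper.load_model(...): it must offer transcribe(path, task=...).
    """
    started = time.perf_counter()
    result = model.transcribe(wav_path, task="translate")
    return result["text"].strip(), time.perf_counter() - started


def load_model(size: str = "small"):
    """Load a Whisper model; imported lazily so the helpers work without the package."""
    import whisper  # pip install openai-whisper

    return whisper.load_model(size)  # hypothetical size; the post used the Large model
```

Timing a handful of chunks with `translate_chunk(load_model("small"), "audio00000.wav")` on the slower GPU would show whether `keeps_up` still holds there before committing to a model size.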

 

Now I've started brainstorming an actual plugin that achieves this. Before I delve into it, I'm trying to foresee some of the issues.

To keep the plugin self contained we'd need:

1) Whisper and a CUDA capable GPU available to Emby

2) The plugin would basically have to remux on the fly, which isn't that big of an issue for live TV, but quality may suffer on HDR / large files. I haven't tested it there yet.

I'm not sure I have the knowledge to implement a full plugin. Any other considerations anyone can think of?

Screenshot 2024-12-23 165721.png

Edited by bobo99
Posted

Hi, it's definitely an interesting idea but would require a bit of infrastructure first. Subtitles are served in several different ways depending on the format, capabilities of the player, etc. Sometimes it runs through ffmpeg, sometimes not. So there would have to be hooks in place at all of those possible outputs in order to do something like this.

Posted
1 hour ago, Luke said:

Hi, it's definitely an interesting idea but would require a bit of infrastructure first. Subtitles are served in several different ways depending on the format, capabilities of the player, etc. Sometimes it runs through ffmpeg, sometimes not. So there would have to be hooks in place at all of those possible outputs in order to do something like this.

So I've been working the past few days on a proof of concept and have an Emby plugin working. I could use your guidance on some things.

I tried very hard to meet the requirement of having a self-contained DLL for this, but that proved infeasible (impossible?). The whisper.net library spits out unmanaged assemblies, while Emby expects managed .NET assemblies, so it's not easy to import the Whisper library. We'd need lots of low-level access for CUDA etc.

Some questions

1) Will the devs allow a plugin to be submitted that has a dependency on having software running elsewhere?

I.e., in this scenario, having the Whisper translation server running in a Docker container that the plugin calls out to. Given the structure of the plugin development requirements, this is the only way to make this work using Whisper in its current state. Configuration would go only as far as entering the IP address of your Docker container.

 

2) Can you point me in the right direction on how to expose a new UI element for choosing what I want translated? For example, currently my plugin subscribes to
        _sessionManager.PlaybackStart += OnPlaybackStart;
        _sessionManager.PlaybackStopped += OnPlaybackStopped;

That hooks into any video playing and starts translation. From the plugin, how can I create a new UI element, a check box or similar, that I can click to say "I want this media item translated", as opposed to now, where every video that plays gets translated?

My current working proof of concept hooks into any "onplaybackstart" and uses ffmpeg to create 5-second chunks of video. I then send each 5-second chunk for translation, receive back the srt with timing, and then use another ffmpeg stream to draw the subtitles directly onto the video and create a new playlist.m3u8.
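
For the "srt with timing" part, a minimal sketch of turning per-chunk segments into SRT text. It assumes the translation service returns segments shaped like Whisper's (`start`, `end` in seconds relative to the chunk, plus `text`); the function names and the `offset_seconds` parameter are hypothetical, added here in case the cues need to be placed on the timeline of the whole stream rather than of a single chunk.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments, offset_seconds: float = 0.0) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"] + offset_seconds)
        end = srt_timestamp(seg["end"] + offset_seconds)
        cues.append(f"{number}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

For the second 5-second chunk of a stream, `segments_to_srt(segments, offset_seconds=5.0)` would shift every cue by five seconds; with the default offset the cue times stay relative to the chunk itself.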

My intention is to then force the playing session that kicked all this off to go play the newly created playlist.m3u8 with the translated subs. Is this feasible? Haven't figured out that part yet.

If you've got any other guidance on how to make this smoother and more consumable for the ecosystem I'm taking recommendations!

Cheers

Posted
Quote

To hook into any video playing, and will start translation. Using the plugin how can I create a new UI element, a check box, or similar that I can click "I want this media item to translate"

Hi, you can't. This capability would have to be built.

Quote

as opposed to now which has every video playing being translated.

Curious, how are you accomplishing the translation and actually getting the server to utilize it?

Quote

1) Will the devs allow a plugin to be submitted that has a dependency on having software running elsewhere?

Yea I think the MediaInfo plugin does that, so that's fine. You just can't have a situation where the server is crashing without those dependencies in place because then we'll be the ones getting the brunt of the troubleshooting.

Posted
Quote

My current working proof of concept, is to hook into any "onplaybackstart", and use ffmpeg to create 5 second chunks of video. I then send for translation each 5 second chunk, receive back the srt with timing, and then use another ffmpeg stream to draw on directly the subtitles, and create a new playlist.m3u8.

So what happens if the app is direct playing without any ffmpeg involvement?

Posted
26 minutes ago, Luke said:

Hi,  you can't. This capability would have to be built.

Curious, how are you accomplishing the translation and actually getting the server to utilize it?

Yea I think the MediaInfo plugin does that, so that's fine. You just can't have a situation where the server is crashing without those dependencies in place because then we'll be the ones getting the brunt of the troubleshooting.

When you say built, do you mean I'd have to modify the CSS/HTML/JS and add the functionality there? Would I be able to make that portable in the plugin?

 

So my plugin has a check mark in the configuration: "do you want to translate?" This is a flag that translates any video, as there is no way that I can see to control whether I translate a specific video.

I subscribe to _sessionManager.PlaybackStart += OnPlaybackStart; which, when a new video starts, kicks off this sequence:

1) I create a new temp encoding folder in temp-encoding.

2) I pre-generate a list of empty srt files for output0000.ts to outputxxxx.ts. This is important: they are empty, but ffmpeg needs them to be there.

3) I kick off a new ffmpeg command to create 5-second chunks of the media item in question, generating an outputxxxx.ts every 5 seconds. This ffmpeg command generates an m3u8 file on the fly that references the subtitles (the empty ones that are already there).

4) An async function watches that directory for new output .ts files and sends them via HTTP POST for translation. Within 0.5 to 0.8 seconds we get back an srt file with timings and translated subs. The new srt files with the translation overwrite the existing files on disk.

I wait 10 seconds for 2 chunks of 5-second video to be processed, building up a buffer before we start playing.
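
The watcher in step 4 and the two-chunk buffer can be sketched as a simple polling helper. This is an illustration in Python rather than the plugin's C#, and every name in it (`new_segments`, `buffered`, the `output*.ts` glob, the `hold_back_latest` switch) is an assumption. Holding back the newest file is one way to avoid sending a segment that ffmpeg may still be writing.

```python
from pathlib import Path


def new_segments(directory: str, seen: set, pattern: str = "output*.ts",
                 hold_back_latest: bool = True) -> list:
    """Return segment files not handled yet, in name order, and mark them as seen."""
    files = sorted(Path(directory).glob(pattern))
    if hold_back_latest:
        files = files[:-1]  # the newest segment may still be open in ffmpeg
    fresh = [p for p in files if p.name not in seen]
    seen.update(p.name for p in fresh)
    return fresh


def buffered(translated_chunks: int, buffer_chunks: int = 2) -> bool:
    """True once enough chunks are translated to start playback (2 x 5 s in the post)."""
    return translated_chunks >= buffer_chunks
```

A loop would call `new_segments(...)` every second or so, post each returned file for translation, and start playback once `buffered(...)` is true. When ffmpeg exits, one final call with `hold_back_latest=False` picks up the last segment.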
 

I then intend (I haven't written this bit of code yet) to send a command to that session to restart playback using the newly created m3u8 playlist from step 3. If we need to transcode, I'll just let Emby retranscode it for whatever the client needs.

If we're direct playing, it doesn't matter; I have to get to the audio stream somehow, so I kick off the same process regardless.

I think this works fine for smaller files but may get a bit expensive with larger files.

It is a bit janky, but I guess it's not janky if it works?

I guess I could check whether Emby intends to transcode it anyway and hook into that process so I'm not doing it twice.

If you see any improvements to what I'm doing, I'm all ears!

 

 

 

Posted

OK so you haven't actually gotten your translated subtitles to be rendered by an Emby app?

Posted (edited)
12 hours ago, Luke said:

OK so you haven't actually gotten your translated subtitles to be rendered by an Emby app?

The blank srt approach failed, so now I am using ffmpeg to "draw on" the subtitles in the temp-encoding folder.

So far I have only figured out how to add the temp-transcoding directory to a library and then force media playback, one by one, of my .ts segments with the subtitles baked in.

That is, I add Emby-Server/temp-transcoding/<guid>/translated.tsfiles to the library and then force playback of that folder. Each .ts segment comes in, but with really hard transitions between segments.

I can't figure out how to programmatically force Emby to play my m3u(8) links so that the .ts segments play smoothly.

Thoughts on how I can achieve that? The further I get into this, the more I think it would require functionality changes in how Emby can play media. I was hoping to just force a session to play my m3u8 file from disk and stream in the .ts files I'm creating dynamically, but this does not seem to be possible. Does media need to be in the library first?

 

 

Edited by bobo99
Posted

Also, is my understanding correct that I can't hook into Emby's transcoding ecosystem, and that I have to write my own transcoding routines manually?

I.e., if a video is already being transcoded, I can't just piggyback on that existing transcoding/job folder to generate the subtitles for that existing stream?

Posted
Quote

Also is my understanding correct, I can't hook into Emby's transcoding ecosystem,

Right this is what I was alluding to earlier. There is currently no way to do this barring some pretty extreme (and likely fragile) hacking. We would have to build the infrastructure needed for you to hook into subtitle delivery.

Posted
42 minutes ago, Luke said:

Right this is what I was alluding to earlier. There is currently no way to do this barring some pretty extreme (and likely fragile) hacking. We would have to build the infrastructure needed for you to hook into subtitle delivery.

This is all stuff I discovered after trying. I now know of all the ways it can't work, so I have one more idea...

Can I use the plugin to create my own API endpoint?

That endpoint would become an HLS endpoint, so I could use my manual ffmpeg routine to post data to one endpoint and then consume the feed.

I'd dynamically add an m3u8 TV tuner pointing to that new endpoint. I could call it and then tear it down afterwards. Do you think that's feasible?

Posted (edited)
On 02/01/2025 at 13:14, Luke said:

Right this is what I was alluding to earlier. There is currently no way to do this barring some pretty extreme (and likely fragile) hacking. We would have to build the infrastructure needed for you to hook into subtitle delivery.

So I've made progress in getting this into a plugin. I have 2 things outstanding preventing me from getting it over the finish line.

How it works:
I hook into "onPlaybackstart" to get the intent of the user wanting to play something.

I stop their stream and spin up my own ffmpeg that outputs files in a way I need.

I do the translation and generate my own m3u8/subtitle files on the fly.

I then add the m3u8 file to the library, and call it to be played.

After playback stops I tear everything down and delete it.

It all works super well!
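
For readers following along, the m3u8 generated on the fly is an HLS media playlist. A minimal sketch of writing one for 5-second segments is shown here; the tags are standard HLS, but the function name, the segment names, and the choice of playlist version are assumptions rather than what the plugin emits.

```python
import math


def media_playlist(segment_names, durations, finished: bool = False) -> str:
    """Render an HLS media playlist listing the given segments in order."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # The target duration must be at least as long as the longest segment.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations, default=0))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for name, duration in zip(segment_names, durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(name)
    if finished:
        lines.append("#EXT-X-ENDLIST")  # tells players no more segments will follow
    return "\n".join(lines) + "\n"
```

While segments are still being produced, the playlist would be rewritten with `finished=False` each time a new translated segment lands, and written one last time with `finished=True` when playback of the source ends.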

The help I need, ideally from a dev, is:

1) I had the plugin reliably calling ffmpeg to do transcoding for me in a specific way. "Something" happened on my dev box, and now the ffmpeg that is called by Emby cannot write anywhere: not into C:\Programdata\Emby-Server, nor into C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp. I have reinstalled Emby, but I have no idea what changed in my environment to cause this. However, the plugin that calls ffmpeg can create the directories it requires just fine.

I am calling ffmpeg like this:

var Arguments_to_pass = $"-i \"{inputFilePath}\" -c:v copy -c:a copy -f segment -segment_time 5 -segment_format mpegts -flush_packets 1 -segment_list \"{m3u8FilePath}\" -segment_list_type m3u8 \"{outputFilePath}\"";

logger.Info("FFmpeg arguments: " + Arguments_to_pass);

// Start the FFmpeg process using IFfmpegManager
var ffmpegRunner = _ffmpegManager.CreateFfMpegRunner(playSessionId, transcodingDirectory);
ffmpegRunner.Start(Arguments_to_pass);

 

I have also run it like this (and had it working before it broke):
 

 var processStartInfo = new ProcessStartInfo
 {
     FileName = "ffmpeg",
     // Stream-copy into 5-second MPEG-TS segments and write an m3u8 segment list.
     Arguments = $"-i \"{inputFilePath}\" -c:v copy -c:a copy -f segment -segment_time 5 -segment_format mpegts -flush_packets 1 -segment_list \"{m3u8FilePath}\" -segment_list_type m3u8 \"{outputFilePath}\"",
     // Alternative: re-encode and force a keyframe every 5 seconds.
     //Arguments = $"-i \"{inputFilePath}\" -force_key_frames \"expr:gte(t,n_forced*5)\" -c:v libx264 -preset fast -crf 23 -c:a aac -b:a 128k -f segment -segment_time 5 -segment_format mpegts -flush_packets 1 -g 150 -segment_list \"{m3u8FilePath}\" -segment_list_type m3u8 \"{outputFilePath}\"",
     RedirectStandardOutput = true,
     RedirectStandardError = true,
     UseShellExecute = false,
     CreateNoWindow = true
 };

 var process = new Process { StartInfo = processStartInfo };
 process.OutputDataReceived += (sender, args) => HandleFfmpegOutput(args.Data, process, playSessionId);
 process.ErrorDataReceived += (sender, args) => HandleFfmpegOutput(args.Data, process, playSessionId);

 process.Start();
 // The DataReceived handlers above only fire once asynchronous reads are started.
 process.BeginOutputReadLine();
 process.BeginErrorReadLine();

 

2025-01-05 20:58:17.619 Info Live-Video-Translator: M3U8 path: C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8\playlist.m3u8
2025-01-05 20:58:17.619 Info Live-Video-Translator: Output path: C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8\output%03d_original.ts
2025-01-05 20:58:17.619 Info Live-Video-Translator: FFmpeg arguments: -i "C:\Users\<user>\Downloads\Season 1\no-subs.mp4" -c:v copy -c:a copy -f segment -segment_time 5 -segment_format mpegts -flush_packets 1 -segment_list "C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8\playlist.m3u8" -segment_list_type m3u8 "C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8\output%03d_original.ts"
2025-01-05 20:58:17.621 Error FfmpegManager: ProcessRun 'e96b94fe2045490e932d9d6aa5d2b1a8': Error starting Ffmpeg. WorkingFolder: C:\Users\<user>\AppData\Roaming\Emby-Server\system
        *** Error Report ***
        Version: 4.8.10.0
        Command line: C:\Users\<user>\AppData\Roaming\Emby-Server\system\EmbyServer.dll -noautorunwebapp
        Operating system: Microsoft Windows 10.0.22631
        Framework: .NET 6.0.33
        OS/Process: x64/x64
        Runtime: C:/Users/<user>/AppData/Roaming/Emby-Server/system/System.Private.CoreLib.dll
        Processor count: 16
        Data path: C:\Users\<user>\AppData\Roaming\Emby-Server\programdata
        Application path: C:\Users\<user>\AppData\Roaming\Emby-Server\system
        System.UnauthorizedAccessException: System.UnauthorizedAccessException: Access to the path 'C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8' is denied.
           at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
           at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
           at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
           at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
           at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
           at System.IO.FileStream..ctor(String path, FileStreamOptions options)
           at Emby.Server.Implementations.IO.ManagedFileSystem.GetFileStream(String path, FileOpenMode mode, FileAccessMode access, FileShareMode share, Int32 bufferSize, FileOpenOptions fileOpenOptions, Int64 preAllocationSize)
           at Emby.Server.Implementations.IO.ManagedFileSystem.GetFileStream(String path, FileOpenMode mode, FileAccessMode access, FileShareMode share, FileOpenOptions fileOpenOptions)
           at Emby.ProcessRun.Extensions.ProcessLogWriter..ctor(IFileSystem fileSystem, String logFilePath, Boolean writeStandardError, Boolean writeStandardOutput)
           at Emby.ProcessRun.Runners.ProcessRunnerCommon.OnInitialize()
           at Emby.ProcessRun.Runners.ProcessRunnerExtensible.OnBeforeStartProcessCore(StartParams processStartInfo)
           at Emby.ProcessRun.Runners.ProcessRunnerBase.Run(StartParams startParams)
           at Emby.ProcessRun.Runners.ProcessRunnerBase.Run(String exeFileName, String commandLineArgs, String workingDirectory)
           at Emby.Server.MediaEncoding.Unified.Ffmpeg.FfRunnerBase.Start(String commandLineArgs)
        Source: System.Private.CoreLib
        TargetSite: Microsoft.Win32.SafeHandles.SafeFileHandle CreateFile(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, System.IO.FileOptions)

2025-01-05 20:58:17.621 Error Live-Video-Translator: Error starting FFmpeg transcoding for session e96b94fe2045490e932d9d6aa5d2b1a8: Access to the path 'C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8' is denied.
2025-01-05 20:58:17.621 Error Live-Video-Translator: Exception in OnPlaybackStart(): System.UnauthorizedAccessException: Access to the path 'C:\Users\<user>\AppData\Roaming\Emby-Server\programdata\transcoding-temp\e96b94fe2045490e932d9d6aa5d2b1a8' is denied.
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
   at System.IO.FileStream..ctor(String path, FileStreamOptions options)
   at Emby.Server.Implementations.IO.ManagedFileSystem.GetFileStream(String path, FileOpenMode mode, FileAccessMode access, FileShareMode share, Int32 bufferSize, FileOpenOptions fileOpenOptions, Int64 preAllocationSize)
   at Emby.Server.Implementations.IO.ManagedFileSystem.GetFileStream(String path, FileOpenMode mode, FileAccessMode access, FileShareMode share, FileOpenOptions fileOpenOptions)
   at Emby.ProcessRun.Extensions.ProcessLogWriter..ctor(IFileSystem fileSystem, String logFilePath, Boolean writeStandardError, Boolean writeStandardOutput)
   at Emby.ProcessRun.Runners.ProcessRunnerCommon.OnInitialize()
   at Emby.ProcessRun.Runners.ProcessRunnerExtensible.OnBeforeStartProcessCore(StartParams processStartInfo)
   at Emby.ProcessRun.Runners.ProcessRunnerBase.Run(StartParams startParams)
   at Emby.ProcessRun.Runners.ProcessRunnerBase.Run(String exeFileName, String commandLineArgs, String workingDirectory)
   at Emby.Server.MediaEncoding.Unified.Ffmpeg.FfRunnerBase.Start(String commandLineArgs)
   at EmbyPluginSimpleUI.Plugin.StartFfmpegTranscoding(String inputFilePath, String playSessionId)
   at EmbyPluginSimpleUI.Plugin.OnPlaybackStart(Object sender, PlaybackProgressEventArgs e)

 

 

 

2) The second thing I need help with: for my plugin to play the m3u8 file I generate, the file needs to be in the library. So I am adding my temp-transcoding directory to the library so that I can programmatically play it (unless there is a way to play a media item on the server that's not in the library).
I am trying to add it like this. Creating the library works, but my directory is not added to it.

        const string LIBRARY_NAME = "LiveVideoTemporaryLibrary";

        logger.Info($"No existing '{LIBRARY_NAME}' found. Creating a new one.");

        // Create a new library
        var libraryOptions = new LibraryOptions
        {
            EnablePhotos = true,
            ContentType = "mixedcontent"
        };

        _libraryManager.AddVirtualFolder(
            LIBRARY_NAME,
            directoryPath,
            libraryOptions,
            true);

 

Edited by bobo99
Additional info
Posted
Quote

So I am adding my temp-transcoding directory to the library so that I can programatically play it.

Yikes. Well, I did say this whole thing would likely require some pretty extreme hacking until the necessary infrastructure is built to do it properly.

I would definitely make sure the library options are configured with the real-time monitor disabled, and I would explicitly disable as many other things as you can. Even then, I would still expect some quirks to be caused by having a library pointing to that folder.

  • 3 weeks later...
  • 3 months later...
Posted

@bobo99 In the end, how did you manage to play the stream with subtitles in your plugin? Did you use an HLS master m3u8 referring to an m3u8 that refers to .ts chunks, plus a subtitle m3u8 referring to vtt chunks?

I am trying to do something similar to your plugin but using the OpenAI Whisper API (for those of us with weak machines :)), and I am struggling to find the right format for the stream with subtitles. If you have a sample of the files you are sending, I'd appreciate it! Thanks!
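
For reference, the layout described in that question (a master playlist pointing at a video playlist and a separate subtitle playlist) would look roughly like the output of the sketch below. The tags follow the HLS specification, but the function name, the file names, the group id, and the bandwidth value are assumptions, and nothing in this thread confirms that Emby apps will play a playlist built this way.

```python
def master_playlist(video_uri: str, subtitle_uri: str, bandwidth: int,
                    language: str = "en", name: str = "English") -> str:
    """Render an HLS master playlist with one video variant and one subtitle rendition."""
    lines = [
        "#EXTM3U",
        # Subtitle rendition: its URI is a media playlist that lists WebVTT segments.
        f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{name}",'
        f'LANGUAGE="{language}",DEFAULT=YES,AUTOSELECT=YES,URI="{subtitle_uri}"',
        # Video variant: refers to the subtitle group by its GROUP-ID.
        f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},SUBTITLES="subs"',
        video_uri,
    ]
    return "\n".join(lines) + "\n"
```

Calling `master_playlist("video.m3u8", "subs.m3u8", 4000000)` produces a four-line playlist; the subtitle playlist it refers to has the same shape as a normal media playlist, with `.vtt` segment names in place of `.ts` ones.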
