
Auto PGS to SRT Converter


K-O-K


Ronstang

My only complaint is Tesseract 5.2.0 recognizes italics but does not transfer them....and if you use the original engine it writes the italics but is not as accurate....but still not too bad.


12 hours ago, Ronstang said:

When it says use OCR conversion, does that mean that upon playing the movie the first time it automatically creates an SRT subtitle file and places it in the media folder?  I doubt it matters; I am sure this is a valiant effort, but I have never seen any OCR that does not need some corrections.

The upcoming subtitle OCR feature is working in real-time as part of the transcoding process. It provides significantly better results than SubtitleEdit.

Watch The Emby Show - Episode 2 for details.


Ronstang

@softworkz That is great and I am impressed.  The problem for me is I don't do transcoding.  I've built my library to not need any transcoding and everything plays smoothly with almost no noticeable hit to system resources, since my servers are pulling double duty encoding new content most of the time.  If emby can perform the OCR, why can't it dump it to an SRT file?  What would really be nice would be to be able to let emby do this during off hours, if chosen, without actually watching the movie.  If your OCR is that good then maybe we should beg you to put it in an app we can use on demand....LOL.

It only takes me a few minutes a sub now to get great results with Subtitle Edit.  The more I use it, the less intervention it needs as I add things to my libraries.  This is a great feature that will make many happy, softworkz, so thank you from the community.  I love emby and so do my kids so I can't live without it, but I'll keep rolling my own here as I want physical SRT files with all my media.


10 minutes ago, Ronstang said:

That is great and I am impressed.  The problem  for me is I don't do transcoding.  I've built my library to not need any transcoding and everything plays smooth with almost no noticeable hit to system resources

It doesn't require transcoding. It also works with remuxing, where video and audio streams are copied only. The subtitles are OCRed and WebVTT files are generated from these for streaming via HLS.
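
For readers who have not seen WebVTT before, here is a minimal Python sketch of what "generating WebVTT from OCRed subtitles" amounts to on the text side. It is an illustration only, not Emby's implementation: the `Cue` class and both function names are hypothetical, and the OCR step itself is assumed to have already produced the cue text and timings.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """One recognized subtitle: hypothetical container for OCR output."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def cues_to_webvtt(cues: Iterable[Cue]) -> str:
    """Serialize cues into a WebVTT document (header, then one block per cue)."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}")
        lines.append(cue.text)
        lines.append("")  # a blank line terminates each cue block
    return "\n".join(lines)


if __name__ == "__main__":
    print(cues_to_webvtt([Cue(1000, 2500, "Hello"), Cue(3000, 4200, "How are you?")]))
```

In a streaming setup the cues would be written out segment by segment rather than as one file, but the per-cue format stays the same.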

Edited by softworkz

Ronstang
1 minute ago, softworkz said:

It doesn't require transcoding. It also works with remuxing, where video and audio streams are copied only. The subtitles are OCRed and WebVTT files are generated from these for streaming via HLS.

English please...LOL.  But I see what you are saying that this isn't video transcoding so it is much less system resource intensive so that's a big plus.  When will this be available in beta for testing?  I run a beta and a stable server so I would like to try this out and see.  Thanks again....nice job.


Just now, Ronstang said:

When will this be available in beta for testing? 

It has been ready for testing since Feb 22. Since fall 22 it has been included but hidden; it was then postponed until after the 4.8 release, which was supposed to be imminent at that time, but it turned out otherwise.
Development is complete, all these features are included in Emby Server already - just not enabled.
I expect this to become available for testing shortly after 4.8.


1 hour ago, Ronstang said:

If emby can perform the OCR why can't it dump it to an SRT file? 

We could do that, but that's not what Emby is about. It's not a conversion tool - it's a media server where you can throw in all your media and you can access it always and from everywhere and play that media on whatever device you have, so that it always works without you needing to care about it.
SRT subs, for example, are very popular, but many clients can't play them, and the power of Emby is to make things play without users needing to care and plan in advance.
The decision of how to deliver a certain media item is made by Emby (with the assistance of its clients) right at the moment it's needed, taking into account a range of factors about the media, the server, the client, the user, etc.

How Emby processes and provides a certain media item can be different each time, and as such, pre-extracting often doesn't help much. If this feature had been more like SubtitleEdit's OCR, where you need to go over all the content manually, work with dictionaries and make corrections, it wouldn't have been suitable for Emby: everything here needs to work reliably, on demand and without user intervention.

1 hour ago, Ronstang said:

What would really be nice would be to be able to let emby do this during off hours if chosen without actually watching the movie. 

There's no benefit in doing so - it's rather a waste of resources. Emby can easily do this on demand when it's needed, and in exactly the way it needs to be done.

But if you really feel you need to do so, you can use the ffmpeg included in Emby to perform such conversions.

Ah, and if you're curious about a demo, you can send me a file for conversion.

Edited by softworkz

Ronstang
3 minutes ago, softworkz said:

There's no benefit in doing so - it's rather a waste of resources. Emby can easily do this on demand when it's needed, and in the exact way how it needs to be.

That's fine, I trust you and as soon as it's available I'll test it to see how resource intensive it is because using OCR in a desktop app is rather a resource hog.  I'll keep making my own SRT files until it's available though as I am dumping all my Blu-Ray onto the server now and want them available now.  Thanks


rbjtech
1 hour ago, softworkz said:

But if you really feel you need to do so, you can use the ffmpeg included in Emby to perform such conversions.

Interesting - so if ffmpeg is doing the heavy lifting here with the included Emby libraries, is it going to produce debug logs like image-extract-series, for example?

If yes, then adopting the command line would be easy enough and batch conversion could be something easily put into an Emby Plugin...

I believe the 'preference' is to use external SRTs first in subtitle usage, so dumping them would be beneficial for the next user/playback?

Edited by rbjtech

33 minutes ago, rbjtech said:

is it going to produce debug logs like image-extract-series for example ?

Yes.

 

34 minutes ago, rbjtech said:

so dumping them would be beneficial for the next user/playback  .. ?

As mentioned, I don't think so. The new subtitle capabilities are so rich and versatile that there would be zero benefit in having pre-extracted subtitles. Also, in-stream subtitles are generally preferable as there are fewer synchronization issues with them, especially when doing hardware overlay.

 

35 minutes ago, rbjtech said:

I believe 'preference' is to use external SRT's first in subtitle usage

 

AFAIK, the selection goes by language only, not by type.

Also, SRT is not a great format for subtitles. By far the best (most capable, most features, most sophisticated) is ASS.
ffmpeg uses ASS internally for that reason. So does the OCR filter, of course. Once you export to SRT, you'll lose a number of details.
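
To make the "lose a number of details" point concrete, here is a small Python sketch of what a plain-text export does to an ASS dialogue line. It is a simplified illustration, not what ffmpeg or Emby actually runs: the function name is hypothetical, and only override blocks (`{...}`), the `\N` line break and the `\h` hard space are handled.

```python
import re

# ASS override blocks look like {\an8\pos(640,60)\i1}; everything inside the
# braces is styling or positioning that a plain-text format cannot carry.
OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def ass_text_to_plain(ass_text: str) -> str:
    """Reduce the text field of an ASS dialogue line to plain text.

    Positioning, colors, italics and animation tags are simply dropped,
    which is the information loss discussed above.
    """
    text = OVERRIDE_BLOCK.sub("", ass_text)
    text = text.replace("\\N", "\n")  # ASS hard line break
    text = text.replace("\\h", " ")   # ASS hard space
    return text


if __name__ == "__main__":
    styled = r"{\an8\pos(640,60)\c&H00FFFF&}Top line\N{\i1}italic{\i0}"
    print(ass_text_to_plain(styled))
```

After the conversion, nothing in the output says that the line was positioned at the top of the screen, colored, or partly italic.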

 


crusher11

But SRT will play on almost anything while PGS/VOBSUB/ASS are pretty limited in terms of playback AFAIK.

Emby definitely prefers SRT though. I OCR all my subs to SRT while retaining the PGS track, and the SRT is always the default, which is great on my dad's Samsung but not so great on my Shield.


5 minutes ago, crusher11 said:

But SRT will play on almost anything while PGS/VOBSUB/ASS are pretty limited in terms of playback AFAIK.

Yes, SRT is much simpler than ASS. If you watch The Emby Show - Episode 1, you can see a crazy example with animations done in ASS.

But back to subtitle extraction: the point is that you won't need to care about this anymore in the future, because Emby can convert between all subtitle formats and also provides a number of manipulations of the subtitle text and style.
When you pre-extract to SRT, you are losing information unnecessarily. Maybe later you play it on a client with ASS support, or you want/must do burn-in; then it would be much better if you hadn't converted it to SRT before.


MrMackey

I am super excited about it!

Finally, others can then watch movies on devices that do not support pgs subtitles without almost killing my nas ^^


crusher11
2 hours ago, softworkz said:

When you pre-extract to SRT, you are losing information unnecessarily. Maybe later you play it on a client with ASS support or you want/must do burn-in, then it will be much better when you hadn't converted it to SRT before.

From what I understand, there is no client that supports ASS that doesn't also support PGS, so converting to ASS is a waste of time.

I retain all my original PGS subs, so the only thing I lose is the five minutes it takes to OCR them. Maximises compatibility with clients that are SRT-only, while retaining full PGS features for clients that support that.


2 minutes ago, crusher11 said:

What are they?

Web browsers, Tizen, LG, probably others that I'm forgetting.


Ronstang
13 hours ago, softworkz said:

But back to subtitle extraction: The point is that you don't need to care about this anymore in the future because Emby can convert between all subtitle formats and also provides a number of manipulations of the subtitles text and style.

When it's available I will test it for sure, but something tells me this is still going to be resource intensive because I have never seen OCR that isn't and if so then pre-extracting to SRT is still preferable.  I don't care about subtitle features....I want TEXT.  I don't care what color they are or if they have animations.....I'm not a child.  I simply want the subs for my wife so she can learn better English and for me because sometimes it's hard to understand what the dialogue is and having the subs makes it easy.  I have perfect hearing but I still use CCs on everything.

I will test them and if you have pulled off the feat of transcoded OCRed subs with little system resources I will applaud you and do it your way. 👍


16 hours ago, MrMackey said:

I am super excited about it!

Finally, others can then watch movies on devices that do not support pgs subtitles without almost killing my nas ^^

Yes, and for cases where subtitle burn-in can't be avoided there are also other improvements coming:

  • Subtitle burn-in can be done with hw acceleration
    When hw acceleration is used, it will no longer be necessary to transfer every video frame to CPU memory, blend the bitmap subtitles over it and transfer the result back to GPU memory.
  • Without hw acceleration:
    • Currently: all subtitle bitmaps for a specific moment in time are rendered to a full video frame, then that full frame is scaled to match the video frame size and finally, the full subtitle frame is blended over each video frame
    • New: the subtitle bitmaps are scaled individually and are then blended individually onto the video frames
      (note: subtitle bitmaps are always only the size of the text they contain, much smaller than the video frames)
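
To put the non-accelerated change in perspective, here is a rough Python pixel count for the blending step alone. The frame size and the two bitmap sizes are made-up example values, the function names are hypothetical, and the scaling cost is ignored, so treat the result as an order-of-magnitude illustration rather than a benchmark.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def full_frame_pixels(frame_w: int, frame_h: int) -> int:
    """Pixels blended per video frame when a full subtitle frame is overlaid."""
    return frame_w * frame_h


def per_bitmap_pixels(bitmaps: Iterable[Size]) -> int:
    """Pixels blended per video frame when only the subtitle bitmaps are overlaid."""
    return sum(w * h for w, h in bitmaps)


if __name__ == "__main__":
    frame = (1920, 1080)               # assumed 1080p video
    bitmaps = [(800, 60), (600, 60)]   # assumed two lines of subtitle text
    old = full_frame_pixels(*frame)
    new = per_bitmap_pixels(bitmaps)
    print(f"full-frame blend: {old} pixels per video frame")
    print(f"per-bitmap blend: {new} pixels per video frame")
    print(f"reduction factor: {old / new:.1f}x")
```

With these example sizes the per-bitmap approach touches only a few percent of the pixels that a full-frame blend does, and nothing at all while no subtitle is on screen.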

15 hours ago, crusher11 said:

From what I understand, there is no client that supports ASS that doesn't also support PGS, so converting to ASS is a waste of time.

 

11 hours ago, Luke said:

That’s not true. There are several devices  that can play ass but not pgs.

Luke is right. There's absolutely no relation between ASS and PGS support.

Technically, these are two fundamentally different methods for subtitle delivery. Bitmap subs are made for weak devices and require little effort to overlay (on the player side), while ASS subs are technically the most challenging and require a huge amount of implementation work and expertise to render.


4 hours ago, Ronstang said:

When it's available I will test it for sure, but something tells me this is still going to be resource intensive because I have never seen OCR that isn't and if so then pre-extracting to SRT is still preferable.

The opposite is true: pre-extracting is resource intensive and puts your system under high load. But when OCR is done as part of transcoding or transmuxing, it is an incremental procedure: it only happens while you are watching - assuming you have activated transcode throttling (which also applies to transmuxing).

OCRing subtitles is very different from OCRing a document full of text. Within a minute of video, you have about 0-20 subtitle bitmaps, each containing just a few words. And subtitle bitmaps are always just as large as the text they contain. So at most 20 small subtitle bitmaps need to be OCRed per minute.

And that is just about nothing! You won't ever see your CPU usage going high due to this.
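
As a back-of-the-envelope check of that claim, the Python sketch below turns the "at most 20 bitmaps per minute" figure into a share of one CPU core. The 50 ms per bitmap is purely an assumed placeholder (no measurement is given in this thread), and the function names are hypothetical.

```python
def ocr_core_utilization(bitmaps_per_minute: float, seconds_per_bitmap: float) -> float:
    """Fraction of one CPU core spent on OCR during real-time playback."""
    return bitmaps_per_minute * seconds_per_bitmap / 60.0


def total_bitmaps(runtime_minutes: float, bitmaps_per_minute: float) -> float:
    """Upper bound on the number of subtitle bitmaps in a whole movie."""
    return runtime_minutes * bitmaps_per_minute


if __name__ == "__main__":
    per_minute = 20          # upper bound quoted above
    assumed_ocr_time = 0.05  # ASSUMPTION: 50 ms of CPU time per small bitmap
    share = ocr_core_utilization(per_minute, assumed_ocr_time)
    print(f"core utilization while watching: {share:.1%}")
    print(f"bitmaps in a 120-minute movie: {total_bitmaps(120, per_minute):.0f}")
```

Even if the real per-bitmap time were several times higher than the assumed value, the load during playback would stay at a few percent of a single core, because the work is spread over the running time of the video.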


Ronstang
3 minutes ago, softworkz said:

You won't ever see your CPU usage going high due to this.

That is awesome.  I look forward to testing this out because it sure would save me a lot of time messing with BS I would rather not.  Why can't you turn this on in beta for us to test NOW???  I don't want to have to go back and convert 500 pgs subs to SRT, as I am currently doing them as I rip the Blu-Rays so it's easy and inconsequential.  It's not that I don't trust you, but I bought emby premiere 2 1/2 years ago because I was told the new LiveTV was a few months out, so until I see it....it's just vaporware.


The Bigger Picture

It is clear where you're all coming from, assuming you need to do all those kinds of media preparation, pre-processing, pre-extraction and the like.
And I'm not afraid to name the cause: this was due to shortcomings of Emby. Emby just didn't do its job properly and you had to take measures to work around that.

But it shouldn't be like that. You should be able to just throw in the media you have and let Emby care about everything else.

I think with the new subtitle features, we're making a huge step towards that and we're closing a long-time gap that has been a source of trouble and pain for many Emby users. 🙂 


5 minutes ago, Ronstang said:

Why can't you turn this on in beta for us to test NOW???

The whole complex of new subtitle features requires about 2-3 months in beta - being enabled for all. There are a lot of interdependencies between the individual parts; that's why we can't just enable and test one of those parts individually.

On the technical side, all is ready to go for this, but it's not my decision when to go and start.

10 minutes ago, Ronstang said:

I was told the new LiveTV was a few months out so until I see it....it's just vaporware.

I can assure you that it's not. It's working and running on my machine and the next step is to go for a larger public beta test.

On the technical side, all is ready to go for this, but it's not my decision when to go and start.

