
End Credits Detection for resume/next episode


crusher11


52 minutes ago, crusher11 said:

And again, we can ignore detection entirely if we allow people to just manually flag things.

I'm afraid that is targeting such a small audience of users that it isn't worth the time to design, develop, maintain and keep debugging.  This applies to any feature as we need to be applying resources to things that benefit the majority of users.

  • Agree 2

crusher11
1 hour ago, ebr said:

I'm afraid that is targeting such a small audience of users that it isn't worth the time to design, develop, maintain and keep debugging.  This applies to any feature as we need to be applying resources to things that benefit the majority of users.

If you're already doing it for TV, how much extra work could it be to extend it to movies?


rbjtech
11 minutes ago, crusher11 said:

If you're already doing it for TV, how much extra work could it be to extend it to movies?

Reliably detecting end-credits in both TV shows and Films is a totally different set of challenges vs detecting repeating audio passages in repeating TV episodes. 

If you Google Chromaprint you can learn how it's done for TV intros. Then apply that knowledge to detecting end credits and you will see where your logic falls down.. ;)

The Plugin has had some success merging Chromaprint and Black Frame Detection for end credits (the Core has not, to my knowledge, committed to even looking at this yet), but it is extremely CPU and disk intensive, far more than is acceptable for the average user. Its effectiveness is mixed: on some shows it is millisecond perfect, on others it misses the credits entirely, so it still has a long way to go before being considered production worthy.

Edited by rbjtech

crusher11

But I'm not talking about detecting them, I'm talking about recognising them and offering an editor. 

  • Like 1

Cheesegeezer

Crusher, this is not something that would be developed specifically for you.... this is the whole point of feature requests.

I'm well aware you would happily spend 6 weeks adding in these end points.  But as a developer, the idea of software development and plugins is to provide an intelligent solution to a widely experienced wish or issue.  Workarounds simply aren't worth development time.  So with one click of a button, you can forget about it and move on.

If I were to release this as you want.... my inbox would fill up with so many feature requests, bugs and issues, because it needs constant attention.

I will test my OCR program on a few movies and see what the stats are like for time, CPU, Memory and network resources.... 

But you are missing the point!!!  TV shows can be detected on sound comparisons (it fails on those that have different end credit music) because there is a chance for several or all to be the same.  Black screen detection is also added into the mix, which is a fairly reliable fallback.  OCR is probably the only real 100% reliable detection on TV shows and also would work for Movies.

 

 

 

 

  • Agree 2

13 hours ago, Cheesegeezer said:

OCR is probably the only real 100% reliable

What if the movie actually contains a scene with textual content (a sign or document)?


Cheesegeezer
12 minutes ago, ebr said:

What if the movie actually contains a scene with textual content (a sign or document)?

I have a whitelist of partial words to look for, so it ignores anything not in the list.  Again, translations will be key here to pick them up in all the language varieties out there, but it works pretty well, even on scrolling text with video still playing in the background.

// Whitelist of partial words that typically appear in credits.
string[] hitWords = new string[]
{
    "direct", "produce", "manager", "written", "cast", "appearance", "star"
};

foreach (var hitword in hitWords)
{
    if (imageText.Contains(hitword))
    {
        // First whitelist hit: record this frame and stop checking further words.
        foundIt = true;
        ImageInfo.Add(new ImageInfo
        {
            TimeStamp = imageTimeStamp,
            ImagePath = imagePath,
            DetectedText = imageText
        });

        // Show the matching frame and its OCR text in the UI.
        timeStampLbl.Text = imageTimeStamp;
        timeStampLbl.Refresh();
        txtOutput.Text = imageText;
        txtOutput.Refresh();
        thumbBox.Image = pix;
        thumbBox.Refresh();
        this.imgCount++;
        break;
    }
}
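For anyone who wants to try the whitelist check outside the WinForms app, here is a minimal self-contained sketch of the same idea. The class and method names are made up for illustration, and the case-insensitive comparison is an assumption on my part (the snippet above does not show whether `imageText` is lower-cased before the check).

```csharp
using System;

public static class CreditWordMatcher
{
    // Partial words copied from the list in the post above.
    private static readonly string[] HitWords =
    {
        "direct", "produce", "manager", "written", "cast", "appearance", "star"
    };

    // Returns true and the first matching partial word if the OCR text
    // contains any whitelist entry. OrdinalIgnoreCase is an assumption:
    // it lets "DIRECTED BY" match "direct".
    public static bool TryFindHitWord(string ocrText, out string hitWord)
    {
        foreach (var word in HitWords)
        {
            if (ocrText.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hitWord = word;
                return true;
            }
        }

        hitWord = string.Empty;
        return false;
    }
}
```

Because this is substring matching, in-scene text such as a sign reading "STARBUCKS" would also hit "star", so the whitelist alone does not rule out signs or documents.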

 

Edited by Cheesegeezer

7 minutes ago, Cheesegeezer said:

A couple of thoughts:

1) Was that really the very first credit?  Many movies will have any number of things it seems and, yes, in any number of languages which will explode the list of things you are looking for

2) That looked like it took nearly 10 seconds, is that accurate?


Cheesegeezer
12 minutes ago, ebr said:

A couple of thoughts:

1) Was that really the very first credit?  Many movies will have any number of things it seems and, yes, in any number of languages which will explode the list of things you are looking for

2) That looked like it took nearly 10 seconds, is that accurate?

Yes it was the very first Credit, it took 15 secs actually to complete the whole process.

It started 5 mins from the end of a TV show.  It scans through to find a black screen and then starts to check each frame. If a black frame ISN'T detected, it extracts images at 1s intervals and then conducts the OCR, which as you can imagine will take considerably longer.

Hopefully the process makes sense to you.  But you could probably improve on this as your coding skills are well beyond mine.

 

var stopWatch = Stopwatch.StartNew();

RunBlackDetection(videoPath); // FFmpeg black-frame detection
ExtractThumbnailImages(videoPath, SaveFolder); // FFmpeg extraction; SaveFolder is a field declared near the top of the class
System.Threading.Thread.Sleep(2000); // wait a couple of seconds for the images to be saved to the folder (doing this in memory would be much quicker)
RunOCRProcess(); // start OCR from the detected black-frame start time; if none was found, use the default start and run until a hit frame is found
ShowResults(); // populate the results so we can use them later

stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed; // total wall-clock time for the whole process
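The body of `RunBlackDetection` isn't shown in the post. As a rough sketch of what such a step could look like, FFmpeg's `blackdetect` filter (for example `ffmpeg -i input.mkv -vf blackdetect=d=0.5:pix_th=0.10 -an -f null -`) writes lines containing `black_start:` and `black_end:` to its log, and those can be parsed as below. The class name and the threshold values in the command are illustrative assumptions, not the poster's actual settings.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class BlackDetectParser
{
    // Matches e.g. "black_start:1290.5 black_end:1292 black_duration:1.5".
    private static readonly Regex Interval = new Regex(
        @"black_start:(?<start>[0-9.]+)\s+black_end:(?<end>[0-9.]+)",
        RegexOptions.Compiled);

    // Extracts every black interval (in seconds) from FFmpeg's log text.
    public static List<(double Start, double End)> Parse(string ffmpegLog)
    {
        var intervals = new List<(double Start, double End)>();
        foreach (Match m in Interval.Matches(ffmpegLog))
        {
            // InvariantCulture keeps "1290.5" parsing the same on every locale.
            double start = double.Parse(m.Groups["start"].Value, CultureInfo.InvariantCulture);
            double end = double.Parse(m.Groups["end"].Value, CultureInfo.InvariantCulture);
            intervals.Add((start, end));
        }
        return intervals;
    }
}
```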

 

Edited by Cheesegeezer

crusher11

This is my point about chapter markers, if you start there it massively reduces the looking around that's required. 


Cheesegeezer
5 minutes ago, crusher11 said:

This is my point about chapter markers, if you start there it massively reduces the looking around that's required. 

Sorry mate, that makes no sense to me. 


crusher11
15 minutes ago, Cheesegeezer said:

Sorry mate, that makes no sense to me. 

Instead of having to search backwards looking for a black frame to use as the start point, you can use the last (or second-last) chapter marker as the start point. 


31 minutes ago, crusher11 said:

Instead of having to search backwards looking for a black frame to use as the start point, you can use the last (or second-last) chapter marker as the start point. 

That will be wrong at least as often as it's right and will probably be beyond the black frame.  Please just trust us when we tell you that this is not a viable approach.


1 hour ago, Cheesegeezer said:

it took 15 secs actually to complete the whole process

That would probably need to be improved by multiple orders of magnitude to become viable...

  • Agree 1

Cheesegeezer
1 minute ago, ebr said:

That would probably need to be improved by multiple orders of magnitude to become viable...

Agree 100%, but like I said before.... you have the knowledge and experience to probably do this 😉.  I ran it on a movie with the search point 15 mins from the end... and boy.... it took 2 mins 3 secs to find it.

I think for Movies it would certainly come as a plugin.... with a lot of disclaimers hahahaha


crusher11
8 minutes ago, ebr said:

That will be wrong at least as often as it's right and will probably be beyond the black frame.  Please just trust us when we tell you that this is not a viable approach.

I'm not saying "just set it to the chapter point", I'm saying "replace the black frame search (which is intensive) with jumping to the chapter point (which is not)". Then you do the OCR thing or whatever else you want to check.
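To make the suggestion concrete, here is a minimal sketch of picking a scan start point from chapter markers. Everything here is hypothetical (names, the fallback window, and the rule of taking the last chapter inside that window), and it does not address the objection that the marker may sit beyond the first credit.

```csharp
using System;
using System.Collections.Generic;

public static class ScanStart
{
    // Returns the latest chapter start that falls inside the final
    // windowSeconds of the video; if no chapter lands there, falls back
    // to (duration - window), i.e. the fixed "N minutes from the end" start.
    public static double Choose(IReadOnlyList<double> chapterStarts,
                                double durationSeconds,
                                double windowSeconds)
    {
        double fallback = Math.Max(0, durationSeconds - windowSeconds);
        double best = -1;

        foreach (double chapter in chapterStarts)
        {
            bool insideWindow = chapter >= fallback && chapter < durationSeconds;
            if (insideWindow && chapter > best)
            {
                best = chapter;
            }
        }

        return best >= 0 ? best : fallback;
    }
}
```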


8 minutes ago, crusher11 said:

I'm not saying "just set it to the chapter point", I'm saying "replace the black frame search (which is intensive) with jumping to the chapter point (which is not)". Then you do the OCR thing or whatever else you want to check.

We know exactly what you are saying.  Thanks.


  • 3 months later...
snagytx81

Hi team,

I'm loving the "Skip Intro" feature. I'm using 4.7.6.0 and it works perfectly. Can this be enhanced to skip credits as well? Yes, there is a "Start Next", but that doesn't seem to be a smart one - to scan your library and detect when the credits start. Or am I missing something? In a lot of cases the credits start and there is no "Start Next", or the show is not over yet but the "Start Next" is already trying to move to the next show in the series.

Just my 2 cents, not sure how complex this would be.

Netflix and Prime gave us a taste of these features and now we expect them to be in every media server 🤣🤣🤣 🤣 🤣

 


On 6/3/2022 at 3:06 PM, Cheesegeezer said:

Yes it was the very first Credit, it took 15 secs actually to complete the whole process.

It started 5 mins from the end of a TV show.  It scans through to find a black screen and then starts to check each frame. If a black frame ISN'T detected, it extracts images at 1s intervals and then conducts the OCR, which as you can imagine will take considerably longer.

Here are a few thoughts regarding the procedure and processing:

I would make this work on top of thumbnail extraction and not even touch the video files. It would make thumbnail extraction a requirement for the end-credits detection, but I think that's quite acceptable for users who want that feature.
This will tremendously reduce processing time and required CPU resources to an almost negligible amount. There's no seeking, decoding, scaling, color conversion required. You can simply work on the jpg images in the bif files.
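As a rough illustration of how cheap reading the thumbnails is, here is a sketch of a BIF reader. It assumes the files follow the Roku BIF layout as I understand it (8-byte magic, little-endian version, image count and timestamp multiplier, a 64-byte header, then an index of timestamp/offset pairs ending with a 0xFFFFFFFF entry); please check that against the actual files before relying on it. The class name is made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BifReader
{
    private static readonly byte[] Magic = { 0x89, 0x42, 0x49, 0x46, 0x0D, 0x0A, 0x1A, 0x0A };

    // Reads every thumbnail (raw JPEG bytes) with its timestamp.
    // The header layout is an assumption, see the note above.
    public static List<(TimeSpan Time, byte[] Jpeg)> ReadFrames(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        byte[] magic = reader.ReadBytes(Magic.Length);
        if (magic.Length != Magic.Length) throw new InvalidDataException("File too short.");
        for (int i = 0; i < Magic.Length; i++)
        {
            if (magic[i] != Magic[i]) throw new InvalidDataException("Not a BIF file.");
        }

        reader.ReadUInt32();                     // version (unused here)
        uint count = reader.ReadUInt32();        // number of images
        uint intervalMs = reader.ReadUInt32();   // timestamp multiplier in ms
        if (intervalMs == 0) intervalMs = 1000;  // 0 means the 1000 ms default

        // The index starts after the 64-byte header: count entries plus a terminator.
        stream.Seek(64, SeekOrigin.Begin);
        var timestamps = new uint[count + 1];
        var offsets = new uint[count + 1];
        for (int i = 0; i <= count; i++)
        {
            timestamps[i] = reader.ReadUInt32();
            offsets[i] = reader.ReadUInt32();
        }

        // Each image runs from its own offset up to the next entry's offset.
        var frames = new List<(TimeSpan Time, byte[] Jpeg)>();
        for (int i = 0; i < count; i++)
        {
            int length = (int)(offsets[i + 1] - offsets[i]);
            stream.Seek(offsets[i], SeekOrigin.Begin);
            var time = TimeSpan.FromMilliseconds((double)timestamps[i] * intervalMs);
            frames.Add((time, reader.ReadBytes(length)));
        }
        return frames;
    }
}
```

The JPEG bytes still need to be decoded to pixels before any histogram work, which is left to whatever image library is already in use.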

Then, I would probably not use OCR for this. For OCR, you need to know exactly which language the titles will be and you need different language model data for each language. You could jump on an upcoming Emby feature  in this area for the models, but I don't think OCR is a good approach at all for this. It also wouldn't work well on the thumbnails.

For the actual detection, I would combine multiple image analysis approaches:

Advanced Dark Background Detection

The challenge here is to distinguish between a normal dark scene and a black frame with credits. The difference is that in the latter case you will have almost the "same black" color in many parts of the image while in normal scenes there will be many different shades of black. 
You can find the difference by calculating a histogram from the image which makes it easy to determine the spread of black tones.
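A minimal sketch of that histogram check, assuming the thumbnail has already been decoded to one luminance byte per pixel. The names, the dark threshold of 64 and the tolerance of 4 levels are arbitrary starting points, not tuned values.

```csharp
using System;

public static class DarkBackground
{
    // 256-bin histogram of luminance values (0 = black, 255 = white).
    public static int[] Histogram(byte[] luma)
    {
        var hist = new int[256];
        foreach (byte v in luma) hist[v]++;
        return hist;
    }

    // Fraction of the dark pixels (below darkThreshold) that sit within
    // +/- tolerance of the most common dark level. Close to 1.0 means
    // "the same black everywhere"; a low value means many shades of dark.
    public static double DarkConcentration(byte[] luma, int darkThreshold = 64, int tolerance = 4)
    {
        int[] hist = Histogram(luma);

        int darkTotal = 0;
        int peak = 0;
        for (int v = 0; v < darkThreshold; v++)
        {
            darkTotal += hist[v];
            if (hist[v] > hist[peak]) peak = v;
        }
        if (darkTotal == 0) return 0.0;

        int near = 0;
        int from = Math.Max(0, peak - tolerance);
        int to = Math.Min(darkThreshold - 1, peak + tolerance);
        for (int v = from; v <= to; v++) near += hist[v];

        return (double)near / darkTotal;
    }
}
```

A credits frame with a flat black background should score near 1.0, while a dim filmed scene with a spread of dark tones should score much lower.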

Detecting Typical Titles

End titles can come in many different forms, which means you also need multiple detection approaches. It's best to start with the simplest one that provides a high level of confidence. Such a case would be white/grey text on a solid black background.
For this case, you can again use the image histogram to see whether there are two high and narrow peaks (black and white).
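One simple way to approximate the "two narrow peaks" test is to check that nearly all pixels fall into a near-black band or a near-white band. This is only a sketch: the band limits and shares below are assumptions, grey text would need a lower `whiteMin`, and the input is again one luminance byte per pixel.

```csharp
public static class TwoPeakCheck
{
    // True when almost every pixel is either near-black or near-white,
    // black dominates, and at least a small share of pixels is bright
    // (i.e. there is something that could be text).
    public static bool LooksLikeTextOnBlack(byte[] luma,
                                            int blackMax = 24,
                                            int whiteMin = 200,
                                            double minCombinedShare = 0.95,
                                            double minWhiteShare = 0.005)
    {
        if (luma.Length == 0) return false;

        int black = 0;
        int white = 0;
        foreach (byte v in luma)
        {
            if (v <= blackMax) black++;
            else if (v >= whiteMin) white++;
        }

        double blackShare = (double)black / luma.Length;
        double whiteShare = (double)white / luma.Length;

        return blackShare + whiteShare >= minCombinedShare
            && whiteShare >= minWhiteShare
            && blackShare > whiteShare;
    }
}
```

The 5% slack in `minCombinedShare` leaves room for anti-aliased letter edges and JPEG noise, which land between the two bands.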

Advanced Text Detection

To detect the presence of text in an image, we can make use of another detail which is specific to text in a video: text is made of letters which form contiguous areas of a single color, whereas filmed images (background) are made up of pixels of many different colors. One way to recognize such potential text areas is to use a convolutional algorithm which can detect clusters of the same color. If this results in possible candidates, you can filter the image so only those colors remain and then do some line-by-line analysis looking for those colors and check whether this would match the typical distributional characteristics of text.
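The line-by-line part could start out as simple as the sketch below. It assumes the color filtering has already produced a mask (true = pixel has the candidate text color) and uses a made-up heuristic for "text-like": a row crossing a line of text shows several short runs of that color, while a solid bar or an empty row does not. The names and limits are hypothetical.

```csharp
public static class TextRowHeuristic
{
    // A row is text-like when it contains at least minRuns separate runs
    // of mask pixels and none of them is longer than maxRunLength.
    public static bool IsTextLikeRow(bool[] mask, int rowStart, int width,
                                     int minRuns = 4, int maxRunLength = 12)
    {
        int runs = 0;
        int current = 0;
        for (int x = 0; x < width; x++)
        {
            if (mask[rowStart + x])
            {
                current++;
                if (current > maxRunLength) return false; // too wide for a letter stroke
            }
            else
            {
                if (current > 0) runs++;
                current = 0;
            }
        }
        if (current > 0) runs++; // run that touches the right edge

        return runs >= minRuns;
    }

    // Counts text-like rows in a row-major mask of width * height pixels.
    public static int CountTextLikeRows(bool[] mask, int width, int height)
    {
        int count = 0;
        for (int y = 0; y < height; y++)
        {
            if (IsTextLikeRow(mask, y * width, width)) count++;
        }
        return count;
    }
}
```

The row count can then feed into the per-thumbnail confidence alongside the histogram checks.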

Finally

Variations of the above and other methods will allow you to compute a combined confidence level and the successful methods for each thumbnail.
The final evaluation would be done by looking at the results for the processed thumbnails (5min from end at 10s intervals makes 30 images), especially with respect to the contiguity and going backwards, starting from the last. For example, when most of the last 10 images are positive (and of similar detection kind) and all earlier ones are negative, you have a conclusive result.
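A sketch of that final evaluation, reduced to one yes/no result per thumbnail in chronological order. It takes everything from the first positive onward as the candidate credits tail and calls the result conclusive when most of that tail is positive; since all earlier thumbnails are negative by construction, this gives the same answer as walking backwards from the last image. The names, the 80% share and the minimum tail of 3 are assumptions.

```csharp
using System;

public static class CreditsEvaluator
{
    // positives[i] is the detection result for thumbnail i, taken every
    // intervalSeconds starting at windowStartSeconds.
    // Returns true with the estimated credits start when the result is conclusive.
    public static bool TryFindCreditsStart(bool[] positives,
                                           double windowStartSeconds,
                                           double intervalSeconds,
                                           out double creditsStartSeconds,
                                           double minShare = 0.8,
                                           int minTail = 3)
    {
        creditsStartSeconds = 0;

        int first = Array.IndexOf(positives, true);
        if (first < 0) return false; // nothing detected at all

        int tailLength = positives.Length - first;
        int tailPositives = 0;
        for (int i = first; i < positives.Length; i++)
        {
            if (positives[i]) tailPositives++;
        }

        // Require a long enough tail in which most thumbnails are positive.
        if (tailLength < minTail) return false;
        if ((double)tailPositives / tailLength < minShare) return false;

        creditsStartSeconds = windowStartSeconds + first * intervalSeconds;
        return true;
    }
}
```

With 30 thumbnails at 10 s intervals, an isolated early false positive drags the tail share below the threshold and the result is reported as inconclusive instead of producing a wrong start time.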

I'm not sure how the results are meant to be used. But when it's about showing some "skip end titles" message in a client, then you would have a maximum timing offset error of plus/minus 5 seconds, which is probably ok IMO.

PS: What has been said above is correct though: no method will work reliably in all cases

  • Like 1

40 minutes ago, samuelqwe said:

Ah, interesting article. I'm sure there's much more to it than they are writing...

The one important difference between them and a local Emby server is that they have almost infinite computing resources available.

  • Agree 1