
Audio Fingerprinting (Chromaprint) Segment Detection


TeamB


rbjtech
11 minutes ago, TeamB said:

yes that is correct, still need to "detect" the intro but it would use a Chromaprint from an online library of known good theme audio chromaprints to do the search.

Thanks - with you now.  

This should also allow the detection to work on a brand new show/season 1 episode 1 - as we don't need to wait for something else to match it to - potentially up to 3 episodes in if ep1 did not have the normal intro.

An interesting option for sure - this has a lot more potential than trying to do this for every episode (which is what I was referring to earlier ..)

:)

Edited by rbjtech
Link to comment
Share on other sites

I was thinking about it, and one of the benefits to a provider, is that you'd know for sure that the fp clip of an episode intro would be exactly that. 

Currently, there has to be some other calculations to double check that the detection is actually the intro after you've run detection, but with a provided intro, it would be a clip of exactly what you want to detect.

So it would seem that you'd have a better chance of locating the intro. 

Link to comment
Share on other sites

8 minutes ago, chef said:

I was thinking about it, and one of the benefits to a provider, is that you'd know for sure that the fp clip of an episode intro would be exactly that. 

Currently, there has to be some other calculations to double check that the detection is actually the intro after you've run detection, but with a provided intro, it would be a clip of exactly what you want to detect.

So it would seem that you'd have a better chance of locating the intro. 

Yes, exactly. The downside is that you would need to collect a bunch of chromaprints of intro themes. I think with a good web app you could perhaps get community support to grow a library over time.

Link to comment
Share on other sites

crusher11

I'm confused as to the point of this thread. Isn't this all part of the existing Intro Skip thing anyway? What are we trying to achieve here?

Link to comment
Share on other sites

31 minutes ago, crusher11 said:

I'm confused as to the point of this thread. Isn't this all part of the existing Intro Skip thing anyway? What are we trying to achieve here?

This is just a discussion on the nuts and bolts of chromaprint audio segment detection, no point beyond that.

  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...

I wanted to follow through with my original thought of having a curated source of Chromaprints for TV show theme intros on a service, so I put together a web site:

https://themeservice.azurewebsites.net/

And wrote a simple IntroDetection app to allow people to experiment with this idea. You can grab the tool here:

https://github.com/faush01/ThemeService/releases/tag/0.0.0.1

This is all written in C# on .NET 6, both the web service and the detection tool. All the code is available here: https://github.com/faush01/ThemeService

The idea is that enthusiasts, or anyone interested in helping, extract the Chromaprint data from a TV series and upload just the Chromaprint data to the above site. At the moment it is a free-for-all: just create an account and you can submit Chromaprint data.

The IntroDetection tool then uses the above Web Service to get Chromaprint data for Series you have locally and tries to "find" the theme intro in a simple sweep search.
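For anyone wondering what a "simple sweep search" could look like, here is a minimal sketch in Python. It is not the code from the ThemeService repository. It assumes raw chromaprints are sequences of 32-bit integers and scores each offset by the fraction of differing bits; the function names and the `max_error` threshold are made up for illustration.

```python
import random
from typing import Optional, Sequence, Tuple


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits between two equal-length runs of 32-bit values."""
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))


def sweep_search(
    episode: Sequence[int],
    theme: Sequence[int],
    max_error: float = 0.35,  # hypothetical threshold, tune against real data
) -> Optional[Tuple[int, float]]:
    """Slide the theme over the episode and return (best_offset, best_error).

    Returns None when the theme is empty, longer than the episode, or when
    even the best offset is worse than max_error.
    """
    if not theme or len(theme) > len(episode):
        return None
    best_offset, best_error = -1, 1.0
    for offset in range(len(episode) - len(theme) + 1):
        window = episode[offset:offset + len(theme)]
        error = bit_error_rate(window, theme)
        if error < best_error:
            best_offset, best_error = offset, error
    if best_error > max_error:
        return None
    return best_offset, best_error


if __name__ == "__main__":
    # Synthetic data only: random "episode" with the "theme" planted at item 40.
    rng = random.Random(0)
    episode = [rng.getrandbits(32) for _ in range(200)]
    theme = [rng.getrandbits(32) for _ in range(20)]
    episode[40:60] = theme
    print(sweep_search(episode, theme))
```

The returned offset is in fingerprint items, not seconds. With Chromaprint's default settings each item should cover roughly 0.124 seconds, but treat that figure as an assumption and check it against your own extraction settings before turning offsets into chapter times.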

If a theme is found it outputs a json file with the details of the match. Eventually this can be written back to your data source (in this case Emby) as chapters using a plugin I am writing.

Why, you might ask, is this better than the original idea of comparing episodes in a season? Well, it is not better, it is just different. It requires a curated source of Chromaprint intro data that is maintained by the community, but in the end it should give better detection results because the search data is crafted for the best outcome. In theory this works, but it does require community input to build the Theme Service data, so who knows if there is interest in that.

  • Like 2
  • Thanks 2
Link to comment
Share on other sites

sydlexius
18 hours ago, TeamB said:

I wanted to follow through with my original thought of having a curated source of Chromaprints for TV show theme intros on a service, so I put together a web site:

https://themeservice.azurewebsites.net/

And wrote a simple IntroDetection app to allow people to experiment with this idea. You can grab the tool here:

https://github.com/faush01/ThemeService/releases/tag/0.0.0.1

This is all written in C# on .NET 6, both the web service and the detection tool. All the code is available here: https://github.com/faush01/ThemeService

The idea is that enthusiasts, or anyone interested in helping, extract the Chromaprint data from a TV series and upload just the Chromaprint data to the above site. At the moment it is a free-for-all: just create an account and you can submit Chromaprint data.

The IntroDetection tool then uses the above Web Service to get Chromaprint data for Series you have locally and tries to "find" the theme intro in a simple sweep search.

If a theme is found it outputs a json file with the details of the match. Eventually this can be written back to your data source (in this case Emby) as chapters using a plugin I am writing.

Why, you might ask, is this better than the original idea of comparing episodes in a season? Well, it is not better, it is just different. It requires a curated source of Chromaprint intro data that is maintained by the community, but in the end it should give better detection results because the search data is crafted for the best outcome. In theory this works, but it does require community input to build the Theme Service data, so who knows if there is interest in that.

What about a partial file hash (say, of the first 512kb) to attempt a match before running chromaprint?  Will that be a separate method?

Link to comment
Share on other sites

What would be the aim of the partial hash?

Link to comment
Share on other sites

sydlexius
Just now, TeamB said:

What would be the aim of the partial hash?

So if the end goal is to have a service similar to Trakt or SponsorBlock/SponsorBlockServer where users can sync metadata (such as, say, recap, intro, interstitials, credits, and post-credits), then it may be useful to match those times based upon the actual file hash (or more reasonably, a partial hash).  If there is no match for the hash, then it can proceed with a chromaprint to match up the intro data.  Perhaps there can be a UI or tools for creating additional markers to sync.  Then, for a given episode that may be in multiple formats, you can probably get away with certain attributes (intro, credits, etc.) merely having a time offset from the authoritative value?

Link to comment
Share on other sites

Posted (edited)
10 minutes ago, sydlexius said:

So if the end goal is to have a service similar to Trakt or SponsorBlock/SponsorBlockServer where users can sync metadata (such as, say, recap, intro, interstitials, credits, and post-credits), then it may be useful to match those times based upon the actual file hash (or more reasonably, a partial hash).  If there is no match for the hash, then it can proceed with a chromaprint to match up the intro data.  Perhaps there can be a UI or tools for creating additional markers to sync.  Then, for a given episode that may be in multiple formats, you can probably get away with certain attributes (intro, credits, etc.) merely having a time offset from the authoritative value?

I do not think exact byte-for-byte file matching (even of just the first 512 KB) is ever going to work. Each rip or release will produce a slightly different file. Yes, if you happen to have an exact copy of a file it will match, but if your version was re-encoded or even just re-muxed it will have a slightly different byte structure and the hashes will not match.

If you can think of an info service that uses file hashing, that would be something to look at, but from what I know there is no service that does this, for the above reasons.

The current approach in most services uses either a name lookup or, for things like images, IDs from services like IMDb, TheTVDB, TheMovieDb etc. That is what the above Theme Service uses: it uses the IMDb ID to get the theme Chromaprint and then tries to find that Chromaprint in the episodes. If the theme is different for each season, as it sometimes is, it can match on the IMDb ID and season number to find the best theme Chromaprint to search for in the episode.
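To make the lookup order concrete, here is a rough Python sketch of picking the "best" theme entry for an episode: prefer an entry for the exact season, otherwise fall back to a series-wide entry with no season. The `ThemeEntry` type, the `pick_theme` function, and the placeholder IDs are all invented for this example and are not the Theme Service's actual data model or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ThemeEntry:
    """Hypothetical record for one curated theme chromaprint."""
    imdb_id: str
    season: Optional[int]          # None means "applies to the whole series"
    chromaprint: Tuple[int, ...]   # raw 32-bit fingerprint items


def pick_theme(
    entries: Iterable[ThemeEntry], imdb_id: str, season: int
) -> Optional[ThemeEntry]:
    """Return the season-specific entry if one exists, else the series-wide one."""
    candidates = [e for e in entries if e.imdb_id == imdb_id]
    for entry in candidates:
        if entry.season == season:
            return entry
    for entry in candidates:
        if entry.season is None:
            return entry
    return None


if __name__ == "__main__":
    # Placeholder IDs, not real IMDb identifiers.
    library = [
        ThemeEntry("tt0000001", None, (1, 2, 3)),   # one theme for every season
        ThemeEntry("tt0000002", 1, (4, 5, 6)),      # per-season themes
        ThemeEntry("tt0000002", 2, (7, 8, 9)),
    ]
    print(pick_theme(library, "tt0000001", 3))
    print(pick_theme(library, "tt0000002", 2))
```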

 

Edited by TeamB
Link to comment
Share on other sites

rbjtech
Posted (edited)

@sydlexius - I believe you may have come to the same assumption I made earlier on that this is doing a per episode lookup - it's not - from my understanding, it's simply replacing the search/detect with a KNOWN Intro fingerprint - rather than try and deduce it.

So the existing process in the current Plugin (and Core) is -

1. Create FP for every episode

2. Scan every episode for repeat FP patterns - this is identifiable as 'the intro' - and also creates the time marker at the same time.  This is a very iterative process - ep1 compares to ep2-9, ep2 compares to ep1,3-9 etc etc.

 

If I'm understanding correctly, @TeamB's idea is to eliminate the need for step 1 - and replace it with a lookup of that Intro based on its TVDB FP lookup.  At this point - you KNOW the Intro FP (because that doesn't/shouldn't change - just the FP, no time/marker data at all) but what you don't know is WHERE that intro exists per episode.  So you DO still need to run step 2 to identify this, but with only 1 compare per episode.

 

However, I'm not convinced this is going to be any quicker than the current method. Creating the FPs does take a good portion of the overall time, because it has to read the first 10-15 mins of data per episode to create the FP. If I understand correctly, this STILL needs to be done using this method - this time you are just comparing the detected FP against a known FP, but you still need to read the file to do this, and this is the 'slow' bit.  The actual detection is just maths - and whether you compare, let's say, 10 episodes (~90 checks) against each other or you compare each episode to the known intro FP - it's not going to be significantly faster in the overall timeline.

The advantage it DOES have is that you can use it to detect early in a brand new show - i.e. you don't need to wait until Episode 1, 2 or even 3 before you can detect the Intro - assuming there is an Intro in Ep1, you should be able to detect it vs the known Intro.  A 2nd advantage is on shows with a lot of episodes - taking anime as an example - if it has 100 episodes in a single season, then the number of comparisons to each other episode gets large very quickly - all using potentially limited memory and CPU.
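The scaling argument is easy to put numbers on. The small Python sketch below counts comparisons under the assumption stated earlier in the post, that every episode is compared against every other episode in the season (so 10 episodes give about 90 checks), versus one comparison per episode against a known theme. The function names are just for illustration.

```python
def pairwise_comparisons(episodes: int) -> int:
    """Each episode compared against every other episode in the season."""
    return episodes * (episodes - 1)


def known_theme_comparisons(episodes: int) -> int:
    """Each episode compared once against a downloaded theme fingerprint."""
    return episodes


if __name__ == "__main__":
    for count in (10, 25, 100):
        print(count, pairwise_comparisons(count), known_theme_comparisons(count))
    # 10 episodes  -> 90 vs 10
    # 25 episodes  -> 600 vs 25
    # 100 episodes -> 9900 vs 100
```

This only counts comparisons. As noted above, reading the audio and building the per-episode fingerprint is still needed in both approaches, so the saving is in the matching step, not the extraction step.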

It's an interesting proof of concept for sure :)

 

Edited by rbjtech
Link to comment
Share on other sites

21 minutes ago, rbjtech said:

However, I'm not convinced this is going to be any quicker than the current method. Creating the FPs does take a good portion of the overall time, because it has to read the first 10-15 mins of data per episode to create the FP. If I understand correctly, this STILL needs to be done using this method - this time you are just comparing the detected FP against a known FP, but you still need to read the file to do this, and this is the 'slow' bit.  The actual detection is just maths - and whether you compare, let's say, 10 episodes (~90 checks) against each other or you compare each episode to the known intro FP - it's not going to be significantly faster in the overall timeline.

yes correct, the slowest part in both cases is extracting the first x minutes of audio chromaprint from the episodes to do the actual search against.

I agree this may not be a lot quicker. I feel it should be more accurate though: with a good source of curated theme intro chromaprints, correctly identified and available, searching should be less hit and miss.

Link to comment
Share on other sites

rbjtech
4 minutes ago, TeamB said:

I agree this may not be a lot quicker. I feel it should be more accurate though: with a good source of curated theme intro chromaprints, correctly identified and available, searching should be less hit and miss.

Possibly 🤪  We've had a lot of experience detecting Intros over the last year or so - and we can say for sure that while some Intros are cut and dried - easy to detect - many vary from episode to episode (as the content fade out/fade in is overlaid for example) and some are just a downright mystery as to why they are not detected - haha.

Link to comment
Share on other sites

Posted (edited)

@rbjtech

Do you have some examples of difficult ones?

Edited by TeamB
Link to comment
Share on other sites

sydlexius
On 5/11/2022 at 2:09 AM, rbjtech said:

@sydlexius - I believe you may have come to the same assumption I made earlier on that this is doing a per episode lookup - it's not - from my understanding, it's simply replacing the search/detect with a KNOWN Intro fingerprint - rather than try and deduce it.

So the existing process in the current Plugin (and Core) is -

1. Create FP for every episode

2. Scan every episode for repeat FP patterns - this is identifiable as 'the intro' - and also creates the time marker at the same time.  This is a very iterative process - ep1 compares to ep2-9, ep2 compares to ep1,3-9 etc etc.

I should clarify how I intend this skip database to be read.  Calculating a partial hash of the file, then querying for that checksum should be relatively quick.  If it comes up with nothing, then running a Chromaprint would be necessary.  Clearly the early state of the database will have users contributing new data at a high rate, but as such a database reaches steady state, it would make sense for the client side of things to spend less computational time on gathering the curated data.

Link to comment
Share on other sites

1 hour ago, sydlexius said:

Calculating a partial hash of the file, then querying for that checksum should be relatively quick.

Have you experimented with this yet? It would be interesting to look at this: what algorithm (hashing function) would you use, and how much would you extract from the file (how much, where from)?

I think SHA-256 or MD5 of just the first 512 bytes of the file is going to be too variable to match variants with different file formats (mkv, mp4 etc.), different encoding (the video stream was re-encoded) or even the same file being re-muxed with extra header info, say adding chapters to an mkv file.
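A quick way to see the problem is to hash the leading bytes of two byte streams that differ only by some extra header data. The Python sketch below uses the standard `hashlib` module; the `partial_hash` helper, the default length, and the fake "files" are made up for illustration and are not part of any existing tool in this thread.

```python
import hashlib
import io
from typing import BinaryIO


def partial_hash(stream: BinaryIO, length: int = 512 * 1024) -> str:
    """SHA-256 hex digest of the first `length` bytes of a binary stream."""
    return hashlib.sha256(stream.read(length)).hexdigest()


if __name__ == "__main__":
    # Fake container contents: same payload, but the second copy carries a few
    # extra header bytes, standing in for a re-mux that added chapter info.
    payload = bytes(range(256)) * 4096
    original = b"HEADER" + payload
    remuxed = b"HEADER" + b"[chapters]" + payload

    same = partial_hash(io.BytesIO(original)) == partial_hash(io.BytesIO(original))
    differs = partial_hash(io.BytesIO(original)) != partial_hash(io.BytesIO(remuxed))
    print(same, differs)  # True True
```

For a real file you would pass an open file object instead, for example `open(path, "rb")`. The point of the sketch is only that a cryptographic hash changes completely when even a few header bytes shift, which is why an exact or partial file hash only matches byte-identical copies.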

There might be some locality-sensitive hashing algorithms like LSH (https://en.wikipedia.org/wiki/Locality-sensitive_hashing) or CTPH (https://blog.nettitude.com/uk/context-triggered-piecewise-hashing-to-detect-malware-similarity) that could possibly be used to identify similar files based on header extraction. That would also be interesting to investigate, but it is beyond the scope of this project.

However the main point is audio chromaprints as a detection approach can and do work.

1 hour ago, sydlexius said:

Clearly the early state of the database will have users contributing new data at a high rate

Not based on my experience. Getting people to even agree it's a useful data source looks like it is going to be an uphill battle.

Link to comment
Share on other sites

2 hours ago, TeamB said:

Not based on my experience. Getting people to even agree it's a useful data source looks like it is going to be an uphill battle.

Yeah, collecting data like this is basically collecting knowledge of every media item in someone's library - and its source potentially.  We wouldn't be looking to put anything like this into the core and I'd have to think about whether or not we'd want it in the plug-in catalog.

Link to comment
Share on other sites

1 minute ago, ebr said:

Yeah, collecting data like this is basically collecting knowledge of every media item in someone's library - and its source potentially.  We wouldn't be looking to put anything like this into the core and I'd have to think about whether or not we'd want it in the plug-in catalog.

No, you have missed the point of the data source. This data source is just like a poster or background image data source for your media.

The data is just like an image, let's say the poster image of a TV show. The audio chromaprint data is a summary of an audio segment. It is extracted just once, and one person does this work: some contributor to the system extracts the theme intro chromaprint once and uploads it to the service. The chromaprint is small, about 3 KB in length, and it can identify a segment of audio with the correct processing on a client's machine.

At no point does any of the client's data leave the client machine. The theme chromaprint is downloaded from the theme service and the search is done on the client.

Link to comment
Share on other sites

18 minutes ago, ebr said:

We wouldn't be looking to put anything like this into the core and I'd have to think about whether or not we'd want it in the plug-in catalog.

I must be doing a very bad job of explaining what this system is then. You already have a bunch of systems that pull data in from various data sources: images, subtitles, metadata info from all sorts of data sources. This is just another data source for info about media files. In this case it is a fingerprint of the theme audio. In fact it is even better (safer) than all the other sources of data, as there are no actual copyrighted images or logos or anything. It is a fingerprint representing an audio segment, it is not the audio segment. Think of it like a description of how the audio sounds (based on frequency distribution over time).

Audio fingerprint approaches are nothing new, think of Picard the song identifier.

  • Agree 1
Link to comment
Share on other sites

rbjtech
Posted (edited)
3 hours ago, TeamB said:

I must be doing a very bad job of explaining what this system is then. You already have a bunch of systems that pull data in from various data sources: images, subtitles, metadata info from all sorts of data sources. This is just another data source for info about media files. In this case it is a fingerprint of the theme audio. In fact it is even better (safer) than all the other sources of data, as there are no actual copyrighted images or logos or anything. It is a fingerprint representing an audio segment, it is not the audio segment. Think of it like a description of how the audio sounds (based on frequency distribution over time).

Audio fingerprint approaches are nothing new, think of Picard the song identifier.

It may help to say there is only potentially ONE fingerprint that can be used for the entire show if the intro never changes.

Taking an example - The Big Bang Theory used the same Intro throughout all 12 seasons - therefore the downloaded FP (metadata) for that show would be, as you say, a binary Chromaprint of just the Intro (24 seconds of it) - but it would be linked to TVDBId=80379 for EVERY Episode.  It contains no other data (such as timings) and, other than linking to TVDBId 80379, it has nothing in it that can be reconstructed as the Big Bang Theory 'Intro' theme - it's just a bunch of Chromaprint FP data - you could not reconstruct the Big Bang Intro from it even if you wanted to.

Upon downloading it once (as Metadata) - you THEN create your own FP of each episode segment (10 minutes worth - as Introskip Plugin and Core do now) but this time you COMPARE it against your known good Big Bang Intro FP metadata - there is no need to compare Episode to Episode.  Once you get a hit - THEN the timing for your local Episode is calculated and that becomes your Intro data for that Episode.  No data is uploaded, the same way as no metadata is uploaded today.

Using the show Poster and Season Posters is a good analogy - you may just have one FP for the Show, or if the Intro changes per Season, you may have one per Season - but unless the show has a different Intro per Episode (rare), then this should be all you need.

I hope I got that right - I don't want to confuse matters further ! 🤪 

edit - to add, it would also potentially save a significant amount of local disk space for the .bin (FP) files - for my 25K episodes, I have 50K .bin files at about 12 GB on disk (Introskip Plugin has one for the Intro and a second for the Credits, Core just has an Intro FP bin file) - only the show and season level FPs would be needed using this new method (once downloaded), so a fraction of this total.

Edited by rbjtech
  • Agree 1
Link to comment
Share on other sites

Posted (edited)
59 minutes ago, rbjtech said:

I hope I got that right - I don't want to confuse matters further !

yes that is correct 👍

And yes, some TV shows only need one theme item. Star Trek Discovery looks like it has only one for all 4 seasons, but Star Trek Picard so far has a new theme for each season. You can see I had to add 2 theme chromaprint items for Picard that have season IDs, and for Discovery I was able to just add one with no season.

https://themeservice.azurewebsites.net/

 

 

Edited by TeamB
Link to comment
Share on other sites

rbjtech

btw - I'm just testing the Chroma database you setup - it's querying the dB API as we speak and running through Star Trek .. nice :)   I'll let you know the results ...

Link to comment
Share on other sites

Another benefit of using a central DB like this is that you can specify how far into the episodes you need to extract to do the match against. In the case of Star Trek Discovery it needed 20 min 😲 as some of the intro segments were 17:30 min in, and these new Star Trek shows have a 1:30-ish intro, which is also crazy. So every series and season can have this extraction time set with its chromaprint.

Link to comment
Share on other sites

4 minutes ago, rbjtech said:

btw - I'm just testing the Chroma database you setup - it's querying the dB API as we speak and running through Star Trek .. nice :)   I'll let you know the results ...

The current implementation is a little slow as it is not doing any of the work in parallel. You can speed this up by running multiple extractions and detections at the same time, up to the limit where you saturate your disk IO and CPU.
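As a rough idea of how the per-episode work could be fanned out, here is a small Python sketch using a thread pool. It is not how the IntroDetection tool is written. It assumes each episode is handled by some `detect_one` callable (extraction plus search), which is supplied by the caller; `detect_all` and `max_workers` are names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

Result = TypeVar("Result")


def detect_all(
    paths: Iterable[str],
    detect_one: Callable[[str], Result],
    max_workers: int = 4,  # raise until disk IO or CPU is saturated
) -> Dict[str, Result]:
    """Run detect_one over every path concurrently and map path -> result."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        results = list(pool.map(detect_one, path_list))
    return dict(zip(path_list, results))


if __name__ == "__main__":
    # Stand-in worker: a real one would extract audio and run the search.
    def fake_detect(path: str) -> int:
        return len(path)

    print(detect_all(["ep1.mkv", "ep02.mkv"], fake_detect, max_workers=2))
```

Threads are a reasonable fit here if the heavy lifting happens in an external process such as ffmpeg, since the Python side mostly waits. If the matching itself is done in pure Python, a process pool may be the better choice.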

Link to comment
Share on other sites

rbjtech
Just now, TeamB said:

Another benefit of using a central DB like this is that you can specify how far into the episodes you need to extract to do the match against. In the case of Star Trek Discovery it needed 20 min 😲 as some of the intro segments were 17:30 min in, and these new Star Trek shows have a 1:30-ish intro, which is also crazy. So every series and season can have this extraction time set with its chromaprint.

Excellent point - that in itself will save a lot of time for Intro's that start @ 0:00 for example - I see the 20 mins in the ffmpeg command ;) - it finished now - I'm just comparing to the Plugin results ..  brb

 

Link to comment
Share on other sites
