Audio Fingerprinting (Chromaprint) Segment detection


TeamB

rbjtech
1 minute ago, TeamB said:

The current implementation is a little slow because it does none of the work in parallel. You can speed it up by running multiple extractions and detections at the same time, up to the point where you saturate your disk I/O and CPU.

Yep - been there, done that with the Plugin, which uses configurable parallel ffmpeg workstreams. Balancing I/O against ffmpeg processing is an interesting problem in itself.
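
For readers who want to try the parallel idea, here is a minimal Python sketch (the detection tool itself is .NET, so this is not its implementation). A bounded worker pool runs several ffmpeg extractions at once, and `max_workers` is the knob to raise until disk I/O or CPU saturates. The ffmpeg arguments assume a build that includes the chromaprint muxer; `build_extract_cmd`, `run_cmd`, `extract_all` and the output naming are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_extract_cmd(src, dst, seconds=600):
    """Build an ffmpeg command that writes raw chromaprint data for the first `seconds`.

    Assumption: the ffmpeg build in use includes the chromaprint muxer.
    """
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-t", str(seconds),   # only fingerprint the start of the episode
        "-vn", "-ac", "1",    # drop video, downmix audio to mono
        "-f", "chromaprint", "-fp_format", "raw",
        str(dst),
    ]


def run_cmd(cmd):
    """Run one extraction and return the process exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def extract_all(sources, out_dir, max_workers=4, runner=run_cmd):
    """Extract fingerprints for many files with at most `max_workers` running at once."""
    out_dir = Path(out_dir)
    jobs = [(str(s), out_dir / (Path(s).stem + ".bin")) for s in sources]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda job: runner(build_extract_cmd(*job)), jobs))
    # Map each source to its exit code so failures can be retried or reported.
    return {src: code for (src, _), code in zip(jobs, codes)}
```

Threads are enough here because the heavy lifting happens in the ffmpeg child processes; the `runner` parameter exists only so the pool can be exercised without ffmpeg installed.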


rbjtech
Posted (edited)

It's working ok :) - it didn't pick up Picard at all (S1, S2), but it did pick up ST Discovery (excluding S4, which may have a different intro; I'll check in a sec..)

I compared a few episodes from the JSONs. The Plugin's detection time is in brackets. IntroStart looks correct, but IntroEnd may be a little short?

Episode  IntroStart (Plugin)  IntroEnd (Plugin)
s1e1     6:04 (exact)         7:30 (7:34)
s1e2     4:05 (4:06)          5:31 (5:34)
...
s3e12    6:25 (6:26)          7:51 (7:54)
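
The arithmetic behind "IntroEnd may be a little short" can be checked with a small Python sketch. The timings are copied from the three episodes listed in the post, reading "(exact)" as the Plugin reporting the same 6:04 start; the helper names `to_seconds` and `compare` are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' timestamp to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (episode, tool start, tool end, plugin start, plugin end), copied from the post.
ROWS = [
    ("s1e1", "6:04", "7:30", "6:04", "7:34"),
    ("s1e2", "4:05", "5:31", "4:06", "5:34"),
    ("s3e12", "6:25", "7:51", "6:26", "7:54"),
]


def compare(rows):
    """Return per-episode intro lengths and the plugin-minus-tool end difference."""
    out = {}
    for ep, t_start, t_end, p_start, p_end in rows:
        ts, te, ps, pe = map(to_seconds, (t_start, t_end, p_start, p_end))
        out[ep] = {"tool_len": te - ts, "plugin_len": pe - ps, "end_diff": pe - te}
    return out


for ep, stats in compare(ROWS).items():
    print(ep, stats)
```

For these three rows the tool reports an 86-second intro every time, while the Plugin reports 88 to 90 seconds and ends 3 to 4 seconds later. A constant length would be consistent with matching against one fixed-length uploaded theme, though three samples are not enough to say so with confidence.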

Edited by rbjtech

2 minutes ago, rbjtech said:

It's working ok :) - it didn't pick up Picard at all (S1, S2), but it did pick up ST Discovery (excluding S4, which may have a different intro; I'll check in a sec..)

I compared a few episodes from the JSONs. The Plugin's detection time is in brackets. IntroStart looks correct, but IntroEnd may be a little short?

s1e1 - 6:04 (exact) 7:30 (7:34)
s1e2 - 4:05 (4:06) 5:31 (5:34)
...
s3e12 -6:25 (6:26) 7:51 (7:54)

Interesting, it detects them all on my system. It would be useful to see the output: just redirect it to a text file and PM it to me if you get a chance.

IntroDetection.exe > detection.log


rbjtech
Posted (edited)
28 minutes ago, TeamB said:

interesting, it detects them all on my system, it would be interesting to see the output, just pipe the output to a txt file and PM it to me if you get a chance.

IntroDetection.exe > detection.log

Think I found the issue - it crashed because the s3e12 filename has multiple dots in it, which confuses the extension separator before .mkv. Just gonna rename it and see if it then continues .. brb

Edit - it still crashes at s3e12, but I think that is because s3e13 does not exist; the '...mkv' in the filename screwed up my import script, lol. Just gonna re-rip that episode and I'll brb..

Edited by rbjtech

31 minutes ago, rbjtech said:

Think I found the issue - it crashed

Yeah, as a POC it has minimal error checking and recovery. That's kind of the way I like to build POCs, as you get to see where the brittle, breaky bits are without hiding errors away in try/catch.

If you have the stack trace I can see what it is having issues on.


rbjtech
14 minutes ago, TeamB said:

Yeah as a POC it has minimal error checking and recovery, kind of the way I like to build POCs as you get to see where the brittle breaky bits are without hiding away errors in try/catch

If you have the stack trace I can see what it is having issues on.

 

Unhandled exception. System.ArgumentNullException: Value cannot be null. (Parameter 'fileName')
   at System.IO.FileInfo..ctor(String , String , String , Boolean )
   at IntroDetection.ActionExtract.ExtractChromaprintData(String input_file, TimeSpan ts_start, TimeSpan ts_end)
   at IntroDetection.ActionProcess.ProcessEpisode(EmbyItem item, ThemeInfo ti)
   at IntroDetection.ActionProcess.ProcessEpisodes()
   at Program.<Main>$(String[] args)
   at Program.<Main>(String[] args)

It's sorted now - if the episode does not exist, then it crashes when it gets to that episode.

(The episode didn't exist in Emby as a result of my poor handling of the suspect filename in my import script - nothing to do with your code ;))
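
The stack trace shows a null file name reaching the extraction step. A pre-flight check along the following lines would skip such episodes instead of crashing. This is only a Python sketch of the idea, not the tool's code; the record layout (dicts with `name` and `path` keys) and the function name `split_processable` are hypothetical.

```python
from pathlib import Path


def split_processable(episodes):
    """Split episode records into (ready, skipped) based on their file path.

    Each record is a dict with hypothetical keys 'name' and 'path'.
    """
    ready, skipped = [], []
    for ep in episodes:
        path = ep.get("path")
        if not path:
            # The server knows the episode but reports no file for it.
            skipped.append((ep.get("name"), "no path reported by server"))
        elif not Path(path).is_file():
            skipped.append((ep.get("name"), "file not found on disk"))
        else:
            ready.append(ep)
    return ready, skipped
```

Reporting the skipped list at the end of a run keeps the "see where it breaks" spirit of a POC while letting the remaining episodes finish.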

It's now picking up S4 and Picard too - when it's done I'll ping you the output via PM anyway.

Thanks !


rbjtech

@chef @Cheesegeezer - If you haven't done so already, I'd urge you to have a play with this when you get time.

Having a central API for the FP data gives you the ability to SHARE any type of audio sequence FP and then potentially tag it. In addition to show- and season-level intros, this 'could' easily be used for things like episode-level 'Recap/Previously on' segments, as they will have a unique FP and will just be 'found' in the user's file as if it were an intro.

There could also be opportunities to share output from the Introskip Plugin: if it's a unique tvdb id (i.e. it hasn't been uploaded before), then it's uploaded anonymously. This should probably only happen with 'confirmed' content, but the collaboration could be interesting.

If the data source becomes accurate enough, then I guess there is no reason why emby itself couldn't use it.

It's an interesting POC for sure .. some great ideas here  @TeamB .. :)


That's pretty cool.

So the online db hosts just the intro fp for comparison?


rbjtech
3 minutes ago, chef said:

That's pretty cool.

So the online db hosts just the intro fp for comparison?

Yes.

So rather than having to deduce it from a comparison of episodes, it already has it so knows exactly what to look for.

Then the local detection is no different to what IntroSkip does.

I'll try and find some time later to do some apples to apples speed comparisons - FP lookup/detect (ThemeService) vs FP compare/detect (Plugin)

The main advantage though is the flexibility - it can be used to detect anything that a user has uploaded as an 'audio/FP of interest'. It could range from a single 'Intro' that covers the entire show (as I have done: I uploaded Big Bang Theory) to an episode-level Recap, Post Credit or your Favourite Scene from a TV show ... haha.

It's very cool !

:)


So you still chromaprint the item in your library.

Then you request (by name or other provider id) the different extracted parts from the online database?

Would there be a tool to help people extract the different parts of the file for submission?

Will there be a tool to manage the submissions to make sure the fp data is proper?

With the online image resources, we can check the data at a glance to see if a submission fits the criteria.

The submissions would have to have the timestamps appended to the data. A tool could extract those timestamps to check that it was actually the part labelled in the submission, if you had the actual encoding it was extracted from.

It's a big one. But I like it.


rbjtech
Posted (edited)
16 minutes ago, chef said:

The submissions would have to have the timestamps appended to the data. A tool could extract those time stamps to check it was actually the part labeled in submission. If you had the actual encoding it was extracted from.

No - and this is the key point here - we are not attempting to directly match 'your' episode (and any form of timings) with an online database. As we have discussed many times in the IntroSkip journey, that is all but impossible due to the near-infinite number of possible variations. 'All' the lookup is doing is providing a common FP .bin file and matching it (using the tvdb ID) at Show or Season level with your emby collection. The PoC doesn't do episode-level comparisons, but could. The local compare/detect is no different from Introskip, except that it doesn't have to compare against other episodes.

As an example (and the reason I uploaded it): the Big Bang Theory intro is identical in my 'copy', your 'copy', anybody's 'copy' across all episodes, because it is a fixed item. What is unknown is WHEN it plays, and that is why we then need the normal Introskip algorithms to detect it.
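
Finding WHEN a known theme plays can be sketched as a sliding-window comparison in Python. This is an illustration of the general technique (bitwise comparison of 32-bit fingerprint values), not the Plugin's or the POC's actual algorithm; the function names, the `max_error_rate` threshold and the `seconds_per_item` parameter are assumptions, and the real duration of one fingerprint item depends on the Chromaprint settings used.

```python
def bit_errors(a, b):
    """Count differing bits between two 32-bit fingerprint values."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def find_theme(episode_fp, theme_fp, max_error_rate=0.25):
    """Slide theme_fp over episode_fp and return (offset, error_rate) of the best match.

    Returns None when no offset is at or below max_error_rate (a hypothetical threshold).
    """
    n, m = len(episode_fp), len(theme_fp)
    if m == 0 or m > n:
        return None
    best = None
    for offset in range(n - m + 1):
        errors = sum(bit_errors(episode_fp[offset + i], theme_fp[i]) for i in range(m))
        rate = errors / (32 * m)
        if best is None or rate < best[1]:
            best = (offset, rate)
    return best if best[1] <= max_error_rate else None


def to_time_range(offset, length, seconds_per_item):
    """Convert a match position in fingerprint items to (start, end) in seconds."""
    return offset * seconds_per_item, (offset + length) * seconds_per_item
```

Because the theme has a fixed length, the detected end is simply the detected start plus the theme duration. The brute-force scan is quadratic in the worst case, which is acceptable for a few minutes of audio but is the obvious place to optimise.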

Submission stuff is listed on the web page after you create a login - yea it needs a GUI, but for a PoC it works perfectly .. ;)

Lots of unknowns on the quality of submissions, I agree - but that aside, it's the possibilities here that are interesting.

Edited by rbjtech

6 minutes ago, chef said:

So you still chroma print the item in your library.

yes, you still need to extract the first X minutes to do the search against.

6 minutes ago, chef said:

Then you request (by name or other provider id) the different extracted parts from the online data base?

yes, at the moment the POC uses IMDB, season and episode to identify a theme chromaprint to use when searching in an episode.
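
One way to picture that lookup is a most-specific-first search, shown here as a Python sketch. The key layout (IMDb id, season, episode, with `None` meaning "applies to all") and the fallback order are assumptions for illustration; the real service API is not described in this thread.

```python
def pick_theme(themes, imdb_id, season, episode):
    """Return the most specific theme entry for an episode, or None.

    `themes` maps (imdb_id, season, episode) keys to fingerprint data, where
    season and/or episode may be None to mean "applies to all".
    """
    for key in (
        (imdb_id, season, episode),  # episode-specific (e.g. a recap)
        (imdb_id, season, None),     # season-level intro
        (imdb_id, None, None),       # show-level intro
    ):
        if key in themes:
            return themes[key]
    return None
```

With this ordering a show-level upload covers every episode until someone adds a more specific season or episode entry.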

8 minutes ago, chef said:

Would there be a tool to help people extract  the different parts of the file for submission?

Yeah I guess, at the moment I am just using ffmpeg directly to extract the bin data but you could build a tool for that.

9 minutes ago, chef said:

Will there be a tool to manage the submissions to make sure the fp data is proper?

That is an interesting one. How do you make sure? The only real way would be to test it and see if you get good matches.

One idea I had was that each theme chromaprint on the server could be voted up when a client successfully used it to detect an intro.

The whole community contribution system would need to be thought out and built. What I have put in place is just a POC and a free-for-all; there would need to be checks and balances put in place.
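
The voting idea could start as small as the following Python sketch: count successful detections per theme and prefer the best-proven candidate. The class and method names are hypothetical, the tally is in-memory only, and a real service would also need abuse protection, which this deliberately ignores.

```python
from collections import Counter


class ThemeVotes:
    """In-memory tally of successful detections per theme id (hypothetical)."""

    def __init__(self):
        self._votes = Counter()

    def report_success(self, theme_id):
        """Record that a client detected an intro using this theme."""
        self._votes[theme_id] += 1

    def ranked(self, theme_ids):
        """Order candidate themes by votes, highest first; ties keep input order."""
        return sorted(theme_ids, key=lambda t: -self._votes[t])
```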


Wow, that's next level. Very cool. 

 


sydlexius
19 hours ago, TeamB said:

Have you experimented with this yet? It would be interesting to look at this: what algorithm (hashing function) would you use, and how much would you extract from the file (how much, where from)?

I think SHA-256 or MD5 of just the first 512 bytes of the file is going to be too variable to match variants with different file formats (mkv, mp4 etc.), different encoding (the video stream was re-encoded) or even the same file re-muxed with extra header info, say adding chapters to an mkv file.

There might be some similarity-preserving hashing algorithms like locality-sensitive hashing, LSH (https://en.wikipedia.org/wiki/Locality-sensitive_hashing) or CTPH (https://blog.nettitude.com/uk/context-triggered-piecewise-hashing-to-detect-malware-similarity) that can possibly be used to identify similar files based on header extraction. That would also be interesting to investigate, but it is beyond the scope of this project.

However the main point is audio chromaprints as a detection approach can and do work.

Not based on my experience; getting people to even agree it's a useful data source looks like it is going to be an uphill battle.

In the interest of full disclosure, I'm not a coder.  I do a bunch of admin stuff using PowerShell, and have had to compute partial hashes in the past.  here's an example bit of code that I've used for this:

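# Hash only the first $sampleSize bytes of a file (a partial hash).
$ifile = '/path/to/file'
$sampleSize = 2MB
$algo = [System.Security.Cryptography.SHA256]::Create()
$stream = [System.IO.File]::OpenRead($ifile)
try {
    $buffer = New-Object byte[] $sampleSize
    $total = 0
    # Read can return fewer bytes than requested, so loop until the sample is full or EOF.
    while ($total -lt $sampleSize) {
        $bytesread = $stream.Read($buffer, $total, $sampleSize - $total)
        if ($bytesread -eq 0) { break }
        $total += $bytesread
    }
    # Hash exactly the bytes that were read (files smaller than the sample are hashed whole).
    $hash = [System.BitConverter]::ToString($algo.ComputeHash($buffer, 0, $total)).Replace('-', '')
}
finally {
    $stream.Dispose()
    $algo.Dispose()
}
Write-Output $hash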

512 bytes is too small a sample size, IMO. My example uses a 2MB sample (which is probably too large), yet it takes about 64ms to run (an aggravating factor is the slow performance of PowerShell, while a mitigating factor is that this is being run on a PCIe 4.0 NVMe drive).

As for buy-in by users and developers, that's not an area of my expertise.


1 hour ago, sydlexius said:

In the interest of full disclosure, I'm not a coder.  I do a bunch of admin stuff using PowerShell, and have had to compute partial hashes in the past.  here's an example bit of code that I've used for this:

Sorry, I should have been clearer. Creating partial hashes is not difficult; as you have pointed out, it is relatively trivial to do.

The question is, does it work to uniquely identify a media file? That is what I was asking: have you tried it with a few different versions of the same source (muxed differently, slightly different resolution, different encoder options) to see if you can produce a hash that identifies that source across all the different versions of that file?
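
That experiment is easy to set up. The Python sketch below hashes the first N bytes of each file and groups files by the result; variants of one source only "match" if they land in the same group. The function names and the 2 MiB default are illustrative assumptions, and the sketch makes no claim about how real remuxes or re-encodes behave; that is exactly what running it over a few variants would show.

```python
import hashlib


def partial_sha256(path, sample_size=2 * 1024 * 1024):
    """SHA-256 of the first `sample_size` bytes of a file (or the whole file if smaller)."""
    digest = hashlib.sha256()
    remaining = sample_size
    with open(path, "rb") as fh:
        while remaining > 0:
            chunk = fh.read(min(65536, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def group_by_partial_hash(paths, sample_size=2 * 1024 * 1024):
    """Group files by partial hash; files in the same group share their leading bytes."""
    groups = {}
    for p in paths:
        groups.setdefault(partial_sha256(p, sample_size), []).append(p)
    return groups
```

Any change to the leading bytes, even a few extra header bytes, puts a file in a different group, which is the concern raised about remuxed or re-encoded copies.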


On 5/12/2022 at 11:00 PM, TeamB said:

At no point does any clients machine data leave the client machine

Then who is this?

On 5/12/2022 at 11:00 PM, TeamB said:

one person does this work, some contributor to the system

The fact that it is just the fingerprint does help the situation, but someone has to contribute those. That means there has to be some option in Emby to allow that to be done and uploaded and, once you have that, it creates the possibility of data from my machine going up to your central database. From a user perspective this would be a potential privacy issue, and it would cause problems with some delivery mechanisms such as app stores.


rbjtech

I don't believe TeamB is suggesting the info comes from Emby - it can come from anybody willing to upload the FP data - in this regard, it is no different to any other open metadata provider.

 


1 hour ago, rbjtech said:

I don't believe TeamB is suggesting the info comes from Emby - it can come from anybody willing to upload the FP data - in this regard, it is no different to any other open metadata provider.

 

Okay.  That would be fine but I imagine very difficult to obtain.


Posted (edited)
10 hours ago, ebr said:

which means there has to be some option in Emby to allow that to be done and uploaded and, once you have that, it creates the possibility of data from my machine going up to your central database

no, all contribution is done through the web site by people who have a clue.

https://themeservice.azurewebsites.net/

Just like The TV DB metadata service, there is no option in Emby to upload your edited metadata. It's the same with theme data: you don't want people just randomly uploading data, you want contributors who know what the #$%^ they are doing.

 

Edited by TeamB

Team, how is it going?

Can we help you with anything? Tests, logs, etc.?


33 minutes ago, neik said:

Team, how is it going?

Can we help you with anything? Tests, logs, etc.?

This is currently not an Emby plugin; it is a POC web service and standalone detection tool at the moment.

If people want to help there is plenty they can do, have a look at the site

https://themeservice.azurewebsites.net/

Add some theme chromaprint data for your shows, there are instructions on the add page of the above site.

Download the intro detection tool (windows only at the moment)

https://github.com/faush01/ThemeService/releases

and run it on your emby server to detect intros and report back any feedback or questions you may have.

You can either download the latest full release of ffmpeg or use the ffmpeg that comes with the latest Emby Server 4.7


  • 3 weeks later...
snoe101

Hi, SponsorBlock has been trying to expand beyond YouTube.

I've been looking at tagging podcasts, and this approach of using chromaprint looks very similar to what I think will be needed to tag segments in the face of dynamic ads, ad-free supporter feeds, and different hosts wrapping/re-encoding episodes. I'm going to try to implement your identification algorithm for podcast segments. SponsorBlock at least seems interested in storing podcast segments.

I wonder if it's possible for these projects to work together? Personally, I could imagine sponsorblock being interested in storing tv intro and end credits segments in their db. A unified approach to all this may help us all skip the fluff no matter what media we're looking at.


Yeah, community cross collaboration, not sure that is going to happen unfortunately.

I have been noticing this a lot over the last few years: the general communities of these media projects are not very willing to contribute to wider community projects if it is not exclusive to their current favorite project. I see this with plugins, data sources, sharing services and even on forums with people asking for help. Things feel more closed now, and people are holding their cards closer to their chest more often than not.

Communities are becoming more closed and even hostile towards communities of other tools, apps and services. To test this, go and post a positive affirmation about Plex or Jellyfin on this forum (yes, I know this is for Emby, but all the tools are doing the same thing, and what one does well the others should at least look at): you will have 4 or 5 community watchdogs on you in seconds. This might be a bit harsh. I know people feel loyalty towards their tool set and the community they feel they belong to and just want to stand up for it, but this often turns into closed-minded aggression, and I have seen it time and time again on this and other forums.

Anyway, rant over.

I feel audio fingerprinting has a lot to offer for projects like intro detection, yeah it is not a silver bullet but it is a good tool to have in your toolkit. And for other projects I am sure it will also help, I bet there are many media projects out there that could use audio fingerprinting and it just takes someone to come up with a good solution to a problem using it.

Take the current core implementation of the intro detection in Emby for example. It uses an idea someone came up with and wrote a blog about ages ago; we stumbled on it and adapted it into a usable approach, a community member built a plugin, and then the core Emby team merged that with Emby as a core feature. This all happened because someone came up with a cool idea on how to use audio fingerprinting. I am not even sure that original person was a member of this community. Here is the blog.

https://www.3am.engineering/2019/04/skip-that-intro/

I thought about ad/commercial detection as well for live TV, but I feel the commercial audio fingerprint data would be too large and would take too long to scan for all the different commercials. Compared to the actual video frame detection that a lot of the current tools use, though, who knows, it might be viable.

 


Cheesegeezer
47 minutes ago, TeamB said:

Yeah, community cross collaboration, not sure that is going to happen unfortunately.

I have been noticing this a lot over the last few years: the general communities of these media projects are not very willing to contribute to wider community projects if it is not exclusive to their current favorite project. I see this with plugins, data sources, sharing services and even on forums with people asking for help. Things feel more closed now, and people are holding their cards closer to their chest more often than not.

Communities are becoming more closed and even hostile towards communities of other tools, apps and services. To test this, go and post a positive affirmation about Plex or Jellyfin on this forum (yes, I know this is for Emby, but all the tools are doing the same thing, and what one does well the others should at least look at): you will have 4 or 5 community watchdogs on you in seconds. This might be a bit harsh. I know people feel loyalty towards their tool set and the community they feel they belong to and just want to stand up for it, but this often turns into closed-minded aggression, and I have seen it time and time again on this and other forums.

Anyway, rant over.

I think I explained a lot in a PM regarding the reasons behind a lot of closed-source projects. And this remark just cements your comment about hostility. But just know your competition: Plex is 💯 closed source with no plugin availability, concentrating on commercial solutions. Jellyfin is stuck on rev 3 of Emby and nothing has moved forward. Kodi is a great frontend to Emby using its API.

All the data for the IntroSkip plugin is available in the API.

None of the plugins charge for their efforts.

And community is community, and for all the data to be hosted elsewhere is quite a ridiculous proposition, as there is no control.

So there is my rant over!

47 minutes ago, TeamB said:

I feel audio fingerprinting has a lot to offer for projects like intro detection, yeah it is not a silver bullet but it is a good tool to have in your toolkit. And for other projects I am sure it will also help, I bet there are many media projects out there that could use audio fingerprinting and it just takes someone to come up with a good solution to a problem using it.

Take the current core implementation of the intro detection in Emby for example. It uses an idea someone came up with and wrote a blog about ages ago; we stumbled on it and adapted it into a usable approach, a community member built a plugin, and then the core Emby team merged that with Emby as a core feature. This all happened because someone came up with a cool idea on how to use audio fingerprinting. I am not even sure that original person was a member of this community. Here is the blog.

https://www.3am.engineering/2019/04/skip-that-intro/

I thought about ad/commercial detection as well for live TV, but I feel the commercial audio fingerprint data would be too large and would take too long to scan for all the different commercials. Compared to the actual video frame detection that a lot of the current tools use, though, who knows, it might be viable.

 

It’s cool but, you of all people should know the team is… behind the IntroSkip plugin. 
 

Bill has already got a comskip plugin for Emby, he’s working on something really cool also. 
 

We plugin developers want to bridge the gap until the core team has time to implement these features, or leave them as plugins.

I’m actually up for helping this guy out, but you paint a dark scenario, which is not what we are about.  
 

From 2008 i have been with this project, so naturally there is loyalty. 
 

@snoe101 pm me with your ideas 👍👍👍


First I just wanted to state I was not directly calling anyone out on this, I agree people should do what they feel is the correct way to contribute to the ecosystem, if that means more of the same then so be it.

I was not referring to this community specifically, but to all of them: in all areas they are becoming a little more of a closed, walled garden of info and project inclusion. Even some of the projects you mentioned, Kodi, Jellyfin etc., are like this; they have a close group of users/admins/support hangers-on that will blindly defend their project against all perceived bad comments.

When Emby went closed source I was really disappointed with how the community handled itself: people saying all sorts of off-the-wall, weird, outright wrong, emotion-driven stuff because their favorite streaming app was under a perceived threat. It was really not handled well by everyone involved, I feel.

At the end of the day this is just my opinion. I know I am in the minority on most of this and that is fine. I do remember working on some projects where we all came together as a user team to build things, like the original XBMC3 Kodi addon we ported from the Plex plugin. We had people from both sides of the fence working together on the best solution. I don't ever see anything like that happening again. Now it is all like, how do we build this so no one else can use it. Again, this is not aimed at this community; in general this is what I feel.

 
