How to find duplicates?


brallor

PenkethBoy
14 minutes ago, cayars said:

Well, you can do it with any scripting language. You just need to build a master list of all the content, but then you can't easily eliminate 3D or 4K material unless you keep it in different locations.

But it's still not easier, IMHO, than just running a SQL statement against a database that already has all the data, including movie name, location and its attributes.

LOL - you keep banging the sql drum when only a minuscule number of people know how to use it - i do but would not use it for this - way overkill

not sure why 3d 4k matters in any way

Link to comment
Share on other sites

Ronstang

You're forgetting that most of us are not coding geeks; I have no idea how to run a script. We want to hit a button and see the stats. I know I do. I'm not jumping through a bunch of hoops to find my dupes, as I will just wait until I come across them while perusing my collection looking for something to watch. EmbyStat is easy enough, but the function I need was taken out.

The easier Emby is to use, the larger the market it will have from a business perspective. Most people these days are lazy and don't want to learn new things; they want instant gratification. Everyone who sees my setup and peruses my collection wants one of their own until I tell them all the work it takes.

Link to comment
Share on other sites

yaksplat
31 minutes ago, cayars said:

Well, you can do it with any scripting language. You just need to build a master list of all the content, but then you can't easily eliminate 3D or 4K material unless you keep it in different locations.

But it's still not easier, IMHO, than just running a SQL statement against a database that already has all the data, including movie name, location and its attributes.

True.  I agree that it'd be nice if the db wasn't in exclusive mode.  I don't feel like shutting down the server to run a query.

Link to comment
Share on other sites

yaksplat
1 minute ago, Ronstang said:

Everyone who sees my setup and peruses my collection wants one of their own until I tell them all the work it takes

You want to do this? Here's my server, backup strategy, methods of finding content, vpn, network system etc....  You can see the light fade from their eyes.

Link to comment
Share on other sites

Happy2Play
1 minute ago, Ronstang said:

EmbyStat is easy enough but the function I need was taken out.  

Already mentioned in the other topic: just run the previous version, 0.2.0 beta 20.

Link to comment
Share on other sites

You guys crack me up. Executing a batch file to run a script is supposedly too difficult, but then you show multi-step operations using programs, sorting data and other stuff, or writing scripts to find files with the same names across your discs. All of these have issues. Top Gun.mp4, Top Gun (1986).mpg, Top Gun.mkv and Top Gun (1986).mkv are the SAME movie, but looking at file names won't find them because they are not duplicate file names.

I want to find duplicate movies, not duplicate files, so I need the data already mapped to metadata IDs.

Exporting data from Emby into Excel via a plugin lets you use the movie name instead of the file name, so it doesn't have that issue, but then you have a few more steps to do and it's manual.

But all that data is already in the database, which is why I like that option. I'm certainly not saying others must do it "my way", as there isn't a right way short of Luke or a 3rd party adding this directly into Emby. Personally I think we should have a DUPE FILTER built right into each library UI so you can see dupes and remove them right from the interface, but until we get that...

Just pick your poison. :)

Link to comment
Share on other sites

PenkethBoy

Sorry - the database can be and is wrong - see collection issues where orphaned data is left behind - it's not the only error emby has with data

file names is one way to do it - not the only way - i have just shown two different ones that, depending on a user's capabilities, could be used to find duplicate files

SQL is not the panacea you think it is - it's no better or worse than the other methods - all depend on manual intervention

and a plugin, although convenient for ease of use, suffers from the same data issues the sql will

none are perfect - brain cells will have to be applied to the results

top gun example - easy to account for - the issues you cite are not issues but methods of getting to a result

sql has its limitations and with sqlite more so

maybe put your money where your mouth is and show us script junkies how you do it?

Link to comment
Share on other sites

IMHO results are only as good as the data. If you have mismatched names in Emby, then that's its own issue that should be fixed before worrying about dupes, in my book. To me this is no different than having a movie name without a year, where you can't tell what movie it is without viewing it first.

Garbage in, garbage out applies.

Even with bad data, I'd wager that for the average user, looking for dupes using SQL is better than any file-based approach, because I'd think many dupes are caused by similar but different file names of movies, especially given how many people get their files and don't use tools like FileBot to rename them.

I totally agree, no one way is perfect, but to me basing dupes on file names is far more limited than SQL any/every day of the week. :)

Link to comment
Share on other sites

PenkethBoy

Yes depends where you are starting from

If you are part of the unwashed, you have files named in a random manner and don't do any cleanup (seeding / using whatever the ...arr saves the file as, etc.) via filebot or other methods so you have standard naming - then it's a crapshoot. Also, without cleanup you get a few emby misidentified Movies etc - then the Name matching via SQL won't work until/if a user actually engages and corrects it.

If you do a bit of work on files as they "arrive" then you don't get inconsistent results - i.e. file names - other than a genuine error/cockup

Then your Top Gun example does not apply, as changing "name" to "basename" in my ps script will ignore the file extension
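
For anyone who wants to try the file-name route, here is a minimal Python sketch of the idea (this is not the ps script mentioned above, just an illustration). It drops the folder, the extension and a trailing "(year)" before comparing, so the four Top Gun files from the earlier example land in the same group. The function names are made up for this sketch, and dropping the year is an assumption that will lump a remake together with the original.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Matches a trailing "(1986)"-style year at the end of a file name stem.
YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)\s*$")

def title_key(path):
    """Reduce a path to a comparison key: no folder, no extension, no trailing (year)."""
    stem = PurePath(path).stem            # "Top Gun (1986).mkv" -> "Top Gun (1986)"
    stem = YEAR_SUFFIX.sub("", stem)      # -> "Top Gun"
    return " ".join(stem.lower().split()) # lowercase and collapse whitespace

def find_duplicates(paths):
    """Group paths by title key and keep only keys with more than one file."""
    groups = defaultdict(list)
    for p in paths:
        groups[title_key(p)].append(p)
    return {key: files for key, files in groups.items() if len(files) > 1}

files = [
    "Top Gun.mp4",
    "Top Gun (1986).mpg",
    "Top Gun.mkv",
    "Top Gun (1986).mkv",
    "Heat (1995).mkv",
]
dupes = find_duplicates(files)
for key, group in dupes.items():
    print(key, "->", group)
```

Results still need a human look, since the key only knows what is in the file name.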

Link to comment
Share on other sites

Ronstang
23 hours ago, Happy2Play said:

Already mentioned in the other topic: just run the previous version, 0.2.0 beta 20.

For me at least, this version doesn't work. It reports "unhandled backend exception" and shows the Movies as "loading" in perpetuity.

EDIT:  Just tried a clean install after removing latest version and now running EmbyStat results in an error and it never starts.

Edited by Ronstang
Link to comment
Share on other sites

Ronstang

Ok, I got the older version of EmbyStat working finally and it is great for my needs.  I hope the author decides to put the ability to see duplicates back into the newer version as it is definitely much nicer than this version.

Link to comment
Share on other sites

rbjtech

Currently, my file names hold extra info that is not in the SQL DB - especially on HD sound formats (Atmos, DTS-X etc.) and also the HDR coding method (DV, HDR10+ etc.).

Once the SQL DB caters for all of this automatically, then I will agree that SQL is probably the better option, but currently, for me anyway, searching by filename for specific items is actually more accurate.
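
As a rough illustration of searching by filename for that kind of info, here is a small Python sketch that pulls audio/HDR tags out of a file name. The tag spellings and the bracketed naming style in the samples are assumptions for the example, not the actual convention used above; adjust the patterns to whatever you put in your own names.

```python
import re

# Hypothetical tag patterns; extend or change these to match your own naming.
TAG_PATTERNS = {
    "Atmos": re.compile(r"\batmos\b", re.I),
    "DTS-X": re.compile(r"\bdts[-:. ]?x\b", re.I),
    "HDR10+": re.compile(r"\bhdr10(\+|plus)", re.I),
    "DV": re.compile(r"\b(dv|dolby[ .]?vision)\b", re.I),
    "4K": re.compile(r"\b(4k|2160p)\b", re.I),
}

def file_tags(filename):
    """Return the set of known tags that appear in the file name."""
    return {tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(filename)}

for name in [
    "Dune (2021) [4K DV Atmos].mkv",
    "Dune (2021) [1080p DTS-X].mkv",
    "Dune (2021) [HDR10+].mkv",
]:
    print(name, "->", sorted(file_tags(name)))
```

Two files with the same title but different tag sets can then be treated as different versions rather than dupes.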

 

Edited by rbjtech
Link to comment
Share on other sites

That's cool you do that.  I never got around to doing that but have meant to do it for years. :) 

Link to comment
Share on other sites

Ronstang

I've been adding extra info to my filenames and folders for almost two decades, ever since I reorganized my giant music collection. Having that extra info makes identifying things super quick, and I can tell if I have a duplicate or a different version, whether it be the pressing of a CD or the encoder used. It's also how I identify different versions of movies in Emby: I go to the details page and look at the folder name / file name, which then tells me all I need to know at a glance.

Link to comment
Share on other sites

  • 3 weeks later...

@varmandra sorry for the late reply. It is not a feature that will be forgotten in EmbyStat. I will have to take some more time to implement that feature again and also update to the tvdb V2 API asap.

Link to comment
Share on other sites

Ronstang
16 hours ago, reggi said:

@varmandra sorry for the late reply. It is not a feature that will be forgotten in EmbyStat. I will have to take some more time to implement that feature again and also update to the tvdb V2 API asap.

That is great because your newer version is super nice and having that feature back would be really nice.  Thanks for a great program.

Link to comment
Share on other sites

  • 1 month later...
On 12/23/2020 at 1:48 PM, GtownE said:

@reggi Any updates??

You may want to check out the latest news here: 

 

Link to comment
Share on other sites

  • 11 months later...
horstepipe

Hello folks,

I also want to find duplicate movies. The problem here is that I only need specific libraries to be compared. I have a movies and a movies 4k library, which should not be compared with each other, as duplicates are welcome there.

Any idea how to do that with just selected libraries?

Link to comment
Share on other sites

On 12/23/2021 at 8:50 AM, horstepipe said:

Hello folks,

I also want to find duplicate movies. The problem here is that I only need specific libraries to be compared. I have a movies and a movies 4k library, which should not be compared with each other, as duplicates are welcome there.

Any idea how to do that with just selected libraries?

Not built-in, but if you search the forum for duplicates I think you will find techniques that users have come up with.

Link to comment
Share on other sites

horstepipe

Thanks, but unfortunately I could not find a way that fits my purpose (comparing only some libraries).

If anyone has a link, please share 🙂

Link to comment
Share on other sites

  • 2 months later...
horstepipe
On 12/28/2021 at 6:02 AM, horstepipe said:

Thanks, but unfortunately I could not find a way that fits my purpose (comparing only some libraries).

If anyone has a link, please share 🙂

BUMP

@cayars

Could you please tell me how to check for duplicates within the database (Linux or Windows, doesn't matter) in ONE library?

Link to comment
Share on other sites

DB Browser for SQLite

This is what I use, but I exclude 3D and 4K files (I keep 3D and 4K in the name):

select Name, ProductionYear
from MediaItems
where type = 5
  and Path not like '%4k%'
  and Path not like '%3D%'
  and Path not like '%extra%'
group by Name, ProductionYear
having count(*) > 1;

You can find a copy here with some other scripts:

Starting from that message, the next few posts include a couple of scripts you might like.
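
For the "only ONE library" question, one possible approach is to limit the same kind of query to a folder. The Python sketch below assumes a library maps to a single root folder and reuses the table and column names from the query above (MediaItems, Name, ProductionYear, Path, type = 5); the folder paths and the function name are made up, and the demo runs against a tiny in-memory stand-in table rather than a real Emby database. Note the trailing slash on the folder: without it, "/media/movies" would also match "/media/movies 4k".

```python
import sqlite3

# Same idea as the query above, restricted to paths that start with one folder.
DUPES_IN_LIBRARY = """
    SELECT Name, ProductionYear, COUNT(*) AS copies
    FROM MediaItems
    WHERE type = 5
      AND instr(Path, ?) = 1
    GROUP BY Name, ProductionYear
    HAVING COUNT(*) > 1
    ORDER BY Name
"""

def find_dupes(conn, library_root):
    """Return (Name, ProductionYear, copies) for movies found more than once under library_root."""
    return conn.execute(DUPES_IN_LIBRARY, (library_root,)).fetchall()

# Stand-in data so the sketch runs on its own; a real database has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MediaItems (Name TEXT, ProductionYear INTEGER, Path TEXT, type INTEGER)"
)
conn.executemany(
    "INSERT INTO MediaItems VALUES (?, ?, ?, ?)",
    [
        ("Top Gun", 1986, "/media/movies/Top Gun (1986).mkv", 5),
        ("Top Gun", 1986, "/media/movies/Top Gun.mp4", 5),
        ("Top Gun", 1986, "/media/movies 4k/Top Gun (1986).mkv", 5),
        ("Heat", 1995, "/media/movies/Heat (1995).mkv", 5),
    ],
)

rows = find_dupes(conn, "/media/movies/")
print(rows)  # [('Top Gun', 1986, 2)]
```

Against a real database you would probably want to work on a copy of the file and open it read-only, e.g. sqlite3.connect("file:/path/to/copy.db?mode=ro", uri=True), where the path is a placeholder.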

Edited by cayars
Link to comment
Share on other sites

  • 5 months later...
rcanpolat

Bumping this topic up again. Emby needs a way to display and allow for media deletion of duplicates.

I keep a Plex docker container on standby and fire it up every few weeks to check for duplicates. It takes 23.6 GB (and growing) of drive storage just to have that container on standby for the duplicate detection feature. Granted, there are more cumbersome ways of finding duplicates outside of Plex, but nothing is as elegant or simple as Plex's method. Surely the Emby devs can build something to show duplicates by TMDB or IMDB codes and provide a delete button, or a trash can for good measure.
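
To make the "duplicates by TMDB or IMDB codes" idea concrete, here is a minimal Python sketch that groups items by a provider ID instead of by name. The record layout (a "Path" plus a "ProviderIds" dictionary) and the IDs themselves are made up for illustration; how you get such records out of your server is left open.

```python
from collections import defaultdict

def dupes_by_provider_id(items, provider="Imdb"):
    """Group item paths by one provider ID and keep IDs that occur more than once."""
    groups = defaultdict(list)
    for item in items:
        pid = item.get("ProviderIds", {}).get(provider)
        if pid:  # items without an ID cannot be matched this way, so skip them
            groups[pid].append(item["Path"])
    return {pid: paths for pid, paths in groups.items() if len(paths) > 1}

# Sample records with placeholder IDs (not real IMDB codes).
items = [
    {"Path": "/movies/Top Gun.mp4", "ProviderIds": {"Imdb": "tt0000001"}},
    {"Path": "/movies/Top Gun (1986).mkv", "ProviderIds": {"Imdb": "tt0000001"}},
    {"Path": "/movies/Heat (1995).mkv", "ProviderIds": {"Imdb": "tt0000002"}},
    {"Path": "/movies/Home Video.mkv", "ProviderIds": {}},
]
dupes = dupes_by_provider_id(items)
for pid, paths in dupes.items():
    print(pid, "->", paths)
```

Because the match is on the ID, differently named files of the same movie are caught, but anything misidentified or unidentified is not.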

Link to comment
Share on other sites

  • 5 months later...
danmarcoux

Bumping a 6 month old thread....

My use case is different from everything I've seen on here. There are times when Emby misidentifies a file, sometimes causing a duplicate of a different title.

I've written scripts to run against a generated report (ReportExport.csv), comparing the SortName field with the Path field after stripping the pathname, the extension and articles (the, an, a). My file name will be something like "Big Trail, The (1930).mkv", not "The Big Trail (1930).mkv" or "Big Trail (1930).mkv". And yet I still find duplicates or misidentified files. Very frustrating.

The closest solution I've found is on page 1 of this thread where you "Start collapsed" and look for ": 2".  That's a simple solution for a small library, but I have a large library and am literally finding hundreds of ": 2".  
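
For anyone wanting to try the SortName vs. Path comparison described above, here is a rough Python sketch of the idea (not my actual scripts). It assumes the report is a CSV with SortName and Path columns; the normalization rules and function names are just one way to do it. Rows where the file name and SortName disagree after normalization get flagged as possibly misidentified.

```python
import csv
import io
import re
from pathlib import PurePath

def normalize(title):
    """Lowercase, drop a trailing (year), drop leading/trailing articles, strip punctuation."""
    t = title.lower()
    t = re.sub(r"\s*\(\d{4}\)\s*$", "", t)   # "big trail, the (1930)" -> "big trail, the"
    t = re.sub(r",\s*(the|an|a)$", "", t)    # "big trail, the" -> "big trail"
    t = re.sub(r"^(the|an|a)\s+", "", t)     # "the big trail" -> "big trail"
    return re.sub(r"[^a-z0-9]+", " ", t).strip()

def mismatched_paths(report):
    """Return the Path of every row whose file name does not match its SortName."""
    return [
        row["Path"]
        for row in csv.DictReader(report)
        if normalize(PurePath(row["Path"]).stem) != normalize(row["SortName"])
    ]

# Small stand-in for ReportExport.csv; the second row simulates a misidentified file.
sample = io.StringIO(
    'SortName,Path\n'
    'Big Trail,"/movies/Big Trail, The (1930).mkv"\n'
    'Top Gun,/movies/Heat (1995).mkv\n'
)
flagged = mismatched_paths(sample)
print(flagged)  # ['/movies/Heat (1995).mkv']
```

With a real export you would pass open("ReportExport.csv", newline="") instead of the sample.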

Link to comment
Share on other sites
