
How to find duplicates?


brallor


danmarcoux

So I was able to do most of what I wanted with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv | sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.
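For anyone who wants to try the one-liner above without an Emby export handy, here is a minimal sketch that runs essentially the same pipeline (minus `less`) on a made-up file. The sample rows and the `find_dupes` name are invented for illustration, and what columns 6 and 9 actually contain depends on how the report was exported. A plain `-F ';'` split would also miscount any row whose quoted field itself contains a semicolon.

```shell
#!/bin/sh
# Hypothetical stand-in for ReportExport.csv: semicolon-delimited, with the
# 6th field quoted. All values are invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a;b;c;d;e;"Smile";g;h;2022
a;b;c;d;e;"Smile";g;h;2022
a;b;c;d;e;"Star Wars";g;h;1977
EOF

find_dupes() {
    # Keep fields 6 and 9, strip the quotes around the first kept field,
    # sort so identical keys sit next to each other, then print only the
    # keys that repeat, each prefixed with its count.
    awk -F ';' -v OFS=';' '{print $6, $9}' "$1" |
        sed -e 's/^"//; s/";/;/' |
        sort |
        uniq -cd
}

find_dupes "$sample"
rm -f "$sample"
```

Only the repeated key is reported here; the row that occurs once is dropped by `uniq -d`.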


3 hours ago, danmarcoux said:

So I was able to do most of what I wanted with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv | sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.

Thanks for sharing.


On 1/13/2023 at 8:52 PM, danmarcoux said:

So I was able to do most of what I wanted with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv | sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.

I agree, it would be nice to have, but it's probably not as easy as you would think.

It would likely need to be a bit flexible, as people take "duplicate" to mean different things. For example, one person wants to make sure there aren't two copies of the exact same file; another wants to make sure they don't have multiple versions of an episode or movie. Another person may have 4K, 1080 & 720 versions of media but wants to make sure there aren't two or more at any specific resolution.

The above might only be concerned with local physical files. Some people mount file systems on cloud provider storage. Then there is also Emby's media placeholder support using stub and strm files. People using remote mounted file systems and strm files likely have yet other views of what a "dupe" is.
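To make the "different meanings of duplicate" point concrete, here is a rough sketch in which the same inventory gives different answers depending on which fields form the key. The `title;year;resolution;path` layout, the sample rows and the `dupes_by` helper are all hypothetical, not anything Emby produces. Byte-identical files would need a checksum comparison instead, which is not shown.

```shell
#!/bin/sh
# Hypothetical inventory: title;year;resolution;path (all values invented).
inventory=$(mktemp)
cat > "$inventory" <<'EOF'
Smile;2022;1080p;/media/movies/Smile (2022)/Smile.mkv
Smile;2022;1080p;/media/vod/Smile.strm
Star Wars;1977;1080p;/media/movies/Star Wars (1977)/Star Wars.mkv
Star Wars;1977;4K;/media/4k/Star Wars (1977)/Star Wars.mkv
EOF

# dupes_by N FILE: build a key from the first N fields of each row and
# report every key that occurs more than once as "count;key".
dupes_by() {
    awk -F ';' -v n="$1" '
        {
            key = $1
            for (i = 2; i <= n; i++) key = key ";" $i
            count[key]++
        }
        END {
            for (key in count)
                if (count[key] > 1) print count[key] ";" key
        }' "$2" | sort
}

echo "Same title and year, any resolution:"
dupes_by 2 "$inventory"
echo "Same title, year and resolution:"
dupes_by 3 "$inventory"
rm -f "$inventory"
```

With the looser key both movies are flagged; with resolution included, only the pair that shares a resolution is. The strm entry counts as a duplicate of the local file here, which is exactly the kind of case people would disagree about.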

 


danmarcoux
On 1/16/2023 at 7:48 AM, cayars said:

I agree, it would be nice to have, but it's probably not as easy as you would think.

It would likely need to be a bit flexible, as people take "duplicate" to mean different things. For example, one person wants to make sure there aren't two copies of the exact same file; another wants to make sure they don't have multiple versions of an episode or movie. Another person may have 4K, 1080 & 720 versions of media but wants to make sure there aren't two or more at any specific resolution.

The above might only be concerned with local physical files. Some people mount file systems on cloud provider storage. Then there is also Emby's media placeholder support using stub and strm files. People using remote mounted file systems and strm files likely have yet other views of what a "dupe" is.

 

Oh, the duplicates of the VOD and 4K streams were driving me nuts! I would love some way to exclude a library (or libraries) from actor media searches. I'm probably not explaining that right. Under a selected movie, there are the actors in the movie. If you select an actor, Emby shows you which movies (shows, episodes) the actor appears in. So, for example, "Sosie Bacon" was showing up twice under Movies, both times for Smile: one which I had in my library and one which was VOD. There was no way of telling (at least not easily from the Roku app) which one would play. Same thing with 4K movies. Unless I go into the 4K library, I really don't want them showing up when I select an actor. Take Harrison Ford, for example: I don't want to see (nor can I easily tell apart) the 4K version of Star Wars.

For the 4K stuff, I changed all of the posters of the movies in the 4K library so I could easily tell the 4K versions from the "standard" versions. I also changed the Titles and SortTitles to include 4K. It was tedious at first, but unless I add a bunch of 4K movies at one time, it's a simple process for me to follow.

The VOD stuff wasn't so easy, since it changed whenever the provider changed out the VOD stuff. So I came up with a different solution: more scripting. I changed the VOD library so the "nfo" Metadata Saver was selected (I don't have that selected on my other libraries; I don't want my filesystem cluttered up by small .nfo files). In the script that fetches updated VOD titles, I put in a sleep (1 hour, to be safe) for Emby to see the new .strm files and create .nfo files for them. After the sleep was done, I'd go in and modify the "<title>" entry in the .nfo file, prepending "VOD - " to it. So, for example, Smile became "VOD - Smile". All of the other metadata stayed the same, so now when I show movies for "Sosie Bacon" I see "Smile" and "VOD - Smile". (It also changes the title in the Report plug-in, so I don't see the VOD titles as duplicates of the titles in my "permanent" library.)
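The title-rewriting step described above could look roughly like the sketch below. It is not the poster's actual script: the `tag_vod_titles` name and the demo directory are made up, and it assumes each .nfo keeps its `<title>` element on a single line. The one-hour sleep before this step is left out.

```shell
#!/bin/sh
# tag_vod_titles DIR: prepend "VOD - " to the <title> of every .nfo under DIR.
# Titles that already carry the prefix are skipped, so reruns are harmless.
tag_vod_titles() {
    find "$1" -type f -name '*.nfo' | while IFS= read -r nfo; do
        tmp=$(mktemp)
        # Only lines that do not already contain "<title>VOD - " are rewritten.
        # "<title>" does not match <originaltitle> or <sorttitle>.
        sed -e '/<title>VOD - /!s/<title>/<title>VOD - /' "$nfo" > "$tmp" &&
            cat "$tmp" > "$nfo"
        rm -f "$tmp"
    done
}

# Demo on a throwaway directory with an invented .nfo file.
demo=$(mktemp -d)
printf '%s\n' '<movie>' '  <title>Smile</title>' '</movie>' > "$demo/Smile.nfo"
tag_vod_titles "$demo"
tag_vod_titles "$demo"   # second run leaves the file unchanged
cat "$demo/Smile.nfo"
rm -rf "$demo"
```

Writing through a temporary file instead of `sed -i` keeps the sketch portable across sed implementations.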

 

 


I'd think the easiest way to do that is to shut down the server, copy the database and restart Emby Server.

Then, using a tool like SQL Browser or similar, a SQL query could be run to identify the dupes. Once the results are what you expect, it would be easy to add additional text to the SELECT statement to form CLI commands that delete or remove files/folders.

This takes a bit of SQL knowledge, but I've posted this type of thing a couple of times in the forum, and those posts could be used as a base.

Click through to the trailer post as well.
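As a rough illustration of the kind of query meant here, the sketch below groups rows by name and year and keeps the groups with more than one member. The `items` table and its `name`, `year` and `path` columns are hypothetical stand-ins, not Emby's real schema, so the query would have to be adapted after inspecting a copy of the actual database. It also assumes the `sqlite3` command-line tool is installed.

```shell
#!/bin/sh
# find_dupes_sql DB: list (name, year) pairs that occur more than once.
# Table and column names are hypothetical, not Emby's real schema.
find_dupes_sql() {
    sqlite3 -separator ';' "$1" "
        SELECT name, year, COUNT(*) AS copies
        FROM items
        GROUP BY name, year
        HAVING COUNT(*) > 1
        ORDER BY name, year;"
}

if command -v sqlite3 >/dev/null 2>&1; then
    # Build a small throwaway database with invented rows.
    db=$(mktemp)
    sqlite3 "$db" "
        CREATE TABLE items (name TEXT, year INTEGER, path TEXT);
        INSERT INTO items VALUES
            ('Smile', 2022, '/media/movies/Smile.mkv'),
            ('Smile', 2022, '/media/vod/Smile.strm'),
            ('Star Wars', 1977, '/media/movies/Star Wars.mkv');"
    find_dupes_sql "$db"
    rm -f "$db"
else
    echo "sqlite3 command-line tool not found; skipping demo" >&2
fi
```

Running queries against a copy, as suggested above, means a mistake in the query cannot damage the live library.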


danmarcoux

Pulling and updating the stuff from SQLite is very cool. I could definitely see myself writing a script to do this... (I've only been writing SQL scripts since, oh, 1992 or so. :)) So, other than doing something in PowerShell to kill the Emby process in Windows, is there a "sane" way to safely stop Emby? I'd rather create a task to do this so I don't have to do anything manually.

 


Happy2Play
8 minutes ago, danmarcoux said:

Pulling and updating the stuff from SQLite is very cool. I could definitely see myself writing a script to do this... (I've only been writing SQL scripts since, oh, 1992 or so. :)) So, other than doing something in PowerShell to kill the Emby process in Windows, is there a "sane" way to safely stop Emby? I'd rather create a task to do this so I don't have to do anything manually.

 

API call to shut down.

curl -X POST "http://{LOCALHOST or IP}:8096/emby/System/Shutdown?api_key={APIKEY}" -d ""
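For a scheduled task, the call above could be wrapped along these lines. This is only a sketch: the `shutdown_url` and `stop_emby` function names and the `EMBY_HOST`, `EMBY_PORT` and `EMBY_API_KEY` variables are placeholders invented here, and the endpoint is taken as given in the post. As written it only prints the URL, so nothing is sent until `stop_emby` is called.

```shell
#!/bin/sh
# shutdown_url HOST PORT KEY: build the shutdown URL from its parts.
shutdown_url() {
    printf 'http://%s:%s/emby/System/Shutdown?api_key=%s\n' "$1" "$2" "$3"
}

# stop_emby HOST PORT KEY: send the POST request with an empty body.
stop_emby() {
    # --fail makes curl return non-zero on an HTTP error, so a scheduled
    # task can tell that the request was rejected.
    curl --fail --silent --show-error -X POST -d "" "$(shutdown_url "$1" "$2" "$3")"
}

# Dry run: print the request URL instead of sending it.
shutdown_url "${EMBY_HOST:-localhost}" "${EMBY_PORT:-8096}" "${EMBY_API_KEY:-YOUR_API_KEY}"
```

A task would then call something like `stop_emby localhost 8096 "$EMBY_API_KEY"` and wait for the server process to exit before copying the database.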
