
Posted

So I was able to do most of what I wanted to do with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv| sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less
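If you don't have WSL handy, here's a rough Python equivalent of that pipeline, as a sketch. It assumes ReportExport.csv is semicolon-delimited with double-quoted fields, which is what the awk/sed flags suggest. The function name is made up, and what columns 6 and 9 actually hold depends on how your report was set up.

```python
import csv
import io
from collections import Counter

def duplicate_pairs(csv_text, cols=(6, 9), delimiter=";"):
    """Count (column 6, column 9) pairs that occur more than once.

    Roughly `awk '{print $6, $9}' | sort | uniq -cd`: columns are
    1-based like awk's $6/$9, and only pairs seen at least twice are
    returned, sorted, each with its count.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    counts = Counter()
    for row in reader:
        if len(row) < max(cols):
            continue  # skip short or malformed rows
        # csv.reader already strips the surrounding double quotes,
        # which is the job the sed step does in the shell version.
        counts[tuple(row[c - 1] for c in cols)] += 1
    return sorted((key, n) for key, n in counts.items() if n > 1)
```

Something like `duplicate_pairs(open("ReportExport.csv", encoding="utf-8").read())` (assuming a UTF-8 export) should list each repeated pair with its count, much like `uniq -cd`. One difference: rows with fewer than nine fields are skipped instead of being printed with empty columns.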

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.

Posted
3 hours ago, danmarcoux said:

So I was able to do most of what I wanted to do with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv| sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.

Thanks for sharing.

Posted
On 1/13/2023 at 8:52 PM, danmarcoux said:

So I was able to do most of what I wanted to do with a single line in Linux (actually WSL on Windows).

  • awk -F ';' -v OFS=';' '{print $6, $9}' ReportExport.csv| sed -e 's/^\"//; s/\";/;/' | sort | uniq -cd | less

It would still be nice to have some way within Emby to find duplicates. I'm sure somebody without any kind of scripting knowledge would find it useful.

I agree, it would be nice to have, but it's probably not as easy as you would think.

It would likely need to be a bit flexible, as people take "duplicate" to mean different things. For example, one person wants to make sure there aren't two of the same exact file; another wants to make sure they don't have multiple versions of an episode or movie. Another person may have 4K, 1080p and 720p versions of media but wants to make sure there aren't two or more of any specific resolution.
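To illustrate how those definitions differ, here's a hypothetical Python sketch where each notion of "duplicate" is just a different grouping key. The MediaFile record and its fields are invented for the example (a real tool would read them from the library and hash the actual files on disk); this is not how Emby works internally.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class MediaFile:
    # Hypothetical record; a real tool would fill these from the library.
    path: str
    title: str
    year: int
    resolution: str       # e.g. "2160p", "1080p", "720p"
    content: bytes = b""  # stand-in for the file's bytes on disk

def exact_file_key(f):
    # Person 1: the same exact file, regardless of name or folder.
    return hashlib.sha256(f.content).hexdigest()

def same_item_key(f):
    # Person 2: any two versions of the same movie or episode.
    return (f.title.lower(), f.year)

def same_item_and_resolution_key(f):
    # Person 3: several versions are fine, but only one per resolution.
    return (f.title.lower(), f.year, f.resolution)

def find_dupes(files, key):
    """Group files by `key` and return the paths of every group with 2+ members."""
    groups = defaultdict(list)
    for f in files:
        groups[key(f)].append(f.path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

The same three files can then be one duplicate group, two, or none, depending only on which key you pass to `find_dupes`, which is why a built-in feature would need to let the user choose.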

The above might only be concerned with local physical files. Some people mount file systems backed by cloud provider storage. Then there is also Emby's media placeholder support using stub and strm files. People using remotely mounted file systems and strm files likely have yet other views of what a "dupe" is.

 

Posted
On 1/16/2023 at 7:48 AM, cayars said:

I agree, it would be nice to have, but it's probably not as easy as you would think.

It would likely need to be a bit flexible, as people take "duplicate" to mean different things. For example, one person wants to make sure there aren't two of the same exact file; another wants to make sure they don't have multiple versions of an episode or movie. Another person may have 4K, 1080p and 720p versions of media but wants to make sure there aren't two or more of any specific resolution.

The above might only be concerned with local physical files. Some people mount file systems backed by cloud provider storage. Then there is also Emby's media placeholder support using stub and strm files. People using remotely mounted file systems and strm files likely have yet other views of what a "dupe" is.

 

Oh, the duplicates from the VOD and 4K streams were driving me nuts! I would love some way to exclude a library (or libraries) from actor media searches. I'm probably not explaining that right. Under a selected movie, there are the actors in the movie. If you select an actor, Emby shows you which movies (shows, episodes) the actor appears in. For example, under "Sosie Bacon", Smile was showing up twice under Movies: one copy that I had in my library and one that was VOD. There was no way of telling (at least not easily from the Roku app) which one would play.

Same thing with 4K movies. Unless I go into the 4K library, I really don't want them showing up when I select an actor. Take Harrison Ford, for example: I don't want to see the 4K version of Star Wars (nor can I easily tell which one it is).

For the 4K stuff, I changed all of the posters of the movies in the 4K library so I could easily tell the 4K versions from the "standard" versions. I also changed the Titles and SortTitles to include 4K. It was tedious at first, but unless I add a bunch of 4K movies at a single time, it's just a simple process for me to follow.

The VOD stuff wasn't so easy, since it changed whenever the provider swapped out the VOD titles. So I came up with a different solution: more scripting. I changed the VOD library so the "nfo" Metadata Saver was selected (I don't have that selected on my other libraries; I don't want my filesystem cluttered up with small .nfo files). In the script that fetches updated VOD titles, I put a sleep in there (1 hour, to be safe) so Emby can see the new .strm files and create .nfo files for them. After the sleep was done, I'd go in and modify the "<title>" entry in the .nfo file and prepend "VOD - " to it. So, for example, Smile became "VOD - Smile". All of the other metadata stayed the same, so now when I show movies for "Sosie Bacon" I see "Smile" and "VOD - Smile". (It also changes the Title in the Report Plug-In, so I don't see the VOD titles as duplicates of the titles in my "permanent" library.)
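The .nfo renaming step could look roughly like this in Python. This is a sketch, not the script from the post: it assumes each .nfo is an XML file whose root element (e.g. `<movie>`) has a direct `<title>` child, and the function names are made up. Titles that already start with "VOD - " are skipped, so it can be re-run after every VOD refresh.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

PREFIX = "VOD - "
XML_DECL = '<?xml version="1.0" encoding="utf-8"?>\n'

def prefix_title(xml_text, prefix=PREFIX):
    """Return the .nfo XML with `prefix` prepended to its <title>,
    or None if there is no title or it is already prefixed."""
    root = ET.fromstring(xml_text)
    title = root.find("title")  # direct child of <movie>, <episodedetails>, ...
    if title is None or not title.text or title.text.startswith(prefix):
        return None
    title.text = prefix + title.text
    return XML_DECL + ET.tostring(root, encoding="unicode")

def prefix_nfo_titles(folder, prefix=PREFIX):
    """Rewrite every .nfo under `folder` in place; return the changed paths."""
    changed = []
    for nfo in sorted(Path(folder).rglob("*.nfo")):
        # utf-8-sig also copes with a leading byte-order mark, if present.
        new_xml = prefix_title(nfo.read_text(encoding="utf-8-sig"), prefix)
        if new_xml is not None:
            nfo.write_text(new_xml, encoding="utf-8")
            changed.append(nfo)
    return changed
```

You'd call `prefix_nfo_titles()` on the VOD folder after the sleep. Rewriting through ElementTree drops XML comments and may change the formatting, which is probably harmless for the metadata, but it's worth trying on a copy of a few files first.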

 

 

Posted

I'd think the easiest way to do that is to shut down the server, copy the database and restart Emby Server.

Then, using a tool like SQL Browser or similar, a SQL query could be run to identify the dupes. Once the results are what you expect, it would be easy to add additional text to the select statement to form CLI commands that delete or remove files/folders.
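Here's a minimal sketch of that idea using Python's built-in sqlite3 module, run against a throwaway in-memory table. The `items` table and its `name`/`year`/`path` columns are hypothetical stand-ins, not Emby's actual schema, so check the real table and column names in your copy of the database first. `ROW_NUMBER()` needs SQLite 3.25 or newer. The second function runs the same SELECT with extra text added, so each row comes out as a delete command to review.

```python
import sqlite3

# Hypothetical table/columns for illustration only; NOT Emby's real schema.
SCHEMA = "CREATE TABLE items (name TEXT, year INTEGER, path TEXT)"

def extra_copies_sql(select_expr):
    """Every copy of a (name, year) pair except the first by path, so one
    copy of each title is always kept. Only pass hard-coded expressions."""
    return f"""
        SELECT {select_expr}
        FROM (
            SELECT name, year, path,
                   ROW_NUMBER() OVER (
                       PARTITION BY name, year ORDER BY path
                   ) AS copy_no
            FROM items
        )
        WHERE copy_no > 1
        ORDER BY name, year, path
    """

def find_extra_copies(conn):
    return [row[0] for row in conn.execute(extra_copies_sql("path"))]

def delete_commands(conn):
    # Same query with extra text in the SELECT, so each row is already a
    # shell command. Review the output before running any of it!
    expr = "'rm -- \"' || path || '\"'"
    return [row[0] for row in conn.execute(extra_copies_sql(expr))]
```

Against the real copy, you'd use `sqlite3.connect()` on the copied database file instead of `":memory:"`, after adapting the table and column names.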

This takes a bit of SQL knowledge, but I've posted this type of thing a couple of times in the forum, and those posts could be used as a base.

Click through to the trailer post as well.

Posted

Pulling and updating the stuff from SQLite is very cool. I could definitely see myself writing a script to do this... (I've only been writing SQL scripts since, oh, 1992 or so. :)) So, other than doing something in PowerShell to kill the Emby process in Windows, is there a "sane" way to safely stop Emby? I'd rather create a task to do this so I don't have to do anything manually.

 

Happy2Play
Posted
8 minutes ago, danmarcoux said:

Pulling and updating the stuff from SQLite is very cool. I could definitely see myself writing a script to do this... (I've only been writing SQL scripts since, oh, 1992 or so. :)) So, other than doing something in PowerShell to kill the Emby process in Windows, is there a "sane" way to safely stop Emby? I'd rather create a task to do this so I don't have to do anything manually.

 

API call to shut down:

curl -X POST "http://{HOST}:8096/emby/System/Shutdown?api_key={APIKEY}" -d ""

Replace {HOST} with localhost or the server's IP address, and {APIKEY} with your API key.
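For a scheduled task, the same call can be made from a short Python script, e.g. right before copying the database. The host, port and API key are placeholders, just as in the curl command, and the function names are illustrative.

```python
import urllib.parse
import urllib.request

def shutdown_request(host, api_key, port=8096):
    """Build the same POST request as the curl command above."""
    query = urllib.parse.urlencode({"api_key": api_key})
    url = f"http://{host}:{port}/emby/System/Shutdown?{query}"
    # data=b"" mirrors curl's -d "": a POST with an empty body.
    return urllib.request.Request(url, data=b"", method="POST")

def shutdown(host, api_key, port=8096, timeout=10):
    """Send the shutdown request and return the HTTP status code."""
    with urllib.request.urlopen(shutdown_request(host, api_key, port), timeout=timeout) as resp:
        return resp.status
```

Usage would be along the lines of `shutdown("localhost", "YOUR_API_KEY")`. Starting Emby again afterwards isn't covered here; how you do that depends on how it's installed (as a service, tray app, etc.).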
  • 2 years later...
Teredactle
Posted

Has there been any headway on including a duplicate finder feature in Emby? I can't find any other posts about this, and it would surely be helpful.

TY!

visproduction
Posted (edited)

Little-known Windows File Explorer trick:

  1. Open the drive or folder.
  2. Search for *.mp4 or *.mkv.
  3. Select all results [Ctrl+A].
  4. Hold Shift and right-click, then select Copy as path.
  5. Open a text editor.
  6. Paste the results [Ctrl+V].
  7. Sort alphabetically.

Result: You get a list of all the *.mp4 or *.mkv files in the drive/folder and all its subfolders, with their full file paths.
 


"H:\BackDrawer1\The Kings Daughter (2022)\The Kings Daughter (2022) - 720P.mp4"
"H:\BackDrawer1\The Kissing Booth (2018)\The Kissing Booth (2018) - 1080P.mp4"
"H:\BackDrawer1\The Kissing Booth 2 (2020)\The Kissing Booth 2 (2020) - 1080P.mp4"
...
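If you're comfortable running a script, Python's pathlib can build the same sorted list in one go. A rough sketch: the extensions and function names are just for illustration, and the folders you pass in are whatever your drives are. Unlike Copy as path, the output has no surrounding quotes.

```python
from pathlib import Path

EXTENSIONS = (".mp4", ".mkv")

def keep_media(paths, extensions=EXTENSIONS):
    """Keep only media files (case-insensitive extension match), sorted."""
    return sorted(p for p in paths if p.lower().endswith(extensions))

def list_media(*roots, extensions=EXTENSIONS):
    """Full paths of all media files under each root and its subfolders,
    like the Explorer search + Copy as path steps, sorted alphabetically."""
    found = (str(p) for root in roots for p in Path(root).rglob("*") if p.is_file())
    return keep_media(found, extensions)
```

For example, `list_media(r"H:\BackDrawer1")` should return paths like the ones above, and you can pass several folders in one call.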

If you have multiple drives/folders, just do this for each one to get full lists of all your content. The problem is that a different drive or folder may hold a copy of some media that is also in another folder. An alphabetical sort won't bring these two file paths together, because the paths begin with different drive and folder names. So you can use another hack trick to fix this. You need a good text editor, not just the standard Notepad or Notes.

  1. Use search and replace with the ability to replace with a tab.
    1. In Search, enter \
    2. In Replace, enter a tab (MS Word allows ^t; text editors with an advanced (More) replace feature may let you enter a tab with [Ctrl+Tab]).

H:    BackDrawer1    The Kings Daughter (2022)    The Kings Daughter (2022) - 1080P.mp4
H:    BackDrawer1    The Kings Daughter (2022)    The Kings Daughter (2022) - 720P.mp4
H:    BackDrawer1    The Kissing Booth (2018)    The Kissing Booth (2018) - 1080P.mp4
 

  2. Replace the quotes with nothing. Otherwise Excel will get confused.

Copy and paste this into Excel. The tabs will make Excel put each part of the path into its own cell on a row.
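The same two replacements can be done per line in Python, for anyone who'd rather not fight a text editor. PureWindowsPath splits a path on its backslashes; the function name is made up, and writing the resulting rows to a text file and pasting or opening it in Excel should give one path part per cell.

```python
from pathlib import PureWindowsPath

def to_tab_row(line):
    """Turn one "Copy as path" line into tab-separated cells: the same
    result as removing the quotes and replacing each backslash with a tab."""
    parts = PureWindowsPath(line.strip().strip('"')).parts
    if not parts:
        return ""
    # parts[0] is the drive plus its root backslash; strip the backslash
    # so the first cell reads "H:" as in the example above.
    return "\t".join([parts[0].rstrip("\\")] + list(parts[1:]))
```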

The tricky part left is subfolders that do not line up. If media is in a second-level subfolder on a drive, like Season 01, its file name lands in a different column than media that sits higher up the folder tree. Excel lets you sort all those rows together, select them all at once, and drag several hundred of them over a column if needed. Just keep each row's data together. The goal is one column that holds the media file names for your whole collection. Once you have that, you can sort by the file name column and compare 10,000+ rows of media file names.
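In a script, the column shuffling isn't needed: the last part of each path is the file name, whatever the folder depth. A small sketch (the function name is illustrative); it sorts case-insensitively because Windows file names are case-insensitive.

```python
from pathlib import PureWindowsPath

def by_file_name(paths):
    """Pair each full path with its file name (the last component) and
    sort by that name, so copies on different drives or at different
    folder depths end up on adjacent rows."""
    rows = [(PureWindowsPath(p).name, p) for p in paths]
    # Tie-break on the full path so the output order is stable.
    return sorted(rows, key=lambda row: (row[0].lower(), row[1]))
```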

Obviously, with that many media files, it can be tricky to find the duplicates, even if they are now on adjacent rows. Excel has an easy answer to that.

  1. Enter this formula into a free column that has no data yet (here the file names are in column D):

=IF(D1=D2,"dupe","")

OR

=IF(D1=D2,1,0)

  2. Now drag the formula down to copy it to all cells below, until you reach the end of the rows with data. You can also copy and paste if there are too many rows to drag. This easily fills 100 or 10,000 rows of your spreadsheet in seconds.

  3. Anywhere you see either "dupe" or 1 (depending on which formula you chose), the file name on that row matches the one on the next row. This won't catch slightly different names.

  4. If you want to lock the results in place, select the column with the dupe info, copy it, and paste it as values. That removes the formula and leaves the "dupe" or 1 value. Now if you sort entire rows by the dupe column, the flagged duplicates will all come together.
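Here's a sketch of the same check in Python, assuming rows are (file name, full path) pairs already sorted by file name (a hypothetical layout). One deliberate difference: the formula only marks the first row of each matching pair, while this version returns every path in a run of identical names, so both copies show up together. Like the formula, it only catches exact name matches.

```python
from itertools import groupby

def flag_dupes(rows):
    """rows: (file_name, full_path) pairs already sorted by file name.

    Like =IF(D1=D2,"dupe","") down a sorted column, but every row in a
    run of matching names is returned, not only the first of each pair.
    Names are compared case-insensitively, as Excel's = does for text.
    """
    flagged = []
    for _, run in groupby(rows, key=lambda row: row[0].lower()):
        run = list(run)
        if len(run) > 1:
            flagged.extend(path for _, path in run)
    return flagged
```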

Summary:
Yes, this is a whole slew of hack steps. It's heavily manual, but once you learn it, you can review all media files for duplicates across all your hard drives. If I wanted to check, say, 5 hard drives for all .mp4 and .mkv files and sort 5000 entries to find the dupes, I think it would take me about 30 minutes. If you like these kinds of hack tricks, maybe it can work for you. Most people will probably just use some automated plugin. At least this is a method where you can change your own tire, so to speak.

Hope that helps

 

[Screenshot: Windows Explorer Shift trick to find search Copy as Path]

[Screenshot: Sort movie list in Excel by column 01]

Edited by visproduction
  • 4 weeks later...
Posted

A new Duplicates filter for Movies and Episodes will be in Emby Server 4.9:

[Screenshot of the new Duplicates filter]

