
Emby Media Deduplication Script - Manage Your Large Collections


troy_


Hello fellow Emby enthusiasts,

Today, I’m excited to share with the community a new tool designed to aid in managing large media libraries with a level of automation – introducing the Emby Media Deduplication Script! As our collections grow, so does the likelihood of accumulating duplicate content. This script aims to streamline the process of identifying and eliminating such redundancies.

Why the Deduplication Script?
With extensive collections spanning thousands of titles, manually sorting through duplicates can be an overwhelming task. The deduplication script was created to handle these large libraries efficiently and automatically: it scans thoroughly and deletes duplicates without requiring user supervision for each action.

What Does It Do?
The script connects to your Emby server and:
 
- Scans the specified library for media items with matching provider IDs (such as Imdb, Tvdb, Tmdb).
- Utilises a quality-based algorithm to retain the best version of a duplicated item.
- Deletes all other lesser-quality duplicates.
  
It’s designed to be hands-off; once configured and initiated, the script will operate autonomously, making it suitable for large libraries where manual oversight isn't feasible.
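To make the matching step concrete, here is a minimal sketch of grouping items by shared provider ID. It is an illustration only, not the script's actual code: the item records, the `provider_ids` field name, and the `find_duplicate_groups` helper are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical item records; the field names are assumptions for illustration,
# not the data model the script really uses.
items = [
    {"name": "Movie A (1080p)", "provider_ids": {"Imdb": "tt0000001"}},
    {"name": "Movie A (720p)", "provider_ids": {"Imdb": "tt0000001", "Tmdb": "101"}},
    {"name": "Movie B", "provider_ids": {"Tmdb": "202"}},
]


def find_duplicate_groups(items):
    """Bucket item names by (provider, id) and keep buckets with more than one item."""
    buckets = defaultdict(list)
    for item in items:
        for provider, value in item["provider_ids"].items():
            # Lower-case the provider name so "Imdb" and "imdb" land in the same bucket.
            buckets[(provider.lower(), value)].append(item["name"])
    return {key: names for key, names in buckets.items() if len(names) > 1}


duplicates = find_duplicate_groups(items)
print(duplicates)
```

Only the two "Movie A" entries share a provider ID here, so they form the single duplicate group; "Movie B" is left alone.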

How Can You Help?
We're calling for the brave and intrepid to help us test the script. Before you proceed, here are some crucial things to consider:

  • The script deletes media: It is built to be unsupervised, and upon confirmation (`--doit` flag), it will remove duplicate content deemed lesser in quality.
  • It's powerful: While designed with care and extensive error handling, it has the potential to make significant changes. Test it with smaller sections of your library or use `--doit` sparingly if you want to assess its actions step-by-step.
  • Feedback is invaluable: Share how it fares with your collection. Insights into its operation and suggestions for improvements are warmly welcomed.

Testing Guidelines:

1. Back up your library metadata (and your content, if you can).
2. Review the script's logging details to understand its decisions.
3. Run the script without the `--doit` flag (or with it set to false) first, and only enable it once you are happy with the changes it reports.
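For anyone curious how a "simulate first" switch like this typically works, here is a small sketch of the general pattern. Only the `--doit` flag name comes from this post; the parser setup, the `process` helper, and the log wording are assumptions for illustration and may differ from what the script really does.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the --doit flag name is taken from the post.
    parser = argparse.ArgumentParser(description="Dry-run pattern sketch")
    parser.add_argument(
        "--doit",
        action="store_true",
        help="actually delete duplicates; without it, only report what would happen",
    )
    return parser


def process(duplicates, doit, delete):
    """Return log lines; call delete(item) only when doit is True."""
    log = []
    for item in duplicates:
        if doit:
            delete(item)
            log.append(f"DELETED {item}")
        else:
            log.append(f"WOULD DELETE {item}")
    return log


# Parse an empty argument list, i.e. the user did not pass --doit.
args = build_parser().parse_args([])
removed = []
log = process(["Movie A (720p)"], args.doit, removed.append)
print(log)
```

Because `--doit` was not passed, nothing is handed to the delete callback and the log only describes what would have been removed.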

Getting Started:

This is not an Emby plugin, nor does it run "inside" Emby. Here's a quick guide to get you started:

  1. Access the Script Repository:
    The script is available on GitHub, a platform for sharing code. Visit the repository at troykelly/emby-dedupe, where you can download the container and view detailed instructions.
  2. Why you need Docker:
    Docker is a container platform that allows you to run applications in a controlled environment, ensuring consistency regardless of where the script is run. It simplifies the process of setting up and operating software across different systems. You can download Docker from Docker's official website.
  3. How to Use the Script with Docker:
    Once you have Docker installed and running on your system, you can pull the containerized version of the deduplication script from the repository. The container includes all necessary dependencies, so you don’t have to worry about the setup. Instructions for pulling and running the Docker image are detailed in the repository’s README file.
  4. Initial Precautions:
       - Make sure your Emby server is running.
       - Make a backup of your Emby server data.
       - It is recommended to run the deduplication script when your server is not under heavy load, to minimize potential issues.
  5. Execution:
    The repository instructions will guide you through executing the script step-by-step. You'll be able to simulate what the script does before you let it make any actual changes. This is the perfect way to see what the script might do once the `--doit` flag is enabled.
  6. Need Help?
    If you have any questions or get stuck, feel free to ask in this forum thread or open an issue on the GitHub repository for assistance.

Important Note:
Please ensure that you thoroughly understand the script’s functionality and the implications of automated content deletion. While the script is designed for safety and accuracy, it is essential to verify that its actions align with your expectations and collection management strategy.

We are looking forward to the community's involvement in refining this tool. As we collectively test and provide feedback, we can fine-tune it to better serve the diverse needs of Emby users worldwide.

Happy deduplication!
 

Edited by troy_
formatting

I'd really appreciate some input here: https://github.com/troykelly/emby-dedupe/issues/31

If you are comparing two pieces of visual media - what would be your decision flow?

  • Vertical and horizontal resolution?
  • Bit rate?
  • Codec?
  • Audio channels?
  • Languages?

You can see the new decision flow here (yes, it's code - but it's still pretty clear even if it's not something you do every day)
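To give people something concrete to react to, here is one possible way to express such a decision flow as an ordered comparison. This is a sketch under assumptions, not the flow used in the repository: the `Version` fields, the `quality_key` ordering (pixel count, then bit rate, then audio channels), and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    # Hypothetical fields for illustration only.
    name: str
    width: int
    height: int
    bitrate_kbps: int
    audio_channels: int


def quality_key(v):
    # Assumed priority: total pixels first, then bit rate, then audio channels.
    # Tuples compare element by element, so later fields only break ties.
    return (v.width * v.height, v.bitrate_kbps, v.audio_channels)


def split_keep_and_remove(versions):
    """Return (best version, remaining versions), ranked by quality_key."""
    ranked = sorted(versions, key=quality_key, reverse=True)
    return ranked[0], ranked[1:]


versions = [
    Version("720p", 1280, 720, 4000, 2),
    Version("1080p stereo", 1920, 1080, 8000, 2),
    Version("1080p 5.1", 1920, 1080, 8000, 6),
]
keep, remove = split_keep_and_remove(versions)
print("keep:", keep.name)
print("remove:", [v.name for v in remove])
```

Codec and language are left out of this sketch on purpose: ranking them needs a preference table, which is exactly the kind of input being asked for here. Reordering the tuple in `quality_key` is all it takes to try a different priority.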


The script uses the existing provider matches; it does not try to match content itself. The assumption is that Emby has already done that.

I'm trying to refine what "better quality" is to everyone. Highest resolution? Best compression? Color space?

What if it's an upscaled copy versus a lower resolution original, etc...

