
Script: Suggests corrections to Emby's TMDB match


ginjaninja


Background

The script logs movies that it believes Emby has matched to the wrong TMDB entry.

The preference algorithm is a work in progress. Currently, in order of preference (highest first):

  • nearest year with an exact title match (title, original title and alternative titles), otherwise
  • best word match count (title, original title and alternative titles)
  • best Levenshtein distance (title, original title and alternative titles)
  • highest vote count
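One way to read that preference order is as a lexicographic sort key: compare candidates on the first criterion and fall through to the next only on a tie. The sketch below is written in Python for illustration (the actual script is PowerShell). The `Candidate` fields and the `rank_key`, `names` and `levenshtein` functions are hypothetical and not taken from the script, and treating year as a tie-breaker among exact title matches only is one interpretation of the first bullet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical shape of one TMDB search result; field names are assumptions.
    tmdb_id: int
    title: str
    original_title: str
    year: Optional[int]
    vote_count: int
    alt_titles: List[str] = field(default_factory=list)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


def names(c: Candidate) -> List[str]:
    # Title, original title and alternative titles, lower-cased.
    return [n.lower() for n in (c.title, c.original_title, *c.alt_titles)]


def rank_key(c: Candidate, query: str, query_year: Optional[int]) -> tuple:
    """Smaller key = more preferred, because Python sorts ascending."""
    q = query.lower()
    exact = q in names(c)
    if exact and c.year is not None and query_year is not None:
        year_gap = abs(c.year - query_year)
    else:
        year_gap = 0  # year only separates exact title matches
    q_words = set(q.split())
    word_hits = max(len(q_words & set(n.split())) for n in names(c))
    distance = min(levenshtein(q, n) for n in names(c))
    return (
        0 if exact else 1,  # 1. exact title match first...
        year_gap,           #    ...and the nearest year among those
        -word_hits,         # 2. more shared words
        distance,           # 3. smaller edit distance
        -c.vote_count,      # 4. more votes
    )
```

Usage: `best = min(candidates, key=lambda c: rank_key(c, "Heat", 1995))`. Encoding each criterion as a tuple element keeps the precedence explicit and makes reordering criteria a one-line change.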

The script works best on a library whose filenames match TMDB and strictly follow the Emby movie naming scheme. Emby's default TMDB choice is better when a filename disagrees with TMDB, especially on year, but, perversely, Emby's default choice was not always as reliable when the media exactly matches TMDB. So far I would characterise the algorithm as a useful check on Emby's choice rather than an improvement on it.

Requirements

  • PowerShell 7.
  • A TMDB API key, added to the config (keys are free for personal use).
  • Edit config.psd1 to your preferences and add your Emby username and password (un and pw).
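For readers unfamiliar with how the API key is used: TMDB's v3 API accepts it as an `api_key` query parameter, and movie lookups by title go through the `/3/search/movie` endpoint, which also takes `query`, `year` and `page`. The Python sketch below only illustrates that request shape; it is not the script's code, the function names are hypothetical, and the actual keys in config.psd1 are not shown here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

TMDB_SEARCH = "https://api.themoviedb.org/3/search/movie"


def search_url(api_key: str, title: str, year: Optional[int] = None, page: int = 1) -> str:
    """Build a TMDB v3 movie-search URL (hypothetical helper)."""
    params = {"api_key": api_key, "query": title, "page": page}
    if year is not None:
        params["year"] = year  # narrows results to that release year
    return f"{TMDB_SEARCH}?{urlencode(params)}"


def search(api_key: str, title: str, year: Optional[int] = None, page: int = 1) -> dict:
    """Fetch one page of results; the JSON body has a "results" list."""
    with urlopen(search_url(api_key, title, year, page), timeout=10) as resp:
        return json.load(resp)
```

Usage: `search("YOUR_KEY", "The Angel with the Trumpet", 1948)["results"]`. Keeping URL construction separate from the network call makes it easy to log or test the exact request.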


Sample log showing 35 suspected misidentifications out of 3,500.

log.csv


The log contains links that make suggested corrections easier to review and act on:

  • the item in Emby
  • the item in TMDB (current ID)
  • the item in TMDB (suggested ID)
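A log row with those three links could be produced along the lines of the Python sketch below. The TMDB web URL pattern `https://www.themoviedb.org/movie/<id>` is TMDB's public movie page; the Emby item URL format, the column names and the helper functions are assumptions for illustration, not the script's actual output format.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

TMDB_MOVIE = "https://www.themoviedb.org/movie/{}"
# Assumption: Emby web client item-page format; check it against your server version.
EMBY_ITEM = "{base}/web/index.html#!/item?id={item_id}&serverId={server_id}"

# Hypothetical column names.
FIELDS = ["name", "emby_link", "tmdb_current", "tmdb_suggested"]


def log_row(name: str, item_id: str, server_id: str, base: str,
            current_id: int, suggested_id: int) -> Dict[str, str]:
    """One CSV row linking the Emby item and both TMDB candidates."""
    return {
        "name": name,
        "emby_link": EMBY_ITEM.format(base=base, item_id=item_id, server_id=server_id),
        "tmdb_current": TMDB_MOVIE.format(current_id),
        "tmdb_suggested": TMDB_MOVIE.format(suggested_id),
    }


def write_log(rows: Iterable[Dict[str, str]], fh: TextIO) -> None:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Usage: write to an in-memory buffer (swap in open("log.csv", "w", newline="")).
buf = io.StringIO()
write_log([log_row("Example", "42", "abc", "http://localhost:8096", 100, 200)], buf)
```

Putting full URLs in the CSV lets spreadsheet applications turn them into clickable links, so each suggestion can be checked in a couple of clicks.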

Risk is low: the script's only change is writing to a log file.

Issues and suggestions are welcome, and I'm interested to hear people's results. I'm particularly interested in ideas for properties that could improve the matching algorithm. If I can make it reliably better than Emby, I would hook it to a ScriptX "media added" event to update the default choice.

v0.0.0.2

Corrected the Emby item URL in the log.

CheckIdentity v0.0.0.2.zip

 CheckIdentity v0.0.0.1.zip


  • 4 weeks later...
ginjaninja

v0.0.0.3 Changes

Reduced the weighting of year to the lowest priority, except in edge cases and when an exact title match with a sufficient vote count is found.

Added a check that stops further API requests once a match with 100 confidence has been found, for a slight speed improvement.
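The early exit amounts to checking the best confidence seen so far after each page of results and skipping the remaining requests once it reaches 100. Below is a minimal Python sketch, assuming a `fetch_page` callable that performs one API request and a `score` callable that returns a 0 to 100 confidence; both names, and the `max_pages` parameter, are hypothetical.

```python
from typing import Callable, List, Optional


def best_match(fetch_page: Callable[[int], List[dict]],
               score: Callable[[dict], int],
               max_pages: int = 2) -> Optional[dict]:
    """Scan result pages, stopping early once a fully confident match appears."""
    best, best_score = None, -1
    for page in range(1, max_pages + 1):
        results = fetch_page(page)  # one API request per page
        if not results:
            break                   # TMDB has no more results
        for r in results:
            s = score(r)
            if s > best_score:
                best, best_score = r, s
        if best_score >= 100:
            break                   # confident already: skip remaining requests
    return best
```

Passing the request and scoring functions in as parameters keeps the paging logic independent of the HTTP client and the preference rules, and makes the early exit easy to test with a fake fetcher.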

Added a preference function with sample rules that handle edge cases and assess a confidence score (early days). Amending the rules is straightforward:

  • a rule to stop the limited filesystem character set from unfairly penalising legitimate matches, e.g. 50/50
  • a rule to increase the weight of a year discrepancy when the vote count is low, e.g. The Angel with the Trumpet
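The two sample rules could look something like the Python sketch below. Normalising both titles by dropping punctuation means "50/50" on TMDB and a filesystem-safe "50-50" compare equal. Year penalties grow faster when few people have voted. The `LOW_VOTES` threshold, the penalty sizes and the function names are hypothetical values chosen for illustration, not the script's own.

```python
import re
from typing import Optional

LOW_VOTES = 50  # hypothetical "low vote count" threshold


def normalise(title: str) -> str:
    """Lower-case, drop punctuation (including filesystem-forbidden / and :),
    and collapse whitespace, so '50/50' and '50-50' both become '5050'."""
    stripped = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(stripped.split())


def confidence(file_title: str, file_year: Optional[int],
               cand_title: str, cand_year: Optional[int],
               vote_count: int) -> int:
    """Start from 100 and subtract hypothetical penalties."""
    score = 100
    if normalise(file_title) != normalise(cand_title):
        score -= 40  # titles differ even after normalising
    if file_year and cand_year and file_year != cand_year:
        gap = abs(file_year - cand_year)
        # Rule: a year gap counts for more when the candidate has few votes.
        per_year = 20 if vote_count < LOW_VOTES else 5
        score -= per_year * gap
    return max(score, 0)
```

Because each rule only subtracts from a running score, new rules can be added or retuned independently, which fits the "amendments are straightforward" goal.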

Verbose mode: displays the preference engine's output.

Reduced the TMDB API page count to 2 when a year exists, since pages 3 and beyond never helped in my test set, for a slight speed improvement.

Logs filenames on the filesystem that don't match the matched title/year.
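Checking a filename against the matched title and year needs the title and year parsed out of the name first. Emby's movie naming convention is "Movie Name (Year)", so the Python sketch below parses that pattern and compares it with the same punctuation-insensitive normalisation used for matching. The regex and the helper names are hypothetical, and filenames without a "(Year)" part simply yield no year.

```python
import re
from pathlib import PurePath
from typing import Optional, Tuple

# "Title (1995)" optionally followed by anything else, e.g. " - 1080p".
NAME_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)")


def normalise(title: str) -> str:
    """Drop punctuation and collapse whitespace so '50/50' == '50-50'."""
    return " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())


def parse_filename(path: str) -> Tuple[str, Optional[int]]:
    """Split 'Title (Year).ext' into (title, year); year is None if absent."""
    stem = PurePath(path).stem
    m = NAME_YEAR.match(stem)
    if m:
        return m.group("title"), int(m.group("year"))
    return stem, None


def filename_mismatch(path: str, matched_title: str, matched_year: int) -> bool:
    """True when the file's title or year disagrees with the TMDB match."""
    title, year = parse_filename(path)
    return normalise(title) != normalise(matched_title) or year != matched_year
```

`PurePath` splits on the host platform's separator, so on Linux a Windows-style path with backslashes would need `PureWindowsPath` instead.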

Wired up native and preferred languages in case they become useful for disambiguation (not currently in use).


This version keeps the weighting for exact matches while being as tolerant of year discrepancies as Emby is.

In time, the confidence score might allow the script to make changes to Emby and the filesystem in high-confidence scenarios, while alerting the user to lower-confidence ones.

I hope one day to turn it into a plugin.

CheckIdentity v0.0.0.3.zip
