Auto-Naming Having Issues: Seems to be a trend...

March 11, 2019

I recently did a "rebuild" of my Movie library because I was having issues. I re-imported all 2600+ movies and have found some naming issues that never should have happened. It seems there's a glitch in the auto-naming process.

Example:

DIR: Penguins of Madagascar - Operation DVD Premiere (2010)

|-- FILE: Penguins of Madagascar - Operation DVD Premiere (2010).mkv

The name is dead-on and the year is correct. Yet, the auto-naming chose The Penguins of Madagascar: Operation Penguin Patrol instead, which was released in 2011. So, it ignored the correct name AND the correct year and opted for a different title.

This is one of three I have found so far that exhibited the exact same outcome - ignoring the correct name AND year and choosing a very similarly named release from a different year instead. This leads me to believe that there's either a bug or a misconfiguration in the auto-naming setup.

Server version 4.0.2.0

BTW: All movies found so far were properly identified by Plex when they were imported there.

March 11, 2019

The Big Bang (2011) gets incorrectly identified as Nanny McPhee and the Big Bang (2010) even though the directory and filename BOTH have the correct title AND the correct year.

March 11, 2019

Jackass 3D (2010) misidentified as Jackass 3.5 (2011).

March 11, 2019

The List (2007) misidentified as The Bucket List (2007)

Why is Emby ignoring perfect matches and making this some sort of "popularity contest" instead? "I doubt you have The List since it wasn't very popular. You must have meant The Bucket List instead."

March 11, 2019

Big Bang Theory originally aired in 2007 - so if you start putting other years in the title - it's going to confuse things.

\Big Bang Theory (2007)\

\Big Bang Theory (2007)\Season 1

\Big Bang Theory (2007)\Season 1\Big Bang Theory (2007) - s01e01.mkv

...

\Big Bang Theory (2007)\Season 12

\Big Bang Theory (2007)\Season 12\Big Bang Theory (2007) - s12e01.mkv

.. is an example of how it should be named to be correctly picked up.

(*) Ah sorry - you're not talking TV Series .. my bad !

Edited March 11, 2019 by rbjtech

March 11, 2019

The Big Bang is a movie (not a TV show). It's also named correctly according to the conventions used by both Emby -AND- Plex.

While I appreciate you posting up naming guidelines, they don't directly translate here because it's MOVIE content that's being discussed.

March 11, 2019

If you type your searches into 'themoviedb.org' itself - I am getting the same results. Emby is just using the API so it's actually a problem in themoviedb.org I think ..

https://www.themoviedb.org/search?query=The%20List%20y%3A2007&language=en-US

As an example - using the correct syntax - shows 'The bucket list' ..

In the library settings (advanced), it may also be worth changing the order of the downloaders to put something other than TheMovieDb first if it's returning wrong matches ..

Edited March 11, 2019 by rbjtech

March 11, 2019

API searches are able to be more intelligent than human searches... You can include the year and a "primary release year" as possible variables. Since at least the year piece is known, that should be part of the search.

Additionally, allowing multiple responses and then opting for a perfect match before choosing whatever is the first response should be part of the process.
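For anyone following along, here is a rough sketch of what a year-constrained API search could look like. It assumes the TMDB v3 `search/movie` endpoint with its `query`, `year` and `primary_release_year` parameters; `build_search_url` and the placeholder key are made up for illustration, and this is not a claim about what Emby actually sends.

```python
from urllib.parse import urlencode

# Assumed endpoint: TMDB v3 movie search.
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(title, year, api_key="YOUR_API_KEY", strict_year=False):
    """Build a movie search URL that passes the known year along with the title.

    api_key is a placeholder, not a real key. strict_year switches from the
    looser 'year' filter to 'primary_release_year'.
    """
    params = {
        "api_key": api_key,
        "query": title,
        "language": "en-US",
    }
    params["primary_release_year" if strict_year else "year"] = year
    return f"{TMDB_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("The List", 2007))
print(build_search_url("The List", 2007, strict_year=True))
```

Passing the year narrows the result set, but it does not change how the provider ranks what is left, so the first result can still be a different title.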

March 11, 2019

It is - look at the URL - it includes the year. This is my point, TheMovieDb is incorrectly identifying it based on the title AND year..

March 11, 2019

I don't know whether it would be called correct or incorrect, but "The List" is shown as a result. It's just not the FIRST result returned. There's some sort of loose search going on behind the scenes that returns more than the specific items searched for.

My point in using exact names is that ALL results should be compared against the original search locally and preference given to an exact match. Plex got it wrong, too, but it at least got the title correct (there were two different movies called "The List" that were released in 2007).

Edited March 11, 2019 by ember1205

March 12, 2019

The first example is due to the one year tolerance we are using with MovieDb.

I suppose we could consider not doing that anymore @Happy2Play

March 12, 2019

I think that having a threshold like you do is reasonable. Maybe it could be configurable for how tight or loose the tolerance is or even turned on/off?
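A minimal sketch of what a configurable tolerance might look like, purely as an illustration. `year_matches` and its `tolerance` parameter are hypothetical names, not Emby settings.

```python
def year_matches(file_year, candidate_year, tolerance=1):
    """Return True when the candidate's year is within `tolerance` years.

    tolerance=1 mimics a one year tolerance; tolerance=0 turns it off and
    requires the years to be identical.
    """
    if candidate_year is None:
        return False
    return abs(file_year - candidate_year) <= tolerance


# The Penguins example: file says 2010, the wrong title was released in 2011.
print(year_matches(2010, 2011, tolerance=1))  # True
print(year_matches(2010, 2011, tolerance=0))  # False
```

With the tolerance at 1 the 2011 title is still a candidate; with it at 0 it would be filtered out before any ranking happens.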

I would agree that the one year tolerance allowed that misidentification to have that particular title available, but I see the direct cause of it in the first place being that the local server does not compare all results against the search strings on the server itself. Instead, it merely accepts the highest ranked response as authoritative, even though TMDB is wrong a fair number of times (in my 2600 movies, I had to correct at least 50 titles after import). The process today appears to be:

Query TMDB with complete title from filename (possibly from metadata?) and year from filename (possibly from metadata?). Accept whatever TMDB returns as its first response regardless of whether the name and year actually match or not.

The process SHOULD be (IMHO):

Query TMDB with complete title from filename (possibly from metadata?) and year from filename (possibly from metadata?). Iterate over the complete list of responses to determine if there is an exact match against the original query data or not. If there is, use it (the first one that is a perfect match). If not, use whatever is the first result like it does today. If it's possible to identify that A) there were multiple potential perfect matches or B) there were no perfect matches, then maybe put some sort of indicator on the movie card so that the admin can verify that it matched correctly.

It's never going to be perfect. But, what I describe above would definitely get a lot closer.
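The steps above can be sketched roughly as follows. This is only an illustration of the proposal, not how Emby works: `normalize`, `pick_match`, the `title`/`year` keys and the sample results are all made up, and the normalization rule (lowercase, strip punctuation, ignore a leading "the") is just one assumption about what "exact match" could mean.

```python
import re


def normalize(title):
    """Lowercase, drop punctuation and a leading 'the' for comparison."""
    words = re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def pick_match(query_title, query_year, results):
    """Return (chosen_result, needs_review).

    `results` is the provider's list in its own ranking order; each entry is
    a dict with hypothetical 'title' and 'year' keys.
    """
    if not results:
        return None, True
    wanted = normalize(query_title)
    exact = [r for r in results
             if normalize(r["title"]) == wanted and r["year"] == query_year]
    if len(exact) == 1:
        return exact[0], False      # one perfect match: use it
    if exact:
        return exact[0], True       # several perfect matches: use first, flag it
    return results[0], True         # no perfect match: fall back, flag it


# Made-up sample data in the order described in this thread.
results = [
    {"title": "The Bucket List", "year": 2007},
    {"title": "The List", "year": 2007},
]
choice, needs_review = pick_match("The List", 2007, results)
print(choice["title"], needs_review)  # The List False
```

The comparison runs locally over results that were already returned, so it adds no extra provider queries. The `needs_review` flag is the "indicator on the movie card" idea.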

March 12, 2019

Perhaps, but there could potentially be a significant performance impact to that on library scans.

The providers should be returning the "best" result first. However, I guess maybe in some instances they are ranking the results on something other than straight search parameters (perhaps popularity...?).

March 12, 2019

What if it were to be done on initial scan only (meaning only when a title is initially found and not matched)? Once the title has a match, does it need to be "re-matched" on a regular basis?

As far as the sites and how they return results - they are clearly returning matches in a sorted order based on "score" of the specific title (popularity? rating? not sure...). If they were to add additional criteria (specific matches first), that would increase THEIR workload. If you multiply the increase by how many searches are going on per second, it would likely crash their systems.

Edited March 12, 2019 by ember1205

March 12, 2019

We already do that. It will only query the providers if we are looking to match it. However, there are options for people to re-scan their libraries including a re-match on all items.

March 12, 2019

And, just to catch your next suggestion of "make it optional" ...

That is also a possibility but that doesn't stop people from complaining. We already have options for items during the scan and label them specifically with a warning that it will increase scan times but people still turn them on and complain that the scan is too slow.

So, we have to weigh just how often this is really a problem against the cost of making it more "perfect".

March 12, 2019

Which option is that to rescan with re-match?

March 12, 2019

"Replace all metadata"

March 12, 2019

Does it zero out the data and start from scratch? Or does it collect new data first before replacing what's there? I'm trying to figure out why -I- would use that option on a 2600 item library in which I have had to manually correct quite a few entries... In other words, I'm wondering what the original intent of that scan option is versus deleting and re-adding the media (other than convenience).

March 12, 2019

It is a way to start from scratch.

March 12, 2019

.. and it doesn't have to be done from the 'root' - you may choose to replace all metadata for a particular TV Series for example. I find this useful over time, as metadata providers change the Series 'banners' to different types through the lifetime of the series, so to get them all looking the 'same', I do a replace all metadata and it works great to re-align them all with the same 'style'.

March 12, 2019

Gotcha. I can see where that would be useful.

I'm wondering, though, if the "match" (identity) should be considered replaceable under most circumstances. Thinking "out loud" here...

On an initial scan, we have no match, so we need one. If the match is wrong, we can manually adjust it. At that point, the item is properly identified and should remain that way unless manually overridden (more on this later).

A metadata refresh should be allowed to refresh EVERYTHING, but you can't pull metadata about an item that hasn't been identified. So, what if we left the identity in place on a refresh? That wouldn't overload the system since it would be on par with what's going on today.

Identity could be manually overridden directly in the item (like we do today) for creating a single new match or it could be overridden globally (on any items where it isn't locked) on a larger scan. If the user kicks off this type of scan, they need to be presented with multiple "Are you VERY sure?" warnings that they must acknowledge. That way, when they complain about the performance hit, they can be reminded of the multiple times they said it was ok before the scan kicked off.

I don't know if the matching ID can be locked today, but it should be able to be locked so that only a manual override could change it in the future.
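To illustrate the idea, a small sketch of a refresh that keeps the existing identity unless a re-match is explicitly requested and the item isn't locked. Everything here (`Item`, `provider_id`, `identity_locked`, `refresh` and the callback arguments) is hypothetical and not taken from Emby.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    path: str
    provider_id: Optional[str] = None   # hypothetical stored identity
    identity_locked: bool = False       # hypothetical lock flag
    metadata: dict = field(default_factory=dict)


def refresh(item, lookup_id, fetch_metadata, rematch=False):
    """Refresh metadata; re-identify only when needed or explicitly asked.

    lookup_id(path) and fetch_metadata(provider_id) stand in for whatever
    provider calls a real server would make.
    """
    unidentified = item.provider_id is None
    if unidentified or (rematch and not item.identity_locked):
        item.provider_id = lookup_id(item.path)
    if item.provider_id is not None:
        item.metadata = fetch_metadata(item.provider_id)
    return item


# A manually corrected, locked item survives a global re-match.
item = Item("The List (2007)", provider_id="manual-fix", identity_locked=True)
refresh(item, lambda path: "auto-guess", lambda pid: {"source": pid}, rematch=True)
print(item.provider_id)  # manual-fix
```

A plain refresh never calls `lookup_id` for an item that already has an identity, so it costs the same as a metadata refresh does today.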
