More intelligent library change detection

September 22, 2023

If you change your library path names despite movie folders/files themselves being untouched, the entire library will rebuild. This can take considerable time.

e.g. /root1/movies1 to /root/movies

Would it not be possible to take a quick hash of each file during a scan? Then, on a future rescan, Emby would realise it already has metadata for that file, so there would be no need to nuke the metadata and rebuild it.

I think you get the idea.

I know some might suggest putting metadata alongside the media, but this is surely less optimal if your media is on slow storage.

If I’m wrong on anything please correct me. Thanks.

Edited September 22, 2023 by embylad892746

September 22, 2023

This is another reason why I always suggest using UNC path names instead of local paths as it makes things 'portable' behind the scenes.

As an example, for as long as I can remember, my 'movie' path has always been the UNC path \\media\movies - what's 'behind' that share has been changed many times - but emby doesn't know any different, so it doesn't have to re-scan anything if and when I change it. Could be a different local filesystem, different host - doesn't matter, the UNC stays the same.

You can also do something similar with symlinks (folder level), but that's restricted to the local machine - shares are universal.

September 22, 2023

I completely agree @rbjtech that this is "the way". The reason I have this mess is that I've just done the equivalent thing, but at the filesystem level using mergerfs - now all that emby knows about is /movies and /tv, which point to different merged drives.

However, for those not yet using UNC/mergerfs it just seems to me that a small amount of extra intelligent logic could save thousands of scan hours for people in the future when these things inevitably happen again.

September 22, 2023

OSHash may be a good hashing algorithm for this. Plex also uses something similar with their relatively new scanners.

Further reference: https://trac.opensubtitles.org/projects/opensubtitles/wiki/HashSourceCodes
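For anyone who wants to see how cheap this is, here is a minimal Python sketch of the OSHash idea as I understand it from the page above: take the file size, add the first and last 64 KiB of the file interpreted as little-endian 64-bit integers, and wrap the sum at 64 bits. The function name `oshash` and the decision to refuse files smaller than 128 KiB are my own simplifications, not something Emby or Plex is confirmed to do.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024          # bytes read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF    # keep the running sum within 64 bits


def oshash(path):
    """Return an OSHash-style digest as 16 hex characters (sketch only)."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK_SIZE:
        # Simplification for this sketch: tiny files are not handled.
        raise ValueError("file too small for this sketch")

    total = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK_SIZE):
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            # Sum the chunk as little-endian unsigned 64-bit integers.
            for (value,) in struct.iter_unpack("<Q", chunk):
                total = (total + value) & MASK_64
    return f"{total:016x}"
```

Only 128 KiB is read per file regardless of its size, and the result does not depend on the file's name or location, which is the property this request needs.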

September 22, 2023

3 hours ago, rbjtech said:

This is another reason why I always suggest using UNC path names

And some sort of drive pooling and then you never have to have this issue.

Still, this is a valid request but it will probably come down to cost/benefit. One of the most common complaints we get is around library scan speed and every feature/computation we add into that mix increases that. Even when we make them optional and put red warnings on them, people still complain that their scan is slow and then we find they've enabled all these options that slow it down.

So, it comes down to this: is the cost on every single scan worth it for a situation that is really only gonna occur < 1% of the time?

September 22, 2023

The hash should only be calculated & used if the modtime & size match, both of which are present in the directory listing on every platform. And OSHash should be even faster than doing an ffprobe on most files since it needs to read even less data.
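To make that concrete, here is a rough Python sketch of the lookup order. It assumes the server already stored a hash per item at ingestion time, indexed by (size, modtime). All names here (`find_moved_item`, `known`, `hash_file`) are hypothetical and not part of any Emby API.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple

# (size in bytes, modification time in nanoseconds)
Key = Tuple[int, int]
# (item id, hash stored when the item was first ingested)
Record = Tuple[str, str]


def find_moved_item(
    new_path: str,
    known: Dict[Key, List[Record]],
    hash_file: Callable[[str], str],
) -> Optional[str]:
    """Return the id of an existing item that new_path appears to be, if any."""
    st = os.stat(new_path)
    candidates = known.get((st.st_size, st.st_mtime_ns), [])
    if not candidates:
        # No size/modtime match: the file is never opened or hashed.
        return None

    new_hash = hash_file(new_path)
    for item_id, stored_hash in candidates:
        if stored_hash == new_hash:
            return item_id
    return None
```

With this ordering the hash is only computed on a rescan when a path is unknown and some existing item has the same size and modtime, so an ordinary scan of an unchanged library does no extra reads.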

September 22, 2023

2 minutes ago, adminExitium said:

The hash should only be calculated & used if the modtime & size match, both of which are present in the directory listing on every platform. And OSHash should be even faster than doing an ffprobe on most files since it needs to read even less data.

The hash has to be calculated on initial ingestion or it will never exist to compare to. And, no matter how much faster it is than any other operation, it is an additional operation.

So, again, valid request, but would need to be very carefully vetted.

September 22, 2023

The suggested approach doesn't have to be followed for the request to remain valid. Basically if the media item can be identified by name and tracked atomically then its location is less relevant and can be decoupled. This may not account for multiple versions but that shouldn't be too much of a challenge to make part of the data. To me the big question is where is most of this time spent now? Is it removing the "old" and re-acquiring the metadata for the "new"? Does it repeat any processing for thumbs, chapters, intro-skip and the like? If so then that is time worth saving if one decides to move media around.

One thing about pooling and UNC paths (or any network shares) is that OSs like Linux lose the ability to do real-time monitoring (RTM).

Edited September 22, 2023 by Q-Droid

September 22, 2023

The only things I can repair after a library rebuild are the playlist.xml files, since I have these under version control. Luckily it's easy to search and replace the <path> elements with my updated paths to fix... However, this brings up another point (in the spirit of this thread): surely the playlists should look up the path names dynamically, for example via an id lookup in the db, rather than requiring hard-coded <path> elements that break when paths change? To me this is another example of a feature that breaks on library rebuilds but shouldn't.

My library is pretty modest compared to many people but for thumbnails, metadata, video thumbnails etc it's been scanning an entire day now and I'll be going to bed soon. I'd honestly be surprised if it's finished in the morning.

I completely sympathize with @ebr in regards to the question of adding another operation to the scanning workflow. I also agree that although complete library rebuilds might be rare on average, the energy and time wasted when one does happen are pretty significant IMO, as mentioned above. If this happens even once per user, then I personally think it's worth implementing or thinking about further. In reality, I suspect it happens more often than that.

Every time an emby user has to rebuild their library unnecessarily, the earth warms by 0.1 degrees....

Thanks for the discussion so far.

September 23, 2023

10 hours ago, embylad892746 said:

The only things I can repair after a library rebuild are the playlist.xml files, since I have these under version control. Luckily it's easy to search and replace the <path> elements with my updated paths to fix... However, this brings up another point (in the spirit of this thread): surely the playlists should look up the path names dynamically, for example via an id lookup in the db, rather than requiring hard-coded <path> elements that break when paths change? To me this is another example of a feature that breaks on library rebuilds but shouldn't.

On the beta at least, playlists have all been moved to db objects now - you can query them like any other item. The external presence of playlists is now a .m3u file - tbh I'm not 100% sure why, as it still contains the path, but also another value - which is not the item id, nor a ProviderId. Strange.

@Luke What is the value in the m3u - 6461 in the example below?

#EXTM3U
#PLAYLIST:PlayList_Public
#EXTINF:6461,2 Fast 2 Furious
file:\\media\Films\2 Fast 2 Furious (2003) [tmdbId=584]\2 Fast 2 Furious (2003) - WEBDL-1080p.mkv

Thanks.

September 23, 2023

10 hours ago, rbjtech said:
On the beta at least, playlists have all been moved to db objects now - you can query them like any other item. The external presence of playlists is now a .m3u file - tbh I'm not 100% sure why, as it still contains the path, but also another value - which is not the item id, nor a ProviderId. Strange.

@Luke What is the value in the m3u - 6461 in the example below?
#EXTM3U
#PLAYLIST:PlayList_Public
#EXTINF:6461,2 Fast 2 Furious
file:\\media\Films\2 Fast 2 Furious (2003) [tmdbId=584]\2 Fast 2 Furious (2003) - WEBDL-1080p.mkv
Thanks.

Is that the runtime in seconds?

September 24, 2023

13 hours ago, Luke said:

Is that the runtime in seconds?

It is - thanks. Strange thing to add to the m3u?
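For reference, the duration-then-title layout of `#EXTINF` is the usual extended M3U convention. A small Python sketch that reads the example above might look like this. The function name `parse_m3u` is made up for illustration, and the sketch assumes whole-second durations as in the sample.

```python
def parse_m3u(text):
    """Collect (seconds, title, location) entries from extended M3U text."""
    entries = []
    pending = None  # EXTINF data waiting for its location line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            duration, _, title = line[len("#EXTINF:"):].partition(",")
            pending = (int(duration), title)
        elif line and not line.startswith("#") and pending is not None:
            entries.append(
                {"seconds": pending[0], "title": pending[1], "location": line}
            )
            pending = None
    return entries


sample = "\n".join([
    "#EXTM3U",
    "#PLAYLIST:PlayList_Public",
    "#EXTINF:6461,2 Fast 2 Furious",
    r"file:\\media\Films\2 Fast 2 Furious (2003) [tmdbId=584]"
    r"\2 Fast 2 Furious (2003) - WEBDL-1080p.mkv",
])
entry = parse_m3u(sample)[0]
hours, remainder = divmod(entry["seconds"], 3600)
minutes, seconds = divmod(remainder, 60)
print(entry["title"], f"{hours}:{minutes:02d}:{seconds:02d}")
```

6461 seconds works out to 1:47:41, which fits a runtime.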

October 3, 2023

In the *arr apps you can move your media so why not in emby?

October 3, 2023

2 hours ago, lorac said:

In the *arr apps you can move your media so why not in emby?

Hi, what exactly are you asking?

October 4, 2023

The ability to move media within Emby / let Emby know the media has been moved w/o triggering a full scan. Using Metadata manager would seem the logical choice.
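As a rough illustration of what "letting Emby know" could mean, the core step is rewriting each stored path from the old library root to the new one. The Python sketch below uses the paths from the first post. `remap_path` is a hypothetical helper, not an existing Emby feature, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def remap_path(stored, old_root, new_root):
    """Rewrite stored so that it lives under new_root instead of old_root."""
    try:
        relative = PurePosixPath(stored).relative_to(old_root)
    except ValueError:
        # The stored path is not under old_root, so leave it untouched.
        return stored
    return str(PurePosixPath(new_root) / relative)


print(remap_path("/root1/movies1/Film (2003)/Film.mkv",
                 "/root1/movies1", "/root/movies"))
```

The comparison is done per path component, so a sibling folder such as /root1/movies10 would not be rewritten by accident the way a plain string replace could do.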

embylad892746 31

rbjtech 5000

embylad892746 31

adminExitium 295

ebr 15672

adminExitium 295

ebr 15672

Q-Droid 881

embylad892746 31

rbjtech 5000

Luke 40115

rbjtech 5000

lorac 106

Luke 40115

lorac 106