
Schedule for lightweight library scan


HawkXP71


HawkXP71

This came out of a discussion I was having on how to optimize CPU usage on my NAS.

Right now, with real-time monitoring, adding a large set of media at one time can really slow down the server.

What if Emby had a "Lightweight Library Scan" option that applied during set hours? I would expect it to apply to all libraries. If enabled, the user would set the hours that use lightweight adds; all other hours would use heavyweight scans.

Heavyweight would follow the current library settings.

Lightweight would disable the high CPU/GPU load steps of the library scan: for instance, no thumbnail (BIF) generation, no image loading (or maybe only the primary image), and no marker analysis for TV shows.

Essentially, in lightweight mode only the metadata analysis is done to add the file to the library, so the user could watch immediately if they wanted, even with limited functionality (no thumbnails or markers).
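
To make the idea concrete, here is a rough sketch of how the hours setting could pick the steps to run. The step names and function names are made up for illustration and are not real Emby settings; the only real logic is the time-window check, which also handles windows that wrap past midnight.

```python
from datetime import time

# Hypothetical step names, for illustration only (not Emby identifiers).
LIGHT_STEPS = ("metadata",)
HEAVY_STEPS = ("metadata", "images", "bif_thumbnails", "intro_markers")


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Wrapping window, e.g. 22:00 to 06:00.
    return now >= start or now < end


def steps_for(now: time, light_start: time, light_end: time) -> tuple:
    """Pick the scan steps to run for media added at `now`."""
    if in_window(now, light_start, light_end):
        return LIGHT_STEPS
    return HEAVY_STEPS
```

For example, with lightweight hours of 18:00 to 23:00, something added at 20:00 would only get the metadata step, and everything else would wait for the heavyweight hours.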

Thoughts?


GrimReaper

Why don't you just disable BIF extraction when media is added and schedule its task together with intro detection during off-hours, like nighttime?


HawkXP71
8 minutes ago, GrimReaper said:

Why don't you just disable BIF extraction when media is added and schedule its task together with intro detection during off-hours, like nighttime?

Usually I do, but sometimes I forget. Hence making it an option, in a similar fashion to how Synology has off hours for its indexing service or its data scrubbing.


GrimReaper
2 minutes ago, HawkXP71 said:

Usually I do, but sometimes I forget.

Not sure what you mean by that, once you configure both of those options to "as scheduled task only" as opposed to "as scheduled task and when media is added", what's there to forget? 


HawkXP71
2 minutes ago, GrimReaper said:

Not sure what you mean by that, once you configure both of those options to "as scheduled task only" as opposed to "as scheduled task and when media is added", what's there to forget? 

I'm saying that most times I want it done immediately, since I do most of my media stuff at night.

So I will set it to "as scheduled task only" and then later turn it back to "as scheduled task and when media is added".


GrimReaper

Tbh, I personally don't see any benefit, only another layer of complexity and more troubleshooting issues, as everything requested can already be achieved: keep it on schedule only and, when desired, run the task manually. It's 3-4 clicks away, with no need to constantly mess with library options. But the devs might like the idea, one never knows.


rbjtech

So, a couple of options here, as we discussed:

a) Move to a holding area first. If you wanted to, you could even make this a 'bare bones' Emby library (no BIFs, no metadata, etc.) that you can play from if you like. At night, move the files to the real library, where it then generates all the bells and whistles. Anything watched remains watched, as that is tracked by its provider ID.

b) Keep the BIFs etc. on scheduled only. If you want to force it, just refresh metadata when all the files are available.

c) If you are handy with scripting, use the API to force updates etc. when the times suit you. For example, check whether users are streaming and, if the server is idle, start processing.

I do like the idea of 'holding processing' while users are streaming as an option. I think that could be a good thing, especially if you are adding a large amount of new items when the server is loaded.
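
For option c), a minimal sketch of the "only process when nobody is streaming" check might look like the following. Treat the endpoint paths (`/emby/Sessions`, `/emby/ScheduledTasks/Running/{id}`), the `api_key` query parameter and the `NowPlayingItem` field as assumptions to verify against your own server's API documentation; the function names are made up for this example.

```python
import json
import urllib.request


def is_server_idle(sessions: list) -> bool:
    """Idle means no session reports something currently playing."""
    return not any(s.get("NowPlayingItem") for s in sessions)


def fetch_sessions(base_url: str, api_key: str) -> list:
    """Fetch active sessions (assumed endpoint: GET /emby/Sessions)."""
    url = f"{base_url}/emby/Sessions?api_key={api_key}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def start_task(base_url: str, api_key: str, task_id: str) -> None:
    """Start a scheduled task now (assumed endpoint: POST /emby/ScheduledTasks/Running/{id})."""
    url = f"{base_url}/emby/ScheduledTasks/Running/{task_id}?api_key={api_key}"
    req = urllib.request.Request(url, data=b"", method="POST")
    urllib.request.urlopen(req, timeout=10).close()


def run_if_idle(base_url: str, api_key: str, task_id: str) -> bool:
    """Start the task only when nobody is streaming; returns True if started."""
    if is_server_idle(fetch_sessions(base_url, api_key)):
        start_task(base_url, api_key, task_id)
        return True
    return False
```

Run `run_if_idle(...)` from cron or a NAS scheduled job every few minutes, with the task ID of the thumbnail or intro detection task, and it will only kick off when the server is quiet.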


Luke

The normal scan is intended to be the lightweight scan, depending on how you configure library options.


HawkXP71
On 3/1/2023 at 11:17 AM, Luke said:

The normal scan is intended to be the lightweight scan, depending on how you configure library options.

I have seen this problem on a bunch of unofficial Emby pages (Facebook and Reddit): "I added my library and it's taking forever to show up" or "I added a bunch of files and it's taking forever to show up".

My thought was that if the library scan were lightweight during the scheduled hours, even if by default it was a rather heavyweight scan, people wouldn't appear to be waiting so long for the library to be populated.

When I had to rebuild my TV library while trying to debug the missing "Missing Episodes" for South Park, it took over 6 hours for the rescan to show all the episodes.

My recommendation: if a schedule-based "heavy vs lightweight" mode is not an option, what about changing the order of the scan?

From my experience, the real delay appears to be the thumbnail generation as well as the marker analysis for TV episodes.

It looks like the flow is as follows:

  1. Find the changed/new files based on the changed directories
  2. Determine the metadata
  3. For TV Libraries
    • Group episodes into Seasons and add the show and Season 
  4. For Movie Libraries
    • Add a subset (possibly all the newly found movies, but not always) of the movies
  5. For each new file (in the current season for TV libraries, the current subset for movies)
    • Create each BIF file
    • Find images for the media file
    • Analyze markers
  6. For TV Libraries
    • Create the next season and repeat Step 5
  7. For Movies
    • Add the next subset and repeat Step 5

The problem with this flow is that it appears to take forever to add very large libraries.

My recommendation, take it for what it's worth: changing the flow to the following would give the appearance of Emby working much faster:

  1. Find the changed/new files based on the changed directories
  2. Determine the metadata
  3. Add all movies, shows and episodes into the database, with no images
    • Including show/season setup
  4. Find thumbnail images for newly added items
  5. For each newly added file
    • Find all images
  6. For each newly added file
    • Create BIF File and Markers

By doing it this way, the episodes and movies will show up quickly. 
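
To illustrate the difference, here is a toy comparison of the two orderings. It is a simplification of both flows (each item is one unit, and the step names are made up), not a description of how Emby's scanner is actually implemented. It just counts how many steps have to run before every item is visible in the library.

```python
def per_item_order(items, heavy_steps):
    """Finish all heavy steps for one item before adding the next item."""
    order = []
    for item in items:
        order.append((item, "add_to_library"))
        order.extend((item, step) for step in heavy_steps)
    return order


def phased_order(items, heavy_steps):
    """Add every item first, then run each heavy step across all items."""
    order = [(item, "add_to_library") for item in items]
    for step in heavy_steps:
        order.extend((item, step) for item in items)
    return order


def steps_until_all_visible(order, items):
    """Count steps executed before every item has been added to the library."""
    seen = set()
    for count, (item, step) in enumerate(order, start=1):
        if step == "add_to_library":
            seen.add(item)
            if len(seen) == len(items):
                return count
    return None
```

Both orderings do exactly the same total amount of work. With 3 episodes and 3 heavy steps each, the per-item order needs 9 steps before the last episode shows up, while the phased order needs 3.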

If a user clicks a movie/episode that doesn't have its images yet, that media item should get moved to the front of the queue.
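
The "move to the front of the queue" part could be sketched as a small priority queue. This is only an illustration using Python's `heapq`; the class and method names are hypothetical and say nothing about how Emby queues its work internally.

```python
import heapq
import itertools


class HeavyTaskQueue:
    """Pending items for heavy processing; lower priority number runs first."""

    def __init__(self):
        self._heap = []
        self._entries = {}  # item_id -> its live heap entry
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def add(self, item_id, priority=1):
        old = self._entries.get(item_id)
        if old is not None:
            old[3] = False  # lazily invalidate the stale entry
        entry = [priority, next(self._counter), item_id, True]
        self._entries[item_id] = entry
        heapq.heappush(self._heap, entry)

    def bump(self, item_id):
        """Move a pending item to the front, e.g. when a user opens it."""
        if item_id in self._entries:
            self.add(item_id, priority=0)

    def pop(self):
        """Return the next item to process, or None when nothing is pending."""
        while self._heap:
            _, _, item_id, valid = heapq.heappop(self._heap)
            if valid:
                del self._entries[item_id]
                return item_id
        return None
```

So if "a", "b" and "c" are queued in that order and a user clicks "c", the worker would process "c" next and then carry on with "a" and "b".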

I do think it would really improve the perceived runtime of library scanning.



 


Luke
Quote

thumbnail generation as well as the markers analysis for TV episodes.

Right, so configure these options to only run during the scheduled task and then you will have your faster library scan.

I understand that you're resisting this idea a little bit, but adding another scan variation would do more harm than good in terms of the confusion that it would cause.


HawkXP71
28 minutes ago, Luke said:

Right, so configure these options to only run during the scheduled task and then you will have your faster library scan.

I understand that you're resisting this idea a little bit, but adding another scan variation would do more harm than good in terms of the confusion that it would cause.

I get your point. I just see this as a pretty common issue/question people have. I know why it's slow; I'm just saying many don't. They don't realize the ramifications of the library setup options for the time spent scanning a library.


Luke
Quote

They don't realize the ramifications of the library setup options for the time spent scanning a library.

Yes I agree, even when we mention it in the help text.


HawkXP71
3 minutes ago, Luke said:

Yes I agree, even when we mention it in the help text.

I'm not really suggesting it be changed to be faster, but rather that it will appear faster.


rbjtech

Trying to do these tasks sequentially just leads to the perception that the library scan is slow. Emby needs to change that, imo.

I personally agree with the view that this is a poor way to do the initial / large addition scan. Having rebuilt Emby MANY times, I do it manually the way suggested above.

It makes logical sense to just stack the 'processes' in order of importance, regardless of the options set. Maybe have a simple 'initial scan' flag on each library?

1. Scan the files with minimal metadata to completion. Files will now 'show' fast, giving the user the ability to USE Emby or see progress.

2. Scan the library AGAIN, effectively using the 'Missing metadata' option, and it now creates the BIF files (assuming these options have been set, of course).

3. HOLD the core intro detection option until both of the above are complete.

Reporting of the progress is also very poor. Short of tailing the log file, people just see maybe the % bar on the library (which is usually totally incorrect, jumping to 90+% and then staying there for hours). Reporting accurate progress might go some way to alleviating the perception that Emby is 'slow', when in fact it is not.
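
As a rough sketch of what more accurate reporting could look like, the progress could be tracked per phase and the overall figure weighted by how expensive each phase is. The phase names and weights below are invented for the example; real weights would have to come from measurements.

```python
def overall_progress(phases):
    """phases: list of (name, done, total, weight); returns a fraction from 0 to 1."""
    active = [(done, total, weight) for _, done, total, weight in phases if total > 0]
    total_weight = sum(weight for _, _, weight in active)
    if total_weight == 0:
        return 1.0  # nothing to do counts as finished
    done_weight = sum(weight * done / total for done, total, weight in active)
    return done_weight / total_weight


def progress_report(phases):
    """One line per phase plus a weighted overall percentage."""
    lines = [f"{name}: {done}/{total}" for name, done, total, _ in phases]
    lines.append(f"overall: {overall_progress(phases):.0%}")
    return lines
```

With metadata finished for 100 files (weight 1), BIFs done for 10 of 100 (weight 8) and intro detection not started (weight 1), this reports 18% overall instead of a bar stuck near 90%, and the per-phase lines show where the time is actually going.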


HawkXP71
8 hours ago, rbjtech said:

Trying to do these tasks sequentially just leads to the perception that the library scan is slow. Emby needs to change that, imo.

I personally agree with the view that this is a poor way to do the initial / large addition scan. Having rebuilt Emby MANY times, I do it manually the way suggested above.

It makes logical sense to just stack the 'processes' in order of importance, regardless of the options set. Maybe have a simple 'initial scan' flag on each library?

1. Scan the files with minimal metadata to completion. Files will now 'show' fast, giving the user the ability to USE Emby or see progress.

2. Scan the library AGAIN, effectively using the 'Missing metadata' option, and it now creates the BIF files (assuming these options have been set, of course).

3. HOLD the core intro detection option until both of the above are complete.

Reporting of the progress is also very poor. Short of tailing the log file, people just see maybe the % bar on the library (which is usually totally incorrect, jumping to 90+% and then staying there for hours). Reporting accurate progress might go some way to alleviating the perception that Emby is 'slow', when in fact it is not.

Exactly. It's 100% about perception. Also, it's a matter of what the user needs vs what is possible.

They need all the metadata downloaded, analyzed, sorted, etc. quickly.

They don't need, nor could they even use, all the BIF files at once. I would 100% sacrifice thumbnails while watching the first 2 or 3 episodes/movies while they were being generated in the background. Same for episode markers.



 

