
Refresh people needs way longer in new release


horstepipe
Solved by softworkz


What would you like me to say? The people refresh checks every XML file to see if it has changed, and then it retries people with empty metadata that haven't been retried for at least 30 days. If you have 50,000 or 100,000 people in your database, this is going to take time. So probably what's different from before is that more people are "up for retry" than in the past. We might just have to go in the direction of not downloading any metadata until you actually click on the person. Then your people refresh will be very fast, because there won't be one at all. Or maybe we can just stop retrying people with empty metadata, but then of course that brings out another crowd of users complaining that their people have no biographies. So I guess you tell us what you would like, but don't say that you'd like 100,000 MovieDb queries handled in 5 minutes, because it just doesn't work that way.

 

One of the problems lies in the fact that the current implementation is quantizing people updates (or rather metadata updates in general). This means that, over time, the update due times for people items become aligned to certain dates. That behaviour is caused by several things. The 30-day metadata cache deletion task is one example here.

 

I think I should explain that a little more:

  • Think of a library that is continuously growing during Month N:
      • Every day, 1000 items are added
  • If the refresh interval were 30 days, then in Month N+1:
      • Every day, only 1000 items would be refreshed

This way, the load for refreshing is distributed across daily library scans.

But this situation can change easily for a number of reasons.

 

Example:

  • You perform a forced metadata refresh on a complete library
  • Now all items have the same date (of last refresh)
  • When the 30-day refresh interval has elapsed, Emby would try to update 30,000 (30 days x 1000) items at once

This could explain what happened to the OP: why people updates may have been quick and then suddenly became slow...
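
To make the arithmetic concrete, here is a minimal Python sketch of the two cases. It is only a toy model of the scenario described above, not Emby code; the names `due_counts`, `staggered` and `aligned` are made up for illustration, and the numbers (30 days, 1000 items per day) are the ones from the example.

```python
from collections import Counter

REFRESH_INTERVAL_DAYS = 30   # interval used in the example above
ITEMS_PER_DAY = 1000         # growth rate used in the example above


def due_counts(last_refreshed_days):
    """Count how many items come due on each day, given each item's last-refresh day."""
    return Counter(day + REFRESH_INTERVAL_DAYS for day in last_refreshed_days)


# Case 1: items were added (and first refreshed) evenly over days 0..29.
staggered = [day for day in range(30) for _ in range(ITEMS_PER_DAY)]

# Case 2: a forced refresh on day 29 stamps every item with the same date.
aligned = [29] * len(staggered)

staggered_due = due_counts(staggered)
aligned_due = due_counts(aligned)

print(max(staggered_due.values()))  # 1000 items on the busiest day
print(max(aligned_due.values()))    # 30000 items, all on the same day
```

The total amount of work is the same in both cases; only its distribution over the daily scans changes.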

Link to comment
Share on other sites

The previous scenario is not specific to people. It applies to all cached metadata. Once the refresh intervals are aligned, library scan times can suddenly increase from low or moderate to extremely high durations. We should probably find a way to distribute refresh operations over several days.
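
One possible way to distribute interval-based refreshes is to add a stable per-item offset to the due date. The Python sketch below is just an illustration of that idea under my own assumptions: `JITTER_DAYS`, `jitter_days` and `next_due_day` are hypothetical names and do not exist in Emby.

```python
import hashlib
from collections import Counter

REFRESH_INTERVAL_DAYS = 30
JITTER_DAYS = 15  # hypothetical width of the spread window


def jitter_days(item_id: str) -> int:
    """Derive a stable offset in [0, JITTER_DAYS) from the item id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % JITTER_DAYS


def next_due_day(item_id: str, last_refreshed_day: int) -> int:
    """Due day = last refresh + interval + per-item jitter."""
    return last_refreshed_day + REFRESH_INTERVAL_DAYS + jitter_days(item_id)


# 30,000 items that were all force-refreshed on day 0.
per_day = Counter(next_due_day(f"item-{i}", 0) for i in range(30_000))

print(len(per_day))           # number of distinct due days
print(max(per_day.values()))  # load on the busiest day
```

Because the offset is derived from the item id, an item keeps the same offset on every run, so a forced refresh of the whole library no longer puts all due dates on a single day.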

Link to comment
Share on other sites

No, that's not true. There is no interval-based refresh for any other content type. For everything else it's just the initial refresh, plus manual ones afterwards.

Link to comment
Share on other sites

horstepipe


Understood

But do you agree that if I force a refresh of all metadata and then run the Refresh People task twice in immediate succession, and BOTH runs take that long, the problem has to be somewhere else?

I'll try it tomorrow...

Link to comment
Share on other sites

No, you only make it to that stage if the library item itself is being triggered for a full refresh.

Link to comment
Share on other sites

Yes, and PeopleValidator.ValidatePeople performs a full refresh if the local item is older than 30 days.

Link to comment
Share on other sites

Teddy

Luke, sorry, but I'm confused by this topic. Please just tell me whether the behavior from now on will be as follows:
I run a manual Refresh People and it takes 2 hours to finish (in my case).
Immediately afterwards I run a second manual Refresh People and it takes 2 hours to finish.
Immediately afterwards I run a third manual Refresh People and it takes 2 hours to finish... and so on.
(Previously it took about 12 minutes to finish, so I think it made use of some sort of cache.)

Link to comment
Share on other sites

Understood

But do you agree that if I force a refresh of all metadata and then run the Refresh People task twice in immediate succession, and BOTH runs take that long, the problem has to be somewhere else?

I'll try it tomorrow...

 

Yes, you are right. What I described does not explain why two subsequent runs of the "Refresh People" task can both take the same time.

 

I investigated and debugged this a little further and found another (different) problem:

 

It's actually not the identified people that are causing the slowdown of the process - it's the unidentified people that don't have a local cache file because they were never identified.

Since they were never identified, there is no existing cache file from which Emby could determine the time of the last refresh, and the BaseItem.DateLastRefreshed property isn't written in this case either.

As a result, the provider is queried for all unidentified persons again on each run of the "Refresh People" task, no matter how often you execute it on a single day.

 

 

IMO - to avoid this, we would need to set BaseItem.DateLastRefreshed even on unsuccessful refresh operations (or rather: unsuccessful identify operations):

 

Summary: Currently we are minimizing remote provider calls via local caching and bulk downloading (for Series). But we didn't implement a limit for metadata queries on unidentified items. Emby attempts to identify these over and over again, every time a "Refresh People" task is run...
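
The idea of stamping the refresh date even when identification fails can be sketched as follows. This is a simplified Python model, not the actual Emby implementation: `Person`, `needs_refresh`, `refresh_person` and the `lookup` callback are hypothetical stand-ins, and only the 30-day retry interval and the `DateLastRefreshed` concept come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

RETRY_INTERVAL = timedelta(days=30)  # retry interval mentioned in this thread


@dataclass
class Person:
    name: str
    provider_id: Optional[str] = None                # stays None while unidentified
    date_last_refreshed: Optional[datetime] = None   # models BaseItem.DateLastRefreshed


def needs_refresh(person: Person, now: datetime) -> bool:
    """A person is due if never attempted, or if the last attempt is old enough."""
    if person.date_last_refreshed is None:
        return True
    return now - person.date_last_refreshed >= RETRY_INTERVAL


def refresh_person(person: Person,
                   lookup: Callable[[str], Optional[str]],
                   now: datetime) -> bool:
    """Query the provider only when due. Returns True if a query was made."""
    if not needs_refresh(person, now):
        return False
    person.provider_id = lookup(person.name)  # may return None (not identified)
    person.date_last_refreshed = now          # stamped even when the lookup failed
    return True


# Demo with a hypothetical provider that never finds anyone.
queries = []

def never_found(name: str) -> Optional[str]:
    queries.append(name)
    return None

extra = Person("Unknown Extra")
start = datetime(2016, 8, 1)
refresh_person(extra, never_found, start)
refresh_person(extra, never_found, start + timedelta(hours=2))
print(len(queries))  # 1: the second run on the same day does not hit the provider
```

With this kind of bookkeeping, a second "Refresh People" run on the same day would skip the unidentified people instead of querying the provider for each of them again.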

Edited by softworkz
Link to comment
Share on other sites

horstepipe

But why does this behavior only occur on newer releases, if, as Luke says, nothing has been changed here?

Link to comment
Share on other sites

I don't know which "older" release you are comparing to which "current" release.

 

On 2016-01-23 there was some throttling introduced for MovieDb person queries in the code. That's one possible reason.

I already explained another possible reason for such a change in my previous posts (alignment of refresh dates).
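
To illustrate why throttling matters for a long list of queries, here is a small Python sketch. It does not reproduce the actual throttling code in Emby; the `Throttle` class, `estimated_hours`, and the numbers used (10,000 queries, 1 second minimum spacing) are assumptions chosen only for illustration.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def estimated_hours(query_count, min_interval_seconds):
    """Lower bound on run time: every query after the first waits one interval."""
    return max(query_count - 1, 0) * min_interval_seconds / 3600


# Hypothetical numbers: 10,000 throttled queries spaced 1 second apart.
print(round(estimated_hours(10_000, 1.0), 2))  # about 2.78 hours
```

Once every query has to wait for the throttle, the run time grows linearly with the number of people that are queried, regardless of how fast the connection is.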

Link to comment
Share on other sites

You know, I haven't run the people task in... a year plus at least. The reason is, Emby will fetch people on-demand as well, if they aren't present. So you can use the app, browse around and it will fill these in as needed. There isn't really a need to run the task every day; once a month would be sufficient.

Just thought I would throw that in, as it appears the task is being depended on too heavily, when instead you can use an app and browse to the new movies, and then the on-demand task can do its job too.

Link to comment
Share on other sites


 

I totally agree and it's probably a better direction for us to start moving in.

Link to comment
Share on other sites

I totally agree and it's probably a better direction for us to start moving in.

For example, go to the "cast & crew" button on the Roku. It then loads empty white images. This is when I know: be patient... wait. If all people are missing, and there are maybe 20 total, it might take a minute to fill them all in with images.

This is acceptable to me. If most are there but two or three people are missing, it might take 20 seconds until all their images pop in. This is actually preferable to me, since it avoids fetching people you may never navigate to in an app. This means not having to download images that aren't useful and are just taking up space for no reason. Probably 80% of people downloaded are never used. In practice, that means on-demand is the best approach to alleviate this problem.

There should still be a task for users with limited internet; think satellite users. These plans usually have a free bandwidth window, midnight to five am or similar. The task can be scheduled to fetch during this "free internet" time.

So having an option to still run the task manually would be preferred, for that one-size-fits-all approach. I prefer on-demand updates, some prefer the people task, and some others use radeons people fetcher plugin. To each his own. Having this myriad of ways to obtain the data is one of the strong points of Emby.

The best approach is to have the people task tell users, in some text, that people are also updated on-demand when apps request missing images.

Edited by speechles
Link to comment
Share on other sites

dcook

I am running: Version 3.0.5972.0

 

I have the Refresh People task running once a day, it has always been around 12 minutes

Link to comment
Share on other sites

I am running: Version 3.0.5972.0

 

I have the Refresh People task running once a day, it has always been around 12 minutes

Perhaps you rarely add new items? Have a relatively small library? Use radeons people downloader? Why do you want people downloaded that you may never utilize? Why would you want that? I can see why those on limited internet may. Do you have limited internet?

The main point was for people whose task takes a ludicrous amount of time to complete to use the on-demand task instead. Use an Emby app, then browse to the new items and check out their cast and crew. That's all it takes. It doesn't require spending 15 hours scanning all their people over again.

Link to comment
Share on other sites

Guest asrequested


Great idea!

 

Also take a look at this option

 

[Attached screenshot: 57a2a87fb3294_Snapshot_199.jpg]

Edited by Doofus
Link to comment
Share on other sites

I totally agree and it's probably a better direction for us to start moving in.

 

While I agree to this as well, we should still find a way to limit repeated identification of items.

 

@Luke: Do you need more information about this scenario?

Link to comment
Share on other sites

dcook

I have just over 18TB of movies and tv

I add content almost every day

I have unlimited internet with 130MB download and 10MB upload

I didn't set up the scheduled task for Refresh People to run daily; I believe that was the default setting when I installed Emby

The only task I changed was the Library Scan, which I set to run more often; all the other tasks, such as Refresh People and Clean Database, are at their defaults.

 

I was merely pointing out that with my version of Emby, the Refresh People task has always taken about 12 minutes

 

 

 

 


Link to comment
Share on other sites

horstepipe

I am running: Version 3.0.5972.0

 

I have the Refresh People task running once a day, it has always been around 12 minutes

I bet 10 bucks that if you update to 3.0.6xxx you'll have the same problem.

I didn't have the problem on 3.0.5xxx either and I still feel confident that the problem only occurs on newer versions.

Edited by horstepipe
Link to comment
Share on other sites

horstepipe

I just downgraded to v3.0.5986, and here the refresh is quick as usual.

I'll upload both log files when the refresh is complete

Link to comment
Share on other sites

horstepipe

v3.0.5986 just needed 20 minutes

v3.0.6020 needed 16 hours.

 

Here are the log files

 

 

edit: problem persists on 3.0.6030

Edited by horstepipe
Link to comment
Share on other sites

shorty1483

I can't provide logs like horstepipe, but my comparison is similar: about 5 minutes before versus 1 h 15 min now. And I have run the task automatically twice a day since I set up Emby Server.

Edited by shorty1483
Link to comment
Share on other sites
