
Increase thumbnail extraction concurrency?



ServerGuy
Posted

Is there a way to increase the number of files that thumbnails are extracted from concurrently?

 

I modified the thumbnail extraction interval on one library but can see it's only processing one file at a time. After 24 hours it has made it to the letter E. I have another, larger library that I will be modifying the thumbnail interval on next, but I would like to increase the number of files processed concurrently before doing so. At times I see two files being processed, but that only seems to be a small overlap at the end of one file and the beginning of another.

 

My overall server resource utilization is very low and I am fine with a bump in resource utilization by increasing concurrent extraction.

 

With the single file extraction running my server is only running around the following utilization:

 

Less than 13% CPU Usage

Less than 3% Active Disk time

Less than 0.02 Disk Queue on the drive storing media files

Memory usage is negligible 

 

This is a server class setup with no graphics card.

  • 1 year later...
Posted (edited)

ServerGuy, you are right, it's such a waste of time.

All day long it's creating BIFs, one file after another, for every movie and every TV show with dozens of episodes. The server is almost idle ...

This process only works for movie collections that have grown over years, not for creating everything for 1000 files at once.

Edited by speedy78
rbjtech
Posted (edited)

Exactly that - it's designed to be a lightweight process that is used to create thumbnails after adding media.  It was never designed to be a multi-threaded batch processor.

If you do wish to have multiple streams going - then there is nothing stopping you running multiple portable emby instances creating .BIF's on multiple libraries at the same time - assuming you store the .bif with the media.

You'll need to run a final scan when they are all done so your 'master' emby picks them all up - but that takes a fraction of the time vs the initial creation.

Edited by rbjtech
Posted (edited)

Thanks for the hint, but every coder should have thought about the case where somebody switches from some other software to Emby with hundreds of movies, and then it takes a lot of processing time. At least days or a week (even with a fast server) for the BIFs is ridiculous!

It's not that difficult to add some multiprocessing, at least one process per library. A setting would be possible too, so users can test whether 1, 2, 3, 4 or 5 is the maximum for their server. It makes a difference whether I have DVDs with 4 GB files or Blu-rays with 40 GB.

I have Emby on Synology, where it's not possible to run multiple instances, and I won't set up multiple Windows installations just to get the BIF files in less than a week. That's not the right way. It's a software matter: software should save my time, not exhaust it with absurd work. That's why we use software ...

Edited by speedy78
Posted

I've never used this myself, but maybe it will help?

 

Posted

Thank you, I saw similar threads, but I don't want to try third-party software. In the end there are problems with Emby and I have to start all over.

Happy2Play
Posted (edited)

I don't see this changing, as it is a one-time process that only affects initial/new installs.  And if you are saving them with the media it is a one-time process, so you never have to redo it unless you replace the media.

But if this is added then users will complain "why is my Emby so slow or not working".  There is zero chance of pleasing everyone.  And this will create more issues than it is worth on low-powered devices.  But 2-3 days to do 30+TB works for me on my PC, as other usage is a factor.

Edited by Happy2Play
  • Like 2
Posted

There are more cases to think about than the usual obvious ones ...

They won't complain if there is a setting for it - everybody can decide between some hours at full power or waiting a week.
If I don't change the setting, it's no different from the current default.
Luke is more intelligent than a lot of people; he wouldn't just build in a stupid rule for every server to run 5 at the same time, either.

Happy2Play
Posted (edited)

But the user who sets it to full power will still complain, "why does my Emby not work" or "why is it very slow".  We already have that right now with one process on some setups.

Edited by Happy2Play
Posted

There is zero chance of pleasing everyone, including the stupidest users 🙂

It's more stupid to have multiple CPUs/cores, multithreading and GPU support and not use them for cases like this.

Happy2Play
Posted

And there is that home user who runs an Emby Server on a Pi4.😁

  • Like 1
rbjtech
Posted (edited)

If you look at the actual process in detail - you will see that it is not the cpu/core that is the bottleneck - it is the disk I/O.

ffmpeg basically grabs a frame every 10 seconds, resizes it to a thumbnail ('easy' in performance terms) and then writes to a temporary file.  Do that for a 2hr movie and you have just created 720 image files.  The bif generator now puts all the files into one .bif file.  The I/O of writing those 720 image files is 'expensive' in terms of holding up other processes - especially if you have a single physical HDD.
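The 720 figure follows directly from the interval. A minimal sketch of the arithmetic (the function name is hypothetical, and the 10-second default is simply the interval quoted in this post):

```python
def thumbnail_count(duration_seconds: int, interval_seconds: int = 10) -> int:
    """Number of frames grabbed when one frame is taken every interval_seconds."""
    return duration_seconds // interval_seconds


two_hour_movie = 2 * 60 * 60  # 7200 seconds
print(thumbnail_count(two_hour_movie))  # 720
```

Halving the interval doubles the number of temporary image files written per title, which is why changing the interval on a whole library triggers so much work.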

re the GPU argument - There is little point trying to offload the 'scaling' to a GPU as what it is doing is 'easy' and you will spend more time loading it into the GPU than actually doing the work.

Could somebody write a batch bif generator to do the same using multiple threads, using maybe an SSD or better still doing it in memory - yes they could, and have - see my initial response and again just now by @roaku - but doing so within Emby is dangerous and unpredictable, because while you add maybe a new TV series to your system, you will/could kill the streaming I/O in the background.

 

Edited by rbjtech
  • Like 2
Posted
19 minutes ago, rbjtech said:

you will/could kill the streaming I/O in the background.

This. Emby is a personal media server. The number one job it cannot fail at is streaming media. If generating the BIF were to put a crutch on streaming it would mean low power devices would not be ideal for Emby server. To keep the "ideal" at the right spot it must not impact the streaming side of Emby. I agree that having Emby throw everything at generating BIF would not be in the best interest of Emby and users of Emby.

  • Like 1
Posted

@rbjtech

In my case the hard disk only does a few MB/s (watching the Synology status), most of the time only KB/s, while generating BIFs for hundreds of files.
I never saw it using xx MB/s or more while only generating BIFs. It's nowhere near the maximum; it's at minimum usage.

 

@speechles

At least on the first day, Emby isn't usable for big collections or while migrating the movie system, because it not only has to generate BIFs, it also downloads / generates a lot of images.

So it doesn't matter whether the system is slow or not, I can't really watch anything anyway. In the end it's all a matter of settings that nobody has to think about unless they want to. It's a user decision; users are not all stupid, they are able to make decisions and test the possible maximum. It makes no sense that Emby uses only x% of what my system is capable of.

Posted
On 3/4/2021 at 9:01 AM, speedy78 said:

In my case the hard disc uses only a few little MB/s - watching synology status  - most time only KB - while generating BIFs for hundreds of files.
Never saw using xx MB or more, while only generating BIF. Its not nearly on maximum, is on minimum usage.

Then you can install a couple portable versions of Emby.  Setup one for Movies and one for TV Shows and let them both work at the same time.  You could setup one portable version of Emby for each Library you have and let them all run assuming this doesn't cause you an IO issue.

So there is a way to pretty easily speed up BIF generation if you really need to do that.

  • Like 1
Posted

I can, but I don't want to spend extra hours installing a couple of portable versions, setting everything up correctly, splitting the libraries and so on, and maybe getting other problems.

One for movies and one for TV shows doesn't help much; it would still take days.

I would have to split A-D / E-.. and so on - not an easy, fast way.
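For what it's worth, working out the A-D / E-.. style ranges does not have to be done by hand. A minimal sketch, assuming the titles are just top-level folder names (the function name and sample titles are made up for illustration, and this only computes the groups; it does not create any Emby libraries):

```python
def split_contiguous(names: list[str], groups: int) -> list[list[str]]:
    """Split names into `groups` alphabetical, contiguous, near-equal chunks."""
    ordered = sorted(names, key=str.casefold)  # case-insensitive A-Z order
    base, extra = divmod(len(ordered), groups)
    chunks, start = [], 0
    for g in range(groups):
        size = base + (1 if g < extra else 0)  # first `extra` chunks get one more
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


titles = ["Dune", "alien", "Brazil", "Casablanca", "Eraserhead"]
for chunk in split_contiguous(titles, 2):
    print(chunk[0], "..", chunk[-1], f"({len(chunk)} titles)")
```

This balances by title count only. Balancing by file size or runtime would be closer to equal work per group, but needs more than the folder names.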

Posted

Well, I can install a portable version, configure it, shut it down and copy it 4 or 5 times, changing the system.xml of each copy to use a different port.

Then fire up each one, create one library in it, and have it set to create BIFs while scanning.

I'm sure I could set up 20 libs this way in under an hour.

Posted

Thank you, that's all well and good, but I use software like Emby to save time, not to play around with one-off test setups or situations I run into once every 3-4 years.

Software should do the work, not me. I am not the only one with such problems.

You are a supporter, you have probably done this many times in your spare time.

In my case I did a lot of search & replace to make things consistent, plus some manual editing for corrections that can't be automated, and regenerating the images / BIFs was the fastest way for me personally. The rest is server time, not my wasted time, but annoying anyway.

  • 10 months later...
player8472
Posted
On 3/5/2021 at 9:24 PM, cayars said:

Well I can install a portable version and config it shut down and copy it 4 or 5 times changing the system.xml for a different port.

Then fire up each one with and create one library and have it set to create bif while scanning.

I'm sure I could setup 20 libs this way in under an hour.

Wouldn't they have an increased risk of creating bifs for the same item?
As far as I can see, it starts the process writing into emby's cache from ffmpeg and only creates the bif afterwards (which is actually quite fast). So the other instances wouldn't realize that there is already a .bif in creation.
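To illustrate the kind of coordination that would avoid this: a minimal sketch of a claim file created atomically next to the media file. This is not something Emby does as far as this thread shows; the `.biflock` suffix and the function name are invented for the example, and exclusive create may not be reliable on every network share.

```python
import os
from pathlib import Path


def try_claim(media: Path) -> bool:
    """Return True if this worker claimed `media`, False if another already did."""
    lock = media.with_suffix(media.suffix + ".biflock")  # hypothetical marker file
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # worker can ever succeed in creating the marker.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

A worker would call `try_claim()` before starting ffmpeg on a file and skip the file when it returns False. Stale markers left by a crashed worker would have to be cleaned up separately.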

I took a look into the command actually executed by emby:

/bin/ffmpeg -f matroska -threads 1 -skip_interval 10 -copyts -i file:<PATH TO MEDIA FILE> -an -sn -vf scale=w=320:h=180 -vsync cfr -r 0.1 -f image2 /config/cache/temp/3317ad5d772e43ec854d41f7ccb33ce3/img_%05d.jpg

Wouldn't changing the "-threads 1" already make the process faster?
It would be nice to get a few switches for an advanced configuration.
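For experimenting with concurrency outside Emby, a minimal sketch that runs several single-threaded extractions side by side. It assumes a stock `ffmpeg` on the PATH and uses the standard `fps` and `scale` filters instead of the `-skip_interval` option from the logged command, which may not exist in a stock build. All function names and the output layout are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(media: Path, out_dir: Path, interval: int = 10, width: int = 320) -> list[str]:
    """Stock-ffmpeg command writing one JPEG every `interval` seconds."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-threads", "1",              # one thread per file; parallelism comes from the pool
        "-i", str(media),
        "-an", "-sn",                 # ignore audio and subtitle streams
        "-vf", f"fps=1/{interval},scale=w={width}:h=-2",
        str(out_dir / "img_%05d.jpg"),
    ]


def extract(media: Path, work_root: Path, runner=subprocess.run) -> Path:
    """Extract thumbnails for one file into its own directory under work_root."""
    out_dir = work_root / media.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    runner(build_command(media, out_dir), check=True)
    return out_dir


def extract_all(files, work_root: Path, workers: int = 2, runner=subprocess.run) -> list[Path]:
    """Process `files` with at most `workers` ffmpeg processes running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: extract(f, work_root, runner), files))
```

Calling `extract_all(sorted(Path("/media/movies").glob("*.mkv")), Path("/tmp/thumbs"), workers=3)` would keep three extractions running (both paths are placeholders). Raising `workers` step by step while watching disk and CPU load is one way to find the point where streaming starts to suffer.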
 

rbjtech
Posted
6 hours ago, player8472 said:

Wouldn't they have an increased risk of creating bifs for the same item?
As far as can see, it starts the process writing into emby's cache from ffmpeg and only creates the bif afterwards (which is actually quite fast). So the other instances wouldn't realize that there is already a .bif in creation

I took a look into the command actually executed by emby:

/bin/ffmpeg -f matroska -threads 1 -skip_interval 10 -copyts -i file:<PATH TO MEDIA FILE> -an -sn -vf scale=w=320:h=180 -vsync cfr -r 0.1 -f image2 /config/cache/temp/3317ad5d772e43ec854d41f7ccb33ce3/img_%05d.jpg

wouldn't changing the "threads 1" already make the process faster?
Would be nice to get a few switches for an advanced configuration.
 

This is just the ffmpeg extract part - there is also the bif creation which is not logged.

A search of this forum will lead to many ways to make this happen in parallel - I've even written a couple of scripts myself to create HDR>SDR BIF files - it's really not difficult. 

Details of how to do it are on Roku's site - as they are the originators of the BIF image container.
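As a rough illustration of the packing step, here is a minimal sketch of a BIF writer. The layout used (8-byte magic, version, image count, timestamp multiplier in ms, zero padding up to a 64-byte header, then an index of timestamp/offset pairs terminated by 0xFFFFFFFF) is how I recall Roku's published format; please check it against Roku's documentation before relying on it. The function name is hypothetical and this is not Emby's own code.

```python
import struct

BIF_MAGIC = bytes([0x89, 0x42, 0x49, 0x46, 0x0D, 0x0A, 0x1A, 0x0A])


def pack_bif(images: list[bytes], interval_ms: int = 10_000) -> bytes:
    """Pack already-encoded JPEG frames into a single BIF byte string."""
    # Header: magic (8) + version, count, multiplier (12) + reserved zeros (44) = 64 bytes
    header = BIF_MAGIC + struct.pack("<III", 0, len(images), interval_ms) + bytes(44)

    # Index: one (timestamp, absolute offset) pair per image plus a terminator pair
    offset = len(header) + 8 * (len(images) + 1)
    index = bytearray()
    for number, image in enumerate(images):
        index += struct.pack("<II", number, offset)
        offset += len(image)
    index += struct.pack("<II", 0xFFFFFFFF, offset)  # end marker points past the last image

    return header + bytes(index) + b"".join(images)
```

The frames would be read in order from the `img_%05d.jpg` files that the ffmpeg step leaves behind, e.g. `pack_bif([p.read_bytes() for p in sorted(frame_dir.glob("img_*.jpg"))])`, where `frame_dir` is a `pathlib.Path` pointing at that directory.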

Posted
13 hours ago, player8472 said:

Wouldn't they have an increased risk of creating bifs for the same item?
As far as can see, it starts the process writing into emby's cache from ffmpeg and only creates the bif afterwards (which is actually quite fast). So the other instances wouldn't realize that there is already a .bif in creation

I took a look into the command actually executed by emby:

/bin/ffmpeg -f matroska -threads 1 -skip_interval 10 -copyts -i file:<PATH TO MEDIA FILE> -an -sn -vf scale=w=320:h=180 -vsync cfr -r 0.1 -f image2 /config/cache/temp/3317ad5d772e43ec854d41f7ccb33ce3/img_%05d.jpg

wouldn't changing the "threads 1" already make the process faster?
Would be nice to get a few switches for an advanced configuration.
 

That's the command for that specific file.  If you process a different file with a different container, codecs or resolution the command line could be different. 4K is different with tone mapping as well.

As @rbjtech mentioned, that's roughly 1/3 to 1/2 of the total work, as this just creates a bunch of jpg files which then have to be put back together.

For the other question, I never saw a problem with different instances getting tripped up on each other. But if that worries you, set one instance up to create thumbnails for all movies and set the second instance up to only create thumbnails for TV shows. You could fully control it pretty easily that way. I think I actually did something like that but for a different reason.  At the time I had files stored across I think 4 Windows servers, so I had each instance processing local content to avoid processing things across the network. 

are all bonded dual 10Gb, fiber channel or 
