FR: Use HW Acceleration for Chapter Image extraction


Jdiesel

Most would agree that chapter image extraction is one of the most demanding tasks Emby is required to do. It appears that Emby generates chapter thumbnails and Roku *.bif images using ffmpeg.

/var/lib/emby-server/ffmpeg/20170308/ffmpeg -i file:"/mnt/storage/media/show.avi" -threads 0 -v quiet -vf "fps=fps=1/10,scale=min(iw\,320):trunc(ow/dar/2)*2" -f image2 "/var/lib/emby-server/cache/temp/fd2c038a704f4f68b9a5c7934d9d75b8/img_%05d.jpg" 

Would it be possible to utilize hardware acceleration for the processing of thumbnails? 
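For illustration, a hardware-decoded variant of the command above might look something like this. This is a hedged sketch, not Emby's actual implementation: VAAPI and the /dev/dri/renderD128 device path are assumptions for a typical Linux box, and any -hwaccel backend would slot in the same way. Without -hwaccel_output_format, decoded frames are copied back to system memory, so the existing software scale filter works unchanged:

```shell
# Hypothetical: same bif/thumbnail extraction, with VAAPI hardware decoding.
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
  -i file:"/mnt/storage/media/show.avi" -threads 0 -v quiet \
  -vf "fps=fps=1/10,scale=min(iw\,320):trunc(ow/dar/2)*2" \
  -f image2 "/var/lib/emby-server/cache/temp/fd2c038a704f4f68b9a5c7934d9d75b8/img_%05d.jpg"
```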


anderbytes

Very good idea

 


Waldonnis

Using hardware decoding for Roku bif creation would be a huge win due to the sheer number of frames involved.  I've been doing my own bif generation using a script and hardware decoding and, while it takes a little longer in some cases (I single-thread it), the greatly reduced load on the system is well worth it.  Chapter frame extraction shouldn't be as demanding, but given how ffmpeg handles seeking, it might benefit somewhat from hardware decoding (I've never tried it, so I don't know).  I seriously doubt you'd gain much speed-wise, if anything, so don't expect much in that regard.

 

I'd say that if hardware decoding is enabled, it should probably be used in both cases, though. There's no reason not to if it's already being used when transcoding anyway.


@@Waldonnis I'm looking at this for the next beta.

Another enhancement that would be nice for chapters: since we know the chapter times in advance, if we could put them all on the command line and extract with a single process launch, that would certainly be more efficient than a separate process for every single chapter image.


Jdiesel

Just a suggestion: it would be nice to have the option to only use HW acceleration for image extraction. I personally would like to still use software transcoding but have the ability to create thumbnails using HW acceleration.


Waldonnis

@@Waldonnis I'm looking at this for the next beta.

Another enhancement that would be nice for chapters: since we know the chapter times in advance, if we could put them all on the command line and extract with a single process launch, that would certainly be more efficient than a separate process for every single chapter image.

 

This shouldn't be hard.  Off the top of my head, something like this should work:

ffmpeg \
-ss 00:00:00 -i foo.mkv \
-ss 00:02:32 -i foo.mkv \
-ss 00:04:22 -i foo.mkv \
-map 0:v -vframes 1 foo_1.png \
-map 1:v -vframes 1 foo_2.png \
-map 2:v -vframes 1 foo_3.png

This allows for the fast seeking while still keeping it to one ffmpeg instance (it's quite fast, actually).  The downside is that it reopens the file several times, but it's a worthwhile trade-off compared to slower seeking.  I'm not sure how much faster it would be compared to just firing off several ffmpegs in succession, though (worth checking out).  I also don't know if you'd have to add a hardware decoder prior to each input, but I would assume so (again, worth checking out; a high loglevel test should make it obvious). Edit: Just checked; any hardware decoder option must be replicated per input file, so just add it before each -ss.
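Following that edit, the multi-seek line with a hardware decoder repeated per input might look like this (VAAPI is just an example backend; the relevant point is that the -hwaccel option precedes each -ss/-i pair):

```shell
# Hypothetical: one hardware-decoded input per chapter seek, single process.
ffmpeg \
-hwaccel vaapi -ss 00:00:00 -i foo.mkv \
-hwaccel vaapi -ss 00:02:32 -i foo.mkv \
-hwaccel vaapi -ss 00:04:22 -i foo.mkv \
-map 0:v -vframes 1 foo_1.png \
-map 1:v -vframes 1 foo_2.png \
-map 2:v -vframes 1 foo_3.png
```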

 

On the Roku bif generation side, you can actually combine 320 and 240 width image extraction into a single command line as well so that the file is only read/decoded once if both width options are checked.  It's a rare case, I would assume, but definitely a time saver if they are as the same input frame is being scaled twice rather than reading the file twice.  You'd just need to concatenate the filter and output options of each width.  Something like this:

ffmpeg -i foo.mkv \
-vf "fps=fps=1/10,scale=min(iw\,320):trunc(ow/dar/2)*2:flags=lanczos" -f image2 "320_%08d.jpg" \
-vf "fps=fps=1/10,scale=min(iw\,240):trunc(ow/dar/2)*2:flags=lanczos" -f image2 "240_%08d.jpg"

Of course, you would still need to add the hardware decoder to the command lines if applicable.


I'm wondering if we can just deprecate 240 bif generation at this point. It is only for SD displays. We could just as easily say the feature is only available for HD and up.

 

As for chapters, I'm wondering if we should just use the same command as Roku to extract on an interval and then assign chapter images to the image that happens to be closest to the chapter point. If the interval is 10 seconds nobody's going to notice. The benefit of doing that is now we have those same images available for use as seeking previews in all apps.

 

In fact if we do that, then we can just merge the roku bif generation into the core and have them all be created by the single chapter images feature, which of course would now be renamed as something else.
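The "closest image" assignment is just rounding the chapter time to the nearest interval. A minimal sketch in shell, assuming a 10-second interval and that img_00001.jpg corresponds to t=0 (both assumptions based on the extraction command quoted earlier in the thread):

```shell
#!/bin/sh
# Map a chapter start time (in whole seconds) to the index of the nearest
# extracted image, given frames dumped every INTERVAL seconds as img_%05d.jpg.
INTERVAL=10

nearest_image_index() {
  secs=$1
  # add half the interval before integer division to round to nearest
  echo $(( (secs + INTERVAL / 2) / INTERVAL + 1 ))
}

nearest_image_index 152   # chapter at 00:02:32 -> image 16 (frame at t=150s)
```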


Waldonnis

I'm wondering if we can just deprecate 240 bif generation at this point. It is only for SD displays. We could just as easily say the feature is only available for HD and up.

 

As for chapters, I'm wondering if we should just use the same command as Roku to extract on an interval and then assign chapter images to the image that happens to be closest to the chapter point. If the interval is 10 seconds nobody's going to notice. The benefit of doing that is now we have those same images available for use as seeking previews in all apps.

 

In fact if we do that, then we can just merge the roku bif generation into the core and have them all be created by the single chapter images feature, which of course would now be renamed as something else.

Frankly, yes, you could deprecate 240 width bif support at this point.  I'm pretty sure the only models that couldn't ever handle 320 are already EOL'ed, and IIRC all newer models/firmware scale the interface now anyway if you're on a lower resolution display (if it's running a SG-friendly firmware rev, it scales).  That would simplify things quite a bit.

 

You could do a timed extraction like that, for sure. There are drawbacks, of course (mostly increased disk space use, and much longer extraction time/system load if software decoding), but if that's acceptable then I see no real technical roadblocks.  I doubt the corner cases where it could get weird are common enough to care about either (sub-10sec chapters are very rare).  It's an interesting idea for sure, and the result may end up being a better representative of a particular chapter than the current extraction.  I lost count of how many of my mini-series' chapter thumbs all show the same exterior shot of the same damn building for every chapter, making it harder to tell where to scrub to...

 

If you were to just do a timed extraction at a given resolution and wanted to resize for bif generation as well (width 320), then it's just a matter of concatenating the Roku-specific scaling filter/outfile options to the end of your regular-resolution extraction.  That would make it so that at least the image extraction part of generating BIFs wouldn't add any real load (and virtually no extra time) to the process.  Nice and easy  :)   If you used the same image size universally, you wouldn't even have to do a separate scaling/outfile for BIFs...but Roku's 320 pixel width is awfully tiny/detail-light in the 2160p era in my opinion.

 

Just a suggestion: it would be nice to have the option to only use HW acceleration for image extraction. I personally would like to still use software transcoding but have the ability to create thumbnails using HW acceleration.

I'm not sure how much would be gained simply by adding hardware decoding to the current chapter extraction method.  Assuming Emby is using the faster seek method, ffmpeg is seeking to the keyframe closest to the requested seek time and only actually decoding from that point to the seek point.  At worst, you're looking at only decoding a few dozen frames per image (basically, you're at the mercy of the GOP length), so the actual decoding is only part of it.  Any speed gains seen would be dependent on the usual factors (codec, system capability/load, etc) and, for many situations, you probably wouldn't see any speed benefits at all because we're just not decoding that much of the stream.
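To illustrate the seeking behaviour described above (foo.mkv is a stand-in filename; this is the standard ffmpeg distinction between input-side and output-side -ss):

```shell
# Input (fast) seek: -ss before -i jumps the demuxer to the nearest keyframe,
# so only the frames from that keyframe to the target time get decoded.
ffmpeg -ss 00:10:00 -i foo.mkv -vframes 1 fast.png

# Output (slow) seek: -ss after -i decodes from the beginning of the stream
# and discards everything before the target time.
ffmpeg -i foo.mkv -ss 00:10:00 -vframes 1 slow.png
```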

 

Now, if Luke decides to start extracting frames on a time interval, hardware decoding will be a much larger benefit since the entire video stream needs to be decoded and we wouldn't be seeking at all.  Anyone who generates Roku BIFs currently knows that an operation like that takes much more time and has a much higher system load impact by comparison because of this (since it extracts a frame every 10secs in the video stream).  Even using the hardware blocks, decoding the entire file takes more time compared to just dumping frames at specific timestamps, but using hardware will make it so that the system load isn't impacted nearly as much.  Aside from HEVC, I haven't noticed any speed benefit to using hardware decoding for my BIF generation (if anything, it takes slightly longer), but the reduced burden on the system when decoding the entire video is well worth it.


There is a big savings, though, in that with only one ffmpeg process we are doing away with the overhead of spawning a process X number of times, and the overhead of ffmpeg opening, reading, and decoding the file repeatedly.

 

Now as you say, that savings will be offset by the increased number of images, although if we wanted a cheap baseline, we could probably figure out an interval that completes in a similar amount of time as the current chapter image extraction method.


Waldonnis

There is a big savings, though, in that with only one ffmpeg process we are doing away with the overhead of spawning a process X number of times, and the overhead of ffmpeg opening, reading, and decoding the file repeatedly.

 

Now as you say, that savings will be offset by the increased number of images, although if we wanted a cheap baseline, we could probably figure out an interval that completes in a similar amount of time as the current chapter image extraction method.

 

Spawning just one ffmpeg process should definitely help matters a bit, and even rereading the same file multiple times shouldn't incur the same overhead as spawning ffmpeg multiple times just to do the same operation (.NET overhead, watching the output/exit code, etc. per process adds up, I would assume).  I'm not sure, though, that you could easily find an interval that would complete in a similar time period, since you're decoding and plucking the frames out of the decoded stream rather than just decoding small portions of it after seeking to the rough position.  The heavy lifting isn't really the seeking, it's the decoding, so the more of that you're doing, the longer it's going to take (I know you know this... I'm just doing the forum version of thinking aloud here).

 

To get a firmer idea of the time factor for my own reference, I just ran a test extracting a frame every 10secs on my last encode of Shin Godzilla since I still had it in my working directory (4.5GB h.264 file, 1:59:50 runtime @23.98 fps, so roughly 172k frames in the stream - fairly average for a modern movie).  Total elapsed time for extracting the 720 individual frames was 5m18s (software decoding; hw was slightly faster, but probably because my system is busier than normal right now).  Extracting one frame per minute (121 individual frames) for the same file took 4m53s, so the interval didn't matter much - it's that we're decoding the whole thing and dropping unwanted frames that's causing the bulk of the longer execution time.  Obviously the more frames we output, the longer it takes due to image encoding and I/O, but the extracted frame count appears to be a minor factor comparatively.  To confirm that, I extracted only I-frames from the same file rather than dealing with a specific time interval, since I know the extracted frame count will be much higher - elapsed time was 6m5s (1604 individual frames).
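For reference, the extractions timed above can be reproduced with command lines along these lines. These are reconstructed sketches, not the exact invocations used for those timings, and the filename is a stand-in:

```shell
# One frame every 10 seconds: the whole stream is decoded and the fps filter
# drops everything except one frame per interval.
ffmpeg -i movie.mkv -vf "fps=1/10" -f image2 "frame_%05d.jpg"

# One frame per minute: same full decode, fewer images written.
ffmpeg -i movie.mkv -vf "fps=1/60" -f image2 "frame_%05d.jpg"

# I-frames only: the select filter keeps keyframes; -vsync vfr avoids
# duplicating frames to fill the timestamp gaps.
ffmpeg -i movie.mkv -vf "select='eq(pict_type,I)'" -vsync vfr -f image2 "iframe_%05d.jpg"
```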

 

By comparison, using a single command line to extract a frame at each chapter start took only 3.4sec (15 chapters/frames in this case...although I used the US release's chapter list rather than the 29 chapters this Japanese version had to save my fingers a bit of work).  Spawning ffmpeg once per chapter took more time: 6.1sec, significantly higher than a single command line, but still far less than a frame every 10sec or even 60sec.

 

The worst I can see it getting is on initial extraction of a new library/installation, since you're most likely to be iterating over a lot of files sequentially.  My IB i5 is longer in the tooth these days, but I'd still be looking at 4-5 mins per file with or without hardware assistance, regardless of whether you do 10sec or 60sec intervals.  That might be a bit longer than many are comfortable with, especially considering that it's sustained load.


@@Waldonnis, I don't suppose there's an alternative command line that could be used to do the interval based extraction using fast seeking as opposed to slow seeking?


Waldonnis

@@Waldonnis, I don't suppose there's an alternative command line that could be used to do the interval based extraction using fast seeking as opposed to slow seeking?

 

There is, but it's going to be lengthy for an average movie.  Consider the single line for chapters, but with one input/map line pair per seek.  Even extracting one frame per minute on that same file would result in a line that has 119 seeks (and 119 -map x:v options corresponding to those).  I was going to try that earlier with a five minute interval to mimic what Emby does with a file without chapters (and to shorten the command line a bit), but was too tired.  I'll see what I can whip up today and time it when I get the chance.  I suppose you could compromise a bit if the command line gets too long and break it up into several tasks per file.  Another option is to do extractions on time ranges (such as having a -ss per 10min, then letting it run the filter over that 10min segment using -t).  Ignore that last bit, I wasn't awake and didn't think it through... it would just be the same as decoding the whole file  :blink:
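Assembling that lengthy line by hand would be tedious, so a caller would presumably generate it. A hypothetical sketch that builds one fast-seek input (and matching map/output) per interval and just prints the resulting command rather than executing it; the filename, 60-second interval, and 300-second duration are all stand-ins:

```shell
#!/bin/sh
# Build a single ffmpeg command with one input-side (fast) seek per interval.
FILE=foo.mkv
INTERVAL=60
DURATION=300   # stand-in runtime in seconds; a real script would probe this

build_cmd() {
  cmd="ffmpeg"
  n=0
  t=0
  # one "-ss <time> -i <file>" pair per seek point
  while [ "$t" -lt "$DURATION" ]; do
    cmd="$cmd -ss $t -i $FILE"
    t=$(( t + INTERVAL ))
    n=$(( n + 1 ))
  done
  # one "-map <idx>:v -vframes 1 <outfile>" per input
  i=0
  while [ "$i" -lt "$n" ]; do
    cmd="$cmd -map $i:v -vframes 1 img_$i.png"
    i=$(( i + 1 ))
  done
  echo "$cmd"
}

build_cmd
```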

 

I'll also keep looking to see if there are any other ways of doing this.  I tried earlier telling the demuxer to drop non-keyframes prior to decoding, but that didn't really help. 


Yeah, but as you said, that is opening the file once per image. I was wondering if there might be another command that could do it while only opening the file once.


Waldonnis

Yeah, but as you said, that is opening the file once per image. I was wondering if there might be another command that could do it while only opening the file once.

 

Not that I can think of.  ffmpeg treats any -ss that isn't before an input file as an output seek, which is much slower.  The only way I can see to make it work is to tell it to open the same file again.  It's not terrible, but I agree that it would be nice to find a way to just seek into the demuxed stream multiple times rather than having to reopen the file to do so... or without having to resort to seeking into the decoded stream.

