Suggestion for BIF file generation


HawkXP71

Recommended Posts

HawkXP71

The speed-up from the custom -skip_interval switch is great and makes a tremendous difference in runtime, but I have another potential time saver:

Use hardware decoding.
I tested this on two machines:
* i9-12950HX with 64 GB RAM and an NVIDIA RTX A1000, running Windows 11
* Synology 920+

Movie: Oblivion, 1920x804, HEVC, ffprobe reports a bitrate of 2.28 Mb/s, 2 GB in size

Settings              Runtime (hh:mm:ss)
No skip interval      00:17:24 (150-200 fps)
w/ -skip_interval     00:00:20
w/ -hwaccel cuda      00:00:05
w/ -hwaccel auto      00:00:10
w/ -hwaccel dxva      00:00:08

QSV apparently doesn't use the GPU's decoder, but DXVA2 uses the CPU's integrated GPU. auto also used the Intel on-chip GPU, however it was always 1 or 2 seconds slower.
All hwaccel runs also used -skip_interval.

Same movie, but on my Synology 920+ with 16gb ram

Settings              Runtime (hh:mm:ss)
No skip interval      N/A (didn't bother)
w/ -skip_interval     00:00:57
   

I wasn't able to get the hwaccel option to work on the Synology box with -skip_interval; it works fine for transcoding. I am using the Emby-supplied version of ffmpeg.

For Windows, it's clearly a ~4x speed-up for NVIDIA and ~2x for Intel (I don't have an AMD CPU or GPU to test with). For Synology, I would expect a similar improvement if the devs can figure out the hwaccel options.

Just a thought


HawkXP71

I used the exact same ones shown in the log file. I'm not at my laptop now, but I'll get the options to you in an hour or so.


HawkXP71

For the non-hardware-accelerated run:
time ~/bin/emby/ffmpeg -hide_banner -f matroska -threads 1 -skip_interval 10 -copyts -i C:/MGCNoScan/sb/github/towel42-com/Oblivion.mkv -an -sn -vf scale=w=320:h=134 -vsync cfr -f image2 C:/MGCNoScan/sb/github/towel42-com/TempDir-AfMifY/img_%05d.jpg

For the hardware-accelerated run using CUDA:
time ~/bin/emby/ffmpeg -hide_banner -f matroska -hwaccel cuda -threads 1 -skip_interval 10 -copyts -i C:/MGCNoScan/sb/github/towel42-com/Oblivion.mkv -an -sn -vf scale=w=320:h=-2147483648 -vsync cfr -f image2 C:/MGCNoScan/sb/github/towel42-com/TempDir-AfMifY/img_%05d.jpg

Which I got straight from the logs, with the minor change of removing the -r 0.1.


@HawkXP71 In most cases, hw acceleration doesn't provide any better performance in combination with skip_interval. The reason is that skip_interval tries to use the nearest key frames to the target positions, and decoding keyframes is trivial and needs no hw acceleration.

There are some cases - and I'm sure yours is one of those - where a video is encoded with sparse key frames, typically in the case of HEVC-encoded files - but not all HEVC files are like that. When key frames are rare, it is required to decode a sequence of images starting from the last key frame before the target (or even earlier, in case the video uses back-references). In such cases, it is possible to see improvements with hw acceleration enabled.
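As a rough way to check whether a given file falls into this sparse-keyframe category, one could list the keyframe timestamps with ffprobe and look at the gaps (a sketch; input.mkv is a placeholder path):

```shell
# List keyframe timestamps for the first video stream and report the
# largest gap between consecutive keyframes (input.mkv is a placeholder).
ffprobe -v error -skip_frame nokey -select_streams v:0 \
  -show_entries frame=pts_time -of csv=p=0 input.mkv |
awk 'NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
     { prev = $1 }
     END { printf "max keyframe gap: %.1f s\n", max }'
```

A max gap of a few seconds is typical; gaps of tens of seconds would suggest the sparse-keyframe case described above.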

We've done many tests with different kinds of files and these have shown that there's no benefit in using hw decoding for image extraction. I just did two tests again right now with the same two command lines on two files:

A movie of almost 3h length (H264/HD):

With SW decoding: elapsed=00:00:41.94 frame= 1014 fps= 24 q=0.0 Lsize=N/A time=02:46:20.00 bitrate=N/A dup=0 drop=78 throttle=off speed= 238x
With Nvidia decoding: elapsed=00:01:01.20 frame= 1014 fps= 17 q=0.0 Lsize=N/A time=02:47:10.00 bitrate=N/A dup=0 drop=78 throttle=off speed= 164x

This means it took about 45% more time with HW decoding.

 

With the BigBuckBunny video:

With SW decoding: elapsed=00:00:04.35 frame=   64 fps= 15 q=0.0 Lsize=N/A time=00:10:00.00 bitrate=N/A dup=0 drop=28 throttle=off speed= 138x
With Nvidia decoding: elapsed=00:00:05.00 frame=   64 fps= 13 q=0.0 Lsize=N/A time=00:10:10.00 bitrate=N/A dup=0 drop=28 throttle=off speed= 122x

 

Some more notes:

  • You cannot remove the -r 0.1 parameter. Sometimes it makes no difference, but for certain file types it's crucial that it's there.
  • Image extraction is a background operation and is not intended to max out your system; it is meant to run while you are using your Emby server without interrupting the experience. That's another reason for not using hw acceleration: it might impact your viewing experience.

 

Please see the following conversations for a more in-depth discussion:

 

 


HawkXP71
2 minutes ago, softworkz said:

In most cases, hw acceleration doesn't provide any better performance in combination with skip_interval. The reason is that skip_interval tries to use the nearest key frames to the target positions and decoding keyframes is trivial and needs no hw acceleration. [...]

We've done many tests with different kinds of files and these have shown that there's no benefit in using hw decoding for image extraction. [...]

Thanks for all the info. 

The reason I went down this rabbit hole is that about a year ago I wanted to learn more about the BIF format, so I created a Windows preview add-in.

I also created a BIF generator that used your command line (and the Emby-shipped ffmpeg, for the skip interval) to create the still images.

The app I have also does transcoding, so I went and tried hardware decoding.

The missing -r was a mistake, but the images (for my movie, with admittedly limited experiments) were the same (using a binary diff compare).

But I saw significant improvements in runtime, so I figured I would bring it up here.

I'm surprised by hw decoding taking longer; I've only seen that (again, much less experience here than you guys, and I defer to your expertise) when the codec isn't supported by the GPU, and H264 should have no issues.

From the transcode logs, I see that most of the actual transcoding (not image extraction) uses hardware decoding.

Would the two videos you are using as examples transcode for streaming better with a sw decode? 

 

TIA.

 


8 minutes ago, HawkXP71 said:

I'm surprised by hw decoding taking longer

What's expensive is the data transfer between CPU and GPU memory, and with your command line the images are still scaled in software, so the full uncompressed 4K frames need to be transferred continuously; an uncompressed 4K frame takes around 30-50 MB. For playback, we do the scaling in hw as well, and ideally also the encoding and everything in between, so that only encoded video stream data is transferred, not raw frames.

Also, HW decoding works more proactively/decoupled: despite the -threads 1 setting, it often decodes many more frames than sw decoding, where we can precisely control which frames to decode and when to stop and seek to the next one. So HW decoding does a lot of unnecessary work in this scenario (skip_interval).
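The 30-50 MB figure checks out as a back-of-envelope estimate; for example, assuming a 3840x2160 frame at 4 bytes per pixel:

```shell
# Rough size of one uncompressed 4K frame, assuming 3840x2160 pixels
# at 4 bytes per pixel (e.g. an RGBA or high-bit-depth 4:4:4 layout).
awk 'BEGIN { printf "%.1f MB per frame\n", 3840 * 2160 * 4 / (1024 * 1024) }'
```

At ~31.6 MB per frame, 24 fps means on the order of 750 MB/s of raw frame data crossing the bus whenever scaling stays in software.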

 

16 minutes ago, HawkXP71 said:

From the transcode logs, I see that most of the actual transcoding (not image extraction) uses hardware decoding.

Would the two videos you are using as examples transcode for streaming better with a sw decode? 

No - not at all.

It's important to understand how video compression works. To make it very simple: there are key frames in a video which are like a JPEG: simple to decode. Let's assume you have a keyframe interval of 3 seconds at 24 fps. This means that in those 3 seconds (72 video frames) there's one "jpeg" and the other 71 frames are encoded as differences to the key frame and the previous frames.

As a result, the key frames are easy to get (like decoding a JPEG), but decoding the other 71 frames requires complex calculations, and that's where hw decoding is useful.
But it's not useful if you only want the keyframes and not the frames in between.
That's the difference from playback.

(It's not really JPEG - that was just for illustration.)
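The GOP arithmetic from the example above, spelled out:

```shell
# Keyframe interval of 3 s at 24 fps: frames per GOP and how many of
# them are delta frames that need the expensive decode path.
awk 'BEGIN {
  gop = 3 * 24
  printf "%d frames per GOP: 1 keyframe + %d delta frames\n", gop, gop - 1
}'
```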


BTW: I wouldn't invest too much into BIF. It's a dying format; even its inventors (Roku) have deprecated it and now recommend the HLS method for providing thumbnail images.
Sooner or later, Emby will move away from it as well.


HawkXP71
1 minute ago, softworkz said:

BTW: I wouldn't invest too much into BIF. It's a dying format; even its inventors (Roku) have deprecated it and now recommend the HLS method for providing thumbnail images.
Sooner or later, Emby will move away from it as well.

It was more for intellectual curiosity than anything else. 

 


Just now, HawkXP71 said:

Makes perfect sense.  I didn't realize the skip interval worked to get the key frame. 

It tries to - as much as possible - so it's not as exact as the classic method, but this doesn't matter for the purpose.


1 minute ago, HawkXP71 said:

It was more for intellectual curiosity than anything else. 

One of the drawbacks is that it needs to pre-exist for the whole file, which makes it unsuitable for live TV for example.


HawkXP71
Just now, softworkz said:

It tries to - as much as possible - so it's not as exact as the classic method, but this doesn't matter for the purpose.

Yeah, if it's off by a millisecond or two, who cares when it's a 10 second clip.... 


HawkXP71
1 minute ago, softworkz said:

One of the drawbacks is that it needs to pre-exist for the whole file, which makes it unsuitable for live TV for example.

Makes sense.  Reading about the HLS system now. 
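For reference, the HLS replacement Roku recommends is an "image media playlist": thumbnails are delivered as tiled JPEG sprite sheets described by an M3U8 playlist, so they can be generated segment by segment (which is what makes it work for live TV). A minimal sketch, with tile layout, timings, and the file name being illustrative values rather than anything from this thread:

```
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:100
#EXT-X-IMAGES-ONLY
#EXTINF:100,
#EXT-X-TILES:RESOLUTION=320x180,LAYOUT=5x4,DURATION=5
tiles_001.jpg
#EXT-X-ENDLIST
```

Here each 5x4 sprite sheet covers 100 seconds of video, one 320x180 tile per 5 seconds; the exact tags are defined in Roku's HLS image media playlist specification.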


2 minutes ago, HawkXP71 said:

Yeah, if it's off by a millisecond or two, who cares when it's a 10 second clip.... 

You cannot position the slider by that precision anyway when scrubbing/seeking.


HawkXP71
1 minute ago, softworkz said:

You cannot position the slider by that precision anyway when scrubbing/seeking.

Yep.  Understood. 

 


rbjtech

Just to add: when @Cheesegeezer and I made the HDR>SDR BIF Generator plugin (tonemapping), we did experiments on hw encoding as well and found very little difference. It could be that the speed of my CPU masked the advantages, but frankly there were too many variables to warrant full testing. It was writing the temp images at a very fast rate anyway, so the bottleneck may even have been disk I/O.

Anyway, it's all in the past now, as the tonemapping is now done in the core for the BIF images.


Cheesegeezer
26 minutes ago, rbjtech said:

Just to add: when @Cheesegeezer and I made the HDR>SDR BIF Generator plugin (tonemapping), we did experiments on hw encoding as well and found very little difference. [...]

Is it in the release version or just beta? I can't remember… but if it's in the release, we should remove it from the plugin.


Cheesegeezer
32 minutes ago, Cheesegeezer said:

Is it in the release version or just beta? I can't remember… but if it's in the release, we should remove it from the plugin.

@softworkz can you answer this for me? Cheers


12 hours ago, Cheesegeezer said:

@softworkz can you answer this for me? Cheers

There was a phase where Luke tried to update the release more often, so I'm not sure whether he backported tone mapping for image extraction during that phase.

@Luke?

