
Performance difference between embedded ffmpeg in docker images for 3.5.3.0 and 4.0.1.0


ken-ji


Just noticed this in the 4.0.1.0 Docker container while trying to watch something in Hi10p with subtitles:

frame= 1659 fps= 16 q=-0.0 size=N/A time=00:03:18.24 bitrate=N/A speed=1.92x
frame= 1664 fps= 16 q=-0.0 size=N/A time=00:03:18.31 bitrate=N/A speed=1.91x
frame= 1671 fps= 16 q=-0.0 size=N/A time=00:03:18.81 bitrate=N/A speed= 1.9x
 
So I shut down the container and started a new one with the last stable version, 3.5.3.0:
frame= 1596 fps= 80 q=23.0 size=N/A time=00:01:06.85 bitrate=N/A speed=3.37x
[segment @ 0x7c4740] Opening '/transcoding/transcoding-temp/7bfe456162a15e3463f845961edcc264.m3u8.tmp' for writing
[segment @ 0x7c4740] Opening '/transcoding/transcoding-temp/7bfe456162a15e3463f845961edcc26422.ts' for writing
frame= 1645 fps= 81 q=23.0 size=N/A time=00:01:08.90 bitrate=N/A speed=3.38x
[segment @ 0x7c4740] Opening '/transcoding/transcoding-temp/7bfe456162a15e3463f845961edcc264.m3u8.tmp' for writing
[segment @ 0x7c4740] Opening '/transcoding/transcoding-temp/7bfe456162a15e3463f845961edcc26423.ts' for writing
frame= 1691 fps= 81 q=23.0 size=N/A time=00:01:10.95 bitrate=N/A speed= 3.4x
 

Why is there such a great disparity in performance? Files I could play without issue in the previous stable release no longer play without stalling...
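To compare the two runs beyond eyeballing a few lines, the progress lines can be pulled out of each transcode log and averaged. The following is a minimal sketch, assuming the logs contain progress lines shaped like the ones quoted above; `parse_progress` and `mean_fps` are hypothetical helper names, not part of Emby or ffmpeg.

```python
import re

# Matches key=value pairs in an ffmpeg progress line, tolerating the
# padding ffmpeg puts after '=' (e.g. "fps= 16", "speed= 1.9x").
PAIR = re.compile(r"(\w+)=\s*(\S+)")


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None for other lines."""
    if not line.lstrip().startswith("frame="):
        return None
    return dict(PAIR.findall(line))


def mean_fps(lines):
    """Average the fps field over every progress line in an iterable of log lines."""
    values = []
    for line in lines:
        fields = parse_progress(line)
        if fields and "fps" in fields:
            values.append(float(fields["fps"]))
    return sum(values) / len(values) if values else None
```

Feeding it the lines of each attached log (for example `mean_fps(open(path))`, with `path` pointing at whichever log file you saved) gives one number per run, which is easier to compare than scattered samples.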

 

For quick reference, I'm running on a Pentium G4620 with /dev/dri mapped into the Docker container.

ffmpeg-transcode-3.5.3.0.txt

ffmpeg-transcode-4.0.1.0.txt


Strange, seems like the only difference is the chosen vaapi device. Can you play from the beginning on 4.0 and provide a log? Just to rule that out.

 

Can you also attach the hardware detect log. Thanks.


What Emby app are you playing from? The other difference is the technique being used to burn in the subtitles.


The initial log snippets are using the Roku client

The full log was from the browser client, as I generated the logs remotely.

Edited by ken-ji

Ok so in your case you will need to keep that unchecked. When it is checked, we don't burn them in and instead extract them as text and render them directly in the video player. However, if you're not seeing subtitles, that means the extraction is too slow in your environment which means you need to disable it.

 

If you also turn off the subtitles prior to playback, how does that perform?


It's actually very fast when subtitles are turned off, about the same speed as in 3.5.3.

Maybe there's some issue with ffmpeg rendering the subtitles?


I don't think there's an issue; it's just that burning in subtitles is a very expensive process that adds quite a bit of overhead. If you would consider using external text subtitles instead, this problem goes away, because the subtitles can be delivered to the video player separately.


My point is that in 3.5.3 I could transcode with subtitles burned in running at 90fps while in 4.0 it goes below 30fps. It's like in 3.5.3 there's next to no overhead with subtitles burned in. But there's a huge perf drop in 4.0


The problem is that GPU transcoding failed in 3.5, and we then automatically switched to CPU transcoding in order to allow the video to play. You can see this in your 3.5 log:

[Parsed_subtitles_2 @ 0x23826c0] Shaper: FriBidi 1.0.4 (SIMPLE)
[Parsed_subtitles_2 @ 0x23826c0] Loading font file '/config/fonts/DroidSansFallback.ttf'
[Parsed_subtitles_2 @ 0x23826c0] Using font provider fontconfig
Impossible to convert between the formats supported by the filter 'Parsed_hwupload_1' and the filter 'auto_scaler_1'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:1
[aac @ 0x7d8600] Qavg: 28048.062
[aac @ 0x7d8600] 2 frames left in the queue on closing
Conversion failed!

So now in 4.0 we have corrected these problems so that the GPU can be used; however, in your case, at least for this one example, you are actually better off with CPU.
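A quick way to check whether a given transcode log contains this kind of failed attempt is to scan it for the error lines quoted above. This is only a sketch: the marker strings are copied from the 3.5 log excerpt, `find_failures` is a hypothetical helper name, and other failure modes may produce different messages.

```python
# Error lines taken from the 3.5 log excerpt; other failures may look different.
FAILURE_MARKERS = (
    "Impossible to convert between the formats supported by the filter",
    "Error reinitializing filters!",
    "Conversion failed!",
)


def find_failures(log_text):
    """Return (line_number, line) pairs for log lines containing a known failure marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove the GPU path was used; it only means none of these particular messages appeared in the log.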


What I don't get is why CPU encoding is very fast in 3.5.3, while 4.0 with HW acceleration turned off is very slow.

I'll look into making sure I've accounted for all the variables and differences (and looking for other test files)


CPU should be the same in both releases, but if you can provide log examples for comparison then I can provide some explanation. In post #1 you already have a CPU log of 3.5.3, so if you could provide one from 4.0+ under identical conditions that would be great, thanks.


Right so just to be sure everything is as identical as I can make it, I made two new containers, one for 3.5.3 and one for 4.0.1

I grabbed the transcode logs for both of them, both supposedly using VAAPI HW acceleration.

I also turned off the subtitles in the same file for 4.0.1 so that I could also compare the raw transcode speeds against 3.5.3 (as it seems to be just a tad faster than 3.5.3 with subtitles).

 

ffmpeg-transcode-emby353.txt

ffmpeg-transcode-emby4.txt

ffmpeg-transcode-emby4-nosubs.txt


I already explained this to you. 3.5 is not actually using vaapi even though you have it turned on. It is trying to use vaapi, then failing and automatically switching to CPU.

 

That makes this comparison apples to oranges.

 

The only comparison you could make would be to disable hardware acceleration on both.



So I disabled acceleration for both, and the results are almost the same, which is to be expected. However, 3.5.3 without HW acceleration is a lot slower than 3.5.3 with HW acceleration turned on, which according to you failed and continued on the CPU.

 

I'm afraid I'm not sure what you're asking about?

 

Your latest logs are both without hwa (CPU only) and both with subtitle burn-in. 

The transcoding speeds are almost identical, just as @Luke had said.


From his latest logs:

 

Log from 3.5:

frame=34094 fps= 49 q=-1.0 Lsize=N/A time=00:23:42.01 bitrate=N/A speed=2.06x

Log from 4.0:

frame=14065 fps= 52 q=28.0 size=N/A time=00:09:47.30 bitrate=N/A speed=2.16x
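To put a number on "almost identical", the two speed fields above can be compared directly. A minimal sketch follows; `speed_value` and `relative_gap` are hypothetical helper names, and the inputs are the `speed=` values from the two lines quoted above.

```python
def speed_value(field):
    """Convert an ffmpeg speed field such as '2.06x' to a float."""
    return float(field.strip().rstrip("x"))


def relative_gap(a, b):
    """Return the difference between two positive values as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


# Speed fields from the 3.5 and 4.0 CPU-only logs quoted above.
gap = relative_gap(speed_value("2.06x"), speed_value("2.16x"))
print(f"speed gap: {gap:.1%}")
```

For these two values the gap works out to roughly 4.6%. Note that the two lines were sampled at different points in playback (23:42 versus 09:47), so this is only a rough comparison.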

Yes, @Luke states the original log for 3.5.3 with HWA turned on didn't use HWA due to some incompatibility during the transcode.

But I didn't get the same performance when I turned off HWA.

 
