
Limit Hardware Transcode Streams?



Posted

Oh, that repo was indeed replaced with a private one when I took over. It's now a simple Dockerfile based on a busybox image.

Posted

Haha, I thought so. Nothing in there seemed to match with what I was seeing inside the container 😂

Posted

After discussing with our ffmpeg developer, it seems AMF does not yet meet all our requirements and will have to wait a bit longer.

However, I just pushed a new beta version that includes the mesa drivers (r600 and radeonsi) and a fully upgraded vaapi stack. Could you give it a try? The mesa drivers should largely be the same as the amdgpu drivers, but older cards may not be up to speed yet, depending on your host's kernel. I'm trying to resuscitate an old laptop so I can try them myself while I wait for the new graphics card.

I will also soon expose an env variable named LIBVA_DRIVER_NAME which will let you override which driver is used. Since libva may incorrectly default to r600, it can be useful to force radeonsi.
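As a rough illustration of the override described above, the sketch below builds a process environment with `LIBVA_DRIVER_NAME` forced to a given driver and, if the `vainfo` utility happens to be installed, runs it so you can see which driver libva loads. The helper name `libva_env` and the use of `vainfo` are assumptions for illustration only; nothing here is taken from the Emby image itself.

```python
import os
import shutil
import subprocess


def libva_env(driver, base=None):
    """Return a copy of the environment with LIBVA_DRIVER_NAME forced to `driver`."""
    env = dict(os.environ if base is None else base)
    env["LIBVA_DRIVER_NAME"] = driver
    return env


if __name__ == "__main__":
    env = libva_env("radeonsi")
    vainfo = shutil.which("vainfo")  # assumption: libva-utils is installed on this host
    if vainfo is None:
        print("vainfo not found; would run with LIBVA_DRIVER_NAME=" + env["LIBVA_DRIVER_NAME"])
    else:
        # vainfo prints the driver libva actually loaded for this environment
        subprocess.run([vainfo], env=env, check=False)
```

Copying the environment rather than mutating `os.environ` keeps the override scoped to the one child process being tested.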

Posted

Hey, that's amazing news!

I'll be all over this in an hour or two, just have to get my offspring to bed and prep a couple of things for tomorrow.

I appreciate your work on this, it's really cool.

Is there anything in particular you'd like me to test?

I know the Linuxserver.io Emby docker with Mesa drivers I tested failed to hardware transcode hevc sources, so I will make sure to test that. And I will also test transcoding multiple streams simultaneously.

Posted

@alucryd So I just managed to do some testing, nice work!

I switched the docker to the latest tag, added the device /dev/dri, and added my 'video' group id to the GID list.
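For anyone wanting to reproduce that setup, here is a minimal sketch that assembles an equivalent `docker run` command line. It is an assumption-laden example, not the official invocation: the function name `emby_docker_args` is made up, the image tag and the group id `44` are placeholders, and `GIDLIST` is assumed to be the variable the image reads for extra group ids. Volumes and ports are left out for brevity.

```python
import shlex


def emby_docker_args(video_gid, image):
    """Build a `docker run` argument list that passes the GPU device nodes through."""
    return [
        "docker", "run", "-d",
        "--device", "/dev/dri:/dev/dri",   # expose the DRM/VAAPI device nodes
        "--env", f"GIDLIST={video_gid}",   # assumed variable name for extra group ids
        image,
    ]


if __name__ == "__main__":
    # 44 is only a placeholder; look up the real id with `getent group video`.
    print(shlex.join(emby_docker_args(44, "emby/embyserver:beta")))
```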

It detected the codecs without a problem.

I ran 3 main transcode tests, all to native Android and Android TV clients, with transcode throttling turned on in the settings:

  1. A 1080p h264 source to h264 @ 5Mbps: playback was immediate and great, and it transcoded at around 150fps.
  2. A 4k HEVC source to h264 @ 5Mbps: playback was fast and great, and it transcoded at ~100fps.
  3. Three simultaneous 4k HEVC sources to h264 @ 5Mbps, each started a minute or two apart: playback was fast with no stuttering, and it transcoded at roughly 30fps. I left the 3 streams transcoding for around 10 mins total. Playback continued the whole time, but at one point the overview on the Emby Server dashboard, which shows active playbacks, glitched out. The transcode stats disappeared from each of the tiles, and it took a few minutes for them all to return. I don't know whether this was a UI failure or an Emby backend failure, or whether the graphics stack reset and the transcode didn't kick back in until the next throttle-by-segment request. Playback continued throughout.

I'm attaching the hardware detection logs, and the most recent transcode log for one of the videos.

Edit: I was just about to hit send on this message, when I decided to check some of the other transcode logs in search of info on the issue I saw above.

It seems that the graphics stack did indeed blow up.

At a certain point in the log, after throttling had kicked in and resumed a few times, the transcode errored, and I saw our old friend, the bug from last month that is supposedly fixed in my kernel version.

Transcodes did eventually manage to come back up in what looks like a new thread, before playback could stall on the clients.

I'm attaching that transcode log too, and pulled out a relevant snippet below:

>> ThrottleBySegmentRequest: RequestPosition: 00:47:39 - TranscodingPosition: 00:49:42 - ThrottleBuffer: 124s (Treshold: 120s)
Enter throttle value: <int, ms wait per input cycle> (0 to disable)

Setting throttle value to 100 ms
21:40:09.291 frame= 8530 fps= 44 q=-0.0 size=  217185kB time=00:49:44.34 bitrate=5008.0kbits/s throttle=100 speed=1.83x    
21:40:09.726 frame= 8531 fps= 44 q=-0.0 size=  217207kB time=00:49:44.37 bitrate=5008.1kbits/s throttle=100 speed=1.83x    
21:40:10.230 frame= 8531 fps= 44 q=-0.0 size=  217210kB time=00:49:44.44 bitrate=5007.3kbits/s throttle=100 speed=1.82x    
21:40:10.802 frame= 8534 fps= 44 q=-0.0 size=  217270kB time=00:49:44.47 bitrate=5008.2kbits/s throttle=100 speed=1.82x    
>> ThrottleBySegmentRequest: RequestPosition: 00:47:39 - TranscodingPosition: 00:49:44 - ThrottleBuffer: 125s (Treshold: 120s)
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: amdgpu_cs_query_fence_status failed.
21:40:12.330 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
21:40:12.330 frame= 8535 fps= 44 q=-0.0 size=  217272kB time=00:49:44.50 bitrate=5007.8kbits/s throttle=100 speed=1.81x    
21:40:12.430 frame= 8535 fps= 43 q=-0.0 size=  217272kB time=00:49:44.50 bitrate=5007.8kbits/s throttle=100 speed= 1.8x    
21:40:12.934 frame= 8535 fps= 43 q=-0.0 size=  217275kB time=00:49:44.57 bitrate=5007.0kbits/s throttle=100 speed= 1.8x    
>> ThrottleBySegmentRequest: RequestPosition: 00:47:39 - TranscodingPosition: 00:49:44 - ThrottleBuffer: 126s (Treshold: 120s)
21:40:13.437 frame= 8535 fps= 43 q=-0.0 size=  217278kB time=00:49:44.63 bitrate=5006.1kbits/s throttle=100 speed=1.79x    
21:40:13.940 frame= 8535 fps= 43 q=-0.0 size=  217279kB time=00:49:44.66 bitrate=5005.7kbits/s throttle=100 speed=1.79x    
21:40:14.444 frame= 8535 fps= 43 q=-0.0 size=  217282kB time=00:49:44.73 bitrate=5004.9kbits/s throttle=100 speed=1.79x    
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been cancelled because the context is lost.
amdgpu: The CS has been cancelled because the context is lost.
21:40:14.847 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
amdgpu: The CS has been cancelled because the context is lost.
amdgpu: The CS has been cancelled because the context is lost.
21:40:14.949 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
21:40:14.949 frame= 8537 fps= 43 q=-0.0 size=  217284kB time=00:49:44.76 bitrate=5004.5kbits/s throttle=100 speed=1.78x    
amdgpu: The CS has been cancelled because the context is lost.
amdgpu: The CS has been cancelled because the context is lost.
21:40:15.050 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been cancelled because the context is lost.
>> ThrottleBySegmentRequest: RequestPosition: 00:47:39 - TranscodingPosition: 00:49:44 - ThrottleBuffer: 126s (Treshold: 120s)
amdgpu: The CS has been cancelled because the context is lost.
21:40:15.151 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
amdgpu: The CS has been cancelled because the context is lost.
amdgpu: The CS has been cancelled because the context is lost.
21:40:15.251 [mpegts @ 0xb1d240] H.264 bitstream error, startcode missing, size 0
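To check the other transcode logs for the same failure, a small scanner along these lines could help. It is only a sketch: the two `amdgpu:` message patterns and the timestamp format are copied from the excerpt above, while the function name `summarize_gpu_errors` is hypothetical and the assumption that every log follows this layout is untested.

```python
import re

# Patterns taken from the log excerpt above.
GPU_ERROR = re.compile(
    r"^amdgpu: (amdgpu_cs_query_fence_status failed|The CS has been cancelled)"
)
TIMESTAMPED = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) ")


def summarize_gpu_errors(lines):
    """Count amdgpu error lines and return the last timestamp seen before the first one."""
    last_ts = None
    first_error_after = None
    errors = 0
    for line in lines:
        m = TIMESTAMPED.match(line)
        if m:
            last_ts = m.group(1)
        elif GPU_ERROR.match(line):
            errors += 1
            if first_error_after is None:
                first_error_after = last_ts
    return errors, first_error_after


if __name__ == "__main__":
    sample = [
        "21:40:10.802 frame= 8534 fps= 44 q=-0.0 size=  217270kB time=00:49:44.47 bitrate=5008.2kbits/s throttle=100 speed=1.82x",
        ">> ThrottleBySegmentRequest: RequestPosition: 00:47:39 - TranscodingPosition: 00:49:44 - ThrottleBuffer: 125s (Treshold: 120s)",
        "amdgpu: amdgpu_cs_query_fence_status failed.",
        "amdgpu: The CS has been cancelled because the context is lost.",
    ]
    print(summarize_gpu_errors(sample))
```

Because the `amdgpu:` lines carry no timestamp of their own, the scanner reports the nearest preceding timestamped line as an approximation of when the context was lost.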

 

hardware_detection-63743664301.txt
ffmpeg-transcode-497706ce-ca92-459c-905c-332dcb82a349_1.txt
errored - ffmpeg-transcode-0f919787-cdc1-48af-8ae2-870cb580675f_1.txt

Posted

@Luke Had a quick question that is a little bit aside from what alucryd is working on here, but it's related, and is the original title of this topic.

From what I've read, consumer Nvidia GPUs such as the GTX 1080 are driver-limited to just a few HW transcode sessions (3, in fact). It was reported previously that if the GPU refuses a transcode session, Emby will fall back to software transcoding.

Unlike the Nvidia cards, my AMD GPU doesn't have such a low limit.

It will continue to allow new transcode jobs, although it can only comfortably handle 3 x 4K HEVC -> h264 transcodes at once.

Add any more HW transcodes after this point, and playback performance degrades significantly, leaving video no longer watchable with any fluidity.

 

I do however have untapped performance available in the form of a Ryzen CPU, which is sat there idling, and most certainly capable of handling a few more transcodes.

 

Is there any way to replicate the behaviour seen on Nvidia GPUs and set a hard GPU transcoding limit for AMD cards, say a max of 3, and then force software transcoding for any further parallel transcodes?
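To make the requested behaviour concrete, here is a minimal sketch of the policy: hand out up to a fixed number of hardware slots, and route anything beyond that to software. This is purely illustrative and is not an existing Emby setting or API; the class name `HardwareSlotLimiter` and the `"hardware"`/`"software"` labels are made up for the example.

```python
import threading


class HardwareSlotLimiter:
    """Hand out at most `max_hw` hardware slots; callers beyond that get software."""

    def __init__(self, max_hw=3):
        self._slots = threading.BoundedSemaphore(max_hw)

    def acquire(self):
        """Return "hardware" if a slot is free, otherwise "software"."""
        if self._slots.acquire(blocking=False):
            return "hardware"
        return "software"

    def release(self, mode):
        """Give the slot back when a hardware transcode ends."""
        if mode == "hardware":
            self._slots.release()


if __name__ == "__main__":
    limiter = HardwareSlotLimiter(max_hw=3)
    modes = [limiter.acquire() for _ in range(5)]
    print(modes)              # the first three get hardware, the rest software
    limiter.release(modes[0])
    print(limiter.acquire())  # a freed slot is handed out again
```

The non-blocking acquire is the key design choice: a fourth session is never made to wait for the GPU, it simply starts on the CPU straight away.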

  • 2 months later...
Posted

@alucryd This thread went very quiet, understandably so given the time of year, so I just wanted to get back in touch with an update.

I've been happily using the beta Docker from the point you added the AMD drivers back in December.

On 15/12/2020 at 18:46, alucryd said:

However, I just pushed a new beta version that includes the mesa drivers (r600 and radeonsi), and a fully upgraded vaapi stack, could you give it a try?

It seems to have been running really quite well. I've only had a single occasion when a user switched bitrate and the entire docker stack locked up, but I'm not too worried about it, and it's probably not anything to do with your work specifically. It could be unique to my hardware combination, or down to the infancy of GPU driver inclusion in Unraid.

The transcoding performance is admirable. In general I'm seeing around 65 fps when transcoding 4k h265 sources (varying bitrates) to h264 output.

It handles 2-3 transcodes at this level before starting to get bogged down, which is nice. I still wish there was a way in Emby to limit the maximum number of hardware transcodes and then fall back to the CPU, though, rather than GPU transcodes just grinding until they can no longer keep up with playback.

So good work, and thank you for everything you did and all the help provided!

I've been hesitant to update my docker container since then, as I didn't know what might have changed, or if these drivers made it back into the stable branch, etc...

Is there a way to keep up to date with any changes to the docker builds, like a changelog?

On 15/12/2020 at 18:46, alucryd said:

I will also soon expose an env variable named LIBVA_DRIVER_NAME which will let you override which driver is used. Since libva may incorrectly default to r600, it can be useful to force radeonsi.

This never made it into the docker hub documentation, so I wasn't sure whether it had been added, and I haven't been able to test it properly.

Also, did your Radeon GPU ever arrive? I hope it didn't get lost in all the Christmas deliveries! 👍

Thanks again dude!

  • 2 weeks later...
Posted
Quote

Is there a way to keep up to date with any changes to the docker builds, like a changelog?

Hi @flexage, I would suggest following the general Emby Server release notes, whether that's the blog for the stable channel or the testing area for the beta channel, if that's what you're using: 

 

@alucryd has been out for the last couple weeks but hopefully he will be able to respond soon. Thanks.

flexage
Posted

Hi @Luke, thanks for the info. I had a look through those again and found no mention of AMD driver support for docker.

It's just that I haven't upgraded my docker container since December, due to not knowing what may have changed, since I've never seen mention of it in the changelogs, and it seems totally undocumented at this point in time (it's new and I totally get it).

No biggie though, I'm happy to sit here until @alucryd is back and available to allay my concerns 👍

alucryd
Posted

@flexage I'm back :) I was truly out, so the docker image hasn't changed much, apart from better layering, which should reduce the size of subsequent pulls. The next big change will be the addition of OpenCL, with at least a runtime for Intel, maybe AMD (if the OCL filter is better than its VAAPI counterpart), for tone mapping of HDR content.

In any case it shouldn't impact VAAPI and the only changes you should expect for a while will be driver updates, so it should only get better with newer drivers (provided they have no regressions).

I'll try to work on automated changelogs for the packaging side and maybe have them added to the release notes (or a separate one) when I find the time.
