
Server not using scale_cuda for 4k HEVC


terahz


Hi,

 

I just want to report that for some HEVC files, Emby doesn't use full CUDA acceleration and thus can't keep up when transcoding a single stream from 4K HEVC to 1080p H.264.

 

Example:

4K h264 (h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 82970 kb/s, Level 51, 30 fps, 30 tbr, 30 tbn, 60 tbc (default)) to 1080 h264 - about 150fps (h264_to_h264.txt.gz)

 

/opt/emby-server/bin/ffmpeg -hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda  -c:v h264_cuvid  -f mp4 -i file:"/nfs/testfile.mp4" -threads 0 -map 0:0 -map 0:1 -sn -c:v:0 h264_nvenc -filter_complex "[0:0]scale_cuda=w=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:h=trunc(ow/dar/2)*2"  -b:v:0 14680001 -maxrate 14680001 -bufsize 29360002 -profile:v:0 high -g:v:0 90 -keyint_min:v:0 90 -sc_threshold:v:0 0  -copyts -vsync -1 -codec:a:0 copy -disposition:a:0 default -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_write_temp 1 -segment_list_type m3u8 -segment_start_number 0 -segment_list "/tmp/transcoding-temp/fbc2e74a36a640812fcb214d61147d5d.m3u8" -y "/tmp/transcoding-temp/fbc2e74a36a640812fcb214d61147d5d%d.ts"

 

4k HEVC (hevc, none, 3840x2160, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn (default)) to 1080 h264 - about 15fps (hevc_to_h264.txt.gz)

 

/opt/emby-server/bin/ffmpeg -ss 00:41:45.000 -c:v hevc_cuvid -f matroska -i file:"/nfs/testfile.mkv" -threads 0 -map 0:0 -map 0:3 -sn -c:v:0 h264_nvenc -filter_complex "[0:0]scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -pix_fmt yuv420p  -b:v:0 14616000 -maxrate 14616000 -bufsize 29232000 -profile:v:0 high -g:v:0 72 -keyint_min:v:0 72 -sc_threshold:v:0 0  -copyts -vsync -1 -codec:a:0 libmp3lame -metadata:s:a:0 language=eng -disposition:a:0 default -ac:a:0 2 -ab:a:0 192000 -af:a:0 "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_write_temp 1 -segment_list_type m3u8 -segment_start_number 835 -segment_list "/tmp/transcoding-temp/93b669574d545f9d56fbffcdd3abe839.m3u8" -y "/tmp/transcoding-temp/93b669574d545f9d56fbffcdd3abe839%d.ts"

 

Modifying the command to enable CUDA and scale_cuda like this results in a steady 80fps:

 

/opt/emby-server/bin/ffmpeg -ss 00:41:45.000 -hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda -c:v hevc_cuvid -f matroska -i file:"/nfs/testfile.mkv" -threads 0 -map 0:0 -map 0:3 -sn -c:v:0 h264_nvenc -filter_complex "scale_cuda=w=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:h=trunc(ow/dar/2)*2,hwdownload,format=p010le" -pix_fmt yuv420p  -b:v:0 14616000 -maxrate 14616000 -bufsize 29232000 -profile:v:0 high -g:v:0 72 -keyint_min:v:0 72 -sc_threshold:v:0 0  -copyts -vsync -1 -codec:a:0 libmp3lame -metadata:s:a:0 language=eng -disposition:a:0 default -ac:a:0 2 -ab:a:0 192000 -af:a:0 "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_write_temp 1 -segment_list_type m3u8 -segment_start_number 835 -segment_list "/tmp/transcoding-temp/93b669574d545f9d56fbffcdd3abe839.m3u8" -y "/tmp/transcoding-temp/93b669574d545f9d56fbffcdd3abe839%d.ts"
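For reference, here is what the scale expression in these commands works out to. This is only a minimal sketch: the function name `scaled_size` is made up for illustration, and it keeps DAR as a fraction so that plain integer shell arithmetic is enough. ffmpeg itself evaluates the expression in floating point.

```shell
#!/bin/sh
# scaled_size IW IH DAR_NUM DAR_DEN [MAX_W]   (hypothetical helper)
# Integer re-implementation of the expression used in the commands above:
#   w = trunc(min(max(iw, ih*dar), MAX_W) / 2) * 2
#   h = trunc(w / dar / 2) * 2
scaled_size() {
    iw=$1 ih=$2 dar_num=$3 dar_den=$4 max_w=${5:-1920}

    # ih*dar, with dar kept as dar_num/dar_den to stay in integers
    dar_w=$(( ih * dar_num / dar_den ))

    w=$iw
    if [ "$dar_w" -gt "$w" ]; then w=$dar_w; fi   # max(iw, ih*dar)
    if [ "$w" -gt "$max_w" ]; then w=$max_w; fi   # min(..., MAX_W)

    w=$(( w / 2 * 2 ))                            # round down to an even width
    h=$(( w * dar_den / dar_num / 2 * 2 ))        # trunc(w/dar/2)*2
    echo "${w}x${h}"
}

# The 4K source from this post: 3840x2160, DAR 16:9
scaled_size 3840 2160 16 9
```

For the 3840x2160 file this gives 1920x1080, and a source that is already narrower than 1920 keeps its width (rounded down to an even number).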

 

 

Emby version 4.2.1.0

CentOS 7

Nvidia drivers 440.26

Cuda 10.2

 

Also, thanks for the nice work so far! Based on my limited testing, I've already purchased emby premium! 

hevc_to_h264.txt.gz

h264_to_h264.txt.gz


@terahz - Handling color conversions is currently a weak point. When the source video is 10bit, we're avoiding hw scaling because we can't handle it properly.

 

I understand that one might think that it would be easy to fix - which is obviously true for that exact situation. But there are hundreds of different cases that we need to account for and that's where things are getting a bit more complex. Not unsolvable, but for historic reasons we are handling input, filtering and output more or less independently from each other, and that model isn't suited anymore for handling color conversions in combination with hw acceleration. That's why it is a huge step for us - but we're already working on it!

 

There's no doubt that the command line you're showing is doing much better than what Emby currently does. But it's still not the most desirable solution because it involves copying all video data back from GPU memory to system memory after hw scaling, then converting color format using CPU, and afterwards transferring video data to GPU memory again for encoding.

Ideally, the color conversion will happen in hardware to avoid the copying and cpu processing.
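To illustrate the difference, here is a rough sketch with fixed sizes for brevity. The first chain is the one from the modified command earlier in this thread. The second is an assumption: it requires an ffmpeg build whose scale_cuda filter has the `format` option, which not every build has, and it is not what Emby generates. It also only changes the pixel format and bit depth; it does no tone mapping. The helper names are made up, and the script only prints the filter strings.

```shell
#!/bin/sh
# Hypothetical helpers that only build filter strings; nothing is executed.

# Chain from this thread: scale on the GPU, copy frames back to system
# memory (hwdownload), then convert the pixel format on the CPU.
hybrid_chain() {
    echo "scale_cuda=w=$1:h=$2,hwdownload,format=p010le"
}

# Assumed alternative: let scale_cuda convert the pixel format as well,
# so frames stay in GPU memory all the way to h264_nvenc.
gpu_only_chain() {
    echo "scale_cuda=w=$1:h=$2:format=$3"
}

hybrid_chain 1920 1080
gpu_only_chain 1920 1080 yuv420p
```

With the second chain, the trailing `-pix_fmt yuv420p` from the original command should no longer be needed, since that option is what triggers the CPU-side conversion.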

 

We will get there - stay tuned!

Edited by softworkz

@softworkz, I completely understand. I'm sure you guys will figure it out. I can't even imagine the number of combinations you have to worry about when tuning ffmpeg.

 

I've seen some other threads about using resizing in the decoder. That yields the best performance for me and there is no need to use a scale filter. I hope that's still on the table, especially when someone selects a manual quality setting.
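What I mean by decoder resizing is roughly the following. It is a sketch only: it relies on the `-resize` option of the cuvid decoders, the file names and target size are placeholders, and the function just prints the command instead of running it. Whether a 10bit source still needs an extra pixel format conversion before h264_nvenc probably depends on the ffmpeg build and driver.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an ffmpeg command that downscales
# inside the cuvid decoder, so no scale filter is needed.
decoder_resize_cmd() {
    src=$1 dst=$2 size=$3
    echo "ffmpeg -hwaccel cuda -hwaccel_output_format cuda" \
         "-c:v hevc_cuvid -resize $size -i $src" \
         "-c:v h264_nvenc -c:a copy $dst"
}

# Placeholder file names; 1920x1080 matches the target used in this thread.
decoder_resize_cmd input.mkv output.ts 1920x1080
```

Note that `-resize` takes a fixed WxH, so the target size has to be computed before the command is built rather than inside a filter expression.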

 

Meanwhile, consider putting a checkbox in the Transcoding section of the settings for HW scaling of HEVC. As it is, those files are not watchable for me on anything that can't handle the direct stream. I'll take inferior quality/color over 10fps :) Also, I can't migrate to Emby until I can get my kids' videos playable by grandparents' tablets and computers. Half of them are done by a camera that shoots HEVC 10bit. My next step is to figure out how to tell Emby to reuse the Optimized Versions I've already generated using that other platform ;)

 

 

Also, thanks for the detailed and quick response. This right here is one of the main reasons I'm planning to switch to Emby. 


All I can say is that it's a product management decision. If it was about me, we would already have a few more options to configure. 

While I don't advocate having an abundant set of options that would require users to be transcoding experts, there are some that would be reasonable, in cases where Emby cannot automatically make the right decisions to adapt to a user's requirements. I hope we'll make some progress in that area soon.
