Slow hardware decode with HEVC and nVidia P4000


51 replies to this topic

#21 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 24 February 2018 - 03:21 PM

That option would mean too much troubleshooting for us. You can just replace the Emby executables if you want to. I'll try to incorporate the resize option.

#22 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 24 February 2018 - 03:25 PM

Thanks, I will be on the lookout for the resize option. I guess I will keep the P4000 at this point, since you're willing to look at the resize option as a solution for Nvidia hardware.



#23 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 24 February 2018 - 05:54 PM

Well, actually, based on your research it looks like it should be scale_cuda instead?



#24 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 24 February 2018 - 07:56 PM

@Luke

scale_cuda does work and works well; however, it would require the end user to compile FFmpeg with the CUDA SDK. resize works with the version of FFmpeg that is distributed with Emby today.

I think for the highest level of compatibility using resize will be advantageous to all users as doing your own compile is not for everyone.

If you want to take on both you could update the current Nvidia profile to support resize and add a new one that uses scale_cuda for advanced users that can replace the bundled version of FFmpeg.

I would be happy with either one or both; it just depends on what you want to take on.



Edited by mbnwa, 24 February 2018 - 07:58 PM.


#25 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 24 February 2018 - 08:27 PM

@Luke I updated the post on the last page to reflect everything in a single post; to sum it up, I will post the relevant parts here.

 

The FFmpeg build currently bundled with Emby supports the decoder option -resize WxH, width first, e.g. 1280x720 (with the removal of -vf autoscale):

C:\Users\$username\AppData\Roaming\MediaBrowser-Server\System\ffmpeg.exe -c:v hevc_cuvid -resize 1280x720 -i file:"S:\Media Storage\$videos\$media_file\$media_file (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -vf "scale=trunc(min(max(iw\,ih*dar)\,640)/2)*2:trunc(ow/dar/2)*2" -pix_fmt yuv420p -preset default -b:v 416000 -maxrate 416000 -bufsize 832000 -profile:v high -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\1355279e13b40fa96934820627bde999.m3u8" -y "T:\transcoding-temp\1355279e13b40fa96934820627bdeecf9999%d.ts"

 

The following two commands require a custom-compiled version of FFmpeg, because scale_cuda needs a build configured with --enable-cuda-sdk --enable-nonfree.

The updated command to use the autoscaler for HEVC would be the following:

C:\Users\$username\AppData\Roaming\MediaBrowser-Server\System\ffmpeg.exe -hwaccel cuvid -c:v hevc_cuvid  -i file:"S:\Media Storage\$videos\$media_file\$media_file (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -vf "scale_cuda=trunc(min(max(iw\,ih*dar)\,640)/2)*2:trunc(ow/dar/2)*2,hwdownload,format=p010le" -pix_fmt yuv420p -preset default -b:v 416000 -maxrate 416000 -bufsize 832000 -profile:v high -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\1355279e13b40fa96934820627bdeecf.m3u8" -y "T:\transcoding-temp\1355279e13b40fa96934820627bdeecf%d.ts"

 

The updated command to use the autoscaler for H264 (based on one of my tests) would be the following:

C:\Users\$username\AppData\Roaming\MediaBrowser-Server\System\ffmpeg.exe -hwaccel cuvid -c:v h264_cuvid  -i file:"S:\Media Storage\$videos\$media_file\$media_file.m4v" -threads 0 -map 0:1 -map 0:0 -map -0:s -codec:v:0 h264_nvenc -vf "scale_cuda=trunc(min(max(iw\,ih*dar)\,640)/2)*2:trunc(ow/dar/2)*2,hwdownload,format=nv12" -pix_fmt yuv420p -preset default -b:v 674565 -maxrate 674565 -bufsize 1349130 -profile:v high -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 copy -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\527943752781c5890fd839740c8b28db.m3u8" -y "T:\transcoding-temp\527943752781c5890fd839740c8b28db%d.ts"

 

In short, you have to tell the decoder to upload the frames to the GPU for processing, then during your autoscaler -vf call download the frames back into system RAM with hwdownload. You MUST also provide the format that was uploaded to the GPU. In this case, for the 4K source, it was p010le; for the m4v files that I use, I had to use format=nv12. (I assume the format will need to be chosen dynamically based on the source video.)
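
As a rough illustration of choosing the format dynamically, here is a minimal Python sketch that assembles the -vf string used in the two commands above. The function names are hypothetical, and the bit-depth mapping (8-bit to nv12, 10-bit to p010le) is an assumption drawn only from the two sources tested in this post; other sources may need other formats.

```python
# Hypothetical helpers: the names and the bit-depth mapping are assumptions
# based on the two test files in this post, not part of Emby or FFmpeg.

def hwdownload_format(bit_depth: int) -> str:
    """Pick the format= value that follows hwdownload for a given source bit depth."""
    if bit_depth == 8:
        return "nv12"      # used for the H264 m4v test
    if bit_depth == 10:
        return "p010le"    # used for the 4K HEVC test
    raise ValueError(f"no known download format for {bit_depth}-bit video")


def scale_cuda_filter(max_width: int, bit_depth: int) -> str:
    """Build the -vf value: scale on the GPU, then download to system RAM."""
    scale = (
        "scale_cuda=trunc(min(max(iw\\,ih*dar)\\," + str(max_width) + ")/2)*2"
        ":trunc(ow/dar/2)*2"
    )
    return f"{scale},hwdownload,format={hwdownload_format(bit_depth)}"


if __name__ == "__main__":
    print(scale_cuda_filter(640, 10))
    print(scale_cuda_filter(640, 8))
```

The bit depth would have to come from probing the source file first, which is not shown here.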

 

I do have to say that performance with this method (HEVC) is outstanding, even if it's a PITA to build. The results were as follows:

HEVC scale_cuda: frame= 8428 fps=138 q=35.0 Lsize=N/A time=00:05:51.72 bitrate=N/A speed=5.74x

HEVC resize: frame=  521 fps= 84 q=24.0 Lsize=N/A time=00:00:21.93 bitrate=N/A speed=3.53x

HEVC no optimization: frame=  128 fps= 23 q=23.0 size=N/A time=00:00:05.42 bitrate=N/A speed=0.958x

 

H264 scale_cuda: frame= 1857 fps=598 q=27.0 Lsize=N/A time=00:01:18.69 bitrate=N/A speed=25.3x

H264 resize: frame= 2478 fps=570 q=35.0 Lsize=N/A time=00:01:44.59 bitrate=N/A speed=24.1x

H264 no optimization: frame= 1216 fps=173 q=20.0 size=N/A time=00:00:51.98 bitrate=N/A speed=7.39x
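
To compare runs like these without reading the numbers by eye, a small Python sketch can pull the frame, fps and speed fields out of an FFmpeg progress line. The function name is hypothetical and the regular expressions only assume the field layout seen in the lines above.

```python
import re


def parse_progress(line: str) -> dict:
    """Extract frame count, fps and speed multiplier from one progress line."""
    frame = re.search(r"frame=\s*(\d+)", line)
    fps = re.search(r"fps=\s*([\d.]+)", line)
    speed = re.search(r"speed=\s*([\d.]+)x", line)
    if not (frame and fps and speed):
        raise ValueError("line does not look like an FFmpeg progress line")
    return {
        "frame": int(frame.group(1)),
        "fps": float(fps.group(1)),
        "speed": float(speed.group(1)),
    }


if __name__ == "__main__":
    cuda = parse_progress(
        "frame= 8428 fps=138 q=35.0 Lsize=N/A time=00:05:51.72 bitrate=N/A speed=5.74x"
    )
    base = parse_progress(
        "frame=  128 fps= 23 q=23.0 size=N/A time=00:00:05.42 bitrate=N/A speed=0.958x"
    )
    # Ratio of the two speed multipliers, i.e. how much faster scale_cuda ran.
    print(round(cuda["speed"] / base["speed"], 2))
```

Note that the runs above cover different durations, so the ratio is only a rough guide.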


Edited by mbnwa, 24 February 2018 - 10:37 PM.


#26 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 03:17 PM

Does resize support expressions similar to scale?



#27 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 03:28 PM

No, it strictly supports only WxH, no expressions. You basically need to run the calculation outside of the FFmpeg command to get the output WxH values and pass them to the -resize parameter at execution.
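
A minimal sketch of that outside calculation, in Python, mirroring the scale expression used in the commands earlier in the thread (trunc(min(max(iw,ih*dar),MAX)/2)*2 for width, trunc(ow/dar/2)*2 for height). The function name and the default square-pixel assumption (sar = 1) are illustrative; this is not how Emby itself computes the value.

```python
from fractions import Fraction


def resize_dims(iw: int, ih: int, max_width: int, sar: Fraction = Fraction(1)):
    """Return (width, height) for -resize, rounded down to even numbers.

    iw/ih are the source dimensions; sar is the sample aspect ratio
    (assumed 1:1 unless the caller knows otherwise).
    """
    dar = Fraction(iw, ih) * sar                      # display aspect ratio
    ow = int(Fraction(min(max(iw, ih * dar), max_width)) / 2) * 2
    oh = int(Fraction(ow) / dar / 2) * 2
    return ow, oh


if __name__ == "__main__":
    w, h = resize_dims(3840, 2160, 1280)
    print(f"-resize {w}x{h}")
```

Exact fractions are used instead of floats so the truncation step is not thrown off by rounding error.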


Edited by mbnwa, 25 February 2018 - 03:28 PM.


#28 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 04:16 PM

So resize can only be used when both the hw decoder and encoder are used together?



#29 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 04:17 PM

I can test that use case; however, I think resize would be used whenever the hardware decoder is selected. Since I have not tested that outright, let me set up a test case and I will report back.

 

A little off topic: I am working on HDR -> SDR conversion during transcode, since HDR is lost. Before I spend much time on that, do you already have a solution, or would it help if I did some groundwork and reported that back as well?


Edited by mbnwa, 25 February 2018 - 04:19 PM.


#30 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 04:28 PM

@Luke

 

ffmpeg.exe -c:v hevc_cuvid -resize 640x360 -i file:"S:\Media Storage\$Media\$Video\$Video (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -pix_fmt yuv420p -preset veryfast -crf 18 -maxrate 29616000 -bufsize 59232000 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+2))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3.m3u8" -y "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3%d.ts"

 

-resize works just fine with the software encoder while doing the decoding in hardware.

 

Hardware Decoding / Software Encoding (-vf autoscale present): frame= 1483 fps= 97 q=-1.0 Lsize=N/A time=00:01:02.11 bitrate=N/A speed=4.08x

 

Edit: Oh, now that's interesting... I did not remove the -vf and I got hardware decoding :/ with no errors from the software encoder. Let me rerun this without -vf.

 

Edit 2: Removing -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" still works as expected, with slightly less CPU usage. I assume it is no longer trying to scale at this point, since the frames were already scaled by the decoder.

 

ffmpeg.exe -c:v hevc_cuvid -resize 640x360 -i file:"S:\Media Storage\$Media\$Video\$Video (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264  -pix_fmt yuv420p -preset veryfast -crf 18 -maxrate 29616000 -bufsize 59232000 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+2))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3.m3u8" -y "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3%d.ts"

 

Hardware Decoder / Software Encoder (-vf autoscale removed): frame=  934 fps= 92 q=-1.0 Lsize=N/A time=00:00:39.16 bitrate=N/A speed=3.87x
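
The key detail in both commands is that -c:v hevc_cuvid and -resize are input options, so they have to appear before -i. A minimal Python sketch of assembling the start of such an argument list follows; the function name and its parameters are hypothetical, only flags already shown in this thread are used, and the remaining encoder, audio and segment options would still need to be appended by the caller.

```python
def build_decode_resize_args(
    input_path: str,
    width: int,
    height: int,
    decoder: str = "hevc_cuvid",
    encoder: str = "libx264",
) -> list:
    """Return the leading FFmpeg arguments for hardware decode with -resize.

    Decoder options go before -i so they apply to the input; no -vf scale
    is added because the decoder already delivers resized frames.
    """
    return [
        "ffmpeg",
        "-c:v", decoder,                    # input option: selects the cuvid decoder
        "-resize", f"{width}x{height}",     # input option: width x height
        "-i", input_path,
        "-codec:v:0", encoder,              # output option: software or nvenc encoder
        "-pix_fmt", "yuv420p",
    ]


if __name__ == "__main__":
    print(" ".join(build_decode_resize_args("input.mkv", 640, 360)))
```

Passing the list to a process launcher rather than joining it into one string avoids quoting problems with paths that contain spaces.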


Edited by mbnwa, 25 February 2018 - 04:35 PM.


#31 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 05:29 PM

Ok please try the next beta, thanks.



#32 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 05:35 PM

Luke said: "Ok please try the next beta, thanks."

 

 

Any thoughts on the HDR transcoding question? In my tests it does not look like something that can be done in software only. Even with hardware decoding the FPS are cut drastically; however, it at least does not present washed-out frames on SDR screens. For example, HDR -> SDR conversion using -resize drops the transcode speed to just above 1x (+/- 25fps), while CPU-based encode/decode HDR -> SDR yields around 7.2fps. If you did decide to implement HDR -> SDR conversion, I would say at this point it's something that only hardware transcoding folks will be able to use, due to the demands put on the system with software only.

 

 

I will install Emby on a test computer, as I cannot install the beta on the home server; the wife would kill me if something broke.



#33 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 05:37 PM

That's another topic for another day, thanks.



#34 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 05:38 PM

No worries, I will post that data in the testing area, as it's not a feature in the core code at this point anyway. I'll let you know when I get the beta set up on my staging server.



#35 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 06:33 PM

@Luke

 

I assume you want me to test the beta that will be after version: 3.3.0.1, correct?



#36 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 10:03 PM

It's up now in 3.3.0.2.



#37 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 10:50 PM

@Luke,

 

Note: This system has a GeForce GTX 1080 installed, not the Quadro P4000, so results will be different on my production server.

 

Thanks, that seems to work really, really well and does not require an externally compiled version of FFmpeg. Here are some results from testing at different resolutions during transcode (all HEVC 4K w/ TrueHD). It also looks like 1080p @ 4Mbps uses the same resolution as 720p @ 4Mbps.

 

Home network quality Auto: -resize 3840x2160 frame= 986 fps= 30 q=9.0 size=N/A time=00:00:41.28 bitrate=N/A speed=1.28x

Home network quality 1080p 30Mbps: -resize 1920x1080 frame= 1233 fps= 81 q=9.0 size=N/A time=00:00:51.57 bitrate=N/A speed=3.38x

Home network quality 1080p 4Mbps: -resize 1278x718 frame= 1263 fps=120 q=9.0 size=N/A time=00:00:52.92 bitrate=N/A speed=5.04x

Home network quality 720p 4Mbps: -resize 1278x718 frame= 1240 fps=111 q=9.0 size=N/A time=00:00:51.86 bitrate=N/A speed=4.63x

Home network quality 720p 2Mbps: -resize 720x404 frame= 1259 fps=140 q=10.0 size=N/A time=00:00:52.70 bitrate=N/A speed=5.85x

Home network quality 720p 1Mbps: -resize 640x360 frame= 1286 fps=147 q=19.0 size=N/A time=00:00:53.76 bitrate=N/A speed=6.14x

Home network quality 480p 720kbps: -resize 426x238 frame= 1270 fps=148 q=11.0 size=N/A time=00:00:53.06 bitrate=N/A speed=6.17x



#38 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 10:55 PM

Here are a few results from H264 / 5.1DD

 

Home network quality 720p 4Mbps: -resize 1280x534 frame= 1279 fps=233 q=10.0 size=N/A time=00:00:53.44 bitrate=N/A speed=9.72x

Home network quality 720p 1Mbps: -resize 638x266 frame= 1206 fps=268 q=11.0 size=N/A time=00:00:50.23 bitrate=N/A speed=11.2x



#39 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 157224 posts
  • Local time: 01:03 AM

Posted 25 February 2018 - 11:09 PM

Excellent, thanks. Sounds encouraging.



#40 mbnwa OFFLINE  

mbnwa

    Advanced Member

  • Alpha Testers
  • 478 posts
  • Local time: 10:03 PM

Posted 25 February 2018 - 11:11 PM

Agreed, thanks for putting this in the beta. I cannot fully test it, as my staging server does not have access to my TV tuner network, but I do not really expect any issues with the streams from the HDHR.





