Slow hardware decode with HEVC and nVidia P4000


mbnwa

No, it strictly supports only WxH, no expressions. You basically need to run the calculation outside of the FFmpeg command to get the output WxH values, then pass them to the -resize parameter at execution.
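
As a rough sketch of that outside calculation (not what Emby itself does): the Python below mirrors the scale expression used in the commands in this thread, trunc(min(max(iw,ih*dar),MAX)/2)*2:trunc(ow/dar/2)*2, and returns a WxH string. The function name cuvid_resize_arg and the square-pixel default for dar are my own assumptions.

```python
from fractions import Fraction
import math


def cuvid_resize_arg(in_w, in_h, max_w, dar=None):
    """Return a 'WxH' string for -resize (hypothetical helper).

    Mirrors trunc(min(max(iw,ih*dar),max_w)/2)*2 : trunc(ow/dar/2)*2.
    dar defaults to in_w/in_h, which assumes square pixels.
    """
    dar = Fraction(in_w, in_h) if dar is None else Fraction(dar)
    # Width: cap at max_w, then round down to an even number.
    out_w = math.trunc(min(max(in_w, in_h * dar), max_w) / 2) * 2
    # Height: derive from the display aspect ratio, also forced even.
    out_h = math.trunc(out_w / dar / 2) * 2
    return f"{out_w}x{out_h}"


print(cuvid_resize_arg(3840, 2160, 640))   # 640x360
print(cuvid_resize_arg(3840, 2160, 1920))  # 1920x1080
```

The returned string would then be placed after -resize, ahead of -i, in the same position as in the commands posted in this thread.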


I can test that use case. I think -resize would be used if the decoder is selected, but since I have not tested that outright, let me set up a test case and I will report back.

 

A little off topic: I am working on HDR -> SDR conversion during transcode, since the HDR is lost. Before I spend much time on that, do you already have a solution, or would it help if I did some groundwork and reported back on that as well?


@Luke

 

ffmpeg.exe -c:v hevc_cuvid -resize 640x360 -i file:"S:\Media Storage\$Media\$Video\$Video (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -pix_fmt yuv420p -preset veryfast -crf 18 -maxrate 29616000 -bufsize 59232000 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+2))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3.m3u8" -y "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3%d.ts"

 

-resize works just fine with the software encoder while doing the decoding in hardware.

 

Hardware Decoding / Software Encoding (-vf autoscale present): frame= 1483 fps= 97 q=-1.0 Lsize=N/A time=00:01:02.11 bitrate=N/A speed=4.08x

 

Edit: Oh, now that's interesting... I did not remove the -vf and still got hardware decoding :/ with no errors from the software encoder. Let me rerun this without -vf.

 

Edit 2: Removing -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" still works as expected, with slightly less CPU usage, presumably because the frames were already scaled by the decoder and there is nothing left to scale at this point.

 

ffmpeg.exe -c:v hevc_cuvid -resize 640x360 -i file:"S:\Media Storage\$Media\$Video\$Video (4K UHD).mkv" -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264  -pix_fmt yuv420p -preset veryfast -crf 18 -maxrate 29616000 -bufsize 59232000 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+2))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3.m3u8" -y "T:\transcoding-temp\93f4873ae545ed8ec9998881764689b3%d.ts"

 

Hardware Decoder / Software Encoder (-vf autoscale removed): frame=  934 fps= 92 q=-1.0 Lsize=N/A time=00:00:39.16 bitrate=N/A speed=3.87x


Ok please try the next beta, thanks.

 

 

Any thoughts on the HDR transcoding question? In my tests it does not look practical in software only. Even with hardware decoding the FPS drops drastically, although the output at least no longer looks washed out on SDR screens. For example, HDR -> SDR conversion with -resize cuts the transcode speed to just above 1x (around 25 fps), while HDR -> SDR with CPU-based decode and encode yields around 7.2 fps. If you do decide to implement HDR -> SDR conversion, I would say that for now only people using hardware transcoding will be able to use it, given the load a software-only pipeline puts on the system.
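
For reference, one commonly used software approach to HDR -> SDR in FFmpeg is a zscale + tonemap filter chain. This is not something tested in this thread: it assumes an FFmpeg build with libzimg (needed for zscale), and the function name, the hable operator and the 100-nit nominal peak are illustrative choices only. A minimal sketch that just builds the -vf string:

```python
def hdr_to_sdr_filter(tonemap="hable", nominal_peak=100):
    """Build a software HDR -> SDR filter chain (illustrative sketch)."""
    steps = [
        # Convert to linear light so tone mapping works on linear values.
        f"zscale=t=linear:npl={nominal_peak}",
        "format=gbrpf32le",
        # Move to BT.709 primaries, then compress the highlights.
        "zscale=p=bt709",
        f"tonemap=tonemap={tonemap}:desat=0",
        # Back to BT.709 transfer/matrix, limited range, 8-bit 4:2:0.
        "zscale=t=bt709:m=bt709:r=tv",
        "format=yuv420p",
    ]
    return ",".join(steps)


print(hdr_to_sdr_filter())
```

The intermediate 32-bit float format is one plausible reason a CPU-only pipeline is so slow at 4K, though I have not profiled it.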

 

 

I will install Emby on a test computer, as I cannot install the beta on the home server; the wife would kill me if something broke.


No worries, I will post that data in the testing area, as it's not a feature in the core code at this point anyway. I'll let you know when I get the beta set up on my staging server.


@Luke,

 

Note: This system has a GeForce GTX 1080 installed, not the Quadro P4000, so results will be different on my production server.

 

Thanks, that seems to work really well and does not require an externally compiled version of FFmpeg. Here are some results from testing at different resolutions during transcode (all sources HEVC 4K with TrueHD). It also looks like 1080p @ 4 Mbps uses the same resolution as 720p @ 4 Mbps.

 

Home network quality Auto: -resize 3840x2160 frame= 986 fps= 30 q=9.0 size=N/A time=00:00:41.28 bitrate=N/A speed=1.28x

Home network quality 1080p 30Mbps: -resize 1920x1080 frame= 1233 fps= 81 q=9.0 size=N/A time=00:00:51.57 bitrate=N/A speed=3.38x

Home network quality 1080p 4Mbps: -resize 1278x718 frame= 1263 fps=120 q=9.0 size=N/A time=00:00:52.92 bitrate=N/A speed=5.04x

Home network quality 720p 4Mbps: -resize 1278x718 frame= 1240 fps=111 q=9.0 size=N/A time=00:00:51.86 bitrate=N/A speed=4.63x

Home network quality 720p 2Mbps: -resize 720x404 frame= 1259 fps=140 q=10.0 size=N/A time=00:00:52.70 bitrate=N/A speed=5.85x

Home network quality 720p 1Mbps: -resize 640x360 frame= 1286 fps=147 q=19.0 size=N/A time=00:00:53.76 bitrate=N/A speed=6.14x

Home network quality 480p 720kbps: -resize 426x238 frame= 1270 fps=148 q=11.0 size=N/A time=00:00:53.06 bitrate=N/A speed=6.17x
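
To compare runs like the ones above, the progress lines can be parsed and sanity-checked: the reported speed should be roughly the encode fps divided by the source frame rate, and the source frame rate can be estimated as frames divided by the media time covered. A small sketch (parse_progress is a hypothetical helper name, not part of FFmpeg or Emby):

```python
import re


def parse_progress(line):
    """Parse one FFmpeg progress line into numbers (hypothetical helper)."""
    # Collect key=value pairs; FFmpeg pads some values with spaces.
    fields = dict(re.findall(r"(\w+)=\s*(\S+)", line))
    h, m, s = fields["time"].split(":")
    seconds = int(h) * 3600 + int(m) * 60 + float(s)
    return {
        "frames": int(fields["frame"]),
        "fps": float(fields["fps"]),
        "seconds": seconds,
        "speed": float(fields["speed"].rstrip("x")),
    }


line = "frame= 1233 fps= 81 q=9.0 size=N/A time=00:00:51.57 bitrate=N/A speed=3.38x"
p = parse_progress(line)
# Source frame rate estimated from frames encoded over media time covered.
source_fps = p["frames"] / p["seconds"]
print(round(source_fps, 2), round(p["fps"] / source_fps, 2), p["speed"])
```

For the sources in this thread the estimate lands close to 24 fps, so anything with speed above 1x is keeping up with real-time playback.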


Here are a few results from H264 / 5.1DD

 

Home network quality 720p 4Mbps: -resize 1280x534 frame= 1279 fps=233 q=10.0 size=N/A time=00:00:53.44 bitrate=N/A speed=9.72x

Home network quality 720p 1Mbps: -resize 638x266 frame= 1206 fps=268 q=11.0 size=N/A time=00:00:50.23 bitrate=N/A speed=11.2x


Agreed, thanks for putting this in the beta. I cannot fully test it, as my staging server does not have access to my TV tuner network, but I do not really expect any issues with the streams from the HDHR.


@Luke

 

I was having a discussion with another member who is having issues with HEVC CPU decoding/encoding and is looking for hardware alternatives. After reviewing their logs I noticed they are using subtitles. I generally do not use subtitles, so I never tested it. With the new beta, using subtitles results in an error inside Emby stating "Playback Error No compatible streams are currently available. Please try again later or contact your system administrator for details." The server log shows a 400 error, and no FFmpeg log is generated.

 

Parameter name: startIndex

     at System.String.Substring(Int32 startIndex, Int32 length)
     at MediaBrowser.Controller.MediaEncoding.EncodingHelper.GetGraphicalSubtitleParam(EncodingJobInfo state, EncodingOptions options, String outputVideoCodec)
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.GetVideoArguments(StreamState state, EncodingOptions encodingOptions)
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.GetCommandLineArguments(String outputPath, EncodingOptions encodingOptions, StreamState state, Boolean isEncoding)
     at Emby.Server.MediaEncoding.Api.BaseStreamingService.GetOutputFilePath(StreamState state, EncodingOptions encodingOptions, String outputFileExtension)
     at Emby.Server.MediaEncoding.Api.BaseStreamingService.<GetState>d__82.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.<GetMasterPlaylistInternal>d__25.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at Emby.Server.Implementations.Services.ServiceExecGeneral.<Execute>d__2.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
     at Emby.Server.Implementations.Services.ServiceHandler.<ProcessRequestAsync>d__15.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at Emby.Server.Implementations.HttpServer.HttpListenerHost.<RequestHandler>d__70.MoveNext()
    System.ArgumentOutOfRangeException
     at System.String.Substring(Int32 startIndex, Int32 length)
     at MediaBrowser.Controller.MediaEncoding.EncodingHelper.GetGraphicalSubtitleParam(EncodingJobInfo state, EncodingOptions options, String outputVideoCodec)
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.GetVideoArguments(StreamState state, EncodingOptions encodingOptions)
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.GetCommandLineArguments(String outputPath, EncodingOptions encodingOptions, StreamState state, Boolean isEncoding)
     at Emby.Server.MediaEncoding.Api.BaseStreamingService.GetOutputFilePath(StreamState state, EncodingOptions encodingOptions, String outputFileExtension)
     at Emby.Server.MediaEncoding.Api.BaseStreamingService.<GetState>d__82.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
     at Emby.Server.MediaEncoding.Api.Hls.DynamicHlsService.<GetMasterPlaylistInternal>d__25.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at Emby.Server.Implementations.Services.ServiceExecGeneral.<Execute>d__2.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
     at Emby.Server.Implementations.Services.ServiceHandler.<ProcessRequestAsync>d__15.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
     at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
     at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
     at Emby.Server.Implementations.HttpServer.HttpListenerHost.<RequestHandler>d__70.MoveNext()


 

The following two commands would require custom compiled versions of FFMpeg due to the inclusion of --enable-cuda-sdk --enable-nonfree

 

 

As far as I know (and I believe I know this better than most), we no longer need to enable flags in the build options, because they are included in FFmpeg by default.

When I build FFmpeg I follow some simple steps (speaking about NVENC support):

1-install dependencies

2-install all libs for audio, etc.

3-install the full CUDA package from the run file

4-link the paths

5-compile FFmpeg with no flags regarding CUDA/NVENC; the only exception is --enable-libnpp, which lets us use the potential of that filter.

Doing it that way, there is no need to enable flags anymore!

To fully transcode using NVENC (everything on the GPU) we need to pass more parameters to the CUDA side before "-i", such as resize, deinterlace, etc.

We can also mix them to use GPU + CPU, and optionally use an NPP filter!

PS: we no longer need to download the CUDA SDK and copy all the libs.

Used for testing: own system build with i7 7700K + 16 GB DDR4 RAM + P4000/P400/GTX 16xx


Edited by libnc

The updated command in order to use autoscaler for H264 (based on one of my tests) would be the following:


C:\Users\$username\AppData\Roaming\MediaBrowser-Server\System\ffmpeg.exe -hwaccel cuvid -c:v h264_cuvid  -i file:"S:\Media Storage\$videos\$media_file\$media_file.m4v" -threads 0 -map 0:1 -map 0:0 -map -0:s -codec:v:0 h264_nvenc -vf "scale_cuda=trunc(min(max(iw\,ih*dar)\,640)/2)*2:trunc(ow/dar/2)*2,hwdownload,format=nv12" -pix_fmt yuv420p -preset default -b:v 674565 -maxrate 674565 -bufsize 1349130 -profile:v high -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 copy -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "T:\transcoding-temp\527943752781c5890fd839740c8b28db.m3u8" -y "T:\transcoding-temp\527943752781c5890fd839740c8b28db%d.ts"

 

To DOWNLOAD something, we first need to UPLOAD it.

So the "hwdownload" and "format" part makes no sense here!


 

 


--enable-libnpp will not compile without the CUDA SDK. In addition, it falls under --enable-nonfree, so the binary could not be distributed under the license. libnpp did not support 10-bit formats as of my latest testing, and scale_cuda would have been a good option if it could be distributed.
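
A quick way to see which of these flags a given binary was built with is to read the "configuration:" line that `ffmpeg -version` prints. A minimal sketch, assuming that output has been captured into a string; configure_flags is a hypothetical helper name and the sample text is made up for illustration:

```python
import re


def configure_flags(version_output):
    """Return the set of configure flags from `ffmpeg -version` output."""
    match = re.search(r"^\s*configuration:(.*)$", version_output, re.MULTILINE)
    return set(match.group(1).split()) if match else set()


# Made-up sample output for illustration, not from a real build.
sample = (
    "ffmpeg version N-00000-gxxxxxxx\n"
    "configuration: --enable-gpl --enable-nonfree --enable-libnpp\n"
)
flags = configure_flags(sample)
# A build configured with --enable-nonfree is not redistributable.
print("--enable-nonfree" in flags, "--enable-libnpp" in flags)
```

In practice the input could come from subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout.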



 

The below is based off ffmpeg version N-90148-g0419623cdc

 

The upload happens when you use -hwaccel cuvid. If you remove ,hwdownload,format=nv12 the command fails with the following:

Impossible to convert between the formats supported by the filter 'Parsed_scale_cuda_0' and the filter 'auto_scaler_0'

Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:1
 
The only time this command completes is when you also include hwdownload and the format of the hardware frame, i.e. ,hwdownload,format=nv12
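
My reading of the error above is that scale_cuda hands over frames that are still in GPU memory, while the auto-inserted software scaler (auto_scaler_0) expects frames in system memory, so the chain has to download them first. A sketch of building that filter string; cuda_scale_filter and its parameters are hypothetical names, and the expression is copied from the command above:

```python
def cuda_scale_filter(max_width, download=True, sw_format="nv12"):
    """Build the -vf value used with -hwaccel cuvid (illustrative sketch)."""
    # Same autoscale expression as in the thread, with the width cap variable.
    scale = (
        f"scale_cuda=trunc(min(max(iw\\,ih*dar)\\,{max_width})/2)*2"
        ":trunc(ow/dar/2)*2"
    )
    parts = [scale]
    if download:
        # Copy frames from GPU to system memory and declare their format,
        # so that later software filters can accept them.
        parts += ["hwdownload", f"format={sw_format}"]
    return ",".join(parts)


print(cuda_scale_filter(640))
```

With download=False the string matches the variant that failed in my test.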

  • 1 month later...
sirfragalot

Sorry to revive an old(ish) thread, but I recently purchased a GTX 1060 to do hardware transcoding on my media server (I have an old AMD 6800K APU that handles everything bar HEVC / H.265 stuff).

 

I am still struggling to get higher than ~10-20fps using NVENC on HEVC Main 10 video...

 

 

Using the parameters from the transcode log I can see it's using NVENC OK, but it's unwatchable:

 

frame=  173 fps= 14 q=10.0 size=N/A time=00:00:07.46 bitrate=N/A speed=0.615x
I have had a read of this thread and ran the ffmpeg commands manually with -resize, and it achieves ~35-50+ fps (both with the Emby-shipped ffmpeg and the latest from Git):
 
-resize 1920x1080: frame= 1588 fps= 36 q=43.0 size=N/A time=00:01:06.48 bitrate=N/A speed= 1.5x

-resize 1280x720: frame=  467 fps= 49 q=29.0 size=N/A time=00:00:19.70 bitrate=N/A speed=2.05x
I have not tried the scale_cuda option, as I do not have a custom-compiled version of ffmpeg to try it with.
 
My question is a two-parter:
 
a) Is it possible to set Emby (stable) to use the -resize option by default?
b) If the answer to (a) is yes, where can I get a build of ffmpeg compiled with scale_cuda? (Even if it's no, I would be interested in testing.)
 
I am running Emby Server 3.3.1.0 on Windows Server 2016.
 
Thanks in advance for any advice given.

mbnwa

I am going to try to answer this, though I might be out of line and @Luke can correct me: the updated FFmpeg commands will be in the next stable release of Emby. They are currently in the beta branch and should help with the performance issues you are seeing.

 

scale_cuda requires FFmpeg compiled with --enable-nonfree, and because of that it cannot be distributed; you would need to compile FFmpeg yourself against your local CUDA installation.


sirfragalot

Thanks for the swift reply...I can confirm that the beta has resolved my HEVC transcoding issues.

 

 

Thank you very much.

