HEVC / x265 quicksync / hardware transcoding not working?


Devdroid

Well, we can't allow a Windows service to be the reason that we can't use dxva2. If that's the only reason, then we should just go ahead and use it, and use qsv when running as a service.


It's more the context (for lack of a better word; really, access to an interactive desktop is the delimiter) than it actually being a service.  I suppose you could give it another go, but I'd increase the loglevel in Emby's ffmpeg call during local testing if you run into any failures like before.  If there is an issue with it not having access to an interactive desktop (big if, but who knows), it would be impossible to test from the command line and info-level just wouldn't show enough info to tell what happened.  Of course, rolling out a build with higher loglevel would impact performance, so a temporary "just for debugging" loglevel toggle available only on test builds may be a good idea.

 

Honestly, I'm not even sure if we ever conclusively determined what the issue ended up being before, but d3d9's limitations could've been part of it (my memory's hazy on it and I haven't looked back at the logs from before).  Worth another try, I guess, with a more thorough examination of failures.  Lots of things have changed in ffmpeg, so maybe we'll find out that the prior issues were just bugs.


Maybe if you can compile a test version, I can help with the testing too. I hope this gets resolved soon.

Edited by aahmyu

@Waldonnis, do you think we ought to have the qsv selection utilize dxva, or just break out dxva into a separate option?

@Luke

 

Tough call, but for strange reasons that are really tough to explain...and I'm not sure if it really matters in the end.  dxva2 would actually work for nVidia as well, and is one of the few ways to use hardware decoding on AMD GPUs with Windows (pretty sure d3d11va would work as well for AMD, but no capacity to test that here and it would only work on Win8+).  There are some corner cases where I can see dxva2 (or d3d11va) not being a great idea, but none of those apply here because of how Emby transcodes (read: not entirely done in hardware since frames are copied back to system memory for filtering).

 

If you do go the dxva2 hwaccel route, you should be able to remove the input decoder specification option entirely since DS/d3d should pick the device that has registered a decoder for a given codec (if all are capable, then it picks the "primary display device" in my experience).  If you really wanted to target a specific device's decoder, you could fall back to using the input decoder options, but I can't think of many reasons for doing so considering how Emby transcodes.

ffmpeg -hwaccel dxva2 -i input.mkv -map v -c:v rawvideo -f NULL -

The above works well, with decoding clearly being done on my primary GPU when I check CPU/GPU usage (the nVidia card in my case, but if my iGPU was primary or the only device, it should be used instead).  I'd say that since dxva2 is vendor-agnostic, you can probably offer it as a codec/vendor-independent hardware decoding option in place of the existing toggles that control the current vendor-specific input decoder options.  Obviously, all of this only applies to Windows (we all know this, but worth noting anyway).  Also worth noting that even though the above uses my nVidia card for decoding, I can use the QuickSync encoder for the output with no problems (monitor shows both GPUs' video blocks in use).
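For anyone who wants to script the test above, here is a minimal Python sketch that assembles the same kind of command line. `build_dxva2_command` is a hypothetical helper (not part of ffmpeg or Emby), and `input.mkv` is just the placeholder file name from the command above. It only uses options already mentioned in this thread and leaves out the input decoder option, so the hwaccel picks the device.

```python
import shlex


def build_dxva2_command(input_path, encoder="rawvideo", ffmpeg="ffmpeg"):
    """Return an ffmpeg argument list that decodes through the dxva2 hwaccel.

    Hypothetical helper for illustration. No input decoder (-c:v before -i)
    is given, so device selection is left to the hwaccel.
    """
    return [
        ffmpeg,
        "-hwaccel", "dxva2",  # vendor-agnostic hardware decoding (Windows only)
        "-i", input_path,
        "-map", "v",          # video streams only
        "-c:v", encoder,      # "rawvideo" for a decode test, or e.g. "h264_qsv"
        "-f", "null", "-",    # discard the output, this is only a test run
    ]


decode_test = build_dxva2_command("input.mkv")
qsv_encode_test = build_dxva2_command("input.mkv", encoder="h264_qsv")

# Print the commands so they can be pasted into a terminal.
print(shlex.join(decode_test))
print(shlex.join(qsv_encode_test))
```

Passing `encoder="h264_qsv"` corresponds to the mixed case described above, where one GPU decodes and the QuickSync encoder handles the output. Whether that combination works on a given machine is something to verify locally.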


Great, thanks for the info.

Thanks for implementing the dxva. I tested today and it's working fine. But can I suggest that you change the name in the setting to "Windows dxva" or something? I'm not sure how the dxva is related to AMD.

Edited by aahmyu

  • 3 weeks later...

Hello @Luke, I know we were done with this thread after you enabled dxva, but as you know the qsv option is broken. I ran some tests today, and it turns out you need to add "-load_plugin hevc_hw" before the input.

For example:

.\ffmpeg.exe -c:v hevc_qsv -load_plugin hevc_hw -i input.mkv -threads 0 -map 0:0 -map 0:1 -map -0:s -c:v h264_qsv -vf "scale=trunc(min(max(iw\,ih*dar)\,1280)/2)*2:trunc(ow/dar/2)*2" -preset 7 -look_ahead 0 -b:v 3153540 -maxrate 3153540 -bufsize 6307080 -profile:v high -level 4.1 test.mkv

I copied this command from the Emby logs; adding the load_plugin option fixed the issue, and I got around 200 fps.
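As a side note on the `scale` expression in that command: it caps the output width at 1280, never upscales, and rounds both dimensions down to even numbers. The arithmetic can be worked through in a few lines of Python. This is only a sketch: `scaled_size` is a made-up name, and it uses exact fractions, whereas ffmpeg evaluates the expression in floating point, so edge cases may round differently.

```python
import math
from fractions import Fraction


def scaled_size(iw, ih, dar, max_width=1280):
    """Mirror scale=trunc(min(max(iw,ih*dar),1280)/2)*2:trunc(ow/dar/2)*2.

    iw, ih: input width and height in pixels.
    dar:    input display aspect ratio (for example Fraction(16, 9)).
    """
    dar = Fraction(dar)
    # Width: the display width, capped at max_width, rounded down to even.
    ow = math.trunc(Fraction(min(max(iw, ih * dar), max_width)) / 2) * 2
    # Height: derived from the output width and the DAR, rounded down to even.
    oh = math.trunc(Fraction(ow) / dar / 2) * 2
    return ow, oh


print(scaled_size(1920, 1080, Fraction(16, 9)))  # (1280, 720)
print(scaled_size(3840, 1920, 2))                # (1280, 640)
print(scaled_size(640, 480, Fraction(4, 3)))     # (640, 480)
```

So with this filter, a 1080p or 4K source is downscaled to 1280 pixels wide before it reaches the encoder, while a source that is already narrower than 1280 keeps its size.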

 

"Moreover, hevc_qsv needs to add "-load_plugin" tags, try "-load_plugin hevc_hw" or "-load_plugin 2". At intel's forums there're many threads that get into detail about it (my GPU doesn't support HEVC so I know very little about it)."

 

for more info: https://ffmpeg.zeranoe.com/forum/viewtopic.php?t=5204#p13249

 

Edit: the only thing I'm not sure of is why the hell it uses the CPU too. Kaby Lake is supposed to encode/decode in hardware, but the CPU is still being used.

Edited by aahmyu

It'll always use the CPU to some degree, since you still need to demux the video stream from the container and copy it across the bus to VRAM.  Since Emby does filtering operations that the GPU hardware cannot do, decoded frames need to be copied back to system memory so the filterchain can be applied, then recopied back to VRAM for encoding...so you're still looking at a decent degree of CPU use (plus, consider other operations like audio transcoding, re-muxing, etc, which GPUs cannot do).  The load looks bad when you're just doing one stream, but where hardware decoding shines is when you're dealing with more than one stream at a time - if you did two simultaneous streams, the load isn't likely to be double that of a single stream (it's complicated).

 

You could do the entire video transcode within the GPU's pipeline (and it ends up being even faster), but the only filtering you could do is simple stuff like scaling and deinterlacing on most hardware.  If you wanted to overlay subtitles, for example, you'd have to go back to doing it the way Emby does it.  Bear in mind that hardware encoders are fixed-function, i.e. they are designed to do one thing very well, but aren't very flexible and don't support a lot of the operations that software like Emby would need (decoders are also fixed-function, but there's no real need/case for them to be flexible).

 

Thanks for the info. Yeah, we discussed that param in the past but ran into some problems. @Waldonnis, what do you think?

 

This seems to be a recurring problem with Kaby Lake and later with HEVC (particularly Main10).  I'm not sure what's going wrong with detection in the driver and/or wrapper that's causing the plug-in to not be auto-loaded, but if force loading the plugin works around that problem, then it probably should be done if dxva2 isn't being used (if it is, then there's no reason to specify the decoder).  Unfortunately, I don't have the hardware to check into it further.
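The rule described above (no decoder option when dxva2 is used, force-load the plug-in when the QSV HEVC decoder is selected directly) could be sketched like this. It is only an illustration of the idea, not how Emby builds its command line: `decoder_args` is a hypothetical function, and the option strings are the ones quoted earlier in this thread.

```python
def decoder_args(use_dxva2, codec):
    """Return the ffmpeg options to place before -i for hardware decoding.

    Hypothetical sketch covering only the HEVC case discussed in this thread.
    """
    if use_dxva2:
        # dxva2 picks the device itself, so no input decoder is specified.
        return ["-hwaccel", "dxva2"]
    if codec == "hevc":
        # Workaround reported above: force-load the HEVC hardware plug-in.
        return ["-c:v", "hevc_qsv", "-load_plugin", "hevc_hw"]
    # Anything else: add nothing and let ffmpeg choose its default decoder.
    return []


print(decoder_args(True, "hevc"))
print(decoder_args(False, "hevc"))
```

Whether `-load_plugin` is accepted depends on the ffmpeg build in use, so treat the second branch as something to test rather than a guaranteed fix.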


Yeah, then I would say let's just use the dxva2 param instead. I think that would be better.


  • 6 months later...

Hey, I have a slightly strange problem.

Emby is running on an Intel NUC. The hardware is an i3 with Intel HD 620 graphics. In the transcoding settings I set it up to use Intel Quick Sync first and dxva second.

Intel Quick Sync works fine for all H.265 files as long as they are not 4K. I have multiple H.265 1080p files that transcode through Intel Quick Sync without a problem.

As soon as I start a 4K video, transcoding falls back to software. If I disable Intel Quick Sync and use dxva only, it runs a bit better but still at a low frame rate (25-36 fps), while Intel Quick Sync does more than 200 fps with 1080p H.265.

Question 1: Why does it not use dxva when Intel Quick Sync fails?

Question 2: Why does 4K not work with Intel Quick Sync? The Intel spec sheet shows that the HD 620 supports 4K@60fps with HDR.

EDIT:

I found this in the ffmpeg log:

Affected codecs:
>>>>>> QuickSync Intel® HD Graphics 620 - H.265 (HEVC)
Adapter #0: 'Intel® HD Graphics 620' Id:22806 (Driver: 1572884.6559886, Vendor: 32902)
Frame Sizes: 16x16...16384x16384 - Width Alignment: 2 - Height Alignment: 2
Color Formats: NV12, P010LE, NV16
Profiles: Main Profile (Level 6.2 (Main)), Main 10 Profile (Level 6.2 (Main)), Main Still Picture (Level 6.2 (Main))

>>>>>> QuickSync Intel® HD Graphics 620 - H.264 (AVC)
Adapter #0: 'Intel® HD Graphics 620' Id:22806 (Driver: 1572884.6559886, Vendor: 32902)
Frame Sizes: 32x32...1920x1088 - Width Alignment: 2 - Height Alignment: 2
Color Formats: NV12, P010LE, QSV
Profiles: Baseline Profile (Level 5.2), Main Profile (Level 5.2), High Profile (Level 5.2), Constrained Baseline Profile (Level 5.2)

>>>>>> libx264 Software Encoder
Color Formats: YUV420P, YUVJ420P, YUV422P, YUVJ422P, YUV444P, YUVJ444P, NV12, NV16, NV21, YUV420P10LE, YUV422P10LE, YUV444P10LE, NV20LE
Profiles: Baseline Profile (Level 6.2), Main Profile (Level 6.2), High Profile (Level 6.2), High 10 Profile (Level 6.2), High 4:2:2 Profile (Level 6.2), High 4:4:4 Profile (Level 6.2)


>>>>>> FindVideoDecoder - MediaType: hevc, Mode: 2
Info FindVideoDecoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.265 (HEVC)' (Priority: 100)
Info FindVideoDecoder - Check successful - selecting 'QuickSync Intel® HD Graphics 620 - H.265 (HEVC)'

>>>>>> FindVideoEncoder - Media: h264, UseHardwareCodecs: True, Mode: 2
Info FindVideoEncoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.264 (AVC)' (Priority: 100)
NoMatch Frame width (3840) exceeds maximum supported width (1920)

NoMatch Encoder does not support input stream
NoMatch FindVideoEncoder - Encoder does not match
Info FindVideoEncoder - Checking: 'libx264 Software Encoder' (Priority: 0)
Info Encoder supports input stream
Info FindVideoEncoder - Check successful - selecting 'libx264 Software Encoder'

>>>>>> FindVideoDecoder - MediaType: hevc, Mode: 2
Info FindVideoDecoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.265 (HEVC)' (Priority: 100)
Info FindVideoDecoder - Check successful - selecting 'QuickSync Intel® HD Graphics 620 - H.265 (HEVC)'

>>>>>> FindVideoEncoder - Media: h264, UseHardwareCodecs: True, Mode: 2
Info FindVideoEncoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.264 (AVC)' (Priority: 100)
NoMatch Frame width (3840) exceeds maximum supported width (1920)
NoMatch Encoder does not support input stream
NoMatch FindVideoEncoder - Encoder does not match
Info FindVideoEncoder - Checking: 'libx264 Software Encoder' (Priority: 0)
Info Encoder supports input stream
Info FindVideoEncoder - Check successful - selecting 'libx264 Software Encoder'

 

But this is a bit strange, as the Intel HD 620 can do MPEG-4 H.264/AVC at resolutions up to 2160p.

A bit later in the file you can see:

 

Please use -profile:a or -profile:v, -profile is ambiguous
Stream mapping:
Stream #0:0 -> #0:0 (hevc (hevc_qsv) -> h264 (libx264))
Stream #0:1 -> #0:1 (ac3 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Impossible to convert between the formats supported by the filter 'Parsed_null_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
[libmp3lame @ 000001dd4aaf9200] 4 frames left in the queue on closing
Conversion failed!

 

And then it starts to use only software decoding and encoding, and my performance is really weak :)

 

Affected codecs:
>>>>>> libx264 Software Encoder
Color Formats: YUV420P, YUVJ420P, YUV422P, YUVJ422P, YUV444P, YUVJ444P, NV12, NV16, NV21, YUV420P10LE, YUV422P10LE, YUV444P10LE, NV20LE
Profiles: Baseline Profile (Level 6.2), Main Profile (Level 6.2), High Profile (Level 6.2), High 10 Profile (Level 6.2), High 4:2:2 Profile (Level 6.2), High 4:4:4 Profile (Level 6.2)

Info Previous transcoding attempt failed. Falling back to software transcoding.

>>>>>> FindVideoDecoder - MediaType: hevc, Mode: 2
Info FindVideoDecoder - Checking: 'Automatic software decoder' (Priority: 0)
Info FindVideoDecoder - Check successful - selecting 'Automatic software decoder'

>>>>>> FindVideoEncoder - Media: h264, UseHardwareCodecs: False, Mode: 2
Info FindVideoEncoder - Checking: 'libx264 Software Encoder' (Priority: 0)
Info Encoder supports input stream
Info FindVideoEncoder - Check successful - selecting 'libx264 Software Encoder'

 

When I select hardware decoding for H.265 via DXVA on the HD 620, I get the following:

 

Affected codecs:
>>>>>> DXVA2 Intel® HD Graphics 620 - H.265 (HEVC)
Adapter #0: 'Intel® HD Graphics 620' Id:22806 (Driver: 24.20.100.6286, Vendor: 32902)
Frame Sizes: max 8192x4320
Color Formats: NV12, P010LE
Profiles: Main Profile (Level 6 (Main)), Main 10 Profile (Level 6 (Main))

>>>>>> QuickSync Intel® HD Graphics 620 - H.264 (AVC)
Adapter #0: 'Intel® HD Graphics 620' Id:22806 (Driver: 1572884.6559886, Vendor: 32902)
Frame Sizes: 32x32...1920x1088 - Width Alignment: 2 - Height Alignment: 2
Color Formats: NV12, P010LE, QSV
Profiles: Baseline Profile (Level 5.2), Main Profile (Level 5.2), High Profile (Level 5.2), Constrained Baseline Profile (Level 5.2)

>>>>>> libx264 Software Encoder
Color Formats: YUV420P, YUVJ420P, YUV422P, YUVJ422P, YUV444P, YUVJ444P, NV12, NV16, NV21, YUV420P10LE, YUV422P10LE, YUV444P10LE, NV20LE
Profiles: Baseline Profile (Level 6.2), Main Profile (Level 6.2), High Profile (Level 6.2), High 10 Profile (Level 6.2), High 4:2:2 Profile (Level 6.2), High 4:4:4 Profile (Level 6.2)


>>>>>> FindVideoDecoder - MediaType: hevc, Mode: 2
Info FindVideoDecoder - Checking: 'DXVA2 Intel® HD Graphics 620 - H.265 (HEVC)' (Priority: 100)
Info FindVideoDecoder - Check successful - selecting 'DXVA2 Intel® HD Graphics 620 - H.265 (HEVC)'

>>>>>> FindVideoEncoder - Media: h264, UseHardwareCodecs: True, Mode: 2
Info FindVideoEncoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.264 (AVC)' (Priority: 100)
NoMatch Frame width (3840) exceeds maximum supported width (1920)
NoMatch Encoder does not support input stream
NoMatch FindVideoEncoder - Encoder does not match
Info FindVideoEncoder - Checking: 'libx264 Software Encoder' (Priority: 0)
Info Encoder supports input stream
Info FindVideoEncoder - Check successful - selecting 'libx264 Software Encoder'

>>>>>> FindVideoDecoder - MediaType: hevc, Mode: 2
Info FindVideoDecoder - Checking: 'DXVA2 Intel® HD Graphics 620 - H.265 (HEVC)' (Priority: 100)
Info FindVideoDecoder - Check successful - selecting 'DXVA2 Intel® HD Graphics 620 - H.265 (HEVC)'

>>>>>> FindVideoEncoder - Media: h264, UseHardwareCodecs: True, Mode: 2
Info FindVideoEncoder - Checking: 'QuickSync Intel® HD Graphics 620 - H.264 (AVC)' (Priority: 100)
NoMatch Frame width (3840) exceeds maximum supported width (1920)
NoMatch Encoder does not support input stream
NoMatch FindVideoEncoder - Encoder does not match
Info FindVideoEncoder - Checking: 'libx264 Software Encoder' (Priority: 0)
Info Encoder supports input stream
Info FindVideoEncoder - Check successful - selecting 'libx264 Software Encoder'

 

Please use -profile:a or -profile:v, -profile is ambiguous
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (ac3 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
[libx264 @ 000001716287c440] using SAR=1/1
[libx264 @ 000001716287c440] frame MB size (240x120) > level limit (8192)
[libx264 @ 000001716287c440] DPB size (4 frames, 115200 mbs) > level limit (1 frames, 32768 mbs)
[libx264 @ 000001716287c440] VBV buffer (77464) > level limit (62500)
[libx264 @ 000001716287c440] MB rate (690509) > level limit (245760)
[libx264 @ 000001716287c440] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 000001716287c440] profile Main, level 4.1, 4:2:0, 8-bit
[libx264 @ 000001716287c440] 264 - core 157 r2935 545de2f - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=1 deblock=1:0:0 analyse=0x1:0 me=dia subme=0 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=4 chroma_me=0 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=1 keyint=250 keyint_min=23 scenecut=40 intra_refresh=0 rc_lookahead=10 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=38732 vbv_bufsize=77464 crf_max=0.0 nal_hrd=none filler=0 ip_ratio=1.40 aq=1:1.00
[segment @ 0000017162c7f4c0] Opening 'C:\Users\ThePlayaa\AppData\Roaming\Emby-Server\programdata\transcoding-temp\9d9b2a57f754ca5a9c85e4d619c64ec90.ts' for writing
Output #0, segment, to 'C:\Users\ThePlayaa\AppData\Roaming\Emby-Server\programdata\transcoding-temp\9d9b2a57f754ca5a9c85e4d619c64ec9%d.ts':
Metadata:
encoder : Lavf58.12.100
Stream #0:0: Video: h264 (libx264), yuv420p, 3840x1920 [sAR 1:1 DAR 2:1], q=-1--1, 23.98 fps, 90k tbn, 23.98 tbc
Metadata:
encoder : Lavc58.18.100 libx264
Side data:
cpb: bitrate max/min/avg: 38732000/0/0 buffer size: 77464000 vbv_delay: -1
Stream #0:1(ger): Audio: mp3 (libmp3lame), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
encoder : Lavc58.18.100 libmp3lame
frame= 14 fps=0.0 q=0.0 size=N/A time=00:00:00.88 bitrate=N/A speed=1.74x
frame= 26 fps= 25 q=28.0 size=N/A time=00:00:01.39 bitrate=N/A speed=1.35x
frame= 38 fps= 23 q=28.0 size=N/A time=00:00:01.92 bitrate=N/A speed=1.18x

 

So I have hardware decoding but not hardware encoding. In both scenarios the log says that the input exceeds the maximum supported resolution for H.264 (AVC).

Any suggestions are welcome :)

Edited by Am0kSepp

Hi there @Am0kSepp, please try a different file that isn't 4K input, or lower the in-app quality setting to a 1080p value. Your CPU can't convert to 4K, so that is why it is using software.
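For readers following the log above, the fallback it shows (the hardware encoder is rejected because the 3840-pixel frame width exceeds its reported maximum of 1920, then libx264 is chosen at priority 0) can be sketched roughly as follows. This is a simplified illustration and not Emby's actual code: the `Encoder` class and `find_video_encoder` function are invented names, and only the numbers are taken from the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encoder:
    name: str
    priority: int
    max_width: Optional[int] = None   # None means no reported limit
    max_height: Optional[int] = None

    def supports(self, width, height):
        """True if the frame fits within the encoder's reported maximum size."""
        if self.max_width is not None and width > self.max_width:
            return False
        if self.max_height is not None and height > self.max_height:
            return False
        return True


def find_video_encoder(encoders, width, height):
    """Pick the highest-priority encoder that accepts the frame size."""
    for enc in sorted(encoders, key=lambda e: e.priority, reverse=True):
        if enc.supports(width, height):
            return enc
    return None


# Limits and priorities as reported in the log above.
encoders = [
    Encoder("QuickSync H.264 (AVC)", priority=100, max_width=1920, max_height=1088),
    Encoder("libx264 Software Encoder", priority=0),
]

print(find_video_encoder(encoders, 3840, 1920).name)  # libx264 Software Encoder
print(find_video_encoder(encoders, 1920, 1080).name)  # QuickSync H.264 (AVC)
```

In this sketch the check is made against the frame size handed to the encoder, which matches the `NoMatch Frame width (3840) exceeds maximum supported width (1920)` line, and it shows why a 1080p target would pass the same check.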


Can you confirm that the Intel Iris 640 can handle this? Or can you send me the ffmpeg CLI command so I can check it myself? Thanks.

Edited by Am0kSepp