Tone-mapping in transcoding HDR for playback on SDR screens??

May 5, 2021

No, only one. Tried again and this one gave more info for some reason. "Numerical Value out of range". I tried -0.2 to get more saturation, but this might not work as I thought it would; I saw someone in a previous post using negative values, so I thought it would work. It "works" with positive values, though I can't see any change no matter the value: 0.2, 1, 5, 50. What does this value represent?

ffmpeg-transcode-a86cf076-da9b-4cdc-b3d3-b5d043137522_1.txt

May 5, 2021

The behavior of those values is different for various implementations.

You need to try values like: 0.1, 0.01, 0.001, 0.0001

May 5, 2021

Ah I see, though setting it below 0.01 is not possible due to the number box using <input step="0.01">

May 5, 2021

I'm having some issues getting the tone mapping setting to show up. I've installed the diagnostic plug-in and I'm running with the following:

CPU: Intel(R) Celeron(R) G5920
GPU: Intel(R) UHD Graphics 610
Linux: 5.10.34-1-lts
Emby: 4.6.0.41 (Latest available on Arch User Repository)

I've made sure that I'm using the iHD driver but I get a blank screen under Tone Mapping Options:

image.png.75767486a6f7972b70c486f9fd0e879e.png

Edited May 5, 2021 by softworkz
Removed URL from image

May 5, 2021

@bweston6 - I removed the URL from your screenshot

Otherwise I'm not sure..could you try

a different browser
press F12 in the browser and look for errors in the console tab
do you see anything when you go to 'Diagnostic Options' or 'User Sessions'?

May 5, 2021

10 minutes ago, softworkz said:

@bweston6 - I removed the URL from your screenshot

Otherwise I'm not sure..could you try

a different browser

press F12 in the browser and look for errors in the console tab

do you see anything when you go to 'Diagnostic Options' or 'User Sessions'?

Thanks for the edit.

I just tried chromium and it is the same.
No errors/warnings in console.
User sessions shows fine.

This may be applicable from ffmpeg-transcode log:

>>>>>>  Affected codecs
Encoder VAAPI Intel Corporation Device 39848 - H.264 (AVC)
        Adapter #0: 'Intel Corporation Device 39848' Id:39848 (Driver: Intel iHD driver for Intel(R) Gen Graphics - 21.1.3 (), Vendor: Intel Corporation)
        Frame Sizes: 32x32...4096x4096
        Color Formats: NV12
        Profiles: Main Profile (Level 6), High Profile (Level 6), Constrained Baseline Profile (Level 6)

Decoder VAAPI Intel Corporation Device 39848 - H.265 (HEVC)
        Adapter #0: 'Intel Corporation Device 39848' Id:39848 (Driver: Intel iHD driver for Intel(R) Gen Graphics - 21.1.3 (), Vendor: Intel Corporation)
        Frame Sizes: max 8192x8192
        Color Formats: NV12, P010
        Profiles: Main Profile (Level 6.1), Main 10 Profile (Level 6.1)


>>>>>>  FindVideoEncoder - Media: h264, UseHardwareCodecs: True, Mode: Automatic
Info    Checking: 'VAAPI Intel Corporation Device 39848 - H.264 (AVC)'
Info    Check successful - selecting 'VAAPI Intel Corporation Device 39848 - H.264 (AVC)'

>>>>>>  FindVideoDecoder - MediaType: hevc, Mode: Automatic
Info    Checking: 'VAAPI Intel Corporation Device 39848 - H.265 (HEVC)'
Info    Check successful - selecting 'VAAPI Intel Corporation Device 39848 - H.265 (HEVC)'
Info    Tone Mapping would be desired, but hardware tone mapping is disabled

>>>>>>  Processing Plan
Info    Name                                        CanDoInHardware      WillDoInHardware     Reason                                  
Info    VAAPI Intel Corporation Device 39848 ... >> True                 True                 Hardware Codec                           
Info    VideoInput                               >> True                 True                 Same adapter (/dev/dri/renderD128), s... 
Info    Scaling                                  >> True                 True                                                          
Info    ToneMapping (when possible)              >> True                 True                                                          
Info    VideoOutput                              >> True                 True                 Hardware encoder                         
Info    VAAPI Intel Corporation Device 39848 ... >> True                 True                 Hardware Codec                           

>>>>>>  Projected Processing Formats
Info    Previous                HW-Context   Format       SW-Format           Size   Next
Info    hevc                 >> VAAPI        vaapi_vld    yuv420p10      3840x1754 >> scale_vaapi
Info    scale_vaapi          >> VAAPI        vaapi_vld    nv12            1280x584 >>

Edited May 5, 2021 by bweston6
Adding partial log

May 5, 2021

Could you please post the hardware detection log and the full ffmpeg log?

May 5, 2021

3 minutes ago, softworkz said:

Could you please post the hardware detection log and the full ffmpeg log?

Sure.

ffmpeg-transcode-b56a9367-e5d5-47bd-8ca7-ee91b284fb60_1.txt hardware_detection-63755846870.txt

May 5, 2021

Thanks. How did you install Emby - I mean, using which package?

May 5, 2021

Just now, softworkz said:

Thanks. How did you install Emby - I mean, using which package?

Here's the AUR link: https://aur.archlinux.org/packages/emby-server-beta/

May 5, 2021

OK. I've never seen this (but it's not my area anyway)

The point is: everything is working correctly! The ffmpeg that is included in that package is not capable of doing any of those tone mappings.

Is there any reason why you can't install the regular package for Arch Linux?
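For anyone who wants to check what a given ffmpeg build offers, here is a minimal sketch. It assumes the build lists its filters with `ffmpeg -filters` the way upstream ffmpeg does, and that tone mapping filters have names starting with `tonemap` (as upstream `tonemap`, `tonemap_opencl` and `tonemap_vaapi` do). The filter names in Emby's bundled build may differ, and the function names here are made up for illustration.

```python
import subprocess


def tonemap_filters(filters_output):
    """Return filter names starting with 'tonemap' from `ffmpeg -filters` text."""
    names = []
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like: "<flags> <name> <in>-><out> <description>"
        if len(parts) >= 3 and "->" in parts[2] and parts[1].startswith("tonemap"):
            names.append(parts[1])
    return names


def list_tonemap_filters(ffmpeg_path="ffmpeg"):
    """Run the given ffmpeg binary (the path is an assumption) and list tonemap filters."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=True,
    )
    return tonemap_filters(result.stdout)
```

Calling `list_tonemap_filters()` with the path of the ffmpeg binary that ships with the package should return an empty list if that build has no such filters.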

May 5, 2021

1 minute ago, softworkz said:

OK. I've never seen this (but it's not my area anyway)

The point is: everything is working correctly! The ffmpeg that is included in that package is not capable of doing any of those tone mappings.

Is there any reason why you can't install the regular package for Arch Linux?

I was under the impression that this feature is currently in the beta only. https://archlinux.org/packages/?name=emby-server is currently at 4.5.4.0-5.

May 5, 2021

I just checked back. In the case of Arch Linux you'll need to wait for the release, I'm afraid.

May 6, 2021

Just reading this for the first time. This will be great once it's ready. Really looking forward to it. Thanks for the hard work!

I run Emby server in a Docker container on my server. It has an AMD Threadripper 2950X for the CPU and no GPU currently. As of now, what kind of GPU would I need to allow for 5-6 simultaneous transcodes with 2-3 of those being tone mapped?

May 6, 2021

I won't make a recommendation but can say I run Emby Server on a 10 year old 1st gen i7 that doesn't even support Quick Sync. I have a GTX 1650 Turing GPU with the driver unlock and it will do what you're asking for.

So that will give you an idea. So anything Nvidia of that or better should get the job done for you.

May 14, 2021

On 2/19/2021 at 4:11 AM, rbjtech said:

I had assumed the limitation was GPU memory - with just less than 1Gb required per h/w transcode - so I'm surprised to see your 4Gb card managing 6.

Was this using CUVID as opposed to NVDEC which I understand uses less GPU memory ?

I hit 7.5Gb on my GTX 1070 (using NVDEC) - so didn't try to run #9 - perhaps I should have tried and see what it did .. hmmm.

https://www.elpamsoft.com/?p=Plex-Hardware-Transcoding

It would be interesting to find out what the per-transcode memory amount is, as by the looks of things Emby is using significantly less memory per transcode than the competition. According to the above link (under 'Streams for VRAM'), the competition is using 1.3Gb for a 4K (68Mbps) > 1080p (8Mbps) stream, but Emby must be using less than 1Gb for my results to stand.

@softworkz I'll dive into this a bit more myself, but are there any guidelines for GPU memory usage - presumably it's bitrate and resolution based but as emby is doing the TM 'internally' and not having to pass around data, the standard transcoding metrics still apply ? Also - I presume if you hit the memory restrictions on your GPU card, then it gracefully falls back to CPU ? Thanks !

@softworkz would you mind answering the above question?

Edited May 14, 2021 by cryzis

May 15, 2021

6 hours ago, cryzis said:

@softworkz would you mind answering the above question?

Memory "usage" is a subject of extreme complexity as soon as you dive into this a bit more deeply (the following is very simplified):

Operating systems typically manage memory in a way that hands out the memory applications request in a rather generous manner; in particular, when applications "return" memory, it is often left accounted to the application that was using it.
That behavior, which maximizes performance, changes only when there's "memory pressure", i.e. when there's no longer an abundant amount of memory available. In those cases, the OS does something called "trim", which effectively takes memory that is still accounted to certain processes (even though already deallocated) and makes it available again for use by other processes. On Windows Desktop OS versions, you can sometimes watch this 'trimming' by looking at the memory usage metrics of a certain application in perfmon, where the usage suddenly drops when you minimize an application that has been active and focused (it's due to 'foreground application boost'; on server OS variants the behavior is different).

The takeaway from this: those memory usage numbers don't really tell you what a process actually has allocated and needs to work at a certain point in time. They just tell you what's currently accounted to that process. Most likely that's not much different with memory management on GPU boards. When a certain process shows 1 GB memory, it _could_ be that it really needs that amount, but it could also be that it only needs 500 MB and there just hasn't been a reason yet for the GPU to trim it away.

The only way to get closer to the truth, is not to stop once you're close to maxing out the memory usage, but instead try to add more and see what happens.
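A minimal sketch for watching the numbers while adding sessions, assuming an NVIDIA card with `nvidia-smi` on the PATH. The function names are made up for illustration, and as said above, the reading only shows what is currently accounted, not what is actually needed.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (values in MiB) into (used, total) tuples, one per GPU."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of each GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` after starting each additional transcode gives one reading per session count; the interesting part is what happens once the card is nominally full.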

As for the memory requirements in a transcoding pipeline: Most video codecs use inter-frame compression, which means that there are dependencies between frames, and as such it's not possible to encode or decode those codecs just frame-by-frame. On both the encoding and the decoding side, you need to have memory available for a set of uncompressed frames.
How many depends on the codec and its encoding strategy. Not all frames need to be kept; only some serve as references for others.

Let's do some math: A 4k video frame has about 8M pixels. 10 bit per component (yuv) makes 30 bits, so you need 4 bytes per pixel. 4B * 8M = 32MB per frame. When we assume the decoder needs 4 frames and the encoder 8, that makes 384MB, but at this point I don't know enough about the implementation details. It could also be that the decoder needs 8 frames and the encoder 16, then we already got 768 MB, making a figure of 1 GB realistic.

But that's 4k to 4k. For 4k10bit to 1080 yuv420p, it's different: We have only 2M pixels and 2 bytes per pixel, makes 4MB per frame, so in that case, I wouldn't expect that 1 GB of memory would be really required.
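The arithmetic above can be written down as a small sketch. It uses the same rounded pixel counts and frame counts as the text (which are guesses, not implementation facts) and decimal megabytes. The last case, where the decoder holds 4k frames and the encoder holds 1080p frames, is an added assumption for illustration.

```python
MB = 1_000_000  # decimal megabytes, matching the rounded figures in the text


def frame_mb(pixels, bytes_per_pixel):
    """Size of one uncompressed frame in MB."""
    return pixels * bytes_per_pixel / MB


def pipeline_mb(decoder_frames, decoder_frame_mb, encoder_frames, encoder_frame_mb):
    """Memory for the frames held by the decoder plus those held by the encoder."""
    return decoder_frames * decoder_frame_mb + encoder_frames * encoder_frame_mb


frame_4k = frame_mb(8_000_000, 4)    # about 8M pixels, 4 bytes per pixel
frame_1080 = frame_mb(2_000_000, 2)  # about 2M pixels, 2 bytes per pixel

low = pipeline_mb(4, frame_4k, 8, frame_4k)       # decoder 4 frames, encoder 8
high = pipeline_mb(8, frame_4k, 16, frame_4k)     # decoder 8 frames, encoder 16
mixed = pipeline_mb(4, frame_4k, 8, frame_1080)   # assumed: 4k in, 1080p out

print(frame_4k, frame_1080, low, high, mixed)
```

With these inputs the 4k to 4k estimates come out at 384 MB and 768 MB, and the mixed case lands well below either.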

For the in-between processing of video frames, you can assume something like 2-4 full frames in memory to be required for each filter, so that doesn't make a big change, and that's also the reason why it doesn't make much difference whether it's with or without tone mapping. TM is more about computing resources than about memory.

Edited May 15, 2021 by softworkz

May 15, 2021

@softworkz awesome write up thank you for taking time to explain this in detail!

Looks like people will need to try to max out their sessions to see how far we can push the hardware.

May 15, 2021

Hi All, I'm considering moving away from PLEX because of a couple of reasons. Just to be sure, do I need a Premium subscription to test tone mapping? FYI running Windows 10 Pro on a 4790K + nVidia P1000. Cheers.

May 15, 2021

Just now, ProMace67 said:

Hi All, I'm considering moving away from PLEX because of a couple of reasons. Just to be sure, do I need a Premium subscription to test tone mapping? FYI running Windows 10 Pro on a 4790K + nVidia P1000. Cheers.

HW transcoding requires Emby Premiere. If you don't have Emby Premiere, you can only test SW transcoding.

May 15, 2021

1 minute ago, vdatanet said:

HW transcoding requires Emby Premiere. If you don't have Emby Premiere, you can only test SW transcoding.

Thanks, I confirmed that the server is doing software transcoding by checking Task Manager and GPU-Z. Specifically with regard to tone mapping, will I get the respective server configuration items when subscribing to Premiere? And to what extent can I expect tone mapping to be offloaded to my P1000?

May 15, 2021

Never mind, I decided to take the plunge and bought a lifetime sub. All options are now available. And from what I've tested so far, it's working brilliantly! This, in combination with the Live TV stuff that appears to be a hell of a lot more flexible than PLEX's offering, is making me a happy camper.

May 15, 2021

Happy camping.

May 15, 2021

Welcome. Let us know if we can help with anything else.

Thanks.

July 10, 2021

Scratch that...I think it's the original media with the problem.

Edited July 10, 2021 by jlficken
