
Tone-mapping in transcoding HDR for playback on SDR screens??


griffindodd
Go to solution Solved by Luke,


Very nice! To offer some stats for Linux if needed:
Running PopOS (Ubuntu) on an Intel NUC7i5 (Kaby Lake Iris Plus 640).

This setup seems to only be able to handle 1 HDR->SDR transcode at a time. (Not an issue for me)

With tone mapping ON, an animated movie with a low bitrate (4K HDR, ~12 Mb/s) transcoded to 1080p at 20 Mb/s:

Spoiler

tonemapon.PNG

With tone mapping OFF, an animated movie with a low bitrate (4K HDR, ~12 Mb/s) transcoded to 1080p at 20 Mb/s:

Spoiler

tonemapoff.PNG

With tone mapping ON, a "normal" movie with a higher bitrate (4K HDR, ~28 Mb/s) transcoded to 1080p at 20 Mb/s:

Spoiler

tonemaponhigh.PNG

With tone mapping OFF, a "normal" movie with a higher bitrate (4K HDR, ~28 Mb/s) transcoded to 1080p at 20 Mb/s:

Spoiler

tonemapoffhigh.PNG

For some reason these seem to behave very differently in terms of GPU usage.

 

EDIT: Put the images in spoilers

Edited by PontusN

@PontusN - I would need to see the ffmpeg logs to make some sense of it.

General note: The bitrate has no influence on tone mapping performance. Only the target video resolution is relevant here.
(The source and target bitrates do influence decoding and encoding performance, though.)

BTW: Do you see the QuickSync codecs detected? There's no tone mapping yet (will follow in 1 or 2 weeks), but I'm curious how the performance compares to VAAPI on various platforms (without tone mapping).


38 minutes ago, PontusN said:

For some reason these seem to behave very differently in terms of GPU usage.

Thanks for the logs. The difference is that the "men" get scaled down to 1080 (with and without TM) and the "cars" are kept at 4k.

I think 39 fps for a 4k-to-4k transcode with tone mapping is quite a good result.
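
To put rough numbers on the resolution point, here is a minimal Python sketch. It assumes that tone-mapping work scales with the number of pixels processed per second, which is a simplification and not a measured property of Emby or ffmpeg. The function and table names are made up for illustration.

```python
# Pixel-throughput sketch.
# ASSUMPTION: tone-mapping cost grows with pixels processed per second,
# not with the stream bitrate. This is an illustration, not a benchmark.

RESOLUTIONS = {
    "4k": (3840, 2160),
    "1080p": (1920, 1080),
}


def pixels_per_second(resolution: str, fps: float) -> float:
    """Return how many pixels per second must be processed at a given frame rate."""
    width, height = RESOLUTIONS[resolution]
    return width * height * fps


# 39 fps is the 4k-to-4k figure mentioned above.
rate_4k = pixels_per_second("4k", 39)
rate_1080p = pixels_per_second("1080p", 39)
ratio = rate_4k / rate_1080p

print(f"4K at 39 fps:    {rate_4k / 1e6:.1f} Mpixel/s")
print(f"1080p at 39 fps: {rate_1080p / 1e6:.1f} Mpixel/s")
print(f"4K carries {ratio:.0f}x the pixels of 1080p at the same frame rate")
```

At the same frame rate a 4K target carries four times the pixels of a 1080p target, which is consistent with a file kept at 4K loading the GPU very differently from one scaled down to 1080.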


32 minutes ago, softworkz said:

Yea, that's what I meant. 

OK, thanks, it might not be in this version yet.

Didn't quite understand, what might not be in this version yet?


32 minutes ago, PontusN said:

Didn't quite understand, what might not be in this version yet?

Out-of-the-box support for QuickSync or, more precisely, Intel MediaSDK.
(We have always had this, but it required some user effort to set it up.)

It is an alternative HWA layer for Intel CPUs which is more feature-rich than VAAPI (which, in turn, had a few advantages over QuickSync).


On a Tiger Lake test system, I recently saw a typical Emby transcode running 1.5x faster than with VAAPI, and I'm curious to see whether those figures hold up across different systems.

It's not yet included in the Linux package, but it probably will be in the next update.


First impressions are very good: playback starts very quickly and so does seeking, even when the target resolution is 4K. In that situation, purples and oranges cannot play. Very good!


On Nvidia hardware, the TM performance is exceptional imo. As shown below, during testing I had EIGHT simultaneous 4K HDR > 1080p SDR streams with TM running from an unlocked GTX 1070 (8 GB), a GPU from 2016!

A huge well done to @softworkz for this impressive achievement.

 

cpu.png

4clients.png

4moreclients.png

Edited by rbjtech

5 hours ago, rbjtech said:

On Nvidia hardware, the TM performance is exceptional imo. As shown below, during testing I had EIGHT simultaneous 4K HDR > 1080p SDR streams with TM running from an unlocked GTX 1070 (8 GB), a GPU from 2016!

A huge well done to @softworkz for this impressive achievement.

I get similar performance on my unlocked GTX 1650 (4 GB). I run out of proper clients to test on once I hit 6, and then have to start using Theater or web browsers "doubled up", which isn't the best way to test. But the fact that I have to do this to try to find a limit is great!

In the past, and based on experience with purple and orange tone mapping, I've ALWAYS advocated keeping separate 4K and 1080p versions of files. From a purist standpoint I still think that may be best, but softworkz has really thrown a monkey wrench (in a very good way) into this thinking for those with Nvidia GPUs who only need 4 or 5 transcodes (most family/friend installs of Emby). Once there is more control over the process from the client, keeping 2 versions of movies/shows is likely a thing of the past for most people.


1 hour ago, cayars said:

I get similar performance on my unlocked GTX 1650 (4 GB). I run out of proper clients to test on once I hit 6, and then have to start using Theater or web browsers "doubled up", which isn't the best way to test. But the fact that I have to do this to try to find a limit is great!

I had assumed the limitation was GPU memory, with just under 1 GB required per h/w transcode, so I'm surprised to see your 4 GB card managing 6.

Was this using CUVID as opposed to NVDEC, which I understand uses less GPU memory?

I hit 7.5 GB on my GTX 1070 (using NVDEC), so I didn't try to run #9. Perhaps I should have tried to see what it did... hmmm.

https://www.elpamsoft.com/?p=Plex-Hardware-Transcoding

It would be interesting to find out what the per-transcode memory amount is. By the looks of things, Emby is using significantly less memory per transcode than the competition, which according to the above link (under 'Streams for VRAM') uses 1.3 GB for a 4K (68 Mbps) > 1080p (8 Mbps) stream, whereas Emby must be using less than 1 GB for my results to stand.
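
The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope calculation only: it uses the figures quoted in this thread, and it assumes that VRAM is the sole limit and that every stream uses the same amount, neither of which has been confirmed. The helper names are hypothetical.

```python
# Back-of-the-envelope VRAM estimate from figures quoted in this thread.
# ASSUMPTIONS: VRAM is the only limiting factor and every stream needs
# the same amount of memory. Neither is confirmed.


def vram_per_stream(total_used_gb: float, streams: int) -> float:
    """Average VRAM per stream, given total usage and stream count."""
    return total_used_gb / streams


def max_streams(vram_gb: float, per_stream_gb: float) -> int:
    """How many whole streams fit into the given amount of VRAM."""
    return int(vram_gb // per_stream_gb)


# 8 streams used about 7.5 GB on the GTX 1070.
emby_per_stream = vram_per_stream(7.5, 8)

# 1.3 GB per stream is the figure from the linked page.
linked_per_stream = 1.3

print(f"Estimated per stream here: {emby_per_stream:.2f} GB")
print(f"8 GB card at 1.3 GB per stream: {max_streams(8, linked_per_stream)} streams")
print(f"4 GB card at the estimate above: {max_streams(4, emby_per_stream)} streams")
```

Under these assumptions a 4 GB card would top out below 6 streams, which is why the GTX 1650 result is surprising and why something other than a fixed per-stream cost may be at play.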

@softworkz I'll dive into this a bit more myself, but are there any guidelines for GPU memory usage? Presumably it's bitrate and resolution based, but as Emby is doing the TM 'internally' and not having to pass data around, do the standard transcoding metrics still apply? Also, I presume that if you hit the memory limit on your GPU, it gracefully falls back to CPU? Thanks!

 

 

Edited by rbjtech

6 hours ago, raudraido said:

Do I understand correctly that AMD cards won't work?

Yes, that's correct at this time. The situation is different between Windows and Linux, though:

  • Windows
    We don't even have hardware scaling or deinterlacing for AMD, because a number of things are still missing in ffmpeg to get this working.
    ("this" means: transcoding using a full hardware pipeline)
  • Linux
    We support AMD acceleration via AMD's VAAPI drivers. This allows full hardware pipeline processing for normal operations, but they don't have native tone mapping like Intel does (for newer CPU models).
    But we will probably be able to provide hw tone mapping for AMD via OpenCL in the next update or a subsequent one.
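
The situation described above could be captured in a small lookup table. The following Python sketch is purely illustrative: it contains only what has been said in this thread during the beta, the structure and names are hypothetical, and it is not an official support matrix.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    full_hw_pipeline: bool  # scaling, deinterlacing etc. done in hardware
    hw_tone_mapping: bool   # hardware tone mapping available right now
    note: str


# Entries reflect only statements made in this thread (beta period).
# Anything not listed here is simply unknown to this sketch.
SUPPORT = {
    ("windows", "amd"): Support(False, False, "pieces still missing in ffmpeg"),
    ("linux", "amd"): Support(True, False, "VAAPI; tone mapping via OpenCL may follow"),
    ("linux", "intel"): Support(True, True, "VAAPI; native tone mapping on newer CPU models"),
}


def lookup(platform: str, vendor: str) -> Optional[Support]:
    """Return the known support entry, or None if the thread says nothing."""
    return SUPPORT.get((platform.lower(), vendor.lower()))


print(lookup("Linux", "AMD"))
print(lookup("Windows", "AMD"))
```

Returning `None` for unlisted combinations keeps "not supported" separate from "not mentioned here".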

 


Personally, I think such limitations should be pointed out when TM is announced. Better to be upfront and say what it does and doesn't work on than to leave it to users, who are likely to announce their disappointment quite loudly.


Anybody who is using AMD on Windows already knows that video processing (scaling, deinterlacing, color conversion w/o TM) doesn't work in hardware on Windows.

That's nothing new.


I'm probably reading it wrong, but I would have thought the reason raudraido was asking was because he didn't know.


Very true. I'm just saying not everybody will know (I didn't, I just happened to buy an Nvidia graphics card to go with my Ryzen processor), so it seems better to tell people of any limitations and maybe have them say fair enough, instead of them going, yay, tone mapping, and then being disappointed when it doesn't work.

Kind of the same with Vision, which I think you said will (mostly) work.

It's awesome stuff, and people are very impressed; I'm just suggesting an extra caveat or two.


I think a lot of people get lucky and just don't purchase AMD GPUs, as they know from using 3rd-party utils that they're not nearly as well supported as QuickSync or Nvidia cards. Or maybe they checked our KB area and saw no mention of AMD while Intel and Nvidia settings are shown. :)


 

9 hours ago, arrbee99 said:

I'm probably reading it wrong, but I would have thought the reason raudraido was asking was because he didn't know.

I agree.
I knew that it probably won't work, but it would be nice if that were pointed out clearly.

Right now, I have an AMD potato GPU in my Emby server and an Nvidia 1080 Ti waiting to be swapped in, but it is water-cooled.
I'm now looking for an air cooler, because there is no way I could sleep with water in my server :)


10 minutes ago, raudraido said:

I agree.
I knew that it probably won't work, but it would be nice if that were pointed out clearly.

This is still beta testing.

I'm sure we will eventually have a support matrix, indicating which tone mapping accelerations are supported on each platform.

Edited by softworkz

Intel VAAPI in a Docker container working fine for me. The final product will definitely need some algorithm options. Blacks are way too crushed for my tolerance level right now.

What's the logic around being able to always force software tonemapping? Is the reasoning similar to software transcoding, where software always offers a better result than hardware (except performance)?


19 minutes ago, kribby said:

Intel VAAPI in a Docker container working fine for me. The final product will definitely need some algorithm options. Blacks are way too crushed for my tolerance level right now.

What's the logic around being able to always force software tonemapping? Is the reasoning similar to software transcoding, where software always offers a better result than hardware (except performance)?

There is no forcing of software tone mapping.

To be honest, I'm not sure whether the dropdown selection is a good choice. The original concept is what you can see in the Diagnostics Plugin. There are two checkboxes, one for enabling software tone mapping and another one for enabling hardware tone mapping:

image.png.5b12743dda69ad9b7d2d2e60232ea7e0.png

...which translates to: "use sw tone mapping when sw transcoding" and "use hw tone mapping when hw transcoding"
But those options have no influence on whether sw or hw transcoding is used.
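
Put as logic, the two-checkbox concept looks roughly like the following Python sketch. It is only an illustration of the rule described above, not Emby's actual code, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneMappingOptions:
    enable_software: bool  # "use sw tone mapping when sw transcoding"
    enable_hardware: bool  # "use hw tone mapping when hw transcoding"


def should_tone_map(is_hardware_transcode: bool, options: ToneMappingOptions) -> bool:
    """Decide whether to tone map for a pipeline that has ALREADY been chosen.

    The options never influence whether sw or hw transcoding is used;
    they only apply to whichever pipeline is in effect.
    """
    if is_hardware_transcode:
        return options.enable_hardware
    return options.enable_software


# Example: hardware tone mapping enabled, software tone mapping disabled.
opts = ToneMappingOptions(enable_software=False, enable_hardware=True)
print(should_tone_map(is_hardware_transcode=True, options=opts))
print(should_tone_map(is_hardware_transcode=False, options=opts))
```

The pipeline choice is an input to the function rather than an output, which is the point: ticking the software box does not force a software transcode.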

Would the two-checkbox variant be clearer?

Edited by softworkz
