GPU Transcoding (Intel QuickSync and nVidia NVENC)


witteschnitte

Gerrit507

Basically get anything with a HD530 or HD630. It's the best bang for the buck for that kind of use-case imho.

 

I agree with you that only four 1080p h264 encoding streams with a GTX 1060 is pretty disappointing. Maybe there really is a hardware limitation, like it having only a limited number of encoding circuits compared to a Quadro.

Edited by Gerrit507

brett.mclain@gmail.com

It just seems crazy to me that a GTX 1060 can get outperformed by the iGPU on a modern Intel chip... I'll keep poking around to find out why the performance is so bad, but I feel like I've already looked into most potential problem areas.


@Luke @softworkz

 

So is that patch going to make troubleshooting more difficult, as users will be expecting more from their hardware?

 

I don't think that it will change much. People can also hit a physical limit with two high-bitrate transcodes when they have an older board, bad BIOS settings or a slow network, or when they want to encode HEVC (should we support that at some point), or whatever. There are so many different types of limits that can be hit...

 

Regarding 'expectations': things already go wrong the moment someone tries to measure performance or expectations by the "number of simultaneous 1080p/h264 transcodes". There are two parts involved, decoding and encoding:

  • Decoding performance

    Depends on the source material. A one-hour 1080p H.264 video can be 1 GB or 10 GB in size.

  • Encoding performance

    Same story here: that video can be encoded at very different bitrates, again resulting in either 1 GB or 10 GB.

    Beyond output bitrate, there are a lot of different encoding options resulting in different output quality, and the required processing power can easily vary by a factor of 10 or more.

    (I hope we'll be able to offer more and better settings for configuring those codec options in the future.)
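To put rough numbers on the size example: file size and running time together fix the average bitrate, and that is what a decoder or encoder has to keep up with. A minimal Python sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and ignoring audio and container overhead; the function name is made up for illustration.

```python
def average_bitrate_mbps(size_gb: float, duration_s: float) -> float:
    """Average bitrate in Mbit/s for a file of size_gb decimal gigabytes."""
    bits = size_gb * 1e9 * 8          # bytes -> bits
    return bits / duration_s / 1e6    # bits per second -> Mbit/s


if __name__ == "__main__":
    one_hour = 3600
    for size in (1, 10):
        rate = average_bitrate_mbps(size, one_hour)
        print(f"{size} GB over 1 h is about {rate:.1f} Mbit/s")
```

So the two "1080p h264" files in the example differ by a factor of ten in the amount of data per second of video (roughly 2.2 versus 22 Mbit/s), even though the resolution and codec are the same.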

I'll follow up with more details on the actual case in a response to the OP, brett.mclain.

Edited by softworkz

yepitro1986

If you use Apple devices, you can use Infuse as your media player and Emby/Plex as your media server. It will save you money on hardware and the time you would otherwise spend thinking about transcoding.


I installed a GTX 1060 (3 GB) card into my media server (i7-3770) and was trying to do hardware transcoding with Plex, but found out too late that they only support encoding, not decoding.

 

So I'm testing Emby now instead. At first I was pretty pleased with the performance, but after further testing I'm a bit underwhelmed. I was able to do 4 x 1080p streams (x264) from files of 18 GB each, but one of the streams kept buffering, so really only 3. Then when I tried an HEVC Main 10 movie (15.5 Mbps), it was super garbled. The server is mostly idling as the GPU is doing all the work.

 

My expectation was that I would be able to run 6 or 7 1080p streams without breaking a sweat on a GTX 1060. Does anyone have any ideas on what could be holding back the performance? The GPU's memory usage never goes above 2 GB, and the Volatile GPU-Util stays below 15% as well.

 

It's a common misconception with Nvidia boards to think that you have all of your GPU's power available for transcoding videos. But this is not the case.

 

Instead, Nvidia boards include additional integrated circuits ('chips') for video encoding and decoding. I suppose the rationale behind this is to allow video playback and recording in games without affecting 3D performance.

As a result, there is no direct relation between 3D performance and video performance. An older professional board may physically have better video performance than the latest GeForce! So buying a consumer GPU and unlocking the driver might not be as clever as it sounds... Unfortunately, I don't have any figures, though.

 

Back to Emby: on the Emby side, we have not fully optimized transcoding for Nvidia yet. That means data is transferred back and forth between main memory and GPU memory for both decoding and encoding, which is not the most efficient approach.
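To illustrate the difference between copying frames through main memory and keeping them on the GPU, here is a hedged sketch using plain ffmpeg options. It is not Emby's actual command line. It assumes an ffmpeg build with CUDA/NVENC support; the helper name, file names and bitrate are hypothetical. With only `-hwaccel cuda`, decoded frames are downloaded to system memory before being handed to the encoder; adding `-hwaccel_output_format cuda` keeps them in GPU memory.

```python
def nvenc_transcode_args(src, dst, keep_frames_on_gpu, bitrate="5M"):
    """Build an ffmpeg argument list for an NVDEC -> NVENC H.264 transcode.

    keep_frames_on_gpu=False: decoded frames are copied back to main memory.
    keep_frames_on_gpu=True:  decoded frames stay in GPU memory (CUDA frames).
    """
    args = ["ffmpeg", "-hwaccel", "cuda"]
    if keep_frames_on_gpu:
        args += ["-hwaccel_output_format", "cuda"]
    args += ["-i", src, "-c:v", "h264_nvenc", "-b:v", bitrate, "-c:a", "copy", dst]
    return args


if __name__ == "__main__":
    # Only prints the two command lines; nothing is executed here.
    for on_gpu in (False, True):
        print(" ".join(nvenc_transcode_args("input.mkv", "output.mp4", on_gpu)))
```

Note that once frames stay on the GPU, any filtering in between (scaling, for example) also has to run on the GPU, which is part of why a fully optimized pipeline takes more work.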

 

We're working on this and you can expect some improvement in the near future, but you need to keep in mind that it's not the GPU cores that are doing the video acceleration (except some basic supporting tasks and CUDA operations, e.g. for hardware filtering).
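Since the dedicated encoder and decoder blocks are separate from the GPU cores, the overall GPU utilization figure says little about how busy they are. `nvidia-smi dmon -s u` reports them in separate `enc` and `dec` columns. The Python sketch below parses such output by column name. The sample text is an illustrative stand-in rather than a real measurement, and the exact set of columns depends on the driver version, which is why the parser reads the header instead of assuming positions.

```python
# Illustrative sample only; not taken from a real system.
SAMPLE_DMON_OUTPUT = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0      3      1     45     12
    0      4      1     47     13
"""


def parse_dmon(text):
    """Parse `nvidia-smi dmon` style text into a list of dicts keyed by column."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line holds column names; later ones are
            # the units line or a repeated header, which we skip.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header: cannot be interpreted
        row = {}
        for name, value in zip(columns, line.split()):
            # Unsupported metrics are shown as "-"; store them as None.
            row[name] = int(value) if value.isdigit() else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_dmon(SAMPLE_DMON_OUTPUT):
        print(f"GPU {row['gpu']}: sm={row['sm']}% enc={row['enc']}% dec={row['dec']}%")
```

In a situation like the one described above, a low `sm` value together with high `enc` or `dec` values would point to the video blocks, not the GPU cores, being the limit.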

 

 

BTW - For all those who were wondering about Intel's video performance: Intel follows an approach that is similar, at least in that dedicated ASICs do the video processing.

That explains why Intel and Nvidia video performance can be similar for certain combinations even though they may differ strongly in 3D performance.


brett.mclain@gmail.com

I'm glad Emby is working on it. I used to host a Plex server for about 15 people and bought the GTX 1060 to help when 5 or 6 people are streaming at once. Unfortunately, Plex's Nvidia hardware support on Linux only covers encoding, not decoding.

 

Thanks for clarifying that the GTX probably has an onboard ASIC or something similar that may have no correlation to its 3D rendering abilities. Good to know!

 

With regard to "how many streams can I do at once with xyz hardware", I understand that can be frustrating due to the plethora of variables involved. It might be helpful to select 10-15 clips of varying quality/size/resolution/codec and ask users to benchmark how many they can transcode, or have some tool to measure performance. This would allow the community to crowdsource the info. For example, have a few 480p, 720p, 1080p and 4K videos, each in x264 and x265 and in a few different sizes, and upload them somewhere for us to download and test against. There would still be lots of variables, but it could help give ballpark figures.
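A rough sketch of how such a benchmark could be scored, in Python. It assumes the transcodes are run with ffmpeg, whose progress lines end in a `speed=...x` field, and it treats a test with N simultaneous streams as passed when every stream runs at 1.0x (real time) or faster. The function names, the sample log line and the sample results are all made up for illustration.

```python
import re

SPEED_RE = re.compile(r"speed=\s*([0-9.]+)x")


def last_speed(log_text):
    """Return the last 'speed=...x' value found in ffmpeg progress output."""
    matches = SPEED_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def max_sustained_streams(results):
    """results maps a stream count to the speeds measured for each stream.

    Returns the largest stream count for which all streams kept up with
    real time, or 0 if none did.
    """
    passed = [n for n, speeds in results.items()
              if speeds and all(s >= 1.0 for s in speeds)]
    return max(passed, default=0)


if __name__ == "__main__":
    line = ("frame=  240 fps= 60 q=28.0 size=    1024kB "
            "time=00:00:10.00 bitrate= 838.9kbits/s speed=2.5x")
    print(last_speed(line))
    # Hypothetical results for 1, 2 and 4 simultaneous transcodes.
    print(max_sustained_streams({1: [3.0], 2: [1.6, 1.5], 4: [0.9, 1.0, 1.1, 0.8]}))
```

The result only holds for the specific clips and settings tested, which is exactly why a shared set of reference clips would make the numbers comparable between users.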


Thanks for clarifying that the GTX probably has an onboard ASIC or something similar that may have no correlation to its 3D rendering abilities. Good to know!

 

It's a fact, and it's true for all Nvidia boards. Look here: https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

There's a column indicating the "number of chips". But that doesn't mean that it's always the same 'chip'. Of course, there are different versions with different performance parameters depending on the board.

 

So, now that this point is understood, I can explain some additional specifics of Nvidia hardware acceleration.

On Windows, Nvidia also supports DXVA2 hardware decoding. This does not use the NVDEC chips for decoding but actual GPU cores instead. There is also no limit on the number of streams being decoded simultaneously.

(DXVA2 doesn't exist on Linux, though.)

 

With regard to "how many streams can I do at once with xyz hardware", I understand that can be frustrating due to the plethora of variables involved. It might be helpful to select 10-15 clips of varying quality/size/resolution/codec and ask users to benchmark how many they can transcode, or have some tool to measure performance. This would allow the community to crowdsource the info. For example, have a few 480p, 720p, 1080p and 4K videos, each in x264 and x265 and in a few different sizes, and upload them somewhere for us to download and test against. There would still be lots of variables, but it could help give ballpark figures.

 

 

This is in fact a good idea and actually something that we're already planning for.


  • 1 month later...
Jdiesel

Any more information on this front? I was thinking of adding a GPU to my dual slot Xeon server to help out with transcoding but it sounds like slapping in a consumer Nvidia card might not be the best idea.

 

Would be nice to get some feedback from those running consumer Nvidia hardware with the restriction patch. Also how does AMD currently fare in this area?


AmericanCrisis

I tried HW transcoding for the first time the other day. I think I liked it... It seems more energy efficient, at least I think...

 

I had a problem with Quick Sync that I don't have with SW: when the media file (a movie in this case) ends, playback jumps back 10 minutes or so. Or the file ends, but Emby doesn't send the user back to the media info screen. Basically, the screen goes black and the dashboard shows the file as still playing. The user has to manually stop it and press the back button.

 

Anyhow, not a huge problem. Still usable. I turned it off for the time being. I'm still trying to determine what the OVERALL benefit to HW transcoding is over SW transcoding.


  • 3 weeks later...
Jdiesel

Can anyone speak to the quality/performance differences between the Pascal and Turing NVENC implementations? I've found some information from the video game streaming community that suggests that there is a significant quality improvement in Turing chips.

 

Quality improvements are unlikely, because quality depends on how you configure the encoding (that's far too big a topic to explain quickly).

Even with old GPUs or very weak CPUs, it's not a problem to achieve high-quality encoding. It would just take longer and might be slower than real time (playback speed).

 

The differences are rather about encoding performance, so you'll be able to do more simultaneous encodings or the same number of encodings with higher quality.


Jdiesel

As someone who doesn't currently have hw transcoding set up but is in the market for a GPU to do so, is it possible to adjust the presets with NVENC like you can do with software encoding?

 

As someone who is more interested in quality over quantity this is something I would be wanting to take advantage of. If it is already possible, disregard.  

 

The article that I have been reading.

 

https://unrealaussies.com/tech/nvenc-x264-quicksync-qsv-vp9-av1/#Introduction 

Edited by Jdiesel

We already have some internal prototyping for allowing codec-specific quality settings (each hw codec implementation does this a bit differently), so this is at least something we have in the pipeline, but I cannot say anything about a timeline for this. @Luke


Yes, he has a lot on his to-do list, but perhaps later, after some other things. Thanks.


  • 2 months later...

The upcoming Emby Server 4.2 release will have improved hardware transcoding support for both RockPro64 and Raspberry Pi. Thanks guys.


Lawrage

Just an update on this.

 

I have been using the Intel i5 in my media server up till now, but was getting slow, stuttering playback with large transcodes (mainly 4K), with all cores at 100%.

 

Bought a 1050 Ti, installed the patch to remove the 2-stream limit and tested 4 x 4K transcodes. All 4 played without issue, with the GPU at around 70%.

 

Very happy with the purchase.


Guest asrequested

I just added a GTX 1060 to my server, thinking that it would handle HEVC media better than my Threadripper. To my amazement, there was no difference, except that using NVENC/NVDEC is almost unusable when playing in Chrome (slower transcoding and freezing playback). Are there expected improvements to this in 4.2?


I just added a GTX 1060 to my server, thinking that it would handle HEVC media better than my Threadripper. To my amazement, there was no difference, except that using NVENC/NVDEC is almost unusable when playing in Chrome (slower transcoding and freezing playback). Are there expected improvements to this in 4.2?

 

Please see how to report a media playback issue. Thanks.


Guest asrequested

I figured that was coming. I was intending to, I'm just in the middle of a bunch of things. When I'm finished, I'll try to run some specific testing and post my results with logs  :)


  • 5 weeks later...
KamElita

All Intel processors from the last decade or even earlier use speculative execution, and that's where the exploits have been discovered.


pir8radio

I figured that was coming. I was intending to, I'm just in the middle of a bunch of things. When I'm finished, I'll try to run some specific testing and post my results with logs  :)

 

The 1060 is a pretty weak card, no?


Guest asrequested

The 1060 is a pretty weak card, no?

 

It's a pretty good card, but there are some bugs with how the server is handling HEVC over HLS, which appears to be much worse with HWA. So for my present usage, it's not practical.
