GPU Transcoding (Intel QuickSync and nVidia NVENC)


witteschnitte

Here's the same TV stream with QS enabled on 3770k. Noticed the CPU load is HIGHER using QS! Maybe it's just too old to judge by. But it's the same generation (I think) as the i3-3225 someone uses with success!? These system differences are strange. Can it be because my system is confused by having both a GTX970 and the HD4000?

 

 

59cab58cf3089_QS_enable.jpg

Link to comment
Share on other sites

Guest asrequested

OK TV streams are much less than what I tested. I'll run a test with Live TV, tonight. The test I did was to give it as much load as I could. I'll test live TV transcoding at 2 Mb/s and post the results. Then you can have a good comparison.

 

You should use GPU-z to monitor your GPU.

Edited by Doofus
Link to comment
Share on other sites

lifespeed

OK TV streams are much less than what I tested. I'll run a test with Live TV, tonight. The test I did was to give it as much load as I could. I'll test live TV transcoding at 2 Mb/s and post the results. Then you can have a good comparison.

 

You should use GPU-z to monitor your GPU.

I thought it took more processing to transcode to a lower bitrate, and that transcoding to a higher bitrate actually requires less horsepower.  BICBW.

Link to comment
Share on other sites

I have a 1050Ti too. My card handles two 4K HEVC 10-bit streams at the same time with ease. However, I've found that NVDEC is unable to handle HDR streams with interlaced 4:2:0 chroma subsampling, a.k.a. 4:2:0 Type 2. It produces artifacts and choppy output. So I guess your issue is either caused by the input file or is some driver issue.

 

By the way: Which hw-acceleration do you have selected in MPC? Emby uses NVDEC/NVENC and I think standard for MPC is DXVA.

 

I've tried both DXVA2 (both options) and NVDEC in MPC on the server to troubleshoot, and they're about the same.  What I'm testing isn't HDR since I don't have an HDR TV yet, only my phone supports HDR.  But yeah I still lean towards it being some weird Windows Server quirk.

Link to comment
Share on other sites

Guest asrequested

I thought it took more processing to transcode to a lower bitrate, and that transcoding to a higher bitrate actually requires less horsepower.  BICBW.

I shall endeavor to ascertain what is actually happening, tonight. I'll run various bitrate transcodes and compare them. I may be wrong, it's happened before... Did I say that out loud?

 

But if you think about it, transcoding is laborious. First the data stream is analyzed using an algorithm to figure out what data to remove. So the more data, the more work. The files I tested with were all around 35 Mb/s. Live TV is around 8-12 Mb/s. So it would stand to reason that it takes more to analyze the files I used as opposed to a Live TV stream. Then it has to compress it. At 10 Mb/s that's still a lot of data to write. Have you tried copying a 9GB file to another drive? It's not instant, and that's with no processing.
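To put rough numbers on the bitrates mentioned above, here is a minimal Python sketch that converts a constant bitrate into data volume per hour. It only illustrates how much data flows through at each bitrate; it says nothing about how much work the encoder does per bit. The function name is made up for illustration, and the three bitrates are the ones quoted in this thread.

```python
def gigabytes_for(bitrate_mbps: float, seconds: float) -> float:
    """Data volume in GB (10**9 bytes) for a stream at a constant bitrate."""
    bits = bitrate_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


# Bitrates taken from the posts above: test movies, Live TV, forced transcode target.
for label, mbps in [("35 Mb/s movie", 35), ("10 Mb/s Live TV", 10), ("2 Mb/s transcode", 2)]:
    print(f"{label}: {gigabytes_for(mbps, 3600):.2f} GB per hour")
```

So a 35 Mb/s source means reading several times more data per hour than a Live TV stream, before any encoder settings enter the picture.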

Edited by Doofus
Link to comment
Share on other sites

Waldonnis

I thought it took more processing to transcode to a lower bitrate, and that transcoding to a higher bitrate actually requires less horsepower.  BICBW.

 

I wish it were true...I could make lossless and near-lossless encodes without sidelining my machine for a day  :P 

 

It's actually the opposite due to how many encoders work and how they handle compression.  Lower bitrates = less detail to worry about/preserve/account for = less work to do = faster encoding (generally).  Things get more complicated when you start adding encoder options and presets to the mix, of course.  It seems counter-intuitive until you think about how encoding a DVD source is faster than encoding a BRD source (reduced resolution naturally means a reduced bitrate, which is why some downscale before encoding when resolution doesn't matter as much as encoding speed).

Link to comment
Share on other sites

Waldonnis

Here's the same TV stream with QS enabled on 3770k. Noticed the CPU load is HIGHER using QS! Maybe it's just too old to judge by. But it's the same generation (I think) as the i3-3225 someone uses with success!? These system differences are strange. Can it be because my system is confused by having both a GTX970 and the HD4000?

 

I've noticed ffmpeg has issues detecting the iGPU when a dGPU is present in some circumstances (more pronounced in more recent builds).  I don't use hardware transcoding with Emby, so this is all just my experience doing manual transcodes (full disclosure).  Others have reported that it works fine when the iGPU is the only GPU present, but I can't confirm.

 

I keep meaning to look at it to see what's going on in the source code, but haven't had the time (too busy dodging a hurricane and dealing with the aftermath  :P ).  I also have a 970 along with an IB CPU and have problems with it falling back to software encoding silently when I try to force QS decoding.  There's a workaround, but it's rather arcane, system-specific, and hasn't been too reliable for me so I can't recommend it.

Link to comment
Share on other sites

Guest asrequested

So here I tested playback of 2 live tv shows forced to a 2 Mb/s bandwidth. 

 

59cb0885dca05_Snapshot_234.jpg

 

 

With one show playing. You need to know that the GPU base clock speed is 350MHz, so no change. And take a look at the clock speed of the CPU

 

59cb08df455c7_Singletvstreamtranscode.jp

 

 

With both streams playing. Here we see a much higher CPU usage. The GPU usage also increases. Where it says 650MHz, it alternates between the base clock of 350MHz and 650MHz

 

59cb09a4cd4d9_Twotvstreamstranscodingat2

 

 

An important thing to note between the two tests that I posted is that Live TV transcoding is done in real time, whereas transcoding a movie is done (with throttling enabled) in large chunks, and the bitrate is also much higher.

Edited by Doofus
Link to comment
Share on other sites

For all the GPU fans out there you should try our newly announced Asustor and NetGear ReadyNAS packages:

 

https://emby.media/nas-server.html

 

And TerraMaster in testing:

 

https://emby.media/community/index.php?/topic/50855-emby-for-terramaster-nas/

 

We specifically tested VAAPI with all three of these packages. Enjoy.

Link to comment
Share on other sites

I've noticed ffmpeg has issues detecting the iGPU when a dGPU is present in some circumstances (more pronounced in more recent builds).  I don't use hardware transcoding with Emby, so this is all just my experience doing manual transcodes (full disclosure).  Others have reported that it works fine when the iGPU is the only GPU present, but I can't confirm.

 

I keep meaning to look at it to see what's going on in the source code, but haven't had the time (too busy dodging a hurricane and dealing with the aftermath  :P ).  I also have a 970 along with an IB CPU and have problems with it falling back to software encoding silently when I try to force QS decoding.  There's a workaround, but it's rather arcane, system-specific, and hasn't been too reliable for me so I can't recommend it.

 

 

The inability to determine the GPU sounds plausible in my case. An interesting observation.

 

I thought the 970 generation of CPUs (I have an 870 in one of the PCs) is too old to have QS?

Link to comment
Share on other sites

Waldonnis

The inability to determine the GPU sounds plausible in my case. An interesting observation.

 

I thought the 970 generation of CPUs (I have an 870 in one of the PCs) is too old to have QS?

 

I was referring to the GTX-970, which supports NVENC...sorry, I should've been specific about that (long day).  Your system is pretty much equal to mine, from the sound of it - HD-4000 (IB i5-3570k, in my case) + GTX-970  :P 

 

NVENC does a decent job and is quite fast, so it may be worth a try since you have that GTX-970 in the system already.  Codec support is slightly better with the GTX-970 compared to the HD-4000 built into the CPU: the GTX-970 can encode HEVC Main as well (it cannot decode HEVC at all in hardware, however)...but transcoding to HEVC is still uncommon for most and probably doesn't matter in this case.  Downside is the 2-stream limit, so that's something to consider...but I believe Emby falls back to software encoding now if ffmpeg isn't able to init the hardware context.
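As a rough illustration of the "fall back to software" idea, here is a Python sketch that reads the encoder list from `ffmpeg -encoders` and picks the first hardware encoder it finds, otherwise a software one. This is not Emby's actual logic. The preference order and the `libx264` fallback are assumptions, and the sample listing is hand-written for illustration, not real output from any particular build. Also note that an encoder being listed only means it was compiled in; ffmpeg can still fail to init the hardware at run time, which is exactly the silent-fallback situation described earlier.

```python
import subprocess
from typing import List, Sequence

# Assumed preference order (hypothetical): NVENC first, then QuickSync.
HW_ENCODERS = ("h264_nvenc", "h264_qsv")


def parse_video_encoders(listing: str) -> List[str]:
    """Extract video encoder names from `ffmpeg -encoders` style text."""
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # Encoder rows start with a flags token such as "V....."; the legend
        # rows look like "V..... = Video" and are skipped via the "=" check.
        if len(parts) >= 2 and parts[0].startswith("V") and parts[1] != "=":
            names.append(parts[1])
    return names


def pick_encoder(available: Sequence[str],
                 preferred: Sequence[str] = HW_ENCODERS,
                 fallback: str = "libx264") -> str:
    """Return the first preferred encoder that is available, else the fallback."""
    for name in preferred:
        if name in available:
            return name
    return fallback


def list_ffmpeg_encoders(ffmpeg: str = "ffmpeg") -> List[str]:
    """Ask a local ffmpeg binary for its encoders (requires ffmpeg on PATH)."""
    result = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=True)
    return parse_video_encoders(result.stdout)


# Hand-written sample in the general shape of the real listing.
SAMPLE_LISTING = """Encoders:
 V..... = Video
 A..... = Audio
 ------
 V..... libx264              libx264 H.264 (codec h264)
 V..... h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 A..... aac                  AAC (Advanced Audio Coding)
"""

print(pick_encoder(parse_video_encoders(SAMPLE_LISTING)))
```

On a real machine you would pass `list_ffmpeg_encoders()` to `pick_encoder` instead of the sample text, and still test an actual short encode to confirm the hardware context initializes.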

 

As for the QS detection issue, there was a trac ticket about it, but I think it was closed for a reason that I just can't recall and would have to dig it up again.  I remember feeling like the problem wasn't really acknowledged fully as a problem because the reporter suspected a specific patch caused it that couldn't have (it just added the output of warning messages).  I'll have to look that up again and refresh my memory.

Link to comment
Share on other sites

I don't use the iGPU for my monitors, just the GTX-970.

I think the iGPU was activated when I upgraded to win10, as I had disabled it in Win7. Maybe I should try to disable the iGPU driver / device, and see how Emby then transcodes when using the GTX-970. any other ideas to dampen my jealousy of the i3-3225 owner who has this great performance?  :P

Link to comment
Share on other sites

Waldonnis

I don't use the iGPU for my monitors, just the GTX-970.

I think the iGPU was activated when I upgraded to win10, as I had disabled it in Win7. Maybe I should try to disable the iGPU driver / device, and see how Emby then transcodes when using the GTX-970. any other ideas to dampen my jealousy of the i3-3225 owner who has this great performance?  :P

 

I use the GTX-970 for both of my monitors as well.  I do have the iGPU connected to a television, but rarely use it since my main monitor is much better than the television (and calibrated).  Transcodes on my nVidia card are actually faster compared to QuickSync on this generation CPU, so I use it periodically for testing or when I just need to encode something quickly.  I actually also use QuickSync at times too, but only because it offers some options that nVidia's solution doesn't have.

 

If you're bent on trying QuickSync, though, one thing you can try is to disable hardware decoding.  ffmpeg seems to work fine when I just use the encoder side of QuickSync and only run into problems when I add QS hardware decoding to the mix.
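For anyone trying this by hand, the difference between the two setups comes down to which options go before the input. Below is a small Python sketch that just builds the two ffmpeg command lines without running anything. The `h264_qsv` encoder/decoder and `-hwaccel qsv` are ffmpeg options, but the file names, the 4M bitrate, and the assumption that the source is H.264 are placeholders for illustration, and this is not necessarily the command line Emby generates.

```python
import shlex
from typing import List


def qsv_transcode_args(src: str, dst: str, bitrate: str = "4M",
                       hw_decode: bool = False) -> List[str]:
    """Build an ffmpeg argument list that encodes with QuickSync.

    With hw_decode=False the decode stays in software and only the
    encoder uses QuickSync. With hw_decode=True the QSV decoder is
    requested as well (assumes an H.264 source).
    """
    args = ["ffmpeg", "-hide_banner"]
    if hw_decode:
        # Input options must come before -i to apply to the decoder.
        args += ["-hwaccel", "qsv", "-c:v", "h264_qsv"]
    args += ["-i", src,
             "-c:v", "h264_qsv",   # QuickSync H.264 encoder
             "-b:v", bitrate,
             "-c:a", "copy",       # leave audio untouched
             dst]
    return args


# Hypothetical file names, printed for comparison only.
print(shlex.join(qsv_transcode_args("input.ts", "output.mp4")))
print(shlex.join(qsv_transcode_args("input.ts", "output.mp4", hw_decode=True)))
```

Running the first form and watching whether the iGPU shows load is a quick way to check whether the encoder side works on its own.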

Link to comment
Share on other sites

I don't use the iGPU for my monitors, just the GTX-970.

I think the iGPU was activated when I upgraded to win10, as I had disabled it in Win7. Maybe I should try to disable the iGPU driver / device, and see how Emby then transcodes when using the GTX-970. any other ideas to dampen my jealousy of the i3-3225 owner who has this great performance?  :P

After reading through all the posts about transcoding at 10 Mbit/s, I tried again with my server, because I had only tried my normal streaming bitrate of 5 Mbit/s before. My i3-3225 is still able to maintain 3 stutter-free streams of 10 Mbit/s, transcoded from fairly high-bandwidth material; no movie is under 15 GB in size. The CPU usage is a little bit higher at 75-80%, though.

 

cheers

Link to comment
Share on other sites

Guest asrequested

After reading through all the posts about transcoding at 10 Mbit/s, I tried again with my server, because I had only tried my normal streaming bitrate of 5 Mbit/s before. My i3-3225 is still able to maintain 3 stutter-free streams of 10 Mbit/s, transcoded from fairly high-bandwidth material; no movie is under 15 GB in size. The CPU usage is a little bit higher at 75-80%, though.

 

cheers

Try movies that are 35 Mb/s with Dolby Atmos. I should test with small 15 Mb/s movies and see what I get. What FFmpeg build are you using?

Link to comment
Share on other sites

Try movies that are 35 Mb/s with Dolby Atmos. I should test with small 15 Mb/s movies and see what I get. What FFmpeg build are you using?

Sure, I'll do that but it will take me until the weekend. I've got much to do for university.

Link to comment
Share on other sites

Now I disabled my iGPU (HD4000) and tried the Nvidia transcode (GTX970). Not much of a difference, if any.

 

However, something made me wonder.

 

Initially, the Nvidia transcode caused regular stutter at 4 mbps / 720p from a live TV signal. The stutter was in sync with the peaks of the CPU load (shown below).

59cc074c27920_GTX970_4mbps_720p.jpg

Link to comment
Share on other sites

Then, I tried to pause the stream playback in the Emby web player.

After a bit, I resumed the TV watching (transcoded), and the peaks were gone, and so was the stutter!?

59cc07b8a2ee0_GTX970_4mbps_720p_afterpau

Link to comment
Share on other sites

I noticed that the system was still running with fans very active during the paused tv, and saw the CPU load fairly high. (picture)

 

Is that because the PC keeps transcoding to a buffer (if it does so?), and the peaks don't cause problems when it's all buffered?

 

 

59cc084806ecb_GTX970_4mbps_720p_duringne

Link to comment
Share on other sites

I played some more... Stuttering is highly dependent on a pause buffer to even out the CPU peaks. HW decoding is disabled in all cases.

 

Is there a way to adjust the transcoding buffer?
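The idea that a head start in the buffer can absorb periodic load peaks can be sketched with a toy simulation in Python. Everything here is an assumption for illustration: the tick length, the made-up transcoder capacity pattern, and the simple player model. It is not how Emby or its web player actually buffers. The model treats Live TV as arriving in real time, so the transcoder can never get ahead of the source unless playback was paused first.

```python
from typing import Iterable


def stalled_ms(capacity_ms: Iterable[int], head_start_ms: int,
               tick_ms: int = 1000) -> int:
    """Total milliseconds the player had nothing to play.

    capacity_ms: how much media the transcoder could process in each tick
                 (low values model the CPU load peaks).
    head_start_ms: transcoded media already buffered, e.g. after a pause.
    """
    backlog = 0                # live input received but not yet transcoded
    buffered = head_start_ms   # transcoded media waiting in the player
    stalled = 0
    for capacity in capacity_ms:
        backlog += tick_ms                 # the live source delivers in real time
        out = min(capacity, backlog)       # the transcoder cannot outrun the source
        backlog -= out
        buffered += out
        played = min(buffered, tick_ms)    # the player wants tick_ms of media per tick
        buffered -= played
        stalled += tick_ms - played
    return stalled


# Made-up pattern: plenty of headroom, except every fifth second dips to 30%.
pattern = ([1500] * 4 + [300]) * 6

print("no head start:", stalled_ms(pattern, 0), "ms stalled")
print("5 s head start:", stalled_ms(pattern, 5000), "ms stalled")
```

In this model the first dip causes a stutter when there is no head start, after which the player simply runs that much behind live, so it does not reproduce a stutter on every peak. It only shows why a buffer at least as large as the deepest dip would hide the peaks.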

Link to comment
Share on other sites

Guest asrequested

Could it be related to the issue reported here?

 

It does appear to be similar. I hadn't considered it an issue, but I guess it is. The CPU does get used a lot more when you transcode a second stream. I'll have to run tests and post the results and logs in that thread.

 

I think I'm going to be building a Ryzen server in the not too distant future. This won't matter, then.

Edited by Doofus
Link to comment
Share on other sites

Can the transcode and/or playback buffer size be changed?

 

As you can see from the above posts, my CPU doesn't seem to max out, so the stuttering probably comes from some other bottleneck, which is evened out if I pause the playback for a bit and then resume.

 

 

Does anyone know if the live TV stream is made of packets that carry some extra data at certain intervals, which could explain the load peaks from live TV?

Link to comment
Share on other sites

Guest asrequested

Hardware Acceleration? I'm over it. I was just doing some testing with both Beta (i5 6500) and stable (i7 6700k) servers. I was getting very little benefit, and encountered several playback issues, one of which was that playback froze on one device (Theater desktop on my HTPC). I ran all the tests again with HWA off: perfect playback. I had a maxed out CPU, with and without HWA. Ryzen Threadripper, here I come!

Link to comment
Share on other sites
