
GPU Transcoding (Intel QuickSync and nVidia NVENC)


witteschnitte


wehavetogoback

thanks for putting up with my stupidity. 

appreciate it.

Yes it is. See image below but take notice of the warning in the yellow box!  ;)

 

54a2e0db55ae1_Capture.jpg


Latchmor

thanks for putting up with my stupidity. 

appreciate it.

 

Nothing stupid about asking for help  :)

 

As an extra note to above, I've been using the Dev builds since day one and I haven't had too many issues to speak of but my MBS is currently a 1 (sometimes 2) user setup so only I'd get affected if anything did break. If you have all the family using yours then just be aware of frequent updates and possible issues.

 

I'm not trying to put you off as I'm sure the GPU Transcoding Dev would appreciate another tester.

 

Cheers


CashMoney

Well well, I wasn't expecting that. That's a pretty big list of Intel CPUs. I'll have to check it when the server isn't busy, but I'm pretty sure my lowly old Celeron is on that list. This could be real handy. Thanks to everyone involved!


dark_slayer

Just for clarity (my own and others'), this is known to work from an ffmpeg standpoint, but it is still known to be broken in the server implementation's communication with ffmpeg until mjb2000 gets a chance to sort out the necessary tweaks and update h264_qsv / ffmpeg.


Latchmor

Just for clarity (my own and others'), this is known to work from an ffmpeg standpoint, but it is still known to be broken in the server implementation's communication with ffmpeg until mjb2000 gets a chance to sort out the necessary tweaks and update h264_qsv / ffmpeg.

 

Yip I think that's correct based on mjb2000's last update  ;)


somedude

Holy smokes, this makes me very excited!!!

I have the exact same board, as well (as the OP - the ASROCK J1900).

So far, I've REALLY liked it. It replaced an old AMD based mini-tower, and the current build is a much smaller, nicer, quieter little unit that I've been digging.

One of my concerns was moving Media Browser to it, especially as it relates to the transcoding. So far, it's been able to mostly work without issue, but THIS is something I've been waiting to hear about, and will GLADLY be giving this a try over the next couple of days.

In fact, I'm hoping to even test this afternoon, as I do occasionally put on some TV in the background while away from home, and this should make for a great test.

 

Also have a Roku, which may eventually be used in another room someday.

 

Basically, I took my main tuner - tried Media Browser on a PC in another room with cable attached, and streamed that to a Roku for testing.

Rebuilt "main" HTPC, put tuner back in it, and now don't really "need" the Roku to do live tv, but am still GREATLY interested in it. Live TV on browsers, however, I do still occasionally use (such as, at another PC in same house, via Android browser, or via IE at office).

 

THANK YOU for all the work on this! Will report back on how it works out for me :)

Currently transcode mainly live TV - both SD analog, and HD QAM and ATSC - Hauppauge 2250 is doing analog/QAM from cable, and a USB tuner is connected to antenna for ATSC, mostly as a "backup" nowadays, but very nice to have (5 tuners total in this system, which may ALSO eventually be fed HDHomerunPrime for "8" tuners :D ).


dark_slayer

THANK YOU for all the work on this! Will report back on how it works out for me :)

Still in a temporary holding pattern as the current server switches and version of ffmpeg in this thread aren't able to effectively initiate a transcode just yet

 

I like the enthusiasm though. I know at least 6 veterans of avs who will love to hear this and use it on their server when it's sorted. Some people still vacationing and most haven't moseyed around here to find this thread yet


somedude

So far, no glitches.

 

Still seemed to peg my little j1900 CPU during Roku transcoding of live TV, but, DOES seem a little lower on web browser playback. Will have to do some further side-by-side comparisons, but so far, it doesn't seem to have caused any issues.

Link to comment
Share on other sites

So far, no glitches.

 

Still seemed to peg my little j1900 CPU during Roku transcoding of live TV, but, DOES seem a little lower on web browser playback. Will have to do some further side-by-side comparisons, but so far, it doesn't seem to have caused any issues.

 

The current setup is not offloading anything to the GPU yet.


somedude

 

Great. On the dev server, it can be enabled via the following:

 

- Go to transcoding settings page and just click save. This will generate encoding.xml in the server config directory if it's not already there

- Shutdown the server

- Change H264Encoder to whatever his new encoder name is

 

If someone could find his thread and copy this info there that would be helpful. thanks.
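For anyone who would rather script the third step than edit the file by hand, here is a minimal sketch. It assumes encoding.xml holds the setting as an element literally named H264Encoder; the root element name and the file location are not given in this thread, so treat the path you pass in as your own. Only run it with the server shut down, as described above, and note that a later reply says more server-side work was still needed at the time.

```python
import xml.etree.ElementTree as ET


def set_h264_encoder(config_path, encoder_name):
    """Set the <H264Encoder> value in encoding.xml and return the old value.

    Assumption: the setting is an element called H264Encoder somewhere
    below the root. If it is missing, it is added directly under the root.
    """
    tree = ET.parse(config_path)
    root = tree.getroot()

    node = root.find(".//H264Encoder")
    if node is None:
        # Hypothetical placement; the real schema is not shown in the thread.
        node = ET.SubElement(root, "H264Encoder")

    previous = node.text
    node.text = encoder_name

    # Write the file back in place.
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
    return previous
```

Calling set_h264_encoder(path_to_encoding_xml, "h264_qsv") would return the previous encoder name (for example libx264), which is handy if you need to switch back.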

 

 

Gotcha. Thank you! ...that's what I get for excitedly skimming the thread.

What about the above, though?

Link to comment
Share on other sites

   

 

Gotcha. Thank you! ...that's what I get for excitedly skimming the thread.

What about the above, though?

 

It turns out there is more required on the server end so that procedure doesn't work yet.


mjb2000

Hi all

 

I have been looking at the changes required. The issues do seem to be related to the various arguments passed to libx264 being incompatible with h264_qsv. For example, "-preset [ultrafast-placebo]" should be a numeric range, "-preset [1-7]".

 

As far as I'm concerned it doesn't make much sense to edit the MB source to make it fit this new encoder. Instead, it makes sense to tweak the new encoder so it accepts command line arguments that are written for the most popular h264 encoder (libx264).

 

With this in mind I have made this latest release of ffmpeg.exe

 

So far I have made these changes:

  • If "-preset [libx264 speed]" is specified, this is honoured by h264_qsv (It was originally expecting 1-7)
  • If "-level 40" is specified, this is honoured by h264_qsv (it was originally expecting 4.0)
  • If "-crf ##" is specified, this is treated by h264_qsv as the -qpb -qpi -qpp parameters (QuickSync allows different quant settings for i and b frames)
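To make the three translations above concrete, here is a rough Python sketch of the idea. It is not the actual patch (that lives inside the h264_qsv code in ffmpeg); the function names are made up, the way ten libx264 preset names are spread over 1-7 is an arbitrary linear mapping, and using the same value for all three QP parameters is an assumption. The only thing taken as given is that 1 is the quality end and 7 the speed end of the numeric range.

```python
# libx264 preset names, ordered from fastest to slowest.
X264_PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def preset_to_target_usage(preset):
    """Map a libx264 preset name onto 1 (best quality) .. 7 (best speed).

    Illustrative linear spread only; the real mapping may differ.
    """
    index = X264_PRESETS.index(preset)  # 0 = fastest, 9 = slowest
    return 7 - round(index * 6 / 9)


def normalize_level(level):
    """Accept both '40' (libx264 style) and '4.0' and return 4.0."""
    value = float(level)
    return value / 10 if value >= 10 else value


def crf_to_qp(crf):
    """Treat a -crf value as fixed quantisers for I, P and B frames.

    Assumption: the same value is used for all three.
    """
    qp = int(crf)
    return {"qpi": qp, "qpp": qp, "qpb": qp}
```

With this sketch, "ultrafast" lands on 7, "medium" on 4 and "placebo" on 1.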

 

Issues (@@Luke can you help with this?)

 

Since I'm not able to test every device I'm a little unsure how the logic in BaseStreamingService.cs breaks down for the various different scenarios it's written for.

 

For now I've been looking at Android phone playback, which includes these parameters:

ffmpeg.exe -crf 23 -maxrate 212000 -bufsize 424000

Since I have now tweaked h264_qsv, the -crf option takes priority for now. But QuickSync doesn't support setting a max bitrate when doing quality-based encoding, so I guess some logic will be required (within the qsv codec) to do standard VBR instead: ignore the crf parameter and generate a target bitrate based on -maxrate. So Luke, do you agree with this approach? And how is the maxrate parameter that you're sending to ffmpeg generated? For example, what calculation should I apply to -maxrate to work out what I should use as -b:v (target bitrate)?
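The decision being proposed could look roughly like the sketch below. The function name is hypothetical, and the 80% ratio between -maxrate and the target bitrate is purely a placeholder, since that ratio is exactly the open question in this post.

```python
def choose_rate_control(crf=None, maxrate=None, target_percent=80):
    """Pick a rate-control mode from the arguments written for libx264.

    If a max bitrate is given, ignore crf and fall back to VBR with a
    target derived from maxrate. target_percent is a placeholder value,
    not something taken from Media Browser or ffmpeg.
    """
    if maxrate is not None:
        return {
            "mode": "vbr",
            "b:v": maxrate * target_percent // 100,
            "maxrate": maxrate,
        }
    if crf is not None:
        # No bitrate ceiling, so quality-based encoding is fine.
        return {"mode": "constant_qp", "qp": crf}
    raise ValueError("need at least one of crf or maxrate")
```

For the Android example above (-crf 23 -maxrate 212000), this sketch would drop the crf and pick VBR with a target of 169600.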

 

Also have I missed other h264 settings? Does MB ever send other bitrate based parameters to ffmpeg (can you explain what these are and how I can replicate them for testing?)

 

Finally - I've also spotted a problem with frame size, my sample encode had this element in the ffmpeg command:

-vf "scale=min(iw\,426):trunc(ow/dar/2)*2"

I'm not an ffmpeg expert, so I'm not entirely sure what's going on here. What frame size would it calculate for my 1280x720 source? Whatever the answer is, it seems h264_qsv is not happy with it. Changing the expression to the following did work.

-vf "scale=min(iw\,426):trunc(ow/dar/4)*4"

I'm guessing this changes the rounding that is used when selecting a frame size. This isn't something I can tweak within the realm of the QSV encoder, so it might need to be tweaked in the MB source. Do you think a change like this could be implemented if it were conditional on the h264 encoder used (libx264 or h264_qsv)?
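For what it's worth, the arithmetic in those two scale expressions can be checked by hand. The sketch below mirrors them in Python, assuming square pixels (SAR 1:1) so that dar is simply width/height. The function name and its parameters are made up for illustration; ffmpeg itself evaluates the expression in floating point, while exact fractions are used here to keep the rounding unambiguous.

```python
from fractions import Fraction


def scaled_size(src_w, src_h, max_w, multiple, sar=Fraction(1, 1)):
    """Mirror scale=min(iw,max_w):trunc(ow/dar/multiple)*multiple."""
    dar = Fraction(src_w, src_h) * sar           # display aspect ratio
    out_w = min(src_w, max_w)                    # min(iw, 426)
    # int() on a Fraction truncates toward zero, like trunc() in ffmpeg.
    out_h = int(Fraction(out_w) / dar / multiple) * multiple
    return out_w, out_h


print(scaled_size(1280, 720, 426, 2))  # (426, 238)
print(scaled_size(1280, 720, 426, 4))  # (426, 236)
```

So for a 1280x720 source the original expression asks for 426x238, and the modified one asks for 426x236. The only difference is that the height becomes a multiple of 4 rather than 2, which may hint at an alignment requirement in the encoder, though that is a guess and not something confirmed here.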

Link to comment
Share on other sites

The maxrate comes from the MBS client. And yeah, I suppose that could be done if you really can't handle it in the encoder.

Link to comment
Share on other sites

Great effort guys!

 

I just tried it out but also get errors in regards to parameters missing/not working.

 

Metadata:
    encoder         : libebml v1.2.0 + libmatroska v1.1.0
    creation_time   : 2011-03-19 02:56:47
  Duration: 01:48:07.52, start: 0.000000, bitrate: 10496 kb/s
    Stream #0:0(eng): Video: h264 (High), yuv420p, 1920x800 [sAR 1:1 DAR 12:5], 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Stream #0:1(eng): Audio: dts (DTS), 48000 Hz, 5.1(side), fltp, 1536 kb/s (default)
    Metadata:
      title           : English DTS
[h264_qsv @ 036940e0] MFXInit(): -3
Output #0, hls, to 'C:\Users\Administrator\AppData\Roaming\MediaBrowser-Server\transcoding-temp\streaming\a47da6a9a496d2f1b02a3e5cfc5caf41.m3u8':
    Stream #0:0: Video: h264, none, q=2-31, 128 kb/s, SAR 1:1 DAR 0:0, 23.98 fps (default)
    Metadata:
      encoder         : Lavc56.14.100 h264_qsv
    Stream #0:1: Audio: aac, 0 channels, 128 kb/s (default)
    Metadata:
      encoder         : Lavc56.14.100 aac
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_qsv))
  Stream #0:1 -> #0:1 (dts (dca) -> aac (native))
Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
 
Keep up the good work :)
 
/Jacob

dark_slayer

QuickSync doesn't support setting a max bitrate when doing quality based encoding

While knowing nothing about this . . . does it support setting a max bitrate with other encoding types?

 

You mention quality based. Is there speed based? I know that quality vs speed is almost always an option for front end encoders and plex/mb3 . . .


mjb2000

While knowing nothing about this . . . does it support setting a max bitrate with other encoding types?

 

You mention quality based. Is there speed based? I know that quality vs speed is almost always an option for front end encoders and plex/mb3 . . .

 

Take a look at my comment on my pull request - I have asked Luke about the use of quality based encoding when we have a maximum bitrate for streaming.

 

The changes I have made make h264_qsv focus on the bitrate (since this is a fixed ceiling), and the quality setting adjusts how much effort the GPU puts into encoding each frame.

  • At the higher speed setting, it won't try too hard which means you'll get higher fps but lower quality. 
  • At the max quality setting, it will try as hard as possible to give a good picture (within the limits of the bitrate of course), which means you'll get a lower fps

 

I actually found that even at the max quality setting, my very low-powered CPU was not maxing out the GPU since the CPU was busy with other ffmpeg tasks (such as scaling and encoding the AAC audio). So although performance will vary for everyone, you might find that you get great results from the max quality setting and don't need to lower the quality (unless you have a crazy number of simultaneous transcodes).


dark_slayer

 

 

I actually found that even at the max quality setting, my very low-powered CPU was not maxing out the GPU since the CPU was busy with other ffmpeg tasks (such as scaling and encoding the AAC audio). So although performance will vary for everyone, you might find that you get great results from the max quality setting and don't need to lower the quality (unless you have a crazy number of simultaneous transcodes).

Thanks for all the explanation

 

I am jumping the gun by asking, but have you patched the dev server in your own setup with the pull request? Have you then already gotten this working within your own setup? At least to android as you mentioned earlier . . .

 

I saw your pull request was merged :D

 

Not much pomp or circumstance being given here, but this has me overly excited. It's probably at the top of many people's wish lists (even if they didn't know it).


mjb2000

Yep - I patched the Dev branch and built it to test locally.

 

Let me know how it works for you once a new dev build is released.

 

There are still a few things I want to address in h264_qsv. In particular, I am getting some strange 'jumpy' frames when running on Windows 8 when the frame is not resized (i.e. it remains the same as the original source frame size). Strangely, if the source video is resized then the encoded output is smooth (and it appears to work correctly under Windows 7).

 

I know this is much more temperamental than most other features in MB, so I can see this remaining as a hidden feature for some time. But hopefully with a few people testing we can iron out the issues.


denethor

I stumbled upon this from the blog page. To be honest, I was following the topic but was a little bit lazy to contribute :wub:

First things first: it is really nice to see someone step up for GPU offloading.

I watched lots of discussions over at doom9 regarding QuickSync. There were two main arguments from the x264 guys. First: "QuickSync does not offer anything more in terms of quality. x264 on the ultrafast preset is good enough and faster than QS." The second problem was: "Intel does not allow low-level full hardware access for developers." This was really upsetting for the x264 devs (Dark Shikari, for example); you had to use the Intel Media SDK, etc. Besides that, everybody knows that drivers and SDKs are not Intel's strongest muscles.

During those days (around Q4 of 2011), Eric (an Intel engineer) showed up on doom9 and implemented QuickSync in ffdshow, and then Nev used his code in LAV Video. Of course, all of these implementations were for decoding, not encoding.

I'm not sure what has really changed after 3 years. Intel opened up the source code of the SDK, but apparently it was still not good enough for the x264 guys. But CPUs have evolved, and x264 software encoding keeps getting faster. Maybe QuickSync provides more value for low-end CPUs, in NAS devices for example; I don't know.

Personally I have more faith in OpenCL. It's wide open and also supported by Intel. The Mac Pro is a perfect example of that horsepower. I couldn't believe my eyes when I saw what you can do in FCPX with 4K editing. And AFAIK you can build ffmpeg with the --enable-opencl switch. But sadly there is also a lot of discussion about the real value of OpenCL in ffmpeg :( So it looks like we still need more time to see what is next. Maybe x265 will change the paradigm.

I'm not asking for anything or recommending any solution; I just wanted to share my thoughts and a little history. I tried lots of things (and I mean LOTS) for silent multi-stream encoding over 4 years and ended up with a new Mac Pro. I think the 1st law of thermodynamics applies here. It's really hard to get something more from the same piece of silicon :)

 

Just my 2c


mjb2000

Hi denethor

 

I understand what you're saying, and I agree: the main advantage may only be for certain low power chips, but with more and more embedded and low power devices on the market (Bay Trail and the Intel Compute Stick) this might become more mainstream.

 

To be honest, the only reason I'm doing any of this is that my Intel CPU simply can't transcode using x264 in real time, so it's a choice between using QuickSync and not transcoding at all.

 

Support is patchy (Windows only at the moment), but it works for my purposes and if it offers a performance upgrade for other users then I'm happy to contribute my experiments :)

 

Oh - and I like your thermodynamics point, and to be honest, that is part of my motivation. Intel have built a quad-core chip with a GPU, so I want to start using that GPU to the best of its ability to make the most of the whole CPU/GPU package! The Intel J1900 is only 10W, and using QuickSync it can transcode 1080p at around 70fps! (A little less once you throw in resizing and audio encoding.) That's a pretty good effort for a CPU which doesn't require a CPU fan :)
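As a back-of-the-envelope check on what 70fps buys you, the snippet below estimates how many real-time streams fit in a given encode throughput. It assumes throughput divides evenly between simultaneous transcodes and ignores the resizing and audio overhead mentioned above, so treat the result as an upper bound; the function name is made up.

```python
def realtime_streams(encode_fps, source_fps):
    """Whole number of streams that can be encoded at playback speed."""
    return int(encode_fps // source_fps)


print(realtime_streams(70, 24))  # 2
print(realtime_streams(70, 30))  # 2
```

In other words, roughly two simultaneous 1080p transcodes at typical film or TV frame rates, with a little headroom left over.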

 

M


dark_slayer

I stumbled upon this from the blog page. To be honest, I was following the topic but was a little bit lazy to contribute :wub:

First things first: it is really nice to see someone step up for GPU offloading.

I watched lots of discussions over at doom9 regarding QuickSync. There were two main arguments from the x264 guys. First: "QuickSync does not offer anything more in terms of quality. x264 on the ultrafast preset is good enough and faster than QS." The second problem was: "Intel does not allow low-level full hardware access for developers." This was really upsetting for the x264 devs (Dark Shikari, for example); you had to use the Intel Media SDK, etc. Besides that, everybody knows that drivers and SDKs are not Intel's strongest muscles.

During those days (around Q4 of 2011), Eric (an Intel engineer) showed up on doom9 and implemented QuickSync in ffdshow, and then Nev used his code in LAV Video. Of course, all of these implementations were for decoding, not encoding.

I'm not sure what has really changed after 3 years. Intel opened up the source code of the SDK, but apparently it was still not good enough for the x264 guys. But CPUs have evolved, and x264 software encoding keeps getting faster. Maybe QuickSync provides more value for low-end CPUs, in NAS devices for example; I don't know.

 

  1. QS is not better quality than x264 software-encode -- correct. This is the huge reason why they didn't care. They weren't making the product for Mediabrowser or Plex, and the fact that transcoding servers piggy-backed onto the encoder software didn't change a thing in terms of their priorities. The priority was quality, and the use case was a scene release uploader shrinking down their releases. Quality is paramount. However, with MB (note: only when transcoding) we really don't care that much about quality on remote transcodes or local transcodes for that matter . . . it definitely should be back seat IMO (and those that prefer quality over speed can plan their builds accordingly since preferring speed over quality works for over 90% of use cases)
  2. The speed isn't any better than x264 ultrafast -- I don't think this is correct. The primary reason for its long-term exclusion was quality, but real-time transcoder priorities are different than average handbrake priorities. Even with an i7 I'd still love this because my server does more than just transcode, and I'd rather it have something left in the tank when multiple people are using my server remotely

mjb2000

On the back of what dark_slayer said...

 

Yep - QuickSync isn't for quality, it's for speed. So I would not recommend using QS to encode for permanent storage. Take your time with a high quality "-preset slow" x264 encode and store the result long-term. But when it is for real-time encoding, people probably won't notice the slight loss in quality.

 

"Something left in the tank" - Along with my hope to use all the features of the chip (the GPU as well as the CPU), like dark_slayer I'd like to keep my CPU relatively free to accomplish other tasks. In my case my Media Browser box also functions as a pfSense router, FreePBX phone system and OpenVPN endpoint. So I don't want these functions to suffer just so x264 can do a slightly better job of transcoding :)


denethor

The speed isn't any better than x264 ultrafast -- I don't think this is correct. The primary reason for its long-term exclusion was quality, but real-time transcoder priorities are different than average handbrake priorities.

 

I'm not sure about the speed/quality for ultrafast since I haven't tested it personally; I just quoted from 2011. I did a quick search for up-to-date test results. It looks like QS speed has increased 3x over time, especially with Haswell, and now q264 is almost as fast as x264 (for comparable quality, of course). The only problem is >5Mbit 1080p videos: the speed/quality curve of q264 is distorted above this level. (This is irrelevant in most cases for real-time.)

 

 

"Something left in the tank" - Along with my hope to use all the features of the chip (the GPU as well as the CPU), like dark_slayer I'd like to keep my CPU relatively free to accomplish other tasks. In my case my Media Browser box also functions as a pfSense router, FreePBX phone system and OpenVPN endpoint. So I don't want these functions to suffer just so x264 can do a slightly better job of transcoding :)

 

Totally agree. In most cases we can sacrifice quality for speed on real-time transcoding loads (Plex/MB3), especially with mobile devices. Transcoder throttling is also crucial for these types of loads, in addition to GPU offload. Nobody needs 100fps transcoding performance for a 24fps movie (assuming that you have a stable connection and minimal caching is required).

 

What do you guys think about OpenCL for guys like me? I don't have a QuickSync CPU :rolleyes:

