
GPU Transcoding (Intel QuickSync and nVidia NVENC)


witteschnitte


mjb2000

I think it was still trying to pass this off to h264_qsv looking at the log so I was wondering if there was a way to avoid that for now since it probably needs an h265_qsv instead or something. I thought I recalled qsv making a 265 encoder as well?

 

However yes, I will try a libx264 and h264_qsv test with hevc input and give you something to compare. I started getting less than 20fps for a while before I killed it

 

Your log files suggest it is not handing off the h265 to h264_qsv. Look for the line that says:

Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))

You should notice it says hevc (native) no matter which combo you try, since this is the source codec. I can only imagine that ffmpeg is struggling to decode the h265?

 

I have not seen an implementation of h265 for GPU transcoding in ffmpeg yet, correct me if I'm wrong.

 

M


dark_slayer

Your log files suggest it is not handing off the h265 to h264_qsv. Look for the line that says:

Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))
You should notice it says hevc (native) no matter which combo you try, since this is the source codec. I can only imagine that ffmpeg is struggling to decode the h265?

 

I have not seen an implementation of h265 for GPU transcoding in ffmpeg yet, correct me if I'm wrong.

 

M

Hmm, okay well now that I've confirmed all my power settings were correct for QS to enable and switched the encoding XML back to h264_qsv I can confirm I'm getting some error still. I am on the Dev version you list in the wiki and have my auto update triggers deleted. I replaced the api just last night as well as ffmpeg and ffprobe (not using the GPU folder for now). Here is a transcode log, what else would help?

uploadfromtaptalk1422732435273.txt


dark_slayer

Also, regarding h265_qsv: no, it turns out I was only glancing at headlines and misreading the HandBrake release announcements, which support 265 and QuickSync as separate features :D not together


dark_slayer

Your log files suggest it is not handing off the h265 to h264_qsv. Look for the line that says:

Stream mapping:  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))
Take a look back at this post as well http://mediabrowser.tv/community/index.php?/topic/10723-GPU-Transcoding-(Intel-QuickSync-and-nVidia-NVENC)#entry172458

 

The first attached transcode log under the server log indeed indicated it tried

Stream mapping:  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))

mjb2000

Take a look back at this post as well http://mediabrowser.tv/community/index.php?/topic/10723-GPU-Transcoding-(Intel-QuickSync-and-nVidia-NVENC)#entry172458

 

The first attached transcode log under the server log indeed indicated it tried

Stream mapping:  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))

 

Yep - this is all normal. It is saying it will try to decode the 265 using ffmpeg and then encode to 264 using h264_qsv.

 

If it's not working, then I suspect it's an issue with the particular source file, or ffmpeg itself.
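To check what a given transcode log actually did, the "Stream mapping" line can be pulled apart programmatically. The following is only a minimal Python sketch: the regular expression is written against the single log line quoted in this thread, and the function and field names are made up for illustration, so other ffmpeg builds may format the line differently.

```python
import re

# Matches lines such as:
#   Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))
# Lines for copied streams, e.g. "(copy)", intentionally do not match.
MAPPING_RE = re.compile(
    r"Stream #(?P<src>\d+:\d+) -> #(?P<dst>\d+:\d+) "
    r"\((?P<in_codec>\w+) \((?P<decoder>[^)]+)\) -> "
    r"(?P<out_codec>\w+) \((?P<encoder>[^)]+)\)\)"
)


def parse_stream_mapping(line):
    """Return the codec/decoder/encoder fields of one mapping line, or None."""
    match = MAPPING_RE.search(line)
    return match.groupdict() if match else None


info = parse_stream_mapping(
    "  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_qsv))"
)
print(info["decoder"], "->", info["encoder"])  # native -> h264_qsv
```

Here "native" is the software decoder for the hevc source and "h264_qsv" is the encoder for the output, which matches the explanation above.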


dark_slayer

Okay, well, pushing that aside for now, can you check post 202? I was attempting to use Media Browser with a non-hevc file and still not getting anything to play back. Modified DLL, ffmpeg, ffprobe, and encoding.xml


mjb2000

Hmm, okay well now that I've confirmed all my power settings were correct for QS to enable and switched the encoding XML back to h264_qsv I can confirm I'm getting some error still. I am on the Dev version you list in the wiki and have my auto update triggers deleted. I replaced the api just last night as well as ffmpeg and ffprobe (not using the GPU folder for now). Here is a transcode log, what else would help?

 

What happens if you run these commands:

"C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\ffmpeg\20150110\ffmpeg.exe" -i file:"\\OLYMPUS\tRAID\Blu-Ray\Dracula Untold (2014)\Dracula Untold (2014).mkv" -t 30 -threads 0 -map 0:0 -codec:v:0 h264_qsv -preset 7 -b:v 20362000 -maxrate (20362000*1.2) -bufsize (20362000*2) -vsync vfr -level 4.2 -force_key_frames expr:gte(t,n_forced*6) -vf "scale=trunc(min(iw\,1920)/32)*32:trunc(min((iw/dar)\,1080)/32)*32:flags=fast_bilinear" "C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\transcoding-temp\test-output.mkv"

"C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\ffmpeg\20150110\ffmpeg.exe" -i file:"\\OLYMPUS\tRAID\Blu-Ray\Dracula Untold (2014)\Dracula Untold (2014).mkv" -t 30 -threads 0 -map 0:0 -codec:v:0 h264_qsv -b:v 4000k "C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\transcoding-temp\test-output.mkv"

The first is as close to the original as I can make it but avoiding audio and metadata processing. The second is very basic. 

 

If 1 fails but 2 works, try adding additional options from 1 until you find which one makes it fail

 

If 2 fails, try adding -s 1280x720 to force a frame size (don't add -vf scale and -s on the same command line).
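The "add options from 1 until it fails" step can be scripted. Below is a minimal Python sketch that starts from the basic command 2 and adds one group of options from command 1 at a time. The grouping of options, the `-y` overwrite flag, the helper name and the assumption that `ffmpeg` is on the PATH are illustrative choices, not part of the original commands. Note that command 1 also uses a different `-b:v` value, which this sketch leaves at the command 2 value.

```python
FFMPEG = "ffmpeg"  # assumption: on PATH; substitute the full path to your build

# Options of the basic command 2.
BASE = ["-t", "30", "-threads", "0", "-map", "0:0",
        "-codec:v:0", "h264_qsv", "-b:v", "4000k"]

# Extra options from command 1, grouped so related flags are added together.
EXTRAS = [
    ["-preset", "7"],
    ["-maxrate", "(20362000*1.2)", "-bufsize", "(20362000*2)"],
    ["-vsync", "vfr"],
    ["-level", "4.2"],
    ["-force_key_frames", "expr:gte(t,n_forced*6)"],
    ["-vf", "scale=trunc(min(iw\\,1920)/32)*32"
            ":trunc(min((iw/dar)\\,1080)/32)*32:flags=fast_bilinear"],
]


def candidate_commands(source, output):
    """Yield argument lists, each adding one more option group from command 1."""
    options = list(BASE)
    for group in EXTRAS:
        options = options + group
        yield [FFMPEG, "-y", "-i", source] + options + [output]


for command in candidate_commands("in.mkv", "test-output.mkv"):
    print(" ".join(command))
```

Each printed command can be run in turn (for example with `subprocess.run(command)`); the first one that returns a non-zero exit code points at the option group that makes the encoder fail.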


dark_slayer

What happens if you run these commands:

"C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\ffmpeg\20150110\ffmpeg.exe" -i file:"\\OLYMPUS\tRAID\Blu-Ray\Dracula Untold (2014)\Dracula Untold (2014).mkv" -t 30 -threads 0 -map 0:0 -codec:v:0 h264_qsv -preset 7 -b:v 20362000 -maxrate (20362000*1.2) -bufsize (20362000*2) -vsync vfr -level 4.2 -force_key_frames expr:gte(t,n_forced*6) -vf "scale=trunc(min(iw\,1920)/32)*32:trunc(min((iw/dar)\,1080)/32)*32:flags=fast_bilinear" "C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\transcoding-temp\test-output.mkv"

"C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\ffmpeg\20150110\ffmpeg.exe" -i file:"\\OLYMPUS\tRAID\Blu-Ray\Dracula Untold (2014)\Dracula Untold (2014).mkv" -t 30 -threads 0 -map 0:0 -codec:v:0 h264_qsv -b:v 4000k "C:\Users\Olympus Server\AppData\Roaming\MediaBrowser-Server\transcoding-temp\test-output.mkv"

The first is as close to the original as I can make it but avoiding audio and metadata processing. The second is very basic. 

 

If 1 fails but 2 works, try adding additional options from 1 until you find which one makes it fail

 

If 2 fails, try adding -s 1280x720 to force a frame size (don't add -vf scale and -s on the same command line).

 

Thanks mjb2000, really appreciate your time

 

I'll try adding things from 1 into 2, but as you maybe suspected 1) failed and 2) started creating an output file

 

1) fails with the following: "Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height"

 

After running 2 my fps seems wrong, it's running at like 0.5 fps for some reason. Is this expected? I can't tell for sure that my GPU is awake or asleep again. I'm running all these through teamviewer, and my server is running w8.1 x64


Tranquil

For better testing you should try to avoid the use of vnc or teamviewer.

 


mjb2000

Thanks mjb2000, really appreciate your time

 

I'll try adding things from 1 into 2, but as you maybe suspected 1) failed and 2) started creating an output file

 

1) fails with the following: "Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height"

 

After running 2 my fps seems wrong, it's running at like 0.5 fps for some reason. Is this expected? I can't tell for sure that my GPU is awake or asleep again. I'm running all these through teamviewer, and my server is running w8.1 x64

 

I agree with Tranquil, when things aren't quite working, it's probably best to use a direct console session rather than anything remote to rule out any virtual display funny business.

 

I was playing around with a video that was natively 60fps and it seems QuickSync couldn't cope with 1080p @ 60fps. Specifying 30fps allowed it to work.

 

I can't see this being the problem with your command line 1, since your video is natively 23.976 (reported as 23.98 in ffmpeg). But just to rule it out, try adding -r 24000/1001 to command 1 to force a 23.976 frame rate.
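For anyone wondering how 24000/1001, 23.976 and 23.98 relate, a quick Python check of the arithmetic (the variable name is just for illustration):

```python
from fractions import Fraction

# The exact rate passed to -r is the rational number 24000/1001.
ntsc_film = Fraction(24000, 1001)

print(round(float(ntsc_film), 3))  # 23.976
print(round(float(ntsc_film), 2))  # 23.98, the rounded value ffmpeg reports
```

So all three figures describe the same frame rate; only the rounding differs.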

 

M


mjb2000

We'll need to wait for @mjb2000 to update the API dll.

 

Sorry - I'm struggling to keep up with the updates on the Dev release - Unfortunately it seems that most of the releases have involved changes to the Api.dll hence it fails as soon as the versions go out of sync.

 

Next time I will upload a complete release, so even if people are late to the party, they can use the entire set of compiled files to run the MB Server application.

 

Once the pull request is included, this should be a little easier.

 

@Luke - Give me a shout if there is anything I can do to help.

 

M

Link to comment
Share on other sites

i've added some of the safe changes in for the next build. but the focus is on getting a beta out. you'll have to continue to supply a custom api dll until the release cycle is over.


Indypendence

 

...

 

-crf is a libx264 specific command. h264_qsv allows you to specify the quality of p, i and b frames separately, more info is available here. The commands you would need are:

ffmpeg -i in.mkv -map 0:0 -c:v h264_qsv -qpi 20 -qpb 20 -qpp 20 out.mkv

 

Thanks that worked flawlessly! Even though it seems to the naked eye that quicksync quality set to 25 results in a higher quality than when libx264 is set to crf 25. Interesting to play with :)

But thanks again!

 

Next problem I ran into, maybe someone had the problem before, sometimes when using qsv, it hangs and when you wait long enough, it will say: Timeout, device is so busy

 

All I can do at that point is close down the program and restart it. Anyone had this before?


mjb2000

Thanks that worked flawlessly! Even though it seems to the naked eye that quicksync quality set to 25 results in a higher quality than when libx264 is set to crf 25. Interesting to play with :)

But thanks again!

 

Next problem I ran into, maybe someone had the problem before, sometimes when using qsv, it hangs and when you wait long enough, it will say: Timeout, device is so busy

 

All I can do at that point is close down the program and restart it. Anyone had this before?

 

These types of problems seem to be related to the GPU being 'powered off'.

 

Can you describe your set-up? Do you have a monitor / TV attached to your GPU? Are you running MB as a user logged in to the console, or as a service? As much info as you can provide would be helpful to try to identify the common elements that are causing the problem.


Indypendence

These types of problems seem to be related to the GPU being 'powered off'.

 

Can you describe your set-up? Do you have a monitor / TV attached to your GPU? Are you running MB as a user logged in to the console, or as a service? As much info as you can provide would be helpful to try to identify the common elements that are causing the problem.

 

It's a celeron G1840 @ 2.8 Ghz

 

There's a trick I've done to actually get it working in the first place, because it has a dedicated graphics card onboard as well: An ATI radeon 6850. It's my HTPC which I use for tv, gaming and now for testing before I decide I want to have one that's dedicated or not.

 

This means that initially, it didn't work at all, it gave me the well known error that quicksync was not available.

I'm going to write down everything I did to get it to run because it may help some others too. I'm using Windows 7 Professional, 64-bit.

 

Because there was a dedicated graphics card present, the Intel Graphics card was disabled by the bios. So I enabled this in the bios first.

Then installed the Intel graphics card drivers.

This still does not work because there is no monitor connected!

If you right click your desktop and go to your screen resolution settings: press the detect button, it will show a screen that's not connected.

Click that and then find the button that says something like: Show device on VGA (I don't know the exact translation because I'm running on a Dutch language system).

At the multiple screens section, select to extend your screens, but of course keep the screen that's connected as your primary screen, otherwise you'll lose your taskbar and such.

This will force the graphics card to be enabled and send an output to the VGA port, regardless if it's connected or not.

 

Once this is done, quicksync is available and ffmpeg can use it to encode with it.

 

I'm plainly running ffmpeg from the commandline at the moment, MB is not a part of the process yet. I'm logged into the console and my screens are on.

I even rebooted windows and executed the ffmpeg command right afterwards, and even then it wouldn't encode. But I waited a few minutes and then it was all good.

I would almost think the quicksync hardware overheated or something, could that be?


mjb2000

It's a celeron G1840 @ 2.8 Ghz

 

There's a trick I've done to actually get it working in the first place, because it has a dedicated graphics card onboard as well: An ATI radeon 6850. It's my HTPC which I use for tv, gaming and now for testing before I decide I want to have one that's dedicated or not.

 

This means that initially, it didn't work at all, it gave me the well known error that quicksync was not available.

I'm going to write down everything I did to get it to run because it may help some others too. I'm using Windows 7 Professional, 64-bit.

 

Because there was a dedicated graphics card present, the Intel Graphics card was disabled by the bios. So I enabled this in the bios first.

Then installed the Intel graphics card drivers.

This still does not work because there is no monitor connected!

If you right click your desktop and go to your screen resolution settings: press the detect button, it will show a screen that's not connected.

Click that and then find the button that says something like: Show device on VGA (I don't know the exact translation because I'm running on a Dutch language system).

At the multiple screens section, select to extend your screens, but of course keep the screen that's connected as your primary screen, otherwise you'll lose your taskbar and such.

This will force the graphics card to be enabled and send an output to the VGA port, regardless if it's connected or not.

 

Once this is done, quicksync is available and ffmpeg can use it to encode with it.

 

I'm plainly running ffmpeg from the commandline at the moment, MB is not a part of the process yet. I'm logged into the console and my screens are on.

I even rebooted windows and executed the ffmpeg command right afterwards, and even then it wouldn't encode. But I waited a few minutes and then it was all good.

I would almost think the quicksync hardware overheated or something, could that be?

 

I'm glad you found a solution. If you could add this to the wiki that would be great. I think it was documented much earlier in this thread, but it's hard to find. Yes, when your integrated graphics is not connected, you need to force a non-existent display to be 'connected' using the screen resolution dialogue, otherwise the Intel GPU is not enabled and QuickSync will be unavailable.
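A quick way to tell whether the Intel GPU is actually enabled is to try a tiny synthetic encode and look at the exit code. The following is a minimal Python sketch, assuming an ffmpeg build that includes both the lavfi `testsrc` source and the h264_qsv encoder and that `ffmpeg` is on the PATH; the function names and the 60 second timeout are arbitrary choices for illustration.

```python
import subprocess


def qsv_probe_command(ffmpeg="ffmpeg"):
    """Build a one-second test encode that only succeeds if h264_qsv opens."""
    return [
        ffmpeg, "-hide_banner", "-v", "error",
        "-f", "lavfi", "-i", "testsrc=size=1280x720:rate=30",
        "-t", "1", "-c:v", "h264_qsv", "-f", "null", "-",
    ]


def quicksync_available(ffmpeg="ffmpeg"):
    """Return True if the probe encode exits with code 0, False otherwise."""
    try:
        result = subprocess.run(qsv_probe_command(ffmpeg),
                                capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, or the encoder hung (e.g. a "device busy" stall).
        return False
    return result.returncode == 0


print(" ".join(qsv_probe_command()))
```

Running `quicksync_available()` before and after forcing the phantom display should show whether the change took effect, without involving MB or a real source file.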

Link to comment
Share on other sites

Hey guys, I'm new to the party and trying to make a switch from Plex to MB3.  One of the main reasons is this thread and GPU transcoding.  I'm also a .NET developer (as well as other languages) and have familiarity with ffmpeg also. Hopefully after I get up to speed, I can be of assistance.

 

I've only had MB3 installed for a few days so I'm still learning how it does things differently than Plex.  In Plex you can configure a few settings with regards to transcoding.  In a nutshell you can tell it how many seconds to transcode at one time before "throttling" back.  So with a fast processor you can possibly support 2 different clients with only one transcode going on at any one time.  Or 3 clients but only 2 transcodes are going on at one time.  If a user fast forwards to a different part in a movie/show, a new transcode session starts at that point in the movie, etc...

 

With MB3 from my limited research/watching what happens it seems that a transcode is started and it just runs until it finishes.  I haven't played enough yet to see what happens when someone Fast forwards or just stops/pauses a movie.

 

Are my assumptions above correct on how MB3 works?  Any idea how much work would be involved in using the former approach, which makes better use of the hardware available at any given time and only transcodes enough to stay ahead of the client?

 

Curious about a few other things:

Can the current code make use of multiple GPUs installed in the same computer?  For example I have machines with multiple 660s and multiple 750ti in them (used to mine with them).  Does the current code support using multiple GPUs or will it just use the first or has this never been tried yet?

 

Does the code currently support using nVidia, then QuickSync, then CPU as the number of transcodes increases, or has this not been attempted yet?

 

While not technically GPU/hardware encoding, has anyone given thought to distributed encoding?  I.e. a person has 2 computers on the local LAN running a "listener" that can take a "job" from the MB3 server and process it.  In an ideal world a couple of older computers could be put to good use by installing a couple of GPUs in them, and each could support 4 transcode streams (minimum).  Hence with 2 of these "transcode" machines up on the LAN the MB3 server could EASILY process 8 streams at the same time without doing any of the work on the server itself.  Obviously disk and network speed come into play but this would allow "linear" scaleout on a small scale depending on the hardware involved.

 

Will the current version posted work with Server Version 3.0.5515.1790?

 

Has any thought been given to creating an ffmpeg "proxy"? So instead of having to modify ffmpeg to work with MB3 or MB3 to work with ffmpeg, the "proxy" would be able to take commands issued to it from MB3 and convert them on the fly for use by ffmpeg.  This might be the cleanest way to implement both hardware and distributed encoding into MB3.  The proxy could then watch/control how many transcode sessions are done on nVidia, on QuickSync, on a 2nd or 3rd computer, etc, and know when to "fallback" to CPU encoding.  The proxy could have its own ini/config file for setup independent of MB3.

 

Thoughts?

 

Thanks,

Carlo

mjb2000

Hi Carlo - It seems like @Luke will find you very useful!

 

 

I've only had MB3 installed for a few days so I'm still learning how it does things differently than Plex.  In Plex you can configure a few settings with regards to transcoding.  In a nutshell you can tell it how many seconds to transcode at one time before "throttling" back.  So with a fast processor you can possibly support 2 different clients with only one transcode going on at any one time.  Or 3 clients but only 2 transcodes are going on at one time.  If a user fast forwards to a different part in a movie/show, a new transcode session starts at that point in the movie, etc...
 
With MB3 from my limited research/watching what happens it seems that a transcode is started and it just runs until it finishes.  I haven't played enough yet to see what happens when someone Fast forwards or just stops/pauses a movie.

 

AFAIK the transcode process is started and doesn't throttle back - I like the sound of that feature. Luke & Co will be best to bring you up to speed with what does what in the code. I have been taking a close look at what's happening in MediaBrowser.Api/Playback/BaseMediaEncoder.cs

 

 

 

Does the current code support using multiple GPUs or will it just use the first or has this never been tried yet?

 

 

I am not too sure. All GPU transcoding is done using a build of FFmpeg I have been compiling which is essentially vanilla ffmpeg + code from a QuickSync implementation + code from an nVidia implementation + a whole bunch of other codecs which are needed for wider compatibility with MediaBrowser features. I build this using a build script that takes care of various dependencies (look back over my previous posts in this thread for details as I don't have access to all my links right now).

 

As for your exact question - this would all depend on how the nVidia element of ffmpeg handles multiple requests. I am pretty sure I saw something in the API which allows you to direct the processing to a particular nVidia GPU.

 

 

 

Does the code currently support using nVidia, then QuickSync, then CPU as the number of transcodes increases, or has this not been attempted yet?
 

 

No - Everything is very manual at the moment and MB will issue the transcode commands purely based on the value contained in the encoding.xml file. In the future this may be able to change, but for now the codec to be used for h264 is manually entered in encoding.xml by all the people experimenting with these early builds.

 

 

 

While not technically GPU/hardware encoding, has anyone given thought to distributed encoding?  I.e. a person has 2 computers on the local LAN running a "listener" that can take a "job" from the MB3 server and process it.  In an ideal world a couple of older computers could be put to good use by installing a couple of GPUs in them, and each could support 4 transcode streams (minimum).  Hence with 2 of these "transcode" machines up on the LAN the MB3 server could EASILY process 8 streams at the same time without doing any of the work on the server itself.  Obviously disk and network speed come into play but this would allow "linear" scaleout on a small scale depending on the hardware involved.
 

 

I like the idea, but it has taken a long time to get to this point and I am not sure there will be the demand for such a scenario - but if you can help bring it then great!

 

I think earlier priorities would be bringing some sort of GPU acceleration to AMD / ATI cards, and improved handling of GPU capability detection and automatic switching based on capabilities.

 

 

 

Will the current version posted work with Server Version 3.0.5515.1790?
 

 

The latest Dev release contains the code for NVENC to work perfectly and my .dll replacement is not required (and I don't think it is compatible with the latest dev release).

 

The latest Dev release is missing the rounding logic required for some resolution combinations that QuickSync can't handle. So your results with QuickSync will depend on whether the requested transcode resolution happens to be a rounding multiple that QuickSync is happy with.
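To make the rounding concrete, here is a minimal Python sketch of the same arithmetic as the `-vf scale=trunc(min(iw\,1920)/32)*32:...` filter from the test command earlier in the thread. It assumes square pixels (so that `iw/dar` equals the source height) and takes the multiple of 32 from that command; the function name is made up for illustration and this is not the code from the .dll.

```python
def qsv_friendly_size(width, height, max_width=1920, max_height=1080,
                      multiple=32):
    """Cap each dimension, then round it down to a multiple of `multiple`."""
    out_w = (min(width, max_width) // multiple) * multiple
    out_h = (min(height, max_height) // multiple) * multiple
    return out_w, out_h


print(qsv_friendly_size(1920, 1080))  # (1920, 1056)
print(qsv_friendly_size(1280, 720))   # (1280, 704)
```

Note that neither 1080 nor 720 is itself a multiple of 32, so with this rounding both common heights are trimmed slightly.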

 

I will try to push out an updated version of the .dll soon, but the dev releases were happening so frequently, it was hard to keep up. You can see my mods in my GitHub though and build it for yourself if you need something quickly.

 

 

 

Has any thought been given to creating an ffmpeg "proxy"? So instead of having to modify ffmpeg to work with MB3 or MB3 to work with ffmpeg, the "proxy" would be able to take commands issued to it from MB3 and convert them on the fly for use by ffmpeg.  This might be the cleanest way to implement both hardware and distributed encoding into MB3.  The proxy could then watch/control how many transcode sessions are done on nVidia, on QuickSync, on a 2nd or 3rd computer, etc, and know when to "fallback" to CPU encoding.  The proxy could have its own ini/config file for setup independent of MB3.

 

I had considered this - particularly around using the Q264.exe tool in conjunction with ffmpeg - but I didn't know enough about how the commands were being executed from MB to be confident I could take the intended command, interpret it using the wrapper and then hand it onwards to ffmpeg.

 

Final thoughts...

 

Take a look at the recent changes to BaseStreamingService.cs and you can see that it might be possible to use a value stored in encoding.xml to run some completely separate commands, which might help you in testing out some ideas.

 

M


techywarrior

 

With MB3 from my limited research/watching what happens it seems that a transcode is started and it just runs until it finishes.  I haven't played enough yet to see what happens when someone Fast forwards or just stops/pauses a movie.
 

 

Yes, that is how MB3 handles transcoding. It runs as fast as it can to convert the entire file. It creates a cache of the transcoded files (it's split into 1 minute increments (I think) so HLS is supported). If you fast forward it just loads the corresponding transcoded segment.
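As a rough illustration of how a seek position maps onto fixed-length segments, here is a small Python sketch. The 60 second segment length is only the "1 minute (I think)" figure from this post, and the function is hypothetical, not MB3's actual lookup code.

```python
def segment_for_position(position_seconds, segment_seconds=60):
    """Return (segment index, offset into that segment) for a seek position."""
    index, offset = divmod(position_seconds, segment_seconds)
    return int(index), offset


# Seeking to 1:02:05 (3725 seconds) lands 5 seconds into segment 62.
print(segment_for_position(3725))  # (62, 5)
```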

Link to comment
Share on other sites

I think the proxy idea has some merit. it would allow it to be built, tested, and hardened in a standalone fashion, which would make it easier to maintain by volunteering community members.

Link to comment
Share on other sites

I'll see if I can make time this weekend to play with this.  What I was thinking, for Windows where we are doing the testing, is just a simple console app that takes the command line passed to it, has the ability to manipulate the command strings, and then just calls another version of ffmpeg. Once this simple logic is working as the "middle man" it could then be set up to monitor GPU use, or at least count the transcodes currently being done via the GPU, and for example allow up to 2 (if using nVidia) transcodes.  Then if a 3rd, 4th, etc come through the proxy it will just call the standard ffmpeg shipped with MB3.  Should be simple in theory and execution I would think.

 

We could have the standard ffmpeg.exe and then a modified version as ffmpegGPU.exe (or similar), or simply left as ffmpeg.exe but put into a GPU folder.
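The "middle man" decision logic described above could look roughly like the following Python sketch. Everything here is hypothetical: the function names, the limit of 2 GPU sessions (taken from the example above), and the use of h264_qsv and libx264 as the hardware and software encoder names. How the proxy would count active sessions is left out.

```python
def choose_encoder(active_gpu_sessions, max_gpu_sessions=2,
                   gpu_encoder="h264_qsv", cpu_encoder="libx264"):
    """Use the GPU encoder while a slot is free, otherwise fall back to CPU."""
    if active_gpu_sessions < max_gpu_sessions:
        return gpu_encoder
    return cpu_encoder


def rewrite_args(args, encoder,
                 codec_flags=("-codec:v:0", "-c:v", "-vcodec")):
    """Return a copy of args with the video encoder name replaced.

    Stream copies ("copy") are left alone since no encoder is involved.
    """
    rewritten = list(args)
    for i, arg in enumerate(rewritten[:-1]):
        if arg in codec_flags and rewritten[i + 1] != "copy":
            rewritten[i + 1] = encoder
    return rewritten


incoming = ["-i", "in.mkv", "-codec:v:0", "libx264", "-b:v", "4000k", "out.mkv"]
print(rewrite_args(incoming, choose_encoder(active_gpu_sessions=1)))
print(rewrite_args(incoming, choose_encoder(active_gpu_sessions=2)))
```

A real proxy would also need to translate encoder-specific options (presets, quality settings) that do not carry over between encoders, which this sketch does not attempt.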

 

Luke,

I guess one thing I'd need to know right off the bat is if you are looking for any specific (or any at all) return codes from ffmpeg or standout error codes or anything like that which the proxy would have to imitate? If so could you guide me to the source code files I'd need to check that has this logic?

 

Any other thoughts or suggestions?

 

Carlo

Link to comment
Share on other sites

Yes, that is how MB3 handles transcoding. It runs as fast as it can to convert the entire file. It creates a cache of the transcoded files (it's split into 1 minute increments (I think) so HLS is supported). If you fast forward it just loads the corresponding transcoded segment.

I'll have to play a bit.  But what happens if the user starts to play a file but then FF 3/4 through the video?  Does it just finish that one minute segment and then jump to where the user currently is and start from there?

Link to comment
Share on other sites

return code yea sometimes we do look for 0 as an indicator of success. but stdin,stdout, and stderr support are important and those will probably be trickier i imagine.
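For the return code and stdin/stdout/stderr requirement, one simple approach is for the proxy to let the child process inherit its own standard streams and then exit with the child's exit code. A minimal Python sketch of that idea follows; the function name is made up, and the Python interpreter stands in for ffmpeg so the example runs anywhere.

```python
import subprocess
import sys


def run_passthrough(command):
    """Run command with inherited stdin/stdout/stderr; return its exit code.

    subprocess.run() leaves the three standard streams attached to the
    parent's handles when no stdin/stdout/stderr arguments are given.
    """
    completed = subprocess.run(command)
    return completed.returncode


# Stand-in child process: exits with code 3, as a failing ffmpeg might.
code = run_passthrough([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # 3
```

In a real proxy the command would be the rewritten ffmpeg command line and the proxy would finish with `sys.exit(code)`, so a caller checking for 0 sees the same result it would get from ffmpeg directly.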
