GPU Transcoding (Intel QuickSync and nVidia NVENC)


witteschnitte

TolkienBard

Bugger. Since you were having transcoding success with NVENC using the most recent static build, I tried pointing Emby to that build to get it working, but I am still getting the same issues I listed above. I must be missing a file or flag somewhere.

 

Oddly enough, Emby works just fine if I point it at the new build of ffmpeg but do not have it attempt any HW transcoding. So the issue has to do with telling it to look for NVENC and it trying to use my GTX 960. What a shame, really. All that GPU and nothing for it to do.

Edited by TolkienBard

jscoys

I am using Zeranoe FFmpeg Build Version: 2793ebd (2016-09-08) (Static x64 bit). It looks like it utilizes my GTX-1070x, and it's great. I can also see that the "-h264_nvenc" switch is used in the transcoder logs.

 

But ffmpeg.exe still uses around 20-25% CPU time. Is this normal? (Throttling enabled and srt subtitles involved)

 

z4MFOEX.png

Yeah, when it was working with my 750 Ti, I had the same results. I don't understand why, but it used a lot of my CPU and only a little of my GPU... Maybe it's a question to ask on the Zeranoe forum?

 

 

Sent from my iPad using Tapatalk


Is QuickSync and NVENC support in Emby considered working, or is it still in development?

My current setup is an Intel Xeon E5-1620 CPU. When I try to play back a 4K video on my iPhone, it struggles. I tried lowering it to 720p, but it still stops to "buffer".

The funny thing is, though, that the CPU on the server is only running at around 50%.

 

Would I benefit from an Nvidia GeForce 1060 card and using NVENC?


They are both considered experimental. We are all working together trying to improve it. Many are using it with success, but it's good that we all share our experiences.


Hello all. In my latest tests in my Xen7 environment with QuickSync enabled on a Skylake Intel Core i7, I reach higher framerates when I don't use QSV to decode than when I do.

 

With an mp4 hd video downloaded from youtube:

 

Decoder Enabled, Encoder Enabled  (threads 0): 77fps (35%cpu)

Decoder Disabled, Encoder Enabled (threads 0): 386fps (86%cpu)

Decoder Disabled (threads 1), Encoder Enabled  (threads 0): 200fps (33%cpu)

Decoder Disabled, Encoder Disabled (threads 0): 60fps (93%cpu)

Decoder Disabled (threads 1), Encoder Disabled (threads 1): 24fps (25%cpu)

 

My virtual machine has 4 cores assigned.

 

I think the best option would be to let the user choose separately whether to use QuickSync for decoding and for encoding, and set whatever works best for them.
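
To compare the five runs above on equal footing, one option is to divide each framerate by the CPU load it needed. The sketch below does only that arithmetic on the figures quoted in this post; the labels and the "fps per CPU percent" metric are assumptions chosen for illustration, not something Emby or ffmpeg reports.

```python
# Figures copied from the post above: (label, fps, cpu_percent).
# "on"/"off" refer to QuickSync; the labels are shorthand, not Emby settings.
RESULTS = [
    ("decoder on, encoder on (threads 0)", 77, 35),
    ("decoder off, encoder on (threads 0)", 386, 86),
    ("decoder off (threads 1), encoder on (threads 0)", 200, 33),
    ("decoder off, encoder off (threads 0)", 60, 93),
    ("decoder off (threads 1), encoder off (threads 1)", 24, 25),
]


def fps_per_cpu_percent(fps, cpu_percent):
    """Frames per second obtained for each percent of CPU load."""
    return fps / cpu_percent


def rank_by_efficiency(results):
    """Sort runs from most to least fps per percent of CPU."""
    return sorted(
        results,
        key=lambda r: fps_per_cpu_percent(r[1], r[2]),
        reverse=True,
    )


if __name__ == "__main__":
    for label, fps, cpu in rank_by_efficiency(RESULTS):
        ratio = fps_per_cpu_percent(fps, cpu)
        print(f"{label:50s} {fps:4d} fps {cpu:3d}% CPU {ratio:5.2f} fps/CPU%")
```

By this measure the run with software decoding limited to one thread and QuickSync encoding comes out first, which matches the conclusion drawn in the post.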

Edited by eTomm

witteschnitte

Please look at your CPU frequency when you disable decoding.

For me, if decoding is disabled, the frequency on my i7 goes to the maximum on all cores.

 

With decoding enabled the frequency is very low (this is good, it means power saving). The framerate goes down a little, that is true, but overall it is not too slow, so you can still transcode on the fly.

So 35% CPU at minimum frequency (785 MHz) is much better than 25% at maximum CPU power (3.xx GHz).

 

 

So what you suggest is a good idea: every user should be able to choose the settings that fit them best.

 

 

 

(By the way: how do you get QuickSync working on a virtual machine?)

Edited by witteschnitte

In my case I see no frequency changes. It is a virtual machine; moreover, I bought a low-power version of Skylake exactly for this kind of use.

 

But yes, I suppose what you say makes sense in a bare-metal environment.


Trying to get Quicksync going on my 5th gen NUC (N3700 CPU), Windows 10 Pro 64 bit. No luck so far. 

 

GPU transcoding disabled = 0.6 speedfactor in transcoding log

With Quicksync  enabled speed drops to 0.3 ??

 

I've downloaded the latest 32 bit ffmpeg build from the Zeranoe website, and put the 2 .exe files in Emby's custom ffmpeg folder. 

 

Intel drivers are up to date.

 

Should I test anything else to confirm what the problem is? Would very much like to get this working, considering the *cough* horsepower of the N3700 :-)

ffmpeg-remux-73544d90-639b-41d3-b1d4-f2cb771c9374.txt

server-63609755096.txt


That log is not doing any encoding, it's stream copying the video as-is, so it is completely unrelated to this topic. However, I can see that it is going incredibly slow, whereas stream copy is usually the fastest possible operation. Your limiting factor is probably the performance of the storage system.


Maybe he is running the NUC from an old USB 2.0 drive. Or worse.

 

@Luke: what do you think about my idea of giving the option to enable hardware acceleration for encoding and decoding separately? It would also be really helpful to be able to fine-tune the threads used for decoding and encoding separately.

 

I have now run even more tests, and always setting 1 thread for decoding gives me the best results. In theory it is also the only way to get the most out of the system.

If you use QS for both decoding and encoding, the GPU is doing two different jobs while the CPU is left mostly unused. That can be helpful when the server is your home machine and you work on it while it is transcoding (but you will still use 2 threads, i.e. 2 cores, to push the data to and from the GPU, and that is where my 33% comes from). As witteschnitte says, on a bare-metal system that CPU load probably runs at the lowest clock speed, but on the weaker models it will probably trigger Intel Turbo Boost.

 

By contrast, when you run the decoding on the CPU and the encoding on the GPU, the GPU does just one job, so two separate hardware components each handle one task at a time. Logically (though I haven't run any tests yet), increasing the number of concurrent transcodes starting from 200 fps will get you better results than starting from 80 fps. If things scaled linearly, I would be able to run three times more mixed transcodes than pure GPU ones.
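
The linear-scaling guess above can be put into numbers with a very rough sketch. It assumes, purely for illustration, that each client needs 24 fps to play in real time and that the total throughput splits evenly across streams; neither assumption has been tested in this thread, and `max_realtime_streams` is a made-up helper name.

```python
def max_realtime_streams(benchmark_fps, playback_fps=24.0):
    """Estimate how many real-time streams fit into a measured framerate.

    Assumes throughput divides evenly between concurrent transcodes,
    which is the untested "linear" case discussed above.
    """
    if playback_fps <= 0:
        raise ValueError("playback_fps must be positive")
    return int(benchmark_fps // playback_fps)


if __name__ == "__main__":
    # 200 fps: CPU decode (1 thread) + QuickSync encode, from the earlier test.
    # 77 fps: QuickSync decode + QuickSync encode, from the earlier test.
    print(max_realtime_streams(200))
    print(max_realtime_streams(77))
```

With these assumptions the mixed setup would fit 8 streams against 3 for the pure GPU one, so somewhat under three times as many. Real results depend on how the GPU and CPU behave under concurrent load.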

 

@witteschnitte: running the system completely virtualized is not that complex. Right now I have a heat problem in my system (I wanted it to be completely fanless), and in it I have both a Quadro and the HD 530 of my Skylake assigned to two different VMs. You can achieve something like that just by using the XenCenter application to configure your server. The problem is that XenServer is not very end-user friendly: you can't pass through USB, and you must always pass through full PCIe devices. If you want more information, send me a private message and I will try to answer.


Storage system is a Gen8 HP Microserver with Windows Server 2012 R2 on a Samsung EVO 850 SSD. The movie file (11GB .mkv) is on a Toshiba X300 HDD (6 TB).

 

The NUC and Microserver are connected via TPLINK Gb switch and ACT cat.6 cables. File copy speed from the Microserver to the NUC or vice versa is always > 100 MByte / sec for large files (see screenshot).

 

Client device is an iPad 2 (latest Emby iOS app). The iPad gets its signal from a wired Fritz WLAN 1750 access point (dual band AC), which is connected to the TPLINK switch. Transfer speeds from the Microserver to a fast wireless client (Dell laptop with AC card) are always 30 MByte/s or more; most of the time 60 MByte/s or more. See second screenshot. The iPad 2 is Wireless N and won't transfer nearly as fast, but should still be more than fast enough for Emby.

 

If my storage system or network is the bottleneck, where is the problem? 

 

The NUC is configured for GPU transcoding but is not acting as such, so why is my issue unrelated to this topic? iOS clients cannot play .mkv files through direct streaming, as confirmed in this topic: http://emby.media/community/index.php?/topic/37044-ios-direct-streaming-requirements/

 

If I need to open a separate topic for this issue, that's fine, but where does it belong then?

 

Maybe he is running the nuc from a old usb 2.0 drive. Or worse. 

 

The thought alone makes me shiver :). NUC runs Windows 10 Pro from a Kingston V300 SSD. 

post-130748-0-97854000-1474193458_thumb.png

post-130748-0-34143400-1474193459_thumb.png


I'm not sure I can answer because that is environment-specific. What I can say, though, is that when stream copy is used, you usually see several hundred fps in the ffmpeg log. That's what happens when there's no encoding: it's very fast. But for some reason in your log, it's running at a fraction of the speed that it should be. Could be a hardware problem, could be an issue with the ffmpeg build you're on; it's hard to say.
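
For anyone who wants to check this in their own transcoding logs, here is a minimal sketch that pulls the `fps=` and `speed=` fields out of an ffmpeg progress line. The sample lines in the code are made up for illustration and are not taken from the logs attached to this thread; older ffmpeg builds may not print a `speed=` field at all, in which case the sketch returns `None` for it.

```python
import re

# Fields of an ffmpeg progress line such as:
#   frame= 4120 fps=412 q=-1.0 size=   51200kB time=... bitrate=... speed=17.2x
_FPS = re.compile(r"fps=\s*([\d.]+)")
_SPEED = re.compile(r"speed=\s*([\d.]+)x")


def parse_progress(line):
    """Return {'fps': float, 'speed': float or None}, or None if no fps field."""
    fps = _FPS.search(line)
    if fps is None:
        return None
    speed = _SPEED.search(line)
    return {
        "fps": float(fps.group(1)),
        "speed": float(speed.group(1)) if speed else None,
    }


def below_realtime(line):
    """True when the line reports a speed factor under 1.0x."""
    progress = parse_progress(line)
    return bool(progress and progress["speed"] is not None
                and progress["speed"] < 1.0)


if __name__ == "__main__":
    # Illustrative sample lines, not real log output from this thread.
    fast = ("frame= 4120 fps=412 q=-1.0 size=   51200kB "
            "time=00:02:51.60 bitrate=2444.3kbits/s speed=17.2x")
    slow = ("frame=   90 fps=7.2 q=-1.0 size=    1024kB "
            "time=00:00:03.75 bitrate=2236.9kbits/s speed=0.3x")
    print(parse_progress(fast), below_realtime(fast))
    print(parse_progress(slow), below_realtime(slow))
```

A speed factor that stays below 1.0x, like the 0.3 and 0.6 reported earlier, means ffmpeg is producing video more slowly than the client plays it, so buffering is expected.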


The separate settings are a good idea. As far as thread count goes, though, you need to translate your request into parameters that ffmpeg supports. So, which ffmpeg parameter would you like to customize? We already support customizing the ffmpeg thread count.


Tested 2 Android clients tonight with direct streaming (Galaxy S7, Lenovo Tab 2 A7). They work fine, and very fast. Tested a second iPad (v4) with and without Quicksync transcoding, and it responded the same as the iPad 2.

 

I'm eager to get GPU transcoding to work with the IOS devices, and troubleshoot further, but will need some help and direction with that. If there's nothing left to try, then I guess I'll have to switch to Android devices.

Edited by mark235

As you know, Luke, ffmpeg applies a command-line parameter to the input or to the output depending on where you place it. So when I tested, I inserted one -threads 1 before -i and one after. This allowed me to manage the threads for input and output separately.

 

So what I would like more control over is the thread count for input and output, and the ability to enable hardware acceleration for input or output separately.
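
The placement rule described above can be sketched as a small helper that assembles an ffmpeg argument list. This is only an illustration of where the options go, not the command line Emby actually generates; `build_ffmpeg_args` and its parameters are hypothetical names, and using `h264_qsv` as the decoder assumes an H.264 source and a build with QSV support.

```python
def build_ffmpeg_args(src, dst, decode_threads=None, encode_threads=None,
                      hw_decode=False, hw_encode=True):
    """Build an ffmpeg argument list with separate input and output options."""
    args = ["ffmpeg"]

    # Options placed before -i apply to the input, i.e. the decoding side.
    if hw_decode:
        args += ["-c:v", "h264_qsv"]  # QuickSync decoder (H.264 sources only)
    if decode_threads is not None:
        args += ["-threads", str(decode_threads)]
    args += ["-i", src]

    # Options placed after -i apply to the output, i.e. the encoding side.
    args += ["-c:v", "h264_qsv" if hw_encode else "libx264"]
    if encode_threads is not None:
        args += ["-threads", str(encode_threads)]
    args += ["-c:a", "copy", dst]
    return args


if __name__ == "__main__":
    # Software decode on 1 thread, QuickSync encode with automatic threads.
    print(" ".join(build_ffmpeg_args("in.mp4", "out.mp4",
                                     decode_threads=1, encode_threads=0)))
```

The example call corresponds to the "Decoder Disabled (threads 1), Encoder Enabled (threads 0)" test reported earlier in the thread.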

 


Post the logs of your various transcodes here. In fact, in the previous one there was transcoding going on: DTS to AAC. Maybe the N3700 is not fast enough for it. The other possibility is that you are using throttling. Try again with it disabled.
 
But as Luke said, transcoding involving QuickSync should never happen in those cases, only when you set a really low bandwidth or need to convert an unsupported video codec.

hope__silent

Hi all,

 

Can't get quicksync to work at all on my end... Works with handbrake though. This is a clean install on OS and latest Emby server (stable). Every time I try to play a file it just says "There was an error playing the video."

 

OS: Win 10 home 64bit

Processor: Core i3-6100

ffmpeg version: tried emby default (20160410) and latest Zeranoe static (20160919)

 

What am I doing wrong?

 

Thanks,

Ryan

server-63609902134.txt

ffmpeg-transcode-317b9af7-af6e-46d3-98da-620e2f1cc7c3.txt


Try using the 20160913 build of Zeranoe ffmpeg. The later builds seem to have problems with QuickSync; this is the last build I could get working.

 

Sent from my Pixel C using Tapatalk


I have the very same error in my configuration. I thought it was XenServer related, but I see you have a Skylake too, so maybe it is Skylake related.

 

The last working version for me is 20160409.


I'm interested in trying NVENC transcoding. Is there any source for an ffmpeg built with --enable-nvenc, or is it a must to compile one? Is it hardware dependent, or can anyone who has built it previously share it with others?

 

regards,

DaN
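
One way to check whether a given ffmpeg build was compiled with NVENC is to look at the list printed by `ffmpeg -encoders`. The sketch below assumes ffmpeg is on the PATH (or that you pass the full path to ffmpeg.exe) and that the encoder is called `h264_nvenc`, the name seen in the transcoder logs earlier in this thread; other builds may use a different name. The helper names are made up for illustration.

```python
import subprocess


def list_encoders(ffmpeg_path="ffmpeg"):
    """Return the text printed by `ffmpeg -hide_banner -encoders`."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_encoder(encoders_output, name):
    """True if an encoder with this exact name appears in the listing.

    Each listing line looks like: ' V..... h264_nvenc   description...',
    so the encoder name is the second whitespace-separated field.
    """
    for line in encoders_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        print(has_encoder(list_encoders(), "h264_nvenc"))
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ffmpeg: {exc}")
```

Note that this only shows the encoder was compiled in. Whether it actually works still depends on the GPU and the installed driver.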


hope__silent

The source there is mpeg, which is always a little trickier. Do you have any h264 files you can try it with?

 

I converted that mpeg2 file to h264 with Handbrake (using QuickSync, ironically) and it will play back with QuickSync enabled as the transcoder. With Zeranoe 20160913, the mpeg2 file will play for 12-13 seconds and then freeze. I also get the option to direct play the h264 file but never the mpeg2 file. Does Emby not support direct play for mpeg2 in a browser? I've tried Chrome and Edge. I also can't set the h264 file to a setting lower than 1080p 8mbps, or it appears to freeze Emby and restart the server.

 

Logs:

server-63609979154.txt

mpeg2 error won't play.txt

h264 1080p 6mbps freeze.txt

 


Tried 20160913, see above for results.

 

Tried 20160409 as well; same results as 20160913.


There is currently no browser that supports mpeg2, so the Emby web app cannot direct play it. I'm pretty sure I can fix the h264 thing you just mentioned. Thanks.
