Luke 42077 Posted May 4, 2017 Posted May 4, 2017 Yea QuickSync won't affect whether transcoding actually gets used or not. So in this case, you enabled QuickSync but then it appears that transcoding wasn't needed at all for what you were playing.
Luke 42077 Posted May 5, 2017 Posted May 5, 2017 @@Waldonnis do you still have the same thoughts about -hwaccel auto ?
sh0rty 714 Posted May 6, 2017 Posted May 6, 2017 Hmm. Looking over the code and as I mentioned before, the dxva2 acceleration tries twice to create D3D devices: once with D3D9ex functions and if that fails, it tries again with D3D9 functions (which is where the error message is coming from). There aren't many things that could cause this. I'm inclined to think that it's a DirectX or driver installation problem since the same device creation functions are used for games and I've seen similar messages before in that context, but I'm still looking into it. Practically, the message is somewhat irrelevant to the overall transcode, but it does indicate that it's not using dxva2 for decoding (leading me to wonder just what is being used; I'd guess CPU off-hand but the use of the nvenc encoder may force the attempted use of cuvid). Any chance you can PM a report from dxdiag? DXVA Checker may reveal interesting info as well, but that would take a little explaining for it to make much sense if you aren't familiar with it (very handy tool for some types of DXVA testing/troubleshooting). @@Waldonnis do you still have the same thoughts about -hwaccel auto ? Any news about the D3D Device error guys?
Waldonnis 148 Posted May 7, 2017 Posted May 7, 2017 Any news about the D3D Device error guys? I've been super busy this past week with other things. Looking into this is on my list, though, and I'm hoping to have some time this week to get back to it.
Davidtt485 6 Posted May 11, 2017 Posted May 11, 2017 So here is a question, I know with NVENC the Nvidia cards are limited to two streams, but that is a limit placed by Nvidia on consumer cards, what about their quadro line?
mastrmind11 722 Posted May 11, 2017 Posted May 11, 2017 So here is a question, I know with NVENC the Nvidia cards are limited to two streams, but that is a limit placed by Nvidia on consumer cards, what about their quadro line? https://devtalk.nvidia.com/default/topic/996924/quadro-m2000m-nvenc-session-limit/?offset=5
sh0rty 714 Posted May 15, 2017 Posted May 15, 2017 I've been super busy this past week with other things. Looking into this is on my list, though, and I'm hoping to have some time this week to get back to it. @@Waldonnis @@Luke 3.2.16.3 with the commit in EncodingHelper.js fixed my D3D Device error.
Waldonnis 148 Posted May 17, 2017 Posted May 17, 2017 @@Waldonnis @@Luke 3.2.16.3 with the commit in EncodingHelper.js fixed my D3D Device error. Yeah, it reverted hwaccel, which just didn't work out as well as I had hoped. I still think there's a potential for improving hardware en/decoding autodetection, but there are some nagging issues that I would need to look at code-wise (time-consuming). Sorry for not being around much lately to look into this stuff more. I've been trying to get some really difficult clips encoded using HEVC and it's taking a ton of time trying to find the right settings (I've done more 12hr+ encode "test runs" than I care to think about). I'm still busy with it, but should have more time to watch forums again now that I'm closing in on workable sets of options rather than just watching x265 incrementing a frame counter for 12 hours like a zombie (read: less exciting than watching paint dry) or comparing individual frames for several hours straight. Detail retention and reducing banding/blurring/halos in very dark scenes (and not making actors look like they've been laminated) aren't easy with HEVC at lower resolutions and bitrates...
hexidecimal 0 Posted May 24, 2017 Posted May 24, 2017 (edited) I was hoping I could get some help with an issue I'm experiencing using Emby w/ NVENC enabled. I'm using OpenMediaVault (Debian 8.8 based). I've installed emby-server directly to the host OS; I'm not using one of their plugins or a docker container. I've got a Quadro K2000D to use for NVENC, and CUDA 8.0 and the proper drivers for the card are installed. If I watch live TV from my HDHomeRun, Emby and ffmpeg see the tuner, use NVENC and everything runs along smooth as can be. If I go to watch any other type of media that requires a transcode, I get about 20 seconds of audio only, ffmpeg disappears from top (it crashes), and the stream fails. If I run nvidia-smi in those first 20 seconds, I can see the job hit the card and start handling the transcode, and when ffmpeg crashes, the job stops. I've tried different GPUs, I've tried different revisions of ffmpeg, and I cannot explain why Live TV works and media files don't. I've attached a log file of my instance attempting to transcode a 1080p copy of Arrival to 1080p 8Mbps. Thanks for any help that anyone can provide, I'm out of ideas at this point. I'm happy to provide any additional logging or perform any extra tests that might be recommended as well. Update/Additional details. I'm using a Dell Precision T7500 w/ 48GB of ECC DDR3, the Quadro K2000D GPU w/ 2GB of RAM, and an LSI controller card that attaches to an external SAS attached MD1000 w/ 15x 2TB disks and a RAIDZ2 pool. On a guess I moved Arrival to the OS drive, which is just a standard 1TB Seagate Constellation, and NVENC was able to transcode the file properly. It seems like it's something with the files being stored on my ZFS array but I have no idea what it might be. ffmpeg-transcode-fbf0122e-98e5-493c-aa96-1d0fec969d8b.txt Edited May 25, 2017 by hexidecimal
Waldonnis 148 Posted May 25, 2017 Posted May 25, 2017 Hmm, that's a stumper. The log doesn't really show any error, which is odd if ffmpeg is actually crashing. It actually looks like it just ran out of input after 47 frames and quit. Is there anything in the syslog or dmesg that might indicate a hardware or file system error? Sorry to ask that since you probably checked already, but best to cover all bases. While I doubt the Emby server log would be too useful in this situation, best to check that as well to see if anything is anomalous right before and after the transcode. I suppose you could try manually running the same ffmpeg line at an increased loglevel to see what ffmpeg thinks is happening and checking the exit code, but other than that and log checking, I'm not sure what else to look at offhand. I'll think about this one and see what else I can come up with. To confirm, it only seems to die when transcoding from a source file located on the ZFS array? I don't know how live TV is handled or if it's buffered on disc somewhere, so I'm at a loss to explain why that works (no tuner here)...
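A minimal sketch of the manual check suggested above: run the command by hand, keep everything it prints, and look at the exit code afterwards. The `run_and_report` helper, the log path, and the input path are made up for illustration (they are not part of Emby or ffmpeg); `-loglevel debug` and `-f null -` are standard ffmpeg options, and the real test would reuse the exact command line from the Emby transcode log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run any command, capture stdout+stderr in a log file,
# and report the exit code so a crash can be told apart from a normal exit.
# Usage: run_and_report LOGFILE COMMAND [ARGS...]
run_and_report() {
    local log="$1"; shift
    "$@" >"$log" 2>&1
    local status=$?
    echo "exit code: $status (log: $log)"
    return "$status"
}

# Example invocation (placeholder paths; assumes ffmpeg is on PATH):
#   run_and_report /tmp/ffmpeg-debug.log \
#       ffmpeg -loglevel debug -i /path/to/input.mkv -map 0:0 -c:v h264_nvenc -f null -
# Afterwards, check the kernel log for storage or driver errors from the same time:
#   dmesg | tail -n 50
```

An exit code of 0 together with a short log would match the "ran out of input" reading of the attached log, while a non-zero code or a signal-style code (128 and up) would point more towards an actual crash.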
Luke 42077 Posted May 25, 2017 Posted May 25, 2017 @@hexidecimal are you able to try media that is not stored on your zfs array? Like perhaps a folder of videos stored on the main OS drive? Let us know how it goes. Thanks !
hexidecimal 0 Posted May 25, 2017 Posted May 25, 2017 @@hexidecimal are you able to try media that is not stored on your zfs array? Like perhaps a folder of videos stored on the main OS drive? Let us know how it goes. Thanks ! I moved the movie I was originally testing with to the local OS drive, and it transcoded properly with NVENC. The issue seems to lie with the ZFS volume. I also tried setting a temporary transcoding path on the OS drive while the source file was still on the ZFS volume, and that didn't help.
hexidecimal 0 Posted May 30, 2017 Posted May 30, 2017 Just an update to my original issue. ZFS was certainly the culprit, but I'm still at a loss as to why. Before OMV I was using FreeNAS 9.10 (swapped with their whole Corral... thing) and everything was fine. Ended up syncing my data to a backup server I have available, destroyed the ZFS pool, and used OMV's built-in option for Software RAID6 in Debian. Moved a couple files back and everything worked just fine. So I'm just going to let it roll as a RAID6 and be done with this project. Thanks for the suggestions and taking the time to try and help out. It's appreciated.
hexidecimal 0 Posted May 31, 2017 Posted May 31, 2017 Apparently not? No idea what changed, getting the same problem now if I set NVENC as the encoder. Direct Streams work fine, transcoding however shows a black screen. Now FFMPEG keeps churning along instead of immediately stopping. I just tested 7 transcoded 1080p streams on my CPUs though and didn't get any stutter, so I'm giving up on NVENC for now. Maybe I'll tinker again in the future. Thank you all again for your help and suggestions.
RanmaCanada 494 Posted June 5, 2017 Posted June 5, 2017 So here is a question, I know with NVENC the Nvidia cards are limited to two streams, but that is a limit placed by Nvidia on consumer cards, what about their quadro line? A general rule of thumb is that all Quadros below X2000 (X = K, M, P) and all GeForce cards have limited encode sessions. https://devtalk.nvidia.com/default/topic/996924/quadro-m2000m-nvenc-session-limit/ Which didn't use to be the case. https://devtalk.nvidia.com/default/topic/800942/session-count-limitation-for-nvenc-no-maxwell-gpus-with-2-nevenc-sessions-/ Nvidia changed things and screwed everyone over.
bradford 9 Posted June 9, 2017 Posted June 9, 2017 (edited) I'm messing around with this again on my linux server. I have confirmed I have CUDA set up properly (deviceQuery and nvidia-smi both work correctly). I compiled my own ffmpeg with cuda support based on ffmpeg 3.3. It works, but it doesn't seem to be using the GPU at all. I've confirmed that the logs show the transcode command is right, but it doesn't perform any faster or use the CPU any less than the 2016 emby ffmpeg binary. Anyone have any idea what's happening? Is it silently falling back to CPU because of something wrong with cuda?
Edit: The strange thing is, nvidia-smi reports the process pane is "Not Supported", however the temperature is quite high for an idling video card. The reason I don't think it's using the GPU is the two transcode logs below produce about the same performance in terms of fps and the CPU is high. If it IS using the GPU, it's also MUCH slower than it was last December when I tested this out - I was getting about 110 fps without taxing the CPU at all - but now I'm getting maybe 70 and my CPU load is about 4 (with 8 cores) and that is probably due to LXC throttling it.
Fri Jun 9 10:08:55 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 0000:03:00.0     N/A |                  N/A |
| N/A   61C    P0    N/A /  N/A |    150MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
2016_ffmpeg_transcode.txt ffmpeg3.3_transcode.txt Edited June 9, 2017 by bradford
Waldonnis 148 Posted June 9, 2017 Posted June 9, 2017 I'm messing around with this again on my linux server. I have confirmed I have CUDA set up properly (deviceQuery and nvidia-smi both work correctly). I compiled my own ffmpeg with cuda support based on ffmpeg 3.3. It works, but it doesn't seem to be using the GPU at all. I've confirmed that the logs show the transcode command is right, but it doesn't perform any faster or use the CPU any less than the 2016 emby ffmpeg binary. Anyone have any idea what's happening? Is it silently falling back to CPU because of something wrong with cuda? It looks like it's working, but I agree that it seems slow going off of the fps numbers. What I don't know is if the initial seek time (noticed your transcode is starting 15mins into the episode, so it has to seek to that spot first) is artificially suppressing the fps reporting by including that seek time into the averaging. I'll have to try that some time soon and find out (my test clips are 5mins or less usually, so I don't seek often during testing). It doesn't explain the increase in load average, though. For testing it, you could try running a manual transcode with an increased loglevel to verify that it's actually using the hardware/driver. If you do this, be sure to only transcode the video stream rather than adding audio transcoding in too so you can spot any unusual output more easily and see what the CPU/load impact is of just the video transcode. There still is some CPU overhead for hardware video transcoding (and audio is still handled entirely by the CPU), so I would expect some load, but probably not that much unless the processor isn't handling the audio transcoding and memory-related operations very easily.
bradford 9 Posted June 10, 2017 Posted June 10, 2017 (edited) It looks like it's working, but I agree that it seems slow going off of the fps numbers. What I don't know is if the initial seek time (noticed your transcode is starting 15mins into the episode, so it has to seek to that spot first) is artificially suppressing the fps reporting by including that seek time into the averaging. I'll have to try that some time soon and find out (my test clips are 5mins or less usually, so I don't seek often during testing). It doesn't explain the increase in load average, though. For testing it, you could try running a manual transcode with an increased loglevel to verify that it's actually using the hardware/driver. If you do this, be sure to only transcode the video stream rather than adding audio transcoding in too so you can spot any unusual output more easily and see what the CPU/load impact is of just the video transcode. There still is some CPU overhead for hardware video transcoding (and audio is still handled entirely by the CPU), so I would expect some load, but probably not that much unless the processor isn't handling the audio transcoding and memory-related operations very easily. 
I grabbed the command that emby logged in my transcode log but I removed the audio component: /tmp/ffmpeg -i file:"/MEDIA/video/series/Star Trek Enterprise/S01/Star.Trek.Enterprise.S01E20.1080p.BlurayMUX.AVC.DTS-HD.MA.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -pix_fmt yuv420p -preset default -b:v 2616000 -maxrate 2616000 -bufsize 5232000 -profile:v high -force_key_frames "expr:gte(t,n_forced*3)" -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -copyts -vsync -1 -f segment -max_delay 5000000 -avoid_negative_ts disabled -start_at_zero -segment_time 3 -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af.m3u8" -y "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af%d.ts" Without audio I got about 117 fps (CPU 80% on 8 cores), whereas with audio I was getting only about 60 fps (CPU 50% on 8 cores). 
I ran the sans-audio ffmpeg command with loglevel -48 and saw the following output:
[h264_nvenc @ 0x2f25d80] Loaded Nvenc version 8.0
[h264_nvenc @ 0x2f25d80] Nvenc initialized successfully
[h264_nvenc @ 0x2f25d80] 1 CUDA capable devices found
[h264_nvenc @ 0x2f25d80] [ GPU #0 - < GeForce GT 710 > has Compute SM 3.5 ]
[h264_nvenc @ 0x2f25d80] supports NVENC
[segment @ 0x2ef61c0] Selected stream id:0 type:video
[segment @ 0x2ef61c0] Opening '/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af0.ts' for writing
[file @ 0x4414ee0] Setting default whitelist 'file,crypto'
[mpegts @ 0x4412e00] muxrate VBR, pcr every 2 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, segment, to '/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af%d.ts':
  Metadata:
    encoder : Lavf57.71.100
    Stream #0:0, 0, 1/90000: Video: h264 (h264_nvenc) (High), 1 reference frame, yuv420p(left), 1920x1080 [SAR 1:1 DAR 16:9], 0/1, q=-1--1, 2616 kb/s, 23.98 fps, 90k tbn, 23.98 tbc
    Metadata:
      encoder : Lavc57.89.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 2616000/0/2616000 buffer size: 5232000 vbv_delay: -1
    Stream #0:1, 0, 1/90000: Audio: mp2, 48000 Hz, stereo, s16, delay 481, 384 kb/s (default)
    Metadata:
      encoder : Lavc57.89.100 mp2
So from all appearances it is working. So why is the CPU so high? I mean, it is just an atom, but it wasn't this high when I was doing this in December. Could it be different Emby settings? Could it be the fact that I'm now using ffmpeg 3.3 whereas in December it was 3.1? I just tested the 2016 ffmpeg binary (which doesn't support nvenc) and it was giving me about 60 fps with 100% cpu on all 8 cores. Interesting, since last December I could only get about 25 fps with CPU only. So it looks like it's the DTS that's keeping my fps low? Why would that keep it so low? Esp since it's taxing the CPU LESS? Edited June 10, 2017 by bradford
Waldonnis 148 Posted June 10, 2017 Posted June 10, 2017 (edited) So it looks like it's the DTS that's keeping my fps low? Why would that keep it so low? Esp since it's taxing the CPU LESS? There are still some operations that'll need to be done on the CPU even with GPU hardware transcoding: demuxing, decoding the source streams, copying data to/from the bus/GPU, muxing the output, and any chroma conversions that may have been necessary (to name a few). Only the decoding part is really "hard", but even still, you're seeing a lot of CPU usage, even for an Atom (which are admittedly less capable than some alternatives, but still good enough for this kind of operation). I'm not sure what may have changed between 3.1 and 3.3, but it's possible that upstream bugfixes or additional features were added to ffmpeg that added some additional processing burden to it. Even the build configuration or compiler choice may have been different, but it's hard to pin that kind of thing down outside of compiling each version with the same toolchain and options. Looking over your command line and output, it's still transcoding audio, but to a "default" encoder for mp2 (odd, since my custom build has always defaulted to using mp3lame). You can totally eliminate handling the audio by removing all of the -map options except -map 0:0. That should ignore all streams except 0:0, which is the video in this case. You can confirm this by looking at the "Stream #" entries listed under the "Input" section above the section of output you posted (it should show something like Stream #0:0(eng): Video: h264 ....). 0:0 is most often the video stream, but I've seen (and made) some exceptions to that rule, so it's good to check with ffmpeg -i <file> or ffprobe to be sure (or mediainfo, which is very handy as well).
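To make the stream check above concrete, here is a small sketch that pulls the video stream specifiers out of ffmpeg/ffprobe-style "Stream #" lines. The `video_streams` function name and the file path are invented for this example, and it assumes the usual `Stream #0:0(eng): Video: ...` line layout, so treat it as a convenience rather than a robust parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ffmpeg/ffprobe output on stdin and print the
# specifier (e.g. 0:0) of every stream that is reported as "Video:".
video_streams() {
    sed -n 's/^.*Stream #\([0-9][0-9]*:[0-9][0-9]*\)[^:]*: Video:.*$/\1/p'
}

# Example invocation (placeholder path). ffprobe prints its stream summary
# on stderr, hence the 2>&1:
#   ffprobe -hide_banner /path/to/file.mkv 2>&1 | video_streams
```

Whatever specifier this prints is the one to pass to `-map` when testing a video-only transcode.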
Here's an amended line to try:
/tmp/ffmpeg -i file:"/MEDIA/video/series/Star Trek Enterprise/S01/Star.Trek.Enterprise.S01E20.1080p.BlurayMUX.AVC.DTS-HD.MA.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -codec:v:0 h264_nvenc -pix_fmt yuv420p -preset default -b:v 2616000 -maxrate 2616000 -bufsize 5232000 -profile:v high -force_key_frames "expr:gte(t,n_forced*3)" -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -copyts -vsync -1 -f segment -max_delay 5000000 -avoid_negative_ts disabled -start_at_zero -segment_time 3 -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af.m3u8" -y "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af%d.ts"
I noticed it's not using hardware decoding as well and am curious if that would help in this case. After you test with the other -map options removed, try running it again but with this line instead (I'm just adding an additional -c:v h264_nvenc before the input file to force hardware decoding; if it doesn't run and complains about threading, just add -threads 1 right before the -i as well):
/tmp/ffmpeg -c:v h264_nvenc -i file:"/MEDIA/video/series/Star Trek Enterprise/S01/Star.Trek.Enterprise.S01E20.1080p.BlurayMUX.AVC.DTS-HD.MA.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -codec:v:0 h264_nvenc -pix_fmt yuv420p -preset default -b:v 2616000 -maxrate 2616000 -bufsize 5232000 -profile:v high -force_key_frames "expr:gte(t,n_forced*3)" -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -copyts -vsync -1 -f segment -max_delay 5000000 -avoid_negative_ts disabled -start_at_zero -segment_time 3 -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af.m3u8" -y "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af%d.ts"
Edited June 10, 2017 by Waldonnis
bradford 9 Posted June 10, 2017 Posted June 10, 2017 (edited) Here's an amended line to try: /tmp/ffmpeg -i file:"/MEDIA/video/series/Star Trek Enterprise/S01/Star.Trek.Enterprise.S01E20.1080p.BlurayMUX.AVC.DTS-HD.MA.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -codec:v:0 h264_nvenc -pix_fmt yuv420p -preset default -b:v 2616000 -maxrate 2616000 -bufsize 5232000 -profile:v high -force_key_frames "expr:gte(t,n_forced*3)" -vf "scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2" -copyts -vsync -1 -f segment -max_delay 5000000 -avoid_negative_ts disabled -start_at_zero -segment_time 3 -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af.m3u8" -y "/var/lib/emby-server/transcoding-temp/19e64c4c585fcc28c6a1cbda822497af%d.ts" Thanks @@Waldonnis for the help. That line gives me the same FPS as when I was doing mp2 encoding, and slightly less cpu (as I would expect). That second line fails with the error `Unknown decoder 'h264_nvenc'`. I changed the decoder to `h264_cuvid`, which made a huge improvement - I'm now getting ~109 fps at load avg 0.83 vs the old ~115 fps at load average 6.4. So, slightly fewer frames per second at a huge CPU improvement. Thanks for the tip! Can we get this added to the NVENC parameters in emby? I suspect that is the big difference between December and now - perhaps Emby removed that parameter? Edited June 10, 2017 by bradford
Luke 42077 Posted June 10, 2017 Posted June 10, 2017 I don't recall us ever using h264_cuvid but we can switch to that if it is better.
Waldonnis 148 Posted June 10, 2017 Posted June 10, 2017 Thanks @@Waldonnis for the help. That line gives me the same FPS as when I was doing mp2 encoding, and slightly less cpu (as I would expect). That second line fails with the error `Unknown decoder 'h264_nvenc'`. I changed the decoder to `h264_cuvid`, which made a huge improvement - I'm now getting ~109 fps at load avg 0.83 vs the old ~115 fps at load average 6.4. So, slightly fewer frames per second at a huge CPU improvement. Thanks for the tip! Can we get this added to the NVENC parameters in emby? I suspect that is the big difference between December and now - perhaps Emby removed that parameter? Oops, doh, I was tired and mistyped. Glad it worked out well, though. And yeah, h264_cuvid (and hevc_cuvid) are the explicit hardware decoders that go with the NVENC encoders. Not sure why the naming is different compared to the encoders *shrug*
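The decoder naming discussed above can be captured in a small lookup. This is only a sketch: `cuvid_decoder_for` is a made-up helper name, it covers just the two decoders named in this thread (`h264_cuvid` and `hevc_cuvid`), and whether either is usable depends on how the local ffmpeg was built.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a source video codec name to the explicit
# cuvid decoder mentioned in this thread. Prints nothing and returns 1
# for any other codec, so the caller can fall back to ffmpeg's default.
cuvid_decoder_for() {
    case "$1" in
        h264) echo "h264_cuvid" ;;
        hevc) echo "hevc_cuvid" ;;
        *)    return 1 ;;
    esac
}

# To see which cuvid decoders a given ffmpeg build actually includes:
#   ffmpeg -hide_banner -decoders | grep cuvid
#
# Example use (placeholder input path): put the decoder before -i and the
# encoder after it, as in the corrected command above.
#   dec=$(cuvid_decoder_for h264) &&
#       ffmpeg -c:v "$dec" -i /path/to/input.mkv -map 0:0 -c:v h264_nvenc -f null -
```

Placement matters here: a `-c:v` given before `-i` selects the decoder for that input, while a `-c:v` given after it selects the encoder for the output, which is why `h264_nvenc` was rejected as an "Unknown decoder".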
jhoff80 94 Posted June 12, 2017 Posted June 12, 2017 3.2.20 appeared to add h264_cuvid (transcode log below): Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc)) Stream #0:1 -> #0:1 (ac3 (native) -> ac3 (native)) ...but not hevc_cuvid: Stream mapping: Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_nvenc)) Stream #0:1 -> #0:1 (ac3 (native) -> aac (native)) If it helps, the video stream details for the latter: Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt709), 3840x2160 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
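For anyone checking their own transcode logs the same way, the "Stream mapping" lines can be summarised with a short filter. This is a rough sketch: `stream_mapping` is an invented name and the pattern assumes the exact `(codec (decoder) -> codec (encoder))` layout shown in the log excerpts above; stream-copy lines do not match and are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an ffmpeg log on stdin and print, for each
# transcoded stream, which decoder and which encoder ffmpeg selected.
stream_mapping() {
    sed -n 's/^.*Stream #\([0-9]*:[0-9]*\) -> #[0-9]*:[0-9]* ([^ ]* (\([^)]*\)) -> [^ ]* (\([^)]*\))).*$/\1 decoder=\2 encoder=\3/p'
}

# Example invocation (placeholder log path):
#   stream_mapping < /path/to/ffmpeg-transcode.txt
# A video line showing decoder=native means decoding is still done by
# ffmpeg's software decoder, even if the encoder is h264_nvenc.
```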