Gerrit507, posted January 4, 2019 (edited):
Basically, get anything with an HD 530 or HD 630. It's the best bang for the buck for that kind of use case, imho. I agree with you that only four 1080p H.264 encoding streams on a GTX 1060 is pretty disappointing. Maybe there really is a hardware limitation, e.g. it has only a limited number of encoding circuits compared to a Quadro.
brett.mclain@gmail.com, posted January 4, 2019:
It just seems crazy to me that a GTX 1060 can be outperformed by the iGPU on a modern Intel chip... I'll keep poking around to find out why the performance is so bad, but I feel like I've already looked into most of the potential problem areas.
Happy2Play, posted January 4, 2019:
@Luke @softworkz So, is that patch going to make troubleshooting more difficult, with users expecting more from their hardware?
softworkz, posted January 5, 2019 (edited), replying to Happy2Play:
I don't think that it will change much. People can also hit a physical limit with two high-bitrate transcodes when they have an older board, bad BIOS settings, a slow network, or want to encode HEVC (in case we support that at some point), or whatever. There are so many different kinds of limits that can be hit...
Regarding 'expectations': things already go wrong the moment someone tries to measure performance (or expectations) by the 'number of simultaneous 1080p/H.264 transcodes'. Two parts are involved, decoding and encoding:
Decoding performance depends on the source material. A one-hour 1080p H.264 video can be 1 GB or 10 GB in size.
Encoding performance: same story here. That video can be encoded at very different bitrates, again resulting in either 1 GB or 10 GB.
Beyond output bitrate, there are a lot of different encoding options that produce different output quality, and the required processing power can easily vary by a factor of 10 or more. (I hope we'll be able to offer more and better settings for configuring those codec options in the future.)
I'll follow up with more details on the actual case in a response to the OP, brett.mclain.
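To translate softworkz's file sizes into bitrates: the average bitrate is simply file size divided by running time. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and ignoring audio and container overhead, so it slightly overstates the video bitrate:

```python
def average_bitrate_mbps(size_gb: float, duration_hours: float) -> float:
    """Average bitrate in Mbit/s, using decimal units (1 GB = 10**9 bytes)."""
    bits = size_gb * 1e9 * 8          # bytes -> bits
    seconds = duration_hours * 3600   # hours -> seconds
    return bits / seconds / 1e6       # bit/s -> Mbit/s


if __name__ == "__main__":
    # The two one-hour 1080p H.264 examples from the post above.
    for size in (1, 10):
        print(f"{size} GB per hour -> {average_bitrate_mbps(size, 1):.1f} Mbit/s")
    # 1 GB per hour -> 2.2 Mbit/s
    # 10 GB per hour -> 22.2 Mbit/s
```

So what is nominally the same "1080p H.264 stream" can differ tenfold in the amount of compressed data the decoder has to work through.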
yepitro1986, posted January 5, 2019:
If you use Apple devices, you can use Infuse as your media player and Emby/Plex as your media server. It will save you the money you'd spend on hardware and the time you'd spend thinking about transcoding.
softworkz, posted January 5, 2019, quoting brett.mclain@gmail.com:
"I installed a GTX 1060 (3GB) card into my media server (i7-3770) and was trying to do hardware transcoding with Plex, but found out too late that they only support encoding, not decoding. So I'm testing Emby now instead, and at first I was pretty pleased with the performance, but after further testing I'm a bit underwhelmed. I was able to do four 1080p streams (x264) from 18 GB files each, but one of the streams kept buffering, so really only three. Then when I tried an HEVC Main 10 movie (15.5 Mbps), it was super garbled. The server is mostly idling, as the GPU is doing all the work. My expectation was that I would be able to run 6 or 7 1080p streams on a GTX 1060 without breaking a sweat. Does anyone have any ideas on what could be holding back the performance? The GPU's memory usage never goes above 2 GB, and the Volatile GPU-Util stays below 15% as well."
It's a common misconception with Nvidia boards to think that all of your GPU's power is available for transcoding video. This is not the case. Instead, Nvidia boards include additional integrated circuits ('chips') for video encoding and decoding. I suppose the rationale behind this is to allow video playback and recording in games without affecting 3D performance. As a result, there is no direct relation between 3D performance and video performance. An older professional board may have better video performance than the latest GeForce, physically! So buying a consumer GPU and unlocking the driver might not be as clever as it sounds... Unfortunately, I don't have any figures, though.
Back to Emby: on our side, we have not fully optimized transcoding for Nvidia yet. That means data is transferred back and forth between main memory and GPU memory for both decoding and encoding, which is not ideal.
We're working on this, and you can expect some improvement in the near future, but keep in mind that it's not the GPU cores that are doing the video acceleration (except for some basic supporting tasks and CUDA operations, e.g. for hardware filtering).
BTW, for all those who were wondering about Intel's video performance: Intel follows a similar approach, at least in that dedicated ASICs do the video processing. That explains why Intel and Nvidia video performance can be similar for certain combinations even though their 3D performance may differ strongly.
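For readers who want to see the difference softworkz describes, here is a sketch using plain ffmpeg. It is not Emby's actual transcoding command line, which isn't shown in this thread. Both variants decode with NVDEC and encode with NVENC; the second adds `-hwaccel_output_format cuda`, which keeps decoded frames in GPU memory instead of copying them back to main memory before encoding. The file names and the 8M bitrate are placeholders, and the flags assume an ffmpeg build with CUDA/NVENC support.

```python
import shlex


def nvenc_transcode_args(src: str, dst: str, keep_frames_on_gpu: bool) -> list[str]:
    """Build an ffmpeg argument list for an NVDEC decode + NVENC encode."""
    args = ["ffmpeg", "-hide_banner", "-hwaccel", "cuda"]
    if keep_frames_on_gpu:
        # Decoded frames stay in GPU memory and are handed straight to NVENC.
        args += ["-hwaccel_output_format", "cuda"]
    # Without the option above, decoded frames are downloaded to system memory
    # and uploaded again for the encoder: the round trip softworkz mentions.
    args += ["-i", src, "-c:v", "h264_nvenc", "-b:v", "8M", "-c:a", "copy", dst]
    return args


if __name__ == "__main__":
    for on_gpu in (False, True):
        print(shlex.join(nvenc_transcode_args("input.mkv", "output.mp4", on_gpu)))
```

Once frames stay on the GPU, any scaling also has to happen there (for example with ffmpeg's scale_cuda filter), which fits softworkz's remark about CUDA being used for hardware filtering.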
brett.mclain@gmail.com, posted January 5, 2019, replying to softworkz:
I'm glad Emby is working on it. I used to host a Plex server for about 15 people and bought the GTX 1060 to help when 5 or 6 people are streaming at once. Unfortunately, Plex on Linux only supports Nvidia hardware acceleration for encoding, not decoding.
Thanks for clarifying that the GTX probably has an onboard ASIC or something similar that may have no correlation to its 3D rendering abilities. Good to know!
With regard to "how many streams can I do at once with xyz hardware", I understand that can be frustrating due to the plethora of variables involved. It might be helpful to select 10-15 clips of varying quality/size/resolution/codec and ask users to benchmark how many they can transcode, or to have some tool to measure performance. This would allow the community to crowdsource the info. For example, have a few 480p, 720p, 1080p and 4K videos, each in x264 and x265 at a few different sizes, and upload them somewhere for us to download and test against. There would still be lots of variables, but it could help give ballpark figures.
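Brett's proposed test set is essentially a grid of resolution x codec x size. A minimal sketch of enumerating it; the three bitrate tiers are hypothetical placeholders, since the post only says "a few different sizes":

```python
from itertools import product

RESOLUTIONS = ["480p", "720p", "1080p", "4K"]
CODECS = ["x264", "x265"]
BITRATE_TIERS_MBPS = [4, 10, 25]  # hypothetical "low / medium / high" sizes


def benchmark_matrix():
    """Every (resolution, codec, bitrate) combination the clip set should cover."""
    return [
        {"resolution": r, "codec": c, "bitrate_mbps": b}
        for r, c, b in product(RESOLUTIONS, CODECS, BITRATE_TIERS_MBPS)
    ]


if __name__ == "__main__":
    clips = benchmark_matrix()
    print(len(clips), "clips")  # 4 resolutions x 2 codecs x 3 tiers = 24
    for clip in clips[:3]:
        print(clip)
```

Even this small grid yields 24 clips, more than the 10 to 15 Brett suggested, so in practice some combinations (say, 4K at the lowest bitrate) would probably be dropped.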
softworkz, posted January 5, 2019, quoting brett.mclain@gmail.com:
"Thanks for clarifying that the GTX probably has an onboard ASIC or something similar..."
It's a fact, and it's true for all Nvidia boards. Look here:
https://developer.nvidia.com/video-encode-decode-gpu-support-matrix
There's a column indicating the "number of chips". But that doesn't mean it's always the same 'chip'. Of course there are different versions with different performance parameters depending on the board.
Now that this point is understood, I can explain some additional specifics of Nvidia hardware acceleration. On Windows, Nvidia also supports DXVA2 hardware decoding. This doesn't use the NVDEC chips for decoding but actual GPU cores instead. Also, there is no limit on the number of streams being decoded simultaneously. (DXVA2 doesn't exist on Linux, though.)
Quoting the benchmark suggestion: "It might be helpful to select 10-15 clips of varying quality/size/resolution/codec and ask users to benchmark how many they can transcode..."
This is in fact a good idea and actually something that we're already planning for.
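softworkz's point that the number of NVENC chips (and each chip's generation) sets the ceiling, rather than the 3D rating, can be turned into a back-of-the-envelope estimate. The per-chip throughput below is a hypothetical figure for illustration, not a measured NVIDIA number; real throughput depends on preset, resolution and codec, and the model ignores the decode side as well as the consumer-driver session limit mentioned later in the thread.

```python
import math


def max_realtime_streams(nvenc_chips: int, fps_per_chip: float, source_fps: float) -> int:
    """Streams the encoders could sustain at playback speed, assuming perfect sharing."""
    total_fps = nvenc_chips * fps_per_chip  # aggregate encoder throughput
    return math.floor(total_fps / source_fps)


if __name__ == "__main__":
    # Hypothetical: one chip encoding 1080p at 300 fps, 23.976 fps film source.
    print(max_realtime_streams(1, 300, 23.976))
    # The same hypothetical chip, two of them on one board.
    print(max_realtime_streams(2, 300, 23.976))
```

Doubling the chips roughly doubles the estimate, while the GPU's 3D rating doesn't appear anywhere in it, which is the point softworkz is making.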
pir8radio, posted February 26, 2019, quoting softworkz:
"This is in fact a good idea and actually something that we're already planning for."
Jdiesel, posted April 14, 2019:
Any more information on this front? I was thinking of adding a GPU to my dual-slot Xeon server to help out with transcoding, but it sounds like slapping in a consumer Nvidia card might not be the best idea. It would be nice to get some feedback from those running consumer Nvidia hardware with the restriction patch. Also, how does AMD currently fare in this area?
AmericanCrisis, posted April 20, 2019:
I tried HW transcoding for the first time the other day. I think I liked it... It seems more energy efficient, at least I think so. I had a problem with Quick Sync that I don't have with SW transcoding: when the media file (a movie in this case) ends, playback jumps back 10 minutes or so. Or the file ends, but Emby doesn't send the user back to the media file's info screen. Basically, the screen goes black, and on the dashboard the file is still shown as playing. The user has to stop it manually and press the back button. Anyhow, not a huge problem; still usable. I turned it off for the time being. I'm still trying to determine what the OVERALL benefit of HW transcoding is over SW transcoding.
Jdiesel, posted May 6, 2019:
Can anyone speak to the quality/performance differences between the Pascal and Turing NVENC implementations? I've found some information from the video game streaming community suggesting a significant quality improvement in Turing chips.
softworkz, posted May 6, 2019:
Quality improvements are unlikely, because quality depends on how you configure the encoding (that's way too big a story to explain quickly). Even with old GPUs or very weak CPUs, it's no problem to achieve high-quality encoding. It would just take its time and might be slower than realtime (playback speed). The differences are rather in encoding performance, so you'll be able to do more simultaneous encodes, or the same number of encodes at higher quality.
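softworkz's "slower than realtime" criterion also makes a natural pass/fail rule for the crowdsourced benchmark discussed earlier: a stream keeps up only if it is encoded at least as fast as it plays back. A minimal sketch that picks the largest concurrency level at which every stream still kept up; the function name is hypothetical, and the measurements in the example are made-up placeholders (in practice they might come from the fps ffmpeg reports for each stream).

```python
def max_sustainable_streams(measured_fps: dict[int, list[float]], source_fps: float) -> int:
    """Largest stream count at which every concurrent stream ran at >= playback speed.

    measured_fps maps a concurrency level to the per-stream encode fps observed at it.
    """
    best = 0
    for count in sorted(measured_fps):
        if all(fps >= source_fps for fps in measured_fps[count]):
            best = count
        else:
            break  # assume adding streams only slows the shared encoder further
    return best


if __name__ == "__main__":
    # Hypothetical results for a 23.976 fps source.
    results = {
        1: [140.0],
        2: [70.5, 69.8],
        3: [31.2, 30.4, 29.9],
        4: [23.1, 24.0, 22.7, 23.5],
    }
    print(max_sustainable_streams(results, 23.976))  # 3
```

The early `break` assumes that adding streams never makes the existing ones faster, which holds for a shared encoder but is worth sanity-checking in real runs.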
Jdiesel, posted May 6, 2019 (edited):
As someone who doesn't currently have HW transcoding set up but is in the market for a GPU to do so: is it possible to adjust the presets with NVENC like you can with software encoding? I'm more interested in quality than quantity, so this is something I would want to take advantage of. If it's already possible, disregard. The article I have been reading: https://unrealaussies.com/tech/nvenc-x264-quicksync-qsv-vp9-av1/#Introduction
softworkz, posted May 6, 2019:
We already have some internal prototyping for codec-specific quality settings (each HW codec implementation does this a bit differently), so this is at least in the pipeline, but I can't say anything about a timeline. @Luke
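Jdiesel's question, and softworkz's remark that each hardware codec handles quality settings a bit differently, can be illustrated with plain ffmpeg outside Emby. The sketch maps hypothetical generic quality tiers to the `-preset` names of three ffmpeg encoders; it is only an assumption of how such a mapping could look, not Emby's prototype. Preset names vary between ffmpeg versions (newer builds also accept p1 to p7 for NVENC), so check `ffmpeg -h encoder=h264_nvenc` on your own build.

```python
# Hypothetical quality tiers mapped to each encoder's own preset names.
PRESETS = {
    "h264_nvenc": {"quality": "slow", "balanced": "medium", "speed": "fast"},
    "h264_qsv": {"quality": "veryslow", "balanced": "medium", "speed": "veryfast"},
    "libx264": {"quality": "slow", "balanced": "medium", "speed": "veryfast"},
}


def encoder_args(encoder: str, tier: str, bitrate: str = "8M") -> list[str]:
    """ffmpeg video-encoder arguments for one of the hypothetical tiers."""
    try:
        preset = PRESETS[encoder][tier]
    except KeyError:
        raise ValueError(f"no preset mapping for {encoder!r} / {tier!r}") from None
    return ["-c:v", encoder, "-preset", preset, "-b:v", bitrate]


if __name__ == "__main__":
    for enc in PRESETS:
        print(" ".join(encoder_args(enc, "quality")))
```

A lookup table keeps the per-encoder differences in one place; a fuller version would also need rate-control options, which differ between encoders even more than the preset names do.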
Luke, posted May 6, 2019:
Yes, he has a lot on his to-do list, but perhaps later, after some other things. Thanks.
Luke, posted July 7, 2019:
The upcoming Emby Server 4.2 release will have improved hardware transcoding support for both the RockPro64 and the Raspberry Pi. Thanks, guys.
Lawrage, posted July 10, 2019:
Just an update on this. I had been using the Intel i5 in my media server until now but was getting slowdowns/stuttering with large transcodes, mainly 4K, with all cores at 100%. I bought a 1050 Ti, installed the patch to remove the 2-stream limit, and tested four simultaneous 4K transcodes. All four played without issue, with the GPU at around 70%. Very happy with the purchase.
asrequested (guest), posted July 13, 2019:
I just added a GTX 1060 to my server, thinking that it would handle HEVC media better than my Threadripper. To my amazement, there was no difference, except that using NVENC/NVDEC is almost unusable when playing in Chrome (slower transcoding and freezing playback). Are there expected improvements to this in 4.2?
Luke, posted July 13, 2019, replying to asrequested:
Please see how to report a media playback issue. Thanks.
asrequested (guest), posted July 13, 2019:
I figured that was coming. I was intending to; I'm just in the middle of a bunch of things. When I'm finished, I'll try to run some specific tests and post my results with logs.
KamElita, posted August 12, 2019:
All Intel processors from the last decade or even earlier use speculative execution, and that's where the exploits have been discovered.
pir8radio, posted August 14, 2019, replying to asrequested:
Isn't the 1060 a pretty weak card?
asrequested (guest), posted August 14, 2019, replying to pir8radio:
It's a pretty good card, but there are some bugs in how the server handles HEVC over HLS, which appear to be much worse with HWA. So for my present usage, it's not practical.