JeremyFr79 228 Posted March 27, 2017 Posted March 27, 2017 Wondering what the maximum number of processing threads Emby/ffmpeg can utilize? I ask because I'm making the move to 4K content, which is encoded in HEVC. My Emby server is a Hyper-V VM set up to utilize all 32 cores/64 threads of the host machine. However, no matter which settings I change in Emby, it will only transcode 4K at 15-16 fps, and I see most of my cores/threads unused. Is there a way to make Emby utilize more threads for a single transcode? I can easily see much more CPU usage with multiple transcodes running simultaneously, but I'm getting the feeling that a single transcode is thread-limited. Hoping I can get some insight into this.
Luke 38988 Posted March 27, 2017 Posted March 27, 2017 By default it is handled internally and automatically by ffmpeg. There isn't really a limit to how many it can utilize.
JeremyFr79 228 Posted March 27, 2017 Author Posted March 27, 2017 By default it is handled internally and automatically by ffmpeg. There isn't really a limit to how many it can utilize. Hmm, OK. Well, it seems to max out at a single CPU on the host (16 threads); it doesn't seem to want to span to another CPU for the added horsepower, so I'm wondering if it has an issue with spanning NUMA nodes.
pir8radio 1302 Posted March 27, 2017 Posted March 27, 2017 Hmm, OK. Well, it seems to max out at a single CPU on the host (16 threads); it doesn't seem to want to span to another CPU for the added horsepower, so I'm wondering if it has an issue with spanning NUMA nodes. Please post what you find. I was just doing 4K testing the other day, beating up my single CPU (20 threads) and barely getting 23 fps (not enough to watch it). I was going to order the second CPU kit, but if it won't work I don't want to waste the $700-$900...
JeremyFr79 228 Posted March 27, 2017 Author Posted March 27, 2017 It's hard to say, but transcoding 4K on my VM yields about 15-16 fps, and as I watch Task Manager I only ever see roughly one CPU's worth of threads running; oddly, it's not even maxing out those threads. I have turned off throttling etc. and get the exact same result. I feel like ffmpeg may not scale well for single jobs; at least that seems to be what I'm seeing.
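As a rough illustration of why the 15-16 fps and 23 fps figures above are not watchable: a live transcode only keeps up with playback when it produces frames at least as fast as the source plays them. The sketch below assumes a 23.976 fps source, which the thread does not actually state, and `realtime_ratio` is a hypothetical helper name, not anything from Emby or ffmpeg.

```python
def realtime_ratio(transcode_fps: float, source_fps: float) -> float:
    """Seconds of video produced per second of wall-clock time."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return transcode_fps / source_fps


# Assumption: a typical film frame rate; the posts do not give the real value.
ASSUMED_SOURCE_FPS = 24000 / 1001

# Transcode rates reported in the posts above.
for fps in (15.0, 16.0, 23.0):
    ratio = realtime_ratio(fps, ASSUMED_SOURCE_FPS)
    status = "keeps up" if ratio >= 1.0 else "falls behind"
    print(f"{fps:.0f} fps -> {ratio:.2f}x real time ({status})")
```

Any ratio below 1.0 means the player will eventually catch up with the transcoder and stall, regardless of how much idle CPU the host still has.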
Luke 38988 Posted March 27, 2017 Posted March 27, 2017 I'm not sure if the virtual CPUs play a role in this or not. It also depends on multi-threading support in the individual decoders and encoders. I know that libx264 has multi-threading support when encoding to H.264, but I don't know about the HEVC decoder.
ShoutingMan 95 Posted March 28, 2017 Posted March 28, 2017 I'm curious about this even for SD and HD transcoding. Real-time transcoding an HD MKV results in stuttering playback on the AppleTV...but Emby's ffmpeg is only running at 15% of my i5, with throttling disabled. Switching to using Intel QuickSync doesn't seem to do anything.
zigzagtshirt 55 Posted March 28, 2017 Posted March 28, 2017 I'm curious about this even for SD and HD transcoding. Real-time transcoding an HD MKV results in stuttering playback on the AppleTV...but Emby's ffmpeg is only running at 15% of my i5, with throttling disabled. Switching to using Intel QuickSync doesn't seem to do anything. What are your transcoding settings set at? Also, when playing, does the transcoding progress stay ahead of the playback (open up the server dashboard while it plays, compare the green vs red bar on the playback status).
ShoutingMan 95 Posted March 29, 2017 Posted March 29, 2017 What are your transcoding settings set at? Also, when playing, does the transcoding progress stay ahead of the playback (open up the server dashboard while it plays, compare the green vs red bar on the playback status). I'm streaming an HD MKV (episode 1 of "Firefly" from Blu-ray) to an AppleTV (tvOS 1.0.17). ffmpeg is using 10%-25% of the CPU (Gen 6 i5, 3.2 GHz). According to the Dashboard, the transcoding finished after a few minutes (long before the show length). The show was choppy during and after transcoding.
Transcoding settings are:
Intel Quick Sync
Thread Count: Auto
Throttling: Off
ffmpeg shows "Use a custom version"
Path: C:\Users\[uSER]\AppData\Roaming\Emby-Server\ffmpeg\20160410\ffmpeg.exe
H264 Preset: Auto
CRF: 23
ShoutingMan 95 Posted March 29, 2017 Posted March 29, 2017 Trying another show, Dollhouse in HD, and it's streaming fine, I think (doing short sampling, not watching a full episode). Except for pausing and forward / reverse. Doing that usually causes the video to restart from the beginning or to freeze. So: there might be something specifically weird about Firefly that doesn't work well with ffmpeg? But there are other oddities with live transcoding that I don't think I see with using pre-transcoded (Handbrake'd) video. I haven't played much comparing QSV on vs off. (I'd love to be able to transcode in real time so I don't have to take time and storage space creating duplicate Handbrake'd libraries specifically for watching on the AppleTV or streamed to mobile devices)
zigzagtshirt 55 Posted March 29, 2017 Posted March 29, 2017 @ShoutingMan I would definitely try to run some comparisons between having QuickSync on vs off. Also maybe there is a weak link in your network somewhere. What is your network setup?
pir8radio 1302 Posted March 29, 2017 Posted March 29, 2017 @ShoutingMan I would definitely try to run some comparisons between having QuickSync on vs off. Also maybe there is a weak link in your network somewhere. What is your network setup? I doubt it, but it could also be that ffmpeg cannot grab the video fast enough to decode/re-encode it, maybe because it sits on a slow network share.
Jdiesel 1253 Posted March 29, 2017 Posted March 29, 2017 I was under the impression that single-thread performance was what mattered for real-time transcoding. Multiple cores are beneficial for multiple simultaneous streams, but if single-thread performance isn't up to the task, throwing more cores at it will do little to help. What CPU do you have? What is the bitrate of the 4K video?
JeremyFr79 228 Posted March 29, 2017 Author Posted March 29, 2017 The host is running (4) Xeon L7555's for a total of 32 cores/64 threads. It's obvious the transcoding is multithreaded, as I can see more than one core light up during a transcode. It looks like the issue lies in its ability to span more than 1 physical CPU. I'm fine with it not working, as at some point I can, if needed, toss a GPU into the server to take over transcoding down the road or eventually move to new hardware.
ShoutingMan 95 Posted March 30, 2017 Posted March 30, 2017 (edited) @ShoutingMan I would definitely try to run some comparisons between having QuickSync on vs off. Also maybe there is a weak link in your network somewhere. What is your network setup? I don't think I have a network limitation: AppleTV and MediaPC are connected to the same 100Mbps switch, which hangs off of a 1Gbps LAN. I have a TiVo Mini on the same network switch and it streams from the master Roamio fine. I've streamed HD content from Netflix via the PC and the TiVo Mini and that works fine as well. Edited March 30, 2017 by ShoutingMan
ShoutingMan 95 Posted March 30, 2017 Posted March 30, 2017 I was under the impression that single-thread performance was what mattered for real-time transcoding. Multiple cores are beneficial for multiple simultaneous streams, but if single-thread performance isn't up to the task, throwing more cores at it will do little to help. What CPU do you have? What is the bitrate of the 4K video? (If ffmpeg is single-threaded, then it's maxing out my i5: 25% CPU is 100% of a single core. That would explain it on my HD system.)
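The "25% CPU is 100% of a single core" arithmetic can be sketched as below. It assumes a 4-core i5 without Hyper-Threading (4 logical CPUs), which the posts imply but do not state, and `busy_cores` is a hypothetical helper name. Note that an overall reading of 25% is consistent with one saturated thread but does not prove it, since the same total could also be spread thinly across all four cores.

```python
def busy_cores(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert an overall CPU percentage into the equivalent number of fully busy logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_cpu_percent / 100.0 * logical_cpus


# Assumption: 4 cores, no Hyper-Threading.
ASSUMED_LOGICAL_CPUS = 4

# CPU usage figures reported for ffmpeg earlier in the thread.
for pct in (10, 15, 25):
    cores = busy_cores(pct, ASSUMED_LOGICAL_CPUS)
    print(f"{pct}% overall is about {cores:.1f} logical CPU(s) fully busy")
```

Per-core graphs in Task Manager or a similar tool are what would distinguish one pegged core from light load on all of them.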
Waldonnis 148 Posted March 30, 2017 Posted March 30, 2017 (edited) The host is running (4) Xeon L7555's for a total of 32 cores/64 threads. It's obvious the transcoding is multithreaded, as I can see more than one core light up during a transcode. It looks like the issue lies in its ability to span more than 1 physical CPU. I'm fine with it not working, as at some point I can, if needed, toss a GPU into the server to take over transcoding down the road or eventually move to new hardware. I don't think ffmpeg is "NUMA aware", so to speak. It seems that x265 supports multiple physical CPUs, but requires a special option to do so (--pools option). It doesn't look like ffmpeg's libx265 wrapper supports it directly, so you would have to add it into the -x265-params list. I don't see a corresponding option for x264, unfortunately, and didn't bother looking at the audio encoders (doubt it would matter). I never bothered looking into this before, so it's been interesting. Here's the snippet from the x265 command-line help for --pools:
--pools <integer,...> Comma separated thread count per thread pool (pool per NUMA node)
'-' implies no threads on node, '+' implies one thread per core on node
Edited March 30, 2017 by Waldonnis
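A minimal sketch of what passing that option through ffmpeg might look like, built as an argument list rather than executed. The file names, the `build_x265_command` helper, and the `+,+,+,+` value (one pool per node, assuming the VM really exposes four NUMA nodes) are all illustrative assumptions. This only affects encoding to HEVC with libx265; it does nothing for decoding an HEVC source or for libx264 output, which is what the transcodes discussed earlier in the thread appear to be.

```python
import shlex


def build_x265_command(src: str, dst: str, pools: str) -> list[str]:
    """Build an ffmpeg argument list that forwards a pools setting to libx265."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx265",
        # x265 options are passed as key=value pairs via -x265-params
        "-x265-params", f"pools={pools}",
        "-c:a", "copy",
        dst,
    ]


# Assumption: four NUMA nodes, one thread per core on each ('+' per node).
cmd = build_x265_command("input.mkv", "output.mkv", "+,+,+,+")
print(shlex.join(cmd))
```

Whether this helps inside a Hyper-V guest depends on how the virtual NUMA topology is presented to the VM, so treat it as something to test rather than a known fix.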
JeremyFr79 228 Posted March 30, 2017 Author Posted March 30, 2017 I don't think ffmpeg is "NUMA aware", so to speak. It seems that x265 supports multiple physical CPUs, but requires a special option to do so (--pools option). It doesn't look like ffmpeg's libx265 wrapper supports it directly, so you would have to add it into the -x265-params list. I don't see a corresponding option for x264, unfortunately, and didn't bother looking at the audio encoders (doubt it would matter). I never bothered looking into this before, so it's been interesting. Here's the snippet from the x265 command-line help for --pools: --pools <integer,...> Comma separated thread count per thread pool (pool per NUMA node) '-' implies no threads on node, '+' implies one thread per core on node My setup is atypical and, like I said, I'm not too worried about it. With movies I keep a 1080p and a 2160p copy, so if I'm watching on a non-4K client it'll stream the 1080p version. Unfortunately that doesn't work for TV episodes, which is where my issue is. Oh well lol
Waldonnis 148 Posted March 30, 2017 Posted March 30, 2017 My setup is atypical and, like I said, I'm not too worried about it. With movies I keep a 1080p and a 2160p copy, so if I'm watching on a non-4K client it'll stream the 1080p version. Unfortunately that doesn't work for TV episodes, which is where my issue is. Oh well lol Understandable. ffmpeg's options aside, it still seems odd to me that increasing the thread count isn't spreading the work beyond one CPU, though, since the OS should be distributing the threads unless coded to prefer something specific. Is it Linux or Windows, and what thread model is being used? It's been ages since I worked on thread model code, but I can take a look when I get some time.
JeremyFr79 228 Posted March 30, 2017 Author Posted March 30, 2017 Understandable. ffmpeg's options aside, it still seems odd to me that increasing the thread count isn't spreading the work beyond one CPU, though, since the OS should be distributing the threads unless coded to prefer something specific. Is it Linux or Windows, and what thread model is being used? It's been ages since I worked on thread model code, but I can take a look when I get some time. So the host is running Server 2012R2 Datacenter, 4 Xeon L7555's, 128GB RAM, guest is a Gen 2 Hyper-V VM with 64vCores, and 16GB of RAM also running 2012R2 Datacenter.
deadworldisee 0 Posted May 29, 2017 Posted May 29, 2017 From my experience with ffmpeg and libx264, ffmpeg can scale up to 16 threads, and the first of them is used to split the work and gather info from the other threads (something like this). To get the fastest x264 encoding, whether with ffmpeg or another tool (but I suggest ffmpeg), use a CPU with the highest clock speed and overclock it, for example an i7 6900K overclocked to 4.5 GHz. Those Xeons are a waste of money for this job. I also recommend using the latest Nvidia GTX 1060 for NVENC 5/x264. There is very little difference in quality at a given bitrate between the latest NVENC 5 and libx264, unlike with NVENC 4 or older, but the speed is about 7-9 times faster. And if you are OK with the encoding result you can change the ffmpeg command to encode with libx264. (There is no difference between the GTX 1060 and GTX 1080 for encoding.) I don't recommend using HEVC now, because it's a new codec, and if you want your movies to be playable on most computers or browsers, go with x264. With HEVC you save at most 5% space.
pir8radio 1302 Posted May 29, 2017 Posted May 29, 2017 (edited) Are we talking "threads" or something else here?... My ffmpeg averages around 70 threads when encoding, with peaks around 140, while averaging under 60% CPU. Edited May 29, 2017 by pir8radio
deadworldisee 0 Posted May 30, 2017 Posted May 30, 2017 (edited) I mean CPU core threads: 8 cores and 16 threads. For a single command line (libx264), ffmpeg can encode using a maximum of 16 threads from any CPU (only one CPU). Your Xeon 2470 has 20 threads and can use only 16 of them for x264 encoding of a single file. You can run multiple ffmpeg command lines in parallel with libx264 or NVENC, but this will affect the performance of all commands. If you use only 60% of the CPU and have <=16 CPU cores/threads, your command line isn't working at full speed. Running 2 simultaneous h264_nvenc (GTX 1060) encodes on 2 different files with ffmpeg, the CPU (i7 5820K at ~4.5 GHz) goes to 60-70% usage. Edited May 30, 2017 by deadworldisee
JeremyFr79 228 Posted May 30, 2017 Author Posted May 30, 2017 I'll uh try to shoe horn a 1060 into my $25k Dell 2U server lol...........
pir8radio 1302 Posted May 30, 2017 Posted May 30, 2017 I mean CPU core threads: 8 cores and 16 threads. For a single command line (libx264), ffmpeg can encode using a maximum of 16 threads from any CPU (only one CPU). Your Xeon 2470 has 20 threads and can use only 16 of them for x264 encoding of a single file. You can run multiple ffmpeg command lines in parallel with libx264 or NVENC, but this will affect the performance of all commands. If you use only 60% of the CPU and have <=16 CPU cores/threads, your command line isn't working at full speed. Running 2 simultaneous h264_nvenc (GTX 1060) encodes on 2 different files with ffmpeg, the CPU (i7 5820K at ~4.5 GHz) goes to 60-70% usage. Hmm... I did some quick googling and can't seem to find documentation that says ffmpeg only supports 16 threads. Can you post a link to it, please? I'm finding forum posts with users changing their threads to 20 and 40 at the suggestion of other ffmpeg forum users, but nothing concrete. Thanks!
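Absent documentation either way, one way to check the thread-scaling question empirically is to time the same short encode at several `-threads` values and see where the gains flatten. The sketch below is only that, a sketch: `sample.mkv`, the 30-second clip length, the thread counts tried, and the helper names are all assumptions, and it does not by itself confirm or refute the 16-thread claim.

```python
import os
import shutil
import subprocess
import time


def build_cmd(src: str, threads: int, seconds: int = 30) -> list[str]:
    """ffmpeg arguments: encode the first `seconds` of src with libx264 and discard the output."""
    return [
        "ffmpeg", "-hide_banner", "-i", src,
        "-t", str(seconds),
        "-c:v", "libx264", "-threads", str(threads),
        "-an",               # skip audio so only video encoding is timed
        "-f", "null", "-",   # null muxer: nothing is written to disk
    ]


def time_encode(src: str, threads: int) -> float:
    """Run one encode and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(build_cmd(src, threads), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    sample = "sample.mkv"  # hypothetical test file
    if shutil.which("ffmpeg") and os.path.exists(sample):
        for n in (4, 8, 16, 20, 32):
            print(f"-threads {n}: {time_encode(sample, n):.1f} s")
    else:
        print("ffmpeg or sample file not found; nothing to measure")
```

Because decoding the source also consumes CPU, the timings reflect the whole pipeline rather than the encoder alone, so it is worth repeating the run with the same kind of file that is causing trouble.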