
Maximum CPU threads?


JeremyFr79


JeremyFr79

Wondering what the maximum number of processing threads Emby/ffmpeg can use is? I ask because I'm making the move to 4K content, which is encoded in HEVC. My Emby server is a Hyper-V VM set up to use all 32 cores/64 threads of the host machine. However, no matter which settings I change in Emby, it will only transcode 4K at 15-16 fps, and I see most of my cores/threads sitting unused. Is there a way to make Emby use more threads for a single transcode? I can easily see much more CPU usage with multiple transcodes running simultaneously, but I'm getting the feeling that a single transcode is thread-limited.

Hoping I can get some insight into this.


By default it is handled internally and automatically by ffmpeg. There isn't really a limit to how many threads it can use.
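For reference, ffmpeg also lets you set the thread count explicitly with `-threads`, placed after the input and before the output file so it applies to the encoder; for libx264, `0` leaves the choice to the encoder (automatic). A minimal sketch that only builds such a command line and runs nothing; `build_transcode_cmd` is a hypothetical helper and the file names are placeholders:

```python
from typing import List


def build_transcode_cmd(src: str, dst: str, threads: int = 0) -> List[str]:
    """Build (but do not run) an ffmpeg command line for an H.264 transcode.

    threads=0 leaves the thread count to the encoder (automatic);
    any positive value requests that many encoder threads.
    """
    if threads < 0:
        raise ValueError("threads must be >= 0")
    return [
        "ffmpeg",
        "-i", src,                 # input file
        "-c:v", "libx264",         # software H.264 encoder
        "-threads", str(threads),  # output option: applies to the encoder
        "-c:a", "copy",            # pass the audio through unchanged
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_transcode_cmd("input.mkv", "output.mkv", threads=32)))
```

Pass the returned list to `subprocess.run` to actually launch the transcode.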


JeremyFr79


Hmm, OK. Well, it seems to max out at a single CPU on the host (16 threads); it doesn't seem to want to span to another CPU for the added horsepower, so I'm wondering if it has an issue with spanning NUMA nodes.


pir8radio


Please post what you find. I was just doing 4K testing the other day, beating up my single CPU (20 threads) and barely getting 23 fps (not enough to watch it). I was going to order the second CPU kit, but if it won't help I don't want to waste the $700-$900...
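Whether a transcode is watchable comes down to whether the encoder's frame rate keeps up with the source's playback rate. A small sketch of that check, assuming a 23.976 fps film source (the thread doesn't state the source frame rate); `realtime_factor` is a hypothetical helper, and the fps figures are the ones reported in this thread:

```python
def realtime_factor(encode_fps: float, source_fps: float) -> float:
    """Ratio of encoding speed to playback speed; >= 1.0 keeps up with playback."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return encode_fps / source_fps


# Assumed 23.976 fps (24000/1001) source; encode rates as reported in the thread.
SOURCE_FPS = 24000 / 1001
for label, fps in [("single 20-thread CPU", 23.0), ("Hyper-V VM", 15.5)]:
    factor = realtime_factor(fps, SOURCE_FPS)
    status = "keeps up" if factor >= 1.0 else "falls behind"
    print(f"{label}: {factor:.2f}x real time ({status})")
```

Anything below 1.0x means the transcode falls further behind the longer playback runs, which is why 23 fps is "not enough to watch" a roughly 24 fps source.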


JeremyFr79

It's hard to say, but transcoding 4K on my VM yields about 15-16 fps, and as I watch Task Manager I only ever see roughly one CPU's worth of threads running. Oddly, it's not even maxing out those threads either. I have turned off throttling etc. and get the exact same result. I feel like ffmpeg may not scale well for single jobs; at least that seems to be what I'm seeing.


I'm not sure whether the virtual CPUs play a role in this. It also depends on multi-threading support in the individual decoders and encoders. I know that libx264 supports multi-threading when encoding to H.264, but I don't know about the HEVC decoder.


ShoutingMan

I'm curious about this even for SD and HD transcoding. Real-time transcoding an HD MKV results in stuttering playback on the AppleTV, but Emby's ffmpeg is only using 15% of my i5, with throttling disabled. Switching to Intel QuickSync doesn't seem to do anything.


zigzagtshirt


What are your transcoding settings set at?  Also, when playing, does the transcoding progress stay ahead of the playback (open up the server dashboard while it plays, compare the green vs red bar on the playback status).


ShoutingMan


I'm streaming an HD MKV (Episode 1 of Blu-ray from "Firefly") to an AppleTV (tvOS 1.0.17)

ffmpeg is using 10%-25% of the CPU (6th-gen i5, 3.2 GHz)

According to the Dashboard, the transcoding finished after a few minutes (long before the show length).

The show was choppy during and after transcoding.

 

 

Transcoding Settings are:

Intel Quick Sync

Thread Count: Auto

Throttling: Off

ffmpeg shows "Use a custom version"

Path: C:\Users\[uSER]\AppData\Roaming\Emby-Server\ffmpeg\20160410\ffmpeg.exe

H264 Preset: Auto

CRF: 23


ShoutingMan

Trying another show, Dollhouse in HD, and it's streaming fine, I think (doing short sampling, not watching a full episode).

Except for pausing and forward / reverse. Doing that usually causes the video to restart from the beginning or to freeze.

 

So: there might be something specifically weird about Firefly that doesn't work well with ffmpeg?

But there are other oddities with live transcoding that I don't think I see with using pre-transcoded (Handbrake'd) video.

 

I haven't done much comparing QSV on vs. off.

 

(I'd love to be able to transcode in real time so I don't have to take time and storage space creating duplicate Handbrake'd libraries specifically for watching on the AppleTV or streamed to mobile devices)


zigzagtshirt

@ShoutingMan

 

I would definitely try to run some comparisons between having QuickSync on vs off.  

 

Also maybe there is a weak link in your network somewhere.  What is your network setup?  


pir8radio


I doubt it, but it could also be that ffmpeg can't read the video fast enough to decode and re-encode it, maybe because it's on a slow network share?


Jdiesel

I was under the impression that single-thread performance was what mattered for real-time transcoding. Multiple cores are beneficial for multiple simultaneous streams, but if single-thread performance isn't up to the task, throwing more cores at it will do little to help. What CPU do you have? What is the bitrate of the 4K video?
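One standard way to put numbers on Jdiesel's point is Amdahl's law (a general model, not something measured in this thread): if a fraction p of the work can run in parallel, n threads give at most a speedup of 1 / ((1 - p) + p / n). A minimal sketch; the 90% parallel fraction is purely an illustrative assumption, and `amdahl_speedup` is a hypothetical helper:

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup with `threads` workers under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


# Illustrative assumption: 90% of the transcode parallelises.
for n in (1, 4, 16, 64):
    print(f"{n:>2} threads -> {amdahl_speedup(0.9, n):.2f}x")
```

Under that assumption the serial 10% caps the speedup below 10x no matter how many threads are added, so fast individual cores still matter.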


JeremyFr79

The host is running four Xeon L7555s, for a total of 32 cores/64 threads. The transcoding is obviously multi-threaded, as I can see more than one core light up during a transcode. It looks like the issue lies in its ability to span more than one physical CPU. I'm fine with it not working, as at some point I can, if needed, toss a GPU into the server to take over transcoding, or eventually move to new hardware.


ShoutingMan


I don't think I have a network limitation:

AppleTV and MediaPC are connected to the same 100Mbps switch, which hangs off of a 1Gbps LAN.

I have a TiVo Mini on the same network switch and it streams from the master Roamio fine.

I've streamed HD content from Netflix via the PC and the TiVo Mini and that works fine as well.


ShoutingMan


(If ffmpeg is single-threaded, then it's maxing out my i5: 25% CPU is 100% of a single core. That would explain that on my HD system.)
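The arithmetic behind that: Task Manager's overall CPU figure is averaged across all logical CPUs, so a process that can only keep one thread busy tops out at 100% divided by the number of logical CPUs. A small sketch, assuming a 4-core/4-thread i5 (as 6th-gen desktop i5 parts are); `cores_busy` is a hypothetical helper:

```python
def cores_busy(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert an overall CPU-usage figure (averaged over all logical CPUs,
    as Task Manager reports it) into the equivalent number of fully busy
    logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be >= 1")
    return total_cpu_percent / 100.0 * logical_cpus


# Assumed 4-core/4-thread i5: 25% overall equals one fully loaded core.
print(cores_busy(25, 4))
```

A reading that sits at 25% on such a chip is therefore consistent with a single saturated thread, while the 10%-15% readings reported earlier would be less than one core's worth of work.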


Waldonnis


I don't think ffmpeg is "NUMA aware", so to speak.  It seems that x265 supports multiple physical CPUs, but requires a special option to do so (--pools option).  It doesn't look like ffmpeg's libx265 wrapper supports it directly, so you would have to add it into the -x265-params list.  I don't see a corresponding option for x264, unfortunately, and didn't bother looking at the audio encoders (doubt it would matter).

 

I never bothered looking into this before, so it's been interesting.  Here's the snippet from the x265 command-line help for --pools:

   --pools <integer,...>         Comma separated thread count per thread pool (pool per NUMA node)
                                 '-' implies no threads on node, '+' implies one thread per core on node
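A sketch of passing that through ffmpeg, assuming the build's libx265 hands each `-x265-params` key=value pair to x265's own parameter parser (pairs are colon-separated, so the commas stay inside the single `pools` value). It also assumes four NUMA nodes, one per Xeon L7555 socket; inside a Hyper-V guest, whether ffmpeg actually sees four nodes depends on the VM's virtual NUMA configuration. `x265_pools_args` is a hypothetical helper and this has not been tested against the setup in this thread:

```python
from typing import List


def x265_pools_args(numa_nodes: int, threads_per_node: str = "+") -> List[str]:
    """Build ffmpeg arguments asking libx265 for one thread pool per NUMA node.

    Per the x265 --pools help text: '+' means one thread per core on that
    node, '-' means no threads on it, and a number sets the thread count.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be >= 1")
    pools = ",".join([threads_per_node] * numa_nodes)
    # -x265-params takes colon-separated key=value pairs; commas are part
    # of the value of the single "pools" key.
    return ["-c:v", "libx265", "-x265-params", f"pools={pools}"]


if __name__ == "__main__":
    # Assumption: four NUMA nodes, one per physical CPU socket.
    print(" ".join(x265_pools_args(4)))
```

These arguments would go between the input (`-i ...`) and the output file name. Note this only helps HEVC *encoding*; as noted above, x264 has no corresponding option.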

JeremyFr79

 


My setup is atypical :D Like I said, I'm not too worried about it. For movies I keep both a 1080p and a 2160p copy, so if I'm watching on a non-4K client it'll stream the 1080p version; unfortunately that doesn't work for TV episodes, which is where my issue is :( oh well lol


Waldonnis


Understandable.  ffmpeg's options aside, it still seems odd to me that increasing the thread count isn't spreading the work beyond one CPU, though, since the OS should be distributing the threads unless coded to prefer something specific.  Is it Linux or Windows, and what thread model is being used?  It's been ages since I worked on thread model code, but I can take a look when I get some time.


JeremyFr79


So the host is running Server 2012 R2 Datacenter with four Xeon L7555s and 128GB of RAM; the guest is a Gen 2 Hyper-V VM with 64 vCores and 16GB of RAM, also running 2012 R2 Datacenter.


  • 1 month later...
deadworldisee

From my experience with ffmpeg and libx264, ffmpeg can scale up to 16 threads, and the first of them is used to split the work and gather information from the other threads (something like that).

For the fastest x264 encoding, whether with ffmpeg or another tool (though I suggest ffmpeg), use a CPU with the highest clock speed and overclock it, for example an i7-6900K overclocked to 4.5 GHz. Those Xeons are a waste of money for this job.

 

I also recommend the latest Nvidia GTX 1060 for NVENC 5 encoding as an alternative to x264. At the same bitrate, the quality difference between the latest NVENC 5 and libx264 is much smaller than it was with NVENC 4 or older, but the speed is something like 7 to 9 times faster.

And if you are OK with the encoding result, you can change the ffmpeg command to encode with libx264. (There is no difference between the GTX 1060 and GTX 1080 for encoding.)

I don't recommend using HEVC yet, because it's a new codec; if you want your movies to be playable on most computers or browsers, go with x264. With HEVC you save at most 5% space.


pir8radio

Are we talking "threads" or something else here?...  My ffmpeg averages around 70 threads when encoding, with peaks around 140, while averaging under 60% CPU. 

 

[Attached image: chart]


deadworldisee

I mean CPU core threads: 8 cores and 16 threads. For a single command line (libx264), ffmpeg can encode using at most 16 threads from any CPU (only one CPU). Your Xeon 2470 has 20 threads and can use only 16 of them for x264 encoding of a single file.

You can run multiple ffmpeg command lines in parallel with libx264 or NVENC, but this will affect the performance of all of them. If you're only using 60% of the CPU and have 16 or fewer cores/threads, your command line isn't running at full speed.
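A minimal sketch of that parallel approach: build one libx264 command per file and split a fixed thread budget evenly between them. The helper name, the file names, the output naming scheme, and the 64-thread budget are all hypothetical; the sketch only builds the commands, and each list could be handed to `subprocess.Popen` to launch the jobs side by side:

```python
from typing import List


def parallel_jobs(files: List[str], total_threads: int) -> List[List[str]]:
    """Build one libx264 ffmpeg command per file, sharing a thread budget evenly."""
    if not files:
        return []
    per_job = max(1, total_threads // len(files))  # at least one thread each
    commands = []
    for src in files:
        dst = src.rsplit(".", 1)[0] + ".h264.mkv"  # hypothetical naming scheme
        commands.append([
            "ffmpeg", "-i", src,
            "-c:v", "libx264", "-threads", str(per_job),
            "-c:a", "copy", dst,
        ])
    return commands


if __name__ == "__main__":
    for cmd in parallel_jobs(["ep01.mkv", "ep02.mkv", "ep03.mkv", "ep04.mkv"], 64):
        print(" ".join(cmd))
```

Splitting the budget keeps the jobs from oversubscribing the CPU, but as noted above they still compete for it, so each one runs slower than it would alone.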

 

When running two simultaneous h264_nvenc (GTX 1060) encodes of two different files with ffmpeg, CPU usage (i7-5820K at ~4.5 GHz) goes to 60-70%.


JeremyFr79

I'll, uh, try to shoehorn a 1060 into my $25k Dell 2U server lol...


pir8radio


Hmm... I did some quick googling and can't seem to find documentation saying that ffmpeg only supports 16 threads; can you post a link to it, please? I'm finding forum posts where users changed their thread count to 20 and 40 at the suggestion of other ffmpeg forum users, but nothing concrete.

 

Thanks!

