hardware suggestions for transcoding


charleslam

currently my server is an Intel S2600CP board with 2x Intel Xeon E5-2650L CPUs and 16GB of ram, plus a Mellanox ConnectX-2 NIC and a PERC H700 raid card.

 

i have noticed 2 things with making the switch to emby from plex that i do not like that i need to figure out.

 

1. the video quality isnt as great with emby vs. plex.  edges are rougher, and i notice quality issues with transcoding.

2. when i make changes to the quality settings, my cpu usage spikes up higher.  like way higher.  like 1 stream puts all cores on 30% usage.  i know im not doing something right with the settings, when set to defaults i get around 7% usage on all cores at most, typically 2% normally.

 

so my question is, instead of tinkering day and night to figure out how to get the quality up while keeping cpu utilization at normal levels, ive noticed there are several other options in terms of hardware acceleration in transcoding.  What additional hardware will do the best job?

 

should i be looking into nvidia cards to put on this server?  i was reading that nvidia transcoding is limited to 2 streams at a time.

 

or is there a better solution?

 

as an fyi, i have 2 roku boxes, 1 windows 7 kitchen pc, a couple windows laptops and a mother in illinois that uses her windows 10 laptop as clients.

 

also another quick question.  some people say intel quicksync degrades the video quality, others say no.  ive tried it and i havent noticed any difference (and that includes cpu performance, still at around 30% with what i considered conservative settings).  


Waldonnis

I'll take a stab at this...

 

1: I'd need more info to know what's going on there.  Logs would be helpful, and more info about the source material as well.

2: Is this happening on the same source file with different quality settings when playing back on the same client?  I could postulate here and can think of a few reasons for this (very few), but again, logs would tell more of the story.

 

As for hardware transcoding, nVidia consumer cards (GeForce) are limited by the driver to 2 simultaneous encode streams.  Quadros are not artificially limited like the GeForces, but performance will still diminish if you push too many simultaneous streams through one (obviously, no card can handle an unlimited number).  I haven't seen reports of anyone really stressing a Quadro for encoding, so I have no clue what the breakpoint is there.

 

QuickSync (QSV) degrades the video quality.  So does nVidia's solution (NVENC).  And frankly, so does any lossy video encoder when transcoding, since you're not coming from a raw source (you're recompressing an already compressed source; you're making a copy of a copy, or really, a copy of a copy of a copy since it's probably a DVD or BRD source that was already compressed).  Is QSV or NVENC on par with x264 or x265 when it comes to features, analysis, and compression ratios?  No.  Is it good enough for on-the-fly transcoding purposes, since you're probably already running x264 with an ultrafast preset and not much tuning?  That's more of a grey area and largely dependent on taste, viewing distance/environment, and requirements.  Some people absolutely love the quality of NVENC and QSV, while others aren't so fond of it (I fall in the latter category).  Hardware encoding results aren't as terrible as some folks would have you believe, but some source material will make the weaknesses more obvious to the keen eye.

 

If you're super picky, hardware transcoding may not look good to you, but then again, on-the-fly software transcoding probably wouldn't either...and you'd be better served manually encoding to something that can be directly played to ensure acceptable quality levels.  If you're fine with Plex or Emby's software transcoding quality, I'd guess that you wouldn't notice the difference when swapping to hardware.  Ultimately, you have to see the output yourself in your environment to know for sure.

 

If you google around, you can probably find samples posted by people "bragging" about their encode speeds to see if it would work for you.  Just be aware that most of them aren't especially knowledgeable when it comes to encoding, so take their assertions and comparisons with a grain of salt - basically, ignore anyone who says things like "this looks better than x264 output" since those folks rarely do more with x264 than use the default values for all options anyway (no knock on them; encoding is a very deep subject and the sheer number of options can be daunting).  If you have a friend or coworker with a dGPU/iGPU that supports either, you could always slap ffmpeg on a USB stick and ask them to encode a short test file for comparison as well - shouldn't take more than a few minutes of their time and it would give you an idea of how fast it could be.
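For anyone wanting to run that kind of side-by-side test, here is a minimal sketch of how the ffmpeg command lines could be put together. It only builds the argument lists, it doesn't run anything. The encoder names (libx264, h264_nvenc, h264_qsv) are real ffmpeg encoders, but whether a given build and machine supports them is something you'd have to check. The function name, the output file names, the start offset, the clip length and the 4M bitrate are all placeholders I made up for illustration.

```python
from typing import List

# Encoder arguments per test case. The encoder names are the ones ffmpeg
# uses; availability depends on the ffmpeg build and the hardware present.
ENCODERS = {
    "software": ["-c:v", "libx264", "-preset", "ultrafast"],
    "nvenc": ["-c:v", "h264_nvenc"],
    "qsv": ["-c:v", "h264_qsv"],
}


def sample_encode_cmd(source: str, kind: str, start: str = "00:10:00",
                      seconds: int = 60, bitrate: str = "4M") -> List[str]:
    """Build an ffmpeg command that encodes a short clip with one encoder.

    All three variants use the same target bitrate so the outputs can be
    compared at roughly equal file size. Audio is dropped (-an) because
    only the video quality is being compared.
    """
    if kind not in ENCODERS:
        raise ValueError(f"unknown encoder kind: {kind}")
    out = f"sample_{kind}.mkv"  # hypothetical output name
    return (["ffmpeg", "-hide_banner", "-y",
             "-ss", start, "-t", str(seconds), "-i", source]
            + ENCODERS[kind]
            + ["-b:v", bitrate, "-an", out])


if __name__ == "__main__":
    for kind in ENCODERS:
        print(" ".join(sample_encode_cmd("movie.mkv", kind)))
```

Run the three resulting commands on the same source clip and compare the outputs on the screen you actually watch on; ffmpeg prints the encode speed while it runs, which gives the performance half of the comparison.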

 

From a broader perspective, hardware encoding has its uses, especially when live streaming or when top quality at a given bitrate and fine-tuning the encode aren't necessarily your first concerns (pre-viz, game recording, some types of animation, etc).  Quality isn't always a big deal and using the uncompressed source isn't always practical/needed either, so shaving minutes/hours off encoding in those situations can be incredibly helpful (think CG rough animations and such).  I consider on-the-fly transcoding to be in this category personally (performance matters more than quality since throughput has to meet or exceed the source's framerate), but everyone has different eyeballs and opinions of what "good quality" means.  If it's good enough for you, then there's no reason not to use it as it's significantly faster in many cases and lightens the system load a bit.
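The "throughput has to meet or exceed the source's framerate" point is just arithmetic, so here's a rough sketch of it. It assumes the encoder's measured frames per second is a total that gets shared evenly between streams, which is a simplification (real encoders don't scale that cleanly), and the function names are mine.

```python
import math


def realtime_headroom(encode_fps: float, source_fps: float) -> float:
    """Ratio of encode speed to playback speed; below 1.0 means stutter."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return encode_fps / source_fps


def max_realtime_streams(encode_fps: float, source_fps: float) -> int:
    """Rough upper bound on simultaneous streams at this encode speed.

    Assumes total throughput divides evenly across streams, which is an
    optimistic simplification.
    """
    return math.floor(realtime_headroom(encode_fps, source_fps))


if __name__ == "__main__":
    # An encoder managing 100 fps on 24 fps material
    print(max_realtime_streams(100, 24))  # 4
    # An encoder managing only 20 fps can't keep up with even one stream
    print(max_realtime_streams(20, 24))   # 0
```

So whatever fps figure a test encode reports, dividing it by the source framerate gives a ballpark for how many viewers that setting could feed at once.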


charleslam

i appreciate the info.  while i dont normally do 10 simultaneous streams, that would be the end goal.  i would want that capability.  as for converting manually and then do direct play, ive thought about it definitely.  but considering my wide range of sources converting all would be a very heavy task.   

 

im wondering what sort of quadro card i would be looking at, to get this type of performance.  

 

I get the feeling that tinkering with settings until i find something right is the only way to proceed.  while plex, is extremely easy in terms of plug and play, i really want to see my emby installation work better in this area.  in almost everything else i like it better.  (gf thinks plex looks better but, whateves.  i care less about aesthetics of a gui than she does)

 

when i first tried messing with the settings i selected for transcoding:

transcoding thread count:max (thought was to use all 24 threads i have available)

H264 encoding preset : medium

H264 CRF:18

deinterlacing:bob and weave

 

with these settings i found no improvement with intel quicksync enabled or disabled.

 

i mean i knew there would be an increase in usage but from 2-7% to 30-40% on just one stream seems kinda crazy to me.  


Guest asrequested

We'd need your transcode logs to see what is actually happening. What are you actually transcoding? Movie playback? Live TV? Recordings? What apps are you using for playback on the PCs?


Waldonnis

charleslam said:

i appreciate the info.  while i dont normally do 10 simultaneous streams, that would be the end goal.  i would want that capability.  as for converting manually and then do direct play, ive thought about it definitely.  but considering my wide range of sources converting all would be a very heavy task.

im wondering what sort of quadro card i would be looking at, to get this type of performance.

 

Not sure. I know others have asked, so there may be a few posts around here from Quadro owners that you could ask.  People around here tend to be a friendly and helpful bunch, so I don't think anyone would mind a few questions.

 

charleslam said:

I get the feeling that tinkering with settings until i find something right is the only way to proceed.  while plex, is extremely easy in terms of plug and play, i really want to see my emby installation work better in this area.  in almost everything else i like it better.  (gf thinks plex looks better but, whateves.  i care less about aesthetics of a gui than she does)

 

It's a mixed bag of pros and cons with having things more "automatic".  Less customisation means it's easier for new folks, but it leaves out options that others might want or need.  It's an equation without an answer, really.  Some folks like dead-simple while others like things to be so complex that nobody understands it all except them  :P   I think Emby has just enough customisation to be useful, but it's lacking in a few areas (more audio downmixing options, for one, but I won't even file a feature request since I'm one of the very few that would bother with them).  Personally, I never liked Plex's transcoding options: they're way too simplistic and "one size fits all" for me.  Emby's don't quite suit my preferences either, which is why I transcode manually for direct playback.  It takes more time, but saves me a LOT of cycles and power in the long run since I only transcode once rather than every time I play anything.

 

Re-encoding a whole library is a daunting task for sure, so I can sympathise there, but it's a real money saver in the long run - you don't need as much hardware, that's for sure.  When I did my last re-encoding binge, I only did 2-3 movies a day because the machine I encode with is usually in use doing other stuff for large chunks of every day.  I'd just queue up a few in the evening and encode them overnight.  Took a few months, but I'm much happier with the results than when I was trying to transcode things.  I now have more control over presentation quality and even managed to standardise my track layouts and such so that I could script future operations easily (I remixed all of my movies' audio tracks at one point...super simple to script since I made sure the lossless source tracks were always the second audio track).
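To give an idea of what "queue up a few in the evening" and "super simple to script" can look like, here is a minimal sketch. It is not the script described above, just an illustration under assumptions: the function names are made up, the CRF 18 / medium values are placeholders borrowed from earlier in the thread, and it assumes the audio track you want is always the second one (ffmpeg counts streams from zero, so that's `0:a:1`). It only builds the command, it doesn't execute it.

```python
from typing import Iterable, List, Set


def nightly_batch(library: Iterable[str], already_done: Set[str],
                  per_night: int = 3) -> List[str]:
    """Pick the next few files that haven't been re-encoded yet."""
    pending = [f for f in sorted(library) if f not in already_done]
    return pending[:per_night]


def reencode_cmd(source: str, dest: str, crf: int = 18,
                 preset: str = "medium") -> List[str]:
    """Build an ffmpeg command for a one-time, direct-play encode.

    Maps the first video stream and the second audio stream (index 1),
    re-encodes video with libx264 and copies the audio untouched.
    -n tells ffmpeg never to overwrite an existing output file.
    """
    return ["ffmpeg", "-hide_banner", "-n", "-i", source,
            "-map", "0:v:0", "-map", "0:a:1",
            "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
            "-c:a", "copy", dest]


if __name__ == "__main__":
    library = ["c.mkv", "a.mkv", "b.mkv", "d.mkv"]
    for name in nightly_batch(library, already_done={"a.mkv"}, per_night=2):
        print(" ".join(reencode_cmd(name, "out_" + name)))
```

Because the track layout is the same in every file, the same command template works for the whole library, which is the payoff of standardising it up front.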

 

charleslam said:

when i first tried messing with the settings i selected for transcoding:

transcoding thread count:max (thought was to use all 24 threads i have available)

H264 encoding preset : medium

H264 CRF:18

deinterlacing:bob and weave

with these settings i found no improvement with intel quicksync enabled or disabled.

i mean i knew there would be an increase in usage but from 2-7% to 30-40% on just one stream seems kinda crazy to me.

 

Core use on a NUMA setup is well covered in other threads. Long story short: x264 isn't NUMA-aware, so you won't get it to span processors (and wouldn't want to, honestly).  Processors with a "UMA mode" like ThreadRipper are a little different, but a multi-cpu system shouldn't see ffmpeg spanning all cores.  Latency is involved and since video encoding is more of a serial workload, it's just not efficient enough to bother writing the code for.  I don't have the link offhand, but Doofus and I covered a lot of this before on the forum so I won't reiterate it now (doom9 has a thread or two about threading/NUMA as well).  x265 is NUMA-aware, but since Emby doesn't transcode to HEVC yet and x265 needs a special option anyway, it's pointless to talk about that in this context.

 

Definitely post a server log and ffmpeg logs from both settings if you can.  For the thread settings, you can leave it on 0 for "automatic".  It should only use the cores from one processor, but it sounds like you were spawning more threads than you had on a single cpu package, which led to even more load (and probably more stalls).  I'd only really recommend setting it if you wanted it to use fewer threads than you have available rather than trying to match your core count.
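As a rough illustration of that rule of thumb (leave it on automatic, or cap it at what one cpu package offers), here is a small sketch. The function is hypothetical and isn't how Emby or ffmpeg decide anything internally; it assumes threads are split evenly across packages and that 0 is passed through as "automatic", which matches ffmpeg's `-threads 0` behaviour.

```python
def pick_thread_count(requested: int, logical_cpus: int, packages: int) -> int:
    """Return a thread count that never exceeds one package's threads.

    requested == 0 is passed through unchanged, meaning "let the encoder
    decide". Anything else is clamped to the per-package thread count,
    assuming logical CPUs are split evenly across packages.
    """
    if packages < 1 or logical_cpus < packages:
        raise ValueError("need at least one logical cpu per package")
    if requested < 0:
        raise ValueError("requested must be 0 (auto) or positive")
    if requested == 0:
        return 0
    per_package = logical_cpus // packages
    return min(requested, per_package)


if __name__ == "__main__":
    # Asking for all 24 threads on a 2-socket box gets capped to 12
    print(pick_thread_count(24, logical_cpus=24, packages=2))  # 12
    # Automatic stays automatic
    print(pick_thread_count(0, logical_cpus=24, packages=2))   # 0
    # A deliberately lower value is respected
    print(pick_thread_count(4, logical_cpus=24, packages=2))   # 4
```

The 24 and 2 in the example are the figures from this thread; plug in your own logical cpu and socket counts.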
