Is it worth the upgrade?


Crowleyisking

Gecko

@rbjtech Always a pleasure to read your full_of_wisdom replies :)

Indeed, I was testing Emby on a mini HP PC yesterday (12500T / M.2 SSD + 16 GB RAM), after a quick and dirty install of Debian 11 (not working with 12th gen) and then 12 (yeah, supporting the iGPU). The media are on a separate machine over a simple 1G connection. Really nothing fancy. But I'm so new to Intel transcoding that I have everything to learn.

I had a really poor experience with that distro, being blocked at every step to make transcoding work...
On the pre-installed Windows 11, I also tested Emby Server and couldn't get past 3 to 4 concurrent streams. Not the performance I had in mind after reading that you could do 7 to 8 4K streams with your setup, but I didn't remember the details of your hardware either.

Today, I installed the latest Fedora on the mini PC and installed the snap version of Emby (4.7.11). No Quick Sync available, only VAAPI. Performance is quite OK. I could hit 5 transcodes of 1080p but 4K would not start at all.

So I installed the latest beta directly on the system (bare metal). QuickSync is shown in the list (yeah) and 4K is playing pretty nicely (yeah x2). I could run 5-6 streams (with regulation activated) before hitting the 1G bandwidth limit. The result was much better than expected for a 2-hour experience with that distro, starting from scratch :)

In the end, it seems that the Emby beta is far more comfortable with Quick Sync on Linux and, for my setup (Emby on a different machine than the media and a slow LAN connection), even faster than on Windows. I wonder if I could hook up the mini PC to the main server with something other than a LAN connection, like Thunderbolt for example...

Anyway, I will continue to benchmark all of this in the days to come.


rbjtech
11 hours ago, Gecko said:

@rbjtech Always a pleasure to read your full_of_wisdom replies :)

The media are on a separate machine over a simple 1G connection.

Thanks for taking the time to explain your setup.

So the above is your 'bottleneck' - for the 7-8 4K remux transcodes, I was using LOCAL storage - an NVMe disk to be precise - and thus I was able to 'read' the data at a sustained rate. If using a 1Gig LAN, then while the theory suggests you should be able to 'read' 10 x 100Mbit/sec streams, in reality this just isn't possible due to the TCP/IP overheads. With a NAS as the source, tbh I think you have done very well with 3-4 concurrent streams (~80Mbit/sec x 4 = 320Mbit/sec) - which is way over a single SATA3 disk's max I/O (max of 220-250Mbit/sec).

Options to connect the NAS to the Emby transcoder include 10Gbit connections (but expensive) or something like eSATA or maybe even USB3 (if physically close), if the NAS supports it.
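
For anyone wanting to put rough numbers on the link budget described above, here is a minimal sketch. The 94% efficiency figure is an assumption (a commonly seen ceiling for TCP payload on gigabit Ethernet with a standard MTU), not something measured in this thread, and `max_streams` is just an illustrative helper name.

```python
# Rough estimate of how many constant-bitrate streams fit on a network link.
# ASSUMPTION: about 94% of the raw line rate is usable payload; file-sharing
# protocols and bursty reads can push the real figure lower.

def max_streams(link_mbit: float, stream_mbit: float, efficiency: float = 0.94) -> int:
    """Return how many streams of stream_mbit fit in the usable link bandwidth."""
    usable_mbit = link_mbit * efficiency
    return int(usable_mbit // stream_mbit)


if __name__ == "__main__":
    # Bitrates are illustrative: 100 and 80 Mbit/s are the figures quoted above.
    for bitrate in (100, 80):
        count = max_streams(1000, bitrate)
        print(f"{bitrate} Mbit/s streams on a 1 Gbit/s link: {count}")
```

Under that assumption, 100 Mbit/s streams come out at 9 rather than the theoretical 10, and this is still a best case since it ignores bitrate peaks and any other traffic on the link.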


Gecko
1 minute ago, rbjtech said:

Thanks for taking the time to explain your setup.

So the above is your 'bottleneck' - for the 7-8 4K remux transcodes, I was using LOCAL storage - an NVMe disk to be precise - and thus I was able to 'read' the data at a sustained rate. If using a 1Gig LAN, then while the theory suggests you should be able to 'read' 10 x 100Mbit/sec streams, in reality this just isn't possible due to the TCP/IP overheads. With a NAS as the source, tbh I think you have done very well with 3-4 concurrent streams (~80Mbit/sec x 4 = 320Mbit/sec) - which is way over a single SATA3 disk's max I/O (max of 220-250Mbit/sec).

Options to connect the NAS to the Emby transcoder include 10Gbit connections (but expensive) or something like eSATA or maybe even USB3 (if physically close), if the NAS supports it.

They will be physically close. The media are stored on an Unraid server with an AMD CPU and Nvidia GPU, which currently hosts Emby Server. But running a single transcode seems to use around 100 watts with that setup, whereas the mini PC uses less than 30 watts at peak while handling 5 streams.

My goal is to get rid of the GPU and offload a lot of containers, including Emby, to the new mini PC for a transition period, so that my main system can go to sleep during the day when nobody is watching media.

In 2024, new Intel CPUs with AV1 encode should come out and ultimately replace my AMD system once and for all, reintegrating Emby locally :)

I think you mixed up Mbit/s and MB/s. A single drive is around 300MB/s, so more than enough to handle more than 4 streams… or am I misunderstanding something?
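
To make the power comparison in this post concrete, a tiny sketch follows. It only divides the wattage figures quoted above by the stream counts; `watts_per_stream` is a hypothetical helper, and since the 30 W figure is described as a peak upper bound, the mini PC result should be read as "at most".

```python
# Power draw per concurrent transcode, using the figures quoted in the post.
# The numbers are rough observations from the thread, not calibrated measurements.

def watts_per_stream(total_watts: float, streams: int) -> float:
    """Divide the total draw evenly across the active streams."""
    if streams <= 0:
        raise ValueError("streams must be a positive integer")
    return total_watts / streams


if __name__ == "__main__":
    main_server = watts_per_stream(100, 1)  # ~100 W for a single transcode
    mini_pc = watts_per_stream(30, 5)       # <30 W at peak for 5 streams
    print(f"Main server: about {main_server:.0f} W per stream")
    print(f"Mini PC: at most about {mini_pc:.0f} W per stream")
```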


rbjtech
23 minutes ago, Gecko said:

I think you mixed up Mbit/s and MB/s. A single drive is around 300MB/s, so more than enough to handle more than 4 streams… or am I misunderstanding something?

doh! - yep rookie mistake lol, I meant Mbyte/sec for the disk I/O stats .. 

If I've understood your setup, it doesn't matter what the SATA3 transfer rate is - you are limited by the 1Gb Ethernet interface ?

For local storage, even a single SATA3 disk should be able to sustain the 7-8 transcoding streams (but you will stress it, remember it's having to do random I/O to get the multiple files..)
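
Since the mix-up here was bits versus bytes, a short sketch of the conversion may help. It uses the 250 MB/s disk figure and the ~80 Mbit/s stream figure quoted earlier in the thread; the helper names are illustrative only. It looks at sequential throughput, so the random I/O penalty mentioned above is not modelled.

```python
# Mbit/s vs MB/s: 1 byte = 8 bits, so 1 MB/s = 8 Mbit/s.

def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbyte_per_s * 8


if __name__ == "__main__":
    disk_mbyte = 250   # upper SATA3 disk figure quoted in the thread, in MB/s
    stream_mbit = 80   # approximate 4K remux bitrate quoted in the thread
    streams = 8
    demand_mbyte = mbit_to_mbyte(stream_mbit * streams)
    print(f"{streams} streams need about {demand_mbyte:.0f} MB/s "
          f"out of roughly {disk_mbyte} MB/s sequential disk throughput")
```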


Gecko
On 5/19/2023 at 1:04 PM, rbjtech said:

doh! - yep rookie mistake lol, I meant Mbyte/sec for the disk I/O stats .. 

If I've understood your setup, it doesn't matter what the SATA3 transfer rate is - you are limited by the 1Gb Ethernet interface ?

For local storage, even a single SATA3 disk should be able to sustain the 7-8 transcoding streams (but you will stress it, remember it's having to do random I/O to get the multiple files..)

Exactly, my network seems to be the limiting factor... for the time being.

My data is spread across disks in Unraid, so depending on which media are streamed, I may end up reading from different physical drives or even an SSD, so I'm not really concerned about this. (I did some tests today, and a 4K file asks merely 20MB/s of a spinning disk, so I'm far from a single disk's limit.)

Emby is transcoding at around 50fps for a full transcode from 4K HDR to 1080p with PGSSUB enabled and tone mapping. The Nvidia GPU on my main server is only a little higher, around 80 at best. Not too bad for an iGPU drawing 20 watts to do the same job :)

 

 


rbjtech
1 minute ago, Gecko said:

Exactly, my network seems to be the limiting factor... for the time being.

My data is spread across disks in Unraid, so depending on which media are streamed, I may end up reading from different physical drives or even an SSD, so I'm not really concerned about this. (I did some tests today, and a 4K file asks merely 20MB/s of a spinning disk, so I'm far from a single disk's limit.)

Emby is transcoding at around 50fps for a full transcode from 4K HDR to 1080p with PGSSUB enabled and tone mapping. The Nvidia GPU on my main server is only a little higher, around 80 at best. Not too bad for an iGPU drawing 20 watts to do the same job :)

 

 

I find it's PGS subs that really have a massive hit on performance.   I'll get between 80-110 fps on a 4K > 1080p ~10Mbit (with TM) but as soon as I add PGS, it'll dive to 50% of that (~50 fps or less).

I always use SRT for this reason as I don't believe the PGS subtitles are hardware accelerated - according to the 'User Sessions' page anyway...

Maybe @softworkz could weigh in here..


  • 1 month later...
Gecko

Hello @rbjtech,

I read your reply recently and indeed, I reach the same conclusions as you regarding PGSSUB.

Using a 4K HDR10 @50Mbps file on my 12500T, I get the following results with native and Dockerized Emby installs, for the same output and video segment. The media is on a remote NAS over a 1Gbps network link. VAAPI is used as the decoder, QuickSync as the encoder, with default settings everywhere else. Docker shows little to no overhead, which is great. The CPU was tested with various performance profiles. Only 3 increase performance further, by about 10%: latency-performance, network-latency and network-throughput. The results below are with the power-save profile:

Installation Method | Input Resolution | Tone Mapped | Active Subtitles | Subtitle Overlay | Output Resolution | FPS
Native              | 4K @50Mbps       | Yes         | No               | Hardware         | 1080p @10Mbps     | 106
Native              | 4K @50Mbps       | Yes         | Yes              | Hardware         | 1080p @10Mbps     | 83
Native              | 4K @50Mbps       | Yes         | Yes              | Software         | 1080p @10Mbps     | 52
Native              | 4K @50Mbps       | Yes         | No               | Software         | 1080p @10Mbps     | 104
Docker              | 4K @50Mbps       | Yes         | No               | Hardware         | 1080p @10Mbps     | 101
Docker              | 4K @50Mbps       | Yes         | Yes              | Hardware         | 1080p @10Mbps     | 83
Docker              | 4K @50Mbps       | Yes         | Yes              | Software         | 1080p @10Mbps     | 53
Docker              | 4K @50Mbps       | Yes         | No               | Software         | 1080p @10Mbps     | 101

Using VAAPI only, the numbers are a little higher: 133 (instead of 106) without subs, 66 (instead of 53) with subs.
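
As a rough way to read the table above, the sketch below turns the native-install FPS figures into a relative slowdown and an approximate real-time headroom. The 24 fps source frame rate is an assumption (the thread does not state it), the helper names are illustrative, and concurrent transcodes will not necessarily scale linearly, so treat the headroom as an optimistic estimate.

```python
# FPS figures copied from the native-install rows of the table above
# (VAAPI decode, QuickSync encode, tone mapping enabled).
RESULTS = {
    "no subtitles": 106,
    "subtitles, hardware overlay": 83,
    "subtitles, software overlay": 52,
}

SOURCE_FPS = 24  # ASSUMPTION: the source frame rate is not given in the thread


def slowdown_pct(baseline_fps: float, fps: float) -> float:
    """Percentage drop in transcoding speed relative to the baseline."""
    return round((baseline_fps - fps) / baseline_fps * 100, 1)


def realtime_headroom(fps: float, source_fps: float = SOURCE_FPS) -> float:
    """How many times faster than real-time playback the transcode runs."""
    return fps / source_fps


if __name__ == "__main__":
    baseline = RESULTS["no subtitles"]
    for label, fps in RESULTS.items():
        print(f"{label}: {fps} fps, "
              f"{slowdown_pct(baseline, fps)}% slower than baseline, "
              f"{realtime_headroom(fps):.1f}x real time")
```

With these inputs, software overlay of the subtitles costs roughly half the throughput, in line with the 50% drop reported earlier in the thread, while hardware overlay costs closer to a fifth.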

 

Side note for @softworkz: the current hardware overlay implementation washes out the tone-mapped colors when using QuickSync, compared to software overlay or hardware overlay with VAAPI, as shown in the attached screenshots. Using VAAPI as the encoder results in poor performance (less than 24fps) on my side when the tone mapping method is set to Extra-T - OpenCL. And the subtitle positioning is still off, as noticed in another thread.

 

The best overall solution is the default one proposed by Emby: VAAPI all the way, with software decoding for graphical subtitles. But in the near future, once the tone mapping and subtitle position are corrected by softworkz, the hardware overlay solution will be a much more efficient one to run.

 

4M TM PGS HW -> 10MB.png

4M TM PGS SW -> 10MB.png


Thanks for confirming that the default chosen method is the most efficient one.

I can't say much without seeing the log files, which would show exactly how the transcoding pipelines are being set up in each case.
