Transcoding solutions


Posted

So far I am enjoying emby.  I purchased a 1-month premiere to try all the features I can before deciding on the lifetime purchase.

The way I have my emby server setup right now:

Dell Poweredge R420 dual Xeon(R) CPU E5-2430 v2

192GB RAM total

8TB ssd storage in RAID10 ZFS

256GB SSD host os LVM = Proxmox7

 

Emby is installed in a Debian 11 LXC, currently with 2 cores and 8GB RAM.  I have tried as many as 8 cores (during testing).  Emby detects when it "should" transcode files, but the processors are not up to the task.  From my research, this appears to be linked to the processors not supporting Intel Quick Sync.

My next avenue of approach is to get an e-gpu setup for hw transcoding.  I have everything setup the way I want it so I am trying to figure out a non-invasive way of making hw transcoding possible.  With my lxc I can do gpu passthrough and the server itself has 2 x16 pcie gen3 slots.  

I will go the e-gpu route if I could find a gpu that both works with my server and emby.

So, besides the e-gpu route, is there any possibility of emby being able to transcode reliably/efficiently with the existing processors?

Posted

Hi, regarding the processors, can we look at an example of what you feel is not performing well? Thanks.

Posted

In the two screenshots, only 1 stream is active and being transcoded.  It is being transcoded to an Android client on the LAN (no access from the public internet).  The CPU usage according to my Proxmox host is really high for only 1 stream.  Because I don't currently have a GPU for hardware transcoding, the only transcoding options I have in Emby are Yes, No, and Advanced, and Advanced doesn't see any hardware-capable devices.

Even when I gave the LXC 8 cores and 8GB RAM, the CPU usage was typically still around this level.

I have been trying to figure out what settings to convert video files to so that the "weakest" device on the network can direct stream the file but I have come to see the usefulness and convenience of transcoding.

For reference, the IDs on my host are in the 200s for LXCs and the 100s for VMs.  The one VM I have (router) has 1 core and 1GB RAM and barely sees significant CPU usage.  The other LXCs have 1 core and 1GB RAM each, except for one with 4GB RAM (NFS/TFTP/SMB etc. server).

ss1.jpg

ss2.jpg

Posted (edited)

Just a couple of questions, as I will try to replicate this on my Proxmox setup:

  • What are your LXC settings?
  • How are you mapping your media to the container, i.e. are you using NFS to the container via resource mounts?
  • How quickly does the actual transcoding complete?
  • What do top / htop (or even better, btop) show, as opposed to the Proxmox monitoring?

I will also force a transcode on my Cloud server to see what / how long the cpu spikes for

Update 1

Set off a 4K TV show and forced it to transcode (dropped the bitrate and the resolution to 1080p). On my cloud server (4 dedicated Epyc cores, no HW transcoding available) the results were as follows:

  • 1-hour TV programme, 4K 20+ Mbps > 1080p 2 Mbps = 4 cores averaging 60-70% for approx. 10 minutes to transcode the lot

I then wondered what the impact of starting another transcode would be, as that seems to be the concern:

  • Started another movie (before the first one above had finished transcoding), 4K 18 Mbps > 720p 2 Mbps. The CPU initially spiked to about 80% for a few seconds, then settled at an average of 65%-75% max while transcoding both files. So there was no major impact on overall CPU use when transcoding 2 files rather than 1, and the scheduling is working fine. How does your CPU usage look when transcoding multiple files? Does it max out and stutter, or work OK?

 

 

Edited by CassTG
Posted (edited)

So i just setup a Turnkey Debian CORE lxc container to see how it would run for ya

Bear in mind this is a little Lenovo M900 Tiny box (local network) with a much weaker CPU by miles (6400T). I tried with 2 and 4 CPU cores and turned OFF Intel Quick Sync, so just software transcoding.

2 Cores

  • 1 Movie forced transcode = ave 90% Cpu usage
  • 2 Movies forced transcode = ave 95% Cpu Usage
  • 3 Movies forced Transcode = ave 99% Cpu usage

2 films transcoding played fine. As soon as I put the 3rd into transcoding there were not enough resources, so one film paused every now and then.

Screenshot 2022-03-19 at 13.38.30.jpg

 

4 Cores

CPU usage was roughly the same 90-95% with some dips to 60% but 3 streams played fine on transcoding without stutters

So the question would be how many transcodes are likely to happen versus cost of gpu solution

 

Screenshot 2022-03-19 at 13.49.59.jpg

Edited by CassTG
Posted (edited)

@CassTG

Thank you for testing.  I appreciate it.  I am still a greenhorn in the media server game and although I managed to get my proxmox setup I went in blind, read the docs and applied what I read.  I am always learning so I will state ahead that I may not sound knowledgeable in some areas.

  • What are your LXC settings

LXC is a debian 11 standard image configured with a single admin user and root user.  Updated (at the time of install) and then emby installed on top of that.  It has 2 cores (for now) and 8GB RAM.  I have tried going up to 8 cores but didn't see much of an improvement at that point.

  • How are you mapping your media to the container, i.e. are you using NFS to the container via resource mounts

I have a zfs pool of 8TiB in a RAID10 zfs using 2TB ssds.  I have a folder in that pool which is mounted to my lxc.  So it is directly mounted and not using network share service.

  • How quick does the actual transcoding complete

I have been testing with files as small as 1GB and as big as 5GB.  These are a mix of 2K/4K (5GB files) and some 1080/720 (1GB) files.  I say 2K/4K/1080/720 according to the resolution that MediaInfo reports to me.  With one 2K/4K stream to one Android tablet, the transcode takes nearly as long as the file's runtime, and it definitely stutters on the tablet.

  • What do top / htop (or even better, btop) show, as opposed to the Proxmox monitoring

I'm guessing I would need to run these commands both in the lxc and the host at the same time while the transcoding is happening?

  • how does your cpu usage appear when transcoding multiple files, does it max out and stutter or work ok

I have seen similar to your tests in terms of cpu usage but unlike your test I definitely get stutters and even worse sometimes the playback just stops altogether.

  • So the question would be how many transcodes are likely to happen versus cost of gpu solution

At any given realistic time I would have a max of 5 streams active.  The 2 android tablets, the 2 smart tvs, and my main pc.  This should be the max number since when all 4 devices were running and I did notice some stuttering/pausing I went into the web browser of the server to attempt to play the same problem file and compare.

On your M900, while it may have a 6400T processor I think I read somewhere that even newer intel processors without ht can still outperform a xeon processor with ht.  I know that I have dual xeons but I tend to think that even though our processors (6400T to xeon e5-2430 v2) were released 1 year apart that's still a significant processor upgrade.  I could be wrong on this thought too.

Thanks again @CassTG for your time and efforts.

Edited by genuser
Posted (edited)

Well i also wanted to do another test which i think you may want to undertake.

Whilst I use LXC for lightweight stuff, i.e. AdGuard and Nextcloud, I'm more of a proper Docker fan when it comes to containers.

However, I had an idea to test the same setup in a Debian 11 minimal netinstall VM instead of LXC.

And here are the results:

I tested with  4 cpu cores and quicksync off

  • Started the same film and forced a transcode - CPU started at 90% for a few seconds, then dropped to about 75% on average
  • Started the second film and transcoded - CPU jumped to about 83% on average
  • Started the third film and transcoded - CPU sat at about 88% on average

So I decided, what the hell, and started a fourth film - CPU sat at about 92-94% on average.

What's interesting is that in the VM the RAM gets utilised more than in the LXC containers.

Screenshot 2022-03-19 at 14.25.44.jpg

Screenshot 2022-03-19 at 14.26.27.jpg

 

So it may be worth trying your setup again in a VM rather than LXC (which if you do use a egpu will be easier to passthrough)

But all 4 streams continued to play without stuttering. Emby adjusted the number of fps encoded for each file as needed, to keep a good buffer and avoid pausing.

Yes, the 6400T is newer, but it's a low-power version of the chip, so not all that. And considering you can over-provision on Proxmox quite happily, you could set your CPUs to 8 or 16 or whatever you have, as they are shared anyway. Unless all your VMs and LXCs are hammering the CPU all the time, you wouldn't notice most of the time.

The reason I say run top or htop is that I remember reading that the Proxmox summary screens include the overhead of running the container and a few other things. Sure, it's only a few percentage points of difference, but all the same, you could SSH in and just type

top

to see what the cpu is actually doing within the container
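If it helps, here is a rough sketch of how the numbers from top/ps could be added up. It assumes `ps` from procps is available in the container. `sum_cpu` and `share_of_container` are helper names made up for this example, and the 180/2 figures are placeholders, not measurements. Note that `ps` reports %CPU averaged over the life of the process, so it reads smoother than the live figure in top.

```shell
# Sum a column of %CPU values (one per line), in top's convention: 100 = one full core.
sum_cpu() {
  awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# Express a %CPU total as a percentage of the cores given to the container.
# usage: share_of_container TOTAL_PCT CORES
share_of_container() {
  awk -v total="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", total / cores }'
}

# Placeholder numbers: 180% of ffmpeg load in a 2-core container.
share_of_container 180 2
```

Inside the container, while a stream is transcoding, `ps -C ffmpeg -o pcpu= | sum_cpu` gives the combined ffmpeg load, which can then be passed to `share_of_container` together with the number of cores assigned.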

 

Edited by CassTG
Posted

Thanks again @CassTG.

I also did another small and quick test from my last post to this.

I played the same file on 1 tablet and my main pc at the same time.  I bumped up the cores to 6.  The lxc on the summary page showed 95% cpu avg.  Top on the host and the lxc showed 270%+ going to ffmpeg for emby on the 2 active streams.  This was a 5.2GB file size.

The results were the video was stuttering/pausing and the quality was lower than what the setting said it was.  It was pixelated.

As for the RAM usage between the vm and lxc, that is interesting and I wonder if there is a setting in the lxc config to allow it to actually use more RAM since my lxcs barely touch theirs and I would like for them to actually use RAM since I have a good amount of RAM and zfs is barely hogging that.

Also, I am using the lxc route because I can direct bind mount host folders into the container.  With a vm I have no choice but to use a network share.  I would like to avoid that (network share into vm) at all costs since I don't choose to backup the proxmox host or the vms/lxc but rather the actual data that is being processed...if that makes sense.
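For anyone following along, a bind mount of this kind is normally added with `pct set` on the Proxmox host. A minimal sketch, assuming container ID 200, a host path of /tank/media and a mount point of /mnt/media. All three are placeholders, not the real values from this setup:

```shell
CTID=200               # placeholder container ID
HOST_DIR=/tank/media   # placeholder folder on the host's ZFS pool
CT_DIR=/mnt/media      # placeholder mount point inside the container

# Build the command and print it for review instead of running it.
MOUNT_CMD="pct set $CTID -mp0 $HOST_DIR,mp=$CT_DIR"
echo "$MOUNT_CMD"
```

Running the printed command on the host adds an `mp0:` line to the container's config under /etc/pve/lxc/, and the folder shows up inside the container without any network share in between.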

Q-Droid
Posted

You can also try to adjust your software transcoding settings to reduce the workload a bit. Increase CRF from the default of 23 by 1 or 2 and try faster encoding presets. Increasing CRF will reduce quality in very small increments, often not noticeable. Faster encoding presets compress less resulting in slightly higher bitrates for the same quality levels. 

Testing these options can buy you a little CPU headroom, it depends on what you can tolerate in terms of quality and bitrate.
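One way to see what those two knobs buy on a given CPU, outside of Emby, is to time a few plain ffmpeg software encodes. A minimal sketch: it only prints the commands, `sample.mkv` is a placeholder file name, `crf_preset_matrix` is a name invented for this example, and this is not necessarily how Emby itself invokes ffmpeg.

```shell
# Print one ffmpeg command per CRF/preset pair; nothing is executed here.
crf_preset_matrix() {
  for crf in 23 24 25; do
    for preset in veryfast faster; do
      # -an drops audio and "-f null -" discards the result, so only encode speed is measured.
      echo "ffmpeg -i sample.mkv -c:v libx264 -preset $preset -crf $crf -an -f null -"
    done
  done
}

crf_preset_matrix
```

Running each printed line under `time` against the same test file gives a rough idea of how much CPU headroom a faster preset or a higher CRF frees up.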

 

Posted

@genuser

 

I thought i would try one last test which had interesting results when i compare to the initial LXC container test which maxed at 3 streams with minor stuttering

Taking into account why you want to use LXC's i tried the inception way of doing things by running a container within a container

Setup

  1. Created an LXC container based on Turnkey Debian 10 Core (this is the only container this really works on)
  2. Set it up with 4 CPUs, 4GB RAM, DHCP, and 1.1.1.1 as DNS
  3. On the first LXC screen, UNTICKED "Unprivileged container"
  4. Before booting, added the nesting option (I also added FUSE, as my tests are all done with media on Google Drive)
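The nesting and FUSE options from step 4 can also be set from the host shell rather than the GUI. A small sketch, assuming container ID 201, which is a placeholder:

```shell
CTID=201   # placeholder; use the ID Proxmox assigned to the new container

# Build the command and print it for review instead of running it.
FEATURES_CMD="pct set $CTID -features nesting=1,fuse=1"
echo "$FEATURES_CMD"
```

The printed command is meant to be run on the Proxmox host before the container is first booted, or followed by a container restart.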

Started container

  1. Did all the updates but this time installed DOCKER for debian
  2. Visit Dockerhub and setup a quick docker container within the lxc container using these parameters to test
docker run -d \
  --name=emby \
  -e PUID=0 \
  -e PGID=0 \
  -e TZ=Europe/London \
  -p 8096:8096 \
  -v /emby_data:/config \
  -v /Mount/media:/data/movies \
  --restart unless-stopped \
  lscr.io/linuxserver/emby:4.7.0-beta

Now i ran the same tests and 3 streams played as per initial LXC with ffmpeg at 95% per stream so just under 300% of the 400% available however after a few minutes this did seem to crash the container.

So i installed Tuned which i use on all my vms and stuff (normally leaving at default virtual-guest)

sudo apt install tuned tuned-utils tuned-utils-systemtap
sudo systemctl enable --now tuned

and set the profile for a laugh and to test to:

sudo tuned-adm profile throughput-performance

I then opened up Emby again and started the same 3 films and set all to transcoding. 

As before, the CPU was pegged at about 95%, but all 3 streams were playing perfectly, with no stutter and a good buffer being built up.

So i thought why not start a 4th to see how it performs. Well it started fine, CPU spiked to 98%, but all 4 streams were playing perfectly without a dropped frame

Maybe worth a test setting up another lxc and trying the above. It may make no diff at all but takes 20 mins to try 

Posted

Just re-read your post about the two-stream test again and was wondering about a few things:

  • Were both streams direct playing to the tablet and PC, or were both immediately transcoding?
  • What was the file format being played?

If they were transcoding, how have you set up the actual apps? I know I get scolded for saying this, but I had no end of grief with files transcoding when there was no reason to (the "file bandwidth exceeds max bandwidth" reason), and this was 100% always down to the AUTO setting in Playback > Video > Home Network and Internet Network speeds. For me this was the cause of so many issues. Whenever I set up family TVs / tablets etc., this is the first setting I change without fail, as it always solves that issue, and I'm never a fan of auto settings on anything.

The reason I ask is that even my old LG TV plays back almost everything without transcoding, and the same goes for my weakest device, a Fire HD tablet. The only time I ever see a transcode is with an old AVI-format TV show, and to be fair those are so small file-size-wise that it only takes a minute to transcode the whole file.

 

Q-Droid
Posted

You're right about the AUTO setting for client bitrate detection. It's too inconsistent, and more often than not it errs in the wrong direction, resulting in unacceptably low bitrates and resolutions. You can't always blame the client or the app, since there are so many variables between that and the server. I agree that changing this setting is one of the first things to do when remote clients have playback problems, or even before they complain.

Posted

@Q-Droid

Thanks for the suggestion.  I tested changing some of those and saw no noticeable change in overall performance.  Quality was noticeably worse, unfortunately.

@CassTG

I did not stand up another container yet but I installed the tuned setup you mentioned.  I set the profile to throughput-performance.

Here are some observations and changes I have done so far:

Installed tuned* to my existing lxc.  The lxc is still on 6 cores (for testing).  I installed the emby desktop client onto my main pc.  I kept accessing the media files through my brave browser.  My brave browser has no issues playing 4k content (according to the providers) from youtube, prime video.  My brave browser has trouble playing 4k when streamed through the emby browser at which point emby decides to transcode it.  The test 4k content when streamed through the desktop client goes to direct play, no transcoding and no cpu hit.  The 4k content when streamed through the android client will do one of two things depending on settings.  It will either go to direct play or it will transcode.  Either way, direct or transcode it has issues on the android client.  

I have tried changing transcode settings from auto to fast to faster and changing the crf from 18 up to 28.  No changes that were satisfactory.

To answer your latest questions:  I have managed to get them both (web browser and android client) to go to direct play or transcode depending on the settings as you have mentioned.  When all is set to auto, it sometimes thinks it doesn't need to transcode when it should.  The file format I am testing with is .mkv encoded with h265.  This file (4k, mkv, h265) plays with 0 issues on the desktop client.  I'm assuming it is offloading the decoding to my pc at that point and has nothing to do with emby.  The test file on the web browser initiates transcoding based on settings and has issues.

I have also tried, among other things, changing the playback settings.  Stats for nerds reported "bitrate exceeds" when I set the playback quality to 1080p at 60/30/15/10Mbps.  The test file has a bitrate of 97Mbps.  Setting the playback quality to 4K at 100 or 120Mbps still results in a transcode, because it then reports "video codec not supported".  It seems that Emby doesn't support the H.265 codec through the web browser (maybe)...IDK.

I think, with the help from @CassTG and @Q-Droid, I have taken this processor as far as it can go.  I think the e-gpu route is the next logical step.  

Posted (edited)

Yeah, I believe Chromium-based browsers only support H.265/HEVC playback on Chrome OS with 11th-gen CPUs and above.

Just tested a 60mbps 4k hevc file in the following:

  • Chrome - Transcodes due to container issue
  • Safari - Gave up trying to play it lol (no surprise there)
  • Firefox - Transcodes

But yeah, on desktop the Emby Theater app is always the way to go.

Odd about the Android device, however. I have not yet come across one that can't direct play most stuff myself, but every setup is different I guess.

But as you say you have tested everything which validates buying a shiny new toy which is always a plus!

Edited by CassTG
Posted

So until I can get the right gpu (researching what works on pe r420 server) I'm thinking I will have to manually re-encode my 4k files for the time being.

I started thinking maybe I can still leverage what I currently have.  Software x264 encoding is all CPU, so maybe I can stand up another container or VM that does nothing but encoding.  When a file is done encoding I just move it to the Emby server.
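A rough sketch of that encode-then-move idea, for what it's worth. It assumes ffmpeg with libx264 is installed in the encoding container. The function name, the 1080p target and the CRF/preset values are illustrative choices, not recommendations from this thread, and the FFMPEG variable only exists so the encoder binary can be swapped out.

```shell
# Encode into a scratch file next to the source, then move the result into the
# library folder only if the encode succeeded, so a half-written file never lands there.
# usage: reencode_then_move SOURCE_FILE LIBRARY_DIR
reencode_then_move() {
  src=$1
  library_dir=$2
  tmp="${src%.*}.1080p.tmp.mkv"
  "${FFMPEG:-ffmpeg}" -nostdin -i "$src" \
      -vf scale=-2:1080 -c:v libx264 -preset medium -crf 22 -c:a copy "$tmp" \
    && mv "$tmp" "$library_dir/$(basename "${src%.*}").1080p.mkv"
}
```

It would be called as something like `reencode_then_move /scratch/movie.mkv /mnt/media/movies`, where both paths are hypothetical.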

@CassTG Do you think having 1 container or vm pegging its cpu for software encoding (not transcoding) would impact the other services or even the host in a drastically negative way?  I know I can test it all but was hoping maybe you may have come across a similar scenario with proxmox.  

Posted (edited)

I guess it depends on what your other services are doing. Obviously I wouldn't peg all cores, as that will cause pain, but you have 2 CPUs, so I would start by assigning, say, 60%-70% of all cores. If all your services still run fine, then up it until you find the sweet spot.
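To turn that percentage into a core count for the encoding VM or container, something like this would do. A small sketch: `cores_for_share` is a made-up helper name, and the 24 threads are an assumption for a dual E5-2430 v2 with hyper-threading on.

```shell
# Whole cores corresponding to a percentage of the host's threads (rounded down).
# usage: cores_for_share TOTAL_THREADS PERCENT
cores_for_share() {
  echo $(( $1 * $2 / 100 ))
}

cores_for_share 24 65   # 24 threads is an assumption, see above
```

On the Proxmox host itself, `cores_for_share "$(nproc)" 65` uses the real thread count instead of the assumed one.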

Also, for reference, one thing you might want to look at is Tdarr, a Docker media manager / encoder. It scans your whole library and media-probes it; you can then apply a filter, for example H.265 4K over 60 Mbps, and set it to apply a "plugin" to the matching files. Plugins are downloadable media profiles, so the process is automated. A profile could be, for example, 4K + HDR + HD audio + an AAC audio track, 20 Mbps, etc.

I think the issue most likely lies with the bitrate, so switching down to a more efficient 4K file shouldn't be too bad a process.

Tdarr manages the process, but you also set up a simple ffmpeg or HandBrake VM or Docker container, and it sets those off to do the heavy lifting, so Tdarr doesn't use many resources itself. If you created a VM with Docker installed, you could run the Tdarr container plus an ffmpeg container, or even several. I also believe, although I'm not 100% sure, that it can send work to other machines too, i.e. a PC etc. Hell, you could probably also set up a few VPSs in the cloud and let them do some of the work. If you are in Europe, most German VPS providers offer a 14-day no-questions-asked refund / trial of their servers, but of course one would not abuse that fairness 😉 And I know you keep your server off the public internet, but if you did get additional resources, i.e. a cloud VPS, you could use rclone to create a secure SFTP mount to your local server, so the file is sent off for processing and brought back securely alongside your existing

I only played around with it for a short while so cannot vouch for the full process as i was just curious but may be worth looking at

Edited by CassTG
Posted

I tried it.  I created a vm with 8GB ram and tested with 4, 8, 16 vcpu.  I installed win10 ltsc and the necessary virtio stuff for the vm.  Handbrake saw no significant time reduction even on 16 vcpu.  I have another box laying around (2 actually) and it has quicksync (2016 cpu) and handbrake was significantly faster using quicksync for encoding.  The r420 server didn't flinch when 16 vcpus were showing 70%+ usage total against the host total.  It's just amazing that handbrake can't seem to get faster on regular cpu encoding even on 16 vcpu.  Anyways, handbrake alternatives will be on the to-do list.

 
