
Throttle Cleanup


jaketame


Napsterbater
17 minutes ago, softworkz said:

When you want to do some math, you must not forget that video processing (same as decoding-to and encoding-from) is done on uncompressed frames:

A 4K image has 3840×2160 = 8,294,400 pixels. For simplicity, let's assume BGRA with 4 bytes per pixel, which makes about 32 MB per video frame.
With 30 fps, that already makes 960 MB for each second of video that might need to be moved around. When it needs to be copied back and forth between GPU and CPU memory, that's already 2 GBps. If we had 5 simultaneous transcodings at 5x speed, this would make 50 GBps. And that is only for one transfer from GPU to system memory and back; it doesn't include all the other memory IO happening at the same time, so you could end up at an even higher value. All that is everything but "very minor".

Granted - it's an extreme example, but I hope it helps to understand the dimensions of these things a bit better.

I am not doing hardware decoding (the system has a 3950X, so no need for a GPU for current use), and I rarely have more than 4 streams transcoding at a time; most are direct. I am also not doing 4K.

So even with hardware decoding, cutting it to 1080p should be roughly a 4x reduction (give or take), making 5 streams at 5x about 12.5 GBps (if they are all hitting at the same time, if there are even 5 streams), on a system with a little over 40 GBps (using conservative estimates) of memory bandwidth. The system isn't really doing anything else CPU/memory intensive on even a remotely regular basis, though it does quite a bit of constant disk IO.
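The arithmetic both posters are doing can be checked in a few lines. This is a sketch only; the 4-bytes-per-pixel BGRA format and the two-copies-per-frame round trip (GPU to system memory and back) are the assumptions from softworkz's example, not anything Emby necessarily does:

```python
def uncompressed_copy_gbps(width, height, fps, streams, speed, copies=2):
    """GB/s of uncompressed frame data moved, assuming `copies`
    transfers per frame (e.g. GPU -> system memory and back)."""
    frame_bytes = width * height * 4  # BGRA: 4 bytes per pixel
    return frame_bytes * fps * speed * copies * streams / 1e9

# 5 streams at 5x speed, 30 fps:
print(round(uncompressed_copy_gbps(3840, 2160, 30, 5, 5), 1))  # 49.8 (the "50 GBps" case)
print(round(uncompressed_copy_gbps(1920, 1080, 30, 5, 5), 1))  # 12.4 (the "12.5 GBps" case)
```

Both figures quoted in the thread fall out of the same formula; 1080p is exactly a 4x reduction because it has a quarter of the pixels.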

My Emby system is more disk-IO bound than CPU/memory-bandwidth bound; in my setup I have little concern for memory bandwidth.

Also, why add another hard drive just for temp transcoding when I have free RAM space and bandwidth to deal with it?

There is no noticeable impact on anything the system is doing when transcoding is going on with the current setup. But the IO saved for the other tasks using that disk array is an improvement; it may not be huge, but it was noticeable. And yes, I could spend an HDD on this, but I don't need to, as it would not make any noticeable difference to how the server is currently set up, and it would require an extra physical item needing power and generating heat and wear. Yes, insignificant amounts, but more than the current setup, for no benefit.

I am not saying what you are saying is wrong; again, you make valid points that need to be considered for each setup and usage, but not every situation or setup is the same. In my case the ONLY point of this was to cut down on disk IO, nothing else. In the future things may change for my usage or setup, but for now it is working perfectly fine: transcodes start really fast, they never run out of buffer, and the server it is running on never skips a beat for anything else it is doing.


 

9 minutes ago, Napsterbater said:

I rarely have more than 4 streams transcoding at a time, most are direct. I am also not doing 4k

What's the fuss about all this, then? That's not even a case worth discussing, because there's no concern either way.

15 minutes ago, Napsterbater said:

But the IO saved for the other task using that disk array are improved, it may not be a huge improvement but it was noticeable

No need for a RAM disk and no need for a dedicated HD in the case you are describing. Activate throttling and you should be fine; there won't be any relevant impact on your storage then.

20 minutes ago, Napsterbater said:

but not every situation or setup is the same

That's very true. Your case has nothing to do with those I was describing. If you enjoy using a RAM disk instead of an HD, that's fine, like all the other options available to you.

25 minutes ago, Napsterbater said:

In the future things may change for my usage or setup,

Yes! Same for us, and what I can say is that the goal for transcoding-temp storage will not be optimizing for low storage use. You should expect it to require even more storage space for certain use cases (e.g. TV).

What's clear though is that the missing cleanup in certain cases is an incorrect behavior which will be fixed.

There are more streaming changes in the pipeline as well... 🙂 


cayars

I disagree a lot, and you can read this over and over in our forums. We see many systems that are IO bound all the time. Often it's the difference between Live TV working and not working when recording, sometimes even with just one recording. My Synology 920+ wasn't good for DVR and watching at the same time until I dropped two 1 TB NVMe sticks in and set them up for read/write caching, which acts very similar to a RAM disc since I have writes delayed. But it's still only a fast SSD, and using a pure RAM disc would still be a lot better and faster if the size was managed.

My two i7 computers are the same way. I can't run Live TV well on either of them without using an SSD, as the IO is a problem. It's NOT about the total throughput but the queue management of reading/writing all these files when they can't be written immediately. The slower the write media, the worse this becomes, even though it's the same IO throughput. This is especially true with SATA vs SCSI or SAS, for example, as the CPU has to be involved in the process. RAM discs change the way this works, as the files don't touch the SATA bus.

<cough> The competition did this a while back, removing the oldest files from the transcode folder, and you can easily run 5 or 6 transcodes with a small RAM disc and things work pretty well. Obviously, if you remove older parts of the media being transcoded and the person skips back, it might need to get transcoded again, but that's a fine trade-off on small RAM discs, and the person can always add more memory and expand the size from 2 or 3 GB to 8 or 16 GB.
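The rolling cleanup described here, dropping the oldest segments while keeping a window behind the playback position, could be sketched roughly like this. The `segmentN.ts` naming scheme and the `keep_behind` policy are hypothetical; a real HLS segmenter tracks its segments rather than globbing a directory:

```python
import os
import re

def prune_old_segments(temp_dir, current_index, keep_behind=30):
    """Delete transcode segments more than `keep_behind` segments
    behind the client's current playback position."""
    pattern = re.compile(r"segment(\d+)\.ts$")  # hypothetical naming scheme
    removed = []
    for name in os.listdir(temp_dir):
        m = pattern.search(name)
        if m and int(m.group(1)) < current_index - keep_behind:
            os.remove(os.path.join(temp_dir, name))
            removed.append(name)
    return removed
```

With, say, 6-second segments, `keep_behind=30` keeps about three minutes of already-watched video for short rewinds while capping the folder's growth.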

But it does make a huge difference. Even going from HDD to SSD can make a substantial difference to Live TV and recording!


7 minutes ago, cayars said:

My Synology 920+ wasn't good for DVR and watching at the same until I dropped two 1 TB nvme sticks in and set them up to do read/write caching which acts very similar to a ram disc since I have writes delayed.  But it's still only a fast SSD and using a pure RAM disc would still be a lot better and faster if the size was managed.

A separate HD would have done it. An SSD is fine as well, of course. But especially for devices like a Synology, a RAM disk would make things worse rather than better.

Try it. Create a RAM disk, then run two transcodings with subtitle burn-in.

Compare the RAM disk and the SSD for transcoding-temp.

And then we can talk about your disagreements 😉


Experience has shown over and over again that it doesn't help. It's not about the throughput of one disc but the overhead of writing the small files and reading them back from slow media. Switching the transcoding path from one slow disc to another slow disc, likely on the same SATA bus, doesn't help.

We already know the answer to the question you asked because I always suggest people pick up a cheap SSD to use for the transcoding folder and their issues go away. Just moving the transcode folder to another HDD offers little or no improvement.

Moving it to an SSD helps, but it's usually still on the SATA bus, which is what I'd like to take out of the equation by having a RAM disc option.

Edited by cayars

I don't want to doubt your experience, but the segment writing is a really simple and lightweight operation, and if a dedicated normal HDD can't handle it, then we must be doing something wrong that we need to find out about.

(I have a certain possible suspect cause, actually)


1 hour ago, softworkz said:

(I have a certain possible suspect cause, actually)

When you say it's specific to TV, then there's another possible culprit (which won't apply anymore in the future).


I would say it shows up quicker when using Live TV, but it's not limited to that; it happens anytime we're writing out small TS files. If these TS files happen to be on a disc used for other purposes, like the Windows drive with the swap file and other running programs, it will look similar to the example below when recording.

The test below is recording 3 channels off a Prime tuner on an i7 machine with 32 GB memory. There was one playback as well.
Windows and Emby are running on an NVMe SSD, with nothing else running or touching the disks being measured.

I made sure there was no fragmentation on any of the discs before running this test.
[screenshot: disk activity overview during the recording test]

C is the Windows drive on NVMe, which is running Emby.
E is a normal HDD (not SSD); it had both the DVR and transcoder paths on it and was used for nothing else.
D was not used at all.
I turned off all security software to make sure it did not interfere with testing.

Look at the queue length for drive E, which is very high.
The queue length for drive C, running Windows and Emby, is totally fine and under 0.1.

[screenshot: disk queue length counters for drives C and E]

The rule of thumb is that the queue length should always be at or below half the spindle count. This is 1 disc, so it should be under 0.5, which we can see is not even close.
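The rule of thumb above amounts to a one-line check (a sketch only; the 0.1 and 3.0 sample values are illustrative, with 0.1 matching the drive C reading in the test):

```python
def disk_saturated(avg_queue_length, spindles=1):
    """Rule of thumb: a sustained average queue length above half the
    spindle count means requests are backing up faster than the
    spindles can service them."""
    return avg_queue_length > spindles / 2

print(disk_saturated(0.1))     # like drive C above -> False
print(disk_saturated(3.0))     # a single busy HDD -> True
print(disk_saturated(3.0, 8))  # the same queue on an 8-spindle array -> False
```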

I used DiskPart to check the allocation size:
[screenshot: DiskPart output showing the allocation unit size]

Imagine what the performance and split I/O would be if I had used something like a 64K or larger allocation size, which is what I typically do for media drives.

After doing that, I ran a diagnostic test while still recording and got a warning about high split I/O. Of course, I didn't format these drives with the intention of constantly writing such small files to them, so I get a serious performance hit because of that. With SSDs it doesn't matter nearly as much, and of course with a RAM disc you can format it for very small file/segment use to make it a perfect fit.
[screenshot: diagnostic warning about high split I/O]
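The allocation-size trade-off is easy to quantify: larger clusters mean far fewer allocations per segment (less split I/O), at the cost of more slack in each file's last cluster. The ~3 MB segment size here is a hypothetical figure for illustration, not Emby's actual segment size:

```python
import math

def cluster_stats(file_bytes, cluster_bytes):
    """Clusters touched by one file, plus the slack space wasted in the
    last, partially filled cluster."""
    clusters = math.ceil(file_bytes / cluster_bytes)
    slack = clusters * cluster_bytes - file_bytes
    return clusters, slack

segment = 3_000_000  # hypothetical ~3 MB TS segment
print(cluster_stats(segment, 4 * 1024))   # 4K clusters  -> (733, 2368)
print(cluster_stats(segment, 64 * 1024))  # 64K clusters -> (46, 14656)
```

Sixteen times fewer allocations per segment for roughly 12 KB of extra slack per file, which is why large clusters suit media drives and small clusters suit drives full of tiny files.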

There is also what I'd call a flaw when recording in Emby. All recordings are being written twice simultaneously, which makes no sense: one file is written to the DVR folder and one to the transcode folder. I can't think of a reason to do that. Even if the one in the transcode folder is used to let the person start playback from the beginning, it could use the file in the DVR folder instead, since both are being written as it goes. The person is watching the guide entry, not the channel per se, so if the playback stops at the same time the recording stops, that is fine. So there is wasted IO because of the "double write" as well.

But the main performance killer is the size of the TS files. Being able to adjust it higher would cut down a lot on disc IO issues. I don't know if there is any wiggle room on this compared to what the clients need to receive, but I believe we're on the smaller side right now compared to many other streaming platforms (though I would need to check that).
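The effect of segment size on file churn is simple arithmetic. The 3-second and 10-second durations below are illustrative; I'm not asserting what Emby actually uses:

```python
def segment_files_per_minute(segment_seconds, streams):
    """Number of new segment files written per minute across all
    active streams, for a given segment duration."""
    return 60 / segment_seconds * streams

print(segment_files_per_minute(3, 5))   # 3 s segments, 5 streams -> 100.0
print(segment_files_per_minute(10, 5))  # 10 s segments, 5 streams -> 30.0
```

The total bytes written are the same either way; what changes is the number of file creations, directory updates, and flushes the disk queue has to absorb.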

This is the reason I immediately added the NVMe chips to my Synology for read/write caching, so the discs never see these kinds of I/O splits, as the TS files never actually get written to the HDDs.

Edited by cayars

 

12 hours ago, cayars said:

Imagine what the performance and split I/O would be if I had used something like 64K or larger allocation size

Better performance and lower split I/O, but more disk space wasted.

12 hours ago, cayars said:

Look at the queue length for Drive E which is very high.

Take the ffmpeg command and run it manually with Emby Server stopped (or even run all three commands simultaneously).

I'm sure the queue length will be fine. It's not about the segments, it's about certain things that Emby is doing.

The only thing I can say publicly is that these "things" will go away soon 😉 


jaketame

I realise I created this thread and there has been a lot of activity the last few days. The original request was:

- When you have 10 people transcoding movies specifically, transcoding-temp can reach 100/200 GB very quickly, potentially filling up disks. If we could have a 10-15 minute buffer on either side for fast-forward/rewind, that would be ideal.

I've never had issues with the performance of an SSD with 10+ people; it's more the space limitation that is the issue.
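The space figures line up with simple sizing arithmetic. The ~15 Mbps transcode bitrate and 2-hour movie length below are assumptions for illustration, not numbers from the thread:

```python
def temp_size_gb(streams, mbps, minutes):
    """Approximate transcoding-temp footprint if each session keeps
    `minutes` of transcoded video on disk at `mbps` megabits/second."""
    return streams * mbps * 1e6 / 8 * minutes * 60 / 1e9

# 10 sessions each retaining a full 2-hour movie at ~15 Mbps:
print(temp_size_gb(10, 15, 120))  # 135.0 GB -- the 100/200 GB range above
# The same 10 sessions keeping only a 30-minute window:
print(temp_size_gb(10, 15, 30))   # 33.75 GB
```

This is why a bounded buffer around the playback position, rather than retaining the whole transcode, changes the storage requirement by roughly the ratio of window length to title length.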

Edited by jaketame

18 minutes ago, jaketame said:

When you have 10 people transcoding movies specifically then you can run 100/200GB

Yes, that's a good size for 10 transcode sessions.

I know it appears weird that it takes so long to deliver a seemingly simple feature. It's just unfortunately at the end of a longer causality chain (before this, we need X, and before X we need Y, and before Y we need...). But we will have a solution for this in the future; it has never been forgotten...

But first of all we need to identify those cases where segments don't get deleted on playback stop.


jaketame
2 minutes ago, softworkz said:

Yes, that's a good size for 10 transcode sessions.

I know that it appears to be weird that it takes so long to deliver a seemingly simple feature. It's just unfortunately something at the end of a longer causality chain (before this, we need X and before X we need Y and before Y we need....) But we will have a solution for this in the future - it has never been forgotten...

But first of all we need to identify those cases where segments don't get deleted on playback stop.

Completely understand! I float between all media server platforms, but for mass transcoding, unfortunately, Plex does just work for this use case. I look forward to the day Emby has this 🙂 along with all the other behind-the-scenes things that have been going on over the last few months.


  • 1 month later...
malecoda

Urgh, I ended up with a 2.5 TB transcode file on the 4.7.0.3 beta. Is there any chance this will be fixed anytime soon? I'd classify this as more of a bugfix than a feature request. Disabling scrubbing is preferable to breaking things.


Happy2Play
1 minute ago, malecoda said:

4.7.0.3 beta.

In that version, no, as it is an obsolete beta.

Not sure what is in the current beta but release 4.6.7.0 has

  • Improve cleanup of transcoding processes

But in the end, Emby will maintain an entire transcode until the session is ended, so this improvement will only apply to ended sessions.


malecoda

Updated to the 4.7.0.18 beta; the same issue listed as a feature request persists. Stream transcoding file sizes still appear to be uncapped, or at least not user-editable, and can grow to fill all available space. I assume this request is still waiting to be implemented.


7 hours ago, Happy2Play said:

Improve cleanup of transcoding processes

Sometimes the files in transcoding-temp were not deleted after playback had stopped.
This is what has been fixed or improved.


On 11/2/2021 at 10:19 AM, cayars said:

I would say it shows up quicker when using Live TV but not just with this. [...] E is a normal HDD (not SSD) and had both the DVR and Transcoder paths on it and was used for nothing else. [...] Look at the queue length for Drive E which is very high.

 

@cayars Is your HDD an SMR disk by any chance?

I have read that this type of disk is not recommended for NAS usage and that they are very slow at copying small files.

Could that be the source of the lags you see?


No way. A buddy of mine bought 16 external drives, shucked the cases, and put them in a case, needing my help with a ZFS setup. We tried a few different layouts, and even with only 4 drives per vdev in a plain stripe (no parity) and the 4 vdevs striped for maximum performance, the speed when empty was no better than a typical desktop HDD. As the drives filled up, the speed just went downhill from there, into the 30 MB/s range, which is horrible.

This is just me personally saying this, but I would use those drives for only one purpose, and only if I got them at a significant discount compared to CMR would I even consider it: a one-time backup of existing media. So I would start writing data to the drive, not stop until it's filled, and then never touch the drive again unless I had to recover.

So they have their place for "cold storage" or "online archive" use, in cloud nomenclature, but you would not want to use them for your daily IO or daily processing. I'm sure companies like Backblaze and Amazon Glacier make good use of them, however, as they get more storage per dollar.

