
Napsterbater
Posted
17 minutes ago, softworkz said:

When you do the math, you must not forget that video processing (same as the decoding and encoding on either side of it) is done on uncompressed frames:

A 4K image has 3840×2160 = 8,294,400 pixels. For simplicity, let's assume BGRA with 4 bytes per pixel, which makes about 32 MB per video frame.
At 30 fps, that already makes 960 MB for each second of video that might need to be moved around. When it needs to be copied back and forth between GPU and CPU memory, that's already 2 GB/s. With 5 simultaneous transcodings at 5x speed, this would make 50 GB/s. And that is only for one transfer from GPU to system memory and back; it doesn't include all the other memory IO happening at the same time, so you could end up at an even higher value. All of that is anything but "very minor".

Granted, it's an extreme example, but I hope it helps to convey the dimensions of these things a bit better.
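For anyone who wants to check these figures, here is a quick back-of-the-envelope sketch in Python (pure arithmetic; the resolution, frame rate, and BGRA pixel format are the assumptions stated above):

```python
# Back-of-the-envelope check of the uncompressed-frame bandwidth figures
# quoted above. Pure arithmetic - nothing Emby-specific is involved.

def transfer_rate_gb_s(width, height, fps=30, bytes_per_pixel=4,
                       sessions=1, speed=1.0, roundtrip=True):
    """Memory traffic in GB/s for shuttling raw frames around."""
    frame_bytes = width * height * bytes_per_pixel      # BGRA = 4 bytes/px
    rate = frame_bytes * fps * sessions * speed
    if roundtrip:                                       # GPU->CPU and back
        rate *= 2
    return rate / 1e9

# 5 simultaneous transcodes, each running at 5x realtime:
print(f"4K:    {transfer_rate_gb_s(3840, 2160, sessions=5, speed=5):.1f} GB/s")
print(f"1080p: {transfer_rate_gb_s(1920, 1080, sessions=5, speed=5):.1f} GB/s")
```

This prints roughly 50 GB/s for 4K and about 12.4 GB/s for 1080p, matching the figures in this post and in the reply below, give or take rounding.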

I am not doing hardware decoding (the system has a 3950X, so no need for a GPU for current use), and I rarely have more than 4 streams transcoding at a time; most are direct. I am also not doing 4K.

So even with hardware decoding, cutting it to 1080p should be roughly a 4x reduction (give or take), making 5 streams at 5x about 12.5 GB/s (if they all hit at the same time, and if there are even 5 streams), on a system with a little over 40 GB/s of memory bandwidth (using conservative estimates). The system isn't really doing anything else CPU- or memory-intensive on even a remotely regular basis, though it does quite a bit of constant disk IO.

My Emby system is more disk-IO bound than CPU/memory-bandwidth bound; in my setup I have little concern about memory bandwidth.

Also, why add another hard drive just for transcoding temp when I have free RAM space and bandwidth to handle it?

There is no noticeable impact on anything the system is doing while transcoding is going on with the current setup, but the IO saved for the other tasks using that disk array is an improvement. It may not be a huge improvement, but it was noticeable. And yes, I could dedicate an HDD, but I don't need to, as it would make no noticeable difference to how the server is currently set up, and it would be an extra physical item needing power, generating heat, and accumulating wear. Insignificant amounts, yes, but more than the current setup, for no benefit.

I am not saying what you are saying is wrong; again, you make valid points that need to be considered for each setup and usage, but not every situation or setup is the same. In my case the ONLY point of this was to cut down on disk IO, nothing else. In the future things may change for my usage or setup, but for now it is working perfectly fine: transcodes start really fast, they never run out of buffer, and the server it runs on never skips a beat in anything else it is doing.

Posted

 

9 minutes ago, Napsterbater said:

I rarely have more than 4 streams transcoding at a time; most are direct. I am also not doing 4K

What's the fuss about all this then? That's not even a case worth discussing, because there's no concern either way.

15 minutes ago, Napsterbater said:

But the IO saved for the other tasks using that disk array is an improvement. It may not be a huge improvement, but it was noticeable

No need for a ramdisk and no need for a dedicated HD in the case you are describing. Activate throttling and you should be fine; there won't be any relevant impact on your storage then.

20 minutes ago, Napsterbater said:

but not every situation or setup is the same

That's very true. Your case has nothing to do with the ones I was describing. If you have fun using a RAM disk instead of an HD, that's fine - like all the other options you have.

25 minutes ago, Napsterbater said:

In the future things may change for my usage or setup,

Yes! Same for us, and what I can say is that the goal for transcoding-temp storage will not be about optimizing for low storage use. You should expect it to require even more storage space for certain use cases (e.g. TV).

What's clear, though, is that the missing cleanup in certain cases is incorrect behavior, and it will be fixed.

There are more streaming changes in the pipeline as well... 🙂 

  • Like 1
Posted

I disagree a lot, and you can read this over and over in our forums: we see IO-bound systems all the time. Often it's the difference between Live TV working and not working when recording, sometimes even with just one recording. My Synology 920+ wasn't good for DVR and watching at the same time until I dropped two 1 TB NVMe sticks in and set them up for read/write caching, which acts very similarly to a RAM disc since I have writes delayed. But it's still only a fast SSD, and a pure RAM disc would still be a lot better and faster if the size were managed.

My two i7 computers are the same way. I can't run Live TV well on either of them without using an SSD, as the IO is a problem. It's NOT about the total throughput but about the queue management of reading/writing all these files when they can't be written immediately. The slower the write media, the worse this becomes, even at the same IO throughput. This is especially true with SATA vs SCSI or SAS, for example, as the CPU has to be involved in the process. A RAM disc changes the way this works, as the files never touch the SATA bus.

<cough> The competition did this a while back, removing the oldest files from the transcode folder, and you can easily run 5 or 6 transcodes with a small RAM disc and things work pretty well. Obviously, if you remove older parts of the media being transcoded and the person skips back, it might need to be transcoded again, but that's a fine trade-off on small RAM discs, and the person can always add more memory and expand the size from 2 or 3 GB to 8 or 16 GB.

But it does make a huge difference. Even going from HDD to SSD can make a substantial difference to Live TV and recording!

  • Agree 2
Posted
7 minutes ago, cayars said:

My Synology 920+ wasn't good for DVR and watching at the same time until I dropped two 1 TB NVMe sticks in and set them up for read/write caching, which acts very similarly to a RAM disc since I have writes delayed. But it's still only a fast SSD, and a pure RAM disc would still be a lot better and faster if the size were managed.

A separate HD would have done it. An SSD is fine as well, of course. But especially on devices like a Synology, a RAM disk would make things worse rather than better.

Try it: create a RAM disk, then run two transcodings with subtitle burn-in.

Compare the RAM disk and the SSD as the transcoding-temp location.
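For anyone who wants to try that comparison without a full transcoding setup, here is a rough sketch that imitates the segment-write pattern against two candidate transcoding-temp locations. The mount points are placeholders for your own paths, and note this only measures raw segment-write throughput; it won't show the cache-pollution side effects discussed later in this thread:

```python
import os
import time

def bench_segments(path, segment_mb=3, count=100):
    """Write `count` fake TS segments of `segment_mb` MB each; return MB/s."""
    data = os.urandom(segment_mb * 1024 * 1024)
    start = time.perf_counter()
    for i in range(count):
        name = os.path.join(path, f"segment_{i:04d}.ts")
        with open(name, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())        # force the write to the device
        os.remove(name)
    return segment_mb * count / (time.perf_counter() - start)

# Placeholder paths - point these at your own RAM disk and SSD mounts:
for label, path in [("ramdisk", "/mnt/ramdisk"), ("ssd", "/mnt/ssd-temp")]:
    print(f"{label}: {bench_segments(path):.0f} MB/s")
```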

And then we can talk about your disagreements 😉

Posted (edited)

Experience has shown over and over again that it doesn't help. It's not about the throughput of one disc but about the overhead of writing the small files and reading them back from slow media. Switching the transcoding path from one slow disc to another slow disc, likely on the same SATA bus, doesn't help.

We already know the answer to the question you asked, because I always suggest people pick up a cheap SSD for the transcoding folder, and their issues go away. Just moving the transcode folder to another HDD offers little or no improvement.

Moving it to an SSD helps, but it's usually still on the SATA bus, which is what I'd like to take out of the picture by having a RAM disc option.

Edited by cayars
  • Agree 2
Posted

I don't want to doubt your experience, but segment writing is a really simple and lightweight operation, and if a dedicated normal HDD can't handle it, then we're doing something wrong that we need to find out about.

(I actually have a possible culprit in mind)

Posted
1 hour ago, softworkz said:

(I actually have a possible culprit in mind)

When you say it's specific to TV, then there's another possible culprit (which won't apply anymore in the future).

Posted (edited)

I would say it shows up quicker when using Live TV, but not only there: anytime we're writing out small TS files. If these TS files happen to be on the same disc used for other purposes, like the Windows drive with the swap file and other programs running, it will look similar to the example below when recording.

The test below is recording 3 channels off a Prime tuner on an i7 machine with 32 GB of memory. There was one playback as well.
Windows and Emby are running on an NVMe SSD, with nothing else running or touching the disks being measured.

I made sure there was no fragmentation on any of the discs before running this test.
[screenshot: disk activity during the test]

C is the Windows drive, on NVMe, which is running Emby.
E is a normal HDD (not SSD); it held both the DVR and transcoder paths and was used for nothing else.
D was not used at all.
I turned off all security software to make sure they did not interfere with testing.

Look at the queue length for drive E, which is very high.
The queue length for drive C, running Windows and Emby, is totally fine, under 0.1.

[screenshot: disk queue lengths for drives C and E]

The rule of thumb is that the queue length should always be half the spindle count or lower. This is 1 disc, so it should be under 0.5, which we can see is not even close.

I used DiskPart to check the allocation size:
[screenshot: DiskPart output showing the allocation unit size]

Imagine what the performance and split I/O would be if I had used something like a 64K or larger allocation size, which is what I typically do for media drives.

After doing that, I ran a diagnostic test while still recording and got a warning about high split I/O. I of course didn't format these drives with the intention of such small files constantly being written to them, so I take a serious performance hit because of that. With SSDs it doesn't matter nearly as much, and with a RAM disc you can of course format it for very small file/segment use to make it a perfect fit.
[screenshot: diagnostic warning about high split I/O]
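To put rough numbers on the split-I/O point, a small sketch of how many clusters a single TS segment spans at different allocation unit sizes (the ~3 MB segment size is an assumption, roughly 3 seconds of an 8 Mbps stream):

```python
# How many clusters one HLS/TS segment touches at different NTFS
# allocation unit sizes. Assumed segment: ~3 s of an 8 Mbps stream.
segment_bytes = 8_000_000 // 8 * 3            # ~3 MB per segment

for cluster_bytes in (4 * 1024, 64 * 1024):
    clusters = -(-segment_bytes // cluster_bytes)   # ceiling division
    print(f"{cluster_bytes // 1024:>2} KB clusters: "
          f"{clusters:>4} clusters per segment")
# -> 733 clusters at 4 KB vs 46 at 64 KB. Every non-contiguous run of
#    clusters becomes a separate physical I/O, which is what the
#    diagnostic above reports as split I/O.
```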

There is also what I'd call a flaw in how Emby records. All recordings are written twice simultaneously, which makes no sense: one file is written to the DVR folder and one to the transcode folder. I can't think of a reason to do that. Even if the one in the transcode folder is there to let the person start playback from the beginning, the file in the DVR folder could be used for that instead, since both are written as the recording goes. The person is watching the guide entry, not the channel per se, so if playback stops at the same time the recording stops, that is fine. So there is wasted IO because of the "double write" as well.

But the main killer for performance is the size of the TS files. Being able to make them larger would cut down a lot on disc IO issues. I don't know if there is any wiggle room here relative to what the clients need to receive, but I believe we're on the smaller side right now compared to many other streaming platforms (I would need to check that).
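For reference, segment duration is a knob plain ffmpeg exposes for its HLS muxer. A sketch of what longer segments look like on the command line (this is a generic invocation, not Emby's actual command; the input and output names are placeholders):

```python
import subprocess

# Generic ffmpeg HLS invocation with a longer segment duration.
# -hls_time sets the target segment length in seconds (ffmpeg's own
# default is 2); raising it means fewer, larger TS files on disk.
subprocess.run([
    "ffmpeg", "-i", "input.mkv",      # placeholder input file
    "-c:v", "libx264", "-c:a", "aac",
    "-f", "hls",
    "-hls_time", "6",                 # 6-second segments instead of 2
    "-hls_list_size", "0",            # keep every segment in the playlist
    "out.m3u8",                       # placeholder playlist name
], check=True)
```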

This is the reason I immediately added the NVMe drives to my Synology for read/write caching, so the discs would never see these types of I/O splits taking place; the TS files never actually get written to the HDDs.

Edited by cayars
Posted

 

12 hours ago, cayars said:

Imagine what the performance and split I/O would be if I had used something like a 64K or larger allocation size

Better performance and lower split I/O, but more wasted disk space.

12 hours ago, cayars said:

Look at the queue length for drive E, which is very high.

Take the ffmpeg command and run it manually with Emby Server stopped (or even all three commands simultaneously).

I'm sure the queue length will be fine. It's not about the segments; it's about certain things that Emby is doing.

The only thing I can say publicly is that these "things" will go away soon 😉 

Posted (edited)

I realise I created this thread, and there has been a lot of activity over the last few days. The original request was:

- When you have 10 people transcoding movies specifically, transcoding-temp can very quickly run up 100/200 GB, potentially filling disks. If we could keep a 10-15 minute buffer either side of the playback position for fast-forward/rewind, that would be ideal.

I've never had issues with the performance of an SSD with 10+ people; it's more the space limitation that is the issue.
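A minimal sketch of what that request amounts to, assuming segments are named with increasing indices and have a fixed duration; the naming scheme and constants are illustrative assumptions, not how Emby actually manages its temp folder:

```python
import glob
import os

SEGMENT_SECONDS = 6     # assumed fixed segment duration
WINDOW_MINUTES = 15     # keep this much either side of the play head

def trim_transcode_dir(temp_dir, current_index):
    """Delete segments outside a +/- WINDOW_MINUTES window around playback."""
    window = WINDOW_MINUTES * 60 // SEGMENT_SECONDS   # segments per window
    for path in glob.glob(os.path.join(temp_dir, "segment_*.ts")):
        index = int(os.path.basename(path)[8:-3])     # segment_0042.ts -> 42
        if not (current_index - window <= index <= current_index + window):
            os.remove(path)             # outside the rewind/ff buffer
```

With 6-second segments, a ±15 minute window caps each session at around 300 segments, regardless of how long the movie is.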

Edited by jaketame
Posted
18 minutes ago, jaketame said:

When you have 10 people transcoding movies specifically, transcoding-temp can very quickly run up 100/200 GB

Yes, that's a good size for 10 transcode sessions.

I know it appears weird that it takes so long to deliver a seemingly simple feature. Unfortunately, it sits at the end of a longer causality chain (before this we need X, before X we need Y, before Y we need...). But we will have a solution for this in the future - it has never been forgotten.

But first of all we need to identify those cases where segments don't get deleted on playback stop.

Posted
2 minutes ago, softworkz said:

Yes, that's a good size for 10 transcode sessions.

I know it appears weird that it takes so long to deliver a seemingly simple feature. Unfortunately, it sits at the end of a longer causality chain (before this we need X, before X we need Y, before Y we need...). But we will have a solution for this in the future - it has never been forgotten.

But first of all we need to identify those cases where segments don't get deleted on playback stop.

Completely understand! I float between all the media server platforms, but for mass transcoding, Plex unfortunately does just work for this use case. I look forward to the day Emby has this 🙂, along with all the other behind-the-scenes things that have been going on over the last few months.

  • 1 month later...
Posted

Urgh, I ended up with a 2.5 TB transcode file on the 4.7.0.3 beta. Is there any chance this will be fixed anytime soon? I'd classify this as more of a bugfix than a feature request. Disabling scrubbing is preferable to breaking things.

Happy2Play
Posted
1 minute ago, malecoda said:

4.7.0.3 beta.

In that version, no, as it is an obsolete beta.

Not sure what is in the current beta, but release 4.6.7.0 has:

  • Improve cleanup of transcoding processes

But in the end, Emby will maintain an entire transcode until the session is ended, so this improvement will only apply to ended sessions.

Posted

Updated to the 4.7.0.18 beta; the same issue listed as a feature request persists. Stream transcoding file sizes still appear to be uncapped, or at least not user-editable, and can grow to fill all available space. I assume this request is still waiting to be implemented.

Posted
7 hours ago, Happy2Play said:

Improve cleanup of transcoding processes

Sometimes the files in transcoding-temp were not deleted after playback had stopped.
This is what has been fixed or improved.

Posted
On 11/2/2021 at 10:19 AM, cayars said:

I would say it shows up quicker when using Live TV, but not only there: anytime we're writing out small TS files. If these TS files happen to be on the same disc used for other purposes, like the Windows drive with the swap file and other programs running, it will look similar to the example below when recording.

The test below is recording 3 channels off a Prime tuner on an i7 machine with 32 GB of memory. There was one playback as well.
Windows and Emby are running on an NVMe SSD, with nothing else running or touching the disks being measured.

I made sure there was no fragmentation on any of the discs before running this test.
[screenshot: disk activity during the test]

C is the Windows drive, on NVMe, which is running Emby.
E is a normal HDD (not SSD); it held both the DVR and transcoder paths and was used for nothing else.
D was not used at all.
I turned off all security software to make sure they did not interfere with testing.

Look at the queue length for drive E, which is very high.
The queue length for drive C, running Windows and Emby, is totally fine, under 0.1.

[screenshot: disk queue lengths for drives C and E]

 

@cayars Is your HDD an SMR disk by any chance?

I have read that this type of disk is not recommended for NAS usage and that they are very slow at copying small files.

Could that be the source of the lags you see?

Posted

No way. A buddy of mine bought 16 external drives, shucked the cases, and put them in a case, needing my help with ZFS. We tried a few different layouts, and even with only 4 drives per vdev in a plain stripe and the 4 vdevs striped (no parity) for maximum performance, the speed when empty was no better than a typical desktop HDD. As the drives filled up, the speed just went downhill from there, into the 30 MB/s range, which is horrible.

This is just me personally saying this, but I would use those drives for only one purpose, and even then only if I got them at a significant discount relative to CMR: a one-time backup of existing media. I'd start writing data to the drive, not stop until it's full, and then never touch the drive again unless I had to recover.

So they have their place for "cold storage" or "online archive" use, in cloud nomenclature, but you would not want to use them for your daily IO or daily processing. I'm sure companies like Backblaze and Amazon Glacier make fair use of them, however, as they get more storage per dollar.

  • 2 years later...
Posted

Have there been any updates on this feature request, or on an option to disable the Live TV DVR/time-shift, or to limit it to only 1 hour of record time or X GB, to keep the disk from filling up?

By the way, I would also like to take a minute to praise the dev team. I last used Emby back in 2020, and Live TV was very buggy, with tuners not releasing if clients weren't closed properly and such, but you guys seem to have ironed out most of those bugs. I'm very impressed with the amount of work put in over the years, so I bought a lifetime Premiere this morning and will be migrating all my friends and family over to Emby from Plex in the coming weeks. Great job, guys!

  • Thanks 1
Posted
On 12/28/2023 at 11:32 AM, vash5978 said:

Have there been any updates on this feature request, or on an option to disable the Live TV DVR/time-shift, or to limit it to only 1 hour of record time or X GB, to keep the disk from filling up?

By the way, I would also like to take a minute to praise the dev team. I last used Emby back in 2020, and Live TV was very buggy, with tuners not releasing if clients weren't closed properly and such, but you guys seem to have ironed out most of those bugs. I'm very impressed with the amount of work put in over the years, so I bought a lifetime Premiere this morning and will be migrating all my friends and family over to Emby from Plex in the coming weeks. Great job, guys!

Hi, this is planned for future updates. Thanks.

  • 6 months later...
Posted
On 11/2/2021 at 12:13 AM, softworkz said:

It's a complex subject, and I wanted to respond to some other things first and see whether I'd still be in the mood to explain 😉
In fact, the recent couple of posts cover so many things that I'm not quite sure where to start. Let's go for the ramdisk part first:

First of all: sure! We have all learned that we can improve the performance of certain processing tasks by using SSDs instead of conventional HDs, and ultimately even RAM disks.

What are those "certain tasks" that can be accelerated by such measures?

=> IO-bound tasks, i.e. tasks where the limiting factor is IO throughput

Is this actually a problem that we're having with transcoding, one that we'd need to solve?

=> No, not at all. Even with the fastest hw acceleration, or even assuming one that is infinitely fast, the source data needs to come from somewhere. It doesn't come from RAM; it comes from some disk. The output bandwidth when transcoding is usually much smaller than the input, so what comes from one disk storage (network or local) can easily be saved to another disk storage.

Are there any reasons at all why we would want to accelerate disk IO?

Now, let's assume an odd case, like 5 transcodings of videos from 5 different source disks, with the results all landing on the single disk holding the transcoding-temp folder.
Even in that case, we would need extremely powerful hardware acceleration to saturate the transcoding-temp disk's IO capacity, and then each of those 5 transcodings would be running at a multiple (e.g. 5x or even 50x) of the required transcoding speed (1x).
By default, we run transcodings "as fast as possible". That's not really a good thing, because it will always push one of the following resources to its limit: CPU processing, system memory IO, disk IO, hwa IO or hwa processing - even though that isn't required in any way to improve the user experience. It's often a waste of resources, e.g. when a user stops watching after 5 minutes and we have already transcoded the whole video.

To deal with those things and use resources more wisely, we have introduced throttling - just the opposite of "as fast as possible" - for all those cases where transcoding runs much faster than needed.

Looking at the other end: none of the cases where transcoding is too slow can be accelerated by using a RAM disk.

Actually, it has just the opposite effect: it can slow down your transcodings instead of accelerating them.

And not just that: a ramdisk can even negatively affect Emby Server operation in general.

The problem with the throttling we have is that it is binary - like a refrigerator, there's just ON or OFF switching over time. While it's on, it still runs at maximum speed. Transcoding itself requires a lot of memory IO; all of this IO bandwidth is taken away and not available for regular Emby operation.

And now you want to add a ramdisk on top of this, creating even more memory IO for writing and reading? Very bad idea!
The more shoulders we can distribute these things across, the better - and the right "shoulder" for transcoding temp is a disk. Even though the CPU does the memory transfer in either case, it's still totally different: RAM is just not optimized for such access patterns. When you use RAM in this way, you permanently flush the memory caches (specifically L3) with other data, and this can in turn very badly affect sw filtering operations where the same memory needs to be accessed repeatedly.

Edit: And as mentioned, also Emby Server and all other server operations. 
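To picture the refrigerator behavior described above: one common way to implement this kind of binary throttling is to suspend and resume the encoder process based on how far ahead of the play head it has run. A sketch of the general pattern (POSIX signals, with a hypothetical `seconds_ahead` callback; this is not Emby's actual mechanism):

```python
import os
import signal
import time

AHEAD_HIGH = 120    # pause once the transcoder is 120 s ahead of playback
AHEAD_LOW = 60      # resume once the lead has shrunk to 60 s

def throttle_loop(ffmpeg_pid, seconds_ahead):
    """Binary throttling: the encoder is either at full speed or stopped."""
    paused = False
    while True:
        lead = seconds_ahead()          # transcoded position minus play head
        if not paused and lead > AHEAD_HIGH:
            os.kill(ffmpeg_pid, signal.SIGSTOP)   # fridge switches OFF
            paused = True
        elif paused and lead < AHEAD_LOW:
            os.kill(ffmpeg_pid, signal.SIGCONT)   # fridge switches ON
            paused = False
        time.sleep(1)
```

Whenever the process is running, it runs flat out; that is exactly the point being made here: there is no speed between 0x and maximum.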

In Summary

With a RAM disk, you would:

  • Either not accelerate anything or at best, accelerate something that is already running much faster than needed
  • Use memory IO bandwidth and CPU cycles (for memcpy) - which are badly needed for other operations
  • Negatively impact overall server operation

It’s an old message but worth mentioning again in 2024 :)

  • Disagree 1
  • 1 month later...
Posted

I wanted to bring this up again to see what the current temperature on transcoding to RAM is. I grabbed a high-endurance SSD months ago to use as a dedicated Emby transcode scratch drive and was surprised to find that I've already blown through about 30% of its expected life. At this rate I will be replacing it in less than 2 years. The majority of writes to the drive are due to remuxing around container incompatibilities or audio downmixing, not video transcoding.
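The wear math is easy to sanity-check. A sketch with assumed inputs (substitute your drive's rated TBW from its datasheet and your own observed daily writes, e.g. from SMART data):

```python
# Rough endurance estimate for a dedicated transcode scratch SSD.
# Both inputs are assumptions - replace them with your drive's TBW
# rating and your actual daily write volume.
TBW_RATING_TB = 600       # rated terabytes-written (datasheet value)
DAILY_WRITES_GB = 900     # e.g. many hours of remuxing at ~25 MB/s

years = TBW_RATING_TB * 1000 / DAILY_WRITES_GB / 365
print(f"Expected media life: {years:.1f} years")   # ~1.8 years here
```

At these assumed numbers, 900 GB/day against a 600 TBW rating works out to just under two years, consistent with the burn rate described above.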

Posted

Transcoding to RAM is absolutely not recommended. 

I've given exhaustive explanations on that subject; maybe somebody can dig out a link to that conversation?

  • Disagree 1
  • Agree 1
Posted
2 hours ago, softworkz said:

Transcoding to RAM is absolutely not recommended. 

I've given exhaustive explanations on that subject; maybe somebody can dig out a link to that conversation?

Not recommended, but... it works. :)
As I don't want my transcoding going to the disks my server normally uses for cache, or to the array, I don't really have any other choice than to transcode to memory.
It works perfectly fine for now; I've got plenty of memory, and until I find a scratch disk, the only ones I have lying around are crappy QLC SATA SSDs that aren't fit for anything that involves much writing.

  • Like 1
Posted

As mentioned, this was explained in detail already:

On 3/14/2023 at 11:20 AM, GrimReaper said:

As for transcoding in RAM, there's not much point besides saving some wear and tear on your SSD, as transcoding is usually CPU-limited, not I/O-limited, and it can even have detrimental effects, as stipulated here:

On 11/1/2021 at 10:20 PM, softworkz said:

I'm afraid I have to disagree on that whole subject.

With regard to using a RAM disk for transcoding-temp, I can only say: don't ever do this! It's a very, very bad idea!

You don't even need an SSD for transcoding-temp, but (important!) it should be a local disk - no NAS, no SAN, or anything like that.
For a perfect setup, you'd have an average magnetic disk, dedicated only to that single purpose.

In a while, I'll come back and explain why.

 

On 11/2/2021 at 12:13 AM, softworkz said:

[the same explanation, quoted in full earlier in this thread]

Full read starts here:

 

  • Disagree 1
  • Agree 1
