
Emby Unresponsive


ShadowKindjal
Solved by Happy2Play


rbjtech
15 hours ago, ShadowKindjal said:

I completely agree that the latencies are ridiculously large and I appreciate you pointing that out. My current networking hardware utilizes a Ubiquiti USG-3P. The WAN port connects directly to my ISP-supplied ONT, the LAN port connects to a Unifi switch that handles all my LAN traffic, and the server connects directly to the LAN2 port on the USG.

So that network equipment should not have any IP queue issues.  I see you are just using http - is this via emby directly (port forward)? You are not using any reverse proxy etc.?

A quick look at the log has revealed that you are low on disk space for the transcoding folder - is this on the SSD ?

You had what looks to be 26 x transcoding sessions in parallel - if all of these were writing to the SSD then there may be a good chance of it struggling.   

I'm unsure what OS you are using, but something like Performance Monitor in Windows will show the disk queues in real time (the blue line on my disk #8 below). If this goes above 2, it signifies the disk is the bottleneck.  In my case, I transcode to the O: drive (an NVMe SSD) and see no queue at all (it stays at zero), but this is just a single transcode, not 26 transcodes ;) Disk H: is a SATA disk.

I think you'll just need to monitor things as you get loaded and watch for areas of concern - do some screen captures of this and it will certainly show where the hot spots are.  You can look at all areas with this tool - disk, network, cpu etc - all will be monitored in real time.

image.thumb.png.ce9fa5187d9c31c84f2d71a0040c25fc.png

You are also running a PersonNfoprovider refresh at the same time?  I would schedule this for out of hours.

This is the log for when it is running low on disk space.

image.thumb.png.6e79bf91a6af9c7408ce578e1a4b9311.png


ShadowKindjal
3 hours ago, rbjtech said:

So that network equipment should not have any IP queue issues.  I see you are just using http - is this via emby directly (port forward)? You are not using any reverse proxy etc.?

My emby server is behind an nginx reverse proxy

3 hours ago, rbjtech said:

A quick look at the log has revealed that you are low on disk space for the transcoding folder - is this on the SSD ?

Yes, this is on my boot SSD. I have to monitor my disk space because the transcoding-temp folder doesn't always delete old files. I currently have files in this folder dated back to November. It's been creeping up and I just need to clear it out again but it's annoying. I believe I've reported this issue in the past.
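
For anyone hitting the same leftover-file problem, here is a minimal sketch of a cleanup pass. It assumes that anything older than the cutoff no longer belongs to an active session. The path `/config/transcoding-temp` and the two-day cutoff are hypothetical examples, not values taken from this thread or from Emby's documentation. `dry_run` defaults to true, so nothing is deleted until the list has been checked.

```python
import time
from pathlib import Path


def find_stale_files(folder, max_age_days=2.0, now=None):
    """Return files under `folder` whose modification time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def prune(folder, max_age_days=2.0, dry_run=True):
    """List stale files and, unless dry_run is set, delete them.

    Returns (stale_paths, total_bytes) so the caller can report what
    was (or would be) reclaimed.
    """
    stale = find_stale_files(folder, max_age_days)
    freed = 0
    for path in stale:
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return stale, freed


# Example (hypothetical path; adjust to your own volume mapping):
# stale, freed = prune("/config/transcoding-temp", max_age_days=2, dry_run=True)
# print(f"{len(stale)} stale files, {freed / 1e9:.2f} GB reclaimable")
```

Running it with `dry_run=True` first, while nobody is streaming, is the safer way to confirm that only old segments are selected.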

image.png.9d23574b7a7ed8605ebea8b863b7eda1.png

3 hours ago, rbjtech said:

You had what looks to be 26 x transcoding sessions in parallel - if all of these were writing to the SSD then there may be a good chance of it struggling.   

I'm unsure what OS you are using, but something like Performance Monitor in Windows will show the disk queues in real time (the blue line on my disk #8 below). If this goes above 2, it signifies the disk is the bottleneck.  In my case, I transcode to the O: drive (an NVMe SSD) and see no queue at all (it stays at zero), but this is just a single transcode, not 26 transcodes ;) Disk H: is a SATA disk.

I think you'll just need to monitor things as you get loaded and watch for areas of concern - do some screen captures of this and it will certainly show where the hot spots are.  You can look at all areas with this tool - disk, network, cpu etc - all will be monitored in real time.

image.thumb.png.ce9fa5187d9c31c84f2d71a0040c25fc.png

Thank you for this insight. I ordered a gen 4 NVME SSD (WD SN850X) that I will be installing before the week is up. Hopefully that will resolve my bottleneck issues. In regards to my OS, my setup is Ubuntu 18.04 but my whole stack is configured with docker-compose. I'll have to research a similar utility to measure queue times in Linux.
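
On Linux, `iostat -x` from the sysstat package reports an average queue size column (`aqu-sz`, or `avgqu-sz` in older versions), which is the closest equivalent to the Performance Monitor graph. The same figure can be derived by hand from `/proc/diskstats`, as in the rough sketch below. The device name `nvme0n1` in the usage comment is only an example, and the five-second interval is an arbitrary choice.

```python
import time


def read_weighted_io_ms(device, stats_path="/proc/diskstats"):
    """Return the 'weighted time spent doing I/Os' counter (ms) for a device."""
    with open(stats_path) as fh:
        for line in fh:
            fields = line.split()
            # Layout: major, minor, device name, then the per-device counters.
            # The weighted I/O time is the 11th counter, i.e. index 13.
            if len(fields) >= 14 and fields[2] == device:
                return int(fields[13])
    raise ValueError(f"device {device!r} not found in {stats_path}")


def average_queue_length(weighted_ms_start, weighted_ms_end, elapsed_seconds):
    """Average number of requests queued or in flight over the interval."""
    return (weighted_ms_end - weighted_ms_start) / (elapsed_seconds * 1000.0)


def sample_queue_length(device, interval=5.0, stats_path="/proc/diskstats"):
    """Take two readings `interval` seconds apart and return the average queue length."""
    start_ms = read_weighted_io_ms(device, stats_path)
    t0 = time.monotonic()
    time.sleep(interval)
    end_ms = read_weighted_io_ms(device, stats_path)
    return average_queue_length(start_ms, end_ms, time.monotonic() - t0)


# Example (device name is an assumption; check `lsblk` for yours):
# print(sample_queue_length("nvme0n1", interval=5.0))
```

Sampling this while the server is under load, rather than when it is idle, gives a figure comparable to the threshold of 2 mentioned above.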

 

3 hours ago, rbjtech said:

You are also running a PersonNfoprovider refresh at the same time ?  I would schedule this for out of hours.

I think you're referring to the metadata scan event? If not, please elaborate. I currently scan my metadata folder nightly at 1:00am.

 


rbjtech
6 minutes ago, ShadowKindjal said:

My emby server is behind an nginx reverse proxy

OK - worth checking this config to optimise the latency, as in the back of my mind I think this has a lot to do with it...

8 minutes ago, ShadowKindjal said:

Thank you for this insight. I ordered a gen 4 NVME SSD (WD SN850X) that I will be installing before the week is up. Hopefully that will resolve my bottleneck issues. In regards to my OS, my setup is Ubuntu 18.04 but my whole stack is configured with docker-compose. I'll have to research a similar utility to measure queue times in Linux.

The NVMe on a different PCIe bus will undoubtedly help - but unless you are surpassing the SATA3 bandwidth (for the current SSD) with your transcode streams plus all the other workload on that SSD, I'm not convinced it will fix the problem.

Fingers crossed.
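
The SATA3 comparison above can be put into rough numbers. SATA III has a 6 Gbit/s line rate, which works out to about 600 MB/s of usable bandwidth after 8b/10b encoding. The sketch below compares that with the combined write rate of the transcodes. The 20 Mbit/s per-stream bitrate is an assumption for illustration, not a figure from the logs, and the calculation covers only sequential throughput, not IOPS or the other workloads sharing the SSD.

```python
# Usable SATA III bandwidth: 6 Gbit/s line rate with 8b/10b encoding.
SATA3_USABLE_MB_PER_S = 600.0


def transcode_write_rate_mb_per_s(sessions, mbit_per_stream):
    """Aggregate write rate in MB/s for `sessions` streams at a given bitrate."""
    return sessions * mbit_per_stream / 8.0


def sata3_utilisation(sessions, mbit_per_stream):
    """Fraction of usable SATA III bandwidth consumed by the transcodes alone."""
    return transcode_write_rate_mb_per_s(sessions, mbit_per_stream) / SATA3_USABLE_MB_PER_S


# 26 sessions at an assumed 20 Mbit/s each:
# transcode_write_rate_mb_per_s(26, 20)  -> 65.0 MB/s
# sata3_utilisation(26, 20)              -> roughly 0.11 of the link
```

Under that assumption the transcodes use only about a tenth of the SATA link, which is consistent with the doubt expressed above that raw bandwidth is the limit.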

 


ShadowKindjal
15 minutes ago, rbjtech said:

OK - worth checking this config to optimise the latency, as in the back of my mind I think this has a lot to do with it...

A lazy man's way to diagnose this would be to access the server over the local IP if it acts up again, right?
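
One way to make that comparison less subjective is to time the same request against both addresses and compare the medians. The sketch below is only an illustration. Both URLs in the usage comment are hypothetical, and a request for the landing page will not exercise the same code paths as browsing a library, so treat the result as a rough indicator.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30.0):
    """Download the full response body so the timing includes the transfer."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def median_seconds(url, runs=5, fetch_fn=fetch):
    """Return the median wall-clock time of `runs` requests to `url`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch_fn(url)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Example (both addresses are placeholders for your own setup):
# direct = median_seconds("http://192.168.1.10:8096/")
# proxied = median_seconds("https://emby.example.com/")
# print(f"direct {direct:.3f}s, via proxy {proxied:.3f}s")
```

If both medians are slow, the proxy is unlikely to be the cause. A large gap between them would point back at the nginx configuration.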

15 minutes ago, rbjtech said:

The NVMe on a different PCIe bus will undoubtedly help - but unless you are surpassing the SATA3 bandwidth (for the current SSD) with your transcode streams plus all the other workload on that SSD, I'm not convinced it will fix the problem.

Fingers crossed.

 

If this fixes it I will forever be grateful. Thanks again!


ShadowKindjal

I haven't swapped out the SSD yet, but I just wanted to mention that everything is taking about five seconds or longer to load and that the average queue length on my SSD is only 0.26 (it is the only NVMe drive in the screenshot). Is there anything else that could be causing this issue? I've attached the latest server log.

 

Screenshot_20230112-211806.thumb.png.9e158eac2ab5409a8f1ea1d6708f3196.png

embyserver.txt


ShadowKindjal

Accessing the server over local IP doesn't make a difference so it's not the reverse proxy. Here's a screen cap of the load times. And this was shorter than most. Typically all of these interactions are instant when the server isn't running slow.

 

Edited by ShadowKindjal

ShadowKindjal
2 minutes ago, Happy2Play said:

@ShadowKindjal What is your library.db size and your Database cache size (App settings-Database)?

library.db is 995 megabytes and my database cache size is 96 megabytes 

Screenshot_20230112-221934.thumb.png.afeb48e991a86595daa82a25ff101253.png
Screenshot_20230112-221946.thumb.png.6b788ea109fe0fad47832650e9fcc889.png


  • Solution
Happy2Play

@ShadowKindjal Best practice if you have the RAM is to set the DB cache size to 1.5 to 2 times the database size.
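
As a worked example of that rule of thumb, the sketch below computes the suggested range from a database size. The 1.5x and 2x factors come from the advice above. The function name is made up for illustration and is not part of Emby.

```python
def recommended_cache_mb(db_size_mb, low=1.5, high=2.0):
    """Return the (minimum, maximum) cache size in MB suggested by the 1.5x-2x rule."""
    return db_size_mb * low, db_size_mb * high


# For the 995 MB library.db reported earlier in the thread:
# recommended_cache_mb(995)  -> (1492.5, 1990.0)
```

By that rule, the 96 MB cache reported for a 995 MB library.db is more than ten times too small, and roughly 1.5 to 2 GB would be the target.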

Edited by Happy2Play

ShadowKindjal
5 minutes ago, Happy2Play said:

@ShadowKindjal What is your library.db size and your Database cache size (App settings-Database)?

I'm noticing in other forum posts you've mentioned that the cache size should be 1.5 to 2 times the size of the database. Does that mean I need to increase mine to two gigs?


ShadowKindjal
Just now, Happy2Play said:

@ShadowKindjal Best practice if you have the RAM is to set the DB cache size to 1.5 to 2 times the database size.

I have plenty of available RAM. I'll up it to 4GB if that won't cause any issues.


Happy2Play
1 minute ago, ShadowKindjal said:

I'm noticing in other forum posts you've mentioned that the cache size should be 1.5 to 2 times the size of the database. Does that mean I need to increase mine to two gigs?

If you have the RAM, yes, that should help with performance. Also, when was the last time you vacuumed the database?

1 minute ago, ShadowKindjal said:

I have plenty of available RAM. I'll up it to 4GB if that won't cause any issues.

2 GB in theory should be plenty, but I don't know of any drawbacks to setting it too high.


ShadowKindjal
7 minutes ago, Happy2Play said:

If you have the RAM, yes, that should help with performance. Also, when was the last time you vacuumed the database?

It's been at least a year.... Possibly longer..... 😅

It's a pretty big server with 140TB of data. 5000 movies and 1000 tv series. I'll vacuum my server in the morning.


Mnejing

I know you're running down the network rabbit hole, but quick question, are your drives SMR or CMR?

The reason I ask is because SMR drives are absolutely horrible in NAS situations, and are even worse at random reads (I'm assuming you're running a RAID array). They're generally fine with continuous reads, but if you've got multiple clients reading from the drive at the same time, those are all going to be random reads, and SMR drives can literally crawl to a halt. What's worse, the system still responds just fine.

I had this problem, and legitimately thought the HDD was dying. SMART wasn't showing any errors, and deep scans of the drive showed it was absolutely fine. Turns out it was an SMR drive. I just had to adjust how I worked with the drive, and it's fine.

Edited by Mnejing

rbjtech
5 hours ago, ShadowKindjal said:

It's been at least a year.... Possibly longer..... 😅

It's a pretty big server with 140TB of data. 5000 movies and 1000 tv series. I'll vacuum my server in the morning.

@ShadowKindjal You marked this as the solution - was it the database sizing?

I helped fix somebody else's slow system for pictures the other day by suggesting this - increasing theirs to 2 GB transformed the system.

@Happy2Play Is there a case for this being dynamically increased as the size of the db grows?  Do you know why this analysis is not done on every emby boot?  It would seem sensible to change this if RAM is available, no?


ShadowKindjal
3 hours ago, rbjtech said:

@ShadowKindjal You marked this as the solution - was it the database sizing?

I helped fix somebody else's slow system for pictures the other day by suggesting this - increasing theirs to 2 GB transformed the system.

That was a mis-click. I do believe the database sizing is the solution but I'd like to see if the problem reoccurs at any point over the next few days. I will say that my server feels snappier than it has ever been after increasing the cache to 4 gigabytes.


rbjtech
32 minutes ago, ShadowKindjal said:

That was a mis-click. I do believe the database sizing is the solution but I'd like to see if the problem reoccurs at any point over the next few days. I will say that my server feels snappier than it has ever been after increasing the cache to 4 gigabytes.

It would be interesting to compare logs. If the latency has gone (which wouldn't surprise me, as there is no way those latency metrics were valid - the connection would simply time out with those values), then the log must have been including the SQL write delay in the end-to-end calculation... 🤔


ShadowKindjal
11 hours ago, Happy2Play said:

If you have the RAM, yes, that should help with performance. Also, when was the last time you vacuumed the database?

Vacuuming the server dropped the size from 995 megabytes to 866.
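
For context on where that saving comes from: Emby's library.db is a SQLite file, and SQLite's `VACUUM` rebuilds the file so that pages freed by deleted rows are returned to the filesystem. The sketch below shows the effect on a generic SQLite file using Python's standard `sqlite3` module. It is meant as an illustration only. On a real server, use Emby's own vacuum option, and only ever try something like this on a copy of the database with the server stopped.

```python
import os
import sqlite3


def vacuum_sqlite(path):
    """Run VACUUM on a SQLite file and return (bytes_before, bytes_after)."""
    before = os.path.getsize(path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)


# Example (hypothetical path to a COPY of the database):
# before, after = vacuum_sqlite("/tmp/library-copy.db")
# print(f"{before / 1e6:.0f} MB -> {after / 1e6:.0f} MB")
```

How much space is reclaimed depends on how many rows have been deleted since the last vacuum, so a modest drop like the one above is not unusual.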


ShadowKindjal
1 hour ago, rbjtech said:

It would be interesting to compare logs. If the latency has gone (which wouldn't surprise me, as there is no way those latency metrics were valid - the connection would simply time out with those values), then the log must have been including the SQL write delay in the end-to-end calculation... 🤔

I'll get the logs tonight when more people are online


ShadowKindjal
10 hours ago, Mnejing said:

I know you're running down the network rabbit hole, but quick question, are your drives SMR or CMR?

All my drives are CMR. I get my drives from shucking WD Easystores or WD Elements. I have the following drives in my system.

WDC WD100EMAZ-00
WDC WD140EDGZ-11
WDC WD101EMAZ-11
WDC WD100EDAZ-11


Mnejing

There's a non-zero chance this drive is SMR: WDC WD100EDAZ-11. The rest are trivial to verify as CMR, and that's great, but it's really, really hard to find anything about that last one in particular. If you're using the model numbers to confirm, it might be the wrong path, because *AZ models all seemed to be CMR until they weren't. https://community.wd.com/t/wd-blue-4tb-smr-or-cmr/271114. Obviously this isn't even kind of the same model. The real problem is there just doesn't seem to be much information about that particular SKU.

I wish there was an easier way for you to isolate that drive and test it, but I'm not personally convinced that drive isn't part of the issue.


ShadowKindjal
9 minutes ago, Mnejing said:

There's a non-zero chance this drive is SMR: WDC WD100EDAZ-11. The rest are trivial to verify as CMR, and that's great, but it's really, really hard to find anything about that last one in particular. If you're using the model numbers to confirm, it might be the wrong path, because *AZ models all seemed to be CMR until they weren't. https://community.wd.com/t/wd-blue-4tb-smr-or-cmr/271114. Obviously this isn't even kind of the same model. The real problem is there just doesn't seem to be much information about that particular SKU.

I wish there was an easier way for you to isolate that drive and test it, but I'm not personally convinced that drive isn't part of the issue.

I ruled out the hard drives because load times on metadata and images stored on the SSD were affected by the slowdown. Typically, once you started streaming, you didn't have any issues until seeking. 


Happy2Play
12 hours ago, rbjtech said:

@Happy2Play Is there a case for this being dynamically increased as the size of the db grows?  Do you know why this analysis is not done on every emby boot?  It would seem sensible to change this if RAM is available, no?

Devs would have to comment on whether dynamically increasing this could be done.  But I think there should be a write-up in the KB or Tut/Guide section for db cache size and the SqlLiteMMIO setting.


ShadowKindjal
13 hours ago, rbjtech said:

It would be interesting to compare logs. If the latency has gone (which wouldn't surprise me, as there is no way those latency metrics were valid - the connection would simply time out with those values), then the log must have been including the SQL write delay in the end-to-end calculation... 🤔

I have 13 users actively streaming and it feels like the server is starting to slow down a bit again. It's not as bad as before but still noticeable. Disk queue times are normal, and this happens both when browsing the site locally and when browsing it via the reverse proxy.

embyserver (1).txt

