The UI is soooo slow... until it isn't?


randywest55
Long post, meat and logs at the end.
 
My setup:
* Emby v3.2.26.0 (lifetime premier)
* QNAP TS-251+, Intel Celeron Quad-Core 2.0GHz, 2GB DDR3L RAM, 2x WD Red 4TB in RAID 1
* Library:
** 879 movies + 33 shorts
** 5143 TV episodes
** 21238 songs
* Plugins:
** Backup & Restore
** Trailers
 
Related (maybe):
 
A little background, first: I came over from Plex about six months ago, because I wanted Quick Sync support. Basically, Plex did everything it was supposed to: the UI was responsive to multiple clients (usually no more than 2-3, which was all I had hoped for on my hardware), and stream capability was likewise acceptable for a small number of simultaneous users. However, whenever Plex had to transcode something, it would start chugging badly, and it didn't seem to always make the most intelligent decisions about when to transcode, either (by the way, among its many wonderful features, I love Emby's stats for nerds in the playback screen, which answers the "why" question [and more!]).
 
I eventually got fed up and decided to cross over. Initially, Emby was a delight: The UI is miles better than Plex's, and the dashboard and metadata browser allow for actual administration, something that Plex barely exposes. Emby's file identification is pretty significantly worse than Plex's, but after I bit the bullet and put in the time to manually correct everything (I'll never get around to this for all my music, but oh well), the better overall experience in Emby was worth it.
 
However, and this seemed to creep up and then get way worse once I endeavored to add my music collection, it soon became clear that I'd traded in for the inverse of the Plex experience: Once Emby is playing something, it's super fast, even when transcoding, up to a limit that makes sense given my hardware, but the UI crawls almost all the time, even when nothing else is going on. It's worth emphasizing "almost": sometimes, everything is really snappy. The bottleneck is in the database (more on that in a minute), so this is probably caching behavior, but anecdotally, new queries, e.g. a person page I've never looked at before, are sometimes very quick, so at the very least, query-level cache misses aren't the problem (I know very little about Sqlite, so apologies if that's a naive statement).
 
Before I get into more details, the reason I spent time above outlining my Plex v. Emby experiences is this: Plex, to my knowledge, also uses Sqlite, and with an equivalent library on an identical and identically loaded machine, it never had any of the UI performance issues that I see with Emby. So please, consider this point before passing this off as a problem of underpowered hardware that can't be helped. The competition works, so clearly, Emby can do better.
 
Now, onto the details. First off, machine load observed when the UI is hanging. My server runs no other significant applications aside from Emby:
 
* CPU - very low, single digits to practically idle.
* Memory - no uptick from baseline of about 35% usage. Rarely gets near 50% even when transcoding. Note application usage is ~350MB for Emby, about the same for system processes, and negligible otherwise. I'll also note for completeness that my swap usage seems to get pretty high after significant uptime (unknown why), but the issues described herein exhibit even when the machine has been freshly restarted and swap is at 0.
* Disk - uptick in throughput and IOPS; latency remains negligible. Throughput stays low (< 2MB/s in a small sample of cases), but IOPS peaks around 200, which, according to e.g. http://www.storagereview.com/wd_red_4tb_hdd_review_wd40efrx, is around peak performance for WD Red 4TB drives.
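
As a rough sanity check on those disk numbers, the average size of each disk operation can be worked out from the two figures above. This is only a back-of-the-envelope sketch: it assumes the "< 2MB/s" ceiling and the ~200 IOPS peak were observed in the same window, which the measurements above don't strictly establish.

```python
# Rough average size of each disk operation, using the figures quoted above.
# Assumption: the throughput ceiling and the IOPS peak were observed together.
throughput_bytes_per_s = 2 * 1000 * 1000   # "< 2MB/s", taken as an upper bound
iops = 200                                  # observed peak


def avg_bytes_per_io(throughput: float, ops_per_s: float) -> float:
    """Average payload per I/O operation, in bytes."""
    return throughput / ops_per_s


avg = avg_bytes_per_io(throughput_bytes_per_s, iops)
print(f"at most ~{avg / 1000:.0f} KB per operation")
```

Lots of very small operations is the pattern you'd expect from random reads rather than sequential ones, and that is the workload where spinning disks are weakest.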
 
So our bottleneck is pretty clear, I'd say (IOPS). Two things stand out from the threads referenced above: 1) SSD installations and cache drives (not happening, please don't suggest), and 2) manually rebuilding DB indices. The latter seems like a reasonable suggestion, actually, but I have an embarrassing admission: I don't have SSH access to my box. I lost it somehow in a QNAP update a while back for unknown reasons, and since the box is headless, I'm not sure there's an alternative to wiping it, which I won't do since I don't have a backup mechanism outside of RAID. So, yea... my hands are a little tied for administration tasks that I can't do through Emby directly. I'll also say that I'd really prefer not to have to reinstall Emby from scratch, either, since manually fixing misidentified metadata is a pretty hefty task for my library. Hopefully that doesn't put too much of a wrench in figuring this out...
 
What I have done on my own is poke around a few days' worth of debug logs to see if the DB was doing anything obviously stupid (by looking at a fresh DB on my Windows machine in DB Browser for SQLite). I did find a few queries doing joins/filtering on unindexed columns (see the attached notes below), and it seems like searching just pounds the DB with a bunch of queries and could probably be optimized. That said, there are plenty of other queries operating on appropriate indices that still take forever some or all of the time, so the only thing I can think of there is slimming the set of selected columns, which probably isn't always an option, if it would even help.
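
For anyone who wants to repeat the "is this query hitting an index?" check, SQLite's `EXPLAIN QUERY PLAN` reports it directly. Below is a minimal sketch using Python's built-in `sqlite3` module. The `items` table, its columns and the index name are hypothetical stand-ins, not Emby's real schema; against a real database you would paste in a query copied from the debug log instead.

```python
import sqlite3
from typing import List


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return the human-readable steps of SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the 'detail' text


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table; NOT Emby's real schema.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO items (parent_id, name) VALUES (?, ?)",
    [(i % 50, f"item {i}") for i in range(1000)],
)

query = "SELECT name FROM items WHERE parent_id = ?"

before = plan(conn, query, (7,))   # no index on parent_id yet: full table scan
conn.execute("CREATE INDEX idx_items_parent ON items (parent_id)")
after = plan(conn, query, (7,))    # same query can now seek via the index

print(before)
print(after)
```

A plan step starting with `SCAN` means every row is read; `SEARCH ... USING INDEX` means the filter is answered from an index. The exact wording of the plan text varies a little between SQLite versions.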
 
With that, I'll leave it to the experts to ask questions and dig through the logs. Thanks in advance, and apologies if I happen to be slow replying.

 

query_analysis.txt - My notes from going through the attached logs. Some comments are "yea, that looks good," so may or may not be useful.

server-63648719999.emby-debug-log.escalating.txt - A day when it seems like things got progressively worse.

server-63648806400.emby-debug-log.txt - The next day. Things are pretty bad throughout.

randywest55

Thanks for the reply. I'm a bit worried about migrating to the QNAP community-independent version for reasons mentioned in my post, but I'll upgrade through the community path ASAP. Note this has been a persistent issue for some time, though. Are there relevant fixes in the latest version?

randywest55

Also, I suppose I could do a parallel install of the .NET Core version to test, if you think that would yield improvement, e.g. if the SQLite interface is somehow optimized there. Let me know your thoughts.

It probably won't be radically different but i think the .net core version will have 1-2 fewer columns in the items table and will be on the latest version of sqlite. i didn't catch what version the mono install is running. it would be near the beginning of a server log when starting up the server. but sqlite has seen some nice performance improvements over the last year.

randywest55

DB rebuild takes a long time. Probably won't be ready to report back on the .NET core version until Friday. I did upgrade the mono version to 3.2.60.0, but there was no improvement.

randywest55

Mono SQLite is at 3.14.0.

 

So overall, the .NET core version experience is much better. It's worth noting, though, that it's unclear how much of this can be attributed to newer SQLite/ditching mono versus the DB being all fresh and newly built. As I mentioned in the original post, the perf issues definitely crept up when I first started using Emby, so it's possible that rebuilding the DB indices (as someone suggested elsewhere for a similar problem) might have yielded some improvement on the mono version.
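
For reference, "rebuilding the DB indices" on a SQLite file boils down to a handful of standard SQLite statements. The sketch below is a generic illustration with Python's built-in `sqlite3` module, not an Emby-provided tool. It assumes you have file access to the database, that the server is stopped, and that you are working on a copy; the path in the usage line is a made-up placeholder.

```python
import sqlite3


def rebuild_indexes(db_path: str) -> str:
    """Rebuild indexes, refresh planner statistics and compact the file.

    Returns the result of SQLite's integrity check ("ok" when healthy).
    """
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM refuses to run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("REINDEX")   # recreate every index from scratch
        conn.execute("ANALYZE")   # regather statistics used by the query planner
        conn.execute("VACUUM")    # rewrite the file, dropping free pages
        return conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()


# Usage (hypothetical path; run against a COPY with the server stopped):
# print(rebuild_indexes("/path/to/copy-of-library.db"))
```

Comparing query times on the copy before and after would at least separate "fresh, defragmented DB" from "newer SQLite" as explanations.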

 

In any event, now that I've gone through the reinstall process, I think it makes sense to focus on the .NET core version. It's definitely snappier, but far from perfect. I'm still seeing things like the home screen sometimes not completely loading (next up, latest, etc. fail to appear even after several minutes), and when this happens, other endpoints become unresponsive for a bit, presumably because there's a queue of queries running in background. Search is also still pretty spotty (it often gets stuck on results from the first letter or two of the query or fails to return anything outright). 

 

Going from here, I'm going to turn debug logging on for the new server and use it for a while to see what crops up in the logs.

randywest55

Huh, so I'm seeing "EnableDebugLevelLogging":false at the top of my logs even though the Enable debug logging box is ticked on the dashboard. Possible bug?

randywest55

New log files were written since I ticked the box. And besides which, there's no debug level output in the log...

 

I restarted the app and got it to work, though. I see debug output now as well. Maybe something to look into, but I guess I'm good now.

randywest55

Here's a log with some slower queries exhibiting again: server-63649576061.txt

 

Honestly, I'm pretty happy with the performance I've seen so far. It's not instant, but it seems very reasonable on my hardware, more or less like what I remember from using Plex. I'm not *really* using the app right now though, just clicking through menus and playing things in multiple tabs from time to time trying to get it to trip up, which it barely has. I guess the main difference between now and the old installation, aside from an overall increase in speed, seems to be that when something does take a long time to load, it isn't followed by a chain of long loads. E.g. maybe the home screen takes a long time to load, and I have to wait for that to clear up. Once it does, though, I can navigate freely around the graph without experiencing another hangup for some time, if at all. Before, if something was slow, it usually meant that any subsequent navigation was also going to be slow, so it was this painful series of click, wait, click, pray, refresh after waiting three minutes for a page to load, etc. to get anywhere. I'm also getting page structure quickly almost all the time now, so even if it might take 20-30 seconds to serve every image, I get text instantly and an indication of progress (images populating) instead of the rainbow spinner over a blank page taunting me to lose my patience and refresh the page.

 

Please do look at the slow queries in the attached to see if any opportunity for improvement jumps out. For my part, I guess I'm tentatively happy with the way things are working now, but I'm always suspicious when long-standing, consistently occurring problems clear up without a targeted fix. I guess the latest SQLite and, to a lesser degree, .NET core kind of explain the improvement, but so could a lot of other things, namely index fragmentation over time.

 

So how about this: I'll use my new installation normally for a few days, maybe a week, and then report back one way or the other (sooner if things go off the cliff again). If everything's still good, I suppose we can close the thread, keeping my fingers crossed that the issue won't creep back over a few months.

Thanks. I have removed the redundant index you mentioned and adding UserDataKey to some of the existing indexes is a good idea, although not easy to do because some of them are fairly complex. But at some point I will look at that. And in general we are gradually eliminating as many excess columns as possible.
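
For readers following along, the benefit of adding a column such as UserDataKey to an existing index is that SQLite can then answer the query from the index alone (a "covering" index) without a second lookup into the table. A minimal sketch with Python's built-in `sqlite3` module follows. Only the `UserDataKey` column name comes from this thread; the rest of the table, the `type` filter and the index names are hypothetical, and the real indexes are more complex.

```python
import sqlite3
from typing import List


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical miniature of an items table; NOT the real schema.
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, type TEXT, UserDataKey TEXT, name TEXT)"
)

query = "SELECT UserDataKey FROM items WHERE type = ?"

# Index on the filter column only: each match still needs a table lookup
# to fetch UserDataKey.
conn.execute("CREATE INDEX idx_type ON items (type)")
narrow = plan(conn, query, ("Movie",))

# Same index widened with UserDataKey: the index alone answers the query.
conn.execute("DROP INDEX idx_type")
conn.execute("CREATE INDEX idx_type_udk ON items (type, UserDataKey)")
wide = plan(conn, query, ("Movie",))

print(narrow)
print(wide)
```

The trade-off is a larger index and slightly slower writes, so widening only pays off for queries that run often.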

randywest55

So search is still pretty problematic due to the large number of queries involved. Could you put a significant delay, at least a second, between keyboard input and sending the query, and maybe send it immediately on pressing enter? I'm sure practically instantaneous results by keypress are nice given sufficient hardware, but I doubt most users would notice a one-second delay. If you're worried, maybe make it configurable somewhere in the advanced options? Or, and this is obviously a lot more work, but I could see this applying to a variety of settings, currently configurable or otherwise, so there could be a hardware profile, similar to what you might see e.g. in a game's graphics settings, that provides some presets (low, medium, high, etc.) and then allows for setting-by-setting custom configuration. Hell, add in auto detection even. Tons of work, and I'm sure your plate is always overflowing, but food for thought. Please, though, do seriously consider the search change, if nothing else, as it would significantly increase the usability of my current setup.
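
The delay described above is usually called debouncing. The sketch below is not Emby's search code; it is a small, deterministic Python model of the policy (wait for a quiet period, but send immediately on enter) that shows how many queries would reach the server for a given typing pattern. `KeyEvent`, `queries_sent` and the one-second default are illustrative names and values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    at: float            # seconds since the search box got focus
    text: str            # full contents of the search box after this event
    enter: bool = False  # True if this event is the enter key


def queries_sent(events: List[KeyEvent], delay: float = 1.0) -> List[str]:
    """Return the queries a debounced search box would send, in order.

    A query goes out when the user presses enter, or when no further input
    arrives within `delay` seconds. The last event is assumed to be followed
    by a pause longer than `delay`.
    """
    sent: List[str] = []
    for i, ev in enumerate(events):
        if ev.enter:
            fire = True
        else:
            is_last = i + 1 == len(events)
            fire = is_last or events[i + 1].at - ev.at >= delay
        # Skip empty input and avoid re-sending an identical query.
        if fire and ev.text and (not sent or sent[-1] != ev.text):
            sent.append(ev.text)
    return sent


# Typing "alien" at five keystrokes per second: one query instead of five.
burst = [KeyEvent(0.2 * i, "alien"[: i + 1]) for i in range(5)]
print(queries_sent(burst))
```

With per-keystroke search the same input would cost five queries, each fanning out into many DB lookups.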

randywest55

server-63649584000.2017-12-23 22:36:18.356.txt

 

Here's a specific case that I've seen a few times. Start at 2017-12-23 22:36:18.356. I open the web app for the first time in many hours, and the server is idle. It hangs on the home screen and appears unlikely to recover, so I click into a library, which also fails to load. Back out to the home screen and it appears to have lost my server entirely, until finally, a little over a minute afterward, queries start to return. After that, performance is acceptable everywhere, but the startup-from-idle cost is huge, for some reason.

 

Unrelated to the rest of the thread, but Configuration Backup is failing on the new installation (see the stack trace near the top of the log). I can report that separately if it makes it easier for you to track.

randywest55

Unfortunately, it looks like slow start from idle is a consistent thing now. There are two cases, which are probably the same case at some level:

 

1) The server has been sitting idle for some time and then is accessed.

2) A video has been playing, but there hasn't been any other activity for a bit. Then, the UI is accessed, either while the video is playing from a different client, or immediately after the video finishes.

 

I'm not sure I mentioned before, but I was definitely noticing these cases in particular on the old installation. The difference remains, for now, that once the initial wait (usually a minute or so) has passed on the new installation, UI navigation picks up to an acceptable speed, whereas any navigation was rarely fast enough to be deemed acceptable on the old. However, since the issue did appear to creep before and is demonstrably creeping now, it wouldn't surprise me if I'm back to everything being slow before long.

 

The important question would seem to be, given that my library has changed only insignificantly since the initial scan completed, what *has* changed? I believe I switched chapter image extraction on for the video libraries in the interim, so perhaps entries for those blew up the DB. I may also have turned image prefetch, or whatever the setting is, on after the fact, and regardless, just doing a lot of navigating around for testing must have forced a lot of image caching, especially for people. And then there's the full library scan scheduled task, which defaulted to running every 12 hours (I dialed it way back yesterday, since all my libraries use realtime monitoring). If that's leaving some unintended junk around, it might explain the creeping degradation.
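
One way to test the "what has changed?" question is to snapshot the database's size, its unused pages and per-table row counts now and again a few days later, then compare. A minimal sketch with Python's built-in `sqlite3` module follows. It assumes file access to a copy of the database; `db_stats` is an illustrative helper and the path in the usage line is a made-up placeholder.

```python
import sqlite3
from typing import Any, Dict


def db_stats(db_path: str) -> Dict[str, Any]:
    """Collect size, free space and per-table row counts for a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        rows = {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
        return {
            "size_bytes": page_size * page_count,
            "free_bytes": page_size * free_pages,  # allocated but unused pages
            "rows": rows,
        }
    finally:
        conn.close()


# Usage (hypothetical path; point it at a COPY of the database):
# print(db_stats("/path/to/copy-of-library.db"))
```

A table whose row count keeps climbing while the library itself is unchanged would be a good lead on what the scheduled scans or image extraction are leaving behind.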

 

Hopefully there's something here to work with. It's disappointing that I seem to be headed for the same bad state only a few days after a fresh install, but the nature of the problem does seem more well-defined, at least. Best for the holidays, hope to hear back once they've passed.

Sorry, haven't read the whole thread but slowness after idle periods sounds a lot like sleeping hardware.  Have you investigated that?
