JuJuJurassic
Posted 1 hour ago

So yesterday evening my Emby server just died; everything I did gave me a blue spinning circle. I spent hours on it, then thought I'd try AI, using Claude Code in an SSH session. It found the problem, effectively a DoS from one IP address, but no one was logged in. I think it may be a bad client, but in the end I just blocked it at the firewall. While finding it, Claude also suggested a few things, which I got it to put in a text file, including hints for the developers. Watch out Luke, Claude is after you.

So I've put the text file below for you to look at; hopefully it'll help if you're ever in my situation, and it includes a few modifications for the developers that would have stopped the problem. It's actually quite a bit faster now too!

-----

================================================================================
EMBY SERVER PERFORMANCE ISSUE: TV SHOWS LIBRARY HANGING / 100% CPU

Emby Version: 4.9.3.0
OS:           Ubuntu 20.04, Kernel 5.4.0-227-generic
SQLite:       3.49.2
.NET:         8.0.22
================================================================================

SYMPTOMS
--------
- Emby web UI shows a spinning loading wheel; some pages never load
- CPU pegged at 100% continuously on the server (one core fully saturated)
- Emby API requests from some clients time out after 30 seconds
- The TV Shows library in particular refuses to load / shows a loading spinner
- Other libraries (Films, Music) may continue to work normally
- Issue persists after restarting Emby; CPU immediately climbs back to 100%
- `top` shows the EmbyServer process at or near 100% CPU, with one .NET TP
  Worker thread rotating between LWPs at full utilisation

HOW TO IDENTIFY THE SPECIFIC QUERY CAUSING IT
---------------------------------------------
1. Check the Emby log (/var/lib/emby/logs/embyserver.txt) for lines like:

       Info SqliteItemRepository: Interrupt query due to cancellation.
       Info ItemsService-...: http/2 Response completed after client
           disconnected to <CLIENT_IP>. Time: 30050ms.
           GET https://<HOST>/emby/Users/.../Items? ...StartIndex=150&...
           SortBy=SortName&IncludeItemTypes=Series&ParentId=4...

   If you see this repeating every ~30 seconds from the same client IP, that
   client is stuck in a retry loop and is hammering one specific query.

2. Identify the hot thread on Linux:

       ps -L -p $(pgrep EmbyServer) -o lwp,pcpu,comm --sort=-pcpu | head -5

   You will see a ".NET TP Worker" thread at 90-100% CPU.

ROOT CAUSE
----------
The Emby server generates the following SQL for browsing TV Shows sorted by
name when a client requests page 2 (StartIndex >= 150):

    WITH WithAncestors AS (
        SELECT itemid FROM AncestorIds2 WHERE AncestorId=3
    )
    SELECT count(*) OVER() AS TotalRecordCount,
           A.Id, A.Name, A.SortName, ..., UserDatas.IsFavorite
    FROM MediaItems A
    LEFT JOIN (
        SELECT AncestorIds2.ItemId
        FROM AncestorIds2
        JOIN ItemLinks2 ON ItemLinks2.Type=4
                       AND ItemLinks2.LinkedId=<tag_id>
                       AND ItemLinks2.ItemId=AncestorIds2.AncestorId
    ) itemlinksexcludeinheritedtagids
        ON itemlinksexcludeinheritedtagids.ItemId=A.Id
    LEFT JOIN UserDatas ON A.UserDataKeyId=UserDatas.UserDataKeyId
                       AND UserDatas.UserId=<uid>
    WHERE A.Type=6
      AND itemlinksexcludeinheritedtagids.itemid IS NULL
      AND A.Id IN WithAncestors
    GROUP BY A.PresentationUniqueKey
    ORDER BY A.SortName COLLATE NATURALSORT ASC
    LIMIT 500 OFFSET 150;
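You can see the temporary sort directly with EXPLAIN QUERY PLAN. A minimal
sketch, trimmed from the generated query above (the WITH/JOIN clauses and the
collation are dropped here, since NATURALSORT is only registered inside the
Emby process; the GROUP BY / ORDER BY mismatch alone is enough to force the
temporary sort):

    sudo -u emby sqlite3 /var/lib/emby/data/library.db "
        EXPLAIN QUERY PLAN
        SELECT count(*) OVER() AS TotalRecordCount, Id, Name, SortName
        FROM MediaItems
        WHERE Type=6
        GROUP BY PresentationUniqueKey
        ORDER BY SortName
        LIMIT 500 OFFSET 150;"
    # Expect a line like "USE TEMP B-TREE FOR ORDER BY" in the plan output.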
The problem is the combination of:

1. GROUP BY on one column (PresentationUniqueKey).

2. ORDER BY on a DIFFERENT column (SortName) using the custom NATURALSORT
   collation. Because GROUP BY and ORDER BY use different columns, SQLite
   cannot satisfy the ORDER BY using any index: it must sort ALL matching
   rows (~600 in a typical TV library) in a temporary B-tree using the
   NATURALSORT comparator.

3. NATURALSORT is a custom collation implemented as a .NET managed delegate.
   Every comparison SQLite makes during the sort crosses the native-to-managed
   boundary (a P/Invoke callback). With ~600 TV series, a quicksort requires
   approximately 5,500 comparisons. Each managed callback takes ~5ms under a
   loaded .NET runtime (due to string marshalling, GC pressure, etc.).
   5,500 x 5ms = ~27 seconds, which exceeds the client's 30-second timeout.

4. The count(*) OVER() window function means SQLite cannot short-circuit even
   when enough rows have been returned; it must see all rows before it can
   compute the total count. This applies equally to page 1 and page 2.

WHY IT APPEARS INTERMITTENTLY
-----------------------------
The query has always been slow. It becomes a visible problem when a client
app gets into a retry loop:

- Client requests TV Shows page 2 (StartIndex=150)
- Emby starts the SQLite query
- After 30 seconds the client disconnects and retries immediately
- Emby calls sqlite3_interrupt() to cancel the query and logs the timeout
- The cycle repeats indefinitely, pinning one CPU core continuously
- Other Emby operations become slow or unresponsive due to CPU starvation

The trigger is usually a client app (e.g. Infuse, or Emby for Apple TV) left
open on the TV Shows grid page. When the app reconnects (e.g. after the TV
wakes from sleep) it re-fetches the library list, hits page 2, and loops.
Closing the client app or blocking the retrying client's IP stops the CPU
load immediately. Restarting Emby alone does NOT fix it, because the client
reconnects and resumes retrying within seconds.

HOW TO CONFIRM IT IS THIS ISSUE
-------------------------------
Run these on the server while the problem is active:

    # 1. Show the retrying requests in the log (should repeat every ~30s):
    tail -f /var/lib/emby/logs/embyserver.txt | grep "StartIndex=150.*Series"

    # 2. Identify the hot thread:
    watch -n2 "ps -L -p \$(pgrep EmbyServer) -o lwp,pcpu,comm --sort=-pcpu | head -6"

    # 3. Confirm SQLite itself is fast in isolation (the full query is only
    #    slow because of the NATURALSORT sort):
    sudo -u emby sqlite3 /var/lib/emby/data/library.db \
        ".timer on" "SELECT COUNT(*) FROM MediaItems WHERE Type=6;"
    # Should return in <1ms. If SQLite itself is slow, check disk/IO instead.

    # 4. Check whether the client has stopped retrying after blocking/closing:
    tail -20 /var/lib/emby/logs/embyserver.txt | grep "Interrupt query"
    # Should show no new entries once the client is gone.

WHAT DOES NOT FIX IT
--------------------
- Restarting Emby (the client reconnects and retries)
- Adding standard SQLite indexes on (Type, SortName) — SQLite cannot use a
  BINARY-collated index for an ORDER BY that specifies NATURALSORT
- Adding a NATURALSORT-collated index on (Type, SortName) — the GROUP BY on a
  different column forces a temporary sort table, bypassing the index
- Removing or adding plugins (the issue is in the core SQL query builder)

IMMEDIATE WORKAROUND
--------------------
Block the retrying client at your firewall or router (not with iptables on
the Emby server — the client will still reach Emby). The IP in our case was
the Apple TV / Infuse client. Once blocked, CPU drops immediately to normal.
Alternatively: close or force-quit the client app that is stuck retrying.

SUGGESTED FIX (for Emby developers)
-----------------------------------
1. The SQL query builder should avoid generating GROUP BY + ORDER BY on
   different columns simultaneously. If deduplication is needed, consider a
   subquery or a ROW_NUMBER() window function so that the ORDER BY can
   operate on an indexed column with the correct collation (see the sketch
   after this list).

2. Consider implementing NATURALSORT as a native C function registered via
   sqlite3_create_collation_v2() rather than as a managed .NET delegate. This
   would eliminate the P/Invoke callback overhead on every sort comparison,
   potentially reducing query time from ~30 seconds to ~100ms for a library
   of 600 series.

3. Consider adding a query result cache for library browse queries. The TV
   Shows list changes infrequently; even a 60-second cache would serve all
   retry attempts from a cached result and completely eliminate the CPU spike.

4. The 30-second query timeout (cancelled via sqlite3_interrupt() when the
   HTTP client disconnects) causes no data loss, but the retry loop it
   creates can make the server unusable. Exponential backoff on the client
   side, or a server-side per-client rate limit on identical requests, would
   prevent the retry storm.
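For point 1, a minimal sketch of the rewrite, using the same trimmed form of
the query as the EXPLAIN QUERY PLAN example above (whether the planner can
then satisfy the outer ORDER BY from the NATURALSORT-collated index should be
verified with EXPLAIN QUERY PLAN rather than assumed):

    SELECT count(*) OVER() AS TotalRecordCount, Id, Name, SortName
    FROM (
        SELECT A.Id, A.Name, A.SortName,
               ROW_NUMBER() OVER (
                   PARTITION BY A.PresentationUniqueKey
                   ORDER BY A.Id  -- arbitrary but deterministic pick per group
               ) AS rn
        FROM MediaItems A
        WHERE A.Type=6
    )
    WHERE rn=1
    ORDER BY SortName COLLATE NATURALSORT
    LIMIT 500 OFFSET 150;

This separates deduplication from pagination, so even in the worst case the
expensive NATURALSORT comparisons happen in one sort pass, over rows that
have already been deduplicated.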
SERVER CHANGES MADE DURING DIAGNOSIS
------------------------------------
The following were changed on this server and should be noted by anyone
reproducing the issue or reverting the changes:

- /etc/systemd/system/emby-server.service.d/override.conf
  Added: After=network-online.target remote-fs.target
         Wants=network-online.target
  Reason: Emby was starting before SMB/CIFS shares were mounted.

- /etc/fstab
  Added nofail,x-systemd.device-timeout=30 to one share with an unreachable
  host; fixed a missing trailing zero on another line.

- /var/lib/emby/data/library.db
  VACUUM was run (database shrank from 2.0GB to 1.4GB, 600MB freed) and
  ANALYZE was run on MediaItems; see the sketch after this list for the
  exact commands. Three indexes were added (they do no harm but also do not
  fix the root issue):

      CREATE INDEX idx_MediaItems_type_sortname
          ON MediaItems(Type, SortName);
      CREATE INDEX idx_MediaItems_type_topparent_sortname
          ON MediaItems(Type, TopParentId, SortName);
      CREATE INDEX idx_MediaItems_type_naturalsort_sortname
          ON MediaItems(Type, SortName COLLATE NATURALSORT);

- Scheduled task trigger files (StartupTrigger removed from 3 tasks):

      /var/lib/emby/config/ScheduledTasks/6330ee8f-*.js  (Scan media library)
      /var/lib/emby/config/ScheduledTasks/66ff02a8-*.js  (Scan Metadata Folder)
      /var/lib/emby/config/ScheduledTasks/9492d30c-*.js  (Refresh Guide/EPG)

  Reason: All three heavy tasks were firing simultaneously on every restart,
  causing additional CPU load on top of the query issue.
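The VACUUM/ANALYZE steps above used the stock sqlite3 CLI; a sketch for
anyone repeating or reverting them (stop Emby first so nothing is writing to
the database while VACUUM runs):

    sudo systemctl stop emby-server

    # Reclaim free pages and refresh the planner statistics
    sudo -u emby sqlite3 /var/lib/emby/data/library.db "VACUUM;"
    sudo -u emby sqlite3 /var/lib/emby/data/library.db "ANALYZE MediaItems;"

    # List the added indexes (DROP INDEX <name>; reverts them)
    sudo -u emby sqlite3 /var/lib/emby/data/library.db \
        "SELECT name FROM sqlite_master
         WHERE type='index' AND tbl_name='MediaItems' AND name LIKE 'idx_%';"

    sudo systemctl start emby-server

================================================================================
emby_performance_issue_forum.txt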