DoS Attack / Hacker? Claude AI Fix.


JuJuJurassic
Posted

So yesterday evening my Emby server just died; everything I did gave me a blue spinning circle.

I spent hours on it, then thought I'd try AI and used Claude Code in an SSH session. It found the problem: effectively a DoS from a single IP address, but no one was in. I think it may be a bad client, but in the end I just blocked it at the firewall. While finding it, Claude also suggested a few things, which I got it to put in a text file, including hints for the developers. Watch out Luke, Claude is after you 🙂

So I've put the text file below for you to look at. Hopefully it'll help if you're ever in my situation. It also includes a few suggested modifications for the developers that would have stopped the problem.

It's actually quite a bit faster now too!

-----

================================================================================
EMBY SERVER PERFORMANCE ISSUE: TV SHOWS LIBRARY HANGING / 100% CPU
Emby Version: 4.9.3.0
OS: Ubuntu 20.04, Kernel 5.4.0-227-generic
SQLite: 3.49.2
.NET: 8.0.22
================================================================================

SYMPTOMS
--------
- Emby web UI shows a spinning loading wheel; some pages never load
- CPU pegged at 100% continuously on the server (one core fully saturated)
- Emby API requests from some clients time out after 30 seconds
- The TV Shows library in particular refuses to load / shows a loading spinner
- Other libraries (Films, Music) may continue to work normally
- Issue persists after restarting Emby; CPU immediately climbs back to 100%
- `top` shows the EmbyServer process at or near 100% CPU with one .NET TP Worker
  thread rotating between LWPs at full utilisation

HOW TO IDENTIFY THE SPECIFIC QUERY CAUSING IT
----------------------------------------------
1. Check the Emby log (/var/lib/emby/logs/embyserver.txt) for lines like:

   Info SqliteItemRepository: Interrupt query due to cancellation.
   Info ItemsService-...: http/2 Response completed after client disconnected
   to <CLIENT_IP>. Time: 30050ms. GET https://<HOST>/emby/Users/.../Items?
   ...StartIndex=150&...SortBy=SortName&IncludeItemTypes=Series&ParentId=4...

   If you see this repeating every ~30 seconds from the same client IP, that
   client is stuck in a retry loop and is hammering one specific query.

2. Identify the hot thread on Linux:
   ps -L -p $(pgrep EmbyServer) -o lwp,pcpu,comm --sort=-pcpu | head -5
   You will see a ".NET TP Worker" thread at 90-100% CPU.
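
To make the "same client every ~30 seconds" check in step 1 less of an eyeball job, here is a minimal Python sketch that counts the disconnect-after-timeout lines per client IP. The regular expression is an assumption based only on the log excerpt above (and on each entry being a single line in the real log); the function name and the 29000 ms threshold are made up for illustration, so adjust them if your log looks different.

```python
import re
from collections import Counter

# Pattern assumed from the log excerpt in step 1; not an official Emby log format.
DISCONNECT_RE = re.compile(
    r"Response completed after client disconnected to "
    r"(?P<ip>[0-9A-Fa-f.:]+?)\. Time: (?P<ms>\d+)ms"
)


def count_timed_out_requests(lines, min_ms=29000):
    """Count log lines where a client disconnected after at least min_ms."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match and int(match.group("ms")) >= min_ms:
            counts[match.group("ip")] += 1
    return counts


def count_in_file(path):
    """Convenience wrapper: read a log file and count per client IP."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_timed_out_requests(handle)
```

Calling count_in_file("/var/lib/emby/logs/embyserver.txt").most_common(5) lists the busiest offenders; one IP with a count that keeps climbing is the likely retry loop.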

ROOT CAUSE
----------
The Emby server generates the following SQL for browsing TV Shows sorted by
name when a client requests page 2 (StartIndex >= 150):

  WITH WithAncestors AS (
      SELECT itemid FROM AncestorIds2 WHERE AncestorId=3
  )
  SELECT count(*) OVER() AS TotalRecordCount,
         A.Id, A.Name, A.SortName, ... , UserDatas.IsFavorite
  FROM MediaItems A
  LEFT JOIN (
      SELECT AncestorIds2.ItemId FROM AncestorIds2
      JOIN ItemLinks2 ON ItemLinks2.Type=4
        AND ItemLinks2.LinkedId=<tag_id>
        AND ItemLinks2.ItemId=AncestorIds2.AncestorId
  ) itemlinksexcludeinheritedtagids
    ON itemlinksexcludeinheritedtagids.ItemId=A.Id
  LEFT JOIN UserDatas
    ON A.UserDataKeyId=UserDatas.UserDataKeyId AND UserDatas.UserId=<uid>
  WHERE A.Type=6
    AND itemlinksexcludeinheritedtagids.itemid IS NULL
    AND A.Id IN WithAncestors
  GROUP BY A.PresentationUniqueKey
  ORDER BY A.SortName COLLATE NATURALSORT ASC
  LIMIT 500 OFFSET 150;

The problem is the combination of:

  1. GROUP BY on one column (PresentationUniqueKey)
  2. ORDER BY on a DIFFERENT column (SortName) using the custom NATURALSORT
     collation

  Because GROUP BY and ORDER BY use different columns, SQLite cannot satisfy
  the ORDER BY using any index. It must sort ALL matching rows (~600 in a
  typical TV library) in a temporary B-tree using the NATURALSORT comparator.

  3. NATURALSORT is a custom collation implemented as a .NET managed delegate.
     Every comparison SQLite makes during the sort crosses the native-to-managed
     boundary (a P/Invoke callback). Sorting ~600 TV series with a comparison
     sort needs roughly n x log2(n) = 600 x 9.2 = ~5,500 comparisons. Each
     managed callback takes an estimated ~5ms under a loaded .NET runtime
     (due to string marshalling, GC pressure, etc.).
     5,500 x 5ms = ~27.5 seconds, which leaves almost no headroom under the
     client's 30-second timeout, so any extra load pushes the request past it.

  4. The count(*) OVER() window function means SQLite cannot short-circuit even
     when enough rows have been returned — it must see all rows before it can
     compute the total count. This applies equally to page 1 and page 2.
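
The callback cost in points 2 and 3 can be observed on a toy database. The sketch below is an illustration only: it uses Python's built-in sqlite3 module, a three-column stand-in for MediaItems, and a hypothetical natural-sort comparator registered under the name NATURALSORT (Emby's real implementation is not shown anywhere in this report). It counts how many times SQLite calls back into the host language for one GROUP BY + ORDER BY query.

```python
import math
import re
import sqlite3


def natural_key(text):
    """Split text into number and non-number chunks for natural ordering."""
    parts = [p for p in re.split(r"(\d+)", text) if p]
    return [(0, int(p)) if p.isdigit() else (1, p.casefold()) for p in parts]


def estimate_comparisons(n):
    """Rough n * log2(n) estimate for a comparison sort of n rows."""
    return n * math.log2(n)


def count_collation_calls(n_series=600):
    """Run a query shaped like the one above and count collation callbacks."""
    calls = 0

    def naturalsort(a, b):
        # Hypothetical stand-in for Emby's NATURALSORT collation.
        nonlocal calls
        calls += 1
        ka, kb = natural_key(a), natural_key(b)
        return (ka > kb) - (ka < kb)

    conn = sqlite3.connect(":memory:")
    conn.create_collation("NATURALSORT", naturalsort)
    conn.execute(
        "CREATE TABLE MediaItems ("
        "Id INTEGER PRIMARY KEY, SortName TEXT, PresentationUniqueKey TEXT)"
    )
    conn.executemany(
        "INSERT INTO MediaItems (SortName, PresentationUniqueKey) VALUES (?, ?)",
        [(f"Show {i}", f"key-{i}") for i in range(1, n_series + 1)],
    )
    rows = conn.execute(
        "SELECT SortName FROM MediaItems "
        "GROUP BY PresentationUniqueKey "
        "ORDER BY SortName COLLATE NATURALSORT ASC "
        "LIMIT 500 OFFSET 150"
    ).fetchall()
    conn.close()
    return calls, [row[0] for row in rows]
```

Comparing the returned call count with estimate_comparisons(600) gives a feel for how many boundary crossings one page request costs. The absolute timing in Python will not match a loaded .NET runtime, so treat the count, not the speed, as the takeaway.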

WHY IT APPEARS INTERMITTENTLY
------------------------------
The query has always been slow. It becomes a visible problem when a client
app gets into a retry loop:

  - Client requests TV Shows page 2 (StartIndex=150)
  - Emby starts the SQLite query
  - After 30 seconds the client disconnects and retries immediately
  - Emby calls sqlite3_interrupt() to cancel the query, logs the timeout
  - The cycle repeats indefinitely, pinning one CPU core continuously
  - Other Emby operations become slow or unresponsive due to CPU starvation

The trigger is usually: a client app (e.g. Infuse, Emby for Apple TV) being
left open on the TV Shows grid page. When the app reconnects (e.g. after the
TV wakes from sleep) it re-fetches the library list, hits page 2, and loops.

Closing the client app or blocking the retrying client IP stops the CPU load
immediately. Restarting Emby alone does NOT fix it because the client
reconnects and resumes retrying within seconds.

HOW TO CONFIRM IT IS THIS ISSUE
---------------------------------
Run this on the server while the problem is active:

  # 1. Show the retrying requests in the log (should repeat every ~30s):
  tail -f /var/lib/emby/logs/embyserver.txt | grep "StartIndex=150.*Series"

  # 2. Identify the hot thread:
  watch -n2 "ps -L -p \$(pgrep EmbyServer) -o lwp,pcpu,comm --sort=-pcpu | head -6"

  # 3. Confirm that SQLite itself is fast. This plain count does not use
  #    NATURALSORT (the browse query takes ~0.1s without NATURALSORT):
  sudo -u emby sqlite3 -cmd ".timer on" /var/lib/emby/data/library.db \
    "SELECT COUNT(*) FROM MediaItems WHERE Type=6;"
  # Should return in a few milliseconds. If SQLite itself is slow, check disk/IO instead.

  # 4. Check whether the client has stopped retrying after blocking/closing:
  tail -20 /var/lib/emby/logs/embyserver.txt | grep "Interrupt query"
  # Should show no new entries once the client is gone.

WHAT DOES NOT FIX IT
---------------------
- Restarting Emby (client reconnects and retries)
- Adding standard SQLite indexes on (Type, SortName) — SQLite cannot use a
  BINARY-collated index for an ORDER BY that specifies NATURALSORT
- Adding a NATURALSORT-collated index on (Type, SortName) — the GROUP BY on a
  different column forces a temporary sort table, bypassing the index
- Removing or adding plugins (the issue is in the core SQL query builder)
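
The two index bullets above can be checked with EXPLAIN QUERY PLAN. This is a minimal sketch on an empty in-memory table, assuming a simplified MediaItems schema and a stand-in collation registered as NATURALSORT (a plain case-insensitive compare, not Emby's real one). It compares the plan for a plain ORDER BY against the plan for the same ORDER BY combined with a GROUP BY on a different column.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the given SQL."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def build_demo_db():
    """Create an empty stand-in schema with a NATURALSORT-collated index."""
    conn = sqlite3.connect(":memory:")
    # Stand-in collation for illustration only, not Emby's implementation.
    conn.create_collation(
        "NATURALSORT",
        lambda a, b: (a.lower() > b.lower()) - (a.lower() < b.lower()),
    )
    conn.execute(
        "CREATE TABLE MediaItems (Id INTEGER PRIMARY KEY, Type INTEGER, "
        "SortName TEXT, PresentationUniqueKey TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_type_naturalsort "
        "ON MediaItems(Type, SortName COLLATE NATURALSORT)"
    )
    return conn


PLAIN = (
    "SELECT Id FROM MediaItems WHERE Type=6 "
    "ORDER BY SortName COLLATE NATURALSORT"
)
GROUPED = (
    "SELECT Id FROM MediaItems WHERE Type=6 "
    "GROUP BY PresentationUniqueKey "
    "ORDER BY SortName COLLATE NATURALSORT"
)

conn = build_demo_db()
for label, sql in (("plain", PLAIN), ("grouped", GROUPED)):
    print(label, query_plan(conn, sql))
```

In the grouped plan, look for a "USE TEMP B-TREE FOR ORDER BY" step: that is the sort that has to call the collation for every comparison, index or no index. The plain query should show only an index search.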

IMMEDIATE WORKAROUND
---------------------
Block the retrying client's IP at your firewall or router. Blocking it with
iptables on the Emby server itself did not work here; the client could still
reach Emby. The IP in our case was the Apple TV / Infuse client. Once
blocked, CPU drops immediately to normal.

Alternatively: close/force-quit the client app that is stuck retrying.

SUGGESTED FIX (for Emby developers)
-------------------------------------
1. The SQL query builder should avoid generating GROUP BY + ORDER BY on
   different columns simultaneously. If deduplication is needed, consider
   using a subquery or ROW_NUMBER() window function so that ORDER BY can
   operate on an indexed column using the correct collation.

2. Consider implementing NATURALSORT as a native C function registered via
   sqlite3_create_collation_v2() rather than as a managed .NET delegate. This
   would eliminate the P/Invoke callback overhead on every sort comparison,
   potentially reducing query time from ~30 seconds to ~100ms for a library
   of 600 series.

3. Consider adding a query result cache for library browse queries. The TV
   Shows list changes infrequently; even a 60-second cache would serve all
   retry attempts from a cached result and completely eliminate the CPU spike.

4. Cancelling the query via sqlite3_interrupt() when the HTTP client
   disconnects after its 30-second timeout causes no data loss, but the retry
   loop that follows can make the server unusable. An exponential backoff on
   the client side, or a server-side per-client rate limit on identical
   requests, would prevent the retry storm.
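
As a rough illustration of suggestion 3, here is a minimal time-to-live cache sketch in Python. Nothing here is Emby code: the class name, the key layout and the injectable clock are all hypothetical, and the sketch is neither thread-safe nor does it merge requests that arrive while the first one is still running.

```python
import time


class TTLCache:
    """Tiny time-to-live cache; illustration only, not thread-safe."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._entries = {}   # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return a fresh cached value, or call compute() and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value


def browse_key(user_id, parent_id, sort_by, start_index, limit):
    """Hypothetical cache key: the request parameters that define one page."""
    return (user_id, parent_id, sort_by, start_index, limit)
```

One caveat: a cache like this only helps once a query has been allowed to run to completion at least once. In the retry loop described above every attempt is interrupted at 30 seconds, so the server would also need to let one query finish (or fix the sort cost) before there is anything to cache.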

SERVER CHANGES MADE DURING DIAGNOSIS
--------------------------------------
The following were changed on this server and should be noted if you are
reproducing the issue or reverting these changes:

  - /etc/systemd/system/emby-server.service.d/override.conf
    Added: After=network-online.target remote-fs.target, Wants=network-online.target
    Reason: Emby was starting before SMB/CIFS shares were mounted.

  - /etc/fstab
    Added nofail,x-systemd.device-timeout=30 to one share with an
    unreachable host; fixed a missing trailing zero on another line.

  - /var/lib/emby/data/library.db
    VACUUM was run (database shrank from 2.0GB to 1.4GB, 600MB freed).
    Three indexes added (these do no harm but also do not fix the root issue):
      CREATE INDEX idx_MediaItems_type_sortname ON MediaItems(Type, SortName);
      CREATE INDEX idx_MediaItems_type_topparent_sortname
        ON MediaItems(Type, TopParentId, SortName);
      CREATE INDEX idx_MediaItems_type_naturalsort_sortname
        ON MediaItems(Type, SortName COLLATE NATURALSORT);
    ANALYZE run on MediaItems.

  - Scheduled task trigger files (StartupTrigger removed from 3 tasks):
      /var/lib/emby/config/ScheduledTasks/6330ee8f-*.js  (Scan media library)
      /var/lib/emby/config/ScheduledTasks/66ff02a8-*.js  (Scan Metadata Folder)
      /var/lib/emby/config/ScheduledTasks/9492d30c-*.js  (Refresh Guide/EPG)
    Reason: All three heavy tasks were firing simultaneously on every restart,
    causing additional CPU load on top of the query issue.

================================================================================

emby_performance_issue_forum.txt
