mojavaid36
Posted December 1, 2025

Hi everyone,

I wanted to share a solution I built with a lot of help from ChatGPT. I’m not a programmer, but I needed a smart, automatic way to clean up old movies and TV episodes on my Emby server. Since I couldn’t find anything that did what I wanted, I ended up putting together a Docker-based pruning script. It’s been working really well for me, so I’m posting it here in case it helps someone else.

What This Does

This script automatically deletes media based on real viewing activity across all users on the server. The retention values below are fully customisable – see the Environment Variables section.

Movies
- Deleted if no user has watched them in 120 days
- Or if never watched, deleted only if older than 120 days

TV Episodes
- If played at least once → delete after 90 days with no recent activity
- If never played → delete only if older than 180 days

The script talks to Emby via the API and calls DELETE /Items/{Id}, so items are removed cleanly from the library and the filesystem. No orphaned metadata.

Why I Built It
- I share my Emby with multiple users
- Storage was growing fast
- I wanted cleanup based on what people actually watch, not just file dates
- I wanted different rules for movies and TV
- And I’m not a programmer, so it needed to be simple and repeatable

This solution is:
- Docker-based (works nicely on Unraid (my setup), Synology, Linux Docker, etc.)
- Configurable via environment variables
- Safe to test using dry-run mode

Setup Overview

I’ll show the steps using a Linux/Unraid style setup with nano, but the same idea works on any box with Docker.

Folder structure

Create a folder for the script and Dockerfile, for example:

mkdir -p /mnt/user/appdata/emby-prune
cd /mnt/user/appdata/emby-prune

Step 1 – Create the Dockerfile (with nano)

In a terminal on your server:

cd /mnt/user/appdata/emby-prune
nano Dockerfile

Paste this into nano:

FROM python:3.12-alpine
WORKDIR /app
RUN pip install --no-cache-dir requests
COPY emby_prune.py /app/emby_prune.py
CMD ["python", "/app/emby_prune.py"]

Save & exit:
- Ctrl + O, Enter to save
- Ctrl + X to exit

Step 2 – Create the Python Script (with nano)

Still in the same folder:

nano emby_prune.py

Paste the entire script below into nano:

import os
import sys
import requests
from datetime import datetime, timedelta, timezone


def parse_iso(dt_str):
    if not dt_str:
        return None
    # Emby uses ISO 8601 with offset, e.g. 2024-02-10T12:34:56.0000000+00:00
    dt_str = dt_str.replace("Z", "+00:00")
    try:
        dt = datetime.fromisoformat(dt_str)
        # Ensure timezone-aware; assume UTC if none
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        return dt
    except Exception:
        return None


def get_env(name, default=None, required=False):
    value = os.getenv(name, default)
    if required and not value:
        print(f"Missing required env var: {name}", file=sys.stderr)
        sys.exit(1)
    return value


def fetch_users(session, emby_url, api_key):
    """Fetch all Emby users."""
    params = {"api_key": api_key}
    r = session.get(f"{emby_url}/Users", params=params, timeout=30)
    r.raise_for_status()
    users = r.json()
    print(f"Found {len(users)} users in Emby.")
    return users


def fetch_user_item_userdata(session, emby_url, api_key, user_id, item_id):
    """Fetch per-user item data (UserData) for given user and item."""
    params = {
        "api_key": api_key,
        "Fields": "UserData"
    }
    r = session.get(f"{emby_url}/Users/{user_id}/Items/{item_id}", params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    return data.get("UserData") or {}


def main():
    emby_url = get_env("EMBY_URL", required=True)  # e.g. http://emby:8096/emby
    api_key = get_env("EMBY_API_KEY", required=True)

    # Defaults: movies 120d, TV played 90d, TV never-played 180d
    movie_days = int(get_env("MOVIE_PRUNE_DAYS", get_env("PRUNE_DAYS", "120")))
    tv_days = int(get_env("TV_PRUNE_DAYS", "90"))
    tv_never_played_days = int(get_env("TV_NEVER_PLAYED_DAYS", "180"))
    dry_run = get_env("DRY_RUN", "true").lower() == "true"

    now_utc = datetime.now(timezone.utc)
    cutoff_movies = now_utc - timedelta(days=movie_days)
    cutoff_tv = now_utc - timedelta(days=tv_days)
    cutoff_tv_never = now_utc - timedelta(days=tv_never_played_days)

    print("Emby prune starting (movies + TV, any-user mode)")
    print(f"  URL                     : {emby_url}")
    print(f"  Movie cutoff            : {cutoff_movies.isoformat()} (older than {movie_days} days)")
    print(f"  TV cutoff (played)      : {cutoff_tv.isoformat()} (older than {tv_days} days)")
    print(f"  TV cutoff (never played): {cutoff_tv_never.isoformat()} (older than {tv_never_played_days} days)")
    print(f"  Dry run                 : {dry_run}")
    print("")

    session = requests.Session()
    base_params = {"api_key": api_key}

    # 1) Get all users
    try:
        users = fetch_users(session, emby_url, api_key)
    except Exception as e:
        print(f"ERROR: Failed to fetch users: {e}", file=sys.stderr)
        sys.exit(1)

    if not users:
        print("No users found – aborting.")
        sys.exit(0)

    page_size = 200
    start_index = 0
    total_deleted = 0
    total_candidates = 0

    while True:
        # 2) Get movies + episodes
        params = {
            **base_params,
            "Recursive": "true",
            "IncludeItemTypes": "Movie,Episode",
            "Fields": "Path,DateCreated,Type,SeriesName,SeasonName,IndexNumber,ParentIndexNumber",
            "StartIndex": start_index,
            "Limit": page_size,
        }
        r = session.get(f"{emby_url}/Items", params=params, timeout=60)
        r.raise_for_status()
        data = r.json()
        items = data.get("Items", [])
        total_records = data.get("TotalRecordCount", 0)

        if not items:
            break

        # Items deleted from this page (used to correct the paging offset below)
        deleted_in_page = 0

        for item in items:
            item_id = item.get("Id")
            name = item.get("Name", "Unknown")
            path = item.get("Path")
            date_created = parse_iso(item.get("DateCreated"))
            item_type = item.get("Type", "")

            # For episodes, we can build a bit more context (optional)
            series_name = item.get("SeriesName")
            season_num = item.get("ParentIndexNumber")
            episode_num = item.get("IndexNumber")

            # 3) Look at all users' UserData for this item
            most_recent_play = None
            total_plays_all_users = 0

            for user in users:
                user_id = user.get("Id")
                user_name = user.get("Name", "Unknown")
                try:
                    ud = fetch_user_item_userdata(session, emby_url, api_key, user_id, item_id)
                except Exception as e:
                    print(f"Warning: failed to fetch UserData for item {item_id} and user {user_name}: {e}")
                    continue

                lp = parse_iso(ud.get("LastPlayedDate"))
                pc = ud.get("PlayCount", 0) or 0
                total_plays_all_users += pc

                if lp:
                    if (most_recent_play is None) or (lp > most_recent_play):
                        most_recent_play = lp

            stale = False
            reason = ""

            if item_type == "Movie":
                # Movie rule: stale if no plays in movie_days
                if most_recent_play:
                    if most_recent_play < cutoff_movies:
                        stale = True
                        reason = f"movie: last played by some user at {most_recent_play.isoformat()}"
                else:
                    if date_created and date_created < cutoff_movies:
                        stale = True
                        reason = f"movie: never played by any user; created {date_created.isoformat()}"

            elif item_type == "Episode":
                # TV rule:
                # - if played: stale if last play older than tv_days
                # - if never played: stale only if older than tv_never_played_days
                if most_recent_play:
                    if most_recent_play < cutoff_tv:
                        stale = True
                        reason = f"episode: last played by some user at {most_recent_play.isoformat()}"
                else:
                    if date_created and date_created < cutoff_tv_never:
                        stale = True
                        reason = f"episode: never played by any user; created {date_created.isoformat()}"

            # Unknown type: skip
            if not stale:
                continue

            total_candidates += 1

            # Nicely formatted label for episodes
            if item_type == "Episode" and series_name:
                if season_num is not None and episode_num is not None:
                    label = f"{series_name} S{season_num:02d}E{episode_num:02d} - {name}"
                else:
                    label = f"{series_name} - {name}"
            else:
                label = name

            print(f"[STALE] {label}")
            print(f"  Type   : {item_type}")
            print(f"  ID     : {item_id}")
            print(f"  Path   : {path}")
            print(f"  Reason : {reason}")
            print(f"  Total plays (all users) : {total_plays_all_users}")

            if dry_run:
                print("  Action : SKIP (dry-run)\n")
                continue

            del_params = {"api_key": api_key}
            del_url = f"{emby_url}/Items/{item_id}"
            try:
                resp = session.delete(del_url, params=del_params, timeout=60)
                if resp.status_code in (200, 204):
                    total_deleted += 1
                    deleted_in_page += 1
                    print("  Action : DELETED\n")
                else:
                    print(f"  Action : FAILED (status {resp.status_code})\n")
            except Exception as e:
                print(f"  Action : FAILED ({e})\n")

        # Deleted items drop out of the listing, so advance only past the items
        # that still exist; otherwise the next page would skip some items.
        start_index += len(items) - deleted_in_page
        if start_index >= total_records:
            break

    print("")
    print("Scan complete.")
    print(f"  Stale candidates : {total_candidates}")
    print(f"  Deleted          : {total_deleted} (dry_run={dry_run})")


if __name__ == "__main__":
    main()

Save & exit:
- Ctrl + O, Enter
- Ctrl + X

Step 3 – Build the Docker Image

From the same folder:

cd /mnt/user/appdata/emby-prune
docker build -t emby-prune .

You should see something like:

Successfully built ...
Successfully tagged emby-prune:latest

Environment Variables (Retention Rules)

These control how aggressive the pruning is:

Variable               Meaning                                          Default
MOVIE_PRUNE_DAYS       Movies: delete after X days of inactivity        120
TV_PRUNE_DAYS          TV: delete after X days if episode was played    90
TV_NEVER_PLAYED_DAYS   TV: delete after X days if never played          180
DRY_RUN                Preview only (true / false)                      true
EMBY_URL               Your Emby server base URL                        none (required), e.g. http://server:8096/emby
EMBY_API_KEY           Your Emby API key (generate an API key)          none (required)

Step 4 – Test with a Dry Run (Safe Mode)

This does not delete anything, just prints what would be removed:

docker run --rm \
  --name emby-prune-test \
  -e EMBY_URL="http://YOURSERVER:8096/emby" \
  -e EMBY_API_KEY="YOUR_API_KEY" \
  -e MOVIE_PRUNE_DAYS="120" \
  -e TV_PRUNE_DAYS="90" \
  -e TV_NEVER_PLAYED_DAYS="180" \
  -e DRY_RUN="true" \
  emby-prune

- Check the output
- Make sure the stuff marked [STALE] looks sensible

Step 5 – Run for Real (Optional)

Once you’re happy with the dry-run output, you can run it for real:

docker run --rm \
  --name emby-prune-run \
  -e EMBY_URL="http://YOURSERVER:8096/emby" \
  -e EMBY_API_KEY="YOUR_API_KEY" \
  -e MOVIE_PRUNE_DAYS="120" \
  -e TV_PRUNE_DAYS="90" \
  -e TV_NEVER_PLAYED_DAYS="180" \
  -e DRY_RUN="false" \
  emby-prune

Set the schedule to, e.g., Weekly, Monday, 03:00.

Final Notes

This has been running nicely on my Emby setup:
- Library stays tidy
- Storage doesn’t explode
- Old/watched stuff eventually ages out
- New and recently watched content is kept
- All rules are tweakable via environment variables

I may update this when I get time to add the following (no promises!):
- whitelists, never-delete tags
- notifications
- logs to file
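
For anyone who wants to check the retention rules without touching a server, the rules listed under What This Does can be sketched as two small pure functions. This is only an illustration, not part of the posted script: the names latest_play, is_stale and days_ago are made up for this example, and the defaults simply mirror the 120/90/180 values from the post.

```python
from datetime import datetime, timedelta, timezone


def latest_play(play_dates):
    """Return the most recent play date across users, or None if nobody played."""
    played = [d for d in play_dates if d is not None]
    return max(played) if played else None


def is_stale(item_type, last_played, date_created, now,
             movie_days=120, tv_days=90, tv_never_played_days=180):
    """Apply the post's retention rules to one item (hypothetical helper)."""
    if item_type == "Movie":
        played_days = never_days = movie_days
    elif item_type == "Episode":
        played_days, never_days = tv_days, tv_never_played_days
    else:
        return False  # unknown item types are never pruned

    if last_played is not None:
        # Played at least once: stale when the latest play is before the cutoff.
        return last_played < now - timedelta(days=played_days)
    # Never played: fall back to the creation date, if Emby reported one.
    return date_created is not None and date_created < now - timedelta(days=never_days)


def days_ago(now, n):
    """Convenience helper for the examples below."""
    return now - timedelta(days=n)


now = datetime(2025, 12, 1, tzinfo=timezone.utc)

# One user watched 100 days ago, another 10 days ago: the newest play wins.
last = latest_play([days_ago(now, 100), None, days_ago(now, 10)])
print(is_stale("Episode", last, days_ago(now, 400), now))

# Never-played episode added 200 days ago, past the 180-day limit.
print(is_stale("Episode", None, days_ago(now, 200), now))
```

Keeping the decision separate from the API calls makes it easy to try different day values against a handful of dates before running the container with DRY_RUN="false".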

ophate
Posted January 22

Exactly what I am looking for, but I don't see a separate variable for never-played movies: there is MOVIE_PRUNE_DAYS, but no MOVIE_NEVER_PLAYED_DAYS.

I also tried adding an INTERVAL variable to allow the container to rerun the script every so often. How are you scheduling this (I am using compose)?

Dockerfile

FROM python:3.12-alpine
WORKDIR /app
RUN pip install --no-cache-dir requests
COPY emby_prune.py /app/emby_prune.py
CMD ["sh", "-c", "\
  HOURS=${INTERVAL:-24}; \
  SLEEP_SECONDS=$((HOURS*3600)); \
  echo \"[emby-prune] Running every ${HOURS} hours\"; \
  while true; do \
    python /app/emby_prune.py; \
    sleep ${SLEEP_SECONDS}; \
  done \
"]
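
On the MOVIE_NEVER_PLAYED_DAYS question: the posted script uses one threshold for both played and never-played movies. A minimal sketch of how a separate setting could be read is shown below, assuming a hypothetical MOVIE_NEVER_PLAYED_DAYS variable that falls back to the movie threshold when unset, so existing setups would behave as before. The function name movie_cutoff_days is made up for this example.

```python
import os


def movie_cutoff_days(env):
    """Return (played_days, never_played_days) for movies from an env mapping."""
    # Same lookup as the posted script: MOVIE_PRUNE_DAYS, then PRUNE_DAYS, then 120.
    played = int(env.get("MOVIE_PRUNE_DAYS", env.get("PRUNE_DAYS", "120")))
    # Hypothetical variable: the posted script does not read MOVIE_NEVER_PLAYED_DAYS.
    never_played = int(env.get("MOVIE_NEVER_PLAYED_DAYS", played))
    return played, never_played


# Works with a plain dict for testing or with os.environ inside the container.
print(movie_cutoff_days({"MOVIE_PRUNE_DAYS": "120", "MOVIE_NEVER_PLAYED_DAYS": "365"}))
print(movie_cutoff_days(os.environ))
```

In the script, the second value would feed a separate cutoff used only in the never-played branch of the movie rule, the same way cutoff_tv_never is used for episodes.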