
Automatic Emby Media Pruning (Movies & TV Episodes) Using a Custom Docker Script – Any-User Watch Logic


Posted

Hi everyone,

I wanted to share a solution I built with a lot of help from ChatGPT.
I’m not a programmer, but I needed a smart, automatic way to clean up old movies and TV episodes on my Emby server.

Since I couldn’t find anything that did what I wanted, I ended up putting together a Docker-based pruning script. It’s been working really well for me, so I’m posting it here in case it helps someone else.

🎯 What This Does

This script automatically deletes media based on real viewing activity across all users on the server.

The retention values below are fully customisable – see the Environment Variables section.

✅ Movies

  • Deleted if no user has watched them in 120 days

  • Or if never watched, deleted only if older than 120 days

✅ TV Episodes

  • If played at least once → delete after 90 days with no recent activity

  • If never played → delete only if older than 180 days

The script talks to Emby via the API and calls DELETE /Items/{Id}, so items are removed cleanly from the library and the filesystem. No orphaned metadata.
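The retention rules above boil down to one yes/no decision per item. Here is a minimal sketch of that decision as a standalone function, using the same default day counts as this post. The name `is_stale` and its parameters are made up for illustration and are not part of the script further down.

```python
from datetime import datetime, timedelta, timezone

def is_stale(item_type, last_played, date_created, now,
             movie_days=120, tv_days=90, tv_never_played_days=180):
    """Return True if an item should be pruned under the rules above.

    last_played is the most recent play across all users (or None),
    date_created is when the item was added (or None if unknown).
    """
    if item_type == "Movie":
        played_days = never_days = movie_days
    elif item_type == "Episode":
        played_days, never_days = tv_days, tv_never_played_days
    else:
        return False  # unknown types are never pruned

    if last_played is not None:
        return last_played < now - timedelta(days=played_days)
    # Never played: fall back to the item's age; keep it if the age is unknown.
    return date_created is not None and date_created < now - timedelta(days=never_days)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_stale("Episode", None, now - timedelta(days=100), now))  # never played, 100 days old
print(is_stale("Episode", now - timedelta(days=100), None, now))  # last played 100 days ago
```

The first call is kept (a never-played episode gets 180 days), the second is flagged (a played episode gets 90 days).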

🧠 Why I Built It

  • I share my Emby with multiple users

  • Storage was growing fast

  • I wanted cleanup based on what people actually watch, not just file dates

  • I wanted different rules for movies and TV

  • And I’m not a programmer, so it needed to be simple and repeatable

This solution is:

  • Docker-based (works nicely on Unraid, which is my setup, as well as Synology, plain Linux Docker, etc.)

  • Configurable via environment variables

  • Safe to test using dry-run mode

📦 Setup Overview

I’ll show the steps using a Linux/Unraid style setup with nano, but the same idea works on any box with Docker.

Folder structure

Create a folder for the script and Dockerfile, for example:

mkdir -p /mnt/user/appdata/emby-prune
cd /mnt/user/appdata/emby-prune

🧾 Step 1 – Create the Dockerfile (with nano)

  • In a terminal on your server:

cd /mnt/user/appdata/emby-prune
nano Dockerfile
  • Paste this into nano:
FROM python:3.12-alpine

WORKDIR /app

RUN pip install --no-cache-dir requests

COPY emby_prune.py /app/emby_prune.py

CMD ["python", "/app/emby_prune.py"]

Save & exit:

  • Ctrl + O, Enter to save

  • Ctrl + X to exit

📜 Step 2 – Create the Python Script (with nano)

  • Still in the same folder:

nano emby_prune.py
  • Paste the entire script below into nano:
import os
import sys
import requests
from datetime import datetime, timedelta, timezone

def parse_iso(dt_str):
    if not dt_str:
        return None
    # Emby uses ISO 8601 with offset, e.g. 2024-02-10T12:34:56.0000000+00:00
    dt_str = dt_str.replace("Z", "+00:00")
    try:
        dt = datetime.fromisoformat(dt_str)
        # Ensure timezone-aware; assume UTC if none
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        return dt
    except Exception:
        return None

def get_env(name, default=None, required=False):
    value = os.getenv(name, default)
    if required and not value:
        print(f"Missing required env var: {name}", file=sys.stderr)
        sys.exit(1)
    return value

def fetch_users(session, emby_url, api_key):
    """Fetch all Emby users."""
    params = {"api_key": api_key}
    r = session.get(f"{emby_url}/Users", params=params, timeout=30)
    r.raise_for_status()
    users = r.json()
    print(f"Found {len(users)} users in Emby.")
    return users

def fetch_user_item_userdata(session, emby_url, api_key, user_id, item_id):
    """Fetch per-user item data (UserData) for given user and item."""
    params = {
        "api_key": api_key,
        "Fields": "UserData"
    }
    r = session.get(f"{emby_url}/Users/{user_id}/Items/{item_id}", params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    return data.get("UserData") or {}

def main():
    emby_url = get_env("EMBY_URL", required=True)       # e.g. http://emby:8096/emby
    api_key  = get_env("EMBY_API_KEY", required=True)

    # Defaults: movies 120d, TV played 90d, TV never-played 180d
    movie_days            = int(get_env("MOVIE_PRUNE_DAYS", get_env("PRUNE_DAYS", "120")))
    tv_days               = int(get_env("TV_PRUNE_DAYS", "90"))
    tv_never_played_days  = int(get_env("TV_NEVER_PLAYED_DAYS", "180"))
    # Only the literal "false" enables deletions; a typo or any other value stays in dry-run
    dry_run               = get_env("DRY_RUN", "true").strip().lower() != "false"

    now_utc = datetime.now(timezone.utc)
    cutoff_movies   = now_utc - timedelta(days=movie_days)
    cutoff_tv       = now_utc - timedelta(days=tv_days)
    cutoff_tv_never = now_utc - timedelta(days=tv_never_played_days)

    print(f"Emby prune starting (movies + TV, any-user mode)")
    print(f"  URL                     : {emby_url}")
    print(f"  Movie cutoff            : {cutoff_movies.isoformat()} (older than {movie_days} days)")
    print(f"  TV cutoff (played)      : {cutoff_tv.isoformat()} (older than {tv_days} days)")
    print(f"  TV cutoff (never played): {cutoff_tv_never.isoformat()} (older than {tv_never_played_days} days)")
    print(f"  Dry run                 : {dry_run}")
    print("")

    session = requests.Session()
    base_params = {"api_key": api_key}

    # 1) Get all users
    try:
        users = fetch_users(session, emby_url, api_key)
    except Exception as e:
        print(f"ERROR: Failed to fetch users: {e}", file=sys.stderr)
        sys.exit(1)

    if not users:
        print("No users found – aborting.")
        sys.exit(0)

    page_size = 200
    start_index = 0
    total_deleted = 0
    total_candidates = 0

    while True:
        # 2) Get movies + episodes
        params = {
            **base_params,
            "Recursive": "true",
            "IncludeItemTypes": "Movie,Episode",
            "Fields": "Path,DateCreated,Type,SeriesName,SeasonName,IndexNumber,ParentIndexNumber",
            "StartIndex": start_index,
            "Limit": page_size,
        }

        r = session.get(f"{emby_url}/Items", params=params, timeout=60)
        r.raise_for_status()
        data = r.json()

        items = data.get("Items", [])
        total_records = data.get("TotalRecordCount", 0)

        if not items:
            break

        for item in items:
            item_id = item.get("Id")
            name = item.get("Name", "Unknown")
            path = item.get("Path")
            date_created = parse_iso(item.get("DateCreated"))
            item_type = item.get("Type", "")

            # For episodes, we can build a bit more context (optional)
            series_name = item.get("SeriesName")
            season_num = item.get("ParentIndexNumber")
            episode_num = item.get("IndexNumber")

            # 3) Look at all users' UserData for this item
            most_recent_play = None
            total_plays_all_users = 0

            for user in users:
                user_id = user.get("Id")
                user_name = user.get("Name", "Unknown")

                try:
                    ud = fetch_user_item_userdata(session, emby_url, api_key, user_id, item_id)
                except Exception as e:
                    print(f"Warning: failed to fetch UserData for item {item_id} and user {user_name}: {e}")
                    continue

                lp = parse_iso(ud.get("LastPlayedDate"))
                pc = ud.get("PlayCount", 0) or 0
                total_plays_all_users += pc

                if lp:
                    if (most_recent_play is None) or (lp > most_recent_play):
                        most_recent_play = lp

            stale = False
            reason = ""

            if item_type == "Movie":
                # Movie rule: stale if no plays in movie_days
                if most_recent_play:
                    if most_recent_play < cutoff_movies:
                        stale = True
                        reason = f"movie: last played by some user at {most_recent_play.isoformat()}"
                else:
                    if date_created and date_created < cutoff_movies:
                        stale = True
                        reason = f"movie: never played by any user; created {date_created.isoformat()}"

            elif item_type == "Episode":
                # TV rule:
                # - if played: stale if last play older than tv_days
                # - if never played: stale only if older than tv_never_played_days
                if most_recent_play:
                    if most_recent_play < cutoff_tv:
                        stale = True
                        reason = f"episode: last played by some user at {most_recent_play.isoformat()}"
                else:
                    if date_created and date_created < cutoff_tv_never:
                        stale = True
                        reason = f"episode: never played by any user; created {date_created.isoformat()}"

            # Unknown type: skip
            if not stale:
                continue

            total_candidates += 1

            # Nicely formatted label for episodes
            if item_type == "Episode" and series_name:
                if season_num is not None and episode_num is not None:
                    label = f"{series_name} S{season_num:02d}E{episode_num:02d} - {name}"
                else:
                    label = f"{series_name} - {name}"
            else:
                label = name

            print(f"[STALE] {label}")
            print(f"        Type      : {item_type}")
            print(f"        ID        : {item_id}")
            print(f"        Path      : {path}")
            print(f"        Reason    : {reason}")
            print(f"        Total plays (all users) : {total_plays_all_users}")
            if dry_run:
                print("        Action    : SKIP (dry-run)\n")
                continue

            del_params = {"api_key": api_key}
            del_url = f"{emby_url}/Items/{item_id}"
            try:
                resp = session.delete(del_url, params=del_params, timeout=60)
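                if resp.status_code in (200, 204):
                    total_deleted += 1
                    # The item is gone, so later items shift up by one; step back so
                    # the "start_index += len(items)" below does not skip anything.
                    start_index -= 1
                    print("        Action    : DELETED\n")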
                else:
                    print(f"        Action    : FAILED (status {resp.status_code})\n")
            except Exception as e:
                print(f"        Action    : FAILED ({e})\n")

        start_index += len(items)
        if start_index >= total_records:
            break

    print("")
    print(f"Scan complete.")
    print(f"  Stale candidates : {total_candidates}")
    print(f"  Deleted          : {total_deleted} (dry_run={dry_run})")

if __name__ == "__main__":
    main()

 

Save & exit:

  • Ctrl + O, Enter

  • Ctrl + X
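The "any-user" part of the script is the inner loop over users: for each item it keeps the most recent LastPlayedDate it sees and adds up PlayCount. Here is a minimal sketch of just that step, working on already-fetched UserData dicts so it can be tried without a server. `summarise_plays` is a hypothetical helper name and the sample dates are made up.

```python
from datetime import datetime, timezone

def parse_iso(dt_str):
    """Parse an ISO 8601 timestamp; return None if missing or malformed."""
    if not dt_str:
        return None
    try:
        dt = datetime.fromisoformat(dt_str.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def summarise_plays(userdata_list):
    """Return (most_recent_play, total_plays) across all users' UserData."""
    most_recent, total = None, 0
    for ud in userdata_list:
        total += ud.get("PlayCount") or 0
        played = parse_iso(ud.get("LastPlayedDate"))
        if played and (most_recent is None or played > most_recent):
            most_recent = played
    return most_recent, total

# Made-up UserData for three users and one item
sample = [
    {"PlayCount": 2, "LastPlayedDate": "2024-02-10T12:34:56+00:00"},
    {"PlayCount": 0},
    {"PlayCount": 1, "LastPlayedDate": "2024-06-01T08:00:00Z"},
]
latest, plays = summarise_plays(sample)
print(latest.isoformat(), plays)
```

Because only the newest play across everyone counts, one user rewatching something is enough to reset the clock for that item.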

🧱 Step 3 – Build the Docker Image

From the same folder:

cd /mnt/user/appdata/emby-prune
docker build -t emby-prune .

You should see something like:

  • Successfully built ...

  • Successfully tagged emby-prune:latest

⚙️ Environment Variables (Retention Rules)

These control how aggressive the pruning is:

Variable               Meaning                                          Default
MOVIE_PRUNE_DAYS       Movies: delete after X days of inactivity        120
TV_PRUNE_DAYS          TV: delete after X days if episode was played    90
TV_NEVER_PLAYED_DAYS   TV: delete after X days if never played          180
DRY_RUN                Preview only (true / false)                      true
EMBY_URL               Your Emby server base URL                        none (required), e.g. http://server:8096/emby
EMBY_API_KEY           Your Emby API key                                none (required), generate an API key
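As a sketch of how these variables could map onto settings, the following reads them from a plain mapping with the defaults listed above. `PruneConfig` and `load_config` are illustrative names, not part of the script. The negative-number check, the trailing-slash trim and the "anything but false means dry-run" rule are assumptions of this sketch.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PruneConfig:
    emby_url: str
    api_key: str
    movie_days: int = 120
    tv_days: int = 90
    tv_never_played_days: int = 180
    dry_run: bool = True

def load_config(env=os.environ):
    """Build a PruneConfig from environment-style key/value pairs."""
    missing = [k for k in ("EMBY_URL", "EMBY_API_KEY") if not env.get(k)]
    if missing:
        raise ValueError(f"Missing required env var(s): {', '.join(missing)}")

    def days(name, default):
        value = int(env.get(name, default))
        if value < 0:
            raise ValueError(f"{name} must not be negative")
        return value

    return PruneConfig(
        emby_url=env["EMBY_URL"].rstrip("/"),  # avoid double slashes in request URLs
        api_key=env["EMBY_API_KEY"],
        movie_days=days("MOVIE_PRUNE_DAYS", 120),
        tv_days=days("TV_PRUNE_DAYS", 90),
        tv_never_played_days=days("TV_NEVER_PLAYED_DAYS", 180),
        # Anything other than "false" keeps dry-run on, so a typo cannot trigger deletes.
        dry_run=env.get("DRY_RUN", "true").strip().lower() != "false",
    )

cfg = load_config({
    "EMBY_URL": "http://server:8096/emby/",
    "EMBY_API_KEY": "YOUR_API_KEY",
    "TV_PRUNE_DAYS": "60",
})
print(cfg)
```

Passing a dict instead of the real environment makes it easy to try different retention values before touching the container.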

🧪 Step 4 – Test with a Dry Run (Safe Mode)

This does not delete anything, just prints what would be removed:

docker run --rm \
  --name emby-prune-test \
  -e EMBY_URL="http://YOURSERVER:8096/emby" \
  -e EMBY_API_KEY="YOUR_API_KEY" \
  -e MOVIE_PRUNE_DAYS="120" \
  -e TV_PRUNE_DAYS="90" \
  -e TV_NEVER_PLAYED_DAYS="180" \
  -e DRY_RUN="true" \
  emby-prune
  • Check the output

  • Make sure the stuff marked [STALE] looks sensible
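On a big library the dry-run output gets long. A small helper like the one below can count the [STALE] entries per type so you can sanity-check the totals first. It only relies on the "[STALE]" and "Type      :" lines the script prints. The function name `summarise_dry_run` and the sample output are made up for this sketch.

```python
from collections import Counter

def summarise_dry_run(log_text):
    """Count [STALE] entries per item type in the script's printed output."""
    counts = Counter()
    in_stale_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[STALE]"):
            in_stale_block = True
        elif in_stale_block and stripped.startswith("Type"):
            # Lines look like "Type      : Movie"
            counts[stripped.split(":", 1)[1].strip()] += 1
            in_stale_block = False
    return counts

# Made-up excerpt in the same shape as the script's output
sample_output = """\
[STALE] Some Movie
        Type      : Movie
        ID        : 101
[STALE] Some Show S01E02 - Pilot
        Type      : Episode
        ID        : 202
"""
print(dict(summarise_dry_run(sample_output)))
```

To use it on a real run, save the container output to a file and pass the file's contents to the function.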

🗑️ Step 5 – Run for Real (Optional)

Once you’re happy with the dry-run output, you can run it for real:

docker run --rm \
  --name emby-prune-run \
  -e EMBY_URL="http://YOURSERVER:8096/emby" \
  -e EMBY_API_KEY="YOUR_API_KEY" \
  -e MOVIE_PRUNE_DAYS="120" \
  -e TV_PRUNE_DAYS="90" \
  -e TV_NEVER_PLAYED_DAYS="180" \
  -e DRY_RUN="false" \
  emby-prune

To run it automatically, schedule that same command with whatever scheduler your system provides, for example:

  • Weekly, Monday, 03:00
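One alternative to an external scheduler is a long-running container that sleeps between runs. This is only a sketch: it assumes a hypothetical INTERVAL variable (hours between runs) that the script above does not read, and `interval_seconds` and `run_every` are illustrative names.

```python
import os
import time

def interval_seconds(env=os.environ, default_hours=24.0):
    """Read INTERVAL (hours between runs) and convert it to seconds."""
    hours = float(env.get("INTERVAL", default_hours))
    if hours <= 0:
        raise ValueError("INTERVAL must be a positive number of hours")
    return hours * 3600

def run_every(job, seconds, sleep=time.sleep, max_runs=None):
    """Call job(), sleep, repeat. max_runs=None loops forever."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the loop alive if one run fails
            print(f"[emby-prune] run failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(seconds)
    return runs

# Demo with a stand-in job and a fake sleep so it finishes immediately.
naps = []
run_every(lambda: print("pruning..."), interval_seconds({"INTERVAL": "6"}),
          sleep=naps.append, max_runs=2)
print(naps)
```

In a real container `job` would be the script's `main`. Note that `main` calls `sys.exit` on errors, which raises SystemExit and is not caught by `except Exception`, so those exits would still stop the loop unless they are handled separately.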

🙋‍♂️ Final Notes

This has been running nicely on my Emby setup:

  • Library stays tidy

  • Storage doesn’t explode

  • Old/watched stuff eventually ages out

  • New and recently watched content is kept

  • All rules are tweakable via environment variables

I may update this when I get time, to add the following (no promises!):

  • whitelists and never-delete tags
  • notifications
  • logs to file
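For the never-delete idea, one possible shape is a tag check that runs before an item is considered for pruning. This is a rough sketch, not something the script does today. The tag names are examples, `is_protected` is a made-up helper, and the layout of tags in the API response (plain strings under "Tags" or objects under "TagItems") is an assumption to verify against your own server.

```python
def is_protected(item, protected_tags=("keep", "never-delete")):
    """True if the item carries any protected tag (case-insensitive)."""
    # Assumption: tags arrive as plain strings under "Tags" or as
    # {"Name": ...} objects under "TagItems"; check your server's response.
    names = list(item.get("Tags") or [])
    names += [t.get("Name", "") for t in item.get("TagItems") or []]
    wanted = {t.lower() for t in protected_tags}
    return any(n.lower() in wanted for n in names)

print(is_protected({"Name": "Some Movie", "Tags": ["Keep"]}))
print(is_protected({"Name": "Other Movie", "Tags": ["kids"]}))
```

In the main loop this would mean skipping any item where `is_protected(item)` is true, assuming the server includes tags in the item data when they are requested.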
Posted

Thanks for sharing.

  • 3 weeks later...
Posted

thx! i will try :)

  • 1 month later...
Posted

Exactly what I am looking for, but I don't see separate variables for movies, i.e. MOVIE_PRUNE_DAYS versus a MOVIE_NEVER_PLAYED_DAYS.

I also tried adding an INTERVAL variable so the container reruns the script every so often. How are you scheduling this (I am using compose)?

Dockerfile

FROM python:3.12-alpine
WORKDIR /app
RUN pip install --no-cache-dir requests
COPY emby_prune.py /app/emby_prune.py
CMD ["sh", "-c", "\
  HOURS=${INTERVAL:-24}; \
  SLEEP_SECONDS=$((HOURS*3600)); \
  echo \"[emby-prune] Running every ${HOURS} hours\"; \
  while true; do \
    python /app/emby_prune.py; \
    sleep ${SLEEP_SECONDS}; \
  done \
"]
