Best path for cleaning up duplicate People profiles / folders?


Go to solution Solved by Smitty018210,

Smitty018210
Posted

I am working on a small helper app for my Emby People metadata.

The app scans the Emby metadata/people area and looks for possible duplicate person profiles. The goal is not to blindly delete anything. The goal is to help review possible duplicate People folders before doing anything risky.

Right now the app is looking for duplicates by things like:

  • Same normalized person name

  • Same TMDB person ID

  • Same IMDb person ID

  • Linked duplicate chains, where one profile matches by name and another matches by provider ID

  • Skipped/problem profiles with missing or unreadable info

The issue I am trying to solve is duplicate People folders like this:

Vin Diesel
Vin Diesel-tmdb-12835

or cases where one profile has provider IDs and another is name-only.
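A minimal sketch of the grouping described above, assuming folder names follow the `Name`, `Name-tmdb-ID`, or `Name-imdb-ID` patterns shown in this post. The function names `parse_folder` and `group_duplicates` are made up for illustration. Folders are linked when they share a normalized name or a provider ID, so chains (name match on one side, ID match on the other) end up in one group. A shared name alone does not prove two folders are the same person, so the output is a review list, not a delete list.

```python
import re
from collections import defaultdict

# Assumed folder pattern, taken from the examples in this thread:
# "Name", "Name-tmdb-12835" or "Name-imdb-nm1234567".
FOLDER_RE = re.compile(r"^(?P<name>.+?)(?:-(?P<provider>tmdb|imdb)-(?P<id>\w+))?$")


def parse_folder(folder):
    """Split a People folder name into (normalized name, provider, id)."""
    m = FOLDER_RE.match(folder.strip())
    if m is None:
        raise ValueError(f"unparseable folder name: {folder!r}")
    name = " ".join(m.group("name").lower().split())
    return name, m.group("provider"), m.group("id")


def group_duplicates(folders):
    """Group folders linked by the same normalized name or provider ID."""
    parent = {f: f for f in folders}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for folder in folders:
        name, provider, pid = parse_folder(folder)
        keys = [("name", name)]
        if provider:
            keys.append((provider, pid))
        for key in keys:
            if key in first_seen:
                # Same key seen before: merge the two groups.
                parent[find(folder)] = find(first_seen[key])
            else:
                first_seen[key] = folder

    groups = defaultdict(list)
    for folder in folders:
        groups[find(folder)].append(folder)
    # Only groups with more than one folder are possible duplicates.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, `group_duplicates(["Vin Diesel", "Vin Diesel-tmdb-12835", "Paul Walker"])` puts the two Vin Diesel folders in one group and leaves Paul Walker out.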

My concern is this:

If I delete the “bad” / name-only People folder, will Emby just recreate it later from movie/show/episode cast metadata?

From what I understand, the person profile NFO itself does not store credits/media links. The media item metadata is what contains cast/actor information. So deleting a People folder may only delete the stored person metadata/image folder, but it may not fix whatever source caused Emby to create that person entry in the first place.

I am trying to find the safest and most Emby-correct cleanup path.

Questions for the Emby dev team:

  • Is the correct preferred People folder format still like this?

Person Name-tmdb-123456
Person Name-imdb-nm1234567
  • Should TMDB ID be treated as the preferred primary person ID, with IMDb as fallback?

  • If a duplicate name-only People folder exists beside a provider-ID folder, what is the safest way to remove or merge that duplicate?

  • Does Emby recreate name-only People folders when media NFO/cast metadata does not include a provider ID?

  • If local NFOs are used, should the fix be made in the movie/show/episode NFO actor entries first, before deleting any duplicate People folder?

  • Is there any safe way through Emby API or metadata refresh to tell Emby that one person profile should be treated as the correct one?

  • Is there a recommended process for cleaning duplicate People records without damaging existing images, locked profiles, or manually edited person metadata?

What I am trying to avoid is building a delete/cleanup feature that appears to work, but then Emby recreates the same duplicate People profile on the next scan because the source metadata still points to a name-only actor entry.

Any guidance on the correct order of operations would help.

For example, should the safe process be something like:

  • Identify the good provider-ID People profile.

  • Check movie/show/episode cast metadata for name-only actor entries.

  • Correct actor/provider IDs in the media NFOs first.

  • Refresh/rescan Emby.

  • Only delete the duplicate People folder after Emby stops recreating or using it.

I would appreciate any clarification on how Emby decides when to create a new People folder versus reuse an existing People profile, especially when one profile is name-only and another has TMDB/IMDb IDs.


ginjaninja
Posted

Is the end goal to make sure your library metadata is right/deduped, or the metadata folder on the filesystem? I would be tempted to focus on getting the library metadata correct and trust that Emby will sort out the filesystem (apart from wasted space, does it matter?). That won't clean up duplicates on the filesystem, but at least your library will be 'correct'. I don't recall the API exposing the filesystem-related metadata links; I think you just have to trust Emby in that regard.

Smitty018210
Posted (edited)
1 hour ago, ginjaninja said:

Is the end goal to make sure your library metadata is right/deduped, or the metadata folder on the filesystem? I would be tempted to focus on getting the library metadata correct and trust that Emby will sort out the filesystem (apart from wasted space, does it matter?). That won't clean up duplicates on the filesystem, but at least your library will be 'correct'. I don't recall the API exposing the filesystem-related metadata links; I think you just have to trust Emby in that regard.

Right now I have 14,883 possible duplicate People profiles showing in my People folder. A lot of them have wrong information, duplicated information, or look like old/bad profiles that are just sitting there doing nothing.

Emby could be using a profile with outdated info, missing info, or just plain wrong info, while the correct profile/folder is sitting right next to it.

I know some of these are probably not true duplicates. Some are likely different people who happen to share the same name. I understand that part.

What I am trying to figure out is why Emby created so many of these People folders in the first place.

These are not just empty folders. They have pictures, NFO files, and profile data.

Right now I am trying to do two things:

  • Remove old, bad, wrong, or duplicate profiles.
  • Understand why Emby created this many People folders, so when I do clean them up, I don't end up having the same problem happen again.
Edited by Smitty018210
Smitty018210
Posted (edited)

@Luke Is there any way you can point me in the right direction or give me some info on this?

Edited by Smitty018210
Posted
Quote

The app scans the Emby metadata/people area and looks for possible duplicate person profiles.

For starters I would have it look in the database rather than the folders. You could have old folders in there left behind that aren't being used anymore but that doesn't mean that you have a duplicate person in the database.
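A rough sketch of that idea, under loud assumptions: `db_people` is a list of people you have already exported from the Emby database or API (how you obtain it is left open), each shaped like `{"Name": ..., "ProviderIds": {"Tmdb": ...}}`, and the `Name-tmdb-ID` folder rule is inferred from the folder names in this thread, not from Emby documentation. The helper names are hypothetical. Folders that no database person maps to are only candidates for review, since a person with a TMDB ID may still reference an image in the older name-only folder.

```python
def expected_folder(person):
    """Guess the metadata folder name for one database person.

    Assumption: "Name-tmdb-ID" when a TMDB ID is known, plain "Name" otherwise.
    """
    tmdb = (person.get("ProviderIds") or {}).get("Tmdb")
    return f"{person['Name']}-tmdb-{tmdb}" if tmdb else person["Name"]


def split_folders(folders_on_disk, db_people):
    """Return (in_use, leftover) folder lists, compared case-insensitively."""
    expected = {expected_folder(p).casefold() for p in db_people}
    in_use, leftover = [], []
    for folder in sorted(folders_on_disk):
        if folder.casefold() in expected:
            in_use.append(folder)
        else:
            leftover.append(folder)
    return in_use, leftover
```

With one database person "Vin Diesel" carrying TMDB ID 12835, the folder "Vin Diesel-tmdb-12835" lands in `in_use` and the bare "Vin Diesel" folder lands in `leftover`.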

Posted
Quote

If I delete the “bad” / name-only People folder

You're not deleting a person when you do this. The only thing this accomplishes is deleting metadata saved to disk.

Luke
Posted

I think first you have to decide whether your goal is to clean up duplicates in the Emby user interface, or in the folders.

Smitty018210
Posted
2 hours ago, Luke said:

I think first you have to decide whether your goal is to clean up duplicates in the Emby user interface, or in the folders.

It’s kind of both. This is only part of a larger app I’m creating to help clean up and manage Emby people profiles, including profile images, backdrop images, metadata, and related files. 

You might not remember, but last year I asked about adding backdrop support for people profiles in Emby, and it was added. Thanks again for that. While I was deciding how I wanted to add and organize my backdrop images, I discovered that I had a large number of duplicate people folders and profiles.

I decided that I need to clean up those duplicate folders and profiles before moving forward with the backdrop images.

The ultimate goal is to have one profile/folder for each person, and not have old unused data just sitting there taking up space. 

Posted

In the UI, it would mean going to the content the actor is tagged in and editing the people list to make sure the data matches across all usages of that person. It's not about editing the person themselves.

Smitty018210
Posted

Ok, so you mean going to the metadata of each media item/video that the person is in and making sure that the name is correct across all of them. Ok, that makes sense.
But that does not explain how this happened.

Vin Diesel
Vin Diesel-tmdb-12835 

Why would Emby create this profile "Vin Diesel-tmdb-12835" when the other one was already there? 
How did that happen? 
Why does the profile have "tmdb-12835" in the name?

Luke
Posted

It's not a profile, it's just a folder holding data.

Quote


Why does the profile have "tmdb-12835" in the name?

This helps deal with cases of different people of the same name.

Smitty018210
Posted
9 minutes ago, Luke said:

it's not a profile, it's just a folder holding data.

This helps deal with cases of different people of the same name.

Ok, but that does not make sense with my example:
Vin Diesel
Vin Diesel-tmdb-12835 
 

pünktchen
Posted
48 minutes ago, Smitty018210 said:

Why would Emby create this profile "Vin Diesel-tmdb-12835" when the other one was already there? 
How did that happen? 

This happens during the library scan if the metadata provider doesn't know the person's TMDB ID, which is most often the case when TVDB is the primary metadata provider. The scan then creates only the "Vin Diesel" folder, with just the person image inside. If you now click on the person in Emby, it will use TMDB to look up the person and their details. This creates the "Vin Diesel-tmdb-12835" folder with the person NFO inside. But the person image is still referenced in the old folder!

Smitty018210
Posted
21 minutes ago, pünktchen said:

This happens during the library scan if the metadata provider doesn't know the person's TMDB ID, which is most often the case when TVDB is the primary metadata provider. The scan then creates only the "Vin Diesel" folder, with just the person image inside. If you now click on the person in Emby, it will use TMDB to look up the person and their details. This creates the "Vin Diesel-tmdb-12835" folder with the person NFO inside. But the person image is still referenced in the old folder!

Ok, that makes more sense now. Emby can be using multiple folders to store info/images for one person. 
Seems weird that happened for someone like Vin Diesel, but at least that makes more sense. 
What would be the best way to clean this up? Quarantine these folders (move them to a different folder), and then wait for Emby to recreate the person folders over time?

Posted
22 minutes ago, Smitty018210 said:

What would be the best way to clean this up?

Copy the image from the old to the new folder, update the image path in the library, delete the old folder.
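The file-system half of that advice could be sketched roughly like this. It is only an illustration: `consolidate` and its parameters are hypothetical names, the old folder is moved to a quarantine directory instead of being deleted outright, and nothing here updates the image path stored in the Emby library, which still has to be done separately. It defaults to a dry run that only reports what it would do.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def consolidate(old_dir, new_dir, quarantine_dir, dry_run=True):
    """Copy images from old_dir into new_dir, then quarantine old_dir.

    Returns the list of planned (or performed) actions. Files that already
    exist in new_dir are never overwritten.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    quarantine_dir = Path(quarantine_dir)
    actions = []
    for src in sorted(old_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            dst = new_dir / src.name
            if dst.exists():
                actions.append(f"skip {src.name} (already in target)")
                continue
            actions.append(f"copy {src.name}")
            if not dry_run:
                shutil.copy2(src, dst)
    actions.append(f"quarantine {old_dir.name}")
    if not dry_run:
        # Move instead of delete so the step can be undone.
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(old_dir), str(quarantine_dir / old_dir.name))
    return actions
```

Running it first with `dry_run=True` and reading the returned list gives a chance to spot surprises before anything on disk changes.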

  • Solution
Smitty018210
Posted (edited)
On 6/10/2026 at 6:48 PM, pünktchen said:

This happens during the library scan if the metadata provider doesn't know the person's TMDB ID, which is most often the case when TVDB is the primary metadata provider. The scan then creates only the "Vin Diesel" folder, with just the person image inside. If you now click on the person in Emby, it will use TMDB to look up the person and their details. This creates the "Vin Diesel-tmdb-12835" folder with the person NFO inside. But the person image is still referenced in the old folder!

After doing some testing, what I’m finding does not match what you described.

Most of these duplicate folders do not have images. Most are just folders with NFO files. In a few cases, they are completely empty folders.

Sadly, it looks like my app dies here. Without a reliable way to clean up these folders, and without truly understanding why Emby creates two folders for the same person in the first place, I cannot move forward with testing. I really do thank you for your help though. 

Edited by Smitty018210
ginjaninja
Posted

 

On 10/06/2026 at 21:47, Smitty018210 said:

The ultimate goal is to have one profile/folder for each person, and not have old unused data just sitting there taking up space. 

I have 3 "Tom Hollands" for what is ultimately 2 people in the real world. Separate NFOs share the same TVDB ID but have different TMDB IDs, which shouldn't be possible. So Emby might not be right all the time, or it may have received duff information from a provider.
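That kind of conflict can be flagged mechanically. A minimal sketch, assuming each person.nfo carries `<uniqueid type="...">` elements like the ones in the samples further down; `provider_ids` and `tvdb_conflicts` are illustrative names, not part of any Emby tooling. It reports every TVDB ID that is claimed by more than one TMDB ID.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def provider_ids(nfo_xml):
    """Return e.g. {"tmdb": "64796", "tvdb": "8163543"} from <uniqueid> tags."""
    root = ET.fromstring(nfo_xml)
    return {
        u.get("type"): (u.text or "").strip()
        for u in root.iter("uniqueid")
        if u.get("type")
    }


def tvdb_conflicts(nfo_documents):
    """Map each TVDB ID to the TMDB IDs claiming it, keeping only conflicts."""
    by_tvdb = defaultdict(set)
    for xml_text in nfo_documents:
        ids = provider_ids(xml_text)
        if ids.get("tvdb") and ids.get("tmdb"):
            by_tvdb[ids["tvdb"]].add(ids["tmdb"])
    return {
        tvdb: sorted(tmdbs)
        for tvdb, tmdbs in by_tvdb.items()
        if len(tmdbs) > 1
    }
```

An empty result means no TVDB ID is shared across different TMDB IDs in the files you passed in; it says nothing about people that have no TVDB or TMDB ID at all.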

Observations

  1. The delete endpoint cannot be used for persons (access denied).
  2. There is no person merge function in the UI or API.
  3. Each person ID is hard-coded to a metadata location (delete a metadata folder and refresh the person in Emby, and an error about the folder is logged). The person.nfo files don't contain an Emby ID, so the link is mystical. A bodgy empirical algorithm is needed to identify the person-to-folder link (unless the devs say it's exposed somewhere).
  4. Removing all content from a person seems to remove the person ID from the database, but as a tombstone. Under some unknown circumstance the object can be reinstated when you refresh shows with that person on them (string match?). Does Emby clean these tombstones up? After some time? On reboot?

With 3 Emby persons named Tom Holland (detail below) and 3 folders:

"Emby-Server\programdata\metadata\people\Tom Holland\person.nfo"

"Emby-Server\programdata\metadata\people\Tom Holland-tmdb-64796\person.nfo"

"Emby-Server\programdata\metadata\people\Tom Holland-tmdb-1136406\person.nfo"

Suggested Approach to test manually (not proven out)

I will assume the tmdbids on user objects are unique for now (to be proven)

  1. Identify which folder each Emby ID is associated with. Maybe use the TMDB ID as the primary key (will that uniquely identify them? It did for me), assuming you will be left with 1 Emby object without a TMDB ID, which would be associated with the \Tom Holland folder.
  2. Identify which person IDs you will keep, and possibly copy known-good metadata to the IDs you will keep (method to be determined).
  3. Identify which Emby object(s) and associated folder(s) are the bad ones (by comparing content against the TMDB/TVDB APIs).
    1. In my case "Emby-Server\programdata\metadata\people\Tom Holland-tmdb-64796\person.nfo" was perfect; all the IDs matched the content.
    2. The person associated with "Emby-Server\programdata\metadata\people\Tom Holland-tmdb-1136406\person.nfo" was the bad one.
  4. Remove Tom Holland from all content attached to the bad one.
  5. This will delete the Emby ID and possibly the folder; delete the metadata folder.
  6. After the tombstone period has expired (TBC), refresh the content that was associated with the bad one identified above, and hopefully it will be attached correctly to the now-curated person objects.

 

Steps 2 and 3 are doing a lot of heavy lifting; easier said than done. Maybe for persons with duplicate names, you need to retrieve all TMDB and TVDB metadata for their media objects and use the content response to establish the link between TVDB ID and TMDB ID. Maybe compare filmography, type, and description to weight confidence.
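One way the filmography comparison could be weighted is a simple set-overlap score. This is a sketch only: the Jaccard measure is my choice for illustration, nothing Emby does, and `filmography_overlap` and `best_match` are hypothetical helpers. The candidate filmographies would have to come from the provider APIs; fetching them is not shown.

```python
def filmography_overlap(titles_a, titles_b):
    """Jaccard overlap (0.0 to 1.0) between two lists of titles."""
    a = {t.casefold().strip() for t in titles_a}
    b = {t.casefold().strip() for t in titles_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(local_titles, candidates):
    """Pick the provider person whose filmography overlaps most.

    `candidates` maps a provider person ID to that person's titles.
    Returns (person_id, score), or (None, 0.0) if there are no candidates.
    """
    if not candidates:
        return None, 0.0
    scored = {
        pid: filmography_overlap(local_titles, titles)
        for pid, titles in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]
```

A low best score, or two candidates with similar scores, would be a signal to leave the person for manual review instead of acting on the match.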

Of course you could do all that perfectly, or your own version of it, but it won't help you retain any manually curated metadata and people images en masse. But maybe the observations help some of your thinking.

But maybe just having a utility that double-checks duplicate persons out of band from the refresh metadata cycle (which, I suspect, optimises for speed) would be interesting.
 

Spoiler

 

Emby ID<>Metadata Folder association checked by changing person description.
[tvdb primary for series,season and episode in metadata providers]

"Emby-Server\programdata\metadata\people\Tom Holland\person.nfo"
EMBY: 448429
IMDB:
TMDB:
TVDB: 287561 Spiderman Actor
Content for: Spiderman Actor, correct

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<person>
  <plot />
  <outline />
  <lockdata>false</lockdata>
  <dateadded>2026-05-23 04:03:52</dateadded>
  <title>Tom Holland</title>
  <year>1996</year>
  <sorttitle>Tom Holland</sorttitle>
  <tvdbid>287561</tvdbid>
  <premiered>1996-06-01</premiered>
  <releasedate>1996-06-01</releasedate>
  <placeofbirth>Kingston upon Thames, England, UK</placeofbirth>
  <uniqueid type="tvdb">287561</uniqueid>
</person>

{
  "Items": [
    {
      "Name": "Wolf Hall",
      "ServerId": "fccaa6f897f140dbb689938224ae1938",
      "Id": "111122",
      "RunTimeTicks": 35400000000,
      "IsFolder": true,
      "Type": "Series",
      "AirDays": [],
      "ImageTags": {
        "Primary": "c25bec6f440ef442b982f88093dec1a4_637283552741521396",
        "Banner": "d1ed3e75521cf132e2c5130e5c7c8ab5_637283552635303345",
        "Logo": "a7ac711050b90fd1874308620218888b_638715206661622092",
        "Thumb": "3e28477acf03ef1c493fc75a651bc7d5_637283535038983080"
      },
      "BackdropImageTags": [
        "0ca147aec86f19f1ec7c0448638d856b_637283552619725098"
      ]
    }
  ],
  "TotalRecordCount": 1
}

"Emby-Server\programdata\metadata\people\Tom Holland-tmdb-64796\person.nfo"
EMBY: 64339
IMDB: nm0276169 American Director
TMDB: 64796 American Director
TVDB: 8163543 American Director
Content for: American Director, correct


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<person>
  <plot><![CDATA[truncated]]></plot>
  <outline><![CDATA[truncated]]></outline>
  <lockdata>false</lockdata>
  <dateadded>2024-02-24 20:36:13</dateadded>
  <title>Tom Holland</title>
  <year>1943</year>
  <sorttitle>Tom Holland</sorttitle>
  <imdbid>nm0276169</imdbid>
  <tvdbid>8163543</tvdbid>
  <tmdbid>64796</tmdbid>
  <premiered>1943-07-11</premiered>
  <releasedate>1943-07-11</releasedate>
  <placeofbirth>Poughkeepsie, New York, USA</placeofbirth>
  <uniqueid type="tmdb">64796</uniqueid>
  <uniqueid type="tvdb">8163543</uniqueid>
  <uniqueid type="imdb">nm0276169</uniqueid>
</person>


{
  "Items": [
    {
      "Name": "Fright Night",
      "ServerId": "fccaa6f897f140dbb689938224ae1938",
      "Id": "296332",
      "RunTimeTicks": 63858800000,
      "IsFolder": false,
      "Type": "Movie",
      "ImageTags": {
        "Primary": "8e6f95ca90ddc64ec85470d7bf12b4a1_634633397735412404",
        "Art": "30db2ab3effdf0beba4d6184d543a06c_636164683841507209",
        "Banner": "b0e34bc42c6ca9d902d0608779b89cb7_635410232446056574",
        "Disc": "c74e51dd8ddc3f9fbb2a9c5dd5301c88_635685841568169686",
        "Logo": "9b3c46a7e256036bec27da1797cc638b_635043905979142152",
        "Thumb": "dcbefd2c56448c10c1217021b24517f3_635862286494212979"
      },
      "BackdropImageTags": [
        "9afcdd2afc4abb0d660ba13205495ef7_635384247309409086",
        "37a282f0ae4ad3650889a898b8d109f6_635384247321100570",
        "03fc7425ce8c3fb46d321bfe3fa3834d_635384247336682549",
        "d16269564469274884aa0e618990c4ea_635446568471039422",
        "d772e4fe5e4bc91c65f3d764dfcc2f04_635446568484970262",
        "b6eb65de5b53bbf22b0f2db3d08f161e_635446568500131215",
        "b801c82b30fc08da96bf02ac60f8c960_635446568510491808",
        "3f299e65c15c0e53274f1e98e52c3ba8_635615123334434443",
        "cc2bdf29a894b075b565f252aa4b1996_635615123344114997",
        "0b35526d92e096fce56e7a09786fb81f_635615123358665829"
      ],
      "MediaType": "Video"
    }
  ],
  "TotalRecordCount": 1
}


"Emby-Server\programdata\metadata\people\Tom Holland-tmdb-1136406\person.nfo"
EMBY: 22141
IMDB: nm4043618
TMDB: 1136406 Spiderman Actor
TVDB: 8163543 American Director
Content for: Mixture of the two


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<person>
  <plot><![CDATA[truncated]]></plot>
  <outline><![CDATA[truncated]]></outline>
  <lockdata>false</lockdata>
  <dateadded>2024-02-24 20:36:13</dateadded>
  <title>Tom Holland</title>
  <year>1943</year>
  <sorttitle>Tom Holland</sorttitle>
  <imdbid>nm0276169</imdbid>
  <tvdbid>8163543</tvdbid>
  <tmdbid>64796</tmdbid>
  <premiered>1943-07-11</premiered>
  <releasedate>1943-07-11</releasedate>
  <placeofbirth>Poughkeepsie, New York, USA</placeofbirth>
  <uniqueid type="tmdb">64796</uniqueid>
  <uniqueid type="tvdb">8163543</uniqueid>
  <uniqueid type="imdb">nm0276169</uniqueid>
</person>

{
  "Items": [
    {
      "Name": "Avengers: Endgame",
      "ServerId": "fccaa6f897f140dbb689938224ae1938",
      "Id": "297189",
      "RunTimeTicks": 108713710000,
      "IsFolder": false,
      "Type": "Movie",
      "ImageTags": {
        "Primary": "d73e9b0f68e114d38ebb536e725a5aea_637003270720013484",
        "Logo": "2b407252439a9c16b0a1d8c917fb5b62_637004820709961544",
        "Thumb": "7f6bcf946416d3f573bfcaa49ca7d369_637343124785411803"
      },
      "BackdropImageTags": [
        "0092364ba62479baa48cbdd081d057e5_637343124783466192"
      ],
      "MediaType": "Video"
    },
truncated
    {
      "Name": "King of the Road",
      "ServerId": "fccaa6f897f140dbb689938224ae1938",
      "Id": "7639",
      "RunTimeTicks": 16459780000,
      "IndexNumber": 9,
      "ParentIndexNumber": 4,
      "IsFolder": false,
      "Type": "Episode",
      "ParentLogoItemId": "85",
      "ParentBackdropItemId": "85",
      "ParentBackdropImageTags": [
        "1374cd0e8adc2e5b9c22e6d92e409a9d_636623748341737564"
      ],
      "SeriesName": "Tales from the Crypt",
      "SeriesId": "85",
      "SeasonId": "7588",
      "SeriesPrimaryImageTag": "deabbc3e8e7459ef262c01c2d88f72a0_636588139927544264",
      "SeasonName": "Season 4",
      "ImageTags": {
        "Primary": "669e5c0bb1c9c6c19c97c37276f2772e_637654997025502913"
      },
      "BackdropImageTags": [],
      "ParentLogoImageTag": "74472d9abeb0105aff440a059c14a5e4_636623748369641620",
      "ParentThumbItemId": "85",
      "ParentThumbImageTag": "35337cda87609272f42d240c4633bc01_636588140216838302",
      "MediaType": "Video"
    }
    
  ],
  "TotalRecordCount": 15
}

The TV shows assigned to this Emby ID are incorrect. For example, TVDB credits
https://thetvdb.com/series/tales-from-the-crypt/episodes/105605#castcrew
to
https://thetvdb.com/people/8163543-tom-holland
i.e. Tom Holland the director.

Neminem
Posted

I think things changed with 4.9 where "tmdb + id" was implemented.

I also see a lot of file system duplicates.

But not duplicates in emby.

Yet another thing not disclosed in an update post from the Emby team.

Come to think of it, nothing was disclosed about the update from 4.8 to 4.9 🤷‍♂️

Not even a blog.

pünktchen
Posted
3 hours ago, Neminem said:

I think things changed with 4.9 where "tmdb + id" was implemented.

That's true for collections. For people it was already added with version 4.6.
