
Script: Identify all artists by as many data points as possible


ginjaninja
Posted (edited)

Goal

To use Album and Title as well as Artist tags when identifying artists on MusicBrainz (not just the Artist tag)

Improve Artwork and Biographies in Emby.

A proof of concept to lend weight to the idea that more accurate matching is possible.

 

Design

Adds some flexibility into the artist match around common substitutions (started/WIP)

Matches artist against MBZ lookup of album+artist+title, falling back when necessary to artist+title, then just title

Uses the MBZ recording endpoint, which provides artist credits (ready) and aliases (WIP, if necessary), increasing the chance of a correct match

Test mode logs proposed changes without applying them

The log includes links to old and new data to aid checking of results.
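A minimal sketch of the fallback order described above, not the script's actual code. `Get-MbzRecordingQueries` is a hypothetical helper name; it assumes the MusicBrainz recording search endpoint with the `recording`, `artist` and `release` search fields, and only builds the URLs so the lookup order is easy to inspect.

```powershell
# Hypothetical helper: returns MusicBrainz recording-search URLs, most specific first.
function Get-MbzRecordingQueries {
    param(
        [string]$Artist,
        [string]$Album,
        [Parameter(Mandatory)][string]$Title
    )

    # Escape backslashes and double quotes so each value can sit inside a quoted phrase.
    $quote = { param($s) '"' + ($s -replace '(["\\])', '\$1') + '"' }

    $queries = @()
    if ($Artist -and $Album) {
        # 1st attempt: album + artist + title
        $queries += "recording:$(& $quote $Title) AND artist:$(& $quote $Artist) AND release:$(& $quote $Album)"
    }
    if ($Artist) {
        # 2nd attempt: artist + title
        $queries += "recording:$(& $quote $Title) AND artist:$(& $quote $Artist)"
    }
    # Last resort: title only
    $queries += "recording:$(& $quote $Title)"

    foreach ($q in $queries) {
        'https://musicbrainz.org/ws/2/recording?fmt=json&limit=5&query=' + [uri]::EscapeDataString($q)
    }
}
```

A caller would try each URL in turn and stop at the first one that returns a usable recording, e.g. `Get-MbzRecordingQueries -Artist 'Agnes' -Album "Now That's What I Call Music! 73" -Title 'Release Me'`. MusicBrainz asks clients to send a meaningful User-Agent and to keep to roughly one request per second, so a real loop would need a pause between calls.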

 

My results

10% improvement (by number of correct matches), mostly from correcting artists previously misidentified by the automated Emby fetcher, plus a few new matches. 1000 corrected matches.

Some limited ability to work around errors in the source tags.

Helped identify a number of incorrectly tagged items. ('99.9%' is matchable if correctly tagged)

 

some example highlights...

Artist=Final Fantasy,

Track=Flare Gun

Album = Indie/Rock Playlist: June (2007)

correctly (fixed) matched by recording on artist+title to Owen Pallett (by obscure artist credit)

https://musicbrainz.org/artist/6d394418-a565-4c16-9dec-f5a89e213cde

 

Artist=Agnes

Track=Release Me

Album=Now That's What I Call Music! 73

correctly (fixed) matched by recording on artist+title+album, which disambiguated

Agnes - male Swiss techno/house producer, who performed a track called Release Me, but not on this album

https://musicbrainz.org/artist/e9708921-d68d-43e2-a838-fdca7c68ad01

to the correct

Agnes - female Agnes Carlsson, Swedish Idol 2005 winner

https://musicbrainz.org/artist/54e93235-2034-478d-b47b-a985828193b7

 

Installation

Requires the latest PowerShell; run the script under the latest version of pwsh.exe.

Tested on Windows, although it should be platform-agnostic.

Requires an API key and URL in the config.

 

Issues and suggestions welcome.

 

Caveats

A bug in the beta with character encoding is causing an issue when matching artists with special characters. Keep in test mode until fixed.

 

Changes

V0.0.0.2

New - Only one example track from each album is potentially used (if one track is not found by album, a second is unlikely to be)

New - Add path of track to the log to aid fixing tags; if there is no match, the tags are almost certainly wrong.

Fix - attempted to zero variables correctly at end of the loop

Fix - references to hardcoded paths

 

 

ArtistFetcher v0.0.0.2.zip

ArtistFetcher v0.0.0.1.zip

Edited by ginjaninja
New functionality and Fixes
PenkethBoy

I gave this a run out

Some things i noticed

1. you have a reference to a:\ in your code for artist+title

2. The script has issues with Artists from AudioBook and MusicVideo libraries; I suggest excluding these from your selection of artists. AudioBooks may or may not have MBz entries. I also use AudioBook libraries for podcasts, and all the podcast artists were processed, which works but "fails" on all tests, so it is not worth the effort. MusicVideos can have Artists assigned, but these may not match Music Artists: I have MVs for Artists I don't have in Music, so they don't have songs to search against. These Artists are not visible directly in Emby, so there is no point in adding MBz to them.

3. When an Artist is not found on MBz, incorrect results are written to the CSV and would have been applied to the Artist had I not run in test mode

3.1 Variables are not cleared correctly when an Artist is new, i.e. values from the previously matched artist are used!!
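To illustrate point 3.1, here is a minimal sketch of resetting per-artist state at the top of each pass rather than at the end. `Find-MbzArtist` is a hypothetical stand-in for the real lookup, and the variable names are not taken from the script.

```powershell
# Hypothetical stand-in for the real MusicBrainz lookup: only 'Agnes' is "found".
function Find-MbzArtist {
    param([string]$Name)
    if ($Name -eq 'Agnes') {
        [pscustomobject]@{ Id = '54e93235-2034-478d-b47b-a985828193b7' }
    }
}

$results = foreach ($artist in 'Agnes', 'Some Podcast Host') {
    # Reset per-artist state first, so a miss can never inherit the previous match.
    $mbid = $null

    $hit = Find-MbzArtist -Name $artist
    if ($hit) { $mbid = $hit.Id }   # only assigned on a hit

    [pscustomobject]@{
        Artist  = $artist
        Matched = ($null -ne $mbid)
        Mbid    = $mbid
    }
}
```

Without the `$mbid = $null` line, the second artist would be reported with the first artist's id, because the assignment inside the `if` is skipped on a miss.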

Generally it worked OK, with the only PowerShell errors being for the a:\ drive at approx. line 230+

Note: re the character encoding - that will be because you are not passing the data back to Emby with a utf-8 flag in the content type of the POST - I had the same issue for a while till the penny dropped :)

ginjaninja

Thanks for feedback

Will fix these issues; understanding the scope of variables is still on my todo list.

Do you know an API syntax to filter the /item results of the artists lookup in line with your suggestion, e.g. only retrieve artists from libraries of type music, not music video or audiobook?

I think my issue with the character encoding is that the artists come out of the database with funny characters, which interferes with my searches on MBZ

e.g. the API gives me artist "Arne Domnérus"; indeed, when I look at the artist in Emby this is how the artist is shown. Do you know of a way to fix that during the retrieve from the API? @softworkz said it will be fixed soon in any case.

 

 


PenkethBoy

To get all artists "by library", add the parent id of the music library, or if you have more than one, cycle through the libraries

You can get this via querying the libraries, checking the type, then only applying your script to music libraries. Note: Audiobook libraries are just a chopped-down version of music libraries, so they have some of the features of music libraries

The character botch will be fixed as and when we get the next beta - hopefully

But the point I was trying to make was that if you have a string with unicode characters (not already mangled by Emby) and you don't return the JSON as utf-8, Emby will also mangle the import of that string - so there are two issues, one of which you can control - e.g. Björk will get mangled unless you flag the JSON as utf-8
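A minimal sketch of flagging the JSON as utf-8, assuming the update is sent with `Invoke-RestMethod`. `New-Utf8JsonPost` is a hypothetical helper, and the item-update URL in the usage comment is an assumption; use whatever endpoint the script already posts to.

```powershell
# Hypothetical helper: builds Invoke-RestMethod arguments for a UTF-8 JSON POST.
function New-Utf8JsonPost {
    param(
        [Parameter(Mandatory)][string]$Uri,
        [Parameter(Mandatory)]$Payload
    )

    $json = ConvertTo-Json -InputObject $Payload -Depth 10

    @{
        Method      = 'Post'
        Uri         = $Uri
        # Tell the server explicitly how the body is encoded.
        ContentType = 'application/json; charset=utf-8'
        # Send bytes rather than a string, so the body really is UTF-8 on the wire.
        Body        = [System.Text.Encoding]::UTF8.GetBytes($json)
    }
}

# Usage (URL shape is an assumption; $embyServerUrl, $itemId, $apiKey and $item come from your script):
# $post = New-Utf8JsonPost -Uri "$embyServerUrl/emby/Items/${itemId}?api_key=$apiKey" -Payload $item
# Invoke-RestMethod @post
```

Building the arguments separately from the call keeps the encoding step easy to check in test mode before anything is written back to Emby.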

Artist by Library example

$MediaUrl = $embyServerUrl + "/emby/Artists?Recursive=true&ParentId=e2c00f297a5f80af390f52f72e782147&Fields=providerids&SortBy=Name" + "&api_key=" + $User.AccessToken

You probably need to apply the ParentID when you pull back songs as well to narrow the search
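A rough sketch of the "query the libraries, check the type" step. `Select-MusicLibrary` is a hypothetical helper; the `/emby/Library/VirtualFolders` endpoint and the `CollectionType` / `ItemId` property names in the usage comment are assumptions to verify against your server's API docs.

```powershell
# Hypothetical helper: keeps only libraries whose CollectionType is 'music'
# (so 'audiobooks' and 'musicvideos' libraries are skipped).
function Select-MusicLibrary {
    param([Parameter(Mandatory)][object[]]$Libraries)
    $Libraries | Where-Object { $_.CollectionType -eq 'music' }
}

# Assumed usage against a live server ($embyServerUrl and $apiKey come from your config):
# $libs = Invoke-RestMethod "$embyServerUrl/emby/Library/VirtualFolders?api_key=$apiKey"
# foreach ($lib in Select-MusicLibrary -Libraries $libs) {
#     $MediaUrl = "$embyServerUrl/emby/Artists?Recursive=true&ParentId=$($lib.ItemId)&Fields=providerids&SortBy=Name&api_key=$apiKey"
#     # ...fetch and process the artists of this library only
# }
```

Cycling through the filtered libraries this way avoids hardcoding a single ParentId and leaves podcast and music video artists out of the run.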

ginjaninja
Posted (edited)

Been doing a bit more investigation, and there is quite a lot that is suboptimal about the script's ability to find matches; in some places it is worse than Emby, and in other places better. I am going to rewrite the detection logic so it provides the best of all worlds. The underlying fundamental of using additional data points to find matches is proving itself

Edited by ginjaninja

G0R3Nzh43W42

- Accidentally posted in the wrong topic -

Edited by G0R3Nzh43W42
