New Plugin - Actor Image Web Scraper


mickle026


** UPDATE **

There are two separate plugins on this page. You will need to read the thread to see which one (or both) is best for you.

The original plugin (Actor Web Scraper) works at the server dashboard; it cannot be used or accessed by clients.  This is my first proper working attempt at this.  It's a bit clumsy but it works.
I may revisit this one and update it, but for now it is what it is.

The newer plugin (Custom Person Provider) is internal and works when you click an Actor card, so it is also triggered by clients (but it is configured from the dashboard, which they cannot access).
This plugin can quickly be turned on/off in the config page.

-----------------------------------------------------

This is the first working beta release. Special thanks to @roaku for their help getting the JavaScript on the WebUI working :)

It only scrapes one actor at a time. I'm not going to make one that hits these websites continuously for the whole lot in one go, so don't ask.

Please feel free to try it, test it whatever.

 

Built with .NET Standard 2.0, so it should be compatible with most Emby releases.  I have added Linux support but have no way to test it, so let me know if it works :)

Feel free to suggest anything it lacks or improvements that could be made.

 

Enjoy :)

There is another plugin lower down in this thread, "custompersonprovider", which uses the internal Actor image system. It's in alpha stage, and it works separately from this one.  Please read the blurb if you are going to try it.

As always the blurb:
I accept no liability for anything - use at your own risk

 

Custom Person Provider     ** Updated 5 Feb 2021 ***

Added a few more scrapers, but also added the ability to pull images from other Emby servers

 

 

Screenshot_2021-01-24 Emby(1).png

Screenshot_2021-01-24 Emby(2).png

 

** UPDATE 25 JAN 2021 WITH FIXES **

 

 

*** NEW 3rd Feb 2021 ***

Added Internal Image Search Scraper Plugin.  This is separate from the Actor Web Scraper.  It is called Custom Person Provider because it works when you click the actor image inside Emby.

Try not to enable all the scrapers, as it will slow down load times for clients who click on Actors that DO NOT already have an image.  I have added a config page so that scrapers can be turned on/off.

Trakt images are webp. They work well, but may cause an image processing error in the Emby log when it tries to resize the image as it accesses it.  I am currently not sure whether all client apps support webp either; all of the ones I tested do, but that's not all devices.

 

 

 

Screenshot_2021-02-03 Emby(7).png

Screenshot_2021-02-03 Emby(8).png

 

 

Screenshot_2021-02-04 Emby(1).png



 

 

Screenshot_2021-02-05 Emby(1).png

Screenshot_2021-02-05 Emby(2).png

ActorWebScraper.zip

ActorWebScraper-server 4.6.0.20.zip

 

CustomPersonProvider.zip

mickle026

I am aware there is an index issue with the list when the list is quite large.

However the actor found is still correctly added to the right actor in Emby.  I will fix this as soon as I figure out why.

** UPDATED INFO  AND PLUGIN DLL FILE **

I figured out what was causing the indexing problem on my system. I'm not sure how to fix it yet on the WebUI side, but it is caused by Actor names that were incorrectly imported from one of the metadata providers.

My plugin exports actor names with ids from Emby to the WebUI. These are sent from the server in json format and then split into the list by commas.  When a single name entry itself contains commas, it is split into several list entries, so the number of actors exceeds the number of ids.  That's where there is sometimes a discrepancy.
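To illustrate the mismatch, here is a minimal Python sketch (the plugin itself is not written in Python, and this is not its actual code). The `people` payload, `split_flat` and `keep_pairs` are made-up names for illustration only. It shows how splitting on commas inflates the name count, and how keeping each name and id together as one json object avoids that.

```python
import json

# Hypothetical payload: the second entry is a malformed "name, name, name" person.
people = [
    {"name": "Che Broadway", "id": 309769},
    {"name": "Michelle Dockery, Harry Lloyd, Julie Walters", "id": 309800},
]

def split_flat(people):
    """Fragile approach: join the names into one string, then split on commas."""
    flat = ",".join(p["name"] for p in people)
    names = [n.strip() for n in flat.split(",")]
    ids = [p["id"] for p in people]
    return names, ids

def keep_pairs(people):
    """Safer approach: send name/id pairs as json objects and never split on commas."""
    payload = json.dumps(people)
    return [(p["name"], p["id"]) for p in json.loads(payload)]

names, ids = split_flat(people)
print(len(names), len(ids))  # more names than ids, so the list indexes drift

pairs = keep_pairs(people)
print(len(pairs))            # one pair per person, malformed or not
```

With pairs, a malformed entry still shows up as one (wrong) name, but it stays attached to its own id.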

So in a nutshell if you experience this error, go to your emby log and look for an entry like:
 

 Info WebScraper: Person Name: Michelle Dockery, Harry Lloyd, Julie Walters, Alun Armstrong, Stephen McCole, Simon Russell Beale, Tom Georgeson, Joe Armstrong , id: 309800

instead of like:

Info WebScraper: Person Name: Che Broadway , id: 309769

ie, 1 person = 1 id

and fix it in your metadata first then try again.

The log will now tell you where there is probably a problem with your metadata:

2021-01-24 20:59:38.752 Info WebScraper: --------------------------------------------------------------------------
2021-01-24 20:59:38.752 Info WebScraper: METADATA PROBLEM: Jonathan Slavin, Lukita Maxwell, Marin Hinkle, , id: 307244
2021-01-24 20:59:38.752 Info WebScraper: Actor will not be found if this is not just one name
2021-01-24 20:59:38.752 Info WebScraper: actorname as part - is wrong formatting for a thumbnail to be found
2021-01-24 20:59:38.752 Info WebScraper: (director) (co-director) (creator) will also cause problems when finding thumbs
2021-01-24 20:59:38.752 Info WebScraper: --------------------------------------------------------------------------

I have updated the .dll and re-uploaded it with several fixes and a few more image sources.

 

Nearly all of these errors are from TVDB, maybe from an earlier issue with either TVDB or the TVDB plugin. Whatever the reason, refreshing your metadata "should" fix this; if not, you will at least be able to find the issues more easily.  Anything reported like James (I) as a metadata error can be ignored.  The plugin is simply reporting brackets as a possible error.
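For anyone curious what this kind of check can look like, here is a rough Python sketch. It is an assumption based only on the log messages above (commas, brackets, and "actor as role"), not the plugin's real code, and `metadata_warnings` is a made-up name.

```python
import re

def metadata_warnings(name):
    """Return the reasons a person name looks like bad metadata (empty list if none)."""
    reasons = []
    if "," in name:
        reasons.append("comma: possibly several names in one entry")
    if re.search(r"[()]", name):
        reasons.append("brackets: possibly a role such as (director)")
    if re.search(r"\sas\s", name):
        reasons.append("'as': possibly 'actor as role'")
    return reasons

for name in ["Che Broadway", "Jonathan Slavin, Lukita Maxwell, Marin Hinkle, ", "James (I)"]:
    print(repr(name), metadata_warnings(name))
```

A check this simple also flags harmless names like James (I), which is why those reports can be ignored.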

 

Screenshot_2021-01-24 Emby(3).png

Screenshot_2021-01-24 Emby(4).png

mickle026

** Updated the Plugin DLL in the first post

You might find a few people you need to double check. The last image shows Tom Russell: there are two, one is a singer and one is a young boy, and in this case I needed the young boy.

1. Fixed the indexing issue.
2. Added 3 more scraped sites to the search.
3. Added a search list, in case you want to look up someone specifically (case sensitive at the moment, though).
4. Fixed a JavaScript error.
5. Fixed a bug where leaving the page and re-entering, or clicking Load Actors, would double the list size.

Screenshot_2021-01-25 Emby.png

Screenshot_2021-01-25 Emby(1).png

Screenshot_2021-01-25 Emby(2).png

Screenshot_2021-01-25 Emby(4).png

Happy2Play

Yep it appears to work, but refreshing the person in most cases gets the image too, as the majority of the people this plugin shows have no metadata, so the server has not truly processed them either.  But I have only looked at a few examples.

   

 

mickle026

The plugin shows *ONLY* people whose thumbnails are missing, so they HAVE NOT been grabbed by a metadata refresh.

It is false to say ALL the people have no metadata, as some do but still do not have thumbnails.

The server has not processed them because they don't exist on the metadata providers yet, or for whatever reason they are not imported.

Yet they do exist elsewhere on the internet, and that's where this plugin is handy.

 

Quote

but refreshing the person in most cases gets the image also as the majority of the people this plugin shows has not metadata so the server

This statement is not truly accurate either:


1. Some people have metadata but no image, none available on the metadata providers (yet)

2. If you have already refreshed them and they updated they wouldn't appear in this list.

 

Newer installs, and people whose libraries don't have older pre-existing metadata, will notice that there are actually lots of images missing from tmdb, tvdb and omdb.  Even newer titles don't have images for some of the main cast.

Also TVDB has a LOT of errors, which this plugin will point out in the logs.  Refreshing at episode level will fix some, but others refresh and remain the same, i.e. still wrong!

In the current released beta, refreshing at series level doesn't refresh properly at episode level with tvdb.

Not being funny either, but refresh metadata isn't always fetching the correct links.

This is Felice Orlandi fetched by metadata refresh. However, using the search icon, moviedb supplies a completely different image, so what the heck is the first image provided by metadata refresh? It's not 100% reliable yet.  It looks like a screengrab to me, although that is turned off.

Also attached is a person that refreshes but no thumbnail is fetched.
Roger Spottiswoode

 

 

Screenshot_2021-01-25 Emby(5).png

Screenshot_2021-01-25 Emby(6).png

Screenshot_2021-01-25 Emby(7).png

Happy2Play
5 hours ago, mickle026 said:

It is false to say ALL the people have no metadata, as some do have but still do not have thumbnails. 

True it is just a coincidence the 20 I checked had no metadata.

mickle026

 

2 hours ago, Happy2Play said:

True it is just a coincidence the 20 I checked had no metadata.

So what?

No metadata - So then no-one is going to want as full of a lineup of actor images if possible? ... okie dokie ...

But thanks for pointing out we can click metadata refresh and get actor images that metadata refresh didn't get, still cannot get, and is not yet on the metadata providers websites but this plugin can possibly get elsewhere so this plugin is pointless because we can refresh metadata and still get nothing in these cases :) .  Makes perfect sense to me .... i think . LOL.

Plugin only gets images, nothing else.  Users can still refresh metadata at a later date and see if it has been updated and exists.  It doesn't interfere with any of that.

It's their choice if they want to use it, and your choice not to if you think it's pointless.

:)   you're welcome btw, and thanks for the positivity :)

Brendon

This is exactly what i have been looking for, testing now on a massive library, will report back 🙂

EDIT: It's still scraping and showing a lot of errors in my metadata (thanks for showing me all the work that needs doing haha), but it does seem to be having trouble with any abbreviation in a name such as M.D, Jnr or Snr, as shown below:

2021-01-26 12:35:02.412 Info WebScraper: --------------------------------------------------------------------------
2021-01-26 12:35:02.412 Info WebScraper: POSSIBLE METADATA PROBLEM: Ross Bagdasarian, Sr. , id: 37947
2021-01-26 12:35:02.412 Info WebScraper:  Actor will not be found if this is not just one name
2021-01-26 12:35:02.412 Info WebScraper:  actorname 'as' role - is wrong formatting for a thumbnail to be found
2021-01-26 12:35:02.412 Info WebScraper: (director) (co-director) (creator) will also cause problems when finding thumbs
2021-01-26 12:35:03.707 Info WebScraper: --------------------------------------------------------------------------

 

2021-01-26 12:35:04.919 Info WebScraper: --------------------------------------------------------------------------
2021-01-26 12:35:04.919 Info WebScraper: POSSIBLE METADATA PROBLEM: Ben Young, III , id: 168130
2021-01-26 12:35:04.919 Info WebScraper:  Actor will not be found if this is not just one name
2021-01-26 12:35:04.919 Info WebScraper:  actorname 'as' role - is wrong formatting for a thumbnail to be found
2021-01-26 12:35:04.919 Info WebScraper: (director) (co-director) (creator) will also cause problems when finding thumbs
2021-01-26 12:35:05.022 Info WebScraper: --------------------------------------------------------------------------

 

2021-01-26 12:35:05.239 Info WebScraper: --------------------------------------------------------------------------
2021-01-26 12:35:05.239 Info WebScraper: POSSIBLE METADATA PROBLEM: Elizabeth Ignacio, M.D. , id: 172023
2021-01-26 12:35:05.239 Info WebScraper:  Actor will not be found if this is not just one name
2021-01-26 12:35:05.239 Info WebScraper:  actorname 'as' role - is wrong formatting for a thumbnail to be found
2021-01-26 12:35:05.239 Info WebScraper: (director) (co-director) (creator) will also cause problems when finding thumbs
2021-01-26 12:35:05.247 Info WebScraper: --------------------------------------------------------------------------


Also that dark blue text needs to be changed, on a dark background it's impossible to read.

2021-01-26_12-53-35.png

mickle026
13 hours ago, Brendon said:

This is exactly what i have been looking for, testing now on a massive library, will report back 🙂

EDIT: It's still scraping and showing a lot of errors in my metadata [...] but it does seem to be having trouble with any abbreviation in a name such as M.D, Jnr or Snr

 

Thanks for this, it helps.

I am currently working to improve quite a few things and this will help.
 

Quote

Also that dark blue text needs to be changed, on a dark background it's impossible to read.

I will look into the colours too.
 

With Ben Young, III - I think that would probably be correctly reported, as there isn't a comma in most of the web links I found, but Sr and M.D. are different.

https://www.imdb.com/name/nm7379295/

https://www.themoviedb.org/person/2426865-ben-young-iii

Ross Bagdasarian, Sr. doesn't have the Sr. on most lookups, so that may stop images from being found, but Jr. does. I'm not sure which is the best way to go with that one.
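One possible way to handle this, sketched in Python purely as an idea (it is not something the plugin currently does), is to try several spellings of the same name: as stored, without the comma, and without the suffix. The suffix list and the `search_variants` name are assumptions taken only from the examples in this thread.

```python
import re

# Assumed suffix list, based only on the names reported in this thread.
SUFFIX = re.compile(r",?\s+(Sr\.?|Jr\.?|Snr\.?|Jnr\.?|II|III|IV|M\.D\.?)$")

def search_variants(name):
    """Return search strings to try, in order, for a name with an optional suffix."""
    name = name.strip()
    variants = [name]
    match = SUFFIX.search(name)
    if match:
        base = name[:match.start()]
        variants.append(f"{base} {match.group(1)}")  # same name without the comma
        variants.append(base)                        # same name without the suffix
    # Drop duplicates while keeping the order.
    unique = []
    for v in variants:
        if v not in unique:
            unique.append(v)
    return unique

for name in ["Ben Young, III", "Ross Bagdasarian, Sr. ", "Elizabeth Ignacio, M.D. "]:
    print(search_variants(name))
```

Trying the variants in order would mean extra lookups per person, so this trades a few more hits on the sites for a better chance of a match.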

 

Has it shown you any other errors in metadata that do need fixing - like my example in a tv series above where some names are "name, name, name" ?

mickle026

** Updated to server version 4.6.0.20 **

I have left the older version on here, because this one isn't tested with older server versions.

Includes some scraper fixes
Includes a search box fix

Much faster, improved by using a different method to parse HTML and retrieve links.

More results in some cases.

 

I've not fixed the blue on black yet, or the typo, sorry. That'll get done later.

 

If you see any broken images, they are probably broken on the websites too.
 

 

Screenshot_2021-01-27 Emby.png

mickle026
2 hours ago, MRobi said:

@mickle026 Question... is this process automated, or do you need to initiate the search and then go through and approve the results one by one?

One by one

A few reasons,

1. There is no 100% guarantee that they are the correct images. Most searches are accurate, but a few may not be, especially where several Actors have the same name.

2. There is no way to sort or rank the images, as they come from external sites.  So automatically choosing the best one in code is impossible.

3. Automation would barrage the sites with traffic, so they would likely change things so that this plugin cannot access them.

mickle026

Here is a smaller search plugin. There is absolutely nothing to configure; even trying to get to the config pages will do nothing.

It works internally, through the person images you can click in the normal media views. It's currently in alpha state (not even beta), but it does work.

Warning:

It might slow down your current image search by a few seconds, or a person page load if the person has no image, while it looks for one. Remove it if you're not happy; it's not that fast at the moment.
Please ignore the provider names.

The images only download when you click a person or search the image.

Check the laws in your country if you are allowed to web scrape.  You are responsible as the end user.

again, no warranty whatsoever.

built and tested with 4.6.0.22 beta server

Screenshot_2021-02-01 Emby(3).png

Screenshot_2021-02-01 Emby(2).png

 

Screenshot_2021-02-01 Emby(4).png

Screenshot_2021-02-01 Emby(5).png

 

tr0nllam

I'm really pleased with this new plugin, but it seems to prevent the deletion of actor images because it immediately tries to grab a new image when you delete the current one.

mickle026

That's Emby behaviour, not the plugin.  It does the same with other fetchers such as TheMovieDB or TheTVDB, and with Artists in Music etc.

Emby polls the fetchers if there is no image, for (you guessed it) an image.  So when you delete one, it fetches an available image straight away.  I don't personally think it should do it after a delete.  I agree that it can be annoying if the image it's grabbing isn't the one you want, or is just plain wrong.

I have a few where I just had to upload a blank rectangle to overwrite it, to stop it.  You can still add (+) a correct image or an image of your choice over the fetchers, and that will also stop it (because the image isn't missing then).

This is, as stated, an Alpha release, so there are no controls in it at the moment; only uninstalling it will stop it.  The same goes for page load times at the moment.
I'm not sure where I am going with this at the moment, but there will be a config page later on and you will be able to turn scrapers on/off.  So effectively you could turn them all off and it would sit dormant until you turn one on.  That's the idea I have at the moment.

It was built as a test plugin to see if I could actually do this for people, as part of a different plugin that I have in design, which also means I may be able to do it for Artists too.  There is absolutely no documentation on this, so it's all trial and error.  I am not a professional programmer either, just learning on the go as it were.  We shall see where I go from here ...

mickle026

Updated the Internal Person Provider plugin with more scrapers and the option to turn them on or off.

It's now listed in the first post as Custom Person Provider.

tr0nllam
7 hours ago, mickle026 said:

Updated the Internal Person Provider plugin with more scrapers and the option to turn them on or off.

Its now listed in the first post called Custom Person Provider

Small bug, but I think IMDB is being labeled as Mubi in the search results.

mickle026
3 hours ago, tr0nllam said:

Small bug, but I think IMDB is being labeled as Mubi in the search results.

I have fixed this, thanks for pointing it out. I also added a couple more scrapers.  If you run this in a browser with the plugin config on one tab and your page on another, you can turn them on/off as you look for the actors.

Try not to use scrapers that are not relevant for a particular search.  So don't turn the Marvel cinema scraper on for stuff that's not Marvel, etc.  That way we reduce hits on their sites.

 

Re-uploaded to first post, I think this may be near to being a beta now :)

 

Screenshot_2021-02-04 Emby(1).png

 

Screenshot_2021-02-04 Emby(4).png

mickle026

Updated in first post

 

Added more scrapers

Added a live ON/OFF button to toggle the plugin ON/OFF without having to change the config

Added Friendly servers.  You now have the ability to pull people images from other Emby servers (each on a new line in the text box)
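As an illustration of the "one server per line" format, here is a small Python sketch of how such a text box could be read. It is an assumption about the general approach, not the plugin's code; `parse_servers` and the example addresses are made up.

```python
from urllib.parse import urlsplit

def parse_servers(text):
    """Turn the text box contents into a clean list of server base URLs."""
    servers = []
    for line in text.splitlines():
        url = line.strip().rstrip("/")
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        # Keep only entries that look like http(s) addresses, and skip repeats.
        if parts.scheme in ("http", "https") and parts.netloc and url not in servers:
            servers.append(url)
    return servers

# Hypothetical text box contents.
box = "http://192.168.1.10:8096/\n\n  https://emby.example.org \nnot a url\nhttp://192.168.1.10:8096"
print(parse_servers(box))
```

Blank lines, stray spaces, trailing slashes and repeated entries are all tolerated, so a slightly messy list still works.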

I am also aware of an issue with TVDuck: Emby is having trouble pulling the images as it cannot identify the image type at the moment, yet it still shows the thumbnail?? Weird.
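For reference, an image type can usually be worked out from the first few bytes of the file, whatever the URL or file extension says. The Python sketch below shows that general technique for JPEG, PNG, GIF and WebP. It is not how Emby identifies images and I don't know that it would solve the TVDuck issue; `sniff_image_type` is a made-up name.

```python
def sniff_image_type(data: bytes):
    """Guess the image format from its leading bytes; return None if unknown."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None

print(sniff_image_type(b"RIFF\x00\x00\x00\x00WEBPVP8 "))
print(sniff_image_type(b"<html>not an image</html>"))
```

A result of None for a downloaded "image" often means the site returned an HTML page instead of the picture.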

Screenshot_2021-02-05 Emby(1).png

 

Screenshot_2021-02-05 Emby(3).png

Screenshot_2021-02-05 Emby(2).png

 

Screenshot_2021-02-05 Emby(6).png

mickle026

Fixed the ON/OFF button saying OFF when you left the plugin page and went back to it.  It was still ON just not updating the view when the page was reloaded.

Added 2 more scrapers.

mickle026

** UPDATED IN FIRST POST **

 

Improved the On/Off button behaviour and the look/aesthetics.

Added a timeout slider, so you can adjust for sites that take longer to respond.

Both the on/off button and the timeout slider update/save instantly, so you don't need to save settings every time for these, only for which scrapers are enabled.

 

 

Screenshot_2021-02-07 Emby.png
