Ever thought about (ML.Net) Machine Learning Media Recommendations?


31 replies to this topic

#1 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 18 March 2019 - 06:56 PM

I was browsing Github and I came across the repo for ML.Net.

What d'ya know, there is an entire machine learning sample page and project based specifically on movie recommendation.

I realize that media recommendation is already written quite well in Emby; however, adding a machine learning aspect to the server kind of puts it in a whole new league.

I don't think any of the other media server platforms have jumped on the bandwagon.

Here is the link to the project:

https://github.com/d...eRecommendation

It is currently using a MovieLens dataset, but I'm pretty sure a more personalized dataset based on user libraries would be neat, or at least unique.

That's cool, I think...

Edited by chef, 18 March 2019 - 07:18 PM.


#2 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 131161 posts
  • Local time: 02:14 PM

Posted 18 March 2019 - 09:29 PM

Possibly, but given that we're dealing with personal media, the number of possible suggestions is not that large. It would be a nice buzzword to throw around, though.

#3 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 22 March 2019 - 02:20 PM

I have been doing quite a bit of research on how ML.Net could be used to improve the Emby server experience.

I have started a proof-of-concept .NET Core console app, which is kind of interesting.

I read through both the ML.Net GitHub repo and this MSDN page:

https://docs.microso...recommmendation

I think it is possible to do something really cool with these libraries.

 

Basically, the machine learning algorithm starts by calculating whether a user should be recommended an item based on other users' acceptance of the same item.

               Incredibles 2 (2018)    The Avengers (2012)    Guardians of the Galaxy (2014)
User 1         Watched and liked       Watched and liked      Watched and liked
User 2         Watched and liked       Watched and liked      Has not watched -- RECOMMEND

Not super smart, but things can get really interesting when larger datasets are fed to a deeper learning algorithm.
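A minimal sketch of the logic in the table above, written in Python rather than C#, so it only illustrates the idea and is not ML.NET's algorithm. Each user is a set of liked titles, other users are weighted by how much their likes overlap with the target user's (Jaccard similarity, which is my assumption here), and every title the target hasn't seen is scored by those weights.

```python
# Toy user-based collaborative filtering over "liked" sets, mirroring the
# two-user table. Jaccard similarity is an illustrative choice, not ML.NET's.

def jaccard(a: set, b: set) -> float:
    """Overlap between two users' liked sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes: dict, target: str) -> list:
    """Score items the target hasn't seen by the similarity of users who liked them."""
    seen = likes[target]
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(seen, items)
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


likes = {
    "User 1": {"Incredibles 2", "The Avengers", "Guardians of the Galaxy"},
    "User 2": {"Incredibles 2", "The Avengers"},
}
print(recommend(likes, "User 2"))  # ['Guardians of the Galaxy']
```

With only two users this just reproduces the table; the weighting starts to matter once many users overlap partially.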

 

But to make something like that happen, there would need to be a control group of Emby users willing to submit some viewing information.

Specifically, they would run a console app that gathers anonymous like/favorite information and creates a CSV for the learning program ("cringy", I know).

Once that information was gathered, ML.Net would be able to work its magic on the outcome and create a fairly deep AI to recommend movies.

These recommendations would be based on more than just a shallow calculation:

Actors, Genres, Titles, Years, and my favorite -> Media Overview Sentiments.
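To make "more than a shallow calculation" concrete, here is a hedged content-based sketch in Python: each item's metadata is turned into tags (genre, actor, decade), and unseen items are ranked by tag overlap with what the user liked. The catalog, titles and tag scheme are hypothetical; an overview sentiment score could be added as another feature but is left out to keep it short.

```python
# Toy content-based ranking: compare items by tags built from their metadata.
# The catalog below is hypothetical placeholder data.

def features(item: dict) -> set:
    """Turn metadata into comparable tags like 'genre:Action' or 'decade:2010s'."""
    tags = {f"genre:{g}" for g in item["genres"]}
    tags |= {f"actor:{a}" for a in item["actors"]}
    tags.add(f"decade:{item['year'] // 10 * 10}s")
    return tags


def rank_unseen(catalog: dict, liked: set) -> list:
    """Rank items the user hasn't liked by Jaccard overlap with the liked items' tags."""
    profile = set().union(*(features(catalog[t]) for t in liked))
    scores = {}
    for title, item in catalog.items():
        if title in liked:
            continue
        tags = features(item)  # never empty: every item has a decade tag
        scores[title] = len(tags & profile) / len(tags | profile)
    return sorted(scores, key=lambda t: (-scores[t], t))


catalog = {
    "Movie A": {"genres": ["Action", "Sci-Fi"], "actors": ["Actor 1", "Actor 2"], "year": 2012},
    "Movie B": {"genres": ["Action", "Comedy"], "actors": ["Actor 2", "Actor 3"], "year": 2014},
    "Movie C": {"genres": ["Drama"], "actors": ["Actor 4"], "year": 1994},
}
print(rank_unseen(catalog, {"Movie A"}))  # ['Movie B', 'Movie C']
```

Unlike the user-overlap approach, this works even for a single user, which may matter on servers with few accounts.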

 

Fairly deep learning.

It's pretty nifty stuff.

But the submission of likes/favorites is where I understand people would be hesitant.


Edited by chef, 22 March 2019 - 03:07 PM.


#4 BillOatman OFFLINE  

BillOatman

    Advanced Member

  • Members
  • 313 posts
  • Local time: 02:14 PM

Posted 22 March 2019 - 04:27 PM

Could you not use IMDB as your data source? They have an API that I believe would let you get data at the level you are talking about.

Oh, maybe not. To train a model you need individual ratings on a movie per user, and it doesn't look like the IMDB APIs give you that detail. Too bad; it would have been a perfect source :)


Edited by BillOatman, 22 March 2019 - 04:41 PM.


#5 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 22 March 2019 - 09:16 PM

This just got more interesting.

 

I was able to use my TMDb developer keys to create a really nice dataset for the ML CSV.

The survey app finds each user's favorite media, then gets the TMDb ID for each item.

This way the machine can learn the best way it knows how: by comparing numbers.

The output looks like this:

user,mediaId
2,10681
0,49047
0,20526
0,196
0,300668
0,155
0,299536
0,127380
0,335984
0,353081
0,369972
0,330459
0,75612
0,209112
0,12
0,339403
0,363088
0,102899
0,13475
0,24428
0,284053
0,324849
0,118340
0,10138
0,474395
0,10681
2,10437
2,260514
2,862
2,10681

The CSV above covers three users, but user 1 doesn't have any favorited or liked content; only users 0 and 2 appear.
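A small Python sketch for reading a survey file like the one above into per-user sets. It assumes the two-column user,TMDb-id layout shown here and skips the header line; using sets also collapses repeated rows, such as the two "2,10681" lines in this sample.

```python
import csv
import io
from collections import defaultdict


def load_likes(text: str) -> dict:
    """Group survey rows of the form 'user,tmdbId' into {user: set of TMDb ids}."""
    likes = defaultdict(set)
    for row in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in row if f.strip()]
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip the header line, blank lines and malformed rows
        user, tmdb_id = map(int, fields)
        likes[user].add(tmdb_id)
    return dict(likes)


sample = """user, mediaId,
2,10681
0,49047
0,20526
2,10437
2,10681
"""
likes = load_likes(sample)
print({user: len(ids) for user, ids in sorted(likes.items())})  # {0: 2, 2: 2}
```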

 

If there were a bunch of volunteers, the dataset could get big enough for the learning algorithm to actually predict things.


Edited by chef, 22 March 2019 - 09:17 PM.


#6 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 23 March 2019 - 10:27 AM

Well, if anyone wants to participate in this little survey, just run the attached exe and post the CSV file it creates back in this thread.

Attached File: ML Recommendation Survey.zip (755.48KB)

It doesn't save any personal info; it just looks for items you liked or favorited, then finds the TMDb ID for each item and places it in a CSV file.

Note: Make sure you have actually "Liked" items in your library, or else it won't work. LOL


Edited by chef, 23 March 2019 - 10:28 AM.


#7 PenkethBoy OFFLINE  

PenkethBoy

    Advanced Member

  • Members
  • 3280 posts
  • Local time: 07:14 PM
  • LocationWarrington,UK

Posted 23 March 2019 - 12:19 PM

Chef - Interesting stuff :)

Question re the exe above: how is it getting data, how do you tell it which server to use, how is it authenticating, etc.? A tad light on details :)

Not keen on downloading and running an exe without some idea of what it will do, for obvious reasons :P



#8 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 23 March 2019 - 12:42 PM

Yes, I completely understand.

Using the Emby API:

1. It does a UDP broadcast ("whoIsEmbyServer?") to locate the server on the network and get its connection IP.

2. The console app will ask you to log in. (Authenticating as an admin user would probably be best, ideally one with lots of views and likes/favorites.)
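Here is a rough Python sketch of step 1 that keeps every server that answers rather than only the first, which would cover the multiple-servers case raised later in the thread. The discovery string "who is EmbyServer?", UDP port 7359 and the JSON reply fields (Name, Address, Id) are my assumptions about Emby's discovery protocol, not taken from the survey app, so check them against the Emby API documentation before relying on them.

```python
import json
import socket
import time

# Assumptions, not confirmed by the survey app: the discovery text, the port,
# and the JSON reply fields (Name, Address, Id).
DISCOVERY_MESSAGE = b"who is EmbyServer?"
DISCOVERY_PORT = 7359


def parse_reply(data: bytes) -> dict:
    """Decode one discovery reply; raises ValueError if it is not a JSON object."""
    info = json.loads(data.decode("utf-8"))
    if not isinstance(info, dict):
        raise ValueError("unexpected discovery reply")
    return {"name": info.get("Name"), "address": info.get("Address"), "id": info.get("Id")}


def discover_servers(timeout: float = 2.0) -> list:
    """Broadcast once and collect every reply that arrives before the deadline."""
    servers = []
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(DISCOVERY_MESSAGE, ("255.255.255.255", DISCOVERY_PORT))
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                break
            try:
                servers.append(parse_reply(data))
            except ValueError:  # also covers bad JSON and bad UTF-8
                continue
    return servers
```

A console app could then print each entry of discover_servers() with its index and let the user pick one, instead of silently using whichever server replies first.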

 

It will then scan the Emby database for each user's items marked as "favorite" or "liked".

Nothing else is saved; everything will be completely anonymous.

Each user is assigned a number (0 to user.count - 1, no names), and if the app sees that user[0] likes a particular movie or series, it goes online to TMDb and gets the TMDb ID for the item.
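A sketch of that anonymization step in Python, assuming the app already has each Emby user ID mapped to that user's TMDb IDs (the user IDs below are placeholders). Users are numbered 0..N-1 and only the numbers are written out; a user with no likes still uses up a number, which is consistent with user 1 having no rows in the sample. Ordering users by sorted ID is an arbitrary choice made here.

```python
import csv
import io


def anonymize(favorites: dict) -> list:
    """favorites maps an Emby user id to that user's TMDb ids.
    Each user becomes a number 0..N-1 (sorted by id, an arbitrary but stable
    order); the ids themselves are never written out."""
    rows = []
    for number, user_id in enumerate(sorted(favorites)):
        for tmdb_id in favorites[user_id]:
            rows.append((number, tmdb_id))
    return rows


def to_csv(rows: list) -> str:
    """Serialize (user number, TMDb id) rows in the survey's two-column layout."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder user ids; "user-c" has no likes but still gets a number.
favorites = {"user-b": [10681, 10437], "user-a": [49047], "user-c": []}
print(to_csv(anonymize(favorites)), end="")
```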

 

It then creates a CSV file which looks like this:

2,10681
0,49047
0,20526
0,196
0,300668
0,155
0,299536
0,127380
0,335984
0,353081
0,369972
0,330459
0,75612
0,209112
0,12
0,339403
0,363088
0,102899
0,13475
0,24428
0,284053
0,324849
0,118340
0,10138
0,474395
0,10681
2,10437
2,260514
2,862
2,10681

The first integer is the user number (user[int]) and the second integer is the TMDb ID.

Afterward, that CSV file can be attached here, and I can create a master CSV file (dataset) that we can feed to the machine learning algorithm.
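One detail for the master file: every submission numbers its users from 0, so user 0 in one file and user 0 in another are different people. A hedged Python sketch of a merge that shifts each file's user numbers past the ones already used (the rows are assumed to be parsed into (user, TMDb ID) tuples already):

```python
def merge_submissions(submissions: list) -> list:
    """Each submission is a list of (user, tmdb_id) rows numbered from 0.
    Shift each file's user numbers past the ones already used so that
    different people never share a number in the master dataset."""
    merged = []
    offset = 0
    for rows in submissions:
        merged.extend((user + offset, tmdb_id) for user, tmdb_id in rows)
        if rows:
            offset += max(user for user, _ in rows) + 1
    return merged


file_a = [(0, 155), (2, 12)]  # user 1 in this file had no likes
file_b = [(0, 155)]
print(merge_submissions([file_a, file_b]))  # [(0, 155), (2, 12), (3, 155)]
```

Users with no rows at the end of a file do not reserve a number, which is harmless because they contribute nothing to training.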

 

The more information, the better the predictions will be.

If we can create a massive dataset and show someone (like Luke, for instance) that a machine learning algorithm might predict and recommend media items better than some conditional statements, Emby could be the first "smart" media server with an actual AI inside it.

I get giddy just thinking about that!


Edited by chef, 23 March 2019 - 12:49 PM.

  • horstepipe likes this

#9 PenkethBoy OFFLINE  

PenkethBoy

    Advanced Member

  • Members
  • 3280 posts
  • Local time: 07:14 PM
  • LocationWarrington,UK

Posted 23 March 2019 - 12:59 PM

Thanks

Ok, a couple of follow-up questions :)

1. I have four servers on my network; will the exe allow me to choose which server to run this against? (I could turn three off, but would prefer not to.)

2. Why go online for a TMDB ID when it already exists in the DB? I can understand going online if it's missing.

3. Curious: what's the minimum dataset size needed for the ML to work? Obviously more is better, but I'm wondering whether a server with a few users could benefit from a "local" instance of ML rather than a central DB of "likes".


  • chef likes this

#10 PenkethBoy OFFLINE  

PenkethBoy

    Advanced Member

  • Members
  • 3280 posts
  • Local time: 07:14 PM
  • LocationWarrington,UK

Posted 23 March 2019 - 01:03 PM

Ha, got me thinking now :)

I have more than one movie library ("Movies" and "Movies 4K"), so a user's likes in "Movies" may be duplicated in "Movies 4K". That is likely to produce double likes for some movies; is that a problem?
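Since both libraries should resolve the same film to the same TMDb ID, duplicates show up as repeated (user, TMDb ID) pairs and can be dropped before training. A one-line Python sketch, assuming the rows are already parsed into tuples:

```python
def dedupe(rows: list) -> list:
    """Drop repeated (user, tmdb_id) pairs, keeping the first occurrence and the
    original order (dict keys preserve insertion order in Python 3.7+)."""
    return list(dict.fromkeys(rows))


# User 0 liked TMDb 155 in both "Movies" and "Movies 4K".
rows = [(0, 155), (0, 155), (0, 12), (1, 155)]
print(dedupe(rows))  # [(0, 155), (0, 12), (1, 155)]
```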


Edited by PenkethBoy, 23 March 2019 - 01:08 PM.

  • chef likes this

#11 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 23 March 2019 - 01:58 PM

1. That's a good question. I'm not sure how the API handles the UDP broadcast when there is more than one server; I think it will return the first one that answers (first or default).

2. I couldn't find the TMDB IDs in the Emby API. I realize they are saved somewhere, but I wasn't sure which namespace they were located in.

I'll look again, because I agree that requesting them from TMDB is an added expense for the survey utility, and on my API key too :).

3. I'm not exactly sure what a minimum dataset should be. There are a couple of examples on GitHub that are pretty large: a couple thousand lines.

From what I gather reading the ML GitHub, it will use a "regressive algorithm" to compare likes between a bunch of people. So even if a person were to submit a short list, it could still be used to add nodes in a neural network, which would help the regression better predict a recommendation.
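For reference, the movie recommendation sample in the ML.NET docs is built on matrix factorization rather than a neural network: each user and each movie gets a small vector of learned numbers, and their dot product is the predicted score. Below is a toy pure-Python version trained with SGD. Because the survey only records likes, unliked pairs are sampled and treated as zeros; that is a common implicit-feedback trick and an assumption on my part, and ML.NET's own trainer has separate options for like-only data. The data and hyperparameters are illustrative only.

```python
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(likes, n_users, n_items, k=4, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn k numbers per user and per item with SGD so that
    dot(users[u], items[i]) moves toward 1 for liked pairs and toward 0 for
    randomly sampled pairs the user has not liked (one sample per like)."""
    rng = random.Random(seed)
    users = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    liked = set(likes)
    for _epoch in range(epochs):
        samples = [(u, i, 1.0) for u, i in likes]
        for u, _item in likes:
            j = rng.randrange(n_items)
            if (u, j) not in liked:
                samples.append((u, j, 0.0))
        rng.shuffle(samples)
        for u, i, target in samples:
            err = target - dot(users[u], items[i])
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


def positive_loss(likes, users, items):
    """Mean squared error on the liked pairs only."""
    return sum((1.0 - dot(users[u], items[i])) ** 2 for u, i in likes) / len(likes)


# Hypothetical (user, item) index pairs, e.g. rows of a merged survey CSV
# after TMDb ids have been mapped to 0-based item indexes.
likes = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 1), (2, 2)]
users, items = train_mf(likes, n_users=3, n_items=4)
print(round(positive_loss(likes, users, items), 3))
```

Scoring an unseen item is then dot(users[u], items[i]). With only a handful of rows like this the scores mean very little, which is exactly the minimum-dataset concern raised above.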

Edited by chef, 23 March 2019 - 01:59 PM.


#12 BillOatman OFFLINE  

BillOatman

    Advanced Member

  • Members
  • 313 posts
  • Local time: 02:14 PM

Posted 23 March 2019 - 02:01 PM

I don't keep movies around that others in my family won't watch, so I don't like or favorite anything on my own server.

But it is a cool idea. I'll go through the ones on my server now, like the ones I did like, and get at least a little data to you :)

Could be expanded to TV series as well once you get it dialed in.


Edited by BillOatman, 23 March 2019 - 02:04 PM.

  • chef likes this

#13 BillOatman OFFLINE  

BillOatman

    Advanced Member

  • Members
  • 313 posts
  • Local time: 02:14 PM

Posted 23 March 2019 - 02:32 PM

I sent the file to you in a private message.



#14 PenkethBoy OFFLINE  

PenkethBoy

    Advanced Member

  • Members
  • 3280 posts
  • Local time: 07:14 PM
  • LocationWarrington,UK

Posted 23 March 2019 - 02:36 PM

@chef

Sent you a DM

The TMDB IDs etc. are returned as "Provider IDs", and an API call can return them easily, but you usually have to request them with "Fields=ProviderIds", as most of the time they are not returned by default.
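Following that pointer, a hedged Python sketch: build the per-user Items query with Fields=ProviderIds and read the TMDb ID out of the response, so no TMDb lookup is needed when the ID is already stored. The route, the IsFavorite filter value and the "Tmdb" key are my assumptions about Emby's item query API; a real request would also need the access token from the login step. Series may only carry a TVDB ID, in which case they are skipped here.

```python
import json
from urllib.parse import urlencode


def favorites_url(server: str, user_id: str) -> str:
    """Build an Items query for one user's favorite movies and series, asking
    the server to include ProviderIds (usually not returned by default)."""
    query = {
        "Recursive": "true",
        "IncludeItemTypes": "Movie,Series",
        "Filters": "IsFavorite",   # assumed filter name
        "Fields": "ProviderIds",
    }
    return f"{server}/Users/{user_id}/Items?{urlencode(query)}"


def tmdb_ids(response_json: str) -> list:
    """Pull TMDb ids out of an Items response; items without one are skipped."""
    ids = []
    for item in json.loads(response_json).get("Items", []):
        tmdb = str((item.get("ProviderIds") or {}).get("Tmdb", ""))  # assumed key
        if tmdb.isdigit():
            ids.append(int(tmdb))
    return ids
```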



#15 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 23 March 2019 - 02:37 PM

Thank you so much for participating in this.

Guess we'll see how big the dataset can get.

#16 PenkethBoy OFFLINE  

PenkethBoy

    Advanced Member

  • Members
  • 3280 posts
  • Local time: 07:14 PM
  • LocationWarrington,UK

Posted 23 March 2019 - 03:06 PM

Final question for now: with TV, the Series has a TVDB ID but the Seasons and Episodes don't, so could the data be broadened to include, say, season and episode numbers in its analysis?



#17 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 23 March 2019 - 03:08 PM

Absolutely; in fact, that would give a larger number of outcomes.
  • PenkethBoy likes this

#18 BillOatman OFFLINE  

BillOatman

    Advanced Member

  • Members
  • 313 posts
  • Local time: 02:14 PM

Posted 24 March 2019 - 10:52 AM

This might have some data that could be used:

http://ai.stanford.e...data/sentiment/



#19 chef OFFLINE  

chef

    Advanced Member

  • Developers
  • 3833 posts
  • Local time: 02:14 PM
  • LocationPeterborough, Canada

Posted 25 March 2019 - 06:35 PM

This is great for sentiment analysis! Thanks, Bill! I think this will come in handy when it comes time to attempt a really in-depth neural network.

Once there's a bit more survey information, it can be broken down into genres, actors and overview sentiments... pretty much feed all the movie information into the network and see what happens.

That's the coolest part: you don't even know what the computer will come up with on its own... :blink:

It's going to be pretty cool.


  • PenkethBoy likes this

#20 Vicpa OFFLINE  

Vicpa

    Advanced Member

  • Alpha Testers
  • 1421 posts
  • Local time: 02:14 PM
  • LocationWest Chester, Pa. USA

Posted 26 March 2019 - 02:13 PM

Hi Chef

This is cool stuff. :)

This may be a stupid question, but how do I mark something as Liked? Or Unliked, for that matter? I can make something a Favorite, but that is a whole other thing.

I think we used to be able to do that.

-vicpa





