Experimenting with Machine Learning and Movie Recommendations


chef

 

I've spent some of my spare time experimenting with machine learning, and applying what I've learned to some code for Emby.

 

Using the massive dataset supplied by MovieLens (https://movielens.org/), I was able to create a machine learning plugin that loads a large set of data into a matrix factorization neural network and can spit out recommendations for users in Emby.

 

Following this article by Microsoft (https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/movie-recommendation) and experimenting with their ML.NET libraries, things started to get pretty interesting.

 

The project itself is rather large, and I thought I'd share what I've come up with so far. Maybe other people are interested in how the neural network recommendation service works when tested against movies inside an Emby library. It's pretty cool.

The idea could be to create an in-depth recommendation system for Emby, or possibly to use this as a way to create a new "Top Picks" style plugin that is user-based, which also might be really cool.

 

I'll try and answer some questions up front if I can:

  • What is matrix factorization, and how does it work in machine learning?

matrix-factorization.png.2371532e133f767d8172b498e905fd95.png

In a nutshell, if you have similar tastes in movies to other people in the dataset, and they have rated a movie you haven't seen in a while or have never seen, then that movie is recommended to you as a possible item to watch.

The neural network predicts the rating you or your users would give the movie if you watched it.
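
To make the idea concrete, here is a minimal sketch of matrix factorization in plain Python. It is not the plugin's code and not ML.NET's trainer; it is a toy stochastic gradient descent version, and the ratings, the function names and the hyperparameters (`k`, `lr`, `reg`) are all made up for illustration. Each user and each movie gets a small vector of latent factors, and the predicted rating is the global mean plus the dot product of the two vectors.

```python
import math
import random

# Toy ratings as (user index, movie index, rating out of 5). All values are made up.
# Users 0 and 1 share tastes; user 0 has not rated movie 4 yet.
RATINGS = [
    (0, 0, 5.0), (0, 1, 5.0), (0, 2, 1.0), (0, 3, 1.0),
    (1, 0, 5.0), (1, 1, 4.0), (1, 2, 1.0), (1, 4, 5.0),
    (2, 0, 1.0), (2, 2, 5.0), (2, 3, 5.0), (2, 4, 1.0),
    (3, 1, 1.0), (3, 2, 4.0), (3, 3, 5.0), (3, 4, 2.0),
]


def predict(model, user, movie):
    """Predicted rating = global mean + dot(user factors, movie factors)."""
    mean, user_factors, movie_factors = model
    return mean + sum(p * q for p, q in zip(user_factors[user], movie_factors[movie]))


def rmse(model, ratings):
    """Root mean squared error of the model on a list of ratings."""
    total = sum((r - predict(model, u, m)) ** 2 for u, m, r in ratings)
    return math.sqrt(total / len(ratings))


def train(ratings, n_users, n_movies, k=2, iterations=200, lr=0.05, reg=0.02, seed=0):
    """Fit latent factors with plain stochastic gradient descent."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    user_factors = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    movie_factors = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_movies)]
    model = (mean, user_factors, movie_factors)
    for _ in range(iterations):
        for u, m, r in ratings:
            err = r - predict(model, u, m)
            for f in range(k):
                p, q = user_factors[u][f], movie_factors[m][f]
                # Move both vectors to shrink the error, with a small penalty on size.
                user_factors[u][f] += lr * (err * q - reg * p)
                movie_factors[m][f] += lr * (err * p - reg * q)
    return model


untrained = train(RATINGS, n_users=4, n_movies=5, iterations=0)
model = train(RATINGS, n_users=4, n_movies=5)
print("RMSE before training:", rmse(untrained, RATINGS))
print("RMSE after training: ", rmse(model, RATINGS))
print("Predicted rating for user 0 on unseen movie 4:", predict(model, 0, 4))
```

The last line is the interesting one: user 0 never rated movie 4, so the score comes entirely from what the factors learned from users with similar ratings.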

 

  • Is this plugin run locally on my machine?

Yes, this plugin creates a matrix factorization neural network locally on your machine, which is trained in a scheduled task.

 

  • Is any of my personal data shared during the training process of the neural network?

No, your personal data is completely safe. No user data is shared with any data provider; it all stays on your server.

 

  • How does the neural network recommend items?

movielens.jpg.f6072b2993e079f51fb937ec14becbde.jpg

The MovieLens dataset is an accumulation of almost 25 million data points on 27 thousand movies, with ratings from almost 100 thousand users.

The ratings data is altered on your server to include your user data and then the neural network is trained on all the data points, allowing it to predict what movies in your library you and your users may want to watch.

Once the "recommender" is trained, it saves a model and uses that to predict recommendations for users.

The NN can be retrained when new users are added to your Emby server, including them in the recommendation process.
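
As a rough illustration of "altering the ratings data to include your user data", the sketch below appends local users to a MovieLens-style ratings table. It assumes the usual MovieLens `ratings.csv` column layout (`userId,movieId,rating,timestamp`); the sample rows, the local user names and the `merge_ratings` helper are hypothetical and are not taken from the plugin.

```python
import csv
import io

# A few illustrative rows in the MovieLens ratings.csv layout (not real data).
MOVIELENS_CSV = """userId,movieId,rating,timestamp
1,296,5.0,1147880044
1,306,3.5,1147868817
2,296,4.0,1141415820
"""

# Hypothetical local signals: (local user name, MovieLens movieId, rating 0.5-5.0).
LOCAL_RATINGS = [("alice", 296, 4.5), ("bob", 306, 2.0), ("alice", 306, 4.0)]


def merge_ratings(movielens_csv, local_ratings):
    """Append local users to the dataset, giving each a userId above the existing ones."""
    rows = [
        (int(r["userId"]), int(r["movieId"]), float(r["rating"]))
        for r in csv.DictReader(io.StringIO(movielens_csv))
    ]
    next_id = max(user_id for user_id, _, _ in rows) + 1
    local_ids = {}
    for name, movie_id, rating in local_ratings:
        if name not in local_ids:
            # New local user: assign the next free numeric id.
            local_ids[name] = next_id
            next_id += 1
        rows.append((local_ids[name], movie_id, rating))
    return rows, local_ids


rows, local_ids = merge_ratings(MOVIELENS_CSV, LOCAL_RATINGS)
print(local_ids)
print(rows[-3:])
```

Because the merged table stays on the server and the local users are just extra rows, retraining after adding a user amounts to rebuilding this table and fitting the model again.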

Asking your users to "like" or "favorite" items in the library will certainly help when factoring recommendations; however, it is not strictly necessary.

 

  • Does the Matrix Factorization Only look at "Liked/Favorite" Data?

No, other columns are also fed into the neural network, like item genre.

If one of your users does not "heart" the media on their account, their watched status and item play-states are factored into the recommendation predictions. If they watch more items that fit into a certain category, this is scanned in and calculated.

It is best to have several movies marked as "liked/favorite" in the library from each user to get the best results.
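
One way such signals could be turned into a number the trainer can use is sketched below. This is only a guess at the general approach, not the plugin's actual logic: the `UserItemState` fields, the thresholds and the rating values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserItemState:
    """Hypothetical per-user, per-item signals (field names are assumptions)."""
    is_favorite: bool
    played: bool
    percent_watched: float  # 0-100


def implicit_rating(state: UserItemState, use_favorites: bool = True) -> Optional[float]:
    """Map favorite and play-state signals to a rating out of 5.

    Returns None when there is no signal at all, so the item is left
    out of the training data instead of being treated as disliked.
    """
    if use_favorites and state.is_favorite:
        return 5.0
    if state.played or state.percent_watched >= 90:
        return 4.0
    if state.percent_watched >= 50:
        return 3.0
    if state.percent_watched > 0:
        return 2.0
    return None


print(implicit_rating(UserItemState(is_favorite=True, played=False, percent_watched=0)))
print(implicit_rating(UserItemState(is_favorite=False, played=True, percent_watched=100)))
print(implicit_rating(UserItemState(is_favorite=False, played=False, percent_watched=0)))
```

The `use_favorites` switch is there because a heart may not mean "I liked this" on every server; with it turned off, only the watch data contributes.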

 

  • Is it cross-platform?

Yes, I believe it is. ML.NET is a cross-platform library targeting .NET Standard 2.0.

With regards to the runtimes associated with the plugin the following are available, and would be included in a dll:

  • Linux64
  • Linux ARM
  • Linux ARM64
  • OSX64
  • OSX ARM64
  • Win86
  • Win64

 

It is all fairly preliminary, definitely keep this in mind :) 

 

On the settings page it is possible to train the NN Model (top), and also run recommendation predictions (bottom).

setting_page_recommmender.thumb.png.e888e3fe192eac72cf8200680329e434.png

Once the network has created its model, it is possible to run this model again and again against the library (thanks to the 27,000 movie titles it includes).

Selecting the "Recommendation Predictions" begins the second task of calculating what each user might like.

Browsing back to the "Recommendations" tab shows each user's predictions.

predictions_page_recommmender.thumb.png.9e222106b1e361e7d84826414b6a482c.png

 

 

It shows what the NN predicts your score would be out of 5. 
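
The prediction step can be pictured as scoring every candidate movie for a user and keeping the best few. The sketch below assumes a `predict(user_id, movie_id)` callable standing in for the trained model; the `top_picks` helper, the choice to skip watched items and the clamping to the 0.5 to 5 range are illustrative assumptions, not the plugin's behaviour.

```python
def top_picks(predict, user_id, library_ids, watched_ids, n=3):
    """Score unwatched library items for one user and return the n best."""
    scored = []
    for movie_id in library_ids:
        if movie_id in watched_ids:
            continue  # assumption: only suggest items the user has not watched
        # Raw model output can drift outside the rating scale, so clamp it.
        score = min(5.0, max(0.5, predict(user_id, movie_id)))
        scored.append((movie_id, round(score, 2)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Stand-in for a trained model: fixed, made-up scores per movie id.
fake_scores = {101: 4.7, 102: 5.3, 103: 2.1, 104: 3.9}
picks = top_picks(lambda user, movie: fake_scores[movie], "alice",
                  library_ids=[101, 102, 103, 104], watched_ids={101}, n=2)
print(picks)  # [(102, 5.0), (104, 3.9)]
```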

 

The results are definitely interesting. Sure, these are all new movies I should watch, and they are pretty much some of the most famous of the year so far.

So the results seem to be pretty close to what I would expect.

The training of the model also only uses 20 iterations. To be honest, I don't know if that is enough or too much. It's all new.

What I can say is that loading the ML.NET assemblies works, and loading the MovieLens dataset into the NN also works.

I have added toggles and inputs to try and test different training scenarios, but it's all kind of unexplored.

 

 

Edited by chef

3 hours ago, horstepipe said:

I‘m wondering… is there a „like“ button in Emby at all? 

Yeah the "heart" icon 👍


horstepipe
Just now, chef said:

Yeah the "heart" icon 👍

This is the „favorite button“.

From my experience (including me and all of my users), this is being used to mark movies which are planned to be watched, and not movies I like.
 

I‘d guess that most of the users use that button like that…?

Whatever, this looks really promising 👍

 


sydlexius

I understand why Netflix got rid of ratings, and internally uses other metrics to determine interest in subject matter (things such as completion status, for example). I think seeding a model with completion data may yield some useful results. It would require no changes to Emby core or apps to accomplish this. What I would ultimately want out of such a model is a recommendation for content that does not exist in my library... this can be tricky where you've got users who don't have full access to every share in every library (that's how I roll with all of my non-admin accounts), or where you don't want content recommended to those who may have restrictions (age appropriateness, etc.). I think having an ML-driven recommendation is neat, but what should be the end goal?


16 hours ago, sydlexius said:

I understand why Netflix got rid of ratings, and internally uses other metrics to determine interest in subject matter (things such as completion status, for example). I think seeding a model with completion data may yield some useful results. It would require no changes to Emby core or apps to accomplish this. What I would ultimately want out of such a model is a recommendation for content that does not exist in my library... this can be tricky where you've got users who don't have full access to every share in every library (that's how I roll with all of my non-admin accounts), or where you don't want content recommended to those who may have restrictions (age appropriateness, etc.). I think having an ML-driven recommendation is neat, but what should be the end goal?

That's a cool idea.

Completion data, so was the item watched all the way through?

 

The MovieLens dataset uses ratings.

It was one of the only datasets I could find.

 

I think the end goal would be some kind of user-based recommendation system, whether it is incorporated into Top Picks or something else.

 

Because of the size of the MovieLens dataset and its use of TMDB IDs, I believe it is totally possible to recommend items that are not yet in the library.

In fact, I was kind of thinking to try that.

 

Edited by chef

sydlexius
1 hour ago, chef said:

Completion data, so was the item watched all the way through?

Exactly! Did a user finish the movie? Did they watch over x% of a complete season of a show? (if a show dataset comes out) That is exactly what I meant!


20 minutes ago, sydlexius said:

Exactly! Did a user finish the movie? Did they watch over x% of a complete season of a show? (if a show dataset comes out) That is exactly what I meant!

I did it!

I flipped the ratings, and then used the dataset to recommend movies from inside the dataset itself for users added in Emby.

😆 Now, I'll use the TMDB IDs to do a provider lookup and get the trailer URL to add to a .strm file.

We can show a row of items that may not exist in the library but are recommended.

Dude! That was a great idea!! 

 


sydlexius
2 hours ago, chef said:

I did it!

I flipped the ratings, and then used the dataset to recommend movies from inside the dataset itself for users added in Emby.

😆 Now, I'll use the TMDB IDs to do a provider lookup and get the trailer URL to add to a .strm file.

We can show a row of items that may not exist in the library but are recommended.

Dude! That was a great idea!! 

 

I think that things like "Favorites" are useful...if anything, it almost certainly means more than someone watching a movie to completion (or worse, marking a movie as complete).


Shibboleth

I really like this, chef, and would love to beta test a .dll!

 

One thing I will say is that I agree with others, "watched" to completion is a lot more useful than "favorite" hearts in my opinion. I, and most of my users, like others here, use the "heart" to pick out "to be watched" movies into a list. I'm constantly shuffling my favorites around as a result, and they are not a great indication of my likes or dislikes. I would have to completely change how I use favorites to make this work correctly.

 

If there were a setting to turn off "favorites" as an input and instead use "watched" that would be awesome sauceome.

 

Thanks for this, chef. I can't wait to see a beta version!

Edited by Shibboleth
clarity

rbjtech

@chef now you can probably see why I resorted to adding 'favorites/recommendations' as a user (or global) 'playlist' as you can do this easily in the UI today.  Then just scrape the playlist and bingo, you have a list to work with.

I *think* in the latest beta, the playlist may now be held in the db as well, rather than having to parse the playlist XML.

With all your Top Picks work, this could be really snazzy and show who recommended it on the movie poster (instead of saying "top pick", it might show the user's name in a heart or something).

 

Edited by rbjtech

  • 2 months later...
LazyMonday91

Hi chef,

I tested the first version of your plugin. When I try to train it, it doesn't seem to pick anything up; it only increments the movie.csv file:
image.png.e12aa2748fbf539533874a8e1015013a.png


image.png.f76c3705841747472063f18c0e4fc402.png

 

But I did exactly what you describe in the prerequisites:

I put the rating.csv in emby/data:
image.png.03882f3d9d54f769f5ebd4e4fa383742.png

Regards,

Lzm :)


On 2/3/2023 at 6:09 PM, LazyMonday91 said:

Hi chef,

I tested the first version of your plugin. When I try to train it, it doesn't seem to pick anything up; it only increments the movie.csv file:
image.png.e12aa2748fbf539533874a8e1015013a.png


image.png.f76c3705841747472063f18c0e4fc402.png

 

But I did exactly what you describe in the prerequisites:

I put the rating.csv in emby/data:
image.png.03882f3d9d54f769f5ebd4e4fa383742.png

Regards,

Lzm :)

Sorry, it's been a while since I looked at this plugin.

Perhaps it needs some love. 😃


Junglejim

It does look interesting. I only have 6 users, but I think it could still be useful. If you add links to trailers for items that are not in the library, then that would be a game changer (Top Picks + Trailers). :)

Not sure if you're still interested in this? But it was a cool concept anyway. 👍


horstepipe
On 11/3/2022 at 12:57 PM, horstepipe said:

This is the „favorite button“.

From my experience (including me and all of my users), this is being used to mark movies which are planned to be watched, and not movies I like.
 

I‘d guess that most of the users use that button like that…?

Whatever, this looks really promising 👍

 

I need to come back to that.

Would a plugin be powerful enough to add some kind of like/dislike prompt at the end of a movie, whose results could be stored in an additional database?

I am still very sure that the favorite button won't be used as assumed.

Best regards


  • 9 months later...

This sounds really great! I'm also on the page of having recommendations made for items not currently in my library as well. Kind of like a discovery tool. Any chance there's data points on TV Series as well?

