Experimenting with Machine Learning and Movie Recommendations

November 3, 2022

I've spent some of my spare time experimenting with machine learning, and applying what I've learned to some code for Emby.

Using the massive dataset supplied by MovieLens (https://movielens.org/), I was able to create a machine learning plugin that loads a large set of data into a matrix factorization neural network and spits out recommendations for users in Emby.

Following this Microsoft tutorial: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/movie-recommendation

and experimenting with Microsoft's ML.NET libraries, I found that things started to get pretty interesting.

The project itself is rather large, and I thought I'd share what I've come up with so far. Maybe other people are interested in how the neural network recommendation service performs when tested against movies in an Emby library. It's pretty cool.

One idea is to create an in-depth recommendation system for Emby, or possibly to use this to build a new, user-based "Top Picks" style plugin, which might also be really cool.

I'll try to answer some questions up front:

What is matrix factorization, and how does it work in machine learning?

[Image: matrix factorization diagram]

In a nutshell, if you have similar tastes in movies to other people in the dataset, and they have rated a movie you haven't seen in a while (or have never seen), that movie is recommended to you as a possible item to watch.

The neural network predicts how you or your users would rate a movie if you watched it.
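To make the idea concrete, here is a minimal Python sketch (not the plugin's code, which uses ML.NET in C#) of how a biased matrix factorization model scores one user/movie pair: the prediction is the global average rating plus a user bias, a movie bias, and the dot product of the user's and the movie's learned factor vectors. The function name and all numbers are hypothetical.

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   lo=0.5, hi=5.0):
    """Biased matrix-factorization prediction, clamped to the MovieLens 0.5-5.0 scale.

    rating ~= global mean + user bias + item bias + dot(user factors, item factors)
    """
    if len(user_factors) != len(item_factors):
        raise ValueError("user and item factor vectors must have the same length")
    dot = sum(u * v for u, v in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + dot
    return max(lo, min(hi, raw))


# Hypothetical learned values for one user and one movie (rank 3).
user = [0.6, -0.2, 0.4]
movie = [1.0, -0.5, 0.1]
score = predict_rating(3.5, 0.2, 0.3, user, movie)
print(round(score, 2))  # 4.74
```

The factors are what training learns; prediction itself is just this cheap arithmetic, which is why a trained model can be rerun over a library quickly.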

Does this plugin run locally on my machine?

Yes. The plugin builds a matrix factorization neural network locally on your machine and trains it in a scheduled task.

Is any of my personal data shared during the training process of the neural network?

No. Your personal data stays safely on your server; no user data is shared with any data provider.

How does the neural network recommend items?

[Image: MovieLens]

The MovieLens dataset is a collection of almost 25 million data points covering 27 thousand movies, with ratings from almost 100 thousand users.
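A sketch of reading the MovieLens files in Python, assuming the column layout of MovieLens's ratings.csv (userId, movieId, rating, timestamp) and links.csv (movieId, imdbId, tmdbId). The rows below are made-up placeholders, not real dataset values, and the helper names are hypothetical. links.csv is what ties MovieLens movie IDs to TMDb IDs, which can then be matched against the TMDb provider IDs on Emby items.

```python
import csv
import io


def load_ratings(fp):
    """Yield (user_id, movie_id, rating) triples from a MovieLens-style ratings.csv stream."""
    for row in csv.DictReader(fp):
        yield int(row["userId"]), int(row["movieId"]), float(row["rating"])


def load_tmdb_links(fp):
    """Map MovieLens movieId -> TMDb id from a links.csv stream (rows without a tmdbId are skipped)."""
    links = {}
    for row in csv.DictReader(fp):
        if row["tmdbId"]:
            links[int(row["movieId"])] = int(row["tmdbId"])
    return links


# Illustrative rows using the MovieLens column layout (values are made up).
ratings_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,1000000000\n"
    "1,3,4.5,1000000100\n"
    "2,1,3.5,1000000200\n"
)
links_csv = io.StringIO(
    "movieId,imdbId,tmdbId\n"
    "1,0000001,101\n"
    "3,0000003,303\n"
    "5,0000005,\n"
)

ratings = list(load_ratings(ratings_csv))
links = load_tmdb_links(links_csv)
print(len(ratings), links)  # 3 {1: 101, 3: 303}
```

For the real files, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory `io.StringIO` streams.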

On your server, the ratings data is extended with your own users' data, and the neural network is then trained on all of the data points. This lets it predict which movies in your library you and your users may want to watch.
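The post does not say exactly how the ratings data is "altered" to include Emby users. One plausible approach, shown here as a hypothetical Python sketch, is to give each local user a fresh integer ID above the largest MovieLens userId and append their (derived) ratings, so they land in the same user-by-movie matrix as the MovieLens users. `merge_local_users` and its input shapes are assumptions for illustration.

```python
def merge_local_users(movielens_ratings, local_ratings):
    """Append local (Emby) users' ratings to MovieLens triples.

    movielens_ratings: list of (user_id, movie_id, rating)
    local_ratings: dict of emby_user_name -> list of (movie_id, rating)
    Returns (merged_triples, id_map), where id_map maps each local user name
    to the new integer user id it was assigned.
    """
    next_id = max((u for u, _, _ in movielens_ratings), default=0) + 1
    merged = list(movielens_ratings)
    id_map = {}
    for name in sorted(local_ratings):  # sorted names give deterministic ids
        id_map[name] = next_id
        merged.extend((next_id, movie_id, rating) for movie_id, rating in local_ratings[name])
        next_id += 1
    return merged, id_map


movielens = [(1, 1, 4.0), (2, 1, 3.5)]
local = {"bob": [(1, 5.0)], "alice": [(3, 4.0), (1, 2.5)]}
merged, id_map = merge_local_users(movielens, local)
print(id_map)       # {'alice': 3, 'bob': 4}
print(len(merged))  # 5
```

Keeping `id_map` around matters: after training, it is how you find the factor vector that belongs to a given Emby user.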

Once the "recommender" is trained, it saves a model and uses that model to predict recommendations for users.
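As a rough illustration of what "training" means here, the following Python sketch fits latent factors with plain stochastic gradient descent on (user, movie, rating) triples. It is a simplified stand-in, not ML.NET's trainer: ML.NET's `MatrixFactorizationTrainer` is built on the LIBMF library and exposes options such as `NumberOfIterations` and `ApproximationRank`, which play roughly the roles of `iterations` and `rank` below. `train_mf`, its defaults, and the toy data are hypothetical.

```python
import math
import random


def train_mf(triples, rank=8, iterations=20, lr=0.01, reg=0.05, seed=0):
    """Fit user/item latent factors with plain SGD (no bias terms, to keep it short).

    triples: list of (user_id, item_id, rating). Returns (mu, P, Q), where mu is the
    global mean rating and P/Q map user/item ids to factor lists.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)
    P = {u: [rng.gauss(0, 0.1) for _ in range(rank)] for u in sorted({u for u, _, _ in triples})}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(rank)] for i in sorted({i for _, i, _ in triples})}
    data = list(triples)
    for _ in range(iterations):  # one iteration = one pass over all ratings
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for k in range(rank):
                puk, qik = pu[k], qi[k]
                pu[k] += lr * (err * qik - reg * puk)
                qi[k] += lr * (err * puk - reg * qik)
    return mu, P, Q


def rmse(triples, mu, P, Q):
    """Root-mean-square error of the model on the given ratings."""
    se = sum((r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))) ** 2 for u, i, r in triples)
    return math.sqrt(se / len(triples))


# Toy data: users 1-2 love items 1-2 and dislike 3-4; users 3-4 are the opposite.
toy = [(u, i, 5.0 if (u <= 2) == (i <= 2) else 1.0) for u in range(1, 5) for i in range(1, 5)]
before = rmse(toy, *train_mf(toy, rank=2, iterations=0))
after = rmse(toy, *train_mf(toy, rank=2, iterations=200, lr=0.05))
print(f"RMSE before: {before:.2f}, after: {after:.2f}")
```

On the toy data the error drops from roughly 2 toward 0 as the factors learn the two "taste groups". Whether a given number of passes is enough on real data is best answered empirically: hold out some ratings and compare RMSE at different iteration counts.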

The NN can be retrained whenever new users are added to your Emby server, so that they are included in the recommendation process.

Asking your users to "like" or "favorite" items in the library will certainly help the recommendations, but it is not strictly necessary.

Does the matrix factorization only look at "liked/favorite" data?

No, other columns are also fed into the neural network, such as item genre.

If one of your users does not "heart" media on their account, their watched status and item play states are still factored into the recommendation predictions. If they watch more items that fit into a certain category, that is picked up and taken into account.

For the best results, each user should have several movies in the library marked as "liked/favorite".
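Since several people in this thread use the heart as a watch list rather than a "like", it may help to see how play state could be turned into ratings the factorization can consume. This is a hypothetical Python sketch: `PlayState` is only loosely modeled on Emby's per-user data (favorite flag, played flag, play count, playback position), and every threshold and score is an illustrative guess rather than what the plugin does. The `use_favorites` switch shows how favorites could be ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class PlayState:
    """Per-user, per-item signals (hypothetical shape, loosely modeled on Emby user data)."""
    is_favorite: bool = False
    played: bool = False
    play_count: int = 0
    completion: float = 0.0  # fraction of the runtime watched, 0.0-1.0


def implicit_rating(state, use_favorites=True):
    """Map play state to a pseudo-rating on the 0.5-5.0 scale, or None when there is no signal.

    All thresholds and scores are illustrative guesses, not tuned values.
    """
    if use_favorites and state.is_favorite:
        return 5.0
    if state.played or state.completion >= 0.9:
        return 4.5 if state.play_count > 1 else 4.0  # a rewatch counts a bit more
    if state.completion >= 0.25:
        return 3.0
    if state.completion > 0.0:
        return 2.0  # abandoned early
    return None


print(implicit_rating(PlayState(is_favorite=True)))                           # 5.0
print(implicit_rating(PlayState(is_favorite=True), use_favorites=False))      # None
print(implicit_rating(PlayState(played=True, play_count=2, completion=1.0)))  # 4.5
```

Returning `None` (rather than a low score) for items with no signal keeps "never touched" separate from "disliked", so those items simply stay out of the training data.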

Is it cross-platform?

Yes, I believe it is. ML.NET is a cross-platform library that targets .NET Standard 2.0.

As for the runtimes associated with the plugin, the following are available and would be included in the DLL:

Linux64
Linux ARM
Linux ARM64
OSX64
OSX ARM64
Win86
Win64

It is all fairly preliminary, so please keep that in mind.

On the settings page, you can train the NN model (top) and run recommendation predictions (bottom).

Once the network has created its model, that model can be run against the library again and again, since it already includes the dataset's 27,000 movie titles.

Selecting "Recommendation Predictions" starts the second task: calculating what each user might like.

Browsing back to the "Recommendations" tab shows each user's predictions.

It shows the score, out of 5, that the NN predicts you would give each movie.
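Running the trained model "against the library" amounts to scoring each unwatched library movie for a user and keeping the highest predicted scores. A minimal Python sketch, assuming factor vectors are already available per user and per movie (keyed by some shared ID such as a TMDb ID); `top_picks`, `global_mean`, and the toy values are hypothetical, and bias terms are omitted for brevity:

```python
def top_picks(user_factors, item_factors, library_ids, watched_ids, n=10, global_mean=3.5):
    """Score unwatched library items for one user and return the n best as (score, item_id).

    item_factors: dict item_id -> factor list. Library items the model has never seen
    (no factors) are skipped, since matrix factorization cannot score them.
    """
    scored = []
    for item_id in library_ids:
        if item_id in watched_ids or item_id not in item_factors:
            continue
        raw = global_mean + sum(a * b for a, b in zip(user_factors, item_factors[item_id]))
        scored.append((min(5.0, max(0.5, raw)), item_id))  # clamp to the 0.5-5 scale
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties broken by id
    return scored[:n]


user = [1.0, 0.0]
items = {10: [1.0, 0.0], 11: [0.5, 0.0], 12: [-1.0, 0.0], 13: [2.0, 0.0]}
picks = top_picks(user, items, library_ids=[10, 11, 12, 13, 99], watched_ids={13}, n=2)
print(picks)  # [(4.5, 10), (4.0, 11)]
```

Movies that exist in the library but not in the dataset (ID 99 here) are skipped, because a matrix factorization model has no learned factors for items it never saw during training.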

The results are definitely interesting. Sure, these are all new movies I should watch, and they are some of the best-known releases of the year so far.

So the results seem to be pretty close to what I would expect.

Training the model also uses only 20 iterations. To be honest, I don't know whether that is enough or too much; it's all new to me.

What I can say is that loading the ML.NET assemblies works, and loading the MovieLens dataset into the NN also works.

I have added toggles and inputs to try and test different training scenarios, but it's all kind of unexplored.

Edited November 3, 2022 by chef

November 3, 2022

Wow, it seems really interesting.

I'll keep an eye on this thread.

November 3, 2022

I‘m wondering… is there a „like“ button in Emby at all?

November 3, 2022

3 hours ago, horstepipe said:

I‘m wondering… is there a „like“ button in Emby at all?

Yeah the "heart" icon

November 3, 2022

Just now, chef said:

Yeah the "heart" icon

This is the „favorite button“.

From my experience (including me and all of my users), this is used to mark movies we plan to watch, not movies we like.

I‘d guess that most of the users use that button like that…?

Whatever, this looks really promising

November 3, 2022

I understand why Netflix got rid of ratings and internally uses other metrics to determine interest in subject matter (completion status, for example). I think seeding a model with completion data may yield some useful results, and it would require no changes to Emby core or the apps. What I would ultimately want out of such a model is a recommendation for content that does not exist in my library... This can be tricky where you've got users who don't have access to every share in every library (that's how I roll with all of my non-admin accounts), or where you don't want content recommended to users with restrictions (age appropriateness, etc.). I think having an ML-driven recommendation is neat, but what should the end goal be?
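The restricted-account concern could be handled by filtering candidates before they are shown. A hypothetical Python sketch: `Candidate`, `allowed_for_user`, and the numeric `rating_level` scale are assumptions for illustration, not Emby API types. Items already on the server must sit in a library the user can open, and every item, including out-of-library "discovery" suggestions, must be at or below the user's parental limit.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    tmdb_id: int
    library_id: Optional[str]  # None = not on the server (a "discovery" suggestion)
    rating_level: int          # hypothetical numeric parental-rating level; higher = more mature


def allowed_for_user(candidates, accessible_libraries, max_rating_level):
    """Drop recommendations a restricted account should not see."""
    allowed = []
    for c in candidates:
        if c.rating_level > max_rating_level:
            continue  # above this user's parental limit
        if c.library_id is not None and c.library_id not in accessible_libraries:
            continue  # on the server, but in a library this user cannot open
        allowed.append(c)
    return allowed


recs = [
    Candidate(1, "movies", 5),
    Candidate(2, "movies-4k", 5),  # library the user can't access
    Candidate(3, None, 9),         # discovery item, too mature
    Candidate(4, None, 3),         # discovery item, fine
]
visible = [c.tmdb_id for c in allowed_for_user(recs, {"movies"}, max_rating_level=7)]
print(visible)  # [1, 4]
```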

November 4, 2022

16 hours ago, sydlexius said:

I understand why Netflix got rid of ratings and internally uses other metrics to determine interest in subject matter (completion status, for example). I think seeding a model with completion data may yield some useful results, and it would require no changes to Emby core or the apps. What I would ultimately want out of such a model is a recommendation for content that does not exist in my library... This can be tricky where you've got users who don't have access to every share in every library (that's how I roll with all of my non-admin accounts), or where you don't want content recommended to users with restrictions (age appropriateness, etc.). I think having an ML-driven recommendation is neat, but what should the end goal be?

That's a cool idea.

Completion data, so was the item watched all the way through?

The MovieLens dataset uses ratings.

It was one of the only datasets I could find.

I think the end goal would be some kind of user-based recommendation system, whether it is incorporated into Top Picks or something else.

Because of the size of the MovieLens dataset and its use of TMDb IDs, I believe it is entirely possible to recommend items that are not yet in the library.

In fact, I was thinking of trying that.

Edited November 4, 2022 by chef

November 4, 2022

1 hour ago, chef said:

Completion data, so was the item watched all the way through?

Exactly! Did a user finish the movie? Did they watch over x% of a complete season of a show? (if a show dataset comes out) That is exactly what I meant!
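Completion can be derived from playback position and runtime. A Python sketch, assuming positions and runtimes are expressed in .NET ticks (100 ns units, which is how Emby's `PlaybackPositionTicks` and `RunTimeTicks` fields are expressed) and assuming a finished item is flagged as played; the function names are hypothetical. For a season, weighting each episode by its runtime gives the "watched over x% of the season" figure.

```python
TICKS_PER_SECOND = 10_000_000  # .NET ticks are 100-nanosecond units


def completion_ratio(played, position_ticks, runtime_ticks):
    """Fraction of an item watched, 0.0-1.0."""
    if played:
        return 1.0  # assumption: a finished item is marked played (its position may be reset)
    if not runtime_ticks:
        return 0.0
    return max(0.0, min(1.0, position_ticks / runtime_ticks))


def season_completion(episodes):
    """Runtime-weighted completion of a season.

    episodes: list of (played, position_ticks, runtime_ticks) tuples.
    """
    total = sum(rt for _, _, rt in episodes)
    if total == 0:
        return 0.0
    watched = sum(completion_ratio(p, pos, rt) * rt for p, pos, rt in episodes)
    return watched / total


minute = 60 * TICKS_PER_SECOND
season = [
    (True, 0, 45 * minute),             # finished
    (False, 22 * minute, 44 * minute),  # stopped halfway
    (False, 0, 46 * minute),            # not started
]
print(round(season_completion(season), 3))  # 0.496
```

A threshold such as `season_completion(season) >= 0.9` would then be the "x%" cut-off; the right value is a judgment call.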

November 4, 2022

20 minutes ago, sydlexius said:

Exactly! Did a user finish the movie? Did they watch over x% of a complete season of a show? (if a show dataset comes out) That is exactly what I meant!

I did it!

I flipped the ratings and then used the dataset to recommend movies from the dataset itself, not just the library, for the users added from Emby.

Next, I'll use the TMDb IDs to do a provider lookup, get the trailer URL, and add it to a .strm file.

That way we can show a row of items that may not exist in the library but are recommended.

Dude! That was a great idea!!
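A Python sketch of the steps described in this post, under stated assumptions: recommendations arrive as (TMDb ID, predicted score) pairs, the library's TMDb IDs are already known, and a trailer URL has already been found (the provider lookup itself is not shown, and the URL here is a placeholder). A .strm file is simply a text file containing a URL that Emby can play. `split_by_library` and `write_strm` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def split_by_library(recommendations, library_tmdb_ids):
    """Split (tmdb_id, score) pairs into items already on the server and items that are not."""
    in_library, missing = [], []
    for tmdb_id, score in recommendations:
        target = in_library if tmdb_id in library_tmdb_ids else missing
        target.append((tmdb_id, score))
    return in_library, missing


def write_strm(folder, title, url):
    """Write a .strm file (a text file holding a single URL) and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    safe_title = "".join(ch for ch in title if ch not in '<>:"/\\|?*').strip()
    path = folder / f"{safe_title}.strm"
    path.write_text(url + "\n", encoding="utf-8")
    return path


in_lib, missing = split_by_library([(101, 4.6), (202, 4.4), (303, 4.1)], {202})
print(in_lib, missing)  # [(202, 4.4)] [(101, 4.6), (303, 4.1)]

with tempfile.TemporaryDirectory() as tmp:
    strm = write_strm(tmp, "Example: Movie", "https://example.com/trailer.mp4")
    strm_name, strm_text = strm.name, strm.read_text(encoding="utf-8")
print(strm_name, strm_text.strip())  # Example Movie.strm https://example.com/trailer.mp4
```

Only the `missing` list needs .strm placeholders; items in `in_library` can link straight to the existing library entries.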

November 4, 2022

2 hours ago, chef said:

I did it!

I flipped the ratings and then used the dataset to recommend movies from the dataset itself, not just the library, for the users added from Emby.

Next, I'll use the TMDb IDs to do a provider lookup, get the trailer URL, and add it to a .strm file.

That way we can show a row of items that may not exist in the library but are recommended.

Dude! That was a great idea!!

I think that things like "Favorites" are useful...if anything, it almost certainly means more than someone watching a movie to completion (or worse, marking a movie as complete).

November 6, 2022

I really like this chef, and would love to beta test a .dll!

One thing I will say is that I agree with others: "watched" to completion is a lot more useful than "favorite" hearts, in my opinion. Like others here, my users and I use the "heart" to collect movies we want to watch. I'm constantly shuffling my favorites around as a result, and they are not a great indication of my likes or dislikes. I would have to completely change how I use favorites to make this work correctly.

If there were a setting to turn off "favorites" as an input and instead use "watched" that would be awesome sauceome.

Thanks for this chef and can't wait to see a beta version!

Edited November 6, 2022 by Shibboleth
clarity

November 6, 2022

This sounds like a really good idea!

+1 from me

Thanks

November 7, 2022

I am very interested in this.

Great work already and am looking forward to further developments, should they come.

November 10, 2022

@chef now you can probably see why I resorted to adding 'favorites/recommendations' as a user (or global) 'playlist' as you can do this easily in the UI today. Then just scrape the playlist and bingo, you have a list to work with.

I *think* that in the latest beta, the playlist may now be stored in the database as well, so you no longer have to parse the playlist XML.

With all your Top Picks work, this could be really snazzy: it could show who recommended an item on the movie poster (instead of saying "Top Pick", it might show the user's name in a heart or something).

Edited November 10, 2022 by rbjtech

November 12, 2022

Looking forward to giving this a try.

February 3, 2023

Hi chef,

I tested the first version of your plugin. When I try to train it, it doesn't take in anything; it only increments the movie.csv file.

But I did exactly what you describe in the prerequisites:

I put the rating.csv in emby/data.

Regards,

Lzm

February 9, 2023

Interested to see how this goes

February 10, 2023

On 2/3/2023 at 6:09 PM, LazyMonday91 said:

Hi chef,

I tested the first version of your plugin. When I try to train it, it doesn't take in anything; it only increments the movie.csv file.

But I did exactly what you describe in the prerequisites:

I put the rating.csv in emby/data.

Regards,

Lzm

Sorry, it's been a while since I looked at this plugin.

Perhaps it needs some love.

February 12, 2023

It does look interesting. I only have 6 users, but I think it could still be useful. If you add links to trailers for movies that aren't in the library, that would be a game changer (Top Picks + Trailers).

Not sure if you're still interested in this, but it was a cool concept anyway.

February 14, 2023

On 11/3/2022 at 12:57 PM, horstepipe said:

This is the „favorite button“.

From my experience (including me and all of my users), this is used to mark movies we plan to watch, not movies we like.

I‘d guess that most of the users use that button like that…?

Whatever, this looks really promising

I need to come back to that.

Would a plugin be powerful enough to add some kind of like/dislike prompt at the end of a movie, with the results stored in a separate database?

I am still quite sure that the favorite button won't be used the way it is assumed to be.

Best regards

December 8, 2023

Pretty late to the party here but would love to be part of the beta

December 9, 2023

This sounds really great! I'm also on the same page about wanting recommendations for items not currently in my library, kind of like a discovery tool. Any chance there are data points on TV series as well?
