A model-free collaborative recommendation system in 20 lines of Python code

Model-free collaborative filtering is a “lightweight” approach to recommendation systems. It’s always based on the implicit “collaboration” (regarding ratings) among users. Still, it is computed in memory without using complex algorithms like ALS (Alternating Least Squares) that can be executed in a parallel environment (like Spark).

If we assume to have N users and M products, it’s possible to define a user-preference sparse matrix:

Each element represents the user’s rating of item j (0 means no rating has been provided). To face the differences belonging to many different contexts, it’s helpful to define two kinds of ratings (or feedback):

- Explicit feedback: an actual rating (like three stars or 8 out of 10)
- Implicit feedback: a page view, song play, move play, and so forth

The different kinds of feedback also determine the best metric to compare users. Once the user preference matrix has been populated, it’s necessary to compute the affinity matrix based on a similarity metric (the most common is the Euclidean one):

This matrix is built considering the rows of Upref. In particular, Upref[i] represents the preference vector (overall products) for the user i.

When the feedback is explicit, it can be helpful to keep using the Euclidean metric, but when it’s implicit, it can have a different meaning, and it can be better to pick another metric function. For example, if we use a binary encoding for the feedback (0 means no interaction and one single or multiple interactions), we could be interested in considering two users similar when they have the same amount of interactions (page views, song/movie plays, …). Therefore the Hamming distance can be more appropriate.

In this implementation, I’ve used Scikit-Learn to compute the pairwise distances, but other exciting solutions like Crab (http://muricoca.github.io/crab/). The source code is also available in this GIST.

from scipy.sparse import dok_matrix
from sklearn.metrics.pairwise import pairwise_distances

import numpy as np

# Set random seed (for reproducibility)
np.random.seed(1000)

# Create a dummy user-item dataset
nb_users = 1000
nb_products = 2500
max_rating = 5
max_rated_products = 500

X_preferences = dok_matrix((nb_users, nb_products), dtype=np.uint8)

for i in range(nb_users):
    # Extract n random products
    n_products = np.random.randint(0, max_rated_products+1)
    products = np.random.randint(0, nb_products, size=n_products)
    
    # Populate preference sparse matrix
    for p in products:
        X_preferences[i, p] = np.random.randint(0, max_rating+1)
        
# Compute pairwise distances
distance_matrix = pairwise_distances(X_preferences, metric='euclidean')

# Sort distances
sorted_distances = np.argsort(distance_matrix, axis=1)

test_user=500

# Take the top-10 simular users
for d in sorted_distances[test_user][::-1][0:10]:
    print(d)

Once the list of top neighbors has been determined, it’s possible to suggest the top-k products, selecting them from the union of the most rated “neighbor” products:

The predicted rating for a suggested item can be obtained as the weighted average of the neighborhood. For example, it’s possible to determine a weigh vector using a softmax function:

Alpha is a normalization parameter introduced to avoid an overflow with too large distances. The weight vector is referred to the subset of suggested items. The average rating is obtained as follows:

Of course, this is only possible when the reader can employ different techniques to compute the expected rating. For example, it’s possible to use a radial basis function with a low variance to penalize the contributions given by the most distant elements. At the same time, it’s possible to implement different affinities for the same task, evaluating the difference in the suggested products. A recommender isn’t seldom based on an “off-the-shelf” solution because of the intrinsic diversities in several business contexts. Therefore, I always suggest trying various alternatives before making the final decision.

If you like this post, you can always donate to support my activity! One coffee is enough!

Share this post on:

A model-free collaborative recommendation system in 20 lines of Python code

Related Posts