RECOMMENDATION ENGINE CONTENT-BASED FILTERING & COLLABORATIVE FILTERING
Recommendation engines are probably the machine learning models best
known to the general public. Even if people do not know exactly
what a recommendation engine is, they have most likely experienced one through
the use of popular websites such as Amazon, Netflix, YouTube, Twitter,
LinkedIn, and Facebook. Recommendations are a core part of all these
businesses, and in some cases, they drive significant percentages of their
revenue.
The idea behind recommendation engines is to predict what people might
like and to uncover relationships between items to aid in the discovery process
(in this way, it is similar and, in fact, often complementary to search
engines, which also play a role in discovery). However, unlike search engines,
recommendation engines try to present people with relevant content that they
did not necessarily search for or that they might not even have heard of.
Typically, a recommendation engine tries to model the connections
between users and some type of item. If we can do a good job of showing our
users movies related to a given movie, we could aid in discovery and navigation
on our site, again improving our users' experience, engagement, and the relevance
of our content to them.
However, recommendation engines are not limited to movies, books, or
products. The techniques we will explore in this article can be applied to just
about any user-to-item relationship as well as user-to-user connections, such
as those found on social networks, allowing us to make recommendations such as
people you may know or who to follow.
In the immortal words of Steve Jobs: “a lot of times, people don’t know what they want until you show it to
them.”
The customer personalization journeys of Amazon and Netflix
demonstrate just how powerful recommendation engines can be. See how these
online giants built cutting-edge recommendation engines that keep subscribers
coming back for more:
Amazon
Netflix
Google Image Search
· A recommendation engine can engage audiences with the right content
· A recommendation engine can customize ads or sponsored content for a user based on their preferences
· A recommendation engine can power content suggestions for a publishing website, e.g. https://boomtrain.com/
Types of recommendation models
Recommender systems are widely studied, and many approaches are
used, but two are probably the most prevalent:
· Content-based filtering
· Collaborative filtering
  Ø Item-based collaborative filtering
  Ø User-based collaborative filtering
Content-based filtering
Assume a “real world” case: “John’s favourite cake is
Napoleon (left picture below). He went to a shop for it, but such cakes were
sold out. John asked a marketer to recommend something similar and was
recommended a Napoleon torte (right picture below) that has the same
ingredients. John bought it.”
This is an example of pure content-based filtering in the real world. The
marketer recommended the torte based on the similarity of its ingredients. A
content-based filtering system has a similar intuition behind it.
Content-based (CB) filtering systems are systems that recommend items
similar to those a user liked in the past.
Before we proceed, let me define a couple of terms:
- Item refers to content whose attributes are used in the recommender
models. These could be movies, documents, books, etc.
- Attribute refers to a characteristic of an item. A movie tag or the words
in a document are examples.
These systems rely on algorithms that assemble a user's preferences
into a user profile and all item information into item profiles. They then
recommend the items whose profiles are most similar to the user's profile.
A user profile might be seen as a set of assigned keywords (terms,
features) collected by the algorithm from items found relevant (or
interesting) by the user.
An item profile is a set of assigned keywords (terms, features) of the
item itself.
The actual profile-building process is handled by various information
retrieval or machine learning techniques. For instance, the most frequent
terms in the document describing an item can represent the item's profile.
Now the example can be reformulated in recommender terms: John liked the
Napoleon cake, and its ingredients formed John's user profile. The system
reviewed the other available item profiles and found that the most similar
was the “Napoleon torte” item profile. The similarity is high because the
cake and the torte share the same ingredients. This was the reason for the
recommendation.
The principal advantage of the content-based filtering approach lies in its
nature: it can start to recommend as soon as information about items is
available. In other words, the recommender does not require ratings from
other users in order to recommend.
How do content-based recommender systems work?
A content-based recommender works with data that the user provides, either
explicitly (a rating) or implicitly (clicking on a link). Based on that data,
a user profile is generated, which is then used to make suggestions to the
user. As the user provides more input or acts on the recommendations, the
engine becomes more and more accurate.
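As a concrete illustration, here is a minimal sketch in R of the profile-matching idea, using a made-up binary ingredient matrix in the spirit of the cake example (the item names and attributes are hypothetical):

```r
# Items described by binary attribute (ingredient) vectors
items <- matrix(c(1, 1, 1, 0,   # Napoleon cake
                  1, 1, 1, 0,   # Napoleon torte
                  0, 1, 0, 1),  # Fruit tart
                nrow = 3, byrow = TRUE,
                dimnames = list(c("cake", "torte", "tart"),
                                c("puff_pastry", "cream", "butter", "fruit")))

# User profile: attributes of the item the user liked (here, just the cake)
profile <- items["cake", ]

# Rank the remaining items by cosine similarity of their profiles to the user profile
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
scores <- apply(items[rownames(items) != "cake", , drop = FALSE], 1,
                cosine, b = profile)
sort(scores, decreasing = TRUE)  # the torte scores highest
```

With identical ingredient vectors, the torte's similarity to the cake's profile is 1, so it is recommended first, exactly mirroring the marketer's reasoning.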
Collaborative filtering
This is the Collaborative Filtering (CF) approach: recommendations are
given by other people who have shown similar tastes in the past and who have
already experienced an item still unknown to the current user.
Collaborative filtering systems require users to express opinions on items.
They collect those opinions and recommend items based on the similarity of
people's opinions. The users who agree the most are the ones who contribute
to the recommendation.
Now the example can be reformulated again: John asked for a recommendation
for the “best fit” drink. A collaborative filtering system reviewed only the
opinions of people who have tried and liked the Napoleon torte in the past.
The recommended “mint tea” is simply the item rated most highly by these
people.
Collaborative filtering systems usually review more than just one common
item to define the set of users who influence the results. For example, John
should have tried many different cakes, and his friends must also have tried
the same cakes in the past, for him to get better recommendations (MovieLens
requires at least 20 movies to be rated before it produces recommendations
[Movielens.org]).
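To make the user-based variant concrete before moving to the item-based one, here is a toy sketch in R (the ratings and item names are invented for illustration): for user u1, it finds the most similar fellow user by cosine similarity over co-rated items and recommends that user's best-liked item that u1 has not tried.

```r
# Toy rating matrix (NA = not rated); u1 plays John's role
R <- matrix(c(5, 4, NA, 1,
              5, 5, 4, NA,
              1, 2, NA, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"),
                            c("cakeA", "cakeB", "mint_tea", "coffee")))

# Similarity of u1 to every other user over items both have rated
sim_to_u1 <- sapply(c("u2", "u3"), function(u) {
  common <- !is.na(R["u1", ]) & !is.na(R[u, ])
  a <- R["u1", common]; b <- R[u, common]
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
})
best <- names(which.max(sim_to_u1))  # u2 agrees with u1 most

# Recommend the unseen item that the most similar user rated highest
unseen <- is.na(R["u1", ]) & !is.na(R[best, ])
names(which.max(R[best, unseen]))    # "mint_tea"
```

Since u2's past ratings of the cakes closely match u1's, u2's well-liked mint tea is recommended, echoing the mint tea example above.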
Item-based collaborative filtering
Item-based collaborative filtering is a model-based algorithm for
recommender engines. In item-based collaborative filtering, similarities
between items are calculated from the rating matrix, and based upon these
similarities a user's preference for an item he has not rated is predicted.
Here is a step-by-step worked example for four users and three items. We
will consider the following sample data of the preferences of four users
for three items:
 ID  | user | item | rating
 241 | u1   | m1   | 2
 222 | u1   | m3   | 3
 276 | u2   | m1   | 5
 273 | u2   | m2   | 2
 200 | u3   | m1   | 3
 229 | u3   | m2   | 3
 231 | u3   | m3   | 1
 239 | u4   | m2   | 2
 286 | u4   | m3   | 2
Step 1: Write the user-item ratings data in matrix form. The above table
gets rewritten as follows:
    | m1 | m2 | m3
 u1 |  2 |  ? |  3
 u2 |  5 |  2 |  ?
 u3 |  3 |  3 |  1
 u4 |  ? |  2 |  2

Here the rating of user u1 for item m3 is 3. There is no rating for item m2
by user u1, and no rating for item m3 by user u2.
Step 2: We will now create an item-to-item similarity matrix. The idea is
to calculate how similar one item is to another. There are a number of ways
of calculating this; we will use the cosine similarity measure. To calculate
the similarity between items m1 and m2, for example, look at all the users
who have rated both items. In our case, both m1 and m2 have been rated by
users u2 and u3. We create two item-vectors, v1 for item m1 and v2 for item
m2, in the user-space of (u2, u3), and then find the cosine of the angle
between these vectors. Overlapping vectors (a zero angle) with a cosine
value of 1 mean total similarity (every common user rated both items in the
same proportion), while an angle of 90 degrees means a cosine of 0, or no
similarity. From the rating matrix, the two item-vectors are:

v1 = 5 u2 + 3 u3
v2 = 2 u2 + 3 u3

The cosine similarity between the two vectors, v1 and v2, would then be:

cos(v1, v2) = (5*2 + 3*3)/sqrt[(25 + 9)*(4 + 9)] = 0.90

Similarly, to calculate the similarity between m1 and m3, we consider only
users u1 and u3, who have rated both items. The two item-vectors, v1 for
item m1 and v3 for item m3, in the user-space would be:

v1 = 2 u1 + 3 u3
v3 = 3 u1 + 1 u3

The cosine similarity measure between v1 and v3 is:

cos(v1, v3) = (2*3 + 3*1)/sqrt[(4 + 9)*(9 + 1)] = 0.78

We can similarly calculate the similarity between items m2 and m3 using the
ratings given to both by users u3 and u4. The two item-vectors v2 and v3
would be:

v2 = 3 u3 + 2 u4
v3 = 1 u3 + 2 u4

And the cosine similarity between them is:

cos(v2, v3) = (3*1 + 2*2)/sqrt[(9 + 4)*(1 + 4)] = 0.86

We now have the complete item-to-item similarity matrix:

    | m1   | m2   | m3
 m1 | 1    | 0.90 | 0.78
 m2 | 0.90 | 1    | 0.86
 m3 | 0.78 | 0.86 | 1

Step 3: For each user, we next predict his ratings for the items he has not
rated. We will calculate the rating of user u1 for item m2 (the target
item). To calculate this, we weight the just-calculated similarity measures
between the target item and the other items the user has already rated; the
weights are the ratings the user gave those items. We then scale this
weighted sum by the sum of the similarity measures so that the calculated
rating stays within predefined limits. Thus, the predicted rating for item
m2 for user u1 is calculated using the similarity measures between (m2, m1)
and (m2, m3), weighted by the respective ratings for m1 and m3:

Rating = (2 * 0.90 + 3 * 0.86)/(0.90 + 0.86) = 2.49
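The two steps of the worked example can be reproduced with a few lines of base R. This is just a sketch: it computes the pairwise cosine similarities over co-rated users directly from the sample rating table, then the similarity-weighted prediction of u1's rating for m2 (values agree with the hand calculation up to rounding):

```r
# Rating matrix from the worked example (NA = not rated)
R <- matrix(c(2, NA, 3,
              5, 2, NA,
              3, 3, 1,
              NA, 2, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("m", 1:3)))

# Cosine similarity between items i and j over users who rated both
item_cosine <- function(R, i, j) {
  common <- !is.na(R[, i]) & !is.na(R[, j])
  vi <- R[common, i]
  vj <- R[common, j]
  sum(vi * vj) / sqrt(sum(vi^2) * sum(vj^2))
}

# Step 2: the full item-to-item similarity matrix
sim <- outer(1:3, 1:3, Vectorize(function(i, j) item_cosine(R, i, j)))
round(sim, 2)

# Step 3: predict u1's rating for m2 as a similarity-weighted average
# of u1's ratings of the items u1 has rated (m1 and m3)
rated <- which(!is.na(R["u1", ]))
pred <- sum(sim[2, rated] * R["u1", rated]) / sum(sim[2, rated])
round(pred, 2)
```

The normalization by the sum of similarities keeps the prediction inside the original rating scale, since it is a convex combination of the user's existing ratings.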
A recommender engine using item-based collaborative filtering can be
constructed with the R package recommenderlab.
############### Collaborative filtering in R (Recommendation Engine) ################

R Script: https://1drv.ms/u/s!AmfDeS0cl4gZhSoklJF8JfP4Z0oy
Train Data Set for Model: https://1drv.ms/u/s!AmfDeS0cl4gZhSuzkbZ6h0E6-3Gt
Test Data Set for Model: https://1drv.ms/u/s!AmfDeS0cl4gZhSwlVEX3l7jli3Kf
# Set the data path as per your data file (for example: "c://abc//")
setwd("F:/Data Science/Data Science/Ashish/Recommendation Engine Dataset")

# If not installed, first install the following three packages in R
# install.packages("recommenderlab")
library(recommenderlab)
library(reshape2)
library(ggplot2)
# Read the training file along with its header
tr <- read.csv("train_v2.csv", header = TRUE)

# Just look at the first few lines of this file
head(tr)

# Remove the 'id' column; we do not need it
tr <- tr[, -c(1)]

# Check, if removed
tr[tr$user == 1, ]

# Use acast to convert the above data into a user x movie matrix:
#       m1  m2  m3  m4
#   u1   3   4   2   5
#   u2   1   6   5
#   u3   4   4   2   5
g <- acast(tr, user ~ movie)

# Check the class of g
class(g)

# Convert it to a matrix
R <- as.matrix(g)

# Convert R into a realRatingMatrix data structure
# realRatingMatrix is a recommenderlab sparse-matrix-like data structure
r <- as(R, "realRatingMatrix")
r
# View r in other possible ways
as(r, "list")    # A list
as(r, "matrix")  # A matrix

# I can turn it into a data frame
head(as(r, "data.frame"))

# Normalize the rating matrix
r_m <- normalize(r)
r_m
as(r_m, "list")
# Draw an image plot of raw ratings & normalized ratings.
# A column represents one specific movie, and ratings by users are shaded.
# Note that some items are rated 'black' by most users,
# while some items are not rated by many users.
# On the other hand, a few users always give high ratings,
# as in some cases a series of black dots cuts across items.
image(r, main = "Raw Ratings")
image(r_m, main = "Normalized Ratings")

# Can also turn the matrix into a 0-1 binary matrix
r_b <- binarize(r, minRating = 1)
as(r_b, "matrix")
# Create a recommender object (model).
# Run any ONE of the following four lines; do not run all four.
# They pertain to four different algorithms:
#   UBCF: User-based collaborative filtering
#   IBCF: Item-based collaborative filtering
# The 'method' entry inside param decides the similarity measure:
#   Cosine or Jaccard
rec <- Recommender(r[1:nrow(r)], method = "UBCF",
                   param = list(normalize = "Z-score", method = "Cosine",
                                nn = 5, minRating = 1))
rec <- Recommender(r[1:nrow(r)], method = "UBCF",
                   param = list(normalize = "Z-score", method = "Jaccard",
                                nn = 5, minRating = 1))
rec <- Recommender(r[1:nrow(r)], method = "IBCF",
                   param = list(normalize = "Z-score", method = "Jaccard",
                                minRating = 1))
rec <- Recommender(r[1:nrow(r)], method = "POPULAR")

# Depending upon your selection, examine what you got
print(rec)
names(getModel(rec))
getModel(rec)$nn
############ Create predictions #############################
# This prediction does not predict movie ratings for test.
# But it fills up the user X item matrix so that
# for any userid and movieid, I can find the predicted rating.
# dim(r) shows there are 6040 users (rows).
# The 'type' parameter decides whether you want ratings or top-n items.
# Get the top-10 recommendations for a user, as:
#   predict(rec, r[1:nrow(r)], type="topNList", n=10)
recom <- predict(rec, r[1:nrow(r)], type = "ratings")
recom
########## Examination of model & experimentation #############
########## This section can be skipped #########################
# Convert the prediction into a list, user-wise
as(recom, "list")

# Study and compare the following:
as(r, "matrix")[1:10, 1:10]      # Has lots of NAs; 'r' is the original matrix
as(recom, "matrix")              # Is full of ratings; NAs disappear
as(recom, "matrix")[1:10, 1:10]  # Show ratings for all users for items 1 to 10
as(recom, "matrix")[5, 3]        # Rating for user 5 for the item at index 3
as.integer(as(recom, "matrix")[5, 3])            # Just get the integer value
as.integer(round(as(recom, "matrix")[6039, 8]))  # Get the correctly rounded integer value
as.integer(round(as(recom, "matrix")[368, 3717]))

# Convert all your recommendations to a list structure
rec_list <- as(recom, "list")
head(summary(rec_list))

# Access this list: user 2, item at index 2
rec_list[[2]][2]
rec_list[[1837]][4]

# Convert all recommendations for user 1 to a data frame
u1 <- as.data.frame(rec_list[[1]])
attributes(u1)
class(u1)
head(u1)

# Create a column named 'id' in data frame u1 and populate it with the row names
u1$id <- row.names(u1)

# Check that the movie ratings are in column 1 of u1
u1

# Now access the movie rating in column 1 for movie id 3952
u1[u1$id == 3952, ]
########## Create submission file from model #######################
# Read the test file
test <- read.csv("test_v2.csv", header = TRUE)
head(test)

# Get the ratings list
rec_list <- as(recom, "list")
head(summary(rec_list))

ratings <- NULL
# For all lines in the test file, one by one
for (u in 1:length(test[, 2])) {
  # Read userid and movieid from columns 2 and 3 of the test data
  userid <- test[u, 2]
  movieid <- test[u, 3]

  # Get as a list & then convert to a data frame all recommendations for user: userid
  u1 <- as.data.frame(rec_list[[userid]])
  # Create a second column, 'id', in data frame u1 and populate it with the row names.
  # Remember (or check) that the row names of u1 are the movie ids.
  # We use the row.names() function
  u1$id <- row.names(u1)
  # Now access the movie rating in column 1 of u1
  x <- u1[u1$id == movieid, 1]
  # print(u)
  # print(length(x))
  # If no rating was found, assign 0. You could also assign the user's average.
  if (length(x) == 0) {
    ratings[u] <- 0
  } else {
    ratings[u] <- x
  }
}
length(ratings)
tx <- cbind(test[, 1], round(ratings))

# Write to a csv file: submitfile.csv in your folder
write.table(tx, file = "submitfile.csv", row.names = FALSE, col.names = FALSE, sep = ',')