Collaborative filtering is one of the most widely used and conceptually elegant techniques in recommender systems. It’s based on the idea that people who agreed in the past will agree again in the future, and that users will prefer items that similar users liked.
Let’s break it down into its core components and methods:
🤝 Collaborative Filtering: A Deep Dive
🧠 Core Idea
Collaborative filtering relies on user-item interactions (e.g., ratings, clicks, purchases) rather than item features or user profiles. It assumes that patterns of behavior can be used to predict future preferences.
🔹 Types of Collaborative Filtering
1. User-Based Collaborative Filtering
- Goal: Recommend items that similar users liked.
- How it works:
  - Find users similar to the target user (using similarity metrics).
  - Aggregate their preferences to recommend items.
- Similarity Metrics:
  - Cosine similarity
  - Pearson correlation
  - Jaccard index (for binary data)
Example:
If User A and User B both liked sci-fi books, and User B also liked a new sci-fi novel, recommend it to User A.
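Below is a minimal NumPy sketch of this idea, assuming a small explicit-rating matrix where rows are users, columns are items, and 0 marks an unrated item; the data, function names, and neighborhood size are illustrative, not from any specific library:

```python
import numpy as np

# Toy user-item rating matrix: rows = users, columns = items, 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (zeros kept, a simplification)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return (a @ b) / norm if norm else 0.0

def user_based_scores(target, ratings, k=2):
    """Score unrated items for `target` using its k most similar users."""
    sims = np.array([cosine_sim(ratings[target], ratings[u])
                     for u in range(len(ratings))])
    sims[target] = -1.0                    # never pick the target as its own neighbor
    neighbors = np.argsort(sims)[-k:]      # indices of the k most similar users
    weights = sims[neighbors]
    # Similarity-weighted average of the neighbors' ratings.
    scores = (weights @ ratings[neighbors]) / weights.sum()
    scores[ratings[target] > 0] = -np.inf  # only recommend items not yet rated
    return scores

print(user_based_scores(target=0, ratings=ratings))  # highest score = top recommendation
```

The weighted average above ignores differences in how generously users rate; a Pearson-style variant would first subtract each user's mean rating.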
2. Item-Based Collaborative Filtering
- Goal: Recommend items similar to those the user liked.
- How it works:
  - Find items that are similar based on user ratings.
  - Recommend items similar to those the user has interacted with.
- Similarity Metrics:
  - Item-item cosine similarity
  - Adjusted cosine similarity (accounts for user bias)
Example:
If a user liked "Interstellar" and "Inception", and many users who liked those also liked "Tenet", recommend "Tenet".
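A matching sketch for the item-based variant, reusing the same kind of toy matrix; `item_similarity` and `item_based_scores` are hypothetical helper names for illustration:

```python
import numpy as np

# Toy user-item rating matrix: rows = users, columns = items, 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(ratings):
    """Item-item cosine similarity computed over the rating columns."""
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    normalized = ratings / np.where(norms == 0, 1.0, norms)
    return normalized.T @ normalized             # shape: (n_items, n_items)

def item_based_scores(user, ratings, sim):
    """Score each item as a similarity-weighted sum of the user's own ratings."""
    user_ratings = ratings[user]
    weights = np.abs(sim).sum(axis=1)
    scores = (sim @ user_ratings) / np.where(weights == 0, 1.0, weights)
    scores[user_ratings > 0] = -np.inf           # only recommend unseen items
    return scores

sim = item_similarity(ratings)
print(item_based_scores(user=0, ratings=ratings, sim=sim))
```

Item-item similarities are often precomputed offline, which is one reason item-based filtering tends to scale better than the user-based form when there are far more users than items.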
3. Model-Based Collaborative Filtering
Uses machine learning models to learn latent patterns in user-item interactions.
🔸 Matrix Factorization
- Decomposes the user-item matrix into lower-dimensional latent factors.
- Each user and item is represented by a vector in a latent space.
- Predicts ratings by computing the dot product of user and item vectors.
Popular Algorithms:
| Algorithm | Description |
| --- | --- |
| SVD (Singular Value Decomposition) | Factorizes the rating matrix into user and item matrices |
| ALS (Alternating Least Squares) | Optimizes user and item factors alternately, holding one fixed while solving for the other |
| NMF (Non-negative Matrix Factorization) | Ensures all latent factors are non-negative |
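As a quick library-based illustration of the NMF row above, scikit-learn's `NMF` can factor a small rating matrix into non-negative user and item factors; note that the zeros standing in for missing ratings are treated as real values here, a simplification a production recommender would avoid:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy rating matrix; 0 stands in for a missing rating (naive, for illustration only).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
U = model.fit_transform(R)    # user latent factors, shape (n_users, k)
V = model.components_.T       # item latent factors, shape (n_items, k)

print(np.round(U @ V.T, 2))   # reconstructed ratings, R ≈ U V^T
```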
🧮 Mathematical Formulation (Matrix Factorization)
Let:
- \( R \in \mathbb{R}^{n \times m} \) be the user-item rating matrix
- \( U \in \mathbb{R}^{n \times k} \) be the user latent matrix
- \( V \in \mathbb{R}^{m \times k} \) be the item latent matrix
Then:
\[ R \approx U V^T \]
where:
- \( n \) = number of users
- \( m \) = number of items
- \( k \) = number of latent features
Prediction for user \( i \) and item \( j \):
\[ \hat{r}_{ij} = U_i V_j^T \]
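To make the formulation concrete, here is a minimal stochastic-gradient-descent sketch that learns \( U \) and \( V \) from the observed entries only; the learning rate, regularization strength, and epoch count are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix; 0 marks a missing rating and is excluded from the loss.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

n, m, k = R.shape[0], R.shape[1], 2
U = 0.1 * rng.standard_normal((n, k))   # user latent matrix
V = 0.1 * rng.standard_normal((m, k))   # item latent matrix
lr, reg = 0.01, 0.02                    # learning rate, L2 regularization

for epoch in range(2000):
    for i, j in zip(*R.nonzero()):      # observed (user, item) pairs only
        err = R[i, j] - U[i] @ V[j]     # r_ij minus the current prediction U_i · V_j
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

print(np.round(U @ V.T, 2))             # predicted ratings for every user-item pair
```

Because only observed entries enter the loss, the learned factors generalize to the missing cells, which is exactly where the recommendations come from.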
⚠️ Challenges in Collaborative Filtering
- Cold Start: Hard to recommend for new users or items with no interactions.
- Sparsity: Most users rate only a few items, leading to sparse matrices.
- Scalability: Large datasets require efficient algorithms and infrastructure.
- Popularity Bias: Popular items may dominate recommendations, crowding out niche content.
🧠 Enhancements and Extensions
- Implicit feedback (clicks, views) instead of explicit ratings
- Temporal dynamics (e.g., changing preferences over time)
- Hybrid models combining collaborative and content-based filtering
- Deep learning approaches like Neural Collaborative Filtering (NCF)