User-Item Interaction Matrix
Definition
The user-item interaction matrix (or ratings matrix) is an $m \times n$ matrix $R$ recording observed interactions between $m$ users and $n$ items. Each entry $r_{ui}$ holds user $u$'s feedback on item $i$: an explicit rating (e.g., 1–5 stars), a signed preference ($\pm 1$), or a binary implicit signal (clicked / not clicked). It is the core data structure of Collaborative Filtering: predictions leverage the collective user-item interaction data of a large pool of users, rather than item content.
Formally, given users $\mathcal{U}$ and items $\mathcal{I}$, the matrix $R$ tabulates which items each user interacted with so the recommender can find the unseen items most pertinent to a given user $u$.
Intuition
A mostly-empty grid we have to fill in
Picture a spreadsheet with users down the rows and items across the columns. Most cells are blank: any one user has touched only a tiny fraction of the catalog. The recommendation task is exactly predicting the missing entries: estimate how much user $u$ would like the items they have not yet seen, then rank those by predicted value to produce a Top-N Recommendation list.
The matrix is “collaborative” because the blanks are filled by borrowing signal from other users/items: if Lucy and Eric rate the same movies similarly, Lucy’s known ratings predict Eric’s unknown ones. The structure of the observed entries (who liked what) is the only input pure CF needs — no text, no images, no item metadata.
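The final ranking step described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `top_n`, the movie titles, and the score values are all hypothetical, and the predicted scores are assumed to come from some CF predictor that is not shown here.

```python
from typing import Dict, List, Set


def top_n(predicted: Dict[str, float], seen: Set[str], n: int) -> List[str]:
    """Rank the items a user has not seen by predicted score and keep the first n."""
    unseen = [(item, score) for item, score in predicted.items() if item not in seen]
    # Sort by descending score; break ties by item name so the output is deterministic.
    unseen.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in unseen[:n]]


# Hypothetical predicted scores for one user (illustrative values only).
predicted_for_eric = {"Titanic": 4.6, "Gladiator": 3.9, "Amelie": 4.1, "Alien": 2.5}
seen_by_eric = {"Gladiator"}  # already rated, so it must not be recommended again

recommendations = top_n(predicted_for_eric, seen_by_eric, n=2)
print(recommendations)
```

Filtering out already-seen items before sorting matters: otherwise the list tends to fill up with items the user has already rated highly.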
Mathematical Formulation
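The interaction matrix $R$ is the object on which all CF predictors operate. With explicit ratings $r_{ui}$:
$$R = [r_{ui}]_{m \times n}, \qquad r_{ui} = \begin{cases} \text{observed feedback} & \text{if } (u, i) \in \Omega \\ ? & \text{otherwise} \end{cases}$$
where:
- $m$: number of users; $n$: number of items
- $\Omega$: set of observed (user, item) pairs; $|\Omega| \ll mn$ (the matrix is sparse)
- $r_{ui}$ with $(u, i) \in \Omega$: observed feedback, i.e. a star rating, a signed like ($\pm 1$), or an implicit $0/1$ signal
- $r_{ui} = ?$: a missing entry; recommendation = predicting $\hat{r}_{ui}$ for these cells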
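Because almost every cell is missing, the matrix is rarely stored as a dense grid. The sketch below shows one possible representation, a dictionary of rows that holds only the observed pairs, and computes the fraction of cells that are filled. The names `build_matrix` and `density` and the toy ratings are assumptions made for illustration; production systems more commonly use a sparse-matrix library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Rating = Tuple[str, str, float]      # (user, item, feedback value)
Matrix = Dict[str, Dict[str, float]]  # user -> {item -> feedback}; missing cells are simply absent


def build_matrix(observations: Iterable[Rating]) -> Matrix:
    """Store only the observed (user, item) pairs; everything else stays missing."""
    rows = defaultdict(dict)
    for user, item, value in observations:
        rows[user][item] = value
    return dict(rows)


def density(R: Matrix, n_items: int) -> float:
    """Fraction of cells that are observed: |observed pairs| / (m * n)."""
    observed = sum(len(row) for row in R.values())
    # Note: m counts only users with at least one observation in this sketch.
    return observed / (len(R) * n_items)


# Toy data (illustrative values only): 3 users, 4 items, 6 observed ratings.
observations = [
    ("Lucy", "Titanic", 5.0), ("Lucy", "Gladiator", 1.0), ("Lucy", "Amelie", 4.0),
    ("Eric", "Titanic", 5.0), ("Eric", "Gladiator", 1.0),
    ("Diane", "Alien", 3.0),
]
R = build_matrix(observations)
print(density(R, n_items=4))        # 6 observed cells out of 3 * 4
print(R["Eric"].get("Amelie"))      # a missing entry is reported as None
```

Real rating datasets are far sparser than this toy example, which is why only the observed pairs are kept in memory.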
Memory-based read of $R$: User-based Rating Prediction fills a blank $\hat{r}_{ui}$ by averaging column $i$'s ratings over $u$'s nearest neighbors $N_i(u)$ (users who did rate $i$):
$$\hat{r}_{ui} = \frac{1}{|N_i(u)|} \sum_{v \in N_i(u)} r_{vi}$$
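As a rough sketch of this memory-based prediction, the code below averages the target item's ratings over the k most similar users who rated it. The similarity measure is an assumption: the note does not fix one, so cosine similarity over co-rated items is used here purely for illustration. The function names and the toy ratings are hypothetical.

```python
import math
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, float]]  # user -> {item -> rating}; missing cells are absent


def cosine_on_corated(row_u: Dict[str, float], row_v: Dict[str, float]) -> float:
    """Cosine similarity restricted to the items both users rated (assumed measure)."""
    common = row_u.keys() & row_v.keys()
    if not common:
        return 0.0
    dot = sum(row_u[i] * row_v[i] for i in common)
    norm_u = math.sqrt(sum(row_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(row_v[i] ** 2 for i in common))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(R: Matrix, u: str, i: str, k: int = 2) -> Optional[float]:
    """Unweighted mean of item i's ratings over the k users most similar to u."""
    row_u = R.get(u, {})
    # Only users who actually rated item i can serve as neighbors.
    candidates = [v for v, row in R.items() if v != u and i in row]
    if not candidates:
        return None  # nobody rated i, so there is nothing to average
    ranked = sorted(candidates, key=lambda v: cosine_on_corated(row_u, R[v]), reverse=True)
    neighbors = ranked[:k]
    return sum(R[v][i] for v in neighbors) / len(neighbors)


# Toy ratings (illustrative values only); Eric has not rated item "C".
R = {
    "Lucy":  {"A": 5, "B": 1, "C": 4},
    "Eric":  {"A": 5, "B": 1},
    "Diane": {"A": 1, "B": 5, "C": 2},
    "Fay":   {"A": 4, "B": 2, "C": 5},
}
prediction = predict(R, "Eric", "C", k=2)
print(prediction)
```

In this toy data Lucy and Fay agree with Eric on items A and B far more than Diane does, so they become the two neighbors and the prediction is the mean of their ratings for C. Weighting each neighbor by its similarity is a common refinement that this sketch leaves out.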
Model-based read of $R$: Matrix Factorization approximates the whole matrix by a low-rank product of latent user/item factors:
$$R \approx P Q^\top$$
where $P \in \mathbb{R}^{m \times k}$ (each row a user factor), $Q \in \mathbb{R}^{n \times k}$ (each row an item factor), and $k$ is the number of latent concepts. A rating is reconstructed as the dot product of the corresponding row of $P$ and row of $Q$, $\hat{r}_{ui} = p_u^\top q_i$; in the lecture's rank-2 toy example the two latent dimensions turn out interpretable ("history" vs. "romance").
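One way to obtain such factors is stochastic gradient descent on the squared error over the observed entries only. The sketch below is an illustration under stated assumptions, not the lecture's method: the factor matrices are called `P` and `Q` here, and the learning rate, L2 regularization, epoch count, and toy ratings are all hypothetical choices.

```python
import math
import random
from typing import Dict, List, Tuple

Observed = Dict[Tuple[int, int], float]  # (user index, item index) -> rating
Factors = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rmse(P: Factors, Q: Factors, observed: Observed) -> float:
    """Root mean squared reconstruction error on the observed entries only."""
    errors = [(r - dot(P[u], Q[i])) ** 2 for (u, i), r in observed.items()]
    return math.sqrt(sum(errors) / len(errors))


def factorize(observed: Observed, m: int, n: int, k: int = 2, lr: float = 0.02,
              reg: float = 0.02, epochs: int = 500, seed: int = 0) -> Tuple[Factors, Factors]:
    """Fit rank-k user factors P (m x k) and item factors Q (n x k) by SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    pairs = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (u, i), r in pairs:
            err = r - dot(P[u], Q[i])  # residual on one observed cell
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


# Toy data (illustrative): items 0-1 play the role of "history", items 2-3 of "romance".
observed = {
    (0, 0): 5, (0, 1): 5, (0, 2): 1,
    (1, 0): 5, (1, 3): 1,
    (2, 1): 1, (2, 2): 5, (2, 3): 5,
    (3, 0): 1, (3, 2): 5,
}
P0, Q0 = factorize(observed, m=4, n=4, epochs=0)  # untrained factors (same seed)
P, Q = factorize(observed, m=4, n=4)
rmse_before, rmse_after = rmse(P0, Q0, observed), rmse(P, Q, observed)
print(rmse_before, rmse_after)
print(dot(P[0], Q[3]))  # predicted rating for the missing cell (user 0, item 3)
```

Only observed cells contribute to the loss; the missing cells are what the learned product is then used to predict.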
Key Properties / Variants
- Sparsity. The dominant practical feature: nearly all of the $m \times n$ cells are missing, which drives the choice of algorithm and causes the cold-start problem (a new user/item is an all-blank row/column). See Data Sparsity and Cold Start Problem.
- Feedback type. Entries encode either Explicit Feedback (numeric ratings; a blank means “not rated”) or Implicit Feedback (clicks/plays/purchases; a blank is ambiguous — disinterest or simply not-yet-seen). Implicit matrices are usually treated as binary positives + negative sampling.
- Asymmetry of the two reads. Slicing by rows gives User-based Collaborative Filtering (similar users); slicing by columns gives Item-based Collaborative Filtering (similar items).
- Missing-not-at-random. Observed entries are biased — popular items and active users are over-represented — so the blanks are not a random sample. This connects to Popularity Bias and to evaluating beyond accuracy.
- Generalization by neural models. Neural Collaborative Filtering also takes one-hot user/item indices into $R$ but replaces the dot product with a learned non-linear function; classic MF is a special case of it.
- Filling-in procedure (memory-based prediction for one blank cell):
    Predict R[u, i] for a missing entry:
    ──────────────────────────────────────
    candidates ← { v : R[v, i] is observed, v ≠ u }
    for each v in candidates:
        s[v] ← similarity( row_u(R), row_v(R) )   # over co-rated items
    N ← top-k users v by s[v]                     # nearest neighbors N_i(u)
    R_hat[u, i] ← (1 / |N|) * Σ_{v in N} R[v, i]  # (optionally weight by s[v])
    return R_hat[u, i]
Connections
- Core input to: Collaborative Filtering, Matrix Factorization, Neural Collaborative Filtering
- Read row-wise / column-wise: User-based Collaborative Filtering, Item-based Collaborative Filtering
- Memory-based predictor over it: User-based Rating Prediction, Neighborhood-based Collaborative Filtering
- Entry semantics: Explicit Feedback, Implicit Feedback
- Pathologies of the matrix: Data Sparsity, Cold Start Problem, Popularity Bias
- Output produced from filled entries: Top-N Recommendation