Neural Collaborative Filtering
Definition
Neural Collaborative Filtering (NCF)
Neural Collaborative Filtering [He et al., 2017] is a neural framework for top-N recommendation that replaces the fixed inner-product interaction of Matrix Factorization with a learnable neural function. Each user and item is one-hot encoded and mapped through an Embedding Layer to dense latent vectors $\mathbf{p}_u$ and $\mathbf{q}_i$; the interaction is then modeled by stacked neural CF layers with non-linear activations. NCF is trained on Implicit Feedback as a binary classification problem and subsumes MF as a special case.
Intuition
Why a neural function instead of a dot product
Plain MF fixes the user–item interaction to be the inner product — a linear, symmetric combination of latent dimensions. This caps its expressiveness: two users can be close in the latent space yet have genuinely non-linear preference overlaps that a dot product cannot represent (the “MF limited to linear relationships” problem).
NCF keeps the Embedding Layer idea (one-hot → dense latent vector) but lets an MLP learn the interaction function from data. With non-linear activations it can model complex user–item patterns; because the embeddings and the interaction network are trained jointly, NCF can also fold in heterogeneous content and sequential signals. The punchline of the paper: if you restrict the neural layers to element-wise multiplication with fixed unit weights and an identity output, you recover MF exactly, so MF is one point in the NCF hypothesis space.
Mathematical Formulation
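The NCF predictor maps user/item one-hot vectors to a relevance score $\hat{y}_{ui}$:
$$\hat{y}_{ui} = \phi_{out}\Big(\phi_X\big(\cdots \phi_2(\phi_1([\mathbf{p}_u;\, \mathbf{q}_i])) \cdots\big)\Big)$$
where:
- $\mathbf{P} \in \mathbb{R}^{M \times K}$, $\mathbf{Q} \in \mathbb{R}^{N \times K}$: user / item embedding matrices ($M$ users, $N$ items, $K$ latent dims)
- $\mathbf{p}_u = \mathbf{P}^\top \mathbf{v}_u$, $\mathbf{q}_i = \mathbf{Q}^\top \mathbf{v}_i$: dense user / item latent vectors (output of the Embedding Layer applied to the one-hot vectors $\mathbf{v}_u$, $\mathbf{v}_i$)
- $[\mathbf{p}_u;\, \mathbf{q}_i]$: concatenation, the input to the first neural CF layer
- $\phi_1, \dots, \phi_X$: the stacked neural CF layers (e.g. MLP with ReLU), giving the non-linearity
- $\phi_{out}$: output layer mapping to a score, typically with sigmoid
- $\hat{y}_{ui}$: predicted score; $y_{ui}$: target label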
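The predictor above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the toy sizes, the random initialisation, the two-layer stack, and the helper names (`one_hot`, `embed`, `dense`, `ncf_score`) are all assumptions made for the example. It also shows that multiplying an embedding table by a one-hot vector is the same as looking up one row.

```python
import math
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def embed(one_hot_vec, table):
    """Compute table^T v; for a one-hot v this selects a single row of `table`."""
    dims = len(table[0])
    return [sum(v * row[d] for v, row in zip(one_hot_vec, table)) for d in range(dims)]


def dense(weights, bias, x):
    """Affine map W x + b, with `weights` stored as a list of rows."""
    return [sum(w * xk for w, xk in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def ncf_score(u, i, P, Q, layers, h):
    """Score = sigmoid(h . phi_X(... phi_1([p_u; q_i]) ...))."""
    p_u = embed(one_hot(u, len(P)), P)
    q_i = embed(one_hot(i, len(Q)), Q)
    z = p_u + q_i  # list concatenation plays the role of [p_u; q_i]
    for W, b in layers:  # stacked neural CF layers with ReLU
        z = relu(dense(W, b, z))
    return sigmoid(sum(hk * zk for hk, zk in zip(h, z)))


rng = random.Random(0)
M, N, K = 4, 5, 3  # toy numbers of users, items, latent dims (assumed)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


P = rand_matrix(M, K)  # user embedding table
Q = rand_matrix(N, K)  # item embedding table
layers = [(rand_matrix(4, 2 * K), [0.0] * 4), (rand_matrix(2, 4), [0.0] * 2)]
h = [rng.gauss(0.0, 0.1) for _ in range(2)]  # output-layer weights

score = ncf_score(1, 2, P, Q, layers, h)
print(score)
```

In practice the one-hot multiplication is replaced by a direct row lookup (`P[u]`), which is what embedding layers in deep learning frameworks do internally.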
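Training as binary classification. Treat $y_{ui} = 1$ if item $i$ is observed/relevant for user $u$, else $y_{ui} = 0$. For Implicit Feedback the loss is binary cross-entropy over observed positives $\mathcal{Y}$ and sampled negatives $\mathcal{Y}^-$:
$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \Big[ y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log (1 - \hat{y}_{ui}) \Big]$$
where:
- $\mathcal{Y}$: set of observed (positive) interactions
- $\mathcal{Y}^-$: set of unobserved instances drawn by Negative Sampling (reduces the cost compared with using all unobserved pairs)
- For Explicit Feedback, a weighted square loss is used instead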
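The loss and the sampling step can be illustrated with a small sketch. Everything specific here is an assumption for the example: the toy interaction list, uniform sampling of negatives, two negatives per positive, and the `predict` stand-in that returns fixed scores in place of a trained model.

```python
import math
import random


def sample_negatives(user, positives, num_items, num_neg, rng):
    """Uniformly draw up to `num_neg` items the user has not interacted with."""
    seen = {i for (u, i) in positives if u == user}
    candidates = [j for j in range(num_items) if j not in seen]
    return rng.sample(candidates, min(num_neg, len(candidates)))


def bce_loss(labelled_scores, eps=1e-12):
    """Sum of -[y log(y_hat) + (1 - y) log(1 - y_hat)] over (y, y_hat) pairs."""
    total = 0.0
    for y, y_hat in labelled_scores:
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total


rng = random.Random(0)
positives = [(0, 1), (0, 3), (1, 2)]  # observed (user, item) pairs, toy data
num_items = 6


def predict(u, i):
    """Hypothetical stand-in for the model score y_hat_ui."""
    return 0.9 if (u, i) in positives else 0.2


batch = []
for (u, i) in positives:
    batch.append((1, predict(u, i)))  # observed pair, label 1
    for j in sample_negatives(u, positives, num_items, num_neg=2, rng=rng):
        batch.append((0, predict(u, j)))  # sampled unobserved pair, label 0

loss = bce_loss(batch)
print(len(batch), loss)
```

The number of negatives per positive is a tunable ratio; two is used here only to keep the example small.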
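MF as a special case (GMF). Replace the neural CF layers with element-wise multiplication and a fixed output, and NCF reduces exactly to the MF dot product:
$$\hat{y}_{ui} = a_{out}\big(\mathbf{h}^\top (\mathbf{p}_u \odot \mathbf{q}_i)\big) = \mathbf{1}^\top (\mathbf{p}_u \odot \mathbf{q}_i) = \mathbf{p}_u^\top \mathbf{q}_i$$
where $\odot$ is element-wise product, $a_{out}$ the output activation (identity here), and $\mathbf{h} = \mathbf{1}$ a fixed unit (all-ones) weight vector. With a learnable $\mathbf{h}$ and non-linear $a_{out}$ this becomes Generalized Matrix Factorization (GMF).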
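A quick numeric check makes the reduction concrete. The vectors and the learned-weight values below are arbitrary numbers chosen for illustration, and `gmf_score` is a hypothetical helper name, not something from the paper.

```python
import math


def gmf_score(p_u, q_i, h, activation):
    """Apply `activation` to h . (p_u * q_i), where * is element-wise."""
    interaction = [p * q for p, q in zip(p_u, q_i)]
    return activation(sum(hk * zk for hk, zk in zip(h, interaction)))


def identity(t):
    return t


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


p_u = [0.5, -1.0, 2.0]   # toy user latent vector
q_i = [1.0, 0.5, 0.25]   # toy item latent vector

# Plain MF: the inner product of the two latent vectors (0.5 - 0.5 + 0.5 = 0.5).
dot = sum(p * q for p, q in zip(p_u, q_i))

# NCF restricted to all-ones h and identity activation gives the same value.
mf = gmf_score(p_u, q_i, [1.0, 1.0, 1.0], identity)

# GMF: h is free (values here are made up) and the output is squashed by a sigmoid.
gmf = gmf_score(p_u, q_i, [0.2, 1.5, -0.3], sigmoid)

print(dot, mf, gmf)
```

With a free weight vector, each latent dimension can contribute with a different importance, which is the extra expressiveness GMF adds over the plain dot product.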
Key Properties / Variants
- Generality. NCF is a framework, not a single model: the interaction network is a design choice. MF is recovered by element-wise product + fixed unit output + identity activation (slide 46).
- GMF (Generalized Matrix Factorization): keeps the element-wise product but learns the output weight vector $\mathbf{h}$ and uses a non-linear output activation.
- MLP: feeds the concatenation $[\mathbf{p}_u;\, \mathbf{q}_i]$ through a deep stack with ReLU; learns an arbitrary interaction function rather than a product.
- NeuMF: fuses GMF and MLP, each with its own separate embeddings, concatenating their last hidden layers before the output. Combines MF’s linear modeling with the MLP’s non-linearity.
- Loss by feedback type. Binary cross-entropy for Implicit Feedback; weighted square loss for Explicit Feedback.
- Negative Sampling. Unobserved pairs vastly outnumber observed ones; sample a few negatives per positive instead of using all unobserved instances.
- Strengths over MF. Non-linear interactions, can ingest heterogeneous content (text/image/audio), and is trained end-to-end. Empirically outperformed strong baselines on two public datasets in the original paper.
- Caveat (Reproducibility). Dacrema et al. (2019) found that many neural recommenders from the same wave as NCF were hard to reproduce and often did not beat well-tuned simple baselines. Always tune baselines, and never tune on the test set.
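As a runnable counterpart to the NeuMF description in the list above, here is a minimal plain-Python sketch. It is an illustration under assumptions, not the reference implementation: the table sizes, the two-layer MLP tower, the random initialisation, and the `params` dictionary layout are all invented for the example.

```python
import math
import random


def dense_relu(W, b, x):
    """One neural CF layer: ReLU(W x + b), with W stored as a list of rows."""
    return [max(0.0, sum(w * xk for w, xk in zip(row, x)) + bk) for row, bk in zip(W, b)]


def neumf_score(u, i, params):
    """Fuse a GMF branch and an MLP branch, each with its own embeddings."""
    # GMF branch: element-wise product of its own user/item embeddings.
    z_gmf = [p * q for p, q in zip(params["P_gmf"][u], params["Q_gmf"][i])]
    # MLP branch: concatenate its own embeddings, then apply the layer stack.
    z_mlp = params["P_mlp"][u] + params["Q_mlp"][i]
    for W, b in params["mlp_layers"]:
        z_mlp = dense_relu(W, b, z_mlp)
    # Fusion: concatenate both branch outputs, weight by h, apply sigmoid.
    fused = z_gmf + z_mlp
    logit = sum(hk * fk for hk, fk in zip(params["h"], fused))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


num_users, num_items = 5, 6  # toy sizes (assumed)
k_gmf, k_mlp = 3, 4          # branch-specific embedding sizes (assumed)
params = {
    "P_gmf": rand_matrix(num_users, k_gmf),
    "Q_gmf": rand_matrix(num_items, k_gmf),
    "P_mlp": rand_matrix(num_users, k_mlp),
    "Q_mlp": rand_matrix(num_items, k_mlp),
    "mlp_layers": [(rand_matrix(4, 2 * k_mlp), [0.0] * 4),
                   (rand_matrix(2, 4), [0.0] * 2)],
    "h": [rng.gauss(0.0, 0.1) for _ in range(k_gmf + 2)],
}

score = neumf_score(2, 4, params)
print(score)
```

Keeping separate embedding tables lets the two branches use different latent sizes, which would not be possible if they shared one embedding.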
Forward pass (NeuMF) in pseudo-code:
Algorithm: NeuMF forward pass
─────────────────────────────
Input: user id u, item id i
# GMF branch (its own embeddings)
p_u_gmf ← P_gmf[u]; q_i_gmf ← Q_gmf[i]
z_gmf ← p_u_gmf ⊙ q_i_gmf # element-wise product (vector)
# MLP branch (its own embeddings)
p_u_mlp ← P_mlp[u]; q_i_mlp ← Q_mlp[i]
z ← concat(p_u_mlp, q_i_mlp)
for layer ℓ = 1..X:
    z ← ReLU(W_ℓ · z + b_ℓ) # neural CF layers
z_mlp ← z
# Fusion + output
ŷ_ui ← σ( hᵀ · concat(z_gmf, z_mlp) ) # sigmoid → score in (0,1)
return ŷ_ui
Connections
- Generalizes: Matrix Factorization (recovered as the GMF special case with fixed unit output)
- Built on: Embedding Layer, Multi-Layer Perceptron, Neural Networks
- A model-based form of: Collaborative Filtering
- Trained with: Implicit Feedback / Explicit Feedback, Negative Sampling
- Motivated by limits of linear: Matrix Factorization’s fixed inner product
- Successor paradigms for ranking: Sequential Recommendation, Generative Recommendation
- Evaluation caveat: Reproducibility