Neural Collaborative Filtering

Definition

Neural Collaborative Filtering (NCF)

Neural Collaborative Filtering [He et al., 2017] is a neural framework for top-N recommendation that replaces the fixed inner-product interaction of Matrix Factorization with a learnable neural function. Each user and item is one-hot encoded and mapped through an Embedding Layer to dense latent vectors $\mathbf{p}_u$ and $\mathbf{q}_i$. Their interaction is then modeled by stacked neural CF layers with non-linear activations. NCF is trained on Implicit Feedback as a binary classification problem, and it famously subsumes MF as a special case.
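The Embedding Layer step is equivalent to a row lookup: multiplying the transposed embedding matrix by a one-hot vector selects exactly one row. A minimal pure-Python sketch; the toy sizes, random values, and helper names are hypothetical:

```python
import random

# Hypothetical toy sizes: M users, K latent dimensions.
M, K = 4, 3
random.seed(0)
# User embedding matrix P (M x K), stored as a list of rows.
P = [[random.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]


def one_hot(index, size):
    """One-hot input vector v_u: 1.0 at `index`, 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def embed_via_matmul(v, E):
    """Compute E^T v explicitly: a weighted sum of the rows of E."""
    return [sum(v[r] * E[r][k] for r in range(len(E))) for k in range(len(E[0]))]


u = 2
p_u_matmul = embed_via_matmul(one_hot(u, M), P)  # P^T v_u
p_u_lookup = P[u]                                # direct row lookup
print(p_u_matmul == p_u_lookup)  # True
```

This is why embedding layers are usually implemented as an index lookup: the sparse one-hot vectors never need to be materialized.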

Intuition

Why a neural function instead of a dot product

Plain MF fixes the user–item interaction to be the inner product — a linear, symmetric combination of latent dimensions. This caps its expressiveness: two users can be close in the latent space yet have genuinely non-linear preference overlaps that a dot product cannot represent (the “MF limited to linear relationships” problem).

NCF keeps the Embedding Layer idea (one-hot → dense latent vector) but lets an MLP learn the interaction function from data. With non-linear activations it can model complex user–item patterns, and because the embeddings and the interaction network are trained jointly, NCF can also fold in heterogeneous content and sequential signals. The punchline of the paper: if you restrict the neural layers to an element-wise multiplication followed by fixed unit output weights and an identity output activation, you recover MF exactly, so MF is one point in the NCF hypothesis space.

Mathematical Formulation

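The NCF predictor maps the one-hot user/item vectors $\mathbf{v}_u^U$, $\mathbf{v}_i^I$ to a relevance score $\hat{y}_{ui}$:

$$\hat{y}_{ui} = \phi_{out}\Big(\phi_X\big(\cdots\phi_2(\phi_1(\mathbf{z}_0))\cdots\big)\Big), \qquad \mathbf{z}_0 = [\mathbf{p}_u; \mathbf{q}_i], \quad \mathbf{p}_u = \mathbf{P}^\top\mathbf{v}_u^U, \quad \mathbf{q}_i = \mathbf{Q}^\top\mathbf{v}_i^I$$

where:

  • $\mathbf{P} \in \mathbb{R}^{M \times K}$, $\mathbf{Q} \in \mathbb{R}^{N \times K}$: user / item embedding matrices ($M$ users, $N$ items, $K$ latent dims)
  • $\mathbf{p}_u$, $\mathbf{q}_i$: dense user / item latent vectors (output of the Embedding Layer)
  • $\mathbf{z}_0 = [\mathbf{p}_u; \mathbf{q}_i]$: concatenation, the input to the first neural CF layer
  • $\phi_1, \dots, \phi_X$: the $X$ stacked neural CF layers (e.g. an MLP with ReLU), which supply the non-linearity
  • $\phi_{out}$: output layer mapping the last hidden representation to a score, typically with a sigmoid
  • $\hat{y}_{ui}$: predicted score; $y_{ui}$: target label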
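The composition above reads as a pluggable pipeline: any list of layer functions can play the role of $\phi_1, \dots, \phi_X$. The pure-Python sketch below assumes a single ReLU layer and a sigmoid output; the function names and toy weights are hypothetical, chosen so the result is easy to check by hand.

```python
import math
from functools import reduce


def ncf_predict(p_u, q_i, layers, phi_out):
    """y_hat = phi_out(phi_X(...phi_1(z_0)...)) with z_0 = [p_u; q_i]."""
    z0 = p_u + q_i                                   # list concatenation [p_u; q_i]
    z_x = reduce(lambda z, phi: phi(z), layers, z0)  # apply phi_1, ..., phi_X in order
    return phi_out(z_x)


def dense_relu(W, b):
    """Build one neural CF layer phi(z) = ReLU(W z + b)."""
    def phi(z):
        return [max(0.0, sum(w * x for w, x in zip(row, z)) + bj) for row, bj in zip(W, b)]
    return phi


def sigmoid_output(h):
    """Build the output layer phi_out(z) = sigmoid(h^T z)."""
    def phi_out(z):
        return 1.0 / (1.0 + math.exp(-sum(hk * zk for hk, zk in zip(h, z))))
    return phi_out


# Toy example: K = 2, one hidden layer of width 2 (all values hypothetical).
p_u, q_i = [0.5, -1.0], [1.0, 0.5]
layer1 = dense_relu(W=[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]], b=[0.0, 0.0])
y_hat = ncf_predict(p_u, q_i, [layer1], sigmoid_output([1.0, 1.0]))
print(round(y_hat, 4))  # 0.8176, i.e. sigmoid(1.5)
```

Here the hidden layer outputs $[1.5, 0]$ (the second unit is clipped by ReLU), so the score is $\sigma(1.5)$. Swapping `layers` for a different list changes the interaction function without touching the rest of the pipeline.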

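Training as binary classification. Treat $y_{ui} = 1$ if item $i$ is observed/relevant for user $u$, else $y_{ui} = 0$. For Implicit Feedback the loss is binary cross-entropy over observed positives $\mathcal{Y}$ and sampled negatives $\mathcal{Y}^-$:

$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \Big[\, y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log\big(1 - \hat{y}_{ui}\big) \Big]$$

where:

  • $\mathcal{Y}$: set of observed (positive) interactions
  • $\mathcal{Y}^-$: set of unobserved instances drawn by Negative Sampling (this avoids having to score every unobserved pair)
  • For Explicit Feedback, a weighted square loss $\sum_{(u,i)} w_{ui}\,(y_{ui} - \hat{y}_{ui})^2$ is used instead, where $w_{ui}$ is the weight of training instance $(u, i)$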
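Both losses are short to write down. A minimal sketch, assuming the cross-entropy inputs are already probabilities in $(0, 1)$; the function names and toy numbers are hypothetical:

```python
import math


def bce_loss(labels, preds, eps=1e-12):
    """Binary cross-entropy summed over positives (y=1) and sampled negatives (y=0)."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def weighted_square_loss(ratings, preds, weights):
    """Weighted squared error, the variant used for explicit feedback."""
    return sum(w * (r - p) ** 2 for r, p, w in zip(ratings, preds, weights))


# One positive and two sampled negatives with toy predicted scores.
print(round(bce_loss([1, 0, 0], [0.9, 0.2, 0.1]), 4))               # 0.4339
print(weighted_square_loss([4.0, 2.0], [3.5, 2.5], [1.0, 2.0]))      # 0.75
```

The clipping constant `eps` is an implementation safeguard, not part of the formula; deep-learning libraries typically fold the sigmoid into the loss for the same numerical reason.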

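MF as a special case (GMF). Replace the neural CF layers with element-wise multiplication and a fixed output layer, and NCF reduces exactly to the MF dot product:

$$\hat{y}_{ui} = a_{out}\big(\mathbf{h}^\top(\mathbf{p}_u \odot \mathbf{q}_i)\big), \qquad a_{out} = \text{identity},\ \mathbf{h} = \mathbf{1} \;\Longrightarrow\; \hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$$

where $\odot$ is the element-wise product, $a_{out}$ the output activation, and $\mathbf{h}$ a fixed unit (all-ones) weight vector. With a learnable $\mathbf{h}$ and a non-linear $a_{out}$ (e.g. sigmoid), this becomes Generalized Matrix Factorization (GMF).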
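A quick numerical check of the special case: with an all-ones $\mathbf{h}$ and an identity $a_{out}$, the GMF score coincides with the MF dot product. Pure-Python sketch with random toy vectors; `gmf_score` and the example weights are hypothetical:

```python
import math
import random

random.seed(2)
K = 4
# Random toy latent vectors standing in for learned embeddings.
p_u = [random.gauss(0.0, 1.0) for _ in range(K)]
q_i = [random.gauss(0.0, 1.0) for _ in range(K)]


def identity(x):
    return x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gmf_score(p, q, h, a_out):
    """GMF prediction a_out(h^T (p ⊙ q))."""
    elementwise = [pk * qk for pk, qk in zip(p, q)]  # p ⊙ q
    return a_out(sum(hk * ek for hk, ek in zip(h, elementwise)))


mf_score = sum(pk * qk for pk, qk in zip(p_u, q_i))  # plain MF: p_u^T q_i
ones = [1.0] * K                                      # fixed unit weights h = 1
print(math.isclose(gmf_score(p_u, q_i, ones, identity), mf_score))  # True

# GMF proper: an arbitrary (normally learned) h and a sigmoid output.
print(gmf_score(p_u, q_i, [0.5, -1.0, 2.0, 0.3], sigmoid))
```

The learned $\mathbf{h}$ lets GMF weight latent dimensions unequally, which plain MF cannot do.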

Key Properties / Variants

  • Generality. NCF is a framework, not a single model: the interaction network is a design choice. MF is recovered by element-wise product + fixed unit output + identity activation (slide 46).
  • GMF (Generalized Matrix Factorization): keeps the element-wise product $\mathbf{p}_u \odot \mathbf{q}_i$ but learns $\mathbf{h}$ and uses a non-linear output activation $a_{out}$.
  • MLP: feeds the concatenation $[\mathbf{p}_u; \mathbf{q}_i]$ through a deep stack of ReLU layers, learning a general interaction function rather than a fixed product.
  • NeuMF: fuses GMF and MLP, each with its own embeddings, and concatenates their last hidden layers before the output. This combines MF’s linear modeling with the MLP’s non-linearity.
  • Loss by feedback type. Binary cross-entropy for Implicit Feedback; weighted square loss for Explicit Feedback.
  • Negative Sampling. Unobserved pairs vastly outnumber observed ones; sample a few negatives per positive instead of using all unobserved instances.
  • Strengths over MF. Non-linear interactions, the ability to ingest heterogeneous content (text/image/audio), and end-to-end training. In the original paper it empirically outperformed strong baselines on two public datasets (MovieLens and Pinterest).
  • Caveat (Reproducibility). Dacrema et al. (2019) found that many neural recommenders of that period, NCF among them, were either hard to reproduce or often did not beat well-tuned simple baselines. Always tune the baselines, and never tune on the test set.
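Negative Sampling, as described in the list, can be sketched as pairing each observed interaction with a few random unobserved items. This is a minimal version under stated assumptions: uniform sampling over items, rejection of items the user already interacted with, and `num_neg=4` as an illustrative ratio; `sample_training_instances` is a hypothetical name.

```python
import random


def sample_training_instances(positives, n_items, num_neg=4, seed=0):
    """Return (u, i, y) triples: each observed pair with y=1 plus `num_neg` negatives with y=0.

    Assumes uniform sampling over items and that every user has at least one
    item they have not interacted with (otherwise the rejection loop never ends).
    """
    rng = random.Random(seed)
    seen = {}
    for u, i in positives:
        seen.setdefault(u, set()).add(i)

    instances = []
    for u, i in positives:
        instances.append((u, i, 1))
        for _ in range(num_neg):
            j = rng.randrange(n_items)
            while j in seen[u]:  # reject items the user has already interacted with
                j = rng.randrange(n_items)
            instances.append((u, j, 0))
    return instances


positives = [(0, 1), (0, 3), (1, 2)]
data = sample_training_instances(positives, n_items=10, num_neg=4)
print(len(data))  # 15: 3 positives x (1 + 4)
```

The output size grows linearly with the number of positives rather than with the full $M \times N$ grid of unobserved pairs, which is the point of sampling.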

Forward pass (NeuMF) in pseudo-code:

Algorithm: NeuMF forward pass
─────────────────────────────
Input: user id u, item id i
  # GMF branch (its own embeddings)
  p_u_gmf ← P_gmf[u];  q_i_gmf ← Q_gmf[i]
  z_gmf   ← p_u_gmf ⊙ q_i_gmf          # element-wise product (vector)
 
  # MLP branch (its own embeddings)
  p_u_mlp ← P_mlp[u];  q_i_mlp ← Q_mlp[i]
  z       ← concat(p_u_mlp, q_i_mlp)
  for layer ℓ = 1..X:
    z ← ReLU(W_ℓ · z + b_ℓ)             # neural CF layers
  z_mlp ← z
 
  # Fusion + output
  ŷ_ui ← σ( hᵀ · concat(z_gmf, z_mlp) ) # sigmoid → score in [0,1]
  return ŷ_ui
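The pseudo-code translates almost line for line into runnable Python. This sketch uses plain lists instead of a tensor library, random untrained weights, and assumed sizes (two ReLU layers, 4-dimensional embeddings per branch); it shows the data flow only, not training.

```python
import math
import random

random.seed(3)
M, N = 5, 7                    # users, items (toy sizes)
K_GMF, K_MLP = 4, 4            # separate embedding sizes per branch (assumed)
MLP_SIZES = [2 * K_MLP, 8, 4]  # X = 2 neural CF layers (assumed)


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Each branch owns its own embedding tables.
P_gmf, Q_gmf = rand_matrix(M, K_GMF), rand_matrix(N, K_GMF)
P_mlp, Q_mlp = rand_matrix(M, K_MLP), rand_matrix(N, K_MLP)
MLP_LAYERS = [(rand_matrix(n_out, n_in), [0.0] * n_out)
              for n_in, n_out in zip(MLP_SIZES[:-1], MLP_SIZES[1:])]
h = rand_matrix(1, K_GMF + MLP_SIZES[-1])[0]  # weights over concat(z_gmf, z_mlp)


def neumf_forward(u, i):
    # GMF branch: element-wise product of the GMF embeddings.
    z_gmf = [p * q for p, q in zip(P_gmf[u], Q_gmf[i])]
    # MLP branch: concatenate the MLP embeddings, then apply ReLU layers.
    z = P_mlp[u] + Q_mlp[i]
    for W, b in MLP_LAYERS:
        z = [max(0.0, sum(w * x for w, x in zip(row, z)) + bl) for row, bl in zip(W, b)]
    z_mlp = z
    # Fusion: concatenate both branches, project with h, squash with a sigmoid.
    logit = sum(hk * zk for hk, zk in zip(h, z_gmf + z_mlp))
    return 1.0 / (1.0 + math.exp(-logit))


print(0.0 < neumf_forward(2, 5) < 1.0)  # True
```

Keeping separate embedding tables per branch, as the pseudo-code does, lets GMF and MLP use different embedding sizes.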
