Rocchio Algorithm

Rocchio Algorithm

The Rocchio Algorithm is a classic technique for implementing Relevance Feedback. It updates an initial query vector by moving it toward the centroid of known relevant documents and away from the centroid of known non-relevant documents.

Rocchio Query Update

where:

  • — the modified query vector
  • — the original query vector
  • — set of known relevant documents
  • — set of known non-relevant documents
  • — weights (hyperparameters) controlling the balance between original intent, positive feedback, and negative feedback.

Vector Space Navigation

Imagine the Vector Space Model where points represent documents. A user’s query is also a point. If the user marks some results as “good,” Rocchio literally “nudges” the query point closer to those good results. Usually, we weigh relevant documents more heavily than non-relevant ones ().

Pseudo-Relevance Feedback (PRF)

Since users rarely provide explicit feedback, we often use Pseudo-Relevance Feedback:

  1. Run initial search.
  2. Assume the top documents are relevant (no ).
  3. Apply Rocchio to expand the query.
  4. Run second search with .

Connections

  • Context: Used within the Vector Space Model (VSM).
  • Related concepts: BM25 (which typically doesn’t use vectorcentroids), Query Expansion.
  • Modern link: Modern PRF methods often use Transformers to expand the query instead of vector arithmetic.

Appears In