Query Likelihood Model

The Query Likelihood (QL) model ranks documents by the probability that the document’s language model would generate the query. Instead of asking “is this document relevant?”, it asks “how likely is this query given this document?”

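Query Likelihood

For a query $Q = q_1, \dots, q_n$ whose terms are assumed independent given the document $D$:

$$P(Q \mid D) = \prod_{i=1}^{n} P(q_i \mid D)$$

Log form (for ranking):

$$\log P(Q \mid D) = \sum_{i=1}^{n} \log P(q_i \mid D)$$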
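
As a minimal sketch of log-form scoring, the snippet below sums per-term log probabilities under a document language model. The function name `query_log_likelihood` and the toy probabilities in `doc_model` are illustrative assumptions, not part of the original note.

```python
import math

def query_log_likelihood(query_terms, term_prob):
    """Return the sum of log P(q_i | D) over the query terms.

    term_prob maps each term to its probability under the document model.
    Terms missing from the mapping are treated as having probability zero.
    """
    total = 0.0
    for term in query_terms:
        p = term_prob.get(term, 0.0)
        if p == 0.0:
            # One zero-probability term makes the whole product zero,
            # so the log score is negative infinity.
            return float("-inf")
        total += math.log(p)
    return total

# Hypothetical unsmoothed document model (toy values that sum to 1).
doc_model = {"query": 0.2, "likelihood": 0.3, "model": 0.5}

score = query_log_likelihood(["query", "model"], doc_model)
unseen_score = query_log_likelihood(["query", "ranking"], doc_model)
```

Summing logs avoids the numerical underflow that multiplying many small probabilities can cause, and it preserves the ranking because the logarithm is monotonic. The second call returns negative infinity because "ranking" has no probability mass in the toy model.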

Smoothing

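Raw maximum likelihood estimation ($P_{ML}(w \mid D) = c(w, D) / |D|$, where $c(w, D)$ is the count of term $w$ in $D$ and $|D|$ is the document length) assigns zero probability to terms not in the document, so a single missing query term drives the whole query likelihood to zero. Smoothing fixes this: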

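Jelinek-Mercer (Linear Interpolation)

$$P(w \mid D) = (1 - \lambda)\, P_{ML}(w \mid D) + \lambda\, P(w \mid C), \quad 0 \le \lambda \le 1$$

Dirichlet Prior

$$P(w \mid D) = \frac{c(w, D) + \mu\, P(w \mid C)}{|D| + \mu}, \quad \mu > 0$$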
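
The two smoothing methods can be sketched as follows. This is a hedged illustration: the function names, the toy document, the `coll_prob` values, and the default `lam` and `mu` values are assumptions made for the example. It also assumes the convention in which `lam` is the weight placed on the collection model; some texts use the opposite convention.

```python
from collections import Counter

def jelinek_mercer(term, doc_counts, doc_len, coll_prob, lam=0.1):
    """Linear interpolation of the document ML estimate and the collection model.

    lam is the weight on the collection model (an assumed convention).
    """
    p_ml = doc_counts.get(term, 0) / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_ml + lam * coll_prob[term]

def dirichlet(term, doc_counts, doc_len, coll_prob, mu=2000.0):
    """Dirichlet prior smoothing: add mu pseudo-counts drawn from the collection model."""
    return (doc_counts.get(term, 0) + mu * coll_prob[term]) / (doc_len + mu)

# Toy document and hypothetical collection probabilities.
doc_terms = "the cat sat on the mat".split()
doc_counts = Counter(doc_terms)
doc_len = len(doc_terms)
coll_prob = {"cat": 0.01, "dog": 0.02}

jm_seen = jelinek_mercer("cat", doc_counts, doc_len, coll_prob, lam=0.1)
jm_unseen = jelinek_mercer("dog", doc_counts, doc_len, coll_prob, lam=0.1)
dir_seen = dirichlet("cat", doc_counts, doc_len, coll_prob, mu=10.0)
dir_unseen = dirichlet("dog", doc_counts, doc_len, coll_prob, mu=10.0)
```

In both cases the unseen term "dog" receives a small non-zero probability taken from the collection model, so a query containing it no longer scores negative infinity.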

Smoothing Intuition

  • $P(w \mid C)$ is the collection language model (background probability)
  • Smoothing says: “If this term isn’t in the document, fall back to how common it is in the whole collection”
  • Dirichlet $\mu$: pseudo-count. Larger $\mu$ → more smoothing (trust the collection more)
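
To make the pseudo-count intuition concrete, the sketch below evaluates the Dirichlet-smoothed estimate for one term at several values of the pseudo-count. The counts, the collection probability, and the chosen values are made-up numbers for illustration only.

```python
def dirichlet_prob(count, doc_len, p_coll, mu):
    """Dirichlet-smoothed estimate: (count + mu * p_coll) / (doc_len + mu)."""
    return (count + mu * p_coll) / (doc_len + mu)

# Hypothetical term: appears 5 times in a 100-term document,
# with background (collection) probability 0.001.
count, doc_len, p_coll = 5, 100, 0.001

mus = (0, 100, 2000, 1_000_000)
estimates = {mu: dirichlet_prob(count, doc_len, p_coll, mu) for mu in mus}

for mu in mus:
    print(f"mu={mu:>9}: P(w|D) = {estimates[mu]:.6f}")
```

With a pseudo-count of zero the estimate is the raw document frequency (5/100). As the pseudo-count grows, the estimate moves toward the collection probability. For a fixed pseudo-count, longer documents are smoothed less, because the same number of pseudo-counts is added to a larger document length.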
