Query Likelihood Model
The Query Likelihood (QL) model ranks documents by the probability that the document’s language model would generate the query. Instead of asking “is this document relevant?”, it asks “how likely is this query given this document?”
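Query Likelihood
$$P(q \mid d) = \prod_{t \in q} P(t \mid d)$$
Log form (for ranking):
$$\log P(q \mid d) = \sum_{t \in q} \log P(t \mid d)$$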
Smoothing
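Raw maximum likelihood estimation ($P_{\text{MLE}}(t \mid d) = \mathrm{tf}(t, d) / |d|$) assigns zero probability to terms not in the document, so a single missing query term drives the whole query likelihood to zero. Smoothing fixes this: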
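Jelinek-Mercer (Linear Interpolation)
$$P(t \mid d) = (1 - \lambda)\,\frac{\mathrm{tf}(t, d)}{|d|} + \lambda\, P(t \mid C)$$
Dirichlet Prior
$$P(t \mid d) = \frac{\mathrm{tf}(t, d) + \mu\, P(t \mid C)}{|d| + \mu}$$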
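A minimal Python sketch of query likelihood ranking with both smoothing methods may make this concrete. It assumes Jelinek-Mercer is written with the weight `lam` on the collection model (some texts put it on the document model instead), and the toy corpus, the function names, and the defaults `lam=0.5` and `mu=2000.0` are all illustrative choices, not values taken from this note.

```python
import math
from collections import Counter

def collection_model(docs):
    """P(t|C): relative frequency of each term over all documents."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def jelinek_mercer(tf, doc_len, p_c, lam=0.5):
    # Assumed convention: lam weights the collection model.
    return (1 - lam) * tf / doc_len + lam * p_c

def dirichlet(tf, doc_len, p_c, mu=2000.0):
    # mu acts as a pseudo-count of collection-distributed tokens.
    return (tf + mu * p_c) / (doc_len + mu)

def log_query_likelihood(query, doc, p_coll, smooth):
    """Sum of log P(t|d) over query terms, using the given smoother."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p = smooth(tf[t], len(doc), p_coll.get(t, 0.0))
        if p == 0.0:
            # Term unseen in the whole collection: smoothing cannot help.
            return float("-inf")
        score += math.log(p)
    return score

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats make good pets".split(),
]
p_coll = collection_model(docs)
query = "cat mat".split()

scores = [log_query_likelihood(query, d, p_coll, jelinek_mercer) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # document indices, best first
```

Passing `dirichlet` instead of `jelinek_mercer` swaps the smoothing method without touching the scoring loop. Note that the third document contains neither query term yet still receives a finite score, which is exactly what smoothing is meant to achieve.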
Smoothing Intuition
- $P(t \mid C)$ is the collection language model (background probability)
- Smoothing says: “If this term isn’t in the document, fall back to how common it is in the whole collection”
- Dirichlet $\mu$: pseudo-count. Large $\mu$ → more smoothing (trust the collection more)
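The pseudo-count reading can be checked numerically. The sketch below uses hypothetical numbers (a term seen 3 times in a 100-token document, with background probability 0.001) and shows the Dirichlet estimate sliding from the document's raw frequency toward the collection probability as the pseudo-count grows. It also checks that the Dirichlet estimate equals a linear interpolation whose collection weight is `mu / (doc_len + mu)`, so shorter documents are smoothed more heavily.

```python
import math

def dirichlet(tf, doc_len, p_c, mu):
    """Dirichlet-smoothed estimate of P(t|d)."""
    return (tf + mu * p_c) / (doc_len + mu)

# Hypothetical values chosen only for illustration.
tf, doc_len, p_c = 3, 100, 0.001
mus = [0, 10, 100, 1000, 10000, 100000]

estimates = [dirichlet(tf, doc_len, p_c, mu) for mu in mus]
for mu, p in zip(mus, estimates):
    print(f"mu={mu:>6}  P(t|d)={p:.6f}")

# Same estimate written as an interpolation with a length-dependent weight.
mu = 1000
lam = mu / (doc_len + mu)  # weight on the collection model
interpolated = (1 - lam) * tf / doc_len + lam * p_c
assert math.isclose(interpolated, dirichlet(tf, doc_len, p_c, mu))
```

With a pseudo-count of 0 the estimate is the raw maximum likelihood value `tf / doc_len`, and for very large values it approaches `p_c`.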
Connections
- Alternative to: BM25, TF-IDF
- Uses: Smoothing
- Foundation of: Language Model for IR