Plackett-Luce Model
Definition
Plackett-Luce (PL) Model
The Plackett-Luce model is a probability distribution over permutations (rankings) of a set of items, parameterized by a real-valued score per item. It defines — the probability of producing ranking — as a sequence of softmax choices made without replacement: at each rank position you pick the next item with probability proportional to among the items not yet placed.
In Listwise LTR it is the probabilistic foundation of ListNet and ListMLE: instead of scoring documents independently (pointwise) or in pairs (pairwise), PL gives a single coherent distribution over the whole ranking, which we can then fit by maximum likelihood / cross-entropy.
Intuition
Think of repeatedly drawing items from a bag, but biased: an item with a higher score is more likely to be drawn next. Once drawn, it is removed, and the remaining items renormalize.
- It is the softmax extended to ranking. A single softmax answers “which item is best?”; PL answers “which item is best, then which of the rest is best, then…” — applying softmax to a shrinking candidate set at every step.
- The model is a generative story for a ranking, so a model that outputs scores implicitly defines a distribution over all orderings. Training = make the correct ranking probable under this distribution.
- It satisfies Luce’s choice axiom (independence of irrelevant alternatives): the relative odds of choosing over at any step depend only on and , not on what other items are present.
Mathematical Formulation
Full ranking probability. For a permutation of items with scores :
where:
- — the item placed at rank position
- — score of that item (in IR, the model’s relevance score for the document)
- — normalizer over the items still remaining at step (positions through ); the already-placed items are dropped from the denominator
- the product runs over all choice steps (the last factor is always )
Top-1 / first-place probability. The marginal probability that item is ranked first is a plain softmax over all items:
where:
- — the (positive) “strength” of item ; PL is often written with strengths , so
- — partition over the full candidate set
Worked example (3 documents, ranking ):
Use in losses. The two main listwise losses are likelihoods under PL:
- ListMLE maximizes the likelihood of the ground-truth ranking (labels sorted descending), i.e. minimizes :
- ListNet keeps only the top-1 PL marginal and minimizes cross-entropy between the label-softmax and the score-softmax :
where is the relevance label of document and the model’s predicted score.
Key Properties / Variants
- Generative sampling procedure (how to draw a ranking from PL):
Algorithm: Sample a ranking from Plackett-Luce(s)
──────────────────────────────────────────────────
Input: scores s_1..s_n
Remaining R ← {1, ..., n}
For k = 1 to n:
# softmax over the items still in R
For each i in R: p_i ← exp(s_i) / Σ_{j in R} exp(s_j)
Sample item m ~ Categorical(p) # pick next-ranked item
π(k) ← m
R ← R \ {m} # sample without replacement
Return ranking π = (π(1), ..., π(n))- Softmax = top-1 PL. A single softmax is exactly the first step of PL; full PL is “softmax with replacement removed”, applied times.
- Scale by exponentiation, shift-invariant. Adding a constant to every score leaves unchanged (the cancels in each ratio), so scores are only identifiable up to an additive constant.
- Luce’s choice axiom / IIA. Pairwise odds are independent of the other alternatives — a known limitation when context matters.
- Factorial blow-up. There are permutations, so the full distribution is intractable for large (e.g. ). Practical fixes: ListNet’s top-1 truncation, or top- PL (only model the first choice steps).
- Ties / multiple valid orders. When several items share a label, any ordering among them is a valid ground truth; ListMLE handles this since the loss decomposes per step.
- Relation to softmax policies in RL. The same Gibbs/softmax form underlies the Softmax Policy used in Policy Gradient methods; ranking under PL can be seen as a sequential (without-replacement) softmax policy over documents.
- Complexity. Evaluating is given a sorted list (precompute suffix sums of ), plus to sort by label.
Connections
- Foundation of: Listwise LTR — specifically ListNet (top-1 PL) and ListMLE (full-ranking PL likelihood)
- Generalizes: the softmax / Softmax Policy (top-1 PL is a single softmax)
- Contrast with: Pointwise LTR (independent scores) and Pairwise LTR / RankNet (pairwise comparisons), neither of which models the full permutation
- Alternative listwise objective: metric-based losses like ApproxNDCG and LambdaRank / LambdaMART that approximate NDCG directly instead of fitting a permutation likelihood
- Applied within: Learning to Rank, evaluated with NDCG and MAP