uniCOIL

uniCOIL is a learned sparse retrieval model that represents each document as a single sparse vector in which each dimension corresponds to a term in the vocabulary. It simplifies COIL and, like DeepImpact, uses BERT to assign a single importance weight to each term (token).

uniCOIL Retrieval

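For a query $q$ and document $d$, the relevance score is the inner product of their sparse weight vectors:

$$s(q, d) = \sum_{t \in q \cap d} w_{q,t} \, w_{d,t}$$

where:

  • $w_{d,t}$ is the weight of term $t$ in document $d$, predicted by a BERT encoder.
  • $w_{q,t}$ is the weight of term $t$ in query $q$. It can be simplified to a binary indicator (1 if the term is in the query, 0 otherwise), in which case the score reduces to the sum of $w_{d,t}$ over the query terms that appear in $d$.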
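The scoring rule can be sketched in a few lines of Python. This is only an illustration: the function name `unicoil_score` and the example weights are hypothetical, and the sparse vectors are assumed to be stored as term-to-weight dictionaries rather than in a real index.

```python
from typing import Dict


def unicoil_score(query_weights: Dict[str, float],
                  doc_weights: Dict[str, float]) -> float:
    """Inner product of two sparse term-weight vectors stored as dicts."""
    # Iterate over the smaller vector; only shared terms contribute.
    if len(query_weights) > len(doc_weights):
        query_weights, doc_weights = doc_weights, query_weights
    return sum(w * doc_weights[t]
               for t, w in query_weights.items()
               if t in doc_weights)


# Hypothetical weights, made up for illustration (not model output).
doc = {"coil": 2.5, "retrieval": 1.5, "sparse": 1.0}
# Binary query weights: every query term counts as 1.0.
query = {"sparse": 1.0, "retrieval": 1.0, "neural": 1.0}

score = unicoil_score(query, doc)  # 1.0 * 1.0 + 1.0 * 1.5 = 2.5
```

With binary query weights, the score is simply the sum of the document weights for the query terms that occur in the document; "neural" contributes nothing because the document has no weight for it.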

Sparse Neural Weights

Modern neural IR often uses “dense” vectors (long lists of numbers) that cannot be looked up in an inverted index and need nearest-neighbor search instead. uniCOIL keeps the “sparse” nature of classic IR (words in an index) but uses BERT to set the “volume” of each word in its context. If a word is very important in a sentence, it gets a loud volume (high weight); if it is just filler, it is silenced (zero weight).
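The "volume" idea can be illustrated with a toy sketch. It assumes, rather than reproduces, the model: each token's contextual vector (which a BERT encoder would supply, hand-written here) is projected to one scalar, a ReLU clips negative values to zero, and repeated tokens keep their largest weight. The function `token_weights`, the projection vector, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def token_weights(tokens: List[str],
                  hidden: List[Sequence[float]],
                  proj: Sequence[float],
                  bias: float = 0.0) -> Dict[str, float]:
    """Map each token to one non-negative weight (toy stand-in for the model)."""
    weights: Dict[str, float] = {}
    for tok, h in zip(tokens, hidden):
        raw = sum(p * x for p, x in zip(proj, h)) + bias  # linear projection
        w = max(0.0, raw)                                 # ReLU: filler -> 0
        if w > 0.0:
            # Assumption: a repeated token keeps its largest weight.
            weights[tok] = max(w, weights.get(tok, 0.0))
    return weights


# Hand-written stand-ins for contextual vectors (not real BERT output).
tokens = ["the", "coil", "model", "the"]
hidden = [[-1.0, 0.0], [2.0, 1.0], [0.5, 0.5], [-0.5, 0.0]]
proj = [1.0, 1.0]

weights = token_weights(tokens, hidden, proj)
# "the" projects to a negative value both times, so it is silenced.
```

Tokens whose projection is zero or negative never enter the dictionary, which is what keeps the resulting vector sparse.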

Key Features

  • Simplification: Unlike COIL (which stores a contextualized vector for each term occurrence), uniCOIL stores one scalar weight per term in a document, making it compatible with standard inverted-index engines such as Lucene.
  • Effective Expansion: When combined with DocT5Query document expansion, uniCOIL can assign weights to terms that were not in the original text but are relevant to its topic.
  • Efficient Retrieval: Because scoring is a simple sum of precomputed weights over an inverted index, retrieval runs on the same machinery as BM25 and needs no neural inference over documents at query time, while the weights themselves carry the contextual signal learned by BERT.
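To make the inverted-index point concrete, here is a minimal in-memory sketch of term-at-a-time retrieval over precomputed weights. It is not how Lucene is implemented; `build_index`, `search`, the document IDs, and the weights are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[str, float]]]


def build_index(docs: Dict[str, Dict[str, float]]) -> Postings:
    """Invert doc -> {term: weight} into term -> [(doc_id, weight), ...]."""
    index: Postings = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, w in weights.items():
            if w > 0:  # zero-weight terms are never stored
                index[term].append((doc_id, w))
    return dict(index)


def search(index: Postings,
           query_weights: Dict[str, float],
           k: int = 10) -> List[Tuple[str, float]]:
    """Accumulate query_weight * doc_weight over each query term's postings."""
    scores: Dict[str, float] = defaultdict(float)
    for term, qw in query_weights.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:k]


# Hypothetical precomputed document weights.
docs = {
    "d1": {"sparse": 1.0, "retrieval": 1.5},
    "d2": {"dense": 2.0, "retrieval": 0.5},
    "d3": {"cooking": 3.0},
}
index = build_index(docs)
results = search(index, {"sparse": 1.0, "retrieval": 1.0})
```

Only the postings lists of the query terms are touched, so "d3" is never scored. An expansion term would simply appear as one more entry in a document's weight dictionary before indexing.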
