COIL (Contextualized Inverted List)
COIL is a retrieval model that combines the efficiency of exact lexical matching with the power of contextualized token representations. It stores low-dimensional BERT-based token embeddings directly in an inverted index, allowing for “matching by embedding” within the framework of traditional sparse retrieval.
The Bridge Between Sparse and Dense
- Sparse (BM25): Matches exact tokens, but forgets context (polysemy).
- Dense (DPR): Captures global context, but might miss exact keyword matches.
- COIL: Matches only identical tokens (the lexical constraint) but scores each match by the similarity of their contextual embeddings. It asks: "Is the word 'bank' in the query used in the same sense as 'bank' in the document?"
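The sense check above can be made concrete with a toy example. The vectors here are hand-crafted stand-ins for contextual embeddings, not real BERT outputs; the point is only that a dot product between same-token vectors separates word senses:

```python
# Illustrative only: hand-crafted 3-d vectors stand in for contextual
# embeddings of the token "bank" appearing in different sentences.
import numpy as np

bank_query   = np.array([0.9, 0.1, 0.0])  # "open a bank account" (finance sense)
bank_finance = np.array([0.8, 0.2, 0.1])  # "the bank approved the loan"
bank_river   = np.array([0.1, 0.0, 0.9])  # "fishing on the river bank"

# Same surface token, but the dot product separates the senses:
print(bank_query @ bank_finance)  # high similarity -> strong match
print(bank_query @ bank_river)    # low similarity  -> weak match
```

A pure lexical matcher like BM25 would score both documents identically on "bank"; COIL's embedding check downweights the river sense.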
Key Mechanism
- Encoding: Each token in the corpus is encoded by a BERT-like transformer into a contextual vector.
- Indexing: The inverted index stores entries of the form term -> (doc_id, vector).
- Retrieval: At query time, the system retrieves documents containing the query terms and computes the similarity (e.g., dot product) between query token vectors and document token vectors.
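The mechanism above can be sketched end to end. This is a minimal illustration, not the paper's implementation: the encode function uses random vectors in place of a BERT-like encoder, and the documents and dimensionality are made up. Scoring follows the COIL rule of taking, for each query token, the maximum similarity among document tokens with the same surface form, then summing per document:

```python
# Minimal COIL-style index-and-score sketch. Toy random vectors stand in
# for the contextual token embeddings a BERT-like encoder would produce.
from collections import defaultdict
import numpy as np

DIM = 8  # COIL stores low-dimensional token vectors (e.g., 8-32 dims)
rng = np.random.default_rng(0)

def encode(tokens):
    """Stand-in for a BERT-like encoder: one contextual vector per token."""
    return [rng.standard_normal(DIM).astype(np.float32) for _ in tokens]

# --- Indexing: inverted index maps term -> [(doc_id, vector), ...] ---
docs = {0: ["bank", "river"], 1: ["bank", "loan"]}
index = defaultdict(list)
for doc_id, tokens in docs.items():
    for tok, vec in zip(tokens, encode(tokens)):
        index[tok].append((doc_id, vec))

# --- Retrieval: exact lexical match gates which vectors are compared ---
def score(query_tokens):
    scores = defaultdict(float)
    for tok, q_vec in zip(query_tokens, encode(query_tokens)):
        per_doc = defaultdict(list)
        for doc_id, d_vec in index.get(tok, []):  # same-term postings only
            per_doc[doc_id].append(float(q_vec @ d_vec))
        for doc_id, sims in per_doc.items():
            scores[doc_id] += max(sims)  # max over repeated occurrences
    return dict(scores)

print(score(["bank", "loan"]))
```

Because only same-term postings are ever compared, the index stays as small and fast to traverse as a classical inverted list, with one low-dimensional dot product replacing each term-frequency lookup.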
Connections
- Comparison: A more efficient relative of ColBERT: ColBERT performs late interaction across all query-document token pairs, while COIL restricts interaction to exact lexical matches.
- Category: Learned Sparse Retrieval.
- Successors: Influenced models like SPLADE which further sparsify the representation.