DPR (Dense Passage Retrieval)

DPR is a retrieval model that maps queries and documents into a shared dense vector space using BERT-based bi-encoders. Retrieval is performed using maximum inner product search (MIPS) on these embeddings.

Scoring via Dual Encoders

The relevance score is the dot product of two independently computed embeddings:

$$\mathrm{sim}(q, d) = E_Q(q)^\top E_D(d)$$

where:

  • $E_Q(q)$: Dense representation of the query $q$ (Query Encoder)
  • $E_D(d)$: Dense representation of the document $d$ (Document Encoder)
  • Both encoders are typically BERT-based.
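
The scoring rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a DPR implementation: the embeddings are hand-written stand-in values rather than encoder outputs, and the helper names `score` and `top_k` are hypothetical. Retrieval is done by brute force, which is exact maximum inner product search over a tiny collection.

```python
import numpy as np

def score(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Dot-product relevance of one query against every document."""
    return doc_embs @ query_emb

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 2):
    """Exact maximum inner product search by scoring all documents."""
    scores = score(query_emb, doc_embs)
    order = np.argsort(-scores)[:k]  # indices of the k largest scores
    return order, scores[order]

# Stand-in embeddings (assumed values, not produced by a trained encoder).
doc_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.4, 0.4, 0.4],
])
query_emb = np.array([1.0, 0.0, 0.5])

ids, scores = top_k(query_emb, doc_embs, k=2)
print(ids, scores)
```

Because the document embeddings do not depend on the query, `doc_embs` can be computed once and reused for every query; only `query_emb` has to be computed at search time.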

Key Components

  1. Bi-Encoder Architecture: Unlike Cross-Encoders, the query and document do not see each other during encoding.
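  2. Indexing: Document embeddings are pre-computed and stored in a vector index (e.g., one built with FAISS) that supports exact or approximate nearest-neighbor (ANN) search, so only the query has to be encoded at search time.
  3. Training: Uses contrastive learning with an InfoNCE-like objective: the negative log-likelihood of the relevant document under a softmax over the similarity scores of the relevant document and the negatives.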
  4. Negative Sampling: Crucially uses In-batch Negatives — for a batch of queries and their relevant documents, the other documents in the batch serve as negatives for each query.
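
As a rough sketch of how in-batch negatives turn into a loss, the NumPy function below treats row i of the query batch and row i of the document batch as a relevant pair and every other document in the batch as a negative for that query. The function name `in_batch_negatives_loss` and the toy inputs are assumptions for illustration; a real training loop would compute this on encoder outputs inside an autodiff framework.

```python
import numpy as np

def in_batch_negatives_loss(q_embs: np.ndarray, d_embs: np.ndarray) -> float:
    """Mean negative log-likelihood of each query's own document.

    q_embs[i] and d_embs[i] form a relevant pair; d_embs[j] with j != i
    serve as negatives for query i.
    """
    scores = q_embs @ d_embs.T  # (B, B) matrix of dot-product scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # positives lie on the diagonal

# Toy batch of 3 pairs (assumed values, not encoder outputs).
q = 5.0 * np.eye(3)
d = np.eye(3)
aligned = in_batch_negatives_loss(q, d)
shuffled = in_batch_negatives_loss(q, np.roll(d, 1, axis=0))
print(aligned, shuffled)
```

When each query scores highest against its own document the loss is close to zero; pairing the queries with the wrong documents (the `shuffled` case) makes it much larger. A batch of B pairs yields B - 1 negatives per query without encoding any extra documents.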

Semantic Search

Because it compares dense vectors rather than exact terms, DPR can retrieve documents that are semantically relevant but share few or no words with the query, which mitigates the “lexical mismatch” problem of term-based methods such as BM25.
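
To make the lexical mismatch concrete, the sketch below counts shared word tokens between a query and two documents. It is a deliberately simplified stand-in for term matching, not BM25 itself, and the example sentences and the helper `term_overlap` are made up for illustration.

```python
import re

def term_overlap(query: str, document: str) -> int:
    """Count distinct lowercase word tokens shared by query and document."""
    def tokenize(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(tokenize(query) & tokenize(document))

query = "how to fix a flat tire"
relevant = "repairing punctured bicycle wheels"   # on topic, no shared words
unrelated = "how to fix a leaking tap"            # off topic, many shared words

overlap_relevant = term_overlap(query, relevant)
overlap_unrelated = term_overlap(query, unrelated)
print(overlap_relevant, overlap_unrelated)
```

Any scorer built purely on shared terms has no signal for the on-topic document here, while the off-topic one matches on several words. A dense retriever is not bound by this, since its score depends on the learned embeddings rather than on token overlap.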
