Information Retrieval

Information Retrieval (IR) is the task of finding, within a large collection of resources, those that are relevant to an information need. In practice this usually means finding unstructured material (typically text documents) that satisfies the need.

Core Components

  1. Document collection (corpus): The set of documents to search
  2. Query: A user’s expression of their information need
  3. Relevance: The degree to which a document satisfies the information need
  4. Ranking function: Scores and orders documents by estimated relevance
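
As a rough illustration of how the four components fit together, the sketch below uses a hypothetical three-document corpus and a deliberately naive relevance estimate (counting query-term occurrences). The names `corpus`, `query`, `score`, and `rank` are assumptions made for this example, not part of any library.

```python
from collections import Counter

# 1. Document collection (corpus): hypothetical toy documents keyed by ID.
corpus = {
    "d1": "the cat sat on the mat",
    "d2": "dogs chase the cat",
    "d3": "stock markets fell sharply",
}

# 2. Query: the user's expression of the information need.
query = "cat on mat"


def score(query: str, document: str) -> int:
    """3. Relevance estimate: how often the query terms occur in the document."""
    doc_terms = Counter(document.lower().split())
    return sum(doc_terms[term] for term in query.lower().split())


def rank(query: str, corpus: dict[str, str]) -> list[tuple[str, int]]:
    """4. Ranking function: score every document, highest score first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank(query, corpus)
print(ranked)  # [('d1', 3), ('d2', 1), ('d3', 0)]
```

Real systems replace the term-count estimate with a weighting scheme such as TF-IDF or BM25, but the division of roles stays the same.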

IR Pipeline

Query → [Query Processing] → [Matching/Retrieval] → [Ranking] → Ranked Results
                                     ↑
                              [Inverted Index]
                                     ↑
 Documents → [Text Processing] → [Indexing]
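
The stages in the diagram can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: text processing is only lowercasing and tokenization (no stemming or stop-word removal), and ranking just counts how many distinct query terms a document contains. The function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def process_text(text: str) -> list[str]:
    """Text/query processing: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in process_text(text):
            index[term].add(doc_id)
    return index


def search(query: str, index: dict[str, set[str]]) -> list[str]:
    """Query processing, then matching against the index, then ranking."""
    terms = process_text(query)
    # Matching: for each candidate document, count the query terms it contains.
    matches: dict[str, int] = defaultdict(int)
    for term in set(terms):
        for doc_id in index.get(term, set()):
            matches[doc_id] += 1
    # Ranking: most matched terms first; ties broken by document ID.
    return sorted(matches, key=lambda doc_id: (-matches[doc_id], doc_id))


documents = {
    "d1": "Inverted indexes map terms to documents.",
    "d2": "Dense retrieval uses learned embeddings.",
    "d3": "BM25 ranks documents using term statistics.",
}
index = build_index(documents)
results = search("term statistics for documents", index)
print(results)  # ['d3', 'd1']
```

Note that `d1` matches only on "documents": without stemming, the query term "term" does not match the indexed token "terms", which is the kind of gap the text processing stage is meant to close.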

Retrieval Paradigms

Paradigm          Example       How it works
Sparse retrieval  BM25, TF-IDF  Exact term matching via inverted index
Dense retrieval   DPR, ColBERT  Semantic matching via learned embeddings
Learned sparse    SPLADE        Neural term weights in inverted index
Generative        DSI, GENRE    Generate document IDs directly
Reranking         MonoBERT      Re-score top-k from first-stage retrieval
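
To make the sparse retrieval row concrete, here is a minimal BM25 scoring sketch. It assumes pre-tokenized documents, the commonly used defaults `k1=1.5` and `b=0.75`, and the "plus one inside the log" IDF variant; other implementations differ in these details. The function name `bm25_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        total = 0.0
        for term in query:
            if term not in tf:
                continue  # exact term matching: unseen terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            total += idf * tf[term] * (k1 + 1) / norm
        scores.append(total)
    return scores


docs = [
    "sparse retrieval matches exact terms".split(),
    "dense retrieval matches meaning with embeddings".split(),
    "rerankers rescore a short candidate list".split(),
]
query = "sparse retrieval".split()

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The third document shares no term with the query and scores zero. That is the limitation of exact term matching which dense and learned sparse methods aim to address.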
