Product Quantization (PQ)
An approximate nearest neighbor search technique that compresses high-dimensional vectors by splitting them into sub-vectors and quantizing each sub-vector independently using a learned codebook. Distances are approximated using pre-computed lookup tables.
Intuition
Storing and comparing millions of dense vectors (e.g., from Bi-Encoder models) is expensive. PQ provides massive memory savings by representing each vector as a short code (sequence of codebook indices) while still allowing fast approximate distance computation.
How It Works
- Split: Divide each $d$-dimensional vector into $m$ sub-vectors of dimension $d/m$
- Train codebooks: For each sub-vector space, learn a codebook of $k$ centroids (typically $k = 256$) via k-means
- Encode: Replace each sub-vector with the index of its nearest centroid → vector becomes $m$ integers
- Search: Approximate distances using pre-computed lookup tables of distances between query sub-vectors and centroids
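The split/train/encode steps above can be sketched in a few lines of NumPy (a minimal illustration with made-up function names; a real system would use an optimized k-means such as the one in FAISS):

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared L2 distance).
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # Recompute centroids; an empty cluster keeps its old centroid.
        for j in range(k):
            pts = data[assign == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids

def pq_train(vectors, m, k=256):
    """Split d-dim vectors into m sub-spaces and learn one codebook per sub-space."""
    n, d = vectors.shape
    assert d % m == 0
    sub = vectors.reshape(n, m, d // m)
    return np.stack([kmeans(sub[:, i], k) for i in range(m)])  # (m, k, d/m)

def pq_encode(vectors, codebooks):
    """Replace each sub-vector by the index of its nearest centroid (k <= 256)."""
    m, k, ds = codebooks.shape
    sub = vectors.reshape(len(vectors), m, ds)
    codes = np.empty((len(vectors), m), dtype=np.uint8)
    for i in range(m):
        d2 = ((sub[:, i, None, :] - codebooks[i][None]) ** 2).sum(-1)
        codes[:, i] = d2.argmin(1)
    return codes
```

Each database vector is then stored as just the `codes` row of $m$ small integers plus the shared codebooks.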
Memory Savings
- Original vector: $4d$ bytes (float32)
- PQ code: $m \log_2 k$ bits
- Example: $d = 768$, $m = 96$, $k = 256$ → 96 bytes instead of 3072 bytes (32× compression)
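A quick sanity check of this arithmetic (the helper name is illustrative):

```python
import math

def pq_sizes(d, m, k=256, bytes_per_float=4):
    """Return (original_bytes, code_bytes, compression_ratio) for a PQ setup."""
    original = d * bytes_per_float       # float32 storage of the raw vector
    code = m * math.log2(k) / 8          # each sub-vector stored as log2(k) bits
    return original, code, original / code

# 768-dim float32 vector, 96 sub-vectors, 256 centroids per codebook:
orig_bytes, code_bytes, ratio = pq_sizes(d=768, m=96, k=256)
# → 3072 bytes compressed to 96 bytes, a 32× reduction
```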
Key Properties
- Asymmetric Distance Computation (ADC): query stays uncompressed, only documents are quantized → better accuracy
- Additive error: quantization error is bounded and decreases with more sub-vectors
- Composable: often combined with ANN methods (e.g., IVF-PQ)
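The ADC property can be sketched concretely: per query, one builds an $(m \times k)$ table of squared distances from each query sub-vector to every centroid, after which scoring a document is just $m$ table lookups and a sum (a minimal sketch, assuming codebooks and codes produced as in the encoding step; names are illustrative, not a library API):

```python
import numpy as np

def adc_search(query, codes, codebooks, topk=5):
    """Asymmetric distance computation: query uncompressed, documents as codes."""
    m, k, ds = codebooks.shape
    q_sub = query.reshape(m, ds)
    # Lookup table: table[i, j] = ||q_i - codebooks[i, j]||^2, shape (m, k).
    table = ((q_sub[:, None, :] - codebooks) ** 2).sum(-1)
    # Approximate squared distance per document = sum of m table lookups.
    dists = table[np.arange(m), codes].sum(1)   # codes: (n, m) → dists: (n,)
    order = np.argsort(dists)[:topk]
    return order, dists[order]
```

Because the table is computed once per query, scoring millions of codes avoids any floating-point vector arithmetic per document.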
Connections
- Enables efficient Dense Retrieval at scale
- Used in FAISS (Facebook AI Similarity Search)
- Alternative to HNSW graph-based Approximate Nearest Neighbor search
- Relevant to Bi-Encoder and ColBERT index compression
Appears In
- IR-L06 - Dense Retrieval (§5.2 — compression techniques)