If you have built a search system in the last two years, you have probably used dense embeddings -- high-dimensional vectors that capture semantic meaning. Dense retrieval is powerful, but it has well-documented failure modes: it struggles with exact keyword matching, rare terms, and queries where lexical overlap matters more than semantic similarity.
The solution is not to abandon dense retrieval but to combine it with complementary methods. This post explains how Lakehouse42 implements hybrid search by fusing three retrieval signals -- dense embeddings, sparse vectors, and BM25 -- using Reciprocal Rank Fusion (RRF).
The Three Retrieval Signals
Dense Embeddings
Dense embeddings map text into a continuous vector space (typically 768 or 1024 dimensions) where semantically similar texts are close together. We use BGE-M3, a multi-lingual, multi-granularity model from BAAI that produces high-quality embeddings across 100+ languages.
Strengths:

- Semantic matching: paraphrases, synonyms, and related concepts match even with zero word overlap.
- Multilingual: BGE-M3 covers 100+ languages with a single model.

Weaknesses:

- Struggles with exact keyword matching, rare terms, and queries where lexical overlap matters most.
- Opaque: individual dimensions carry no human-readable meaning.
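To make "close together" concrete, here is a toy cosine-similarity check. The 4-dim vectors stand in for the real 1024-dim BGE-M3 embeddings; the values are made up for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
query = [0.1, 0.9, 0.2, 0.0]
doc_related = [0.2, 0.8, 0.1, 0.1]
doc_unrelated = [0.9, 0.0, 0.1, 0.8]

# The semantically related document scores higher than the unrelated one.
print(cosine_similarity(query, doc_related) > cosine_similarity(query, doc_unrelated))
```

In a vector index, this comparison runs approximately over millions of documents rather than exactly over three.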
Sparse Vectors (Learned Sparse Retrieval)
Sparse vectors assign non-zero weights to a vocabulary of tokens, producing a high-dimensional but mostly-zero vector. Unlike dense embeddings, each dimension corresponds to a specific token, making the representation interpretable. BGE-M3 produces sparse vectors alongside dense embeddings in a single forward pass.
Strengths:

- Learned term importance: the model weights tokens by relevance, not just raw frequency.
- Interpretable: each non-zero dimension corresponds to a specific token.
- No extra encoding cost: produced alongside the dense embedding in the same BGE-M3 forward pass.

Weaknesses:

- Requires model inference at query time, unlike BM25.
- Tied to the model's vocabulary and training distribution; quality can degrade on out-of-domain jargon.
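As a sketch, a sparse vector can be held as a token-to-weight map and scored by a dot product over shared tokens. The weights below are invented for illustration; real weights come from the model:

```python
def sparse_dot(query_vec: dict[str, float], doc_vec: dict[str, float]) -> float:
    """Score = sum of weight products over tokens present in both vectors."""
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

# Hypothetical learned token weights -- note the rare term "gdpr" gets a high weight.
query = {"gdpr": 2.1, "template": 0.8, "agreement": 1.2}
doc_a = {"gdpr": 1.9, "agreement": 1.0, "processing": 0.7}
doc_b = {"privacy": 1.5, "policy": 1.1}

print(sparse_dot(query, doc_a))  # 2.1*1.9 + 1.2*1.0 = 5.19
print(sparse_dot(query, doc_b))  # 0.0 -- no overlapping tokens
```

Because only overlapping tokens contribute, sparse retrieval behaves like keyword search, but with learned rather than statistical term weights.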
BM25 (Best Match 25)
BM25 is a classical probabilistic ranking function that scores documents based on term frequency, inverse document frequency, and document length normalization. It has been the backbone of information retrieval for three decades.
Strengths:

- Exact keyword matching, including rare terms, codes, and names.
- Fast and cheap: no training and no model inference required.
- Well understood after decades of production use.

Weaknesses:

- No notion of semantics: synonyms and paraphrases do not match.
- Scores are unbounded and corpus-dependent, which makes them hard to combine directly with other signals.
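For illustration, a minimal self-contained BM25 scorer (real systems score against an inverted index rather than scanning the corpus):

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one document for a query.
    k1 saturates term frequency; b controls document-length normalization."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))   # rare terms weigh more
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    ["gdpr", "data", "processing", "agreement"],
    ["privacy", "policy", "overview"],
    ["sales", "contract", "template"],
]
q = ["gdpr", "agreement"]
scores = [bm25_score(q, doc, corpus) for doc in corpus]  # only the first doc matches
```

The synonym problem is visible here: a query for "gdpr" scores zero against a document that only says "privacy policy", no matter how relevant it is.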
Why Fusion Works
Each retrieval method has complementary strengths and weaknesses. Dense retrieval excels at semantic matching but misses keywords. BM25 excels at keyword matching but misses semantics. Sparse retrieval bridges the gap with learned term importance.
Consider this query: "GDPR data processing agreement template"

- Dense retrieval finds documents about data processing agreements and privacy compliance, but may surface GDPR-adjacent content that never mentions "GDPR" explicitly.
- BM25 insists on the exact tokens, but may rank a document that merely mentions "GDPR" and "template" above a genuinely relevant agreement.
- Sparse retrieval weights "GDPR" heavily as a rare, high-signal term while tolerating variation in the surrounding phrasing.

By fusing all three, you get documents that are both semantically relevant AND contain the right keywords, with learned term importance breaking ties.
Reciprocal Rank Fusion (RRF)
The fusion step is where the magic happens. We use Reciprocal Rank Fusion (RRF), a simple but remarkably effective algorithm introduced by Cormack, Clarke, and Buettcher in 2009.
The RRF score for a document $d$ given multiple ranked lists is:
$$\mathrm{RRF}(d) = \sum_{i} \frac{1}{k + \mathrm{rank}_i(d)}$$

Where:

- $\mathrm{rank}_i(d)$ is the rank of document $d$ in the $i$-th retrieval method's results
- $k$ is a constant (typically 60) that controls how much lower-ranked documents are penalized

Why RRF Over Other Fusion Methods
We evaluated several fusion strategies:
Linear combination -- Normalize scores from each method and take a weighted sum. The problem: scores from different methods are not on comparable scales. Dense cosine similarity (0-1) is fundamentally different from BM25 scores (unbounded). Normalization helps but introduces its own biases.
Learning-to-rank -- Train a model to combine features from each retrieval method. Produces excellent results but requires labeled training data, which most enterprises do not have at deployment time.
RRF -- Uses only rank positions, not scores. This sidesteps the score normalization problem entirely. It requires no training data. And empirically, it performs within 2-3% of learning-to-rank models while being dramatically simpler.
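The whole algorithm fits in a few lines. A minimal sketch, fusing ranked lists of document IDs:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs. Only rank positions matter -- raw scores
    from the underlying retrievers are never consulted."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy results from three retrievers.
dense  = ["d1", "d2", "d3"]
sparse = ["d2", "d1", "d4"]
bm25   = ["d2", "d5", "d1"]

fused = rrf_fuse([dense, sparse, bm25])
print(fused[0][0])  # "d2" -- first in two lists, second in the third
```

Note that d2 beats d1 even though d1 tops the dense list: consistent strength across all three signals wins over a single first place.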
In our benchmarks, RRF with k=60 consistently outperforms any individual retrieval method:
| Method | NDCG@10 (BEIR avg) | MRR@10 |
|---|---|---|
| Dense only (BGE-M3) | 0.44 | 0.40 |
| Sparse only (BGE-M3) | 0.41 | 0.37 |
| BM25 only | 0.38 | 0.35 |
| Dense + BM25 (RRF) | 0.48 | 0.44 |
| Dense + Sparse + BM25 (RRF) | 0.51 | 0.47 |
The three-signal fusion outperforms the best single method by 15.9% on NDCG@10.
Implementation in Lakehouse42
Here is how hybrid search works in our system, step by step:
Step 1: Query Encoding
When a search query arrives, we encode it simultaneously with BGE-M3 to produce both a dense embedding (1024-dim float vector) and a sparse vector (variable-length token-weight pairs). We also tokenize the query for BM25 scoring.
Step 2: Parallel Retrieval
Three retrieval paths execute concurrently:

- Dense: nearest-neighbor search over the dense embedding index.
- Sparse: scoring against the stored sparse vectors using the learned token weights.
- BM25: lexical scoring of the tokenized query against the document index.

Each path returns its top-N candidates as a ranked list.
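As a sketch, the fan-out can be expressed with a thread pool. The three search functions below are illustrative stand-ins, not Lakehouse42 APIs:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real index lookups; each returns a ranked list of doc IDs.
def dense_search(query, n=100):  return ["d0", "d1", "d2"]
def sparse_search(query, n=100): return ["s0", "s1", "s2"]
def bm25_search(query, n=100):   return ["b0", "b1", "b2"]

def parallel_retrieve(query: str) -> list[list[str]]:
    """Run all three retrievers concurrently; total latency is roughly
    the slowest path, not the sum of the three."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, query)
                   for fn in (dense_search, sparse_search, bm25_search)]
        return [f.result() for f in futures]

dense_hits, sparse_hits, bm25_hits = parallel_retrieve("gdpr template")
```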
Step 3: Metadata Filtering
Before fusion, all three result sets are filtered by metadata constraints (organization_id, date range, document type, tags, etc.). This filtering happens at the storage layer, not in application code, so it benefits from Iceberg's partition pruning and predicate pushdown.
Step 4: RRF Fusion
The three ranked lists are fused using RRF with k=60. Documents that appear in multiple lists get boosted; documents that rank highly in all three lists rise to the top.
Step 5: Re-ranking (Optional)
For high-precision use cases, we optionally apply a cross-encoder re-ranker to the top-K fused results. The cross-encoder jointly encodes the query and each candidate document, producing more accurate relevance scores at the cost of higher latency (typically +50-100ms).
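A minimal sketch of the re-ranking step, with a toy token-overlap scorer standing in for the cross-encoder forward pass (a real implementation would batch query-document pairs through the model):

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 10) -> list[str]:
    """Re-sort the top-K fused candidates by a (query, doc) relevance score."""
    scored = [(score_fn(query, doc), doc) for doc in candidates[:top_k]]
    return [doc for _, doc in sorted(scored, reverse=True)]

# Toy scorer: shared-token count stands in for a learned relevance model.
def overlap_score(query: str, doc: str) -> int:
    return len(set(query.split()) & set(doc.split()))

docs = ["gdpr processing agreement", "privacy policy overview", "gdpr agreement template"]
reranked = rerank("gdpr data processing agreement template", docs, overlap_score)
```

The cost model is the key design point: the fusion stage scores every candidate independently and cheaply, while the re-ranker spends its expensive joint encoding only on the handful of finalists.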
Tuning Hybrid Search
While RRF works well out of the box, there are several knobs for optimization:
Per-method weight in RRF -- You can weight the three signals differently. For keyword-heavy domains (legal, medical), increasing the BM25 weight improves results. For conceptual search (research, strategy), increasing the dense weight helps.
k parameter -- Lower k values (e.g., 20) amplify the importance of top ranks. Higher k values (e.g., 100) flatten the rank distribution, giving more weight to documents that appear across multiple methods even if they rank lower in each.
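A quick calculation makes the effect of k concrete:

```python
def rrf_contrib(rank: int, k: int) -> float:
    """An individual document's contribution from one ranked list."""
    return 1.0 / (k + rank)

# How much more does rank 1 count than rank 10, for different k?
ratios = {k: rrf_contrib(1, k) / rrf_contrib(10, k) for k in (20, 60, 100)}
for k, r in ratios.items():
    print(f"k={k}: rank 1 counts {r:.2f}x as much as rank 10")
```

At k=20 the top rank counts about 1.43x as much as rank 10; at k=100 the gap shrinks to about 1.09x, so breadth across methods matters more than a single high rank.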
Retrieval depth (N) -- How many candidates to retrieve from each method before fusion. Deeper retrieval (N=200-500) improves recall at the cost of latency. For most workloads, N=100 provides a good balance.
Sparse vector threshold -- Minimum weight for a sparse dimension to be included. Higher thresholds reduce noise but may miss relevant terms. We default to 0.0 (include all non-zero dimensions).
Practical Results
We ran a controlled experiment with an enterprise customer in the legal sector. The corpus contained 50,000 contracts, and the evaluation set was 200 queries with human-labeled relevance judgments.
| Configuration | Precision@10 | Recall@10 | F1@10 | Avg Latency |
|---|---|---|---|---|
| Dense only | 0.62 | 0.45 | 0.52 | 85ms |
| BM25 only | 0.58 | 0.51 | 0.54 | 35ms |
| Dense + BM25 (RRF) | 0.71 | 0.58 | 0.64 | 95ms |
| Dense + Sparse + BM25 (RRF) | 0.74 | 0.63 | 0.68 | 110ms |
| + Cross-encoder re-rank | 0.81 | 0.63 | 0.71 | 195ms |
The full hybrid pipeline with re-ranking achieved 36.5% higher F1 than dense-only retrieval, with latency still well under 200ms.
Conclusion
Hybrid search is not a theoretical improvement -- it is a practical necessity for production retrieval systems. Dense embeddings, sparse vectors, and BM25 each capture different aspects of relevance, and combining them with RRF produces consistently superior results with minimal engineering overhead.
At Lakehouse42, hybrid search is the default for every query. You do not need to choose between semantic and keyword search -- you get both, fused intelligently, on every request.
Want to benchmark hybrid search on your own data? Start a free trial and run your evaluation in under an hour. For enterprise evaluations, contact our team.