Retrieval-Augmented Generation has dominated the conversation around enterprise AI for the past two years. The pattern is elegant: retrieve relevant context from a knowledge base, then feed it to a language model to generate grounded answers. Compared to pure parametric generation, RAG dramatically reduces hallucinations and keeps responses anchored to your actual data.
But after deploying RAG systems at scale for dozens of enterprise customers, we have learned that RAG alone is not enough. It solves one problem -- grounded generation -- while leaving several critical enterprise needs unaddressed. This post explores those gaps and explains how the LSR (Lakehouse-Streamhouse-Realtime) architecture fills them.
The Limits of RAG
Single Query Pattern
RAG is fundamentally a retrieve-then-generate pattern. You embed a query, find similar chunks, and pass them to an LLM. This works beautifully for question-answering workloads, but enterprises need much more:
- Exact keyword and phrase matching for compliance and legal search
- Multi-hop questions that traverse relationships between entities
- Aggregations and analytics across the entire corpus
- Filtering and reporting on structured metadata
A pure RAG system either cannot handle these queries or handles them poorly by shoehorning everything through vector similarity.
The Chunking Problem
RAG systems are extremely sensitive to chunking strategy. Chunk too small and you lose context. Chunk too large and you dilute relevance. Use fixed-size chunks and you split sentences mid-thought. Use semantic chunks and you get inconsistent boundaries and unpredictable chunk sizes across documents.
The deeper problem is that chunking is lossy. Once you break a document into chunks and embed them independently, you lose cross-chunk relationships, document-level semantics, and structural information. A 200-page contract becomes 400 disconnected text fragments.
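To make the failure mode concrete, here is a minimal fixed-size chunker in Python. The window size and the contract text are illustrative, not taken from any real deployment:

```python
def fixed_size_chunks(text: str, size: int = 80) -> list[str]:
    """Split text into fixed-size windows, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = ("The indemnification clause in Section 4.2 survives termination. "
       "The limitation of liability in Section 9 caps damages at fees paid.")

chunks = fixed_size_chunks(doc, size=80)
# Each chunk is embedded independently, so the connection between
# Section 4.2 and Section 9 is lost, and sentences are cut mid-thought.
for c in chunks:
    print(repr(c))
```

Once these windows are embedded separately, no retrieval step can recover the cross-chunk relationship between the two clauses.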
No Knowledge Extraction
Standard RAG treats documents as opaque text blobs. It does not extract entities, relationships, or structured metadata. A medical research paper is just text -- the system does not know that it mentions specific drugs, conditions, clinical trials, or authors as discrete entities with typed relationships.
Without knowledge extraction, you cannot build a knowledge graph, detect contradictions across documents, or answer questions that require reasoning over entity relationships.
Infrastructure Sprawl
Enterprise RAG deployments typically involve:
- A vector database for embeddings
- A document store for source content
- A search engine for keyword and full-text queries
- A relational database for metadata and audit trails
- An orchestration layer to keep them all in sync
Each system has its own scaling characteristics, failure modes, consistency guarantees, and operational overhead. The total cost of ownership is staggering, and most of it is infrastructure tax rather than differentiated value.
The LSR Architecture: A Better Foundation
The LSR architecture addresses these limitations by providing a unified platform with multiple query patterns, automatic knowledge extraction, and tiered storage that optimizes for both cost and performance.
Unified Storage on Apache Iceberg
Instead of scattering data across five different systems, LSR stores everything in Apache Iceberg tables:
| Table | Contents | Key Columns |
|---|---|---|
| documents | Source documents | id, title, content, mime_type, metadata |
| chunks | Document chunks | id, document_id, content, dense_embedding, sparse_embedding |
| entities | Extracted entities | id, name, type, properties |
| relationships | Entity relationships | source_id, target_id, type, weight |
| audit_log | Access and processing events | timestamp, action, user_id, resource_id |
Embeddings are stored as columns within the chunks table -- not in a separate vector database. This means a single scan can filter by metadata, score by BM25, and rank by vector similarity simultaneously.
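As an illustration of that single-scan pattern, here is a toy in-memory sketch in Python. The rows, the 50/50 scoring weights, and the keyword-overlap stand-in for BM25 are all assumptions for demonstration, not LSR's actual implementation:

```python
import math

# Toy stand-in for rows of the chunks table; column names mirror the
# schema above, but the data and weights are invented for illustration.
CHUNKS = [
    {"id": 1, "content": "supply chain risk report",
     "dense_embedding": [0.9, 0.1], "department": "ops"},
    {"id": 2, "content": "holiday party planning",
     "dense_embedding": [0.1, 0.9], "department": "hr"},
    {"id": 3, "content": "supplier risk assessment",
     "dense_embedding": [0.8, 0.2], "department": "ops"},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_score(query, content):
    # Crude keyword overlap; a real system would compute BM25 here.
    q, c = set(query.split()), set(content.split())
    return len(q & c) / len(q)

def single_scan(query, query_vec, department):
    """One pass over the table: metadata filter + keyword + vector score."""
    hits = []
    for row in CHUNKS:
        if row["department"] != department:   # metadata filter
            continue
        score = (0.5 * keyword_score(query, row["content"])
                 + 0.5 * cosine(query_vec, row["dense_embedding"]))
        hits.append((score, row["id"]))
    return [cid for _, cid in sorted(hits, reverse=True)]

print(single_scan("supply chain risk", [1.0, 0.0], "ops"))  # [1, 3]
```

The point is not the scoring formula but the shape of the query: filtering, keyword scoring, and vector ranking happen in one pass over one table, rather than in three systems that must be stitched together.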
Multiple Query Patterns
With all data in one system, LSR supports every query pattern natively:
Semantic Search -- Dense embeddings (BGE-M3) with approximate nearest neighbor search. Great for conceptual queries like "documents about supply chain risk."
Full-Text Search -- BM25 scoring via DuckDB FTS extension. Essential for exact phrase matching and keyword-heavy queries.
Hybrid Search -- Combines dense, sparse, and BM25 scores using Reciprocal Rank Fusion (RRF). Consistently outperforms any single retrieval method.
Graph Traversal -- Navigate entity relationships to answer multi-hop questions. "Which suppliers are connected to companies flagged for compliance issues?"
SQL Analytics -- Run arbitrary SQL over your knowledge base. "Count documents by department ingested in the last 30 days."
RAG -- Yes, RAG too. But now with better retrieval quality because hybrid search feeds better context to the LLM.
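Reciprocal Rank Fusion itself is simple enough to sketch in a few lines of Python. The document IDs are invented, and k = 60 is a common default in the literature rather than a documented LSR setting:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d2"]   # vector similarity ranking
sparse = ["d1", "d3", "d4"]   # sparse embedding ranking
bm25   = ["d1", "d2", "d3"]   # keyword ranking
print(rrf([dense, sparse, bm25]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF works on ranks rather than raw scores, it needs no calibration between retrieval methods whose score scales are incomparable, which is much of why it fuses dense, sparse, and BM25 results so robustly.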
Automatic Knowledge Extraction
Every document ingested into LSR passes through an extraction pipeline that identifies:
- Named entities -- people, organizations, products, and domain-specific concepts -- with types and properties
- Typed, weighted relationships between those entities
- Structured metadata at the document level
These are stored in the entities and relationships tables, forming a knowledge graph that grows with every ingestion. The graph enables query patterns that are impossible with flat chunk retrieval.
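A minimal sketch of the multi-hop pattern, using a toy in-memory copy of the relationships table and breadth-first search; the edge data and helper names are invented for illustration, not LSR's graph API:

```python
from collections import deque

# Toy rows of the relationships table: (source_id, target_id, type).
RELATIONSHIPS = [
    ("supplier_a", "company_x", "supplies"),
    ("supplier_b", "company_y", "supplies"),
    ("company_x", "flag_1", "flagged_for"),
]

def connected_to_flagged(start: str, max_hops: int = 3) -> bool:
    """BFS over relationship edges to answer a multi-hop question."""
    adjacency: dict[str, list[str]] = {}
    for src, dst, _ in RELATIONSHIPS:
        adjacency.setdefault(src, []).append(dst)
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, hops = frontier.popleft()
        if node.startswith("flag_"):
            return True
        if hops < max_hops:
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
    return False

print(connected_to_flagged("supplier_a"))  # True: supplier_a -> company_x -> flag_1
print(connected_to_flagged("supplier_b"))  # False
```

Flat chunk retrieval cannot answer this question at all: no single chunk mentions both the supplier and the compliance flag, so no embedding lookup will surface the connection.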
Tiered Performance
The three LSR tiers ensure optimal performance without over-provisioning:
Realtime (ClickHouse) -- Sub-100ms queries for hot data. Recently ingested documents, frequently accessed entities, and trending queries are served from ClickHouse materialized views.
Streamhouse (Arrow) -- Near real-time processing. New documents are chunked, embedded, and indexed within seconds of upload. Incremental updates propagate without full re-indexing.
Lakehouse (Iceberg) -- Cost-efficient storage for the long tail. Historical data, archived documents, and infrequently accessed content live in Iceberg tables on object storage. Query latency is 1-10 seconds, but storage cost is a fraction of hot storage.
Benchmarks: LSR vs. Pure RAG
We benchmarked LSR hybrid search against a standard RAG pipeline (OpenAI embeddings + Pinecone + LangChain) on the BEIR benchmark suite:
| Metric | RAG (Pinecone) | LSR Hybrid Search | Improvement |
|---|---|---|---|
| NDCG@10 | 0.42 | 0.51 | +21.4% |
| MRR@10 | 0.38 | 0.47 | +23.7% |
| Recall@100 | 0.71 | 0.83 | +16.9% |
| p95 Latency | 180ms | 145ms | -19.4% |
| Cost / 1M queries | $47 | $31 | -34.0% |
The improvement comes primarily from hybrid search (combining dense + sparse + BM25) and from better chunking strategies informed by entity extraction.
When to Use What
RAG is not dead -- it is one pattern in a broader toolkit. Here is a practical decision framework:
- Simple Q&A over a small, homogeneous corpus: standard RAG is sufficient
- Keyword-heavy retrieval, exact phrase matching, or compliance search: you need full-text and hybrid search
- Multi-hop questions over entities and their connections: you need graph traversal
- Reporting, aggregation, or auditing across the corpus: you need SQL analytics
For most enterprise deployments we have seen, the answer is LSR. The initial setup is comparable to RAG, but the long-term flexibility and total cost of ownership are dramatically better.
Getting Started with LSR
If you are currently running a RAG pipeline and want to evaluate LSR, the migration path is straightforward:
1. Export your documents from the existing document store.
2. Ingest them into LSR; chunking, embedding, and knowledge extraction run automatically.
3. Send a copy of your retrieval traffic to LSR and compare results side by side.
4. Cut over once retrieval quality meets your bar.
Most teams complete the migration in under a week. The immediate benefit is better retrieval quality from hybrid search; the long-term benefit is access to graph queries, SQL analytics, and multi-modal ingestion without building new infrastructure.
Want to see LSR in action? Book a technical demo with our engineering team, or start a free trial to run your own benchmarks.