Every enterprise sits on a mountain of data -- documents, audio recordings, video assets, images, and structured datasets scattered across dozens of systems. According to IDC, 80% of enterprise data is unstructured, and that share is growing. Yet most organizations can effectively search less than 10% of it.
We built Lakehouse42 to solve this problem. Today, we are announcing the general availability of our LSR-as-a-Service platform -- a unified system that ingests any data type, extracts knowledge automatically, and makes it instantly searchable across your entire organization.
Why We Built Lakehouse42
The knowledge management landscape is fragmented. Enterprises typically cobble together vector databases, search engines, document stores, and custom pipelines. Each system handles one data type or one query pattern, and none of them talks to the others.
The result is predictable: data silos, duplicated infrastructure, inconsistent search quality, and engineering teams spending more time maintaining plumbing than building products.
We saw an opportunity to rethink this from the ground up. Instead of adding another point solution, we designed a platform architecture -- the LSR (Lakehouse-Streamhouse-Realtime) pattern -- that unifies storage, processing, and serving into a single coherent system.
The LSR Architecture
At the heart of Lakehouse42 is our three-tier LSR architecture:
Lakehouse Layer -- Built on Apache Iceberg v3, this is the source of truth. All documents, chunks, embeddings, entities, and relationships are stored in an open table format. No vendor lock-in. Your data stays in your own cloud storage (S3, GCS, Azure Blob, or Cloudflare R2) as standard Parquet files.
Streamhouse Layer -- Incremental processing via Arrow micro-batches. When a document is ingested, it flows through extraction, chunking, embedding, and entity resolution pipelines in near real-time. Agentic operators can be composed to build custom processing graphs.
Realtime Layer -- Powered by ClickHouse, this layer serves sub-second queries over hot data. Frequently accessed documents, recent ingestions, and trending entities are promoted to the realtime tier automatically based on access patterns.
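To make the Streamhouse idea of composable operators over micro-batches more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Lakehouse42 implementation: batches are modelled as lists of dicts rather than Arrow record batches, and `micro_batches`, `chunk`, `embed`, and `compose` are hypothetical names invented for this example. The "embedding" is a toy two-number vector standing in for a real model call.

```python
from typing import Callable, Iterable, Iterator

# A micro-batch is modelled as a list of row dicts. This is an assumption
# for illustration; a real pipeline would more likely pass Arrow batches.
Batch = list[dict]
Operator = Callable[[Batch], Batch]


def micro_batches(rows: Iterable[dict], size: int) -> Iterator[Batch]:
    """Group an incoming stream of rows into micro-batches of at most `size`."""
    batch: Batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def chunk(batch: Batch, max_words: int = 4) -> Batch:
    """Hypothetical chunking operator: split each text into word windows."""
    out: Batch = []
    for doc in batch:
        words = doc["text"].split()
        for i in range(0, len(words), max_words):
            out.append({"doc_id": doc["doc_id"],
                        "chunk": " ".join(words[i:i + max_words])})
    return out


def embed(batch: Batch) -> Batch:
    """Hypothetical embedding operator: [character count, word count]."""
    return [{**row, "embedding": [len(row["chunk"]), len(row["chunk"].split())]}
            for row in batch]


def compose(*ops: Operator) -> Operator:
    """Chain operators left to right into one linear processing graph."""
    def run(batch: Batch) -> Batch:
        for op in ops:
            batch = op(batch)
        return batch
    return run


pipeline = compose(chunk, embed)
docs = [
    {"doc_id": 1, "text": "open table formats avoid vendor lock in"},
    {"doc_id": 2, "text": "hot data is served from the realtime tier"},
    {"doc_id": 3, "text": "short note"},
]
# Each micro-batch flows through the whole pipeline independently.
results = [row for batch in micro_batches(docs, size=2) for row in pipeline(batch)]
```

Because every operator has the same batch-in, batch-out shape, adding a step such as entity resolution would mean writing one more function and passing it to `compose`.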
Data flows between tiers transparently. Hot data lives in ClickHouse for sub-100ms queries. Warm data is served directly from Iceberg via DuckDB. Cold data can be queried on demand with somewhat higher latency. Your application code does not need to know which tier is serving a given query.
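The tiering described above can be pictured as a small router that hides the serving tier from the caller. The sketch below is an assumption-laden illustration only: the thresholds, the `QueryRouter` class, and the access statistics are all hypothetical, since the actual promotion policy is not specified in this post. Nothing is executed against ClickHouse or DuckDB; the router just reports which tier it would pick.

```python
from enum import Enum


class Tier(Enum):
    HOT = "clickhouse"   # realtime layer
    WARM = "duckdb"      # Iceberg tables read via DuckDB
    COLD = "on_demand"   # queried on demand at higher latency


# Hypothetical thresholds; the real promotion policy is not described here.
HOT_MIN_HITS_PER_DAY = 50
WARM_MAX_IDLE_DAYS = 90


def choose_tier(hits_last_day: int, idle_days: int) -> Tier:
    """Pick a serving tier from simple access statistics."""
    if hits_last_day >= HOT_MIN_HITS_PER_DAY:
        return Tier.HOT
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return Tier.WARM
    return Tier.COLD


class QueryRouter:
    """Callers submit a query; the tier choice stays an internal detail."""

    def __init__(self, access_stats: dict[str, tuple[int, int]]):
        # Maps table name -> (hits in the last day, days since last access).
        self.access_stats = access_stats

    def execute(self, table: str, sql: str) -> dict:
        # Tables with no recorded access are treated as cold.
        hits, idle = self.access_stats.get(table, (0, WARM_MAX_IDLE_DAYS + 1))
        tier = choose_tier(hits, idle)
        return {"served_by": tier.value, "sql": sql}


router = QueryRouter({
    "contracts": (120, 0),
    "meeting_notes": (3, 14),
    "archive_2015": (0, 400),
})
plan = router.execute("contracts", "SELECT count(*) FROM contracts")
```

The application calls `execute` the same way for every table, which mirrors the point that application code does not need to know which tier answers a query.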
What Makes Us Different
Multi-modal ingestion out of the box. Upload PDFs, Word documents, audio files, video, images, or structured CSVs. Our extraction pipelines handle format detection, content extraction, and chunking automatically. No custom parsers required.
More than RAG. Most platforms stop at retrieval-augmented generation. Lakehouse42 supports semantic search, full-text search, graph traversal, SQL analytics, and hybrid search -- all from the same indexed data. RAG is one query pattern among many.
Open formats, no lock-in. Your data is stored in Apache Iceberg tables on your own object storage. You can query it with any Iceberg-compatible engine (Spark, Trino, DuckDB, Flink) independently of Lakehouse42.
Enterprise-grade multi-tenancy. Each organization gets isolated Iceberg namespaces with storage-level RBAC enforced by Apache Polaris. Data isolation is not just logical -- it is physical.
BYO everything. Bring your own storage, catalog, compute, and LLM provider. We integrate with all major cloud providers and model vendors. You control where your data lives and which models process it.
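As an illustration of the hybrid search mentioned above, one common way to merge a full-text ranking with a semantic ranking is reciprocal rank fusion (RRF). This post does not say which fusion method Lakehouse42 uses, so treat the sketch below as a generic example of the technique rather than a description of the product. The function name, the document IDs, and the constant `k = 60` (a conventional default for RRF) are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, with
    ranks starting at 1. Higher total score means a better fused rank.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers over the same indexed data.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers return (`doc_a`, `doc_c`) rise above those found by only one, which is the behavior a hybrid query is usually after. RRF needs only ranks, so it avoids having to normalize keyword scores against vector similarities.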
Early Traction
Over the past six months, we have been running a private beta with a select group of enterprise customers across financial services, healthcare, and legal sectors. The results have exceeded our expectations:
What Is Next
We are just getting started. Over the coming quarters, we will be rolling out:
Get Started
Lakehouse42 is available today with a free tier that includes 1 GB of storage, 10,000 queries per month, and full access to all features. Enterprise plans include dedicated infrastructure, SLA guarantees, and white-glove onboarding.
Visit lakehouse42.com/signup to create your account, or book a demo with our team to see the platform in action.
We are building the future of enterprise knowledge management. If this resonates with you, we are hiring across engineering, product, and go-to-market.