
New Research From Dremio Demonstrates Data Lakehouse Value with Math-Style Proof and Technical Clarity

Paper lays out shortcomings of traditional data warehousing on RDBMS-OLAP and offers concrete example of lakehouse implementation to show practical benefits

SANTA CLARA, Calif.--(BUSINESS WIRE)--Dremio, the easy and open data lakehouse, has published "The Data Lakehouse: Data Warehousing and More," a novel research paper now available on arXiv. The paper explores the data lakehouse model, offering modern insights for businesses looking to optimize their data utilization.

By publishing the paper as a preprint, Dremio aims to gather feedback from the open source research and scientific community and to make the work available to the wider community of practitioners.

The paper decomposes commonly used but overloaded terms like data warehouse, data warehousing, and data lakehouse into discrete components (such as query engine, table format, etc.), then offers clear terms and definitions based on these components to bring clarity to communication that uses these common terms.

Data warehousing has long been a cornerstone of modern data-driven organizations, serving as a strategic asset for informed decision-making. However, the emergence of data lakehouses has challenged the traditional paradigms by providing a new approach to achieving the goals of data warehousing, while overcoming its limitations and adding new dimensions of capability.

The paper begins by rigorously defining the often-ambiguous terms "data warehousing," "data warehouse," and "data lakehouse."

“We have seen some folks in the market say that ‘data lakehouse’ is just another marketing buzzword. We understood the arguments on both sides, but whether the statement was right or not was essentially rooted in how you interpret certain terms. We wanted to provide some clarity and, using an approach similar to a math-based proof, show that with clarity on definitions, ‘data lakehouse’ is definitely more than a marketing term. Rather, it’s a practical and valuable approach to data warehousing,” said Jason Hughes, director of technical advocacy.

The paper breaks down what is commonly referred to as “data warehousing” into its fundamental requirements, categorizing them into technical components, technical capabilities, and technology-independent practices. It then shows how a data lakehouse addresses all of these core requirements, thereby demonstrating that a data lakehouse can be used to achieve what is traditionally thought to require an RDBMS-OLAP. It also highlights the shortcomings of traditional data warehousing on RDBMS-OLAP, including limitations with semi-structured and unstructured data, lock-in and lock-out, and cost issues, prompting a reevaluation of architectural approaches. Additionally, the paper provides a concrete example of a data lakehouse implementation to demonstrate its practical benefits.

The ultimate goal of a data lakehouse is to combine the strengths of RDBMS-OLAP data warehousing and data lakes, fulfilling data warehousing requirements on open data architecture and expanding to additional analytic capabilities.

Dremio's research paper solidifies the concept of data lakehouses and provides a practical roadmap for organizations looking to harness the full potential of their data while optimizing their data architecture.

To read the full paper, please visit: https://arxiv.org/abs/2310.08697

About Dremio

Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Use Dremio's lightning-fast SQL query service and any other processing engine on the same data. Dremio increases agility with a revolutionary data-as-code approach that enables Git-like data experimentation, version control, and governance. In addition, Dremio eliminates data silos by enabling queries across data lakes, databases, and data warehouses, and by simplifying ingestion into the lakehouse. Dremio's fully managed service helps organizations get started with analytics in minutes, and automatically optimizes data for every workload. As the original creator of Apache Arrow and committed to Arrow and Iceberg’s community-driven standards, Dremio is on a mission to reinvent SQL for data lakes and meet customers where they are on their lakehouse journey.

Hundreds of global enterprises like JPMorgan Chase, Microsoft, Regeneron, and Allianz Global Investors use Dremio to deliver self-service analytics on the data lakehouse. Founded in 2015, Dremio is headquartered in Santa Clara. CNBC recognized Dremio as a Top Startup for the Enterprise and Deloitte named Dremio to its 2022 Technology Fast 500. To learn more, follow the company on GitHub, LinkedIn, Twitter, and Facebook, or visit www.dremio.com.

Contacts

Dremio

