SAN FRANCISCO & PHILADELPHIA--(BUSINESS WIRE)--Datafold, a data quality platform that automates the most tedious parts of data engineering workflows, today announced a partnership with dbt Labs, the pioneer in analytics engineering, along with a new integration to deliver trusted data faster. Datafold’s automated test coverage for analytics engineers can now be added to a company’s CI/CD workflow in one click with dbt Cloud, or through a Python SDK for dbt Core.
“As a data engineer at Lyft, I always struggled with data testing. We wrote hundreds of SQL tests but never got to significant test coverage. Data quality issues would inevitably get into production and affect the business,” said Datafold founder and CEO Gleb Mezhanskiy. “I built Datafold to fully automate data testing so that analytics engineers could see the impact of every pull request on data models and applications before merging to prevent any issues from getting into production.”
Analytics engineers need to update models regularly within massive, complex schemas without making mistakes. However, they don’t have time to write the thousands of tests needed for complete test coverage across their schemas and pipelines. As a result, model updates often ship without full confidence in how the changed dbt code will affect the data.
Datafold automates writing these thousands of regression tests, so engineers know exactly what will happen to the data before they merge their update. Datafold embeds a summary of these automated tests directly in GitHub and GitLab, so engineers can see the impact in every pull request.
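To make the idea of an automated regression test more concrete, the sketch below compares two versions of a model’s output, matched on a primary key, and reports rows that were added, removed, or changed. It is a minimal illustration of the general "data diff" technique under simple assumptions (small in-memory tables, a single unique key column), not Datafold’s implementation; the helper name `diff_tables` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def diff_tables(before: List[Row], after: List[Row], key: str) -> Dict[str, Any]:
    """Hypothetical helper: compare two versions of a table keyed on `key`.

    Assumes `key` is unique in each version and that all keys are mutually sortable.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    added = sorted(new.keys() - old.keys())    # keys present only after the change
    removed = sorted(old.keys() - new.keys())  # keys present only before the change

    changed: Dict[Any, Dict[str, Tuple[Any, Any]]] = {}
    for k in sorted(old.keys() & new.keys()):
        columns = set(old[k]) | set(new[k])
        # Record (before, after) for every column whose value differs.
        deltas = {
            col: (old[k].get(col), new[k].get(col))
            for col in sorted(columns)
            if old[k].get(col) != new[k].get(col)
        }
        if deltas:
            changed[k] = deltas

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical output of a model before and after a pull request.
before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
after = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]

report = diff_tables(before, after, key="id")
print(report)
# {'added': [4], 'removed': [3], 'changed': {2: {'amount': (20, 25)}}}
```

A production tool would run comparisons like this inside the warehouse rather than in memory, but the output has the same shape: a per-row, per-column summary that can be posted to a pull request for review.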
“Improved data quality is one of the primary benefits of standardizing on dbt. Datafold’s data diff in continuous integration checks and fine-grained column-level lineage on top of dbt models augment this experience for analytics engineers,” said Julia Schottenstein, product manager at dbt Labs. “We’re excited to further our partnership with Datafold and help customers gain confidence in their data.”
dbt made it easy for the data community to build useful models in data warehouses, creating a strong foundation for building on top of the warehouse. Within the past few years, companies have gone from building only dashboards to building notebooks, applications, ML/AI, and reverse ETL on the warehouse. As more of the business comes to depend on the warehouse, data quality has become a focus.
Datafold built column-level lineage at scale, which it uses to give analytics engineers complete visibility into how their changes affect their pipelines. This lets analytics engineers fix data quality issues before they reach production. Working together, dbt and Datafold deliver trusted data faster.
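As a rough illustration of how column-level lineage supports impact analysis, the sketch below treats lineage as a directed graph from each column to the columns computed directly from it, then walks the graph breadth-first to list everything downstream of a changed column. The graph structure, the helper name `downstream_impact`, and all model and column names are hypothetical assumptions for illustration; Datafold’s actual lineage representation is not described here.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each column maps to the columns derived directly from it.
LINEAGE: Dict[str, List[str]] = {
    "raw.orders.amount": ["stg_orders.amount_usd"],
    "stg_orders.amount_usd": ["fct_revenue.daily_revenue", "fct_orders.order_total"],
    "fct_revenue.daily_revenue": ["dashboard.revenue_chart"],
    "fct_orders.order_total": [],
    "dashboard.revenue_chart": [],
}


def downstream_impact(lineage: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every column reachable downstream of `changed` (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in seen:  # visit each column once, even with shared parents
                seen.add(child)
                queue.append(child)
    return seen


impacted = downstream_impact(LINEAGE, "stg_orders.amount_usd")
print(sorted(impacted))
# ['dashboard.revenue_chart', 'fct_orders.order_total', 'fct_revenue.daily_revenue']
```

Tracking lineage per column rather than per model narrows the blast radius: a change to one column flags only the tables, dashboards, and applications that actually read it.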
“The integration between dbt and Datafold is a game-changer,” said Josh Devlin, senior analytics engineer, Brooklyn Data Co. “There is so much value in actually understanding the effect of your pull request. It’s easy to set up, and it gives me the confidence that my dbt code does what I expect it to do.”
About Datafold
Datafold is a data quality platform that helps data teams deliver reliable data products faster. It has a unique ability to identify, prioritize, and investigate data quality issues proactively before they affect production. Founded in 2020 by veteran data engineers, Datafold has raised $22 million from investors including NEA, Amplify Partners, and Y Combinator. Customers include Thumbtack, Patreon, Truebill, Faire, and Dutchie. For more information, visit www.datafold.com/, and follow the company on LinkedIn, Twitter, Facebook, and YouTube.
About dbt Labs
Since 2016, dbt Labs has been on a mission to help analysts create and disseminate organizational knowledge. dbt Labs pioneered the practice of analytics engineering, built the primary tool in the analytics engineering toolbox, and has been fortunate enough to see a fantastic community coalesce to help push the boundaries of the analytics engineering workflow. Today, there are 9,000 companies using dbt every week, 25,000 practitioners in the dbt Community Slack, and 1,800 companies paying for dbt Cloud.
All brand names and product names are trademarks or registered trademarks of their respective companies.