The State of Data Testing in 2024: Why We Still Have Broken Windows
Two years ago, I shared an article called ‘Data tests and the broken windows theory’.
The broken windows theory comes from criminology: if a broken window in a building is left unrepaired, it signals that nobody cares, and more disorder soon follows. Residents stop paying attention to the smaller things, and the decay spreads. We can draw the same analogy to the world of data, and data tests in particular.
The problems with tests can be grouped into three areas:
There are too many alerts from test failures, so people stop paying attention to them
The signal-to-noise ratio is too low, and data teams waste time on false positive test failures
Data teams don’t implement tests in the right places, so data issues are caught by end users
Back then I suggested a few ways to improve this:
Think of data assets as having a “tax” – each new data model requires ongoing maintenance, so data teams should set aside 10–20% of their time to maintain data models and keep an eye out for test failures and other things going wrong.
Have a clear idea about which tests are important – be explicit about which tests power important use cases across your business, and prioritize these.
Get really good at ownership – route issues to relevant people, in and outside the data team, and avoid everything being everyone’s problem.
Have good test hygiene – set clear testing expectations, clean up long-standing failures monthly, and add tests for any stakeholder-reported issues to prevent future misses.
But still, alert overload, long-failing tests, and poor data quality continue to rank at the top of the list of data teams’ problems (57% of respondents in dbt’s 2024 Analytics Engineering survey highlight poor data quality as one of their top obstacles). Clearly, we have some way to go.
—
I still stand by the principles above, but I now think they miss the most important part – be strategic in your approach to testing.
In the article ‘Test smarter not harder: add the right tests to your dbt project’, Faith McKenna and Jerrie Kumalah Kenney from dbt share their perspective. They make the case that testing should be grouped into data hygiene issues, business-focused anomalies, and stats-focused anomalies, and that each concern should be prioritized by its breadth of impact.
They make some well-measured and actionable suggestions, such as:
Data hygiene issues should be addressed in your staging layer, typically with out-of-the-box tests such as unique and not_null as well as format and accepted_values checks
Business-focused anomalies should be used to catch unexpected behavior – for example, a revenue number should not change by more than X% within Y amount of time (see the sketch after this list)
Stats-focused anomalies should be used to catch fluctuations that deviate from your expected volumes or metrics – for example, a row count anomaly monitor flagging a spike in a site’s traffic that may indicate illicit behavior
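To make the business-focused case concrete, here is a minimal sketch of what such a check could look like as a dbt singular test. The model name (fct_revenue), column names, and the 20% day-over-day threshold are hypothetical placeholders, not a prescribed setup; the hygiene tests above (unique, not_null, accepted_values) would stay in the staging layer’s YAML rather than in SQL.

```sql
-- tests/assert_daily_revenue_change_within_threshold.sql
-- Hypothetical singular test: return the rows where day-over-day revenue
-- moved by more than 20%. Any rows returned cause the test to fail.
-- Model name (fct_revenue), columns, and the threshold are placeholders.

with daily_revenue as (

    select
        revenue_date,
        sum(revenue_amount) as total_revenue
    from {{ ref('fct_revenue') }}
    group by revenue_date

),

day_over_day as (

    select
        revenue_date,
        total_revenue,
        lag(total_revenue) over (order by revenue_date) as prior_day_revenue
    from daily_revenue

)

select
    revenue_date,
    total_revenue,
    prior_day_revenue
from day_over_day
where prior_day_revenue > 0
  and abs(total_revenue - prior_day_revenue) / prior_day_revenue > 0.20
```

A stats-focused row-count monitor follows the same shape, comparing today’s volume against a trailing baseline, although many teams lean on an observability or anomaly-detection tool for that rather than hand-written tests.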
I believe they’re onto something: as an industry, we heavily underestimate the need for rigorous business-focused testing and overestimate the power of data hygiene tests. Anecdotally, data teams are hungry for ‘best practices’, and sessions around this topic at this year’s dbt Coalesce conference were consistently the most well attended.
To go beyond anecdotes, we analyzed an extract from dozens of our customers at SYNQ and found that only 3% of all tests encode business logic.
There’s a good reason we overemphasize more mechanical tests and underinvest in business logic testing. Mechanical tests are easier to write and require little domain knowledge. Business logic tests, on the other hand, require you to deeply understand the output of your data and the underlying processes, to be able to capture edge cases and true anomalies.
Thor Henriksen shared a “hot” take on this topic, seen from a data engineer’s perspective, in the dbt community Slack.
First, tests should be organized into suites, with alert groups defined by the people most empowered to solve that type of issue. Second, domain experts should play a role in remediating issues in source systems or reevaluating the definitions of business tests, making data quality a shared responsibility across the data team and the business.
If my first and only action as an alert recipient is to write a message/mail to someone else, I should not be the first support level for that alert.
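One lightweight way to encode that ownership in a dbt project is to tag business tests with the team best placed to act on them, so the alerting layer can route failures accordingly. A minimal sketch, assuming a hypothetical stg_orders model and illustrative tag names:

```sql
-- tests/assert_orders_have_known_payment_method.sql
-- Hypothetical business test tagged so an alerting layer can route failures
-- to the payments domain experts first rather than the central data team.
{{ config(tags=['payments', 'business-critical'], severity='error') }}

select
    order_id,
    payment_method
from {{ ref('stg_orders') }}
where payment_method not in ('card', 'bank_transfer', 'invoice')
```

An alerting setup keyed off those tags can then send payments failures straight to the payments channel instead of a catch-all data-alerts channel.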
If done well, a few things start to happen: (1) you have a higher signal-to-noise ratio for your data tests and more often catch issues before your stakeholders do, (2) less time is spent triaging and escalating issues between teams, and (3) you continuously adjust and adopt better tests through a tight feedback loop with business stakeholders.
—
Do I believe that more thoughtful testing and shared data quality ownership will fully remove data quality as a top-rated problem among analytics engineers? No.
But I believe it's the most meaningful step we can take as an industry. In my experience, it's something that 95% of data teams are still not doing well, but could start doing to fix their broken windows.