Hundreds of actionable data quality guidelines
Building a collection of bite-sized guidelines for data teams – from testing to ownership and incident management best practices
Inspired by Baymard (a collection of UX best practices), I’ve built dataqualityguides.com. It’s a collection of hundreds of bite-sized guidelines for data quality best practices, from how to design a testing strategy to ownership and incident management.
It draws on content and learnings from dozens of top data teams at companies such as Monzo, Airbnb, and Voi, as well as SYNQ’s 70-page data product guide and dozens of curated articles.
There are already more than 100 guidelines, with more to come.
Check it out here: dataqualityguides.com
Here are a few ways you can use it:
Search for guidelines – for example, search “testing strategy” to learn more about why you should test in layers, and how to avoid duplicate tests across layers
Browse by topics – dig into dozens of guidelines by topic, from Testing & Monitoring to Ownership and Measuring Data Quality
Real-life examples – each topic has real-life examples of how companies such as Monzo, Voi, and Airbnb are solving these problems
Here are a few of my favourite guidelines
#1: Shift focus from testing individual models in isolation to testing the end-to-end pipeline that serves a data product. This ensures the entire chain of dependencies is reliable.
#25: Assign ownership to teams (e.g., via a Slack channel or email group) rather than individuals. This is more scalable and resilient to people changing roles or going on vacation.
#54: Avoid vanity metrics like overall test coverage. Instead, measure the SLA achievement of your defined data products, as this directly reflects the reliability experienced by your consumers.
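To make guideline #54 a bit more concrete, here is a minimal Python sketch of what measuring SLA achievement per data product could look like. Everything in it is an assumption for illustration: the `DailyCheck` structure, the product names, and the idea of one pass/fail check per day are hypothetical and not taken from the guidelines themselves.

```python
from dataclasses import dataclass


@dataclass
class DailyCheck:
    """One day's outcome for a data product (hypothetical structure)."""
    product: str
    met_sla: bool  # True if the product met its agreed SLA that day


def sla_achievement(checks: list[DailyCheck]) -> dict[str, float]:
    """Return, per data product, the share of days on which the SLA was met."""
    totals: dict[str, int] = {}
    met: dict[str, int] = {}
    for check in checks:
        totals[check.product] = totals.get(check.product, 0) + 1
        met[check.product] = met.get(check.product, 0) + int(check.met_sla)
    return {product: met[product] / totals[product] for product in totals}


# Made-up sample data: four days for one product, two days for another.
checks = [
    DailyCheck("revenue_dashboard", True),
    DailyCheck("revenue_dashboard", True),
    DailyCheck("revenue_dashboard", False),
    DailyCheck("revenue_dashboard", True),
    DailyCheck("churn_model", True),
    DailyCheck("churn_model", True),
]

achievement = sla_achievement(checks)
for product, share in achievement.items():
    print(f"{product}: {share:.0%} of days met SLA")
```

The number is reported per data product rather than as one overall figure, so it reflects what each group of consumers actually experienced instead of how many tests happen to exist.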
Building dataqualityguides.com
The guidelines are collected from resources such as SYNQ’s data quality guide, Monzo’s data blog, Airbnb’s data blog, and dozens of articles from data practitioners.
I started by collecting all relevant links and resources, and asked Google’s Gemini to summarize these into bite-sized guidelines across nine topics.
I then reviewed the output, used a “Feedback” column to note which guidelines I wanted changed and how, and removed guidelines that didn’t meet the quality bar.
Once there were enough guidelines, I looked for case studies showing how companies have implemented them in the wild.
I asked Gemini to link these case studies to the relevant topics so you can see how companies approach this in real-world situations.
Next up, I’ll continue adding more guidelines and real-life examples. I’ll also add other topics for guidelines. You can already find “Team & Culture” and “Technology & Architecture” with broader guidelines on how to hire a team, run onboarding, and select tools.
Stay tuned at dataqualityguides.com