Moldy data and dashboards: safe to eat?

Why dashboards should self-destruct and spreadsheets aren’t that bad

Nov 17, 2021

I started a presentation to our leadership team with a picture of moldy bread to describe how the situation in data felt at the time.

Everyone’s first reaction was “eww”, followed by “ah yes, that’s exactly how it feels!”

Moldy data is when:

A dashboard has outdated metrics definitions
A dashboard only partially loads because an underlying data model is broken
KPI targets are imported from an outdated spreadsheet
A partially complete scheduled PDF report from Looker that you didn’t know about goes out to hundreds of people in Slack on Monday mornings
An exec has bookmarked a dashboard you made in a rush last year and still uses it to make decisions

Sometimes moldy data is innocent. But more often it’s not. It means that people use wrong information to make decisions.

Moldy data deteriorates trust in data and leads to a situation where more time in meetings is spent discussing what data means and if it’s right rather than making informed decisions based on it.

Mold leads to more mold and flows downstream. You can’t smell it, maybe you can’t see it but you surely don’t want it.

From Petr Janda’s article: Why the Data Analyst role has never been harder

Dashboards deserve extra blame as they most often bring mold to the surface in front of innocent people across the company.

The problem is that dashboards look and feel like real software programs. Something permanent and something that you can trust.

I don’t blame people for thinking that dashboards are right. After all, they should be able to expect them to be.

The good old days of spreadsheets

15 years ago, long before the birth of the Modern Data Stack, these same people relied on getting their numbers from spreadsheets an analyst would put together. This wasn’t perfect.

Spreadsheets are hard labour: They have to be manually updated - I vividly recall having to update a spreadsheet every Friday morning before the exec KPI update in a previous job
Spreadsheets are fragile: If dashboards feel like a house made of solid bricks, spreadsheets are made of thin sticks held together by duct tape. Something temporary that’s never 100% accurate and more often than not has a few calculation errors
Spreadsheets expire: They don’t automatically update and are only good for a certain place and time. Who doesn’t remember “annual budget_v37.xls”?

In other words, spreadsheets don’t feel like real software. The contract with the end-user is clear: here’s something that solves a specific problem for you, but use it at your own risk.

But this contract also means that spreadsheets don’t mold. Nobody expects to dig up a two-year-old spreadsheet and find it automatically updated to reflect how the business has evolved since then.

Where do we go from here?

Moldy data, and the role dashboards play in unintentionally surfacing it to the wrong people, is an important problem. Dashboards are the layer people see, the tip of the iceberg and the point at which data trust is determined.

I have some suggestions on how to make this better:

Dashboards should self-destruct
Spreadsheets have a place in the Modern Data Stack
We need new tools

Dashboards should self-destruct

In Mission Impossible, Tom Cruise’s messages from the Impossible Missions Force self-destruct after they’ve been read. What if the dashboard did the same?

What if dashboarding tools forced analysts to fill in a “time to self-destruct” field? After that the dashboard would be gone, completely evaporated, out-of-sight-out-of-mind, boom.

Sure, some people will get pissed. That person in the sales team in the Polish office that once sent you a DM on Slack for a bespoke dashboard and looks at it every day will get upset.

Sure, some people will feel that their ability to do “self serve root cause analysis” will be taken from them as they’ve been relying on a dashboard created three years ago by someone who left to join Facebook.

Sure, the “% of questions that can be self served” score that you’re reporting to your manager every six months will drop.

But as someone working in data who ultimately owns the quality of the data you just can’t support all of the above (I know I can’t).

Your job is to make these people upset but you’re doing it for a good reason - to build and uphold a culture of data trust.

Dashboards should only be the presentation layer anyway. If your dashboards are so full of business logic that you can’t recreate one in less than 30 minutes, you should think twice about whether you’ve got the right setup.

I started my first few months by going through all our dashboards and added “☠️ WILL BE DELETED IN 30 DAYS” to 80% of our dashboards. In a month we went from 175 to 30 dashboards. - Carl Johan, Director of Data at Too Good to Go
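To make the “time to self-destruct” idea concrete, here is a minimal sketch of what such a field could look like. Everything in it is hypothetical: the `Dashboard` class, the `ttl_days` field, the 30-day warning window and the example dashboards are assumptions for illustration, not a feature of Looker or any other BI tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dashboard:
    name: str
    owner: str
    created_on: date
    ttl_days: int  # hypothetical "time to self-destruct" field, set by the analyst

    @property
    def expires_on(self) -> date:
        return self.created_on + timedelta(days=self.ttl_days)


def triage(dashboards, today, warn_days=30):
    """Sort dashboards into keep / warn / delete buckets based on their expiry date."""
    buckets = {"keep": [], "warn": [], "delete": []}
    for d in dashboards:
        days_left = (d.expires_on - today).days
        if days_left <= 0:
            buckets["delete"].append(d.name)  # expired: boom
        elif days_left <= warn_days:
            buckets["warn"].append(d.name)  # label it "WILL BE DELETED IN 30 DAYS"
        else:
            buckets["keep"].append(d.name)
    return buckets


# Made-up example data, purely for illustration.
today = date(2021, 11, 17)
dashboards = [
    Dashboard("Company KPIs", "data-team", date(2021, 11, 1), ttl_days=365),
    Dashboard("Bespoke sales leads", "analyst", date(2021, 5, 1), ttl_days=210),
    Dashboard("Root cause deep dive", "former-colleague", date(2018, 11, 1), ttl_days=90),
]
buckets = triage(dashboards, today)
print(buckets)
```

The point of the design is that the expiry date is chosen when the dashboard is created, so deleting it later is the default outcome rather than a decision someone has to defend.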

Dare I say spreadsheets?

For some time I believed that everything should be self-serve. Instead of doing a quick data pull in a spreadsheet, analysts should spend 30% longer making it self-serve accessible in Looker so everyone could get that same data in the years to come. All data will be truly self-serve, I told them. I’ve since changed my mind.

I now believe spreadsheets are often the right tool for a meaningful share of data work. They’re the closest thing we have to a democratised data language that everyone shares.

In the same way that Figma has invited the entire company into the design process, spreadsheets do the same for data. With spreadsheets someone with no data credentials who thinks dbt is a form of psychotherapy can take part. Want to add a column with customer type to a lead list? Perform a VLOOKUP and bring your own data in.

This is in stark contrast to the Modern Data Stack, where a similar task requires anything from changing LookML code to creating a pull request to updating a data model while keeping the YML file intact so the definition of the metric appears consistent in your data catalogue. Good luck explaining that to someone outside of the data team.

We need new tools

Should we stop using dashboards? Absolutely not. Having accurate and updated top level company KPIs that anyone can access in a dashboard is amazing. Almost no matter the investment, it’s worth having this and getting it right. My beef is with the moldy dashboards.

But moldy dashboards are only the tip of the iceberg. More often than not, moldy dashboards are the result of something upstream being broken. This mold just shouldn’t be brought to the surface by dashboards.

We need tools to tell us when something is wrong and these tools should not be company-wide dashboards.

Something to tell us when data is outdated.

Something that sits in between all the data that’s captured and what eventually ends up in the hands of end-users.

Something that creates a ✅ on a data point in a dashboard and gives everyone full confidence that it is good to use.

Something that’s aware of moldy data and critical of what gets exposed through dashboards.

Something that runs in the background and automatically brings bad data to the surface - no matter if it comes from faulty business logic, something broken in backend systems or from spreadsheet imports that no longer do what they’re supposed to.

Something that fits into the workflow of data practitioners and nudges them through workflows to do high quality data work instead of high volume data work.
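As a rough sketch of the simplest item on this wish list, telling us when data is outdated, a freshness check could compare when each source last loaded against how old it is allowed to be. The source names, timestamps and thresholds below are made up, and in practice the last-loaded times would have to come from warehouse or pipeline metadata. This is an illustration of the idea, not a description of any existing tool.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded_at, max_age, now):
    """Return "fresh" if the source loaded within its allowed age, otherwise "moldy"."""
    return "fresh" if now - last_loaded_at <= max_age else "moldy"


def check_sources(sources, now):
    """Map each source name to its freshness status."""
    return {
        name: freshness_status(last_loaded_at, max_age, now)
        for name, (last_loaded_at, max_age) in sources.items()
    }


# Hypothetical sources: name -> (last successful load, maximum acceptable age).
now = datetime(2021, 11, 17, 9, 0, tzinfo=timezone.utc)
sources = {
    "orders_model": (
        datetime(2021, 11, 17, 6, 0, tzinfo=timezone.utc),
        timedelta(hours=24),
    ),
    "kpi_targets_spreadsheet": (
        datetime(2021, 8, 1, 0, 0, tzinfo=timezone.utc),
        timedelta(days=30),
    ),
}
statuses = check_sources(sources, now)
print(statuses)
```

A dashboard tile could then show a ✅ only when every source behind it is fresh, while the moldy ones are flagged to the data team rather than shown to the whole company.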

Don’t get me wrong. I enjoy the ETL → ELT transition and having unlimited access to do exploratory analysis on all the data as much as any other data nerd. I absolutely don’t want this to go away. I just think the expectation that all of it should be available to business end-users in dashboards has gone too far.

If you’ve got some tips on how to tackle moldy dashboards and data let me know.

Inside Data by Mikkel Dengsøe
