Yes! That's why we built https://github.com/stitchfix/hamilton, a framework to help ensure that the human/team side of feature/data engineering can scale to any size. It brings in software engineering best practices and prescribes a paradigm that keeps things ordered, no matter the scale.
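To give a flavour of the paradigm (a minimal sketch, loosely adapted from the project's README; the column names and numbers are just placeholders): every transform is a plain Python function whose name is the output it produces and whose parameters are the inputs it depends on, so the dependency graph falls out of the code itself.

```python
# my_functions.py -- each function declares one output; its parameters are its dependencies
import pandas as pd

def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling three-week average of spend."""
    return spend.rolling(3).mean()

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend per signup."""
    return spend / signups


# run.py -- the driver wires the functions into a DAG and computes only what's requested
import pandas as pd
from hamilton import driver
import my_functions

initial_columns = {
    "spend":   pd.Series([10, 10, 20, 40, 40, 50]),
    "signups": pd.Series([1, 10, 50, 100, 200, 400]),
}
dr = driver.Driver(initial_columns, my_functions)
df = dr.execute(["avg_3wk_spend", "spend_per_signup"])
print(df)
```

Because ownership and lineage live in ordinary function signatures, the usual tooling (code review, unit tests, linting) keeps working as the number of transforms grows.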
Great article! Any thoughts about Data Product Manager roles?
I don't have any experience working with data product managers but have spoken to some companies that have them. What's your take?
I transitioned from product to data product 3 months ago after noticing that many of the products our team was building were really data products (APIs, data sets, dashboards, ML algos, data apps). I expect this shift to keep happening as more companies and PMs notice the same thing.
This was a brilliant read, so insightful.
> Some data teams have already started making progress on only exposing certain well-crafted data models to people outside their own team.
Do you have any more information on this? I'm very interested.
The companies I've spoken to who've made the most progress are taking this approach:
1) Define model ownership (finance, marketing...) in the dbt meta tag for each dbt model
2) Define whether a model is public or private in the yml file (see the yml sketch below)
3) Only people who work within the same domain are able to access private data models in that domain
4) If a user accesses a private model outside their own domain, Airflow throws an error
Companies I've spoken to that are going down this path are mostly still early on and building it from scratch. There are also some implementation details, such as how you define the user <> domain mapping, that as far as I understand are being handled in different ways.
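To make 1) and 2) concrete, here's a minimal sketch of what the model yml could look like. The `owner` and `access` keys under `meta` are just conventions a team would pick, not anything dbt enforces on its own:

```yaml
# models/finance/schema.yml
version: 2

models:
  - name: fct_revenue
    description: "Revenue fact table owned by the finance domain"
    meta:
      owner: finance     # 1) domain that owns the model
      access: private    # 2) only the finance domain may query it

  - name: finance_kpis
    meta:
      owner: finance
      access: public     # exposed to other domains and BI tools
```

The enforcement side (3 and 4) then lives outside dbt, for example a check that reads these meta fields from the dbt manifest and fails the Airflow task when a query references a private model from another domain.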
Thanks for the great article, Mikkel!
We're actively thinking about capabilities in dbt that could support splitting up monolithic projects (with thousands of models) into a set of smaller projects — each of which would be faster to run, easier to reason about, have clearer lines of ownership (one project == one team), and could be treated as contracted "services" by other teams' projects.
Some initial discussion in this direction, including public/private models, and how a team might version their public models: https://github.com/dbt-labs/dbt-core/discussions/5244
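As a rough illustration of the kind of declaration being discussed (the syntax here is hypothetical; the linked discussion is where the actual design is being worked out), a team's public, versioned model might look something like:

```yaml
# marts/finance/schema.yml (hypothetical syntax, for illustration only)
models:
  - name: dim_customers
    access: public        # other teams' projects may ref() this model
    latest_version: 2
    versions:
      - v: 1              # kept around while downstream teams migrate
      - v: 2

  - name: int_customer_payments
    access: private       # implementation detail, not part of the contract
```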
I've had a chance to run those ideas past some users already, and I'm always looking for more. If you or any of the folks you mention above would be interested in talking, let me know.
This sounds really interesting! I'll keep an eye on it
One of the symptoms of a growing data team that comes up in a lot of conversations is long data access request workflows. These always end up depending on the engineer who set up the process, and that person becomes the bottleneck.