10 Comments

Yes! That's why we built a framework like https://github.com/stitchfix/hamilton to help ensure that the human/team side of feature/data engineering can scale to any size! It brings in software engineering best practices and prescribes a paradigm that helps keep things ordered, no matter the scale.


Great article! Any thoughts about Data Product Manager roles?


I don't have any experience working with data product managers but have spoken to some companies that have them. What's your take?


I transitioned from product to data product 3 months ago when I noticed that many products our team was building were data products (APIs, data sets, dashboards, ML algos, data apps). I feel this is happening as more companies and PMs notice this as well.


This was a brilliant read, so insightful.


> Some data teams have already started making progress on only exposing certain well-crafted data models to people outside their own team.

Do you have any more information on this? I am very interested in this.


The companies I've spoken to who've made the most progress are taking this approach:

1) Define model ownership (finance, marketing...) in the dbt meta tag for each dbt model

2) Define if a model is public or private in the yml file

3) Only people who work within the same domain are able to access private data models in that domain

4) If a user accesses a private model outside their own domain, Airflow throws an error

Companies that I've spoken to that are going down this path are mostly still early on and building it from scratch. There are also some implementation details, such as how you define the user <> domain mapping, that as far as I understand are being handled in different ways.
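
To make steps 1 to 4 concrete, here is a minimal sketch of the access check in plain Python. It is an illustration under assumptions, not how any of these companies actually implement it: the model names, the `MODELS` and `USER_DOMAINS` mappings, and the `check_access` helper are all hypothetical. In practice the ownership and public/private flags would be read from the dbt project's yml files, and the check would run inside whatever orchestrator task executes the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMeta:
    """Hypothetical stand-in for the metadata attached to a dbt model."""
    name: str
    owner: str   # owning domain, e.g. "finance" (step 1)
    access: str  # "public" or "private" (step 2)


# Hypothetical metadata; a real setup would parse this from the yml files.
MODELS = {
    "fct_revenue": ModelMeta("fct_revenue", owner="finance", access="private"),
    "dim_customers": ModelMeta("dim_customers", owner="marketing", access="public"),
}

# Hypothetical user <> domain mapping; teams handle this part differently.
USER_DOMAINS = {"alice": "finance", "bob": "marketing"}


def check_access(user: str, model_name: str) -> None:
    """Raise PermissionError if `user` may not read `model_name`."""
    model = MODELS[model_name]
    if model.access == "public":
        return  # public models are readable from any domain
    # Step 3: private models are only readable within the owning domain.
    if USER_DOMAINS.get(user) != model.owner:
        # Step 4: fail loudly so the calling task errors out.
        raise PermissionError(
            f"{user} cannot access private model {model.name} "
            f"owned by {model.owner}"
        )


check_access("alice", "fct_revenue")   # same domain: allowed
check_access("bob", "dim_customers")   # public model: allowed
```

Calling `check_access("bob", "fct_revenue")` raises `PermissionError`, which is the behaviour step 4 describes once the check is wired into a scheduled task.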


Thanks for the great article, Mikkel!

We're actively thinking about capabilities in dbt that could support splitting up monolithic projects (with thousands of models) into a set of smaller projects — each of which would be faster to run, easier to reason about, have clearer lines of ownership (one project == one team), and could be treated as contracted "services" by other teams' projects.

Some initial discussion in this direction, including public/private models, and how a team might version their public models: https://github.com/dbt-labs/dbt-core/discussions/5244

I've had a chance to run those ideas past some users already, and I'm always looking for more. If you or any of the folks you mention above would be interested in talking, let me know.


This sounds really interesting! I'll keep an eye on it.


One of the symptoms of a growing data team that comes up in a lot of conversations is the long data access request workflow. Requests always end up with the engineer who set up the process, who then becomes the bottleneck.
