dbtips digest #2

dbt Explorer, macros docs, testing business logic

Feb 20, 2024

This year is going to be full of exciting new releases for dbt. As dbt Labs promised in November of last year, their current focus is on ensuring the stability of dbt-core. One of the most highly anticipated features is native unit testing, which is expected to arrive in version 1.8 this spring. Exciting!

Meanwhile, the developers of dbt Cloud continue to release new features for their commercial product. Today, we will discuss some of the improvements they have made to dbt Explorer, which replaces dbt docs in dbt Cloud. These new features add significant value to the platform!

New features in dbt Explorer

dbt Explorer is the successor of dbt docs for dbt Cloud users.

dbt Explorer is slick and modern. It has a revamped UI, supports cross-project lineage (in case you are using dbt Mesh), and still gives you the good old model documentation and lineage graph. And it just got a little bit better!

Meet three new features of dbt Explorer:

Column-level lineage 🤩
Project recommendations ✍️
Performance assessment 🚀

Column-level lineage

Yes, column-level lineage is now included in dbt!

However, there are some limitations. First, it is only available in dbt Explorer, and the older dbt docs will not be updated to include it. Second, it is currently limited to the Enterprise plan, so users on the Team plan will not have access to it.

Nevertheless, this is great news and shows that this feature is not forgotten. Hopefully, one day we will have widespread accessibility for dbt-core as well.

On a side note, Toby Mao, one of the creators of SQLMesh (a dbt alternative), mentioned that SQLMesh supports column-level lineage in its UI and can also parse existing dbt projects. So definitely try it out if you want column-level lineage (CLL) for your models.

(source)

Recommendations

Recommendations provide insights on your project and models. For example, the general overview tab displays documentation and test coverage. Additionally, each model in your project has a tab with specific recommendations.

All of that functionality is available in the open-source dbt-project-evaluator package. It appears that dbt Cloud runs all of those checks in the background, so you don't need to install the package yourself. However, if you are not a dbt Cloud user and still want to check your project for best practices, I highly recommend using this package.

Performance

Finally, the performance feature lets you monitor the run times of your models and gives an overview of the longest-running models and the models with the most failures.

Another interesting aspect is that you can see the trend of past runs for a specific model, which shows how its run time has grown over time. This lets you keep track of models that gradually increase your costs and optimize them accordingly.

If you want to implement it yourself, take a look at the Elementary package. Essentially, it provides very similar data that allows you to monitor your project.

You can read more about all three features in the official dbt blog post.

Weekly (db)tips

Now to some tips :)

Last week, I had a discussion with students of the "Advanced dbt" course (in which I serve as a teaching assistant). We covered two topics: macros documentation and testing business rules in dbt. Here is a brief summary of both discussions.

Macros documentation

Apparently, not many dbt folks are aware that you can document your macros in the same way you document your models and data sources.

Create a YAML file in the /macros folder and add a description to the macro and its arguments:
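A minimal sketch of such a file. The file name, the macro name cents_to_dollars, and its arguments are hypothetical and only illustrate the shape of the properties:

```yaml
# macros/_macros.yml (hypothetical file name)
version: 2

macros:
  - name: cents_to_dollars            # must match the macro name in your .sql file
    description: Converts an amount stored in cents to dollars.
    arguments:
      - name: column_name
        type: string
        description: Name of the column that holds the amount in cents.
      - name: scale
        type: integer
        description: Number of decimal places to round the result to.
```

The descriptions then show up in the generated documentation next to the macro code.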

Having documented macros can bring the same benefits as documenting your models:

clarity and understanding of the use case
easier to discover and reuse in new models
knowledge sharing with your team

Testing business rules

In dbt, we typically write tests to check data quality. For example, we check that a column has no empty values, that its values are unique, or we run other assertion tests.

However, there are times when it is also useful to test business rules and context.

For example, in your dim_user_subscriptions table, you have two calculated columns: first_subscription_date and count_of_total_subscriptions. From a business standpoint, it is impossible to have an empty first_subscription_date while count_of_total_subscriptions is greater than 0. This breaks the logic.
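As a dbt singular test (a SQL file in the tests folder that fails when its query returns rows), this rule could be sketched roughly as follows. The file name and the user_id column are assumptions made for illustration:

```sql
-- tests/assert_first_subscription_date_is_set.sql (hypothetical file name)
-- A singular test fails when the query returns at least one row,
-- so we select exactly the rows that break the business rule.
select
    user_id,  -- assumed key column, adjust to your model
    first_subscription_date,
    count_of_total_subscriptions
from {{ ref('dim_user_subscriptions') }}
where count_of_total_subscriptions > 0
  and first_subscription_date is null
```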

As another example, your business may expect the subscriptions table to always contain data from all payment providers, or from both the IOS and ANDROID platforms. However, delays or other issues with data delivery can go unnoticed with simple assertion tests. To catch these errors, you need to understand the business context and set up appropriate tests.

I typically use dbt singular tests to capture these tricky scenarios and alert the team before the errors reach the CEO's dashboard. In some cases, you could even promote singular tests to generic tests.
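For the platform scenario, one possible singular test is sketched below. It assumes the model has a platform column with the values IOS and ANDROID; the file name and column name are hypothetical:

```sql
-- tests/assert_all_platforms_present.sql (hypothetical file name)
-- Returns one row per expected platform that is missing from the model,
-- so the test fails as soon as one platform stops delivering data.
with expected_platforms as (

    select 'IOS' as platform
    union all
    select 'ANDROID' as platform

),

actual_platforms as (

    select distinct platform  -- assumed column name
    from {{ ref('dim_user_subscriptions') }}
    -- to catch delays rather than total absence, filter here
    -- on a load date column for the most recent period

)

select expected_platforms.platform
from expected_platforms
left join actual_platforms
    on expected_platforms.platform = actual_platforms.platform
where actual_platforms.platform is null
```

The same pattern works for payment providers: list the expected values and return the ones that are missing.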

If you liked this issue, please subscribe and share it with your colleagues. It greatly helps me develop this newsletter!

See you next time 👋

#dbtips
