Press "Enter" to skip to content


Setting Function Parameters for Debugging in R

Jason Bryer has a function:

I tend to write a lot of functions that create specific graphics implemented with ggplot2. Although I try to pick graphic parameters (e.g. colors, text size, etc.) that are reasonable, I will typically define all relevant aesthetics as parameters to my function. As a result, my functions tend to have a lot of parameters. When I need to debug the function I need to have all those parameters set in the global environment which usually requires me highlighting each assignment and running it. This function automates this process.

Click through to see how it works. H/T R-Bloggers.
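
To give a flavor of the idea, here is a minimal sketch of the technique, not Jason's actual function; the helper name and example parameters are made up. The trick is to walk a function's formal arguments and assign their default values into the global environment so you can step through the function body interactively.

    # Minimal sketch: push a function's default argument values into the
    # global environment for interactive debugging.
    set_debug_params <- function(fn, envir = globalenv()) {
      defaults <- formals(fn)
      for (nm in names(defaults)) {
        value <- defaults[[nm]]
        # Arguments without a default appear as the empty symbol; skip them.
        if (!identical(value, quote(expr = ))) {
          assign(nm, eval(value, envir = envir), envir = envir)
        }
      }
      invisible(names(defaults))
    }

    # e.g. set_debug_params(my_ggplot_function) would put point_color,
    # text_size, etc. into the global environment with their default values.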


Database Deployment Variables with SQLCMD

Andy Brownsword changes a variable:

A regular Database Project deployment is static and delivers consistent results regardless of environment. When it comes to schema, that’s usually desired, but data is a different story.

Data is environment specific. You want a Database Project that works across all environments. You want smarter deployments. You need SQLCMD Variables.

These have been the go-to method for handling different environments and other things that change between releases since I started using database projects about 15 years ago. Looks like not a lot has changed on this front, but it’s good to see that they still work as expected.
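
As a quick illustration of the pattern (the table and variable names below are made up): you declare a SQLCMD variable in the project, give it a value per environment in the publish profile, and reference it with $(...) in a pre- or post-deployment script.

    -- Post-deployment script in a Database Project. The Environment variable
    -- is declared in the .sqlproj and gets its value from the publish profile
    -- (or from SqlPackage's /v:Environment=... switch).
    IF '$(Environment)' = 'PROD'
    BEGIN
        INSERT INTO dbo.AppSetting (SettingName, SettingValue)
        VALUES ('PaymentGatewayUrl', 'https://payments.example.com');
    END
    ELSE
    BEGIN
        INSERT INTO dbo.AppSetting (SettingName, SettingValue)
        VALUES ('PaymentGatewayUrl', 'https://payments-test.example.com');
    END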


T-SQL Tricks from Recent Versions

Rebecca Lewis has three tricks for us:

Three T-SQL features that have shipped over the last few releases and quietly retired patterns many of us are still using out of habit. Each replaces a stale workaround with one line of code, and in two of three cases it runs much faster, too. Take a look, try them out.

Click through for Rebecca’s list. And if you want a full talk’s worth of these sorts of things, I happen to have one.
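
I don't know which three made Rebecca's list, but GENERATE_SERIES (SQL Server 2022) is a canonical example of the genre: one line that retires the tally-table and recursive-CTE workarounds.

    -- Old habit: a numbers table or recursive CTE just to get a run of integers.
    -- New habit: one line.
    SELECT value AS DayOffset
    FROM GENERATE_SERIES(0, 29);  -- returns 0 through 29, one row per value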


Against Data Lakes

Christophe Pettus lays out an argument:

If your engineers have told you that you need a data lake, you should be a little suspicious. Most organizations that build data lakes don’t need them, and a substantial fraction of the ones that do build them end up with what the industry — without any irony — calls a “data swamp.” So before we get to what a data lake is, let me say plainly: the right answer is often “not yet, and maybe never.” The interesting question is when “yet” becomes “now.”

I think my level of agreement is about 80%, and I’m glad that Christophe anticipated my “It’s really useful for data science work” argument. If the large majority of your data is relational in nature, then yeah, a data lake seems like overkill. And most of the time, I see companies taking that lake data and then organizing it into a warehouse later.

I’d say the biggest downside to relying on a data warehouse alone is the turnaround time on requests. Suppose I need a dataset that includes columns A, B, and C from a table in the relational database, but I’m not 100% sure I really need A, B, and C, because I first need to train a model or otherwise work with the data in some significant way. The OLTP DBAs don’t want me writing large-scale analytical queries against that data because of the performance implications. The BI developers/DBAs quote me a turnaround time in months to get it into the warehouse, and if it turns out I don’t need it after all, they’ve wasted a lot of effort for nothing.

That kind of scenario, in my mind, is what compels people in organizations to push for data lakes or something similar.


Comparing {targets} in R to dbt for Data Engineering

Jonathan Carroll compares two approaches:

Thinking of a real-world project I could take for a spin, I decided to build some ingestion for my personal finances. I’ve used Quickbooks previously which connects up to my bank and helps categorise personal and business (as a freelance contractor) expenses. I decided I’ll build my own ‘slowbooks’ processing workflow based on some manual exports (I don’t think my bank has an API).

Both of the approaches I’ll compare here build on the idea of a Makefile which connects up commands to run based on dependencies, and only runs what is needed; if all the input dependencies of a step have not changed, there’s no need to re-run that step. From what I understand, you could largely get away with just writing some Makefiles (or the newer implementation just (just.systems)) but these two approaches help to better structure how that’s constructed.

Read on for Jonathan’s discovery process and ultimate findings. H/T R-Bloggers.
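
For a sense of how {targets} expresses that dependency graph, here is a minimal sketch; the file path and helper functions are placeholders, not Jonathan's actual pipeline.

    # _targets.R: a minimal sketch of the dependency idea.
    library(targets)

    list(
      # Re-reads only when the export file itself changes.
      tar_target(raw_file, "exports/transactions.csv", format = "file"),
      tar_target(raw, read.csv(raw_file)),
      # Downstream steps re-run only when their inputs change.
      tar_target(categorised, categorise_expenses(raw)),        # placeholder helper
      tar_target(monthly_summary, summarise_by_month(categorised))  # placeholder helper
    )
    # tar_make() then skips any target whose upstream dependencies are unchanged.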


Migration Regret: SQL Server to Postgres Edition

Tim Radney provides an important reminder:

As a data nerd who’s spent the last 25+ years helping organizations keep their databases running smoothly, I’ve had this conversation more times than I can count: “We’re moving to Postgres to save on licensing costs.” It sounds great on paper, open source, no vendor lock-in, and those big SQL Server license fees go away. But lately, I’m hearing a different story from DBAs and architects after the migration is done. They’re calling it Post Regret. That sinking feeling when the promised savings evaporate, performance tanks, and the team realizes they might have been better off staying put (or at least doing a lot more due diligence).

If you’re considering a SQL Server to PostgreSQL migration (or already knee-deep in one), this post is for you. I’ll break down what Post Regret looks like in the real world, why it happens so often, and how to avoid becoming the next cautionary tale. I’ve seen it play out in enough environments to spot the patterns.

Click through for Tim’s tales of woe. Importantly, none of it is a knock on Postgres or a knock on SQL Server. It’s the fact that these are two separate products whose tuning options are very different. You can successfully migrate from one to the other, but to do so, you really need to have a great understanding of both platforms at scale, not just at the tutorial level.


Using the XMLA Endpoint for Power BI

Ruben Van de Voorde hits an endpoint:

Most Power BI developers have come across “XMLA endpoint” somewhere: a tenant setting, a Microsoft Learn page, or a tool’s connection dialog. The term sounds technical, and it is, but the idea behind it is straightforward.

Your semantic model is a database. Like any database, it lives somewhere: on your laptop while you’re authoring it in Power BI Desktop, or in a workspace once you’ve published it to the Power BI Service or Fabric. To use a database with anything other than the application that hosts it, you need a connection. The XMLA endpoint is that connection.

This article walks through what the XMLA endpoint is, where it comes from, how to turn it on, what you can do with it once you have it, and where the alternatives (the Power BI REST API, Semantic Link, and the Fabric REST API) fit in.

Click through for Ruben’s article, which does a good job of demystifying the endpoint.
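
If you just want the shape of the thing: once the tenant setting is enabled on a Premium or Fabric capacity workspace, the workspace exposes a connection string you can paste into tools like SSMS, Tabular Editor, or DAX Studio. The workspace name below is a placeholder.

    powerbi://api.powerbi.com/v1.0/myorg/My Workspace Name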


Managed Identities in SQL Server 2025

Greg Low offers another security option for service management:

Those who have worked with SQL Server will understand the need to avoid storing passwords for accessing resources. Windows-based identities are fine for on-premises SQL Server systems, including those on cloud-based virtual machines (VMs), but are of no use when you need to access cloud-based resources like those in Azure.

Some Azure-based resources (including storage accounts) offer other access methods, such as shared access signatures (SAS), but these aren’t much of a step-up from passwords.

What’s really needed is for SQL Server to have its own Microsoft Entra based identity. These can be used directly with Azure-based resources – and that’s exactly where managed identities come in.

Click through to see how it works. Importantly, this is a feature that requires additional payment.
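
As a sketch of what this enables, here is the managed-identity credential pattern as it exists today in Azure SQL Database for reading from Blob Storage. Greg's post covers the SQL Server 2025 specifics, so treat the exact syntax below as an assumption to verify; the account, container, table, and file names are placeholders.

    -- Credential backed by the managed identity: no secret to store or rotate.
    -- (A database master key may need to exist before creating the credential.)
    CREATE DATABASE SCOPED CREDENTIAL BlobViaManagedIdentity
    WITH IDENTITY = 'Managed Identity';

    -- External data source pointing at a storage container (placeholder URL).
    CREATE EXTERNAL DATA SOURCE SalesExports
    WITH (
        TYPE = BLOB_STORAGE,
        LOCATION = 'https://myaccount.blob.core.windows.net/exports',
        CREDENTIAL = BlobViaManagedIdentity
    );

    -- The managed identity needs the Storage Blob Data Reader role on the account.
    BULK INSERT dbo.SalesStaging
    FROM 'sales.csv'
    WITH (DATA_SOURCE = 'SalesExports', FORMAT = 'CSV', FIRSTROW = 2);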


Making a Power BI Matrix Visual Look Nicer

Valerie Junk pretties up a visual:

Many Power BI developers view tables and matrix visuals as the enemy. They dislike building them, and often think, “the user is just going to export this to Excel anyway.” But here’s the thing: tables and matrix visuals have an important business case, and sometimes a well-structured table communicates data far better than any chart would.

There’s also something we don’t talk about enough: trust. BI developers often assume users trust our data, but that’s rarely true. Many users have been burned before by incorrect data or unreliable tools. Providing a matrix visual for row-by-row verification is a powerful way to rebuild trust.

That said, a matrix visual that looks like default Power BI formatting isn’t doing you any favors. 

And they’re probably going to export it to Excel anyhow. Them’s the breaks.
