Towards Data Science

4 YAML Files Instead of PySpark: How We Let Analysts Build Data Pipelines Without Engineers
Data Engineering

How we replaced Python pipelines with dlt, dbt, and Trino — and cut delivery time…

Kiril Kazlou

Apr 29

10 min read
Ensembles of Ensembles of Ensembles: A Guide to Stacking
Machine Learning

The best machine learning model is not one model

Cole Sussmeier

Apr 29

9 min read

Latest

Agentic AI: How to Save on Tokens
Agentic AI

Caching, lazy-loading, routing, compaction, and more

Ida Silfverskiöld

Apr 29

26 min read
System Design Series: Apache Flink from 10,000 Feet, and Building a Flink-powered Recommendation Engine
Data Science

A deep dive into how Apache Flink works, why it exists, and learning it while…

Sanil Khurana

Apr 29

17 min read
Let the AI Do the Experimenting
Agentic AI

Using autoresearch to optimise marketing campaigns under budget constraints

Mariya Mansurova

Apr 28

14 min read
Correlation Doesn’t Mean Causation! But What Does It Mean?
Data Science

What does correlation tell us?

Sara A. Metwalli

Apr 28

6 min read
The Next Frontier of AI in Production Is Chaos Engineering
Artificial Intelligence

Blast-radius control tells you how much to break. Intent tells you what breaking it will…

Sayali Patil

Apr 28

18 min read
PyTorch NaNs Are Silent Killers — So I Built a 3ms Hook to Catch Them at the Exact Layer
Deep Learning

NaNs don’t crash your training — they quietly destroy it. After losing hours to a…

Emmimal P Alexander

Apr 28

11 min read
A Career in Data Is Not Always a Straight Line, and That’s Okay
Author Spotlights

Sabrine Bendimerad on why flexibility is a crucial data science skill, the risks of outsourcing…

TDS Editors

Apr 27

9 min read
How Spreadsheets Quietly Cost Supply Chains Millions
Product Management

A simulation of how a single forecast change moves through five planning teams, and why…

Samir Saci

Apr 27

14 min read
Comparing Explicit Measures to Calculation Groups in Tabular Models
Data Modeling

With the advent of UDFs and their combination with calculation groups, I see a lot…

Salvatore Cagliari

Apr 27

6 min read


Editor’s Picks

Bytes Speak All Languages: Cross-Script Name Retrieval via Contrastive Learning
Deep Learning

Why learn 8 scripts when you can learn 256 bytes?

Vedant Jumle

Apr 26

12 min read
Causal Inference Is Different in Business
Data Science

How does decision-gravity dictate this gap?

Alejandro Alvarez Perez

Apr 25

12 min read
I Built an AI Pipeline for Kindle Highlights
Large Language Models

A local, zero-cost project that cleans, structures, and summarizes your reading automatically

Pol Marin

Apr 24

13 min read
Using Causal Inference to Estimate the Impact of Tube Strikes on Cycling Usage in London
Data Science

Turning free-to-use data into a hypothesis-ready dataset

Luke Stuckey

Apr 22

19 min read
Ivory Tower Notes: The Methodology
Data Science

A short intro to scientific methodology to combat “prompt in, slop out”

Marina Tosic

Apr 22

6 min read
What Does the p-value Even Mean?
Data Science

And what does it tell us?

Sara A. Metwalli

Apr 20

7 min read
The LLM Gamble
Artificial Intelligence

Why it tickles your brain to use an LLM, and what that means for the…

Stephanie Kirmer

Apr 20

8 min read
AI Agents Need Their Own Desk, and Git Worktrees Give Them One
Agentic AI

Git worktrees, parallel agentic coding sessions, and the setup tax you should be aware of

Ruben Broekx

Apr 18

20 min read
Beyond Prompting: Using Agent Skills in Data Science
Artificial Intelligence

How I turned my eight-year weekly visualization habit into a reusable AI workflow

Yu Dong

Apr 17

7 min read

The Variable Newsletter

Exciting Changes Are Coming to the TDS Author Payment Program
Writing

Authors can now benefit from updated earning tiers and a higher article cap

TDS Editors

Mar 2

2 min read
TDS Newsletter: Vibe Coding Is Great. Until It’s Not.
The Variable

Sorting through the good, bad, and ambiguous aspects of vibe coding

TDS Editors

Feb 5

4 min read

Deep Dives

The Essential Guide to Effectively Summarizing Massive Documents, Part 2
LLM Applications

We have the document clusters, and it’s time to unlock their true potential! Let’s explore…

Vinayak Sengupta

Apr 25

18 min read
Lasso Regression: Why the Solution Lives on a Diamond
Machine Learning

It’s simpler than you think.

Nikhil Dasari

Apr 23

24 min read
Correlation vs. Causation: Measuring True Impact with Propensity Score Matching
Data Science

Learn how Propensity Score Matching uncovers true causality in observational data. By finding “statistical twins,”…

Gustavo Santos

Apr 22

12 min read
DIY AI & ML: Solving The Multi-Armed Bandit Problem with Thompson Sampling
Machine Learning

How you can build your own Thompson Sampling Algorithm object in Python and apply it…

Jacob Ingle

Apr 21

17 min read
Git UNDO: How to Rewrite Git History with Confidence
Programming

For any data scientist who works in a team, being able to undo Git actions…

Omer Rosenbaum

Apr 21

24 min read
I Replaced GPT-4 with a Local SLM and My CI/CD Pipeline Stopped Failing
Machine Learning

The hidden cost of probabilistic outputs in systems that demand reliability

Benjamin Nweke

Apr 21

13 min read