The History

I was on the fence about doing CalHacks with a team and eventually decided to participate alone. The Query Planner Challenge was the first challenge I saw, and it really intrigued me. While everyone else was building AI agents and MCPs, I decided to take a different route and work on efficiency and optimization.

Coming into this challenge, I knew a little about optimizing queries and databases. But at the end, I'm walking away with key concepts I learned while trying to fix all the errors I kept running into.

What did I build?

I built an efficient query planner engine to reduce query runtime on a 20 GB dataset. Given the data and test queries, I built a system that loads the data, pre-processes it, and executes queries efficiently. The optimized version is built on top of the baseline given to us.

How does it work?

I built an adaptive query optimization system using DuckDB. My system first analyzes incoming queries to understand their structure, identifying aggregation levels and distinguishing between constant and variable filters. I then intelligently merge compatible queries into specifications for pre-aggregated summary tables, which my system materializes alongside a sorted main table. An adaptive router I created matches subsequent queries to the most optimal summary table—or the main table as a fallback—and rewrites the SQL for execution. This automated approach ensures my system delivers high performance, storage efficiency through merging, and 100% routing accuracy without manual query tuning.

Challenges I faced

The biggest challenge I faced was the learning curve. The method I chose was efficient, but it came with so many edge cases that piecing every phase together while accounting for a wide range of test cases was difficult. More specifically, my system was designed for small output tables, but testing showed that tables with 200k+ rows ran extremely slowly, dragging down the overall execution time. I overcame this by implementing cardinality optimizations in phase 1 of my engine.
