Presto Blog - PrestoDB

Deploy Presto on Kubernetes using Helm: Query S3 Data with Hive Metastore

By Saurabh Mahawar February 27, 2026

Deploying Presto on Kubernetes transforms this powerful engine into a cloud-native, resilient service that automatically handles failures, scales seamlessly, and optimizes resource utilization. When combined with Helm charts, the deployment becomes standardized, version-controlled, and easily reproducible across environments. This comprehensive guide will walk you through deploying a production-capable baseline Presto cluster on Kubernetes using the official Presto Helm…

PBench 1.2.1: End-to-End Benchmarking and Performance Testing for Presto

By Ethan Zhang February 24, 2026

Benchmarking a distributed SQL engine like Presto involves much more than running a few queries and recording wall-clock times. Real-world performance evaluation demands multi-phase test execution, concurrent workloads, production traffic replay, and deep offline analysis. PBench is a purpose-built benchmarking toolkit for Presto that handles all of this through a declarative, composable stage system. With the 1.2.1…

TPC-H vs TPC-DS: Benchmarking Modern Distributed SQL Engines like Presto

By Saurabh Mahawar January 30, 2026 (updated February 3, 2026)

In the world of big data, performance is the ultimate currency. But when you are processing petabytes of data across a distributed cluster, speed isn’t just about a stopwatch; it’s a high-stakes engineering challenge. Whether you are evaluating Presto, Spark, or any other engine, you need an objective yardstick. Performance in a distributed SQL engine…

Presto vs Prestissimo – Known differences and workarounds

By Amit Dutta & Krishna Pai January 22, 2026 (updated January 30, 2026)

TL;DR: This blog outlines the known differences between Presto and Prestissimo where existing Presto queries require adjustment to work in Prestissimo. Details: Prestissimo is generally available to use and has feature parity (except for a few functions) with Presto Java. There are differences in the libraries used in the two stacks. Also, we have ensured that bugs…

From Zero to Contributor: A Complete Guide to Contributing to Presto Open Source

By Saurabh Mahawar January 9, 2026

PrestoDB is a powerful distributed SQL query engine used widely for large-scale data analytics. Contributing to Presto is an excellent way to gain hands-on experience with distributed systems, Java, SQL engines, and large open-source codebases. This step-by-step tutorial is designed specifically for beginners and first-time contributors who want to build Presto from source, run the…

Understanding Presto UI: A Deep Dive into the Web Interface Architecture

By Saurabh Mahawar December 1, 2025

Presto UI is a modern, React-based web interface that provides real-time monitoring, query management, and cluster administration capabilities for the Presto distributed SQL query engine. Whether you’re a database administrator, data engineer, or developer, Presto UI offers intuitive tools to visualize query execution, monitor cluster health, and interact with the Presto coordinator. Key Benefits of…

Seamless Integration: Connecting PrestoDB to SingleStore for High-Performance Analytics

By Saurabh Mahawar September 11, 2025 (updated November 19, 2025)

In today’s data-driven landscape, organizations are constantly seeking ways to analyze massive datasets quickly and efficiently. PrestoDB, a powerful open-source SQL query engine, and SingleStore, a distributed SQL database, are two technologies that, when combined, offer unparalleled capabilities for high-performance data querying and distributed analytics. This guide provides a hands-on, step-by-step tutorial on how to…

Presto Takes a Leap: Upgrading to Java 17 for Enhanced Performance and Security

By Zachary Blanco August 25, 2025

We’re excited to announce that the core Presto engine is migrating to Java 17. This upgrade reinforces our commitment to providing a robust, high-performance, and secure SQL query engine. This change allows Presto to leverage Java 17’s improvements, bringing enhancements in performance, stability, and security, and laying a strong foundation for future upgrades. Why Java…

Prestissimo Extension for AI Training Data Normalization at Meta: A Deep Dive for Developers (Lightning Talk)

By Saurabh Mahawar August 23, 2025

At PrestoCon Day 2025, Meta’s Presto team recently unveiled the Prestissimo extension, a powerful enhancement designed to optimize AI training data normalization. This article explores the technical underpinnings and developer-centric features of this extension, providing a comprehensive understanding of how it supports large-scale AI workloads at Meta. Understanding AI Training Data Storage at Meta At…

Presto C++ Unleashed: Dynamically Load Unfenced UDFs, End Rebuilds, and Boost Performance

By Saurabh Mahawar August 22, 2025

Dynamic loading in Presto C++ is revolutionizing how developers build and deploy user-defined functions (UDFs). At PrestoCon Day 2025, Soumya Duriseti explained how Presto C++ now supports dynamic loading of unfenced UDFs, eliminating the need for time-consuming static builds and making it easier than ever to add custom logic without rebuilding the entire binary…

Building Connectors in Presto C++: Deep Dive into the TPCDS Connector (Lightning Talk)

By Saurabh Mahawar, Pramod Satya & Pratik Joseph Dabre August 20, 2025

At PrestoCon Day 2025, engineers from IBM presented a deep dive into how connectors in Presto C++ extend the engine’s modular capabilities, focusing on the newly implemented TPCDS benchmark connector. Connectors are central to Presto’s architecture, enabling the query engine to communicate seamlessly with external systems such as databases, file formats, or benchmark data generators…

Presto’s Intelligent Future: Leveraging RAG and LLMs for Smarter Query Execution

By Saurabh Mahawar August 12, 2025

At PrestoCon Day 2025, Satej Sahu (Principal Data Engineer at Zalando SE) introduced the Self-Healing Query Connector for Presto, an AI-powered upgrade designed to make query troubleshooting faster, smarter, and more reliable. By combining Large Language Models with live query data, including logs, explain plans, and schema details, it delivers accurate, context-aware solutions that improve…

Revolutionizing Presto C++: Unleashing Native Power with the Sidecar

By Saurabh Mahawar, Pramod Satya & Pratik Joseph Dabre August 10, 2025 (updated August 11, 2025)

At PrestoCon Day 2025, we unveiled the Presto Sidecar, a powerful enhancement for Presto C++ (Velox) clusters that transforms how coordinators interact with native workers. This innovation removes long-standing blind spots in query planning by giving the coordinator real-time visibility into native worker capabilities – such as supported functions, data types, session properties, and plan…

Unlocking Petabyte-Scale Performance: Uber’s Journey with Presto for Distributed Cache using Alluxio

By Saurabh Mahawar August 8, 2025 (updated August 28, 2025)

At PrestoCon Day 2025, Uber presented their innovative solution for optimizing petabyte-scale data analytics by deploying a distributed cache using Alluxio for Presto. Their journey was driven by significant challenges during a massive cloud migration, including slow reads and overwhelmed HDFS clusters on-premises, and later high GCS egress costs and file access charges in the…

Unleashing Interactivity: Inside Meta’s Presto-Powered Data Warehouse Innovation

By Saurabh Mahawar August 7, 2025

At this year’s PrestoCon Day, Meta had an awesome session to share the latest on what they’re doing with Presto. As you probably know, Meta has one of the largest data lakehouses in the world, and Presto is a critical piece of that data platform. It plays a critical role in serving vast and diverse…

Setting Up Presto with Apache Superset: Hands-On Guide

By Saurabh Mahawar August 7, 2025

PrestoDB, an open-source distributed SQL query engine, allows you to query data from multiple disparate sources. When combined with Apache Superset, an open-source data visualization and exploration platform, it forms a powerful and flexible analytics solution. This guide provides a step-by-step approach to deploying these components within a Dockerized environment, simplifying setup and management. Pre-Requisites:…

Build Your Open Data Lakehouse: A Step-by-Step ETL Guide with MySQL, OLake, and PrestoDB

By Saurabh Mahawar July 29, 2025 (updated August 19, 2025)

This tutorial provides a comprehensive guide to building an Open Data Lakehouse from scratch, a modern and flexible data architecture solution. Open Data Lakehouses offer a powerful and scalable method for storing, managing, and querying both structured and semi-structured data, leveraging a suite of robust open-source tools for enhanced control and flexibility. Pre-Requisites: Before commencing…

Leading by Contribution: IBM’s Ongoing Investment in Open-Source Presto

By Anant Aneja, Yabin Ma, Ali LeClerc & Ethan Zhang July 15, 2025 (updated July 16, 2025)

Note: This is a cross-post from https://community.ibm.com/community/user/blogs/ali-leclerc/2025/07/15/ibms-ongoing-investment-in-presto

At IBM, we believe open source is the engine of innovation. Presto, as a fast and flexible SQL engine for interactive analytics, continues to evolve rapidly thanks to community contributions. Over the past year, IBM engineers have focused on driving Presto forward across security, performance, native execution, and…