This repository provides the schema and queries to benchmark the performance of Spanner Columnar Engine. By running these analytical queries, you can compare query latency and efficiency with and without the Columnar Engine enabled, using a public dataset.
The Columnar Engine is designed to accelerate analytical query performance (OLAP) on your data in Spanner, often by orders of magnitude. This is achieved by a combination of columnar storage and vectorized execution.
In the instructions below, we will:
- Create a Spanner database with the target schema and load benchmark dataset.
- Convert the data fully into columnar format.
- Run the provided analytical queries and observe the benefits of the Columnar Engine.
-
Enable the Spanner API: Make sure the Cloud Spanner API is enabled in your Google Cloud project.
-
Create a Spanner Instance: Provision a new Spanner instance or use an existing one. The Columnar Engine requires the Enterprise or Enterprise Plus edition.
-
GCS Bucket: Have a Google Cloud Storage bucket available to temporarily store the dataset and schema files.
-
Download Dataset: Download the CSV version of this dataset.
-
Download Schema: Download the Spanner table schema definition: benchmark-schema.sql
-
Upload to GCS:
- Extract the downloaded
hits.csvfile. - Upload the
hits.csvfile to your GCS bucket. - Upload the
benchmark-schema.sqlfile to your GCS bucket.
- Extract the downloaded
-
Create the database and import Data: Use the Spanner "Import" feature to load the
hits.csvdata from your GCS bucket into thehitstable.- In the Google Cloud Console, navigate to the Overview page of your Spanner instance.
- Click
Import my own databutton. - Follow the wizard:
-
Trigger Compaction: After the data importing job is finished, trigger a major compaction to ensure the data is fully converted to the columnar format. Wait for the major compaction to finish.
Run columnar and non-columnar queries in benchmark-schema.sql and compare the performance.
Notes that @{scan_method=columnar} force Spanner to use Columnar Engine if available, while @{scan_method=no_columnar} explicitly opt-out Columnar Engine for the query.
