columnar-engine

Spanner Columnar Engine Benchmark

This repository provides the schema and queries to benchmark the performance of Spanner Columnar Engine. By running these analytical queries, you can compare query latency and efficiency with and without the Columnar Engine enabled, using a public dataset.

The Columnar Engine is designed to accelerate analytical query performance (OLAP) on your data in Spanner, often by orders of magnitude. This is achieved by a combination of columnar storage and vectorized execution.

In the instructions below, we will:

Create a Spanner database with the target schema and load benchmark dataset.
Convert the data fully into columnar format.
Run the provided analytical queries and observe the benefits of the Columnar Engine.

Before you begin

Enable the Spanner API: Make sure the Cloud Spanner API is enabled in your Google Cloud project.
- Enable Spanner API
Create a Spanner Instance: Provision a new Spanner instance or use an existing one. The Columnar Engine requires the Enterprise or Enterprise Plus edition.
- Create a Spanner instance
- Spanner Editions Overview
GCS Bucket: Have a Google Cloud Storage bucket available to temporarily store the dataset and schema files.
- Create a GCS Bucket

Setup Instructions

Download Dataset: Download the CSV version of this dataset.
Download Schema: Download the Spanner table schema definition: benchmark-schema.sql
Upload to GCS:
- Extract the downloaded hits.csv file.
- Upload the hits.csv file to your GCS bucket.
- Upload the benchmark-schema.sql file to your GCS bucket.
Create the database and import Data: Use the Spanner "Import" feature to load the hits.csv data from your GCS bucket into the hits table.
- In the Google Cloud Console, navigate to the Overview page of your Spanner instance.
- Click Import my own data button.
- Follow the wizard:
  - Select CSV file as the file type,
  - Select hits.csv file in your GCS bucket as the file location.
  - Select benchmark-schema.sql file in your GCS bucket as the schema.
  - Select New database as the destination, and choose a name for the new database.
  - Click import.
Trigger Compaction: After the data importing job is finished, trigger a major compaction to ensure the data is fully converted to the columnar format. Wait for the major compaction to finish.

Run the Benchmark Queries

Run columnar and non-columnar queries in benchmark-schema.sql and compare the performance.

Notes that @{scan_method=columnar} force Spanner to use Columnar Engine if available, while @{scan_method=no_columnar} explicitly opt-out Columnar Engine for the query.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
SAMPLEOWNERS		SAMPLEOWNERS
benchmark-queries.sql		benchmark-queries.sql
benchmark-schema.sql		benchmark-schema.sql
import_data.png		import_data.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Spanner Columnar Engine Benchmark

Before you begin

Setup Instructions

Run the Benchmark Queries

FilesExpand file tree

columnar-engine

Directory actions

More options

Directory actions

More options

Latest commit

History

columnar-engine

Folders and files

parent directory

README.md

Spanner Columnar Engine Benchmark

Before you begin

Setup Instructions

Run the Benchmark Queries