Testing Database Insert Performance

This project compares different methods for reading CSV files and inserting them into databases, including:

PyArrow + ADBC - Using PyArrow to read CSVs and ADBC driver for Postgres
Polars + ADBC - Using Polars to read CSVs and ADBC driver for Postgres
psycopg2 + COPY - Using Python's csv module and psycopg2's COPY command
DuckDB + COPY - Using DuckDB's native COPY command

Prerequisites

Python 3.12+
Docker and Docker Compose (for local Postgres)
uv package manager (or install dependencies manually)

Setup

Install dependencies:

uv sync

Or with pip:

pip install -e .

Starting Local Postgres Database

Start a local PostgreSQL database using Docker Compose:

docker compose up -d

This will start a Postgres 16 container with:

Host: localhost
Port: 5432
Database: postgres
Username: postgres
Password: postgres

The container will be named adbc-postgres and data will be persisted in a Docker volume.

Checking Postgres Status

To check if Postgres is running:

docker-compose ps

To view logs:

docker-compose logs postgres

Stopping Postgres

To stop the Postgres container:

docker-compose down

To stop and remove all data:

docker-compose down -v

Available Scripts

All scripts read CSV files from data/uncompressed/ and insert them into databases. Each creates a different table name so you can compare results.

1. PyArrow + ADBC (`pyarrow_adbc_driver.py`)

Uses PyArrow's dataset API to read CSVs and ADBC driver to insert into Postgres.

python pyarrow_adbc_driver.py

Table: divvy_tripdata

Features:

Uses PyArrow's efficient dataset API for reading multiple CSV files
Converts to Arrow table format
Uses ADBC's adbc_ingest() for bulk insertion
Creates table with proper type mapping

2. Polars + ADBC (`polars_adbc_driver.py`)

Uses Polars to read CSVs and Polars' built-in write_database() with ADBC engine.

python polars_adbc_driver.py

Table: divvy_tripdata_polars

Features:

Uses Polars for fast CSV reading
Leverages Polars' native write_database() method
Automatic schema inference and table creation
Uses ADBC engine under the hood

3. psycopg2 + COPY (`psycopg2_driver.py`)

Uses Python's built-in csv module and psycopg2's COPY command (fastest psycopg2 method).

python psycopg2_driver.py

Table: divvy_tripdata_psycopg2_noarrow

Features:

Pure Python CSV reading (no Arrow dependencies)
Uses PostgreSQL's native COPY command via copy_expert()
Type inference from sample data
Optimized for bulk inserts

4. DuckDB + COPY (`duckdb_copy_driver.py`)

Uses DuckDB's native COPY command to load CSV files directly.

python duckdb_copy_driver.py

Table: divvy_tripdata_duckdb (in DuckDB database file)

Features:

Creates a local DuckDB database file (divvy_data.duckdb)
Uses DuckDB's COPY FROM command
Automatic schema inference
Stores data in DuckDB's columnar format

Connection Configuration

Using Default Connection (Docker Postgres)

All Postgres scripts (except DuckDB) use the Docker Postgres setup by default:

# Start Postgres
docker-compose up -d

# Run any script
python pyarrow_adbc_driver.py
python polars_adbc_driver.py
python psycopg2_driver.py

Using Custom Connection

You can override Postgres connection parameters using environment variables:

export PGHOST=your-host
export PGPORT=5432
export PGDATABASE=your-database
export PGUSER=your-user
export PGPASSWORD=your-password

python pyarrow_adbc_driver.py

Performance Comparison

All scripts include timing information showing:

Total insertion time
Rows per second throughput
Row count verification

This allows you to compare the performance of different approaches:

ADBC methods (PyArrow/Polars) - Leverage Arrow format for zero-copy transfers
psycopg2 COPY - Uses PostgreSQL's native bulk loading
DuckDB COPY - Optimized columnar database with native CSV loading

Run all scripts and compare the timing results to see which method works best for your use case!

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
data		data
README.md		README.md
docker-compose.yml		docker-compose.yml
duckdb_copy_driver.py		duckdb_copy_driver.py
polars_adbc_driver.py		polars_adbc_driver.py
psycopg2_driver.py		psycopg2_driver.py
pyarrow_adbc_driver.py		pyarrow_adbc_driver.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Testing Database Insert Performance

Prerequisites

Setup

Starting Local Postgres Database

Checking Postgres Status

Stopping Postgres

Available Scripts

1. PyArrow + ADBC (`pyarrow_adbc_driver.py`)

2. Polars + ADBC (`polars_adbc_driver.py`)

3. psycopg2 + COPY (`psycopg2_driver.py`)

4. DuckDB + COPY (`duckdb_copy_driver.py`)

Connection Configuration

Using Default Connection (Docker Postgres)

Using Custom Connection

Performance Comparison

About

Uh oh!

Releases

Packages

Languages

danielbeach/testingArrowADBCdrivers

Folders and files

Latest commit

History

Repository files navigation

Testing Database Insert Performance

Prerequisites

Setup

Starting Local Postgres Database

Checking Postgres Status

Stopping Postgres

Available Scripts

1. PyArrow + ADBC (pyarrow_adbc_driver.py)

2. Polars + ADBC (polars_adbc_driver.py)

3. psycopg2 + COPY (psycopg2_driver.py)

4. DuckDB + COPY (duckdb_copy_driver.py)

Connection Configuration

Using Default Connection (Docker Postgres)

Using Custom Connection

Performance Comparison

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. PyArrow + ADBC (`pyarrow_adbc_driver.py`)

2. Polars + ADBC (`polars_adbc_driver.py`)

3. psycopg2 + COPY (`psycopg2_driver.py`)

4. DuckDB + COPY (`duckdb_copy_driver.py`)

Packages