The runnable companion to the Coursera course DuckDB From Zero, part of the Rust for Data Engineering specialization.
This repo bundles every example shown in the videos plus the optional Rust
capstone artifact (duckdb-reports). The Coursera lab boots a sandbox with the
same fixtures pre-loaded — but you can also run everything locally.
git clone https://github.com/paiml/duckdb-from-zero
cd duckdb-from-zero
make demo # downloads fixtures, runs the three SQL tour scripts
make capstone # builds and runs the Rust binary against Pagila Parquets
The first make call takes 60–120 seconds (TPC-H sf=1 generation + Pagila
download + NYC Taxi download). Subsequent calls are instant.
One-line install for each prerequisite:
curl -fsSL https://install.duckdb.org/ | sh # DuckDB CLI 1.5+ → ~/.local/bin/duckdb
curl -sSf https://sh.rustup.rs | sh # rustup; rust-toolchain.toml pins the rest
Then:
git clone https://github.com/paiml/duckdb-from-zero
cd duckdb-from-zero
make fetch # ~60–120 s; populates data/
make fetch is idempotent — re-running is a no-op once data/ is populated.
make demo # SQL tour: 01-tour, 02-files-as-tables, 03-tpch-q1
make capstone # Rust binary: customers, films, actors reports
make verify # CI smoke test: assert TPC-H + Pagila headline counts
make test # cargo test --release (28 tests)
make lint # cargo clippy --all-targets -- -D warnings
make clean # rm -rf data/ target/
The Rust capstone binary takes flags directly:
cargo run --release --bin duckdb-reports -- --report customers --limit 10
cargo run --release --bin duckdb-reports -- --report films --limit 10 --out films.json
cargo run --release --bin duckdb-reports -- --report actors --limit 10
sql/— the SQL the videos walk through, plus all 22 TPC-H benchmark queriescrates/duckdb-reports/— Rust binary that runs three Sakila analytics reports with runtime contracts; the foundation for the course capstonescripts/— the bash glue for fetch, demo, and verify
- DuckDB CLI 1.1+ (
curl https://install.duckdb.org/ | sh) - Rust 1.95+ (use rustup;
rust-toolchain.tomlwill pin automatically) - ~500 MB free disk for fixtures
Three things you can do here that you cannot easily do with a server-based database:
SELECT * FROM 'data/pagila/film.parquet'— the Parquet file is the table. NoCREATE TABLE. No import step. (Seesql/02-files-as-tables.sql.)- Query 6 million rows of TPC-H
lineitemand have the answer in well under a second on a laptop. (Seesql/03-tpch-q1.sql.) - Embed the entire database engine inside a Rust binary that ships as a
single file with no daemon and no network port. (See
crates/duckdb-reports/.)
| Metric | Result | Notes |
|---|---|---|
| Tests | 31 passing | 23 lib unit + 5 CLI + 3 integration |
| Line coverage | 99.16% total / 100% on main.rs / 99.09% on lib.rs |
3 missed are inside query_map closures (unreachable when prepare()? fails) |
| Function coverage | 100% | cargo llvm-cov --release |
cargo clippy --all-targets -- -D warnings |
clean | strict mode |
pmat quality-gate |
0 violations | excluding entropy (reports repetitions: 0, variation: 1.00 but flags as low — small-repo metric artifact) |
Reproduce: make test, make coverage, make lint, make pmat.
Every report enforces five named runtime invariants the moment its query
returns. The formal spec lives in contracts/duckdb-rust-v1.yaml;
each rule maps to an assert! in crates/duckdb-reports/src/lib.rs:
| ID | Rule | Severity |
|---|---|---|
| C1 | rows.len() == limit (LIMIT N must return N rows) |
ERROR |
| C2 | rows[0].count >= 1 (top record matched the LEFT JOIN) |
ERROR |
| C3 | forall r in rows. r.id > 0 (AUTO_INCREMENT sanity) |
ERROR |
| C4 | text fields well-formed (name has space, title non-empty, etc.) |
ERROR |
| C5 | rows[i].count >= rows[i+1].count (ORDER BY DESC sort invariant) |
ERROR |
Each contract has a should_panic unit test that fires on a synthetic
violation, and each report has a positive-path integration test against the
real Pagila fixture. Falsification IDs (FALSIFY-DUCKDB-001..005) are listed
in the YAML.
The capstone for this course extends crates/duckdb-reports/ from three reports
to a configurable analytics tool. See the course's capstone reading for the
detailed brief.
This is a course companion repo, not an actively-developed library. PRs that improve fidelity to the course videos, fix data-fetch breakage, or harden the provable contracts are welcome. Before opening a PR:
make test && make lintmust pass cleanly- Coverage gate is
cargo llvm-cov --release --workspace≥ 95% onlib.rs pmat quality-gateshould not regress on complexity or section checks- Any new contract goes in both
contracts/duckdb-rust-v1.yamland ashould_panicunit test inmod tests
Dual-licensed under MIT or Apache-2.0 at your option (matches the rest of the
PAIML Coursera companion repos). See LICENSE-MIT and LICENSE-APACHE.