This repository contains hands-on examples and labs for using Polars with Rust for data engineering tasks. It accompanies a Coursera course from Pragmatic AI Labs.
This repository has example projects in ./examples and hands-on labs in ./labs. Make sure you have the Rust toolchain installed.
This repository is Codespaces ready and set as a template repository. You can open it directly in a GitHub Codespace — Rust, rust-analyzer, and all extensions are pre-installed.
Complete these hands-on labs to reinforce your learning:
- Setting up a Polars project and building a first DataFrame
- The Apache Arrow memory model and columnar storage
- Polars vs. pandas: performance, lazy evaluation, and type safety
- Reading wine-ratings.csv and inspecting the schema
- Series, DataFrames, and Polars data types
- Column selection, row slicing, and null counts
- col, lit, and chained transforms with LazyFrame
- Eager vs. lazy evaluation — when to use each
- Collecting a LazyFrame and reading the query plan
- Handling nulls, casting ratings to f64, normalizing text
- Drop-vs-fill strategies for missing values
- Filtering invalid rows and validating the cleaned schema
- Filter by region and rating, group_by variety
- Multi-column sort with descending and ascending order
- Counting the most-reviewed regions
- Left join, melt/unpivot, and writing CSV and Parquet
- Adding a lookup table and enriching the wine DataFrame
- Wide-to-long transformations for reporting
- clap CLI that loads wine-ratings.csv into SQLite
- Schema design for the bronze layer: preserving raw columns
- Adding an ingested_at timestamp without business logic
- SQLite → LazyFrame cleaning pipeline with a printed summary
- Applying reusable cleaning functions from Module 2
- Enforcing non-null constraints and deduplication
- min-rating filter, top-variety aggregations, CSV and JSON export
- Configurable thresholds with clap flags
- Exporting the gold DataFrame for downstream consumers
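As a rough illustration of the drop-vs-fill topic listed above, the two strategies can be sketched in plain Rust over a column of optional ratings. This is a dependency-free sketch, not the labs' code: the labs use Polars' own null handling, and the function names `drop_missing` and `fill_with_mean` are hypothetical.

```rust
/// Drop strategy: discard missing ratings entirely.
/// (Hypothetical helper for illustration; not part of the course code.)
fn drop_missing(ratings: &[Option<f64>]) -> Vec<f64> {
    ratings.iter().flatten().copied().collect()
}

/// Fill strategy: replace each missing rating with the mean of the present ones.
/// Returns an empty vector when no rating is present, since no mean exists.
fn fill_with_mean(ratings: &[Option<f64>]) -> Vec<f64> {
    let present = drop_missing(ratings);
    if present.is_empty() {
        return Vec::new();
    }
    let mean = present.iter().sum::<f64>() / present.len() as f64;
    ratings.iter().map(|&r| r.unwrap_or(mean)).collect()
}

fn main() {
    // Made-up sample values, not taken from wine-ratings.csv.
    let ratings = [Some(90.0), None, Some(94.0), None];
    println!("dropped: {:?}", drop_missing(&ratings));
    println!("filled:  {:?}", fill_with_mean(&ratings));
}
```

Dropping shrinks the column and keeps only observed values, while filling preserves the row count at the cost of introducing synthetic values. Which trade-off is acceptable depends on what the downstream aggregate is meant to show.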
Build wine-pipeline — a Rust CLI tool implementing the Bronze–Silver–Gold medallion architecture over the wine ratings dataset:
- bronze: read wine-ratings.csv and load all rows as-is into a SQLite raw_wines table, adding an ingested_at timestamp
- silver: read raw_wines, apply cleaning rules (drop nulls, normalize text, cast rating to f64), and write a validated clean_wines table with a printed summary of changes
- gold: read clean_wines, filter by --min-rating (default 90), compute top grape varieties by average rating, and export results to gold_wines.csv and gold_wines.json
- report: print a Markdown summary table of gold-layer aggregates to stdout
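The silver and gold stages described above can be sketched without any external crates. This is only an outline of the logic under stated assumptions: the starter implementation uses Polars and SQLite, and the `RawWine` and `CleanWine` structs, their field names, and the `silver` and `gold` functions here are hypothetical stand-ins rather than the project's actual types.

```rust
use std::collections::BTreeMap;

/// A raw bronze-layer row: every field is optional text.
/// Field names are illustrative, not the dataset's confirmed schema.
#[derive(Debug, Clone)]
struct RawWine {
    variety: Option<String>,
    rating: Option<String>,
}

/// A validated silver-layer row.
#[derive(Debug, Clone, PartialEq)]
struct CleanWine {
    variety: String,
    rating: f64,
}

/// Silver: drop rows with missing or unparseable fields,
/// normalize text (trim + lowercase), and cast the rating to f64.
fn silver(raw: &[RawWine]) -> Vec<CleanWine> {
    raw.iter()
        .filter_map(|row| {
            let variety = row.variety.as_ref()?.trim().to_lowercase();
            let rating = row.rating.as_ref()?.trim().parse::<f64>().ok()?;
            if variety.is_empty() {
                return None;
            }
            Some(CleanWine { variety, rating })
        })
        .collect()
}

/// Gold: keep wines rated at or above `min_rating`, then compute the
/// average rating per variety, sorted from highest to lowest average.
fn gold(clean: &[CleanWine], min_rating: f64) -> Vec<(String, f64)> {
    let mut sums: BTreeMap<&str, (f64, usize)> = BTreeMap::new();
    for wine in clean.iter().filter(|w| w.rating >= min_rating) {
        let entry = sums.entry(wine.variety.as_str()).or_insert((0.0, 0));
        entry.0 += wine.rating;
        entry.1 += 1;
    }
    let mut averages: Vec<(String, f64)> = sums
        .into_iter()
        .map(|(variety, (sum, n))| (variety.to_string(), sum / n as f64))
        .collect();
    averages.sort_by(|a, b| b.1.total_cmp(&a.1));
    averages
}

fn main() {
    // Made-up sample rows, not taken from wine-ratings.csv.
    let raw = vec![
        RawWine { variety: Some("  Pinot Noir ".into()), rating: Some("94".into()) },
        RawWine { variety: Some("pinot noir".into()), rating: Some("92".into()) },
        RawWine { variety: Some("Merlot".into()), rating: Some("88".into()) },
        RawWine { variety: None, rating: Some("95".into()) },
        RawWine { variety: Some("Syrah".into()), rating: Some("n/a".into()) },
    ];
    let clean = silver(&raw);
    println!("silver kept {} of {} rows", clean.len(), raw.len());
    for (variety, avg) in gold(&clean, 90.0) {
        println!("{variety}: {avg:.1}");
    }
}
```

Keeping the cleaning rules in the silver step and the threshold in the gold step mirrors the layering: raw rows stay untouched in bronze, and the `min_rating` argument plays the role of the `--min-rating` flag.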
A starter implementation is in wine-pipeline/.
# Build
cargo build -p wine-pipeline
# Ingest raw CSV
cargo run -p wine-pipeline -- bronze --input wine-ratings.csv
# Clean and standardize
cargo run -p wine-pipeline -- silver
# Export gold results (wines rated 92+)
cargo run -p wine-pipeline -- gold --min-rating 92
# Print a Markdown report
cargo run -p wine-pipeline -- report
- Install the Rust toolchain:
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Clone this repository:
  git clone https://github.com/alfredodeza/polars-fundamentals.git
  cd polars-fundamentals
- Build the entire workspace:
  cargo build --workspace
- Run an example:
  cargo run -p polars-intro
- Run tests:
  cargo test --workspace
| Crate | Purpose |
|---|---|
| polars | DataFrame engine with lazy evaluation and Arrow backend |
| rusqlite | SQLite bindings for the bronze/silver/gold persistence layer |
| clap | CLI argument parsing with derive API |
| serde_json | JSON serialization for gold-layer export |
| chrono | Timestamps for the bronze ingestion layer |
| anyhow | Ergonomic error handling |