Polars Fundamentals: Data Engineering with Rust

This repository contains hands-on examples and labs for using Polars with Rust for data engineering tasks. It accompanies a Coursera course from Pragmatic AI Labs.

Labs

Complete these hands-on labs to reinforce your learning:

| Lab | Topic | Example |
| --- | --- | --- |
| Lab 1: What is Polars and Why Use It with Rust? | Arrow memory model, Polars vs. pandas, project setup | examples/1-polars-intro |
| Lab 2: DataFrames and Series | CsvReader, schema inspection, null counts, slicing | examples/2-dataframes-series |
| Lab 3: Expressions and the Lazy API | col, lit, LazyFrame, query plans, collect | examples/3-lazy-api |
| Lab 4: Data Cleaning | Nulls, casting, text normalization, invalid rows | examples/4-data-cleaning |
| Lab 5: Sorting, Filtering, and Aggregation | filter, sort, group_by, aggregations | examples/5-filtering-aggregation |
| Lab 6: Joining and Reshaping Data | Left joins, melt/unpivot, CSV and Parquet export | examples/6-joins-reshape |
| Lab 7: Bronze — Ingesting Raw Data | clap CLI, CsvReader to SQLite, ingested_at timestamp | examples/7-bronze-ingestion |
| Lab 8: Silver — Cleaning and Standardizing | SQLite to LazyFrame, cleaning pipeline, clean_wines | examples/8-silver-cleaning |
| Lab 9: Gold — Business Logic and Export | min-rating filter, top varieties, CSV and JSON export | examples/9-gold-export |

Course Outline

Module 1: Polars Foundations

Lesson 1.1 — What is Polars and why use it with Rust?

Setting up a Polars project and building a first DataFrame
The Apache Arrow memory model and columnar storage
Polars vs. pandas: performance, lazy evaluation, and type safety
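
The columnar idea behind the Arrow memory model can be sketched in plain Rust with no crates. This is an analogy only, not the Polars or Arrow API: the `WineRow` and `WineColumns` types are hypothetical, and real Arrow arrays use contiguous buffers with a separate validity bitmap rather than `Option` values.

```rust
/// Row-oriented layout: one struct per record (hypothetical type).
struct WineRow {
    name: String,
    rating: Option<f64>,
}

/// Column-oriented layout: one vector per field, all the same length.
#[derive(Default)]
struct WineColumns {
    name: Vec<String>,
    rating: Vec<Option<f64>>,
}

impl WineColumns {
    /// Transpose row records into columns.
    fn from_rows(rows: &[WineRow]) -> Self {
        let mut cols = WineColumns::default();
        for row in rows {
            cols.name.push(row.name.clone());
            cols.rating.push(row.rating);
        }
        cols
    }

    /// Number of rows, taken from the length of one column.
    fn len(&self) -> usize {
        self.name.len()
    }

    /// Number of missing ratings; only the `rating` column is scanned.
    fn null_count(&self) -> usize {
        self.rating.iter().filter(|r| r.is_none()).count()
    }

    /// Mean of the non-null ratings, or `None` when every rating is missing.
    fn mean_rating(&self) -> Option<f64> {
        let values: Vec<f64> = self.rating.iter().flatten().copied().collect();
        if values.is_empty() {
            None
        } else {
            Some(values.iter().sum::<f64>() / values.len() as f64)
        }
    }
}

fn main() {
    let rows = vec![
        WineRow { name: "A".to_string(), rating: Some(90.0) },
        WineRow { name: "B".to_string(), rating: None },
        WineRow { name: "C".to_string(), rating: Some(94.0) },
    ];
    let cols = WineColumns::from_rows(&rows);
    println!(
        "rows = {}, nulls = {}, mean = {:?}",
        cols.len(),
        cols.null_count(),
        cols.mean_rating()
    );
}
```

Aggregating one field touches a single vector here, which is the access pattern a columnar engine is built around.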

Lesson 1.2 — DataFrames and Series

Reading wine-ratings.csv and inspecting the schema
Series, DataFrames, and Polars data types
Column selection, row slicing, and null counts

Lesson 1.3 — Expressions and the Lazy API

col, lit, and chained transforms with LazyFrame
Eager vs. lazy evaluation — when to use each
Collecting a LazyFrame and reading the query plan
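
The difference between building a query and running it can be illustrated with a toy plan recorder in plain Rust. This is a conceptual sketch, not the Polars `LazyFrame` API: `LazyRatings`, `Step`, and their methods are hypothetical names, and a real engine would also optimize the plan before executing it.

```rust
/// One step in a deferred query plan (hypothetical, not a Polars type).
#[derive(Debug)]
enum Step {
    FilterAtLeast(f64),
    AddConstant(f64),
}

/// Holds the source data and the recorded steps; nothing runs until `collect`.
struct LazyRatings {
    source: Vec<f64>,
    plan: Vec<Step>,
}

impl LazyRatings {
    fn new(source: Vec<f64>) -> Self {
        LazyRatings { source, plan: Vec::new() }
    }

    /// Record a filter; the data is not touched yet.
    fn filter_at_least(mut self, min: f64) -> Self {
        self.plan.push(Step::FilterAtLeast(min));
        self
    }

    /// Record a transform; the data is not touched yet.
    fn add(mut self, value: f64) -> Self {
        self.plan.push(Step::AddConstant(value));
        self
    }

    /// Render the recorded plan as text, one step per line.
    fn explain(&self) -> String {
        self.plan
            .iter()
            .map(|step| format!("{:?}", step))
            .collect::<Vec<_>>()
            .join("\n")
    }

    /// Execute every recorded step in order and return the result.
    fn collect(self) -> Vec<f64> {
        let LazyRatings { source, plan } = self;
        let mut data = source;
        for step in plan {
            data = match step {
                Step::FilterAtLeast(min) => data.into_iter().filter(|v| *v >= min).collect(),
                Step::AddConstant(c) => data.into_iter().map(|v| v + c).collect(),
            };
        }
        data
    }
}

fn main() {
    let lazy = LazyRatings::new(vec![85.0, 90.0, 95.0])
        .filter_at_least(90.0)
        .add(1.0);
    println!("plan:\n{}", lazy.explain());
    println!("result: {:?}", lazy.collect());
}
```

An eager version would run each step as soon as it is called; deferring execution is what gives an engine the chance to inspect and reorder the whole plan first.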

Module 2: Cleaning and Transforming Wine Data

Lesson 2.1 — Data Cleaning

Handling nulls, casting ratings to f64, normalizing text
Drop-vs-fill strategies for missing values
Filtering invalid rows and validating the cleaned schema
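
The cleaning rules listed above can be sketched in plain Rust to make the drop-vs-fill decision concrete. This is not the course's Polars code: the `RawWine` and `CleanWine` types, the `"unknown"` fill value, and the 0 to 100 rating range are all assumptions made for illustration.

```rust
/// A raw record as it might arrive from the CSV (hypothetical field set).
struct RawWine {
    variety: Option<String>,
    rating: Option<String>,
}

/// A cleaned record: no nulls, rating cast to f64.
#[derive(Debug, PartialEq)]
struct CleanWine {
    variety: String,
    rating: f64,
}

/// Normalize text by trimming whitespace and lowercasing.
fn normalize(text: &str) -> String {
    text.trim().to_lowercase()
}

/// Drop rows whose rating is missing, unparseable, or out of range;
/// fill a missing or blank variety with "unknown".
fn clean(rows: &[RawWine]) -> Vec<CleanWine> {
    rows.iter()
        .filter_map(|row| {
            // Drop strategy: a row without a usable rating is removed.
            let rating: f64 = row.rating.as_deref()?.trim().parse().ok()?;
            // Assumed validity rule: ratings lie between 0 and 100.
            if !(0.0..=100.0).contains(&rating) {
                return None;
            }
            // Fill strategy: a missing variety gets a placeholder.
            let variety = row
                .variety
                .as_deref()
                .map(normalize)
                .filter(|v| !v.is_empty())
                .unwrap_or_else(|| "unknown".to_string());
            Some(CleanWine { variety, rating })
        })
        .collect()
}

fn main() {
    let rows = vec![
        RawWine { variety: Some(" Pinot Noir ".to_string()), rating: Some("92".to_string()) },
        RawWine { variety: None, rating: Some("88.5".to_string()) },
        RawWine { variety: Some("Merlot".to_string()), rating: None },
    ];
    for wine in clean(&rows) {
        println!("{:?}", wine);
    }
}
```

Dropping is used for the rating because downstream aggregates depend on it, while filling is used for the variety because the row is still useful without it; other datasets may call for the opposite choice.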

Lesson 2.2 — Sorting, Filtering, and Aggregation

Filter by region and rating, group_by variety
Multi-column sort with descending and ascending order
Counting the most-reviewed regions
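
As a rough picture of what a group-by followed by a multi-column sort computes, here is a plain-Rust sketch using only the standard library. It is not Polars code: the `mean_by_variety` function and the tuple layout are hypothetical, and the input is assumed to be already cleaned.

```rust
use std::collections::BTreeMap;

/// Group (variety, rating) pairs by variety and return
/// (variety, mean rating, review count), sorted by mean descending
/// and then by variety ascending to break ties.
fn mean_by_variety(rows: &[(&str, f64)]) -> Vec<(String, f64, usize)> {
    // Accumulate a running sum and count per group.
    let mut groups: BTreeMap<&str, (f64, usize)> = BTreeMap::new();
    for (variety, rating) in rows {
        let entry = groups.entry(*variety).or_insert((0.0, 0));
        entry.0 += *rating;
        entry.1 += 1;
    }

    // Turn each (sum, count) pair into a mean.
    let mut out: Vec<(String, f64, usize)> = groups
        .into_iter()
        .map(|(variety, (sum, n))| (variety.to_string(), sum / n as f64, n))
        .collect();

    // Multi-column sort: first key descending, second key ascending.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}

fn main() {
    let rows = [
        ("Merlot", 88.0),
        ("Pinot Noir", 92.0),
        ("Merlot", 90.0),
        ("Syrah", 92.0),
    ];
    for (variety, mean, count) in mean_by_variety(&rows) {
        println!("{} mean={} reviews={}", variety, mean, count);
    }
}
```

The same accumulate-then-sort shape applies to counting the most-reviewed regions: group on the region instead and sort on the count.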

Lesson 2.3 — Joining and Reshaping Data

Left join, melt/unpivot, and writing CSV and Parquet
Adding a lookup table and enriching the wine DataFrame
Wide-to-long transformations for reporting
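
The semantics of a left join against a lookup table and of a wide-to-long reshape can be sketched in plain Rust. This is illustrative only and does not use the Polars join or unpivot functions: the `left_join_country` and `unpivot` helpers, the region-to-country lookup, and the `price` column are hypothetical.

```rust
use std::collections::HashMap;

/// Left join: keep every (name, region) row and attach the country
/// for its region when the lookup has one, otherwise `None`.
fn left_join_country(
    wines: &[(&str, &str)],
    lookup: &HashMap<&str, &str>,
) -> Vec<(String, String, Option<String>)> {
    wines
        .iter()
        .map(|(name, region)| {
            let country = lookup.get(region).map(|c| c.to_string());
            (name.to_string(), region.to_string(), country)
        })
        .collect()
}

/// Wide-to-long: each (name, price, rating) row becomes two
/// (name, variable, value) rows.
fn unpivot(wide: &[(&str, f64, f64)]) -> Vec<(String, &'static str, f64)> {
    wide.iter()
        .flat_map(|(name, price, rating)| {
            vec![
                (name.to_string(), "price", *price),
                (name.to_string(), "rating", *rating),
            ]
        })
        .collect()
}

fn main() {
    let mut lookup = HashMap::new();
    lookup.insert("Napa", "US");

    let wines = [("A", "Napa"), ("B", "Rioja")];
    for row in left_join_country(&wines, &lookup) {
        println!("{:?}", row);
    }

    for row in unpivot(&[("A", 25.0, 91.0)]) {
        println!("{:?}", row);
    }
}
```

Unmatched rows survive a left join with a null on the right side, which is why enrichment with a lookup table does not shrink the wine data.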

Module 3: Building the Medallion Pipeline

Lesson 3.1 — Bronze: Ingesting Raw Data

clap CLI that loads wine-ratings.csv into SQLite
Schema design for the bronze layer: preserving raw columns
Adding an ingested_at timestamp without business logic
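
The bronze rule of preserving raw columns and adding only load metadata might look like the following standard-library sketch. It is not the lab's implementation, which uses CsvReader, SQLite, and chrono: the `BronzeRow` type and `ingest` function are hypothetical, the timestamp is plain Unix seconds, and the comma split is a simplification that does not handle quoted fields.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// A bronze record: raw fields untouched, plus load metadata.
#[derive(Debug, PartialEq)]
struct BronzeRow {
    raw: Vec<String>,
    ingested_at: u64, // seconds since the Unix epoch
}

/// Current time as Unix seconds (0 if the clock is before the epoch).
fn now_unix_seconds() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Skip the header line, split each remaining line on commas, and stamp it.
/// No trimming, casting, or validation happens at this layer.
/// Note: a naive split is used here; real CSV needs a proper parser.
fn ingest(csv_text: &str, ingested_at: u64) -> Vec<BronzeRow> {
    csv_text
        .lines()
        .skip(1)
        .filter(|line| !line.is_empty())
        .map(|line| BronzeRow {
            raw: line.split(',').map(String::from).collect(),
            ingested_at,
        })
        .collect()
}

fn main() {
    let csv_text = "name,rating\n Foo ,\nBar,91\n";
    for row in ingest(csv_text, now_unix_seconds()) {
        println!("{:?}", row);
    }
}
```

Passing the timestamp in as an argument gives every row in one load the same `ingested_at` value and keeps the function deterministic for testing.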

Lesson 3.2 — Silver: Cleaning and Standardizing

SQLite → LazyFrame cleaning pipeline with a printed summary
Applying reusable cleaning functions from Module 2
Enforcing non-null constraints and deduplication

Lesson 3.3 — Gold: Business Logic and Export

min-rating filter, top-variety aggregations, CSV and JSON export
Configurable thresholds with clap flags
Exporting the gold DataFrame for downstream consumers
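
A gold step that applies a rating threshold and serializes the result could be sketched as follows, using only the standard library. This is an assumption-laden illustration rather than the lab's code, which relies on Polars and serde_json: `GoldRow` and the helper functions are hypothetical, and the hand-written JSON escaping covers only quotes and backslashes.

```rust
/// One gold-layer aggregate row (hypothetical shape).
struct GoldRow {
    variety: String,
    avg_rating: f64,
}

/// Keep rows whose average rating meets the threshold.
fn filter_min_rating(rows: Vec<GoldRow>, min: f64) -> Vec<GoldRow> {
    rows.into_iter().filter(|r| r.avg_rating >= min).collect()
}

/// Quote a CSV field when it contains a comma, quote, or newline.
fn csv_field(s: &str) -> String {
    if s.contains(',') || s.contains('"') || s.contains('\n') {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Serialize rows as CSV with a header line.
fn to_csv(rows: &[GoldRow]) -> String {
    let mut out = String::from("variety,avg_rating\n");
    for r in rows {
        out.push_str(&format!("{},{}\n", csv_field(&r.variety), r.avg_rating));
    }
    out
}

/// Serialize rows as a JSON array of objects.
/// Simplified: escapes only backslashes and double quotes.
fn to_json(rows: &[GoldRow]) -> String {
    let items: Vec<String> = rows
        .iter()
        .map(|r| {
            let escaped = r.variety.replace('\\', "\\\\").replace('"', "\\\"");
            format!("{{\"variety\":\"{}\",\"avg_rating\":{}}}", escaped, r.avg_rating)
        })
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    let rows = vec![
        GoldRow { variety: "Pinot Noir".to_string(), avg_rating: 93.5 },
        GoldRow { variety: "Merlot".to_string(), avg_rating: 88.0 },
    ];
    let gold = filter_min_rating(rows, 90.0);
    print!("{}", to_csv(&gold));
    println!("{}", to_json(&gold));
}
```

In a real pipeline a serialization crate is the safer choice, since it handles control characters and non-finite numbers that this sketch ignores.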

Graded Project: wine-pipeline

Build wine-pipeline — a Rust CLI tool implementing the Bronze–Silver–Gold medallion architecture over the wine ratings dataset:

bronze — read wine-ratings.csv and load all rows as-is into a SQLite raw_wines table, adding an ingested_at timestamp
silver — read raw_wines, apply cleaning rules (drop nulls, normalize text, cast rating to f64), and write a validated clean_wines table with a printed summary of changes
gold — read clean_wines, filter by --min-rating (default 90), compute top grape varieties by average rating, and export results to gold_wines.csv and gold_wines.json
report — print a Markdown summary table of gold-layer aggregates to stdout

A starter implementation is in wine-pipeline/.

# Build
cargo build -p wine-pipeline

# Ingest raw CSV
cargo run -p wine-pipeline -- bronze --input wine-ratings.csv

# Clean and standardize
cargo run -p wine-pipeline -- silver

# Export gold results (wines rated 92+)
cargo run -p wine-pipeline -- gold --min-rating 92

# Print a Markdown report
cargo run -p wine-pipeline -- report
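
The subcommand layout used by the commands above can be modeled as an enum with one variant per stage. The sketch below parses arguments by hand with the standard library as a stand-in for clap, so it should not be read as the starter implementation: the `Command` enum and `parse` function are hypothetical, and only the flags shown in this README are recognized.

```rust
/// One variant per pipeline stage (hypothetical stand-in for a clap derive enum).
#[derive(Debug, PartialEq)]
enum Command {
    Bronze { input: String },
    Silver,
    Gold { min_rating: f64 },
    Report,
}

/// Parse the arguments that follow the program name.
fn parse(args: &[&str]) -> Result<Command, String> {
    let (name, rest) = args
        .split_first()
        .ok_or_else(|| "missing subcommand".to_string())?;
    match *name {
        "bronze" => match rest {
            ["--input", path] => Ok(Command::Bronze { input: path.to_string() }),
            _ => Err("usage: bronze --input <path>".to_string()),
        },
        "silver" if rest.is_empty() => Ok(Command::Silver),
        "report" if rest.is_empty() => Ok(Command::Report),
        "gold" => match rest {
            // The README states a default threshold of 90.
            [] => Ok(Command::Gold { min_rating: 90.0 }),
            ["--min-rating", value] => value
                .parse::<f64>()
                .map(|min_rating| Command::Gold { min_rating })
                .map_err(|_| format!("invalid rating: {}", value)),
            _ => Err("usage: gold [--min-rating <number>]".to_string()),
        },
        other => Err(format!("unknown or malformed subcommand: {}", other)),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let refs: Vec<&str> = args.iter().map(|s| s.as_str()).collect();
    match parse(&refs) {
        Ok(command) => println!("would run: {:?}", command),
        Err(message) => eprintln!("error: {}", message),
    }
}
```

Modeling each stage as a variant lets the compiler check that every subcommand is handled wherever the command is dispatched.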

Local Setup

Install the Rust toolchain:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Clone this repository:

git clone https://github.com/alfredodeza/polars-fundamentals.git
cd polars-fundamentals

Build the entire workspace:
```
cargo build --workspace
```
Run an example:
```
cargo run -p polars-intro
```
Run tests:
```
cargo test --workspace
```

Key Crates

| Crate | Purpose |
| --- | --- |
| polars | DataFrame engine with lazy evaluation and Arrow backend |
| rusqlite | SQLite bindings for the bronze/silver/gold persistence layer |
| clap | CLI argument parsing with derive API |
| serde_json | JSON serialization for gold-layer export |
| chrono | Timestamps for the bronze ingestion layer |
| anyhow | Ergonomic error handling |

Resources

Coursera Courses
