paiml/polars-fundamentals


Polars Fundamentals: Data Engineering with Rust

This repository contains hands-on examples and labs for using Polars with Rust for data engineering tasks. It accompanies a Coursera course from Pragmatic AI Labs.

Contents

This repository has example projects in ./examples and hands-on labs in ./labs. Make sure you have the Rust toolchain installed.

This repository is Codespaces ready and set as a template repository. You can open it directly in a GitHub Codespace — Rust, rust-analyzer, and all extensions are pre-installed.

Labs

Complete these hands-on labs to reinforce your learning:

| Lab | Topic | Example |
|---|---|---|
| Lab 1: What is Polars and Why Use It with Rust? | Arrow memory model, Polars vs. pandas, project setup | examples/1-polars-intro |
| Lab 2: DataFrames and Series | CsvReader, schema inspection, null counts, slicing | examples/2-dataframes-series |
| Lab 3: Expressions and the Lazy API | col, lit, LazyFrame, query plans, collect | examples/3-lazy-api |
| Lab 4: Data Cleaning | Nulls, casting, text normalization, invalid rows | examples/4-data-cleaning |
| Lab 5: Sorting, Filtering, and Aggregation | filter, sort, group_by, aggregations | examples/5-filtering-aggregation |
| Lab 6: Joining and Reshaping Data | Left joins, melt/unpivot, CSV and Parquet export | examples/6-joins-reshape |
| Lab 7: Bronze - Ingesting Raw Data | clap CLI, CsvReader to SQLite, ingested_at timestamp | examples/7-bronze-ingestion |
| Lab 8: Silver - Cleaning and Standardizing | SQLite to LazyFrame, cleaning pipeline, clean_wines | examples/8-silver-cleaning |
| Lab 9: Gold - Business Logic and Export | min-rating filter, top varieties, CSV and JSON export | examples/9-gold-export |

Course Outline

Module 1: Polars Foundations

Lesson 1.1 — What is Polars and why use it with Rust?

Lesson 1.2 — DataFrames and Series

Lesson 1.3 — Expressions and the Lazy API

Module 2: Cleaning and Transforming Wine Data

Lesson 2.1 — Data Cleaning

Lesson 2.2 — Sorting, Filtering, and Aggregation

Lesson 2.3 — Joining and Reshaping Data

Module 3: Building the Medallion Pipeline

Lesson 3.1 — Bronze: Ingesting Raw Data

Lesson 3.2 — Silver: Cleaning and Standardizing

Lesson 3.3 — Gold: Business Logic and Export

Graded Project: wine-pipeline

Build wine-pipeline — a Rust CLI tool implementing the Bronze–Silver–Gold medallion architecture over the wine ratings dataset:

  • bronze — read wine-ratings.csv and load all rows as-is into a SQLite raw_wines table, adding an ingested_at timestamp
  • silver — read raw_wines, apply cleaning rules (drop nulls, normalize text, cast rating to f64), and write a validated clean_wines table with a printed summary of changes
  • gold — read clean_wines, filter by --min-rating (default 90), compute top grape varieties by average rating, and export results to gold_wines.csv and gold_wines.json
  • report — print a Markdown summary table of gold-layer aggregates to stdout
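The silver rules listed above (drop nulls, normalize text, cast rating to f64) can be sketched in plain Rust to make the intended behavior concrete. This is a minimal illustration using only the standard library, not the Polars-based code from the starter project. The struct names, the column names (name, variety, rating), and the choice to normalize text by trimming and collapsing whitespace are all assumptions made for the example.

```rust
/// One row as it might sit in raw_wines. Column names are assumed for this
/// sketch; every value is still optional text at this stage.
#[derive(Debug, Clone, PartialEq)]
pub struct RawWine {
    pub name: Option<String>,
    pub variety: Option<String>,
    pub rating: Option<String>,
}

/// A validated row, as it might be written to clean_wines.
#[derive(Debug, Clone, PartialEq)]
pub struct CleanWine {
    pub name: String,
    pub variety: String,
    pub rating: f64,
}

/// Why a raw row was rejected.
#[derive(Debug, PartialEq)]
pub enum DropReason {
    MissingValue,
    BadRating,
}

/// Counts for the printed summary of changes (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
pub struct CleanSummary {
    pub kept: usize,
    pub dropped_null: usize,
    pub dropped_bad_rating: usize,
}

/// Trim the value and collapse runs of whitespace into single spaces.
/// Returns None when nothing is left, so blank strings count as nulls.
fn normalize_text(value: &str) -> Option<String> {
    let joined = value.split_whitespace().collect::<Vec<_>>().join(" ");
    if joined.is_empty() {
        None
    } else {
        Some(joined)
    }
}

/// Apply the three rules to one row: drop nulls, normalize text, cast rating.
pub fn clean_row(raw: &RawWine) -> Result<CleanWine, DropReason> {
    let name = raw
        .name
        .as_deref()
        .and_then(normalize_text)
        .ok_or(DropReason::MissingValue)?;
    let variety = raw
        .variety
        .as_deref()
        .and_then(normalize_text)
        .ok_or(DropReason::MissingValue)?;
    let rating_text = raw
        .rating
        .as_deref()
        .and_then(normalize_text)
        .ok_or(DropReason::MissingValue)?;
    let rating: f64 = rating_text.parse().map_err(|_| DropReason::BadRating)?;
    // "nan" and "inf" parse successfully as f64, so reject them explicitly.
    if !rating.is_finite() {
        return Err(DropReason::BadRating);
    }
    Ok(CleanWine { name, variety, rating })
}

/// Clean every row and tally what was kept and what was dropped.
pub fn clean_all(rows: &[RawWine]) -> (Vec<CleanWine>, CleanSummary) {
    let mut clean = Vec::new();
    let mut summary = CleanSummary::default();
    for raw in rows {
        match clean_row(raw) {
            Ok(wine) => {
                summary.kept += 1;
                clean.push(wine);
            }
            Err(DropReason::MissingValue) => summary.dropped_null += 1,
            Err(DropReason::BadRating) => summary.dropped_bad_rating += 1,
        }
    }
    (clean, summary)
}
```

Calling clean_all on the rows read from raw_wines returns the validated rows plus the counts needed for the printed summary. In the course itself the same rules would more likely be expressed as Polars expressions on a LazyFrame.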

A starter implementation is in wine-pipeline/.
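As a rough picture of what the gold step computes, the sketch below filters rows by a minimum rating and then ranks grape varieties by average rating. It uses only the Rust standard library rather than Polars; the function name top_varieties, the VarietyStat struct, the limit parameter, and the alphabetical tie-break are hypothetical choices for illustration, not taken from the starter code.

```rust
use std::collections::HashMap;

/// Aggregate for one grape variety (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct VarietyStat {
    pub variety: String,
    pub wines: usize,
    pub avg_rating: f64,
}

/// Keep wines rated at least `min_rating`, average the ratings per variety,
/// and return the `limit` best varieties, highest average first.
pub fn top_varieties(wines: &[(&str, f64)], min_rating: f64, limit: usize) -> Vec<VarietyStat> {
    // variety -> (number of wines, sum of ratings)
    let mut totals: HashMap<&str, (usize, f64)> = HashMap::new();
    for &(variety, rating) in wines {
        if rating >= min_rating {
            let entry = totals.entry(variety).or_insert((0, 0.0));
            entry.0 += 1;
            entry.1 += rating;
        }
    }

    let mut stats: Vec<VarietyStat> = totals
        .into_iter()
        .map(|(variety, (count, sum))| VarietyStat {
            variety: variety.to_string(),
            wines: count,
            avg_rating: sum / count as f64,
        })
        .collect();

    // Sort by average rating, descending; break ties by variety name so the
    // output order does not depend on HashMap iteration order.
    stats.sort_by(|a, b| {
        b.avg_rating
            .total_cmp(&a.avg_rating)
            .then_with(|| a.variety.cmp(&b.variety))
    });
    stats.truncate(limit);
    stats
}
```

With the default threshold this would be called as top_varieties(&rows, 90.0, 10). Note that the filter runs before the average is taken, so a variety's average reflects only the wines that passed the threshold.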

# Build
cargo build -p wine-pipeline

# Ingest raw CSV
cargo run -p wine-pipeline -- bronze --input wine-ratings.csv

# Clean and standardize
cargo run -p wine-pipeline -- silver

# Export gold results (wines rated 92+)
cargo run -p wine-pipeline -- gold --min-rating 92

# Print a Markdown report
cargo run -p wine-pipeline -- report
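
The report subcommand prints a Markdown summary table to stdout. One possible way to render such a table is sketched below in plain Rust; the column headings, the one-decimal rating format, and the markdown_table function name are assumptions, since the README does not specify the report layout.

```rust
/// Render (variety, wine count, average rating) rows as a Markdown table.
/// Column headings and number formatting are illustrative assumptions.
pub fn markdown_table(rows: &[(&str, usize, f64)]) -> String {
    let mut out = String::from("| Variety | Wines | Avg rating |\n|---|---:|---:|\n");
    for &(variety, wines, avg) in rows {
        // Escape pipes so a variety name cannot break the table layout.
        let safe = variety.replace('|', "\\|");
        out.push_str(&format!("| {} | {} | {:.1} |\n", safe, wines, avg));
    }
    out
}
```

Printing the result with print!("{}", markdown_table(&rows)) sends the table to stdout, where it can be redirected into a .md file.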

Local Setup

  1. Install the Rust toolchain:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  2. Clone this repository:

    git clone https://github.com/alfredodeza/polars-fundamentals.git
    cd polars-fundamentals
  3. Build the entire workspace:

    cargo build --workspace
  4. Run an example:

    cargo run -p polars-intro
  5. Run tests:

    cargo test --workspace

Key Crates

| Crate | Purpose |
|---|---|
| polars | DataFrame engine with lazy evaluation and Arrow backend |
| rusqlite | SQLite bindings for the bronze/silver/gold persistence layer |
| clap | CLI argument parsing with derive API |
| serde_json | JSON serialization for gold-layer export |
| chrono | Timestamps for the bronze ingestion layer |
| anyhow | Ergonomic error handling |

Resources

Coursera Courses
