Skip to content

MiguelElGallo/snowdag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

snowdag

Deploy Airflow, dbt, and Snowpark into Snowpark Container Services.

You build one Airflow service in Snowflake. It starts one DAG, builds a small dbt project, and writes a Snowpark summary row you can verify from SQL.

That service runs the snowdag_dbt_snowpark DAG:

start -> snowdbt_build -> snowpark_summary -> finish

The DAG builds the vendored snowdbt dbt project, then runs a Snowpark Python summary task over the dbt marts.

Note

This README is the fast path. If this is your first run, keep Deploy Step By Step open next to it.

What You Get

After a successful run, you have:

  • A public Airflow URL served by Snowpark Container Services.
  • One unpaused snowdag_dbt_snowpark DAG.
  • dbt marts in SNOWDBT_DEV.ANALYTICS.
  • A Snowpark summary row in SNOWDAG_APP_DB.OPERATIONS.
  • Local screenshot evidence in .snowdag/screenshots/.

The latest DAG run should show successful tasks:

Airflow Grid view showing a successful snowdag run

Screenshots are useful for reviewers, but make verify is the data proof.

The Shape

The deployment has a small shape on purpose.

flowchart LR
    CLI["Snowflake CLI<br/>and Docker"] --> Repo["Snowflake image repository"]
    Repo --> Service["Airflow service<br/>in Snowpark Container Services"]
    Service --> DAG["DAG<br/>snowdag_dbt_snowpark"]
    DAG --> DBT["dbt project<br/>snowdbt"]
    DBT --> Marts["Snowflake marts<br/>SNOWDBT_DEV.ANALYTICS"]
    Marts --> Summary["Snowpark summary task"]
    Summary --> Ops["SNOWDAG_APP_DB.OPERATIONS<br/>SNOWPARK_SUMMARIES"]
Loading

The first-run path is just a sequence of Make targets:

flowchart LR
    Configure["configure .env"] --> Check["make check"]
    Check --> Bootstrap["make bootstrap"]
    Bootstrap --> Build["make build"]
    Build --> Push["make push"]
    Push --> Deploy["make deploy"]
    Deploy --> Verify["make verify"]
    Verify --> Airflow["make service-info<br/>make screenshots"]
Loading

The service contains:

  • Airflow and its scheduler/webserver.
  • Astronomer Cosmos for dbt task rendering.
  • The vendored include/dbt/snowdbt project.
  • The DAG in include/dags/snowdag_dbt_snowpark.py.
  • Snowpark Python for the final summary task.

Requirements

You should already have:

  • Snowflake CLI.
  • Docker with Buildx.
  • make.
  • A Snowflake account with Snowpark Container Services enabled.
  • A Snowflake role that can create warehouses, databases, image repositories, compute pools, services, service endpoints, and roles.

This repo creates the demo warehouse, compute pool, image repository, service, roles, and SNOWDBT_* databases for you.

Check that the Snowflake CLI can connect:

snow connection test --connection "<your-snowflake-cli-connection>"

Configure

Create a local environment file:

cp .env.example .env

Set your Snowflake CLI connection name:

SNOWFLAKE_CONNECTION=your-snowflake-cli-connection

That connection is used by every Make target.

You can also set image values before building:

SNOWDAG_IMAGE_NAME=snowdag-airflow
SNOWDAG_IMAGE_TAG=dev

Tip

Keep the first run simple. Change image names, tags, or registry token settings after make verify passes once.

Check

Run:

make check

This target checks:

  • Required local commands.
  • The Snowflake connection.
  • Docker availability.

Bootstrap Snowflake

Run:

make bootstrap

This creates the SnowDAG runtime objects:

  • SNOWDAG_APP_DB
  • SNOWDAG_APP_DB.IMAGES.AIRFLOW_REPOSITORY
  • SNOWDAG_APP_DB.SERVICES
  • SNOWDAG_APP_DB.OPERATIONS
  • SNOWDAG_POOL
  • SNOWDAG_WH
  • SNOWDAG_SERVICE_ROLE
  • SNOWDAG_VIEWER

It also creates the dbt target databases:

  • SNOWDBT_DEV
  • SNOWDBT_TEST
  • SNOWDBT_PROD

Build

Build the Airflow image:

make build

The image is built for linux/amd64, which is required by Snowpark Container Services.

The image includes Airflow, dbt, Cosmos, Snowpark Python, the DAG, and the vendored snowdbt project.

Push

Push the image to Snowflake:

make push

The script:

  • Resolves the Snowflake image repository URL.
  • Writes Docker registry auth from your Snowflake CLI connection.
  • Tags the local image.
  • Pushes it to SNOWDAG_APP_DB.IMAGES.AIRFLOW_REPOSITORY.

If your environment needs a static registry credential, set:

SNOWFLAKE_REGISTRY_PAT_FILE=programmatic_access_token.txt

Deploy

Important

make deploy starts a public Snowpark Container Services Airflow service with one running instance. The warehouse and compute pool can auto-resume while the demo exists. Run make destroy when you are done.

Create the Airflow service:

make deploy

This drops any old SNOWDAG_APP_DB.SERVICES.AIRFLOW service and creates a fresh one with the pushed image.

The deploy script writes generated Airflow admin credentials to .snowdag/airflow_credentials.env.

When the container starts, it initializes Airflow, unpauses the DAG, and triggers one run.

That's it. You do not need to manually trigger the first DAG run.

What Runs

The deployment flow is:

sequenceDiagram
    participant User
    participant Make
    participant Snowflake
    participant SPCS as Snowpark Container Services
    participant Airflow
    participant dbt
    participant Snowpark

    User->>Make: make bootstrap
    Make->>Snowflake: create roles, warehouse, databases, pool, repository
    User->>Make: make build
    Make->>Make: build linux/amd64 Airflow image
    User->>Make: make push
    Make->>Snowflake: push image to image repository
    User->>Make: make deploy
    Make->>SPCS: create AIRFLOW service
    SPCS->>Airflow: start scheduler and webserver
    Airflow->>dbt: run snowdbt build and tests
    dbt->>Snowflake: create marts in SNOWDBT_DEV.ANALYTICS
    Airflow->>Snowpark: run snowpark_summary task
    Snowpark->>Snowflake: write summary row
    User->>Make: make verify
    Make->>Snowflake: check service, marts, and summary row
Loading

The snowdbt_build task group builds from Snowflake sample TPC-H data.

It creates marts such as:

  • SNOWDBT_DEV.ANALYTICS.FCT_ORDERS
  • SNOWDBT_DEV.ANALYTICS.DIM_CUSTOMER
  • SNOWDBT_DEV.ANALYTICS.FCT_ORDER_ITEMS

Then snowpark_summary reads those marts and writes one success row to SNOWDAG_APP_DB.OPERATIONS.SNOWPARK_SUMMARIES.

Verify

Run:

make verify

Verification polls until the startup DAG run finishes its visible Snowflake side effects.

By default it tries for about 20 minutes. For slower accounts, increase SNOWDAG_VERIFY_ATTEMPTS or SNOWDAG_VERIFY_SLEEP_SECONDS:

SNOWDAG_VERIFY_ATTEMPTS=90 SNOWDAG_VERIFY_SLEEP_SECONDS=30 make verify

It checks:

  • The compute pool.
  • The Airflow service.
  • The service endpoint.
  • The pushed image.
  • The dbt marts in SNOWDBT_DEV.ANALYTICS.
  • A Snowpark summary row written after the latest deploy.

If it passes, the full Airflow -> dbt -> Snowpark path worked.

The important rows look like this:

object_name                              row_count
SNOWDBT_DEV.ANALYTICS.FCT_ORDERS          1500000
SNOWDBT_DEV.ANALYTICS.DIM_CUSTOMER          99996
SNOWDBT_DEV.ANALYTICS.FCT_ORDER_ITEMS     6001215

successful_snowpark_summary_rows_after_deploy  order_count  customer_count  order_item_count
1                                                  1500000           99996          6001215

Open Airflow

Print the service URLs and login file:

make service-info

This works for the Airflow service now. It also lists any future public services or plugin endpoints created in SNOWDAG_APP_DB.SERVICES.

If this is your first screenshot run, install Playwright locally:

npm install --no-save playwright
npx playwright install chromium

Then capture useful DAG screenshots:

make screenshots

Screenshots are written to .snowdag/screenshots/:

  • airflow-dags.png
  • airflow-snowdag-grid.png
  • airflow-snowdag-graph.png

Use screenshots to show the Airflow state. Use make verify output to prove the Snowflake data path.

Destroy

Warning

make destroy drops the Snowflake objects created for this repo.

Drop the SnowDAG objects:

make destroy

Run the full cleanup and deployment loop:

make recreate

make recreate runs destroy -> bootstrap -> build -> push -> deploy -> verify. It drops the demo databases and runtime objects before creating them again.

Important

The Airflow service runs one Snowpark Container Services instance while the demo exists. Run make destroy when you are done to stop the service, compute pool, warehouse, app database, roles, and SNOWDBT_* demo databases.

Check the Docs

Run:

make docs-check

This lightweight check confirms the required documentation files are present.

Make Targets

Use these targets from the repo root:

Target What it does Safe to rerun? Spend or impact
make check Checks local tools, Snowflake CLI access, and Docker. Yes No objects created.
make bootstrap Creates or repairs Snowflake roles, databases, warehouse, pool, image repository, and operations tables. Yes Creates demo objects.
make build Builds the Airflow image for linux/amd64. Yes Local Docker work.
make push Pushes the image to the Snowflake image repository. Yes Writes a new image tag.
make deploy Creates the Airflow service in Snowpark Container Services. Yes Starts one service instance.
make verify Waits for dbt marts and the Snowpark summary row. Yes Uses the warehouse.
make service-info Prints service endpoint URLs and Airflow login details. Yes Metadata only.
make screenshots Captures Airflow DAG screenshots into .snowdag/screenshots/. Yes Opens the Airflow endpoint.
make destroy Drops objects created by this repo. No Destructive cleanup.
make recreate Runs destroy, bootstrap, build, push, deploy, and verify. No Destructive rebuild.
make docs-check Checks that the expected documentation files exist. Yes Local docs check.

More Detail

About

Uses Airflow and Astro to run Snowflake Native DBT jobs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors