Releases: Edwardvaneechoud/Flowfile

Release v0.8.1

30 Mar 19:01
e5ca96c

Highlights
Delta Lake Catalog Storage
Flowfile now uses Delta Lake as the storage layer for the catalog, replacing static Parquet files with versioned, ACID-compliant tables. You can query historical versions with time travel, view transaction logs, and run atomic merges (upserts) without risk of data corruption.

Flow Parameters
Introducing flow-level variables that can be referenced across node settings using ${parameter_name} syntax. Parameters can be managed via a new Designer panel or overridden at runtime through the CLI using the --param flag.


Delta Lake Storage

The catalog storage layer has moved from standalone Parquet files to Delta Lake tables. This change enables managed storage with advanced data operations:

  • Time Travel: Access previous versions of any catalog table directly from the UI.
  • Advanced Write Modes: The Catalog Writer now supports append, upsert, update, and delete using configurable key columns.
  • Schema Evolution: Appending data now allows for automatic schema merging when new columns are detected.
  • Metadata Offloading: Row counts, schema detection, and size calculations are now offloaded to the worker process to keep the UI responsive.
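The write modes above can be pictured as a key-based merge. The sketch below is a toy model of the append/upsert/update/delete semantics over configurable key columns, not Flowfile's actual Delta Lake implementation; the function and variable names are hypothetical.

```python
# Toy model of the Catalog Writer's key-based write modes. The real
# implementation performs these merges against Delta Lake tables.

def apply_write(table: list[dict], incoming: list[dict], mode: str, keys: list[str]) -> list[dict]:
    def key_of(row):
        return tuple(row[k] for k in keys)

    existing = {key_of(r): r for r in table}
    if mode == "append":
        return table + incoming
    if mode == "upsert":        # update matching keys, insert the rest
        for row in incoming:
            existing[key_of(row)] = row
        return list(existing.values())
    if mode == "update":        # only touch rows whose key already exists
        for row in incoming:
            if key_of(row) in existing:
                existing[key_of(row)] = row
        return list(existing.values())
    if mode == "delete":        # remove rows whose key appears in incoming
        drop = {key_of(r) for r in incoming}
        return [r for r in table if key_of(r) not in drop]
    raise ValueError(f"unknown mode: {mode}")

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
print(apply_write(table, [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}], "upsert", ["id"]))
```

With Delta Lake underneath, each of these merges is atomic: a failed write never leaves the table half-updated.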

Flow Parameters

Flows can now be parameterized for dynamic execution. By defining parameters at the flow level, you can inject values into file paths, SQL queries, or formulas.

  • Dynamic Resolution: Use ${} syntax in node settings (e.g., a Read node path: C:/data/${current_month}/report.csv).
  • CLI Overrides: Pass parameters during execution via the command line:
    flowfile run flow my_flow.json --param input_dir=/production/data --param threshold=50
  • Designer Panel: A new draggable panel in the Designer lets you define parameters, set defaults, and add descriptions as you build.
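The resolution step can be sketched as a recursive walk that substitutes ${name} placeholders in strings nested inside node settings. This is an illustrative model (the actual resolver operates on Pydantic models at runtime, and these names are hypothetical):

```python
import re

# Recursively substitute ${name} placeholders in strings nested inside
# dicts and lists, mimicking flow-parameter resolution in node settings.

_PATTERN = re.compile(r"\$\{(\w+)\}")

def resolve(value, params: dict):
    if isinstance(value, str):
        return _PATTERN.sub(lambda m: str(params.get(m.group(1), m.group(0))), value)
    if isinstance(value, dict):
        return {k: resolve(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params) for v in value]
    return value  # numbers, booleans, None pass through unchanged

settings = {"path": "C:/data/${current_month}/report.csv", "threshold": "${threshold}"}
print(resolve(settings, {"current_month": "2025-03", "threshold": 50}))
```

Note that unknown placeholders are left intact rather than erased, so a missing parameter is visible in the resolved settings instead of silently producing an empty path.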

Changes

Core

  • Bidirectional Formula Translation: Implemented _ff_repr tracking in the FlowFrame API; Python expressions now automatically translate to visual Formula nodes when saved.
  • Idiomatic Code Export: The Python code generator now converts visual formulas into native Polars expressions (pl.col(...)) instead of proprietary helper functions.
  • Expanded Math Formulas: Added support for abs, round, ceil, and floor in the formula engine.
  • Delta Utility Layer: Centralized format detection and I/O logic for Delta and legacy Parquet tables.
  • Parameter Resolver: Recursive resolution engine for substituting ${} patterns in Pydantic models at runtime.
  • Catalog Migration: Added migrate_parquet_to_delta.py utility to convert existing catalog storage to the new format.
  • Pivot Optimization: Added zero-fill logic to sum, count, and len aggregations to maintain consistency with native Polars behavior.
  • Expanded Scope: Added selectors (cs) and base64 to the execution scope for Polars Code nodes.
  • Subprocess Execution: Enhanced support for "frozen" environments (PyInstaller) when spawning flow runs via CLI.
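The idiomatic code export can be pictured as a small tree-to-expression translation: a visual formula becomes a native pl.col(...) expression instead of a helper-function call. A toy sketch — the AST shape here is invented for illustration and is not Flowfile's internal representation:

```python
# Translate a tiny, invented formula AST into a native Polars expression
# string, illustrating the idea behind idiomatic code export.

OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

def to_polars_expr(node: dict) -> str:
    if "col" in node:
        return f'pl.col("{node["col"]}")'
    if "lit" in node:
        return f'pl.lit({node["lit"]!r})'
    if "call" in node:                      # math helpers: abs, round, ceil, floor
        return f"{to_polars_expr(node['arg'])}.{node['call']}()"
    left, right = to_polars_expr(node["left"]), to_polars_expr(node["right"])
    return f"({left} {OPS[node['op']]} {right})"

ast = {"call": "abs", "arg": {"op": "sub", "left": {"col": "price"}, "right": {"lit": 100}}}
print(to_polars_expr(ast))
# -> (pl.col("price") - pl.lit(100)).abs()
```

The bidirectional part works the other way as well: tracking the representation of FlowFrame expressions lets saved Python code reappear as visual Formula nodes.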

UI

  • Parameters Panel: New draggable UI for global parameter management.
  • Version History UI: Added a historical version list and a "Viewing Historical Version" banner to the Catalog.
  • Read Node Path Input: Added a manual path entry field to the Read node to support parameter injection.
  • Merge Configuration: Added key column selection and mode descriptions to the Catalog Writer.
  • Custom Cursors: Implemented high-contrast SVG cursors for the Canvas to improve visibility on Windows systems.

Infrastructure

  • Dependency Updates:
    • Polars bumped to < 1.40.
    • polars-expr-transformer updated to >= 0.5.3 for bidirectional mapping.
    • pl-fuzzy-frame-match updated to >= 0.6.0 for hybrid matching and fuzzy filter strategies.
  • Database Schema: Added storage_format column to the catalog_tables table (defaults to delta).
  • Data Types: Added support for Int128, UInt128, and Float16 across the engine and UI.

Fixes

  • Pivot Alignment: Fixed missing combinations in pivot nodes to correctly fill with zeros instead of nulls for specific aggregations.
  • Path Validation: Added strict validation for catalog table paths to prevent directory traversal.
  • Unpivot Styling: Fixed background color issues for unpivot text selection in dark mode.
  • Duration Handling: Corrected duration calculations for runs involving a mix of naive and timezone-aware datetimes.
  • Node Resets: Fixed a bug where parameter substitution would trigger spurious node resets and clear cached analysis data.

What's Changed

Full Changelog: v0.8.0...v0.8.1

Release v0.8.0

25 Mar 19:58
f29f313

Highlights

Flowfile now runs flows on a schedule. Set an interval, trigger on table updates, or run manually from the catalog — no external orchestrator needed. The scheduler runs embedded, standalone, or in Docker. This release also adds run management: trigger flows on demand, cancel running ones, and inspect execution logs.

Flow Scheduling

Run flows on a timer or trigger them when catalog tables update. Three schedule types: interval (every N minutes), table trigger (fires when a specific table is refreshed), and table set trigger (fires when all tables in a group have been refreshed). One active run per flow — if a flow is already running, new triggers are skipped until it finishes.
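The one-active-run rule amounts to a simple guard before spawning a run. A minimal sketch (names are hypothetical, not Flowfile's API):

```python
# One active run per flow: a trigger is skipped if that flow already has
# a run in progress. Illustrative only.

active_runs: set[int] = set()

def try_trigger(flow_id: int) -> bool:
    if flow_id in active_runs:
        return False            # skip: flow is already running
    active_runs.add(flow_id)    # the scheduler would spawn the run here
    return True

def finish(flow_id: int) -> None:
    active_runs.discard(flow_id)

assert try_trigger(7) is True
assert try_trigger(7) is False   # second trigger skipped while running
finish(7)
assert try_trigger(7) is True
```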

Schedules are managed from a new Schedules tab in the Catalog, or directly from a flow's detail panel. Create, enable/disable, run now, or delete — all inline.


Scheduler Modes

The scheduler runs wherever Flowfile runs:

  • Embedded (desktop / pip install flowfile) — start and stop from the Schedules tab
  • Standalone — flowfile run flowfile_scheduler as an independent background service
  • Docker — set FLOWFILE_SCHEDULER_ENABLED=true in your compose file

Only one scheduler instance runs at a time, enforced via an advisory database lock with heartbeat. If a scheduler dies, another can take over after 90 seconds.
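A sketch of heartbeat-based lock takeover, assuming a single lock record with an owner and a last-heartbeat timestamp. The 90-second threshold comes from the notes above; everything else here is invented for illustration (the real lock lives in a database row):

```python
import time

STALE_AFTER = 90.0  # seconds without a heartbeat before takeover is allowed

# Shared lock record: (owner, last heartbeat). In Flowfile this is an
# advisory database lock; a dict stands in for it here.
lock = {"owner": None, "heartbeat": 0.0}

def try_acquire(me: str, now: float) -> bool:
    free = lock["owner"] is None
    stale = now - lock["heartbeat"] > STALE_AFTER
    if free or stale or lock["owner"] == me:
        lock["owner"], lock["heartbeat"] = me, now  # acquire and heartbeat
        return True
    return False

t = time.time()
assert try_acquire("scheduler-a", t)
assert not try_acquire("scheduler-b", t + 10)   # a is alive, b must wait
assert try_acquire("scheduler-b", t + 120)      # a went silent, b takes over
```

A live scheduler re-acquires periodically, which refreshes the heartbeat; a crashed one simply stops, and the stale check lets a standby instance take over.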

Run Flows from the Catalog

Trigger any registered flow directly from its detail panel — no schedule required. The Run Flow button spawns a subprocess, tracks it in the run history, and writes execution logs to ~/.flowfile/logs/. Cancel a running flow at any time (sends SIGTERM to the process).


Table Trigger Architecture

Table triggers use a dual-path mechanism. The push path fires immediately when a Catalog Writer overwrites a table — no waiting for the next poll tick. The poll path (every ~30 seconds) acts as a safety net in case the push path fails. Double-firing is prevented by active run checks and timestamp comparison.
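The double-fire guard can be sketched as a comparison between the table's last-refresh time and the trigger's last-fired time, plus the active-run check. Names here are invented; only the mechanism is from the notes:

```python
# Dual-path trigger: both the push path and the poll path call maybe_fire.
# Firing is skipped when a run is active or when the table has not been
# refreshed since the last firing, so the two paths cannot double-fire.

def maybe_fire(trigger: dict, table_refreshed_at: float, run_active: bool) -> bool:
    if run_active:
        return False
    if table_refreshed_at <= trigger["last_fired_at"]:
        return False            # already fired for this refresh
    trigger["last_fired_at"] = table_refreshed_at
    return True

trig = {"last_fired_at": 0.0}
assert maybe_fire(trig, 100.0, run_active=False) is True   # push path fires
assert maybe_fire(trig, 100.0, run_active=False) is False  # poll path sees same refresh: skipped
assert maybe_fire(trig, 200.0, run_active=False) is True   # new refresh fires again
```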

Run Logs

Scheduled, manual, and on-demand runs write output to log files. Click View log in a run's detail panel to see the full execution output.


Changes

Core

  • Full scheduling system with interval, table trigger, and table set trigger types
  • Scheduler engine with advisory lock, heartbeat, and stale-takeover logic
  • flowfile_scheduler as a new standalone package — lightweight, no flowfile_core dependency
  • flowfile run flowfile_scheduler CLI command for standalone mode
  • FLOWFILE_SCHEDULER_ENABLED environment variable for Docker auto-start
  • Run Flow from catalog (manual trigger without schedule)
  • Cancel Run support — sends SIGTERM, marks run as failed
  • Active runs tracking with live polling
  • In-place table overwrite — Catalog Writer now preserves the table's ID and all foreign key references (schedules, read links, favorites) instead of delete-and-recreate
  • Push-driven table trigger firing on overwrite_table_data
  • Paginated run history with status counts (total, success, failed, running)
  • Run types: in_designer_run, scheduled, manual, on_demand
  • Run log file access from API
  • get_database_url() centralized in shared storage config
  • Shared lightweight SQLAlchemy models for cross-package database access
  • Shared subprocess utilities for spawning flow runs
  • Flow handler rekey for Save As operations
  • Local execution mode — CLI runs skip worker offloading, write parquet directly, collect analysis data in-memory

UI

  • Schedules tab in the Catalog with overview, summary cards, and schedule list
  • Create Schedule modal with flow selector and type configuration
  • Schedule detail panel with run history filtered by schedule
  • Run overview panel with status breakdown
  • Run Flow and Cancel Run buttons in flow detail panel
  • Run log viewer in run detail panel
  • Save As flow identity switching
  • Fix background color for unpivot text in dark mode

Infrastructure

  • Docker image now bundles flowfile_scheduler, flowfile, and flowfile_frame
  • flowfile_scheduler added to pyproject.toml packages and scripts
  • Scheduler lock table for single-instance enforcement

Fixes

  • Catalog Writer table overwrite now preserves table ID and foreign keys instead of deleting and recreating
  • Local (CLI) execution no longer attempts worker offloading for record counts and analysis data
  • Duration calculation handles naive vs timezone-aware datetime correctly
  • Docker kernel E2E test timing — added delay before polling to avoid false-positive early completion
  • Lazy module import in CLI for faster startup

What's Changed

Release v0.7.3

19 Mar 07:02
e2b1201

What's Changed

Full Changelog: v0.7.2...v0.7.3

Release v0.7.2

16 Mar 21:13
a4c97db


What's Changed

Full Changelog: v0.7.1...v0.7.2

Release v0.7.1

16 Mar 20:07
b4e723f


Full Changelog: https://github.com/Edwardvaneechoud/Flowfile/compare/v0.7.0..v0.7.1

Release v0.7.0

15 Mar 20:46
235a21c


Flowfile v0.7.0 Release Notes

Highlights

Docker-Based Kernel Execution

Run your custom Python code in isolated Docker containers with your own packages and resource limits. Install any pip package you need — scikit-learn, transformers, custom internal libraries — and use them directly in your flow. Kernels are managed through a dedicated UI with live status monitoring, memory tracking, and auto-restart on failure.


Jupyter-Style Code Editor

Python Script nodes feature a full notebook editor with cell-by-cell execution, CodeMirror 6 with Python syntax highlighting, and flowfile API autocompletions. Variables persist across cells, outputs render inline, and the editor expands to fullscreen for focused work.


Rich Display Outputs

flowfile.display() renders matplotlib figures, plotly charts, PIL images, and HTML directly in the notebook. plt.show() is auto-captured — no explicit display call needed.


Flow Catalog

Unity Catalog-style namespace hierarchy (catalog > schema > flow) for organizing pipelines. Register flows with run history, version snapshots, and node-level results. Favorite flows, track table lineage, and open any historical version directly in the designer.


Artifact System

Publish, consume, and track Python objects across nodes within a flow. The system is DAG-aware — only upstream artifacts are visible. Artifacts persist through container restarts and can be shared globally via the catalog.

Named Inputs & Outputs

Python Script nodes support named connections with visual edge labels. read_input("orders") reads a specific named input, publish_output(df, name="cleaned") writes to a named output. Up to 10 inputs per node (#344).
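A toy model of named inputs/outputs with DAG-aware visibility: a node may only read artifacts published by its upstream nodes. Only the read_input/publish_output names come from the release notes; the registry and DAG shape below are invented for illustration.

```python
# Minimal sketch: artifacts are keyed by (node, name), and a node may only
# read outputs from nodes upstream of it in the DAG.

outputs: dict[tuple[str, str], object] = {}
upstream = {"clean": {"load"}, "report": {"load", "clean"}}  # node -> upstream set

def publish_output(node: str, value, name: str = "main") -> None:
    outputs[(node, name)] = value

def read_input(node: str, name: str = "main"):
    for src in upstream.get(node, set()):
        if (src, name) in outputs:
            return outputs[(src, name)]
    raise KeyError(f"no upstream output named {name!r} visible to {node!r}")

publish_output("load", [1, 2, 3], name="orders")
print(read_input("clean", name="orders"))   # visible: load is upstream of clean
```

The DAG-awareness is the key property: a node with no path to the publisher simply cannot see the artifact, which keeps flows deterministic.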


Changes

Core

  • Docker kernel system with container lifecycle management (#284)
  • Kernel Manager UI with status monitoring and memory tracking (#284)
  • Kernel auto-restart for stopped/errored kernels (#284)
  • Kernel execution cancellation (#284)
  • Flow Catalog with namespace hierarchy and flow registration (#285)
  • Catalog table management and lineage tracking (#346)
  • Documentation and favorite handling for catalog tables (#349)
  • Catalog upload improvements (#348)
  • Artifact publishing, consuming, and DAG-aware availability (#284)
  • Global artifact sharing tied to catalog (#284)
  • Named inputs/outputs for Python Script nodes (#344)
  • File Manager for Docker mode file uploads (#326)
  • Update functionality for database and cloud storage connections (#351)
  • Added psycopg2-binary and pandas/sqlalchemy as production dependencies for database writes

UI

  • Jupyter-style notebook editor with cell execution (#284)
  • CodeMirror 6 editor with flowfile API autocompletions (#284)
  • Rich display outputs — matplotlib, plotly, PIL, HTML (#284)
  • Kernel execution in custom node designer (#284)
  • Auto-generated node descriptions from config (#313)
  • Embeddable FlowfileEditor as Vue component library (#338, #341)
  • Z-index overflow fix with bounded constants (#314)
  • DraggablePanel layout fix on viewport change (#305)

Infrastructure

  • Fixed all ruff linting issues across the codebase (#347)

Fixes

  • API calls failing when Docker deployment accessed remotely (#324)
  • Parquet corruption in Docker volumes (#284)

Documentation

  • Project documentation (#350)

Release feauture/kernel-implementation

22 Feb 17:47
0bb3761


Claude/embeddable flowfile wasm q fqi1 (#341)

Release v0.6.3

02 Feb 16:03
2c96c59


What's Changed

Full Changelog: v0.6.2...v0.6.3

Release v0.6.2

30 Jan 12:07
ea01ae2


Multiple hotfixes to improve stability

Full Changelog: v0.6.1...v0.6.2

Release v0.6.1

29 Jan 19:46
17c700c


What's Changed

Full Changelog: v0.6.0...v0.6.1