Feature/catatalog to delta #376

Merged
Edwardvaneechoud merged 16 commits into main from feature/catatalog-to-delta on Mar 29, 2026

@Edwardvaneechoud
Owner

This pull request migrates the catalog storage backend from plain Parquet files to Delta Lake tables, adding support for ACID transactions, time travel, schema evolution, and incremental updates. It introduces a new utility module and a one-time migration script for existing data, and updates the catalog service to support both the legacy and the new storage format. The changes are grouped below by theme.

Delta Lake Integration and Migration:

  • Added a comprehensive design document (feature-md-catalog-to-delta.md) outlining the motivation, migration plan, API changes, and open questions for moving catalog storage from Parquet files to Delta Lake tables.
  • Introduced a one-time migration script (migrate_parquet_to_delta.py) to convert existing Parquet catalog tables to Delta format, including dry-run support and database updates.
  • Added delta_utils.py, a utility module to centralize format detection, size calculation, preview reading, and deletion logic for both Delta and legacy Parquet tables.
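
To illustrate the kind of format-agnostic helper described for delta_utils.py, here is a minimal sketch using only the standard library. It is not the code from this PR: the function names (detect_storage_format, table_size_bytes, delete_table) are hypothetical, and it assumes that legacy tables are single .parquet files. It relies only on the fact that a Delta table is a directory containing a _delta_log transaction log.

```python
import shutil
from pathlib import Path
from typing import Optional


def detect_storage_format(table_path: Path) -> Optional[str]:
    """Return "delta", "parquet", or None if nothing recognisable exists."""
    # A Delta table is a directory that holds a _delta_log transaction log.
    if table_path.is_dir() and (table_path / "_delta_log").is_dir():
        return "delta"
    # Assumption: a legacy table is a single file with a .parquet suffix.
    if table_path.is_file() and table_path.suffix == ".parquet":
        return "parquet"
    return None


def table_size_bytes(table_path: Path) -> int:
    """Total on-disk size: one file for Parquet, every file for Delta."""
    if not table_path.exists():
        return 0
    if table_path.is_file():
        return table_path.stat().st_size
    # Includes data files and the transaction log.
    return sum(p.stat().st_size for p in table_path.rglob("*") if p.is_file())


def delete_table(table_path: Path) -> None:
    """Remove a table regardless of its storage format."""
    if table_path.is_dir():
        shutil.rmtree(table_path)
    elif table_path.exists():
        table_path.unlink()
```

Keeping detection in one place means callers such as the catalog service never branch on file layout themselves; they ask for the format once and dispatch on the returned string.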

Catalog Service Enhancements:

  • Updated the catalog service (service.py) to:
    • Use format-agnostic helpers for table existence, preview, and deletion. [1] [2] [3]
    • Support both Delta and legacy Parquet tables in metadata reading, registration, and materialization, including new fields for storage format and table path. [1] [2] [3] [4]
    • Add helpers for parsing Delta table history and formatting timestamps to support future versioning and time travel APIs. [1] [2]
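
The history-parsing and timestamp-formatting helpers mentioned above could look roughly like the sketch below. This is an illustration, not the implementation in service.py: the names TableVersion, format_delta_timestamp, and parse_history are hypothetical, and it assumes each history entry is a dict carrying "version", "timestamp" (milliseconds since the Unix epoch, as recorded in Delta commit info), and "operation" keys.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class TableVersion:
    """One row of a table's version history (hypothetical shape)."""
    version: Optional[int]
    timestamp: Optional[str]
    operation: Optional[str]


def format_delta_timestamp(timestamp_ms: Optional[int]) -> Optional[str]:
    """Convert epoch milliseconds to an ISO 8601 string in UTC."""
    if timestamp_ms is None:
        return None
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).isoformat()


def parse_history(entries: list[dict[str, Any]]) -> list[TableVersion]:
    """Normalise raw history entries, newest version first."""
    versions = [
        TableVersion(
            version=entry.get("version"),
            timestamp=format_delta_timestamp(entry.get("timestamp")),
            operation=entry.get("operation"),
        )
        for entry in entries
    ]
    # Entries without a version number sort last.
    return sorted(
        versions,
        key=lambda v: -1 if v.version is None else v.version,
        reverse=True,
    )
```

Using .get() for every key keeps the parser tolerant of entries that omit a field, which matters if the set of keys differs between writer versions.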

These changes lay the groundwork for robust, versioned, and transactional catalog storage, while ensuring backward compatibility and a clear migration path.

References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

@netlify

netlify bot commented Mar 29, 2026

Deploy Preview for flowfile-wasm canceled.

Name Link
🔨 Latest commit bcfcc67
🔍 Latest deploy log https://app.netlify.com/projects/flowfile-wasm/deploys/69c97c400abf06000849715b

@Edwardvaneechoud Edwardvaneechoud merged commit 7258002 into main Mar 29, 2026
27 checks passed
@Edwardvaneechoud Edwardvaneechoud deleted the feature/catatalog-to-delta branch March 29, 2026 20:57