Add catalog table management and lineage tracking by Edwardvaneechoud · Pull Request #346 · Edwardvaneechoud/Flowfile

Edwardvaneechoud · 2026-03-04T17:12:36Z

Summary

This PR adds comprehensive catalog table management capabilities to Flowfile, including table registration, preview, deletion, and lineage tracking between flows and tables. It introduces new UI components for table browsing and management, backend services for table operations, and integration with the flow designer through new CatalogReader and CatalogWriter nodes.

Key Changes

Frontend - Catalog Management

New Components:
- TableDetailPanel.vue: Displays table metadata, schema, data preview, and read-by flows
- RegisterTableModal.vue: Modal for registering new tables from data files
- CreateNamespaceModal.vue: Extracted modal for creating catalogs/schemas
- RegisterFlowModal.vue: Extracted modal for registering flows
Enhanced CatalogView:
- Added search and filter functionality for catalog items
- Added "Register Table" button in sidebar
- Integrated table selection and detail view
- Added table display in catalog tree with search/filter support
- Displays tables produced by flows in FlowDetailPanel
Read Node Enhancement:
- Added "Browse Catalog" button to select tables from catalog
- Dialog for browsing and selecting catalog tables

Frontend - Node Types

CatalogReader Node: New node type for reading tables from the catalog with namespace/table selection and schema preview
CatalogWriter Node: New node type for writing data to catalog with configurable table name, namespace, and write mode
Added SVG icons for both new node types

Backend - Catalog Service

Table Operations:
- get_table(), list_tables(), create_table(), update_table(), delete_table()
- Table preview generation with configurable row limits
- Table metadata enrichment (row count, column count, size)
Lineage Tracking:
- list_tables_for_flow(): Get tables produced by a flow
- list_readers_for_table(): Get flows that read a table
- upsert_read_link(): Track read relationships between flows and tables
- Bulk operations for N+1 query elimination
API Endpoints:
- GET/POST /tables: List and register tables
- GET /tables/{id}: Get table details
- GET /tables/{id}/preview: Get data preview
- DELETE /tables/{id}: Delete table
- POST /tables/{id}/read-links: Track table reads
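The read-link operations listed above can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module rather than the project's SQLAlchemy models, and the table name, column names, and function signatures are assumptions made for illustration only. The idea shown is that a uniqueness constraint on the (table, flow) pair makes `upsert_read_link()` safe to call repeatedly.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the project's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE catalog_table_read_links (
        table_id INTEGER NOT NULL,
        registration_id INTEGER NOT NULL,
        UNIQUE (table_id, registration_id)
    )
    """
)


def upsert_read_link(table_id: int, registration_id: int) -> None:
    """Record that a flow reads a table; repeated calls leave a single row."""
    conn.execute(
        "INSERT OR IGNORE INTO catalog_table_read_links (table_id, registration_id) "
        "VALUES (?, ?)",
        (table_id, registration_id),
    )


def list_readers_for_table(table_id: int) -> list[int]:
    """Return the registration IDs of all flows that read the given table."""
    rows = conn.execute(
        "SELECT registration_id FROM catalog_table_read_links "
        "WHERE table_id = ? ORDER BY registration_id",
        (table_id,),
    ).fetchall()
    return [row[0] for row in rows]


upsert_read_link(table_id=1, registration_id=10)
upsert_read_link(table_id=1, registration_id=10)  # duplicate call is ignored
upsert_read_link(table_id=1, registration_id=11)
readers = list_readers_for_table(1)
```

After these calls `readers` holds each reading flow once, regardless of how many times the link was recorded.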

Database

New Models:
- CatalogTable: Stores table metadata (name, description, file path, row/column counts, size)
- CatalogTableReadLink: Tracks which flows read which tables
Schema Updates:
- Added tables field to NamespaceTree
- Added tables_produced field to FlowRegistrationOut
- New schemas: CatalogTableOut, CatalogTableCreate, CatalogTableUpdate, CatalogTablePreview, CatalogTableSummary

Flow Graph Integration

CatalogReader Node Handler: Resolves catalog table by ID or name, reads materialized Parquet file, generates schema callback
CatalogWriter Node Handler: Writes output data to catalog as Parquet, creates/updates table metadata, tracks lineage
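The "by ID or name" resolution step of the CatalogReader handler might look roughly like the sketch below. `TableRecord`, `resolve_table`, and the in-memory list are hypothetical stand-ins; the real handler goes through the catalog service and database, and `TableNotFoundError` is redefined locally here only so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Optional


class TableNotFoundError(LookupError):
    """Local stand-in for the domain exception named in this PR."""


@dataclass(frozen=True)
class TableRecord:
    """Hypothetical catalog entry; field names are illustrative."""
    id: int
    name: str
    namespace_id: int
    file_path: str


def resolve_table(
    tables: list[TableRecord],
    table_id: Optional[int] = None,
    name: Optional[str] = None,
    namespace_id: Optional[int] = None,
) -> TableRecord:
    """Resolve a table by ID when given, otherwise by name within a namespace."""
    if table_id is not None:
        matches = [t for t in tables if t.id == table_id]
    elif name is not None:
        matches = [t for t in tables if t.name == name and t.namespace_id == namespace_id]
    else:
        raise ValueError("Provide either table_id or name")
    if not matches:
        raise TableNotFoundError("Catalog table not found")
    return matches[0]


# Example catalog with made-up paths.
catalog = [
    TableRecord(id=1, name="orders", namespace_id=1, file_path="orders.parquet"),
    TableRecord(id=2, name="customers", namespace_id=1, file_path="customers.parquet"),
]
by_id = resolve_table(catalog, table_id=2)
by_name = resolve_table(catalog, name="orders", namespace_id=1)
```

Preferring the ID when both are supplied keeps a node pointing at the same table even if the table is later renamed; whether the actual handler makes the same choice is not stated in this PR.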

State Management

Extended catalog-store with table selection, preview loading, and all tables list
Added table-related actions and getters

Notable Implementation Details

Table data is materialized as Parquet files in catalog_tables_directory
Lineage is tracked bidirectionally: flows know what tables they produce, tables know what flows read them
Table preview is lazy-loaded and configurable
Search/filter in catalog tree supports both flows and tables
Modal components extracted for reusability and cleaner code organization
Bulk database queries used to prevent N+1 problems when enriching flow data
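As an illustration of the N+1 point above, the sketch below fetches the tables produced by several flows with one `IN (...)` query and groups the rows in Python. It uses `sqlite3` instead of the project's SQLAlchemy repository, and the schema and the signature of `bulk_get_tables_for_flows` are assumptions; only the function name and the `source_registration_id` column come from this PR.

```python
import sqlite3

# Hypothetical, simplified schema for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog_tables ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, source_registration_id INTEGER)"
)
conn.executemany(
    "INSERT INTO catalog_tables (id, name, source_registration_id) VALUES (?, ?, ?)",
    [(1, "orders", 10), (2, "customers", 10), (3, "events", 11)],
)


def bulk_get_tables_for_flows(registration_ids: list[int]) -> dict[int, list[str]]:
    """Fetch table names for many flows with one query instead of one per flow."""
    result: dict[int, list[str]] = {rid: [] for rid in registration_ids}
    if not registration_ids:
        return result
    # One placeholder per ID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in registration_ids)
    rows = conn.execute(
        "SELECT source_registration_id, name FROM catalog_tables "
        f"WHERE source_registration_id IN ({placeholders}) ORDER BY id",
        registration_ids,
    )
    for registration_id, name in rows:
        result[registration_id].append(name)
    return result


tables_by_flow = bulk_get_tables_for_flows([10, 11, 12])
```

Flows without any produced tables still appear in the result with an empty list, so callers enriching a list of flows do not need a second lookup.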

Implements the catalog table feature that allows users to register data files (CSV, Parquet, Excel) as materialized Parquet tables in the catalog. Tables appear in the catalog tree alongside flows and artifacts, with schema metadata, row/column counts, and data preview capabilities.
Backend:
- CatalogTable SQLAlchemy model with schema_json, row_count, size_bytes
- Pydantic schemas (CatalogTableCreate/Out/Preview, ColumnSchema)
- Repository layer with full CRUD + namespace queries
- Service layer with Polars-based materialization, preview (first N rows)
- REST endpoints: GET/POST/PUT/DELETE /catalog/tables, GET preview
- catalog_tables_directory in shared storage config
- TableNotFoundError, TableExistsError domain exceptions
Frontend:
- CatalogTable TypeScript types and API client methods
- Catalog store with table state, selection, and preview loading
- TableDetailPanel component with metadata grid, schema table, data preview
- CatalogTreeNode updated with table items (green table icon, row count)
- CatalogView with Register Table modal and table detail integration
- Browse Catalog button in Read node settings for selecting catalog tables
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

The Register Table action was previously only discoverable by hovering over schema-level tree nodes. This adds a visible table icon button in the sidebar header and a namespace selector dropdown in the registration modal so users can register tables without navigating the tree first. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

- Fix: Register Table button was greyed out because file selection required double-click. Now single-click file selection also captures the file path, enabling the Register button immediately.
- Add "Publish to Catalog" checkbox to the output (write) node that registers the written file as a catalog table after execution. Includes optional table name and namespace selector fields.
- Backend: Add publish_to_catalog, catalog_table_name, and catalog_namespace_id fields to OutputSettings. After output writes, if publish_to_catalog is true, auto-register via CatalogService.
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

Documents the end-to-end flow: UI settings, flow graph execution, CatalogService materialization to Parquet, database schema, file layout, and future Iceberg integration path. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

… modals
- Add CatalogReader node to read tables from the catalog into flows
- Add CatalogWriter node to write flow data to catalog as Parquet tables
- Revert publish_to_catalog from Output node in favor of dedicated nodes
- Add search and "show unavailable" filter to catalog sidebar
- Add file_exists field to CatalogTable for availability tracking
- Extract modals into CreateNamespaceModal, RegisterFlowModal, RegisterTableModal
- Remove outdated catalog-publish-from-output-node.md docs
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

- Fix icon loading: add catalog_reader.svg and catalog_writer.svg to BUILTIN_ICONS set so they're served from bundled assets instead of being looked up in user_defined_nodes/icons/
- Add source_registration_id and source_run_id columns to CatalogTable to track which flow produced a table
- Add CatalogTableReadLink junction table to track which flows read from which tables (populated when catalog_reader node resolves)
- Show "Produced by" flow link in table detail panel
- Show "Read by Flows" list in table detail panel
- Show "Tables Produced" list in flow detail panel
- Add DB migration for new columns on catalog_tables
- Add FlowSummary and CatalogTableSummary schemas for lightweight refs
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

The read link was being recorded in add_catalog_reader() which runs when the node is configured in the designer, before the flow has a source_registration_id. Move the upsert_read_link call into _func() so it executes at flow runtime when the registration context is set. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

Instead of recording which catalog tables a flow reads during execution (_func closure), record the links when the flow is saved. This ensures source_registration_id is always available and aligns with user expectations that lineage is captured on save. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

The source_registration_id was None at save time because it was only resolved before flow execution. Now the save_flow route looks up the flow registration by path before calling save_flow, so that _sync_catalog_read_links can record the read relationships. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
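The three commits above settle on recording read links at save time, once the flow's registration is known. A minimal sketch of that behavior follows; `Node`, `Flow`, and this simplified `sync_catalog_read_links` are hypothetical stand-ins, not the project's `_sync_catalog_read_links`, and a plain set takes the place of the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a flow node; not the project's real class."""
    node_type: str
    catalog_table_id: Optional[int] = None


@dataclass
class Flow:
    """Hypothetical stand-in for a flow being saved."""
    nodes: list[Node]
    source_registration_id: Optional[int] = None


def sync_catalog_read_links(flow: Flow, links: set[tuple[int, int]]) -> int:
    """Record (table_id, registration_id) pairs for every catalog_reader node.

    Returns the number of distinct tables linked. Does nothing when the flow
    has no registration, since there is no flow ID to attach the link to.
    """
    if flow.source_registration_id is None:
        return 0
    table_ids = {
        node.catalog_table_id
        for node in flow.nodes
        if node.node_type == "catalog_reader" and node.catalog_table_id is not None
    }
    for table_id in table_ids:
        links.add((table_id, flow.source_registration_id))
    return len(table_ids)


links: set[tuple[int, int]] = set()
registered = Flow(
    nodes=[Node("catalog_reader", 1), Node("catalog_reader", 1), Node("filter")],
    source_registration_id=10,
)
recorded = sync_catalog_read_links(registered, links)
```

Collecting the table IDs first and writing them in one pass also matches the later change that batches the sync into a single DB session.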

- Remove max-width: fit-content from catalog-detail so it fills the screen
- Change catalog reader/writer icon color from indigo (#6366F1) to deep green (#16a34a)
- Increase CATALOG label font-size from 12 to 16 in both SVGs
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

- Produced by: truncate long flow names with ellipsis, show full name on hover via title attribute
- Read by Flows: replaced inline chip list with a clickable meta card that opens a modal dialog listing all reading flows
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

netlify · 2026-03-04T17:12:41Z

✅ Deploy Preview for flowfile-wasm canceled.

Name	Link
🔨 Latest commit	`16d6d88`
🔍 Latest deploy log	https://app.netlify.com/projects/flowfile-wasm/deploys/69b44266834e240008e41f1b

Quality fixes:
- Move json, pathlib.Path, uuid to top-level imports in service.py
- Move sqlalchemy.func to top-level import in repository.py
- Remove unnecessary getattr() for source_registration_id and source_run_id in _table_to_out; these are proper model columns
- Add CatalogTableReadLink and CatalogTable to test cleanup
Tests added (16 new tests):
- TestReadLinks: upsert idempotency, list_readers_for_table, list_read_tables_for_flow, multiple readers per table
- TestTableLineage: source_registration_id storage and nullability, list_tables_for_flow, bulk_get_tables_for_flows
- TestServiceLineageEnrichment: source_registration_name enrichment, read_by_flows enrichment, tables_produced enrichment
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

Guard navigateToFlow emit with source_registration_id null check so TypeScript can narrow the type from number | null to number. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

Tests (8 new):
- TestCatalogWriter: table creation, source_registration_id lineage, overwrite mode replaces existing table
- TestCatalogReader: load by table ID, load by name + namespace
- TestSyncCatalogReadLinks: save_flow records read links, skips when no source_registration_id
- TestCatalogRoundTrip: write → read preserves data and column names
Quality fixes in add_catalog_writer:
- Remove redundant `import logging` / local logger (module-level logger from flowfile_core.configs already available)
- Remove redundant lazy imports of CatalogService, repository, and get_db_context (already imported at module top)
- Replace lazy `import uuid` with top-level `uuid4` import
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

codecov-commenter · 2026-03-04T22:07:56Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 71.54213% with 179 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
flowfile_core/flowfile_core/catalog/service.py	63.19%	53 Missing ⚠️
flowfile_core/flowfile_core/routes/catalog.py	29.16%	34 Missing ⚠️
flowfile_core/flowfile_core/catalog/exceptions.py	19.23%	21 Missing ⚠️
flowfile_core/flowfile_core/flowfile/flow_graph.py	80.00%	19 Missing ⚠️
flowfile_worker/flowfile_worker/routes.py	66.66%	13 Missing ⚠️
flowfile_core/flowfile_core/catalog/repository.py	84.72%	11 Missing ⚠️
flowfile_core/flowfile_core/routes/routes.py	72.97%	10 Missing ⚠️
flowfile_core/flowfile_core/database/init_db.py	68.96%	9 Missing ⚠️
...core/flowfile/manage/compatibility_enhancements.py	41.66%	7 Missing ⚠️
flowfile_core/flowfile_core/schemas/schemas.py	83.33%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Capture defineEmits return as 'emit' and use it instead of $emit() in the template. This resolves the ESLint vue/require-explicit-emits rule violations on Windows CI where warnings are treated as errors.
- Replace $emit('deleteTable', ...) with emit('deleteTable', ...)
- Replace $emit('navigateToFlow', ...) with emit('navigateToFlow', ...)
- Extract inline modal click handler into handleReadByClick function
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

Collapse multi-line function signatures and extract template literal URL to a variable to avoid parser confusion with generics spanning multiple lines. The vue-eslint-parser (used as top-level parser) could not parse `axios.get<Type>(\`template\`)` when split across lines on Windows CI. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

Critical fixes:
- Fix double-materialization: add register_table_from_parquet() so the catalog writer node doesn't re-copy an already-written Parquet file
- Add cascade delete of CatalogTableReadLink rows when deleting a table
- Replace pandas df.to_pandas().values.tolist() with Polars df.rows()
Design improvements:
- Batch _sync_catalog_read_links into a single DB session instead of opening one session per catalog_reader node
- Move file_path from query parameter to CatalogTableCreate request body
- Add total_tables stat card to StatsPanel.vue
Minor fixes:
- Use logger.error (not warning) for re-raised exceptions in catalog writer
- Use lazy logger formatting (%s) instead of f-strings
- Remove redundant list_all_tables repository method
- Remove unnecessary (table as any) cast in CatalogTreeNode.vue
- Remove unused CatalogTable import in CatalogTreeNode.vue
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
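One way to get the cascade-delete behavior mentioned in this commit is a foreign key with `ON DELETE CASCADE`. The sketch below shows that approach with `sqlite3`. Whether the PR uses a database-level cascade or deletes the link rows explicitly in the repository is not stated, so this is an assumption, as are the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE catalog_tables (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE catalog_table_read_links (
        table_id INTEGER NOT NULL REFERENCES catalog_tables (id) ON DELETE CASCADE,
        registration_id INTEGER NOT NULL,
        UNIQUE (table_id, registration_id)
    )
    """
)
conn.execute("INSERT INTO catalog_tables (id, name) VALUES (1, 'orders'), (2, 'customers')")
conn.executemany(
    "INSERT INTO catalog_table_read_links (table_id, registration_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)


def delete_table(table_id: int) -> None:
    """Delete a table; its read links are removed by the cascade."""
    conn.execute("DELETE FROM catalog_tables WHERE id = ?", (table_id,))


def count_links() -> int:
    """Count all remaining read links."""
    return conn.execute("SELECT COUNT(*) FROM catalog_table_read_links").fetchone()[0]


links_before = count_links()
delete_table(1)
links_after = count_links()
```

Either approach avoids orphaned link rows that would otherwise make a deleted table appear in a flow's lineage.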

claude and others added 18 commits February 28, 2026 08:26

Add technical README for Publish to Catalog feature

00b01e2

Add custom catalog node icons with distinct indigo/book design

9340043

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV

remove df as input

1ed1a6a

fixing the add node to starting node

4896ecc

Merge branch 'claude/improve-catalog-manager-7XNB2' of github.com:Edw…

279dcef

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

Fix issue with schema callback

8df9dc2

setting width 100%

f6be100

Merge branch 'claude/improve-catalog-manager-7XNB2' of github.com:Edw…

45d5c61

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

claude and others added 5 commits March 4, 2026 17:19

Fix TS2769 error in TableDetailPanel emit

66639bf

Adding debug info

f266f9a

Merge branch 'claude/improve-catalog-manager-7XNB2' of github.com:Edw…

36e7edf

…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2

claude and others added 5 commits March 5, 2026 06:58

fix linting

8b36795

improve styling

ce4dcd1

Edwardvaneechoud added 6 commits March 5, 2026 16:51

improve styling

59d1dcc

Improve viewport handling

e358811

ensure unique id when opening old run version

10806bd

fix race condition in docker test

e840af6

Serialize lazy frame with sink parquet instead of write parquet

6d43026

Small improvements for router

16d6d88

Edwardvaneechoud merged commit 961507b into main Mar 13, 2026
27 checks passed

Edwardvaneechoud deleted the claude/improve-catalog-manager-7XNB2 branch March 13, 2026 17:26

Edwardvaneechoud merged 34 commits into main from claude/improve-catalog-manager-7XNB2
