Add catalog table management and lineage tracking #346
Merged
Edwardvaneechoud merged 34 commits into main on Mar 13, 2026
Implements the catalog table feature that allows users to register data files (CSV, Parquet, Excel) as materialized Parquet tables in the catalog. Tables appear in the catalog tree alongside flows and artifacts, with schema metadata, row/column counts, and data preview capabilities.

Backend:
- CatalogTable SQLAlchemy model with schema_json, row_count, size_bytes
- Pydantic schemas (CatalogTableCreate/Out/Preview, ColumnSchema)
- Repository layer with full CRUD + namespace queries
- Service layer with Polars-based materialization, preview (first N rows)
- REST endpoints: GET/POST/PUT/DELETE /catalog/tables, GET preview
- catalog_tables_directory in shared storage config
- TableNotFoundError, TableExistsError domain exceptions

Frontend:
- CatalogTable TypeScript types and API client methods
- Catalog store with table state, selection, and preview loading
- TableDetailPanel component with metadata grid, schema table, data preview
- CatalogTreeNode updated with table items (green table icon, row count)
- CatalogView with Register Table modal and table detail integration
- Browse Catalog button in Read node settings for selecting catalog tables

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
The Register Table action was previously only discoverable by hovering over schema-level tree nodes. This adds a visible table icon button in the sidebar header and a namespace selector dropdown in the registration modal so users can register tables without navigating the tree first. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
- Fix: Register Table button was greyed out because file selection required double-click. Now single-click file selection also captures the file path, enabling the Register button immediately.
- Add "Publish to Catalog" checkbox to the output (write) node that registers the written file as a catalog table after execution. Includes optional table name and namespace selector fields.
- Backend: Add publish_to_catalog, catalog_table_name, and catalog_namespace_id fields to OutputSettings. After output writes, if publish_to_catalog is true, auto-register via CatalogService.

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Documents the end-to-end flow: UI settings, flow graph execution, CatalogService materialization to Parquet, database schema, file layout, and future Iceberg integration path. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
… modals
- Add CatalogReader node to read tables from the catalog into flows
- Add CatalogWriter node to write flow data to catalog as Parquet tables
- Revert publish_to_catalog from Output node in favor of dedicated nodes
- Add search and "show unavailable" filter to catalog sidebar
- Add file_exists field to CatalogTable for availability tracking
- Extract modals into CreateNamespaceModal, RegisterFlowModal, RegisterTableModal
- Remove outdated catalog-publish-from-output-node.md docs

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2
- Fix icon loading: add catalog_reader.svg and catalog_writer.svg to BUILTIN_ICONS set so they're served from bundled assets instead of being looked up in user_defined_nodes/icons/
- Add source_registration_id and source_run_id columns to CatalogTable to track which flow produced a table
- Add CatalogTableReadLink junction table to track which flows read from which tables (populated when catalog_reader node resolves)
- Show "Produced by" flow link in table detail panel
- Show "Read by Flows" list in table detail panel
- Show "Tables Produced" list in flow detail panel
- Add DB migration for new columns on catalog_tables
- Add FlowSummary and CatalogTableSummary schemas for lightweight refs

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
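To illustrate the junction-table idea, here is a minimal sketch using the standard-library sqlite3 module rather than the project's SQLAlchemy models. The function names upsert_read_link and list_readers_for_table come from this PR; the table name, column names, and signatures are assumptions made for the example.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: one row per (table, reading flow) pair.
    # The composite primary key is what makes the upsert idempotent.
    conn.executescript(
        """
        CREATE TABLE catalog_table_read_links (
            table_id        INTEGER NOT NULL,
            registration_id INTEGER NOT NULL,
            PRIMARY KEY (table_id, registration_id)
        );
        """
    )


def upsert_read_link(conn: sqlite3.Connection, table_id: int, registration_id: int) -> None:
    # INSERT OR IGNORE leaves an existing link untouched instead of failing.
    conn.execute(
        "INSERT OR IGNORE INTO catalog_table_read_links (table_id, registration_id) "
        "VALUES (?, ?)",
        (table_id, registration_id),
    )
    conn.commit()


def list_readers_for_table(conn: sqlite3.Connection, table_id: int) -> list[int]:
    # Returns the registration ids of all flows that read the given table.
    rows = conn.execute(
        "SELECT registration_id FROM catalog_table_read_links "
        "WHERE table_id = ? ORDER BY registration_id",
        (table_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Recording the same link twice leaves a single row, so several flows can read one table without duplicate entries.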
…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2
The read link was being recorded in add_catalog_reader() which runs when the node is configured in the designer, before the flow has a source_registration_id. Move the upsert_read_link call into _func() so it executes at flow runtime when the registration context is set. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Instead of recording which catalog tables a flow reads during execution (_func closure), record the links when the flow is saved. This ensures source_registration_id is always available and aligns with user expectations that lineage is captured on save. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
The source_registration_id was None at save time because it was only resolved before flow execution. Now the save_flow route looks up the flow registration by path before calling save_flow, so that _sync_catalog_read_links can record the read relationships. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
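The save-time behaviour described in the last two commits can be sketched roughly as follows. This is an illustration only: the Node dataclass, its fields, and the function signature are hypothetical, and the real _sync_catalog_read_links in Flowfile may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a flow graph node.
    node_type: str
    catalog_table_id: Optional[int] = None


def sync_catalog_read_links(
    nodes: Iterable[Node],
    source_registration_id: Optional[int],
    upsert_read_link: Callable[[int, int], None],
) -> int:
    """Record one read link per distinct catalog table the flow reads.

    Returns the number of links recorded. Nothing is recorded when the
    flow is not registered in the catalog (registration id is None).
    """
    if source_registration_id is None:
        return 0
    # Deduplicate so two reader nodes on the same table yield one link.
    table_ids = sorted(
        {
            node.catalog_table_id
            for node in nodes
            if node.node_type == "catalog_reader" and node.catalog_table_id is not None
        }
    )
    for table_id in table_ids:
        upsert_read_link(table_id, source_registration_id)
    return len(table_ids)
```

In this sketch the caller is expected to resolve the registration id (for example by flow path) before invoking the sync, which mirrors the ordering problem the commit above fixes.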
- Remove max-width: fit-content from catalog-detail so it fills the screen
- Change catalog reader/writer icon color from indigo (#6366F1) to deep green (#16a34a)
- Increase CATALOG label font-size from 12 to 16 in both SVGs

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
- Produced by: truncate long flow names with ellipsis, show full name on hover via title attribute
- Read by Flows: replaced inline chip list with a clickable meta card that opens a modal dialog listing all reading flows

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Quality fixes:
- Move json, pathlib.Path, uuid to top-level imports in service.py
- Move sqlalchemy.func to top-level import in repository.py
- Remove unnecessary getattr() for source_registration_id and source_run_id in _table_to_out (these are proper model columns)
- Add CatalogTableReadLink and CatalogTable to test cleanup

Tests added (16 new tests):
- TestReadLinks: upsert idempotency, list_readers_for_table, list_read_tables_for_flow, multiple readers per table
- TestTableLineage: source_registration_id storage and nullability, list_tables_for_flow, bulk_get_tables_for_flows
- TestServiceLineageEnrichment: source_registration_name enrichment, read_by_flows enrichment, tables_produced enrichment

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Guard navigateToFlow emit with source_registration_id null check so TypeScript can narrow the type from number | null to number. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
…ardvaneechoud/Flowfile into claude/improve-catalog-manager-7XNB2
Tests (8 new):
- TestCatalogWriter: table creation, source_registration_id lineage, overwrite mode replaces existing table
- TestCatalogReader: load by table ID, load by name + namespace
- TestSyncCatalogReadLinks: save_flow records read links, skips when no source_registration_id
- TestCatalogRoundTrip: write → read preserves data and column names

Quality fixes in add_catalog_writer:
- Remove redundant `import logging` / local logger (module-level logger from flowfile_core.configs already available)
- Remove redundant lazy imports of CatalogService, repository, and get_db_context (already imported at module top)
- Replace lazy `import uuid` with top-level `uuid4` import

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Capture defineEmits return as 'emit' and use it instead of $emit() in
the template. This resolves the ESLint vue/require-explicit-emits rule
violations on Windows CI where warnings are treated as errors.
- Replace $emit('deleteTable', ...) with emit('deleteTable', ...)
- Replace $emit('navigateToFlow', ...) with emit('navigateToFlow', ...)
- Extract inline modal click handler into handleReadByClick function
https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Collapse multi-line function signatures and extract template literal URL to a variable to avoid parser confusion with generics spanning multiple lines. The vue-eslint-parser (used as top-level parser) could not parse `axios.get<Type>(\`template\`)` when split across lines on Windows CI. https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
Critical fixes:
- Fix double-materialization: add register_table_from_parquet() so the catalog writer node doesn't re-copy an already-written Parquet file
- Add cascade delete of CatalogTableReadLink rows when deleting a table
- Replace pandas df.to_pandas().values.tolist() with Polars df.rows()

Design improvements:
- Batch _sync_catalog_read_links into a single DB session instead of opening one session per catalog_reader node
- Move file_path from query parameter to CatalogTableCreate request body
- Add total_tables stat card to StatsPanel.vue

Minor fixes:
- Use logger.error (not warning) for re-raised exceptions in catalog writer
- Use lazy logger formatting (%s) instead of f-strings
- Remove redundant list_all_tables repository method
- Remove unnecessary (table as any) cast in CatalogTreeNode.vue
- Remove unused CatalogTable import in CatalogTreeNode.vue

https://claude.ai/code/session_01AcexA8fgAu5D4apWsGE6AV
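As a rough illustration of the cascade-delete fix, the sketch below removes a table's read links and the table row inside one transaction. It uses sqlite3 from the standard library instead of the project's SQLAlchemy repository; the schema, the make_demo_db helper, and this delete_table signature are assumptions for the example, not the PR's actual code.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical two-table layout with a little demo data.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE catalog_tables (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE catalog_table_read_links (
            table_id        INTEGER NOT NULL,
            registration_id INTEGER NOT NULL,
            PRIMARY KEY (table_id, registration_id)
        );
        INSERT INTO catalog_tables (id, name) VALUES (1, 'orders'), (2, 'customers');
        INSERT INTO catalog_table_read_links VALUES (1, 10), (1, 11), (2, 10);
        """
    )
    return conn


def delete_table(conn: sqlite3.Connection, table_id: int) -> bool:
    """Delete a table and its read links atomically.

    Returns True if a table row was removed, False if the id was unknown.
    """
    # "with conn" commits on success and rolls back if either statement fails,
    # so no orphaned read links are left behind.
    with conn:
        conn.execute(
            "DELETE FROM catalog_table_read_links WHERE table_id = ?", (table_id,)
        )
        cursor = conn.execute("DELETE FROM catalog_tables WHERE id = ?", (table_id,))
    return cursor.rowcount == 1
```

Deleting the links explicitly keeps the behaviour independent of whether foreign-key enforcement is enabled in the database; an ON DELETE CASCADE constraint would be an alternative design.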
Summary
This PR adds comprehensive catalog table management capabilities to Flowfile, including table registration, preview, deletion, and lineage tracking between flows and tables. It introduces new UI components for table browsing and management, backend services for table operations, and integration with the flow designer through new CatalogReader and CatalogWriter nodes.
Key Changes
Frontend - Catalog Management
New Components:
- TableDetailPanel.vue: Displays table metadata, schema, data preview, and read-by flows
- RegisterTableModal.vue: Modal for registering new tables from data files
- CreateNamespaceModal.vue: Extracted modal for creating catalogs/schemas
- RegisterFlowModal.vue: Extracted modal for registering flows
Enhanced CatalogView:
Read Node Enhancement:
Frontend - Node Types
Backend - Catalog Service
Table Operations:
- get_table(), list_tables(), create_table(), update_table(), delete_table()
Lineage Tracking:
- list_tables_for_flow(): Get tables produced by a flow
- list_readers_for_table(): Get flows that read a table
- upsert_read_link(): Track read relationships between flows and tables
API Endpoints:
- GET/POST /tables: List and register tables
- GET /tables/{id}: Get table details
- GET /tables/{id}/preview: Get data preview
- DELETE /tables/{id}: Delete table
- POST /tables/{id}/read-links: Track table reads
Database
New Models:
- CatalogTable: Stores table metadata (name, description, file path, row/column counts, size)
- CatalogTableReadLink: Tracks which flows read which tables
Schema Updates:
- tables field to NamespaceTree
- tables_produced field to FlowRegistrationOut
- CatalogTableOut, CatalogTableCreate, CatalogTableUpdate, CatalogTablePreview, CatalogTableSummary
Flow Graph Integration
State Management
- catalog-store with table selection, preview loading, and all tables list
Notable Implementation Details
catalog_tables_directory