Global sharing of artifacts #304
Merged
Edwardvaneechoud merged 35 commits into feauture/kernel-implementation on Feb 6, 2026
Conversation
This feature enables persisting Python objects (ML models, DataFrames, configuration objects) from kernel code and retrieving them later across flows and sessions.

Key components:
- `GlobalArtifact` database model with versioning and lineage tracking
- Storage backend abstraction (`SharedFilesystemStorage`, `S3Storage`)
- Core API endpoints for prepare-upload, finalize, retrieve, list, delete
- Kernel client functions: `publish_global`, `get_global`, `list_global_artifacts`
- Serialization utilities with auto-format detection (parquet, joblib, pickle)

Design highlights:
- Core API never handles blob data: all binary data flows directly to storage
- SHA-256 integrity verification on all uploads
- Automatic versioning when publishing to the same name
- Catalog integration via `namespace_id`

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
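One design highlight above is SHA-256 integrity verification on every upload. A minimal sketch of how a client could compute the digest for the finalize step; the helper name `sha256_hex` is illustrative, not the actual API:

```python
import hashlib

def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a blob in chunks so large artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The Core API can then compare the client-reported hash against one computed by the storage backend before marking the artifact active.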
Test coverage includes:
- Core API endpoint tests (`test_artifacts.py`):
  - Upload workflow (prepare + finalize)
  - Retrieval by name, ID, and version
  - Listing with filters (namespace, tags, name, python_type)
  - Pagination
  - Deletion (by ID and by name)
  - Error handling and edge cases
- Serialization module tests (`test_serialization.py`):
  - Format detection for various object types
  - Pickle, parquet, and joblib round-trips
  - File and bytes serialization
  - SHA-256 computation
  - Edge cases (empty, unicode, binary data)
- Storage backend tests (`test_artifact_storage.py`):
  - SharedFilesystemStorage operations
  - Upload preparation and finalization
  - SHA-256 verification
  - Download source generation
  - Deletion with directory cleanup
  - Concurrent upload handling
- Kernel client tests (`test_global_artifacts.py`):
  - `publish_global` function
  - `get_global` function
  - `list_global_artifacts` function
  - `delete_global_artifact` function

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
…facts-feature-01d32
Tests verify publish_global, get_global, list_global_artifacts, and delete_global_artifact work correctly when executed from within a kernel container against the live Core API.

Includes tests for:
- Basic publish/get round-trip
- Metadata (description, tags)
- Listing artifacts
- Deletion
- Versioning
- Complex types (numpy, polars, nested dicts, custom classes)
- Cross-flow artifact sharing
- Error handling (KeyError for missing artifacts)

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
The _has_polars(), _has_pandas(), _has_numpy(), and _has_sklearn() helper functions were defined after being used in @pytest.mark.skipif decorators, causing a NameError during test collection. Moved them to the top of the file. https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
Local classes defined inside test methods cannot be pickled. Mock the serialize_to_file function since this test only verifies that Python type information is captured in the prepare request, not that serialization works. https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
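For context, this is the failure mode being worked around: pickle locates classes by qualified name in an importable module, which a class defined inside a function body does not have:

```python
import pickle

def make_instance():
    class LocalModel:  # defined inside a function, so pickle cannot import it
        pass
    return LocalModel()

try:
    pickle.dumps(make_instance())
    picklable = True
except (AttributeError, pickle.PicklingError):
    # CPython: "Can't pickle local object 'make_instance.<locals>.LocalModel'"
    picklable = False
```

Hence the test mocks `serialize_to_file` rather than moving the class to module level.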
- Add `check_pickleable()` function to validate objects before serialization
- Add `UnpickleableObjectError` with helpful hints for common issues:
  - Local classes: suggest moving to module level
  - Lambda functions: suggest using regular functions
  - Open file handles: suggest closing resources
- Call `check_pickleable()` in `publish_global()` before making API calls
- Add comprehensive tests for the validation

This gives users a clear, actionable error message when they try to publish an unpickleable object, instead of a cryptic pickle error.

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
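A minimal sketch of what such a validation helper could look like; the detection logic and hint strings are illustrative, not the actual implementation:

```python
import pickle

class UnpickleableObjectError(TypeError):
    """Raised when an object cannot be pickled, with an actionable hint."""

def check_pickleable(obj) -> None:
    # Probe with pickle and map common failure messages to hints.
    try:
        pickle.dumps(obj)
    except Exception as exc:
        msg = str(exc)
        if "<locals>" in msg:
            hint = "move the class to module level"
        elif "<lambda>" in msg:
            hint = "replace the lambda with a regular function"
        else:
            hint = "ensure the object holds no open files or sockets"
        raise UnpickleableObjectError(
            f"Object is not pickleable ({msg}); {hint}"
        ) from exc
```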
- Move /names and / routes before /{artifact_id} to prevent FastAPI
from matching "names" as an artifact_id parameter
- Fix test_publish_stores_python_type mock path to work with imports
inside the function
https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
Critical fixes:
- Add internal service authentication (X-Internal-Token header) for kernel → Core API calls to avoid 401 errors at runtime
- Fix mock patch target in test_publish_stores_python_type

High priority:
- Fix tag filtering to use proper JSON parsing instead of substring matching, avoiding false positives (e.g., "ml" matching "html")

Medium priority:
- Add logging to silent except blocks in service.py for visibility into storage failures
- Optimize check_pickleable to skip full serialization for large objects (>10k elements or >100MB), avoiding double memory usage
- Add thread-safe singleton pattern with double-checked locking for storage backend initialization
- Handle cross-filesystem rename with shutil.move fallback for Docker environments with multiple volumes

Low priority:
- Exclude pending/failed artifacts from version numbering to avoid version number gaps

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
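The double-checked locking mentioned for the storage backend singleton can be sketched as follows; the names are illustrative, not the actual module API:

```python
import threading

_storage_backend = None
_storage_lock = threading.Lock()

def get_storage_backend(factory):
    """Return the singleton, creating it at most once even under concurrency."""
    global _storage_backend
    if _storage_backend is None:          # fast path: no lock once initialized
        with _storage_lock:
            if _storage_backend is None:  # re-check: another thread may have won
                _storage_backend = factory()
    return _storage_backend
```

The fast path avoids lock contention on every call once the backend exists, while the re-check under the lock prevents two threads from both constructing a backend.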
A. Tag filtering breaks pagination (CORRECTNESS):
- Moved tag filtering to SQL level using SQLite's json_each
- Now filtering happens BEFORE limit/offset is applied
- Pagination works correctly with tag filters

B. Auto-generated internal token won't reach kernel (AUTH):
- Removed auto-generation in Docker mode
- Fail fast with a clear error if FLOWFILE_INTERNAL_TOKEN is not set
- Only auto-generate in Electron mode (single process)

C. Deduplicate JWT validation logic:
- Refactored get_user_or_internal_service to delegate to get_current_user
- Avoids duplicating JWT validation code

D. Hardcoded system user id=1:
- Made configurable via FLOWFILE_INTERNAL_SERVICE_USER_ID env var
- Default is 1, but deployments can configure the correct system user

https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
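Fix A can be illustrated with a self-contained query: the tag predicate runs in SQL via `json_each`, so `LIMIT`/`OFFSET` apply to already-filtered rows, and "ml" no longer substring-matches "html". Table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO artifacts (name, tags) VALUES (?, ?)",
    [("model_a", '["ml", "prod"]'), ("page_b", '["html"]')],
)
# Exact tag match inside SQL, applied before pagination.
rows = conn.execute(
    """
    SELECT a.name FROM artifacts a
    WHERE EXISTS (SELECT 1 FROM json_each(a.tags) WHERE json_each.value = ?)
    ORDER BY a.id LIMIT 10 OFFSET 0
    """,
    ("ml",),
).fetchall()
# A naive substring match on the raw JSON ("ml" in tags) would also hit "html".
```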
Since serialize_to_file is mocked and doesn't create a real file, os.path.getsize fails. Mock it to return a fake size. https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
The kernel runtime injects the flowfile_client module as a global
variable named 'flowfile' via exec(code, {"flowfile": flowfile_client}).
Test code should use flowfile.publish_global() directly without
importing, similar to existing tests in test_main.py and
test_kernel_integration.py.
https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
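The injection described above boils down to a plain `exec` with a prepared globals dict. A toy stand-in for the client module (the `_FakeClient` class is invented for illustration; the real runtime injects the actual `flowfile_client` module):

```python
class _FakeClient:
    """Stand-in for the flowfile_client module injected by the kernel runtime."""
    def publish_global(self, name, obj):
        return f"published {name}"

flowfile_client = _FakeClient()

# User code references `flowfile` directly; no import statement is needed
# because the name is supplied through the exec globals.
user_code = "result = flowfile.publish_global('model', object())"
namespace = {"flowfile": flowfile_client}
exec(user_code, namespace)
# namespace["result"] == "published model"
```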
Generated by uv when running tests for the kernel_runtime package. Ensures consistent dependency resolution for CI/CD. https://claude.ai/code/session_012jSDobAZCw2qJz4TPdZwrm
This commit addresses the ConnectError and UnpickleableObjectError failures in the global artifacts kernel integration tests.

Changes:
1. **flowfile_core/kernel/manager.py**: Pass environment variables to kernel containers for global artifacts support:
   - FLOWFILE_CORE_URL: How the kernel reaches the Core API from inside Docker
   - FLOWFILE_INTERNAL_TOKEN: Service-to-service auth for kernel → Core
   - FLOWFILE_KERNEL_ID: Kernel ID for lineage tracking
   These are passed to both start_kernel() and start_kernel_sync().
2. **kernel_runtime/main.py**: Add __name__ and __builtins__ to the exec globals so that classes defined in user code get __module__ = "__main__" instead of "builtins", enabling pickle to work for custom classes.
3. **flowfile_core/tests/kernel_fixtures.py**: Start the Core API server before kernel tests:
   - Generate an internal token for kernel ↔ Core auth
   - Start the Core API server in a background thread
   - Set FLOWFILE_CORE_URL and FLOWFILE_INTERNAL_TOKEN environment variables
   - Clean up on test completion

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
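Change 2 is small but subtle: a `class` statement takes its `__module__` from the `__name__` key of the globals it executes in. A minimal reproduction of the fix:

```python
# Without "__name__" in the exec globals, classes defined in user code end up
# with a wrong __module__; adding it makes them look like __main__ classes,
# which is what pickle's by-name lookup machinery expects.
namespace = {"__name__": "__main__", "__builtins__": __builtins__}
exec("class MyModel:\n    pass", namespace)
# namespace["MyModel"].__module__ == "__main__"
```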
Introduces a cleaner separation of test fixtures:
1. **kernel_fixtures.py**: Add `start_core` parameter to `managed_kernel()`:
   - `start_core=False` (default): Kernel-only, no Core API
   - `start_core=True`: Starts Core API + auth tokens for global artifacts
2. **conftest.py**: Two distinct fixtures:
   - `kernel_manager`: Kernel-only tests (original behavior)
   - `kernel_manager_with_core`: Tests requiring kernel ↔ Core integration
3. **test_global_artifacts_kernel_integration.py**:
   - Updated to use the `kernel_manager_with_core` fixture
   - Updated docstring to reflect fixture usage

This separation ensures:
- Kernel-only tests don't pay the cost of starting Core
- Global artifacts tests get proper Core API setup
- Clearer test infrastructure

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
Resolved conflicts in kernel/manager.py by combining:
- Global artifacts env vars (FLOWFILE_CORE_URL, FLOWFILE_INTERNAL_TOKEN)
- Persistence env vars (KERNEL_ID, PERSISTENCE_ENABLED, PERSISTENCE_PATH, RECOVERY_MODE)

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
Updates the version in both:
- kernel_runtime/__init__.py (runtime version)
- kernel_runtime/pyproject.toml (package metadata)

This version includes:
- Global artifacts support (publish_global, get_global, etc.)
- Pickle fix for custom classes (__name__ in exec globals)
- Persistence and recovery features from kernel-implementation

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
Creates .github/workflows/test-kernel-integration.yml that:
- Builds the flowfile-kernel Docker image
- Runs the pytest -m kernel tests
- Cleans up Docker resources after tests

Triggered on changes to:
- kernel_runtime/
- flowfile_core/flowfile_core/kernel/
- flowfile_core/flowfile_core/artifacts/
- kernel test files

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
When patching the docker module, preserve docker.errors as real exception classes so that except clauses work properly.

This fixes:
- test_start_kernel_passes_persistence_env_vars
- test_start_kernel_uses_per_kernel_persistence_config

The error was: `TypeError: catching classes that do not inherit from BaseException is not allowed`

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
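The TypeError arises because Python evaluates the `except` expression to a `MagicMock` attribute rather than an exception class. A self-contained reproduction and fix, using a stand-in exception instead of the real `docker.errors.DockerException`:

```python
from unittest import mock

docker_mock = mock.MagicMock()

# Broken: docker_mock.errors.DockerException is an auto-created MagicMock,
# not a BaseException subclass, so the except clause itself raises TypeError.
try:
    try:
        raise ValueError("boom")
    except docker_mock.errors.DockerException:
        pass
except TypeError:
    broken = True

# Fixed: attach a real exception class to the patched module.
class DockerException(Exception):
    pass

docker_mock.errors.DockerException = DockerException
try:
    raise DockerException("container start failed")
except docker_mock.errors.DockerException:
    handled = True
```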
When kernel tests run in CI (detected via CI or TEST_MODE env vars), use pytest.fail() instead of pytest.skip() so we can see the actual error message. Also added more verbose logging to kernel_fixtures.py and increased pytest verbosity in the CI workflow. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
The os module was being used for os.environ.get() calls but was not imported, causing "name 'os' is not defined" errors. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
The kernel_manager and kernel_manager_with_core fixtures were both using the same kernel ID "integration-test", causing conflicts when both fixtures were used in the same test session. Now kernel_manager uses "integration-test" and kernel_manager_with_core uses "integration-test-core". https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
Documents the two kernel test fixtures (kernel_manager and kernel_manager_with_core), explains why they exist, and provides guidance for writing new kernel tests. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
The Core API and kernel container were using different paths for artifact staging, causing "Blob not found in storage" errors on finalize.

Changes:
- storage_config.py: Allow FLOWFILE_SHARED_DIR to override shared_directory in non-Docker mode (previously this only worked in Docker mode)
- kernel_fixtures.py: Set FLOWFILE_SHARED_DIR to the temp shared volume before starting Core, and reset the storage singletons so they use the correct paths

Now Core writes staging paths that map to the kernel's /shared mount, allowing artifact uploads to work correctly in integration tests.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
The kernel container has /shared mounted, but Core returns host paths (e.g., /var/folders/.../kernel_test_shared_xyz/...). The kernel couldn't write to those paths since they don't exist inside the container.

Changes:
- manager.py: Pass FLOWFILE_HOST_SHARED_DIR to the kernel container with the host's shared volume path
- flowfile_client.py: Add _translate_host_path_to_container() to convert host paths to /shared/... container paths for both publish_global and get_global operations

This ensures the kernel writes/reads from /shared/artifact_staging/..., which maps to the correct location on the host filesystem.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
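A sketch of what such a translation helper might look like. It has to be pure string manipulation, since the host path does not exist inside the container (so `Path.resolve()` would misbehave); the function and parameter names are illustrative:

```python
import posixpath

def translate_host_path_to_container(path, host_shared_dir, container_dir="/shared"):
    """Map a host staging path to its location under the container's /shared mount."""
    norm = posixpath.normpath(path)
    root = posixpath.normpath(host_shared_dir)
    if norm == root or norm.startswith(root + "/"):
        rel = posixpath.relpath(norm, root)  # pure string math, no filesystem access
        return posixpath.normpath(posixpath.join(container_dir, rel))
    return path  # not under the shared volume; leave untouched
```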
If a previous test run failed to clean up, the kernel might still exist in the database. The fixture now checks for and deletes any existing kernel with the same ID before creating a new one. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
When FLOWFILE_SHARED_DIR is set (e.g., during tests), the global_artifacts directory is now placed under the shared directory. This ensures the kernel container can access finalized artifacts at /shared/global_artifacts/ after they are moved from staging. Previously, artifacts were staged correctly in the shared directory but moved to ~/.flowfile/global_artifacts/ on finalize, which the kernel container couldn't access. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
1. Added joblib>=1.3.0 to the kernel_runtime dependencies, required for numpy array and scikit-learn model serialization.
2. Changed serialization to use cloudpickle instead of standard pickle for the "pickle" format. cloudpickle can handle classes defined in exec() code, which is how user code runs in the kernel container. Standard pickle fails with "attribute lookup ... on __main__ failed" for such classes.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
cloudpickle (unlike standard pickle) can serialize:
- Local classes defined inside functions
- Lambda functions
- Objects containing lambdas

Updated tests to verify these now succeed instead of raising errors.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
The Dockerfile installs dependencies directly rather than from pyproject.toml, so joblib needs to be added here as well for numpy array serialization. https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
1. Extract duplicated env-building code in manager.py into _build_kernel_env()
2. Add TODO comment for S3 SHA-256 verification (currently trusts the kernel hash)
3. Add TODO for an orphaned-pending-artifacts reaper in ArtifactService
4. Rename ArtifactNotActiveError to ArtifactStateError (with backwards-compat alias)
5. Rename the `format` parameter to `fmt` in publish_global() to avoid shadowing the builtin
6. Add SQLite-specific warning comment on tag filtering using json_each()
7. Improve path translation using Path.relative_to() for robustness
8. Add security note explaining the exec() namespace change for cloudpickle support
9. Make reset_storage_backend() private (_reset_storage_backend) with alias

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
1. Fix path translation bug: .resolve() doesn't work inside the container since host paths don't exist there. Now uses os.path.normpath for pure string operations without filesystem access.
2. Add retry logic for the version-numbering race condition: two concurrent prepare_upload calls can compute the same next_version. Now catches IntegrityError and retries up to 3 times with fresh version numbers.
3. Update service.py to use ArtifactStateError consistently (it was still importing the deprecated ArtifactNotActiveError alias).
4. Add SECURITY comment for pickle deserialization explaining the trust boundary: artifacts are only written by user-controlled kernel code, so the pickle RCE risk is acceptable within this trust model.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
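The retry in point 2 can be sketched against an in-memory SQLite table carrying the same uniqueness guarantee; the schema and function names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact_versions (name TEXT, version INTEGER, UNIQUE(name, version))"
)

def prepare_upload(name: str, max_retries: int = 3) -> int:
    # Recompute the next version and retry when a concurrent insert wins the
    # race on the UNIQUE(name, version) constraint.
    for attempt in range(max_retries):
        (version,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM artifact_versions WHERE name = ?",
            (name,),
        ).fetchone()
        try:
            conn.execute(
                "INSERT INTO artifact_versions (name, version) VALUES (?, ?)",
                (name, version),
            )
            return version
        except sqlite3.IntegrityError:
            if attempt == max_retries - 1:
                raise  # give up after exhausting retries
```

Each retry recomputes MAX(version) from scratch, so the loser of a race simply picks the next free number on its second attempt.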
This reverts the unintentional removal of key features.

Display system restored:
- flowfile.display() for matplotlib, plotly, PIL images, HTML
- DisplayOutput model for execute results
- _MATPLOTLIB_SETUP to patch plt.show()
- _maybe_wrap_last_expression for interactive mode
- execute_cell endpoint for notebook-style execution
- TestDisplay and TestDisplayTypeDetection test classes

read_inputs() signature restored:
- Returns dict[str, list[LazyFrame]] (not dict[str, LazyFrame])
- Allows distinguishing between multiple upstream nodes
- Tests and frontend docs updated accordingly

All files synced from the feauture/kernel-implementation branch, which has both the display system and global artifacts merged correctly.

https://claude.ai/code/session_01VHPuLVqPFFzGZBuV1ApFp1
Merged commit 6c60358 from …acts-integration-l3qHA into feauture/kernel-implementation. 12 checks passed.
Edwardvaneechoud added a commit that referenced this pull request on Feb 23, 2026
* Add kernel runtime management with Docker containerization (#281)
* Add Docker-based kernel system for isolated Python code execution

  Introduces two components:
  - kernel_runtime/: Standalone FastAPI service that runs inside Docker containers, providing code execution with artifact storage and parquet-based data I/O via the flowfile client API
  - flowfile_core/kernel/: Orchestration layer that manages kernel containers (create, start, stop, delete, execute) using docker-py, with full REST API routes integrated into the core backend
* Add python_script node type for kernel-based code execution
  - PythonScriptInput/NodePythonScript schemas in input_schema.py
  - add_python_script method in flow_graph.py that stages input parquet to the shared volume, executes on the kernel, reads output back
  - get_kernel_manager singleton in kernel/__init__.py
  - python_script node template registered in node_store
* Add integration tests for Docker-based kernel system
  - kernel_fixtures.py: builds the flowfile-kernel Docker image, creates a KernelManager with a temp shared volume, starts a container, and tears everything down via a managed_kernel() context manager
  - conftest.py: adds a session-scoped kernel_manager fixture
  - test_kernel_integration.py: full integration tests covering:
    - TestKernelRuntime: health check, stdout/stderr capture, syntax errors, artifact publish/list, parquet read/write round-trip, multiple named inputs, execution timing
    - TestPythonScriptNode: python_script node passthrough and transform via FlowGraph.run_graph(), plus missing kernel_id error handling
  - manager.py: expose shared_volume_path as a public property
  - flow_graph.py: use the public property instead of the private attribute
* update poetry version
* Fix kernel system: singleton routing, state machine, sync execution, port resilience
  - routes.py: use the get_kernel_manager() singleton instead of creating a separate KernelManager instance (was causing a dual-state bug)
  - models.py: replace RUNNING with IDLE/EXECUTING states; store memory_gb, cpu_cores, gpu on KernelInfo from KernelConfig
  - manager.py: add _reclaim_running_containers() on init to discover existing flowfile-kernel-* containers and reclaim their ports; port allocation now scans for available ports instead of incrementing; add execute_sync() using httpx.Client for clean sync usage; state transitions: IDLE -> EXECUTING -> IDLE during execute()
  - flow_graph.py: use execute_sync() instead of the fragile asyncio.run/ThreadPoolExecutor dance
  - test: update state assertion from "running" to "idle"
* Fix kernel health check and test fixture resilience
  - _wait_for_healthy: catch all httpx errors (including RemoteProtocolError) during startup polling, not just ConnectError/ReadError/ConnectTimeout
  - conftest kernel_manager fixture: wrap managed_kernel() in try/except so container start failures produce pytest.skip instead of ERROR
* removing breakpoint
* Run kernel integration tests in parallel CI worker
  - Add the pytest.mark.kernel marker to test_kernel_integration.py
  - Register the 'kernel' marker in pyproject.toml
  - Exclude kernel tests from main backend-tests with -m "not kernel" (both Linux and Windows jobs)
  - Add a dedicated kernel-tests job that runs in parallel: builds the Docker image, runs -m kernel tests, 15min timeout
  - Add kernel_runtime paths to change detection filters
  - Include kernel-tests in test-summary aggregation
* Remove --timeout flag from kernel CI step (pytest-timeout not installed); the job-level timeout-minutes: 15 already handles this
* Add unit tests for kernel_runtime (artifact_store, flowfile_client, endpoints)

  42 tests covering the three kernel_runtime modules:
  - artifact_store: publish/get/list/clear, metadata, thread safety
  - flowfile_client: context management, parquet I/O, artifacts
  - main.py endpoints: /health, /execute, /artifacts, /clear, parquet round-trips
* Add *.egg-info/ to .gitignore
* Add kernel_runtime unit tests to CI kernel-tests job

  Installs kernel_runtime with test deps and runs its 42 unit tests before the Docker-dependent integration tests.
* Add Kernel Manager UI for Python execution environments (#282)
* Add kernel management UI for Python execution environments

  Provides a visual interface for managing the Docker-based kernel containers used by Python Script nodes. Users can create kernels with custom packages and resource limits, monitor status (stopped/starting/idle/executing/error), and control lifecycle (start/stop/delete) with auto-polling for live updates.
* Update package-lock.json version to match package.json
* Handle Docker unavailable gracefully with 503 and error banner

  The kernel routes now catch DockerException during manager init and return a 503 with a clear message instead of crashing with a 500. The frontend surfaces this as a red error banner at the top of the Kernel Manager page so users know Docker needs to be running.
* Add /kernels/docker-status endpoint and proactive UI feedback

  The new GET /kernels/docker-status endpoint checks Docker daemon reachability and whether the flowfile-kernel image exists. The UI calls this on page load and shows targeted banners: red for Docker not running, yellow for a missing kernel image, so users know exactly what to fix before creating kernels.
* Center kernel manager page with margin auto and padding

  Match the layout pattern used by other views (SecretsView, DatabaseView) with max-width 1200px, margin 0 auto, and standard spacing-5 padding.
* Add artifact context tracking for python_script nodes (#283)
* Add ArtifactContext for tracking artifact metadata across FlowGraph

  Introduces an ArtifactContext class that tracks which Python artifacts are published and consumed by python_script nodes, enabling visibility into artifact availability based on graph topology and kernel isolation.
  - Create artifacts.py with ArtifactRef, NodeArtifactState, ArtifactContext
  - Integrate ArtifactContext into FlowGraph.__init__
  - Add _get_upstream_node_ids and _get_required_kernel_ids helpers
  - Clear artifact context at flow start in run_graph()
  - Compute available artifacts before, and record published artifacts after, execution
  - Add clear_artifacts_sync to KernelManager for non-async clearing
  - Add 32 unit tests for ArtifactContext (test_artifact_context.py)
  - Add 7 FlowGraph integration tests (test_flowfile.py)
  - Add 5 kernel integration tests (test_kernel_integration.py)
* Add delete_artifact support, duplicate publish prevention, and model training integration test
  - ArtifactStore.publish() now raises ValueError if the artifact name already exists
  - Added ArtifactStore.delete() and flowfile_client.delete_artifact()
  - ExecuteResult/ExecuteResponse track artifacts_deleted alongside artifacts_published
  - ArtifactContext.record_deleted() removes artifacts from the kernel index and published lists
  - flow_graph.add_python_script records deletions from execution results
  - Integration test: train a numpy linear regression in node A, apply predictions in node B
  - Integration test: publish -> use & delete -> republish -> access flow
  - Integration test: duplicate publish without delete raises an error
  - Unit tests for all new functionality across kernel_runtime and flowfile_core
* Support N inputs per name in kernel execution with read_first convenience method
  - Change input_paths from dict[str, str] to dict[str, list[str]] across ExecuteRequest models (kernel_runtime and flowfile_core)
  - read_input() now scans all paths for a name and concatenates them (union), supporting N upstream inputs under the same key (e.g. "main")
  - Add read_first() convenience method that reads only input_paths[name][0]
  - read_inputs() updated to handle list-based paths
  - add_python_script now accepts *flowfile_tables (varargs) and writes each input to main_0.parquet, main_1.parquet, etc.
  - All existing tests updated to use the list-based input_paths format
  - New tests: multi-main union, read_first, read_inputs with N paths
* adding multiple paths
* Fix O(N) deletion, deprecated asyncio, naive datetimes, broad exceptions, global context, and hardcoded timeout
  - ArtifactContext: add a _publisher_index reverse map (kernel_id, name) → node_ids so record_deleted and clear_kernel avoid scanning all node states
  - Replace asyncio.get_event_loop() with asyncio.get_running_loop() in _wait_for_healthy (deprecated since Python 3.10)
  - Use datetime.now(timezone.utc) in artifacts.py and models.py instead of naive datetime.now()
  - Narrow except Exception to specific types: docker.errors.DockerException, httpx.HTTPError, OSError, TimeoutError in manager.py
  - Add debug logging for health poll failures instead of silent pass
  - Replace the global _context dict with contextvars.ContextVar in flowfile_client for safe concurrent request handling
  - Make the health timeout configurable via KernelConfig.health_timeout and KernelInfo.health_timeout (default 120s), wired through create/start_kernel
* fix binding to input_id
* remove breakpoint
* Preserve artifact state for cached nodes and add multi-input integration tests

  Snapshot the artifact context before clear_all() in run_graph() and restore state for nodes that were cached/skipped (their _func never re-executed, so record_published was never called). Also adds two integration tests exercising multi-input python_script nodes: one using read_input() for union and one using read_first() for single-input access.
* Allow python_script node to accept multiple main inputs

  Change the python_script NodeTemplate input from 1 to 10, matching the polars_code and union nodes. With input=1, add_node_connection always replaced main_inputs instead of appending, so only the last connection was retained.
* adding fix
* Scope artifact restore to graph nodes only

  The snapshot/restore logic was restoring artifact state for node IDs that were not part of the graph (e.g. manually injected via record_published).
* Add Python Script node with kernel and artifact support (#287)
* Add PythonScript node drawer with kernel selection, code editor, and artifacts panel

  Implements the frontend drawer UI for the python_script node type:
  - Kernel selection dropdown with state indicators and warnings
  - CodeMirror editor with Python syntax highlighting and flowfile API autocompletions
  - Artifacts panel showing available/published artifacts from the kernel
  - Help modal documenting the flowfile.* API with examples
  - TypeScript types for PythonScriptInput and NodePythonScript
  - KernelApi.getArtifacts() method for fetching kernel artifact metadata
* Fix published artifacts matching by using the correct field name from the kernel API

  The kernel's /artifacts endpoint returns `node_id` (not `source_node_id`) to identify which node published each artifact. Updated the frontend to read the correct field so published artifacts display properly.
* add translator
* Split artifacts into available (other nodes) vs published (this node)

  Available artifacts should only show artifacts from upstream nodes, not the current node's own publications. Filter by node_id !== currentNodeId for available, and node_id === currentNodeId for published.
* Add kernel persistence and multi-user access control (#286)
* Persist kernel configurations in database and clean up on shutdown

  Kernels are now stored in a `kernels` table (tied to user_id) so they survive core process restarts. On startup the KernelManager restores persisted configs from the DB, then reclaims any running Docker containers that match; orphan containers with no DB record are stopped. All kernel REST routes now require authentication and enforce per-user ownership (list returns only the caller's kernels, mutations check ownership before proceeding). On core shutdown (lifespan handler, SIGTERM, SIGINT) every running kernel container is stopped and removed via `shutdown_all()`.
* Check Docker image availability before starting a kernel

  start_kernel now explicitly checks for the flowfile-kernel image before attempting to run a container, giving a clear error message ("Docker image 'flowfile-kernel' not found. Please build or pull...") instead of a raw Docker API exception.
* Allocate port lazily in start_kernel for DB-restored kernels

  Kernels restored from the database have port=None since ports are ephemeral and not persisted. start_kernel now calls _allocate_port() when kernel.port is None, fixing the "Invalid port: 'None'" error that occurred when starting a kernel after a core restart.
* Add kernel runtime management with Docker containerization (#281) (#290)
* Add flowfile.log() method for real-time log streaming from kernel to frontend

  Enable Python script nodes to stream log messages to the FlowFile log viewer in real time via flowfile.log(). The kernel container makes HTTP callbacks to the core's /raw_logs endpoint, which writes to the FlowLogger file. The existing SSE streamer picks up new lines and pushes them to the frontend immediately.

  Changes:
  - Add log(), log_info(), log_warning(), log_error() to flowfile_client
  - Pass flow_id and log_callback_url through ExecuteRequest to the kernel
  - Add host.docker.internal mapping to kernel Docker containers
  - Update the RawLogInput schema to support node_id and WARNING level
  - Forward captured stdout/stderr to FlowLogger after execution
* Add kernel runtime versioning visible in frontend

  Add __version__ to the kernel_runtime package (0.2.0) and expose it through the /health endpoint. The KernelManager reads the version when the container becomes healthy and stores it on KernelInfo. The frontend KernelCard displays the version badge next to the kernel ID so users can verify which image version a running kernel is using.
* Implement selective artifact clearing for incremental flow execution (#291)
* Fix artifact loss in debug mode by implementing selective clearing

  Previously, run_graph() cleared ALL artifacts from both the metadata tracker and kernel memory before every run. When a node was skipped (up-to-date), the metadata was restored from a snapshot but the actual Python objects in kernel memory were already gone. Downstream nodes that depended on those artifacts would fail with KeyError. The fix introduces artifact ownership tracking so that only artifacts from nodes that will actually re-execute are cleared:
  - ArtifactStore: add clear_by_node_ids() and list_by_node_id()
  - Kernel runtime: add POST /clear_node_artifacts and GET /artifacts/node/{id}
  - KernelManager: add clear_node_artifacts_sync() and get_node_artifacts()
  - ArtifactContext: add clear_nodes() for selective metadata clearing
  - Kernel routes: add /clear_node_artifacts and /artifacts/node/{id} endpoints
  - flow_graph.run_graph(): compute the execution plan first, determine which python_script nodes will re-run, and only clear those nodes' artifacts. Skipped nodes keep their artifacts in both metadata and kernel memory.
* Add integration tests for debug mode artifact persistence

  Tests verify that artifacts survive re-runs when producing nodes are skipped (up-to-date) and only consuming nodes re-execute, covering the core bug scenario, multiple artifacts, and producer re-run clearing.
* Auto-clear node's own artifacts before re-execution in /execute

  When a node re-executes (e.g., forced refresh, performance mode re-run), its previously published artifacts are now automatically cleared before the new code runs. This prevents "Artifact already exists" errors without requiring manual delete_artifact() calls in user code. The clearing is scoped to the executing node's own artifacts only; artifacts from other nodes are untouched.
* Scope artifacts by flow_id so multiple flows sharing a kernel are isolated

  The artifact store now keys artifacts by (flow_id, name) instead of just name. Two flows using the same kernel can each publish an artifact called "model" without colliding. All artifact operations (publish, read, delete, list, clear) are flow-scoped transparently via the execution context.
* fixing issue in index.ts
* Fix artifact not found on re-run when consumer deletes artifact (#294)

  When a python_script node deletes an artifact (via delete_artifact) and is later re-executed (e.g. after a code change), the upstream producer node was not being re-run. This meant the deleted artifact was permanently lost from the kernel's in-memory store, causing a KeyError on the consumer's read_artifact call. The fix tracks which node originally published each deleted artifact (_deletion_origins in ArtifactContext). During the pre-execution phase in run_graph, if a re-running node previously deleted artifacts, the original producer nodes are added to the re-run set and their execution state is marked stale so they actually re-execute and republish.
* Add catalog service layer with repository pattern (#298)

* Implement service layer for Flow Catalog system

Extract business logic from route handlers into a proper layered architecture:
- catalog/exceptions.py: Domain-specific exceptions (CatalogError hierarchy) replacing inline HTTPException raises in service code
- catalog/repository.py: CatalogRepository Protocol + SQLAlchemy implementation abstracting all data access
- catalog/service.py: CatalogService class owning all business logic (validation, enrichment, authorization checks)
- catalog/__init__.py: Public package interface

Refactor routes/catalog.py into a thin HTTP adapter that injects CatalogService via FastAPI Depends, delegates to service methods, and translates domain exceptions to HTTP responses. All 33 existing catalog API tests pass with no behavior changes.

* Address performance and observability concerns

1. Fix N+1 queries in flow listing (4×N → 3 queries):
   - Add bulk_get_favorite_flow_ids, bulk_get_follow_flow_ids, bulk_get_run_stats to CatalogRepository
   - Add _bulk_enrich_flows to CatalogService
   - Update list_flows, get_namespace_tree, list_favorites, list_following, get_catalog_stats to use bulk enrichment
2. Add tech debt comment for ArtifactStore memory pattern:
   - Document the in-memory storage limitation for large artifacts
   - Suggest future improvements (spill-to-disk, external store)
3. Promote _auto_register_flow logging from debug to info:
   - Users can now see why flows don't appear in catalog
   - Log success and specific failure reasons
4. Improve _run_and_track error handling:
   - Use ERROR level for DB persistence failures
   - Add tracking_succeeded flag with explicit failure message
   - Log successful tracking with run details
   - Add context about flow status in error messages

* Add artifact visualization with edges and node badges (#288)

* Add synchronous kernel management and auto-restart functionality (#296)

* Auto-restart stopped/errored kernels on execution instead of raising

When a kernel is in STOPPED or ERROR state and an operation (execute, clear_artifacts, etc.) is attempted, the KernelManager now automatically restarts the kernel container instead of raising a RuntimeError. This handles the common case where a kernel was restored from the database after a server restart but its container is no longer running.

Changes:
- Add start_kernel_sync() and _wait_for_healthy_sync() for sync callers
- Add _ensure_running() / _ensure_running_sync() helpers that restart STOPPED/ERROR kernels and wait for STARTING kernels
- Replace RuntimeError raises in execute, execute_sync, clear_artifacts, clear_node_artifacts, and get_node_artifacts with auto-restart calls

* Add stream logs

* Add flow logger

* Pass flow_logger through _ensure_running_sync to start_kernel_sync

- _ensure_running_sync now logs restart attempt to flow_logger
- Passes flow_logger to start_kernel_sync so users see kernel restart progress in the flow execution log
- Fix bug: error handler was calling logger.error instead of flow_logger.error

* Pass flow_logger to clear_node_artifacts_sync in flow_graph

* Add tests for kernel auto-restart on stopped/errored state

New TestKernelAutoRestart class with 4 tests:
- test_execute_sync_restarts_stopped_kernel
- test_execute_async_restarts_stopped_kernel
- test_clear_node_artifacts_restarts_stopped_kernel
- test_python_script_node_with_stopped_kernel

Each test stops the kernel, then verifies that the operation auto-restarts it instead of raising RuntimeError.
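The auto-restart change replaces "raise on bad state" with "repair the state, then proceed". A minimal sketch of that guard pattern; the state machine and method names mirror the commit message, but the bodies are stand-ins, not the real Docker-backed implementation:

```python
from enum import Enum


class KernelState(Enum):
    RUNNING = "running"
    STARTING = "starting"
    STOPPED = "stopped"
    ERROR = "error"


class KernelManager:
    """Sketch: every operation funnels through _ensure_running_sync, which
    restarts a dead kernel instead of raising RuntimeError."""

    def __init__(self) -> None:
        self.state = KernelState.STOPPED

    def _start_and_wait(self) -> None:
        # Stand-in for starting the container and polling /health until ready.
        self.state = KernelState.RUNNING

    def _ensure_running_sync(self) -> None:
        if self.state in (KernelState.STOPPED, KernelState.ERROR):
            self._start_and_wait()  # restart instead of raising

    def execute_sync(self, code: str) -> str:
        self._ensure_running_sync()
        return "ok"  # stand-in for the real execution result
```

Centralizing the check in one helper is what lets every entry point (execute, clear_artifacts, and so on) pick up the behavior with a one-line change.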
* Fix ref to python image

* Add python image

* Fix image

* Add comprehensive README for kernel_runtime (#301)

Document how to build and run the Docker image, API endpoints, the flowfile module usage for data I/O and artifact management, and development setup instructions.

* Fix parquet corruption race condition in kernel execution (#302)

Add fsync calls after writing parquet files to ensure they are fully flushed to disk before being read. This prevents "File must end with PAR1" errors that occur when the kernel reads input files or the host reads output files before the PAR1 footer is fully written.

The issue occurs because write_parquet() may leave data in OS buffers, and when sharing files between host and Docker container via mounted volumes, the reader can see an incomplete file.

* Add artifact persistence and recovery system to kernel runtime (#299)

* Add interactive display outputs for notebook-like cell execution (#303)

* Add display output support for rich notebook-like rendering

Backend changes:
- Add flowfile.display() function to flowfile_client.py that supports matplotlib figures, plotly figures, PIL images, HTML strings, and plain text
- Add DisplayOutput model to ExecuteResponse with mime_type, data, and title
- Patch matplotlib.pyplot.show() to auto-capture figures as display outputs
- Add _maybe_wrap_last_expression() for interactive mode auto-display
- Add interactive flag to ExecuteRequest for cell execution mode
- Add /execute_cell endpoint that enables interactive mode

Frontend changes:
- Add DisplayOutput and ExecuteResult interfaces to kernel.types.ts
- Add executeCell() method to KernelApi class
- Add display() completion to flowfileCompletions.ts

Tests:
- Add comprehensive tests for display(), _reset_displays(), _get_displays()
- Add tests for display output in execute endpoint
- Add tests for interactive mode auto-display behavior

* Fix review issues in display output support

- Add _displays.set([]) to _clear_context() for consistent cleanup
- Fix _is_html_string false-positives by using regex to detect actual HTML tags
- Export DisplayOutput from flowfile_core/kernel/__init__.py
- Change base64 decode to use ascii instead of utf-8
- Simplify _maybe_wrap_last_expression using ast.get_source_segment
- Add tests for HTML false-positive cases (e.g., "x < 10 and y > 5")

* Polish PythonScript.vue styling for more compact layout (#306)

- Reduce spacing: settings gap (0.65rem), block gap (0.3rem), label font-size (0.8rem)
- Make kernel selector compact with size="small" and smaller warning box
- Improve editor container with subtle inner shadow and tighter border-radius
- Make artifacts panel lighter with transparent bg and top border only
- Move help button inline with Code label to save vertical space

* Change read_inputs() to return list of LazyFrames per input (#309)

* Fix read_inputs() to return list of LazyFrames for multiple inputs

Previously, when multiple nodes were connected to a python_script node, read_inputs() would concatenate all inputs into a single LazyFrame, making it impossible to distinguish between different input sources. Now read_inputs() returns dict[str, list[pl.LazyFrame]] where each entry is a list of LazyFrames, one per connected input.
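The read_inputs() contract change above is easiest to see side by side. A sketch of the old and new shapes; the LazyFrame class here is a lightweight placeholder standing in for polars' pl.LazyFrame, and read_inputs itself is simplified to its return-shape behavior:

```python
class LazyFrame:
    """Placeholder for pl.LazyFrame in this sketch."""

    def __init__(self, rows: list[int]) -> None:
        self.rows = rows


def read_inputs(connected: dict[str, list[LazyFrame]]) -> dict[str, list[LazyFrame]]:
    # New behavior: keep one LazyFrame per connected input, in connection
    # order, instead of concatenating them into a single frame. Callers now
    # write inputs["main"][0] where they previously wrote inputs["main"].
    return {name: list(frames) for name, frames in connected.items()}


a, b = LazyFrame([1]), LazyFrame([2])
inputs = read_inputs({"main": [a, b]})
first = inputs["main"][0]  # each upstream node stays individually addressable
```

The migration burden falls on existing user code: every `read_inputs()["main"]` access gains a `[0]` index, which is exactly what the test-fix commits below do.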
* Update help and autocomplete docs for read_inputs() return type

- FlowfileApiHelp.vue: Updated description and example to show that read_inputs() returns dict of LazyFrame lists
- flowfileCompletions.ts: Updated info and detail to reflect the new return type signature

* Fix tests for read_inputs() returning list of LazyFrames

Updated tests to expect dict[str, list[LazyFrame]] return type:
- test_read_inputs_returns_dict: check for list with LazyFrame element
- test_multiple_named_inputs: access inputs via [0] index
- test_read_inputs_with_multiple_main_paths: verify list length and values
- test_multiple_inputs: access inputs via [0] index in code string

* Fix test_multiple_inputs in test_kernel_integration.py

Updated code string to access read_inputs() results via [0] index since it now returns dict[str, list[LazyFrame]].

* Add comprehensive CLAUDE.md for AI-assisted development (#311)

* Global sharing of artifacts (#304)

* Add schema_callback to python_script node to fix slow editor opening (#316)

When clicking a python_script node before running the flow, the editor took ~17s because get_predicted_schema fell through to execution, triggering upstream pipeline runs and kernel container startup. Add a schema_callback that returns the input node schema as a best-effort prediction, matching the pattern used by other node types like output.

* Link global artifacts to flow registrations (#312)

* Link global artifacts to registered catalog flows

Every global artifact now requires a source_registration_id that ties it to a registered catalog flow. The artifact inherits the flow's namespace_id by default, with the option to override explicitly.

Key changes:
- Add source_registration_id FK column to GlobalArtifact model
- Make source_registration_id required in PrepareUploadRequest schema
- Validate registration exists and inherit namespace_id in artifact service
- Block flow deletion when active (non-deleted) artifacts reference it
- Add FlowHasArtifactsError exception for cascade protection
- Pass source_registration_id through kernel execution context
- Hard-delete soft-deleted artifacts when their flow is deleted
- Add comprehensive tests for all new behaviors (38 tests pass)

* Fix kernel_runtime publish_global tests: add source_registration_id context

The publish_global function now requires source_registration_id in the execution context. Add autouse fixtures to TestPublishGlobal and TestGlobalArtifactIntegration that set up the flowfile context with source_registration_id before each test.

* Fix kernel integration tests: pass source_registration_id in ExecuteRequest

Add test_registration fixture that creates a FlowRegistration in the DB, and pass its ID through ExecuteRequest and _create_graph for all tests that call publish_global. This satisfies the required source_registration_id validation in both the kernel context and Core API.

* Add Jupyter-style notebook editor for Python Script nodes (#308)

* Add artifact management and display to catalog system (#317)

* Fix artifact availability to only show upstream DAG ancestors (#320)

The PythonScript node's artifact panel was showing all artifacts from the kernel regardless of DAG structure. When two independent chains shared a kernel, artifacts from one chain would incorrectly appear as "available" in the other chain.

The fix adds a backend endpoint that exposes upstream node IDs via DAG traversal, and updates the frontend to filter kernel artifacts using this DAG-aware set instead of showing all non-self artifacts.

- Backend: Add GET /flow/node_upstream_ids endpoint
- Frontend: FlowApi.getNodeUpstreamIds() fetches upstream IDs
- Frontend: PythonScript.vue filters artifacts by upstream set
- Tests: Add chain isolation tests for ArtifactContext
- Tests: Add endpoint test for node_upstream_ids

* Fix cursor styling in CodeMirror editor (#321)

* Refactor path resolution and add Docker integration tests (#323)

* Offload collect() in add_python_script from core to worker (#327)

The add_python_script method was calling collect().write_parquet() directly on the core process, which is undesirable for performance. This change offloads the collect and parquet writing to the worker process using the existing ExternalDfFetcher infrastructure.

Changes:
- Add write_parquet operation to worker funcs.py that deserializes a LazyFrame, collects it, and writes to a specified parquet path with fsync
- Add write_parquet to OperationType in both worker and core models
- Add kwargs support to ExternalDfFetcher and trigger_df_operation so custom parameters (like output_path) can be passed through both WebSocket streaming and REST fallback paths
- Update REST /submit_query/ endpoint to read kwargs from X-Kwargs header
- Replace direct collect().write_parquet() in add_python_script with ExternalDfFetcher using the new write_parquet operation type

* Add interactive mode support to publish_global (#328)

* Fix Docker Kernel E2E test collection failure (#330)

Scope pytest collection to tests/integration/ to avoid importing conftest.py from flowfile_core/tests, flowfile_frame/tests, and flowfile_worker/tests, which fail with ModuleNotFoundError due to ambiguous 'tests' package names. Also remove a stray breakpoint() that would hang CI.

* Add kernel memory monitoring and OOM detection (#331)

* Add kernel memory usage display and OOM detection

Display live container memory usage near the kernel selector in the Python Script node and in the Kernel Manager cards. The kernel runtime reads cgroup v1/v2 memory stats, flowfile_core proxies them, and the frontend polls every 3 seconds with color-coded warnings at 80% (yellow) and 95% (red).

When a container is OOM-killed during execution, the error is detected via Docker inspect and surfaced as a clear "Kernel ran out of memory" message instead of a generic connection error.

* Fix 500 errors on kernel memory stats endpoint

Catch httpx/OS errors in get_memory_stats and convert them to RuntimeError so the route handler returns a proper 400 instead of an unhandled 500. This prevents console spam when the kernel runtime container hasn't been rebuilt with the /memory endpoint yet.

* Improve error handling for missing upstream inputs in flowfile_client (#333)

* Fix source_registration_id lost during flow restore (#334)

When a flow is restored (undo/redo or loaded from YAML/JSON), the source_registration_id was not included in the FlowfileSettings serialization schema. This caused publish_global to fail with "source_registration_id is required" because the kernel received None instead of the catalog registration ID.

Changes:
- Add source_registration_id to FlowfileSettings schema
- Include source_registration_id in get_flowfile_data() serialization
- Include source_registration_id in _flowfile_data_to_flow_information() deserialization
- Preserve source_registration_id in restore_from_snapshot() so undo/redo doesn't lose it even when the snapshot predates the registration

* Add Pydantic schemas for artifact metadata responses (#335)

* Fix backward compat for source_registration_id in legacy pickle files (#337)

* Add shared_location() API for accessing shared directory files (#329)

* Display flow run outputs in Python script node editor (#340)

* Add kernel execution cancellation via SIGUSR1 signal (#339)

* Add kernel execution support to custom node designer (#342)

* Handle unsafe SQL path injection

* Fix linting issues in Vue code

* Add kernel architecture

* Add kernel docs

* Add improved limitation

* Ensure that SQL source test does not fail because of higher abstraction in execution

* Revert name change in table

* Remove breakpoints

* Remove Docker check from Electron app
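The DAG-aware artifact filtering in #320 rests on collecting a node's transitive upstream ancestors. A sketch of that traversal as a reverse breadth-first search over (source, destination) edges; the flat edge-list representation and function name are assumptions for illustration, not the project's actual graph API:

```python
from collections import deque


def upstream_node_ids(edges: list[tuple[int, int]], node_id: int) -> set[int]:
    """Return all transitive ancestors of node_id in a DAG of (src, dst) edges."""
    parents: dict[int, list[int]] = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen: set[int] = set()
    queue = deque(parents.get(node_id, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen
```

With two independent chains 1→2→3 and 4→5 sharing a kernel, the ancestor set of node 3 is {1, 2}, so artifacts published by nodes 4 or 5 are filtered out of node 3's panel, which is the chain-isolation behavior the fix describes.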