Route lightweight transforms to local samping strategy#272
Merged
Edwardvaneechoud merged 14 commits intomainfrom Jan 29, 2026
Merged
Route lightweight transforms to local samping strategy#272Edwardvaneechoud merged 14 commits intomainfrom
Edwardvaneechoud merged 14 commits intomainfrom
Conversation
_determine_strategy never returned LOCAL_WITH_SAMPLING — all remote execution went to REMOTE regardless of node type. Nodes registered in nodes_with_defaults (select, sample, union, sort, record_count) are lightweight transforms that can compute locally with an external sampler providing preview data. Changes: - Route nodes with has_default_settings to LOCAL_WITH_SAMPLING in _determine_strategy (wide transforms are still overridden to REMOTE by the existing optimize_for_downstream logic in execute()) - Add detailed strategy logging: node_type, transform_type, has_default, and whether the strategy was overridden - Add comprehensive executor tests covering strategy determination for select, sort, and manual_input nodes, the wide-transform override logic, and parametrized tests for all nodes_with_defaults https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
…PLING routing The condition for LOCAL_WITH_SAMPLING should be based on whether the node is a narrow transform, not whether it has default settings. nodes_with_defaults is about whether a node can run without user config — it's unrelated to execution strategy. Narrow transforms (select, sample, union) are lightweight column-level operations safe to compute locally. Wide transforms (sort, record_count) reshape data and should always go REMOTE. Updated tests to reflect that sort goes directly to REMOTE from _determine_strategy, not via the override in execute(). https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
When cache_results is enabled on a node, the result must be fully materialized and stored by the remote worker. This means even narrow transforms (select, sample, union) that would normally use LOCAL_WITH_SAMPLING must go REMOTE. Strategy priority in _determine_strategy: 1. local → FULL_LOCAL 2. cache_results → REMOTE 3. narrow transform → LOCAL_WITH_SAMPLING 4. everything else → REMOTE Added TestCacheResultsStrategy with 5 tests covering cache override for narrow/wide transforms and the full decision flow. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
✅ Deploy Preview for flowfile-wasm canceled.
|
Both the fast-path (_can_skip_execution_fast) and the executor (_decide_execution) used to fall through to re-execution when a node had already run but no external cache file existed on disk. This happens for LOCAL_WITH_SAMPLING nodes where results live in memory. Now: - Already ran + no cache_results → skip (results in memory) - Already ran + cache_results + cache present → skip - Already ran + cache_results + cache missing → re-run (rebuild cache) - Never ran → run with appropriate strategy - Forced refresh / output node / source changed → run Removed the PERFORMANCE_MODE and SETTINGS_CHANGED re-run paths for already-ran nodes. The fast-path on FlowNode already skipped performance mode, and SETTINGS_CHANGED was misleading (settings hadn't changed — the cache file was just missing). https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
Remove verbose debug info from log line ([node_type=..., transform_type=..., has_default=...]) and the unused initial_strategy variable. The log now shows just: "Starting to run select (NEVER_RAN -> LOCAL_WITH_SAMPLING)" Add "Execution Strategy" section to the Core Internals developer docs covering the three strategies, the decision pipeline, LOCAL_WITH_SAMPLING vs REMOTE flow diagrams, and node classification table. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
The docs described the strategy routing but didn't connect it to the execution modes visible in the frontend. Restructured to lead with what users see (Development vs Performance mode) and then explain the underlying strategies. Added a section for each mode, clarified that LOCAL_WITH_SAMPLING only applies in Development mode, and updated the node classification table with a Performance mode column. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
Sort (reordering) is a wide operation. Narrow transforms do dropping, renaming, and filtering. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
Tests the full preview data path: - TestPreviewAfterExecution: preview returns sample data after run, caps at 100 rows, returns empty before run, doesn't recompute, caches across repeated calls - TestPreviewAfterUpstreamChange: documents that node_stats.has_completed_last_run is NOT reset when settings change, causing stale preview data. Also verifies preview is correct after re-running. Adds create_graph_with_read_and_select helper using fake_data.parquet (1000 rows) for integration-level preview testing. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
When sibling nodes run in parallel and both serialize the same upstream LazyFrame, Polars' internal RefCell can raise "Already borrowed". This is a thread contention issue in the core process during LazyFrame serialization, not a logic error. After a parallel stage completes, any node that failed with "Already borrowed" is now retried sequentially — the contention is gone because no other thread is accessing the shared LazyFrame. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
When sibling nodes run in parallel and both access the same upstream LazyFrame, Polars' internal RefCell raises "Already borrowed". Fix: each node acquires its input nodes' _execution_lock before calling _function(*input_data), holding it through the LazyFrame access. Sibling nodes sharing a parent now serialize their access to the parent's data. Nodes with different parents still run fully parallel. Reverts the retry-based approach in favor of preventing the conflict entirely. https://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements intelligent routing of execution strategies based on node types. Lightweight transforms (select, sample, union, sort, record_count) now use
LOCAL_WITH_SAMPLINGstrategy when running remotely, enabling fast local computation with external sampling for preview data. Wide transforms are further optimized by overriding back toREMOTEwhen optimizing for downstream nodes.Key Changes
Strategy Determination Logic: Modified
_determine_strategy()to check if a node hashas_default_settings=Trueand route toLOCAL_WITH_SAMPLINGfor remote execution, while maintainingFULL_LOCALfor local execution andREMOTEfor nodes without defaults.Wide Transform Override: Added logic in
execute()to override wide transforms (sort, record_count) fromLOCAL_WITH_SAMPLINGback toREMOTEwhen optimizing for downstream nodes, ensuring full data is available for dependent operations.Enhanced Logging: Improved execution logging to include node type, transform type, default settings status, and strategy override information for better debugging and observability.
Comprehensive Test Coverage: Added extensive test suite covering:
Implementation Details
node_default.has_default_settings=Trueare considered lightweight transforms suitable for local execution with samplingrun_location="local") always usesFULL_LOCAL, bypassing the sampling strategyhttps://claude.ai/code/session_01DaTkkXLAJyC2EfopwiY55d