docs: add plan for workflow chaining #552
Proposes replacing the in-place allow_resize mechanism with a Pipeline class that chains multiple generation stages. Each stage gets a fresh fixed-size tracker, and resize becomes a between-stage concern.
…ntrols, edge cases
* feat(models): deprecate implicit default provider routing

Emit DeprecationWarning whenever the legacy "implicit default provider" path is exercised: `ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key in `~/.data-designer/model_providers.yaml`, and the CLI's "Change default provider" workflow.

`resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly.

Update docs, the package README, and test fixtures to specify `provider=` explicitly on every `ModelConfig`. New tests cover each warning entry point and pin the post-deprecation happy paths.

Refs #589
Made-with: Cursor

* fix(models): address PR #594 review feedback

Greptile P1: ProviderRepository.load emitted its DeprecationWarning inside a `try/except Exception` block. Under `filterwarnings("error", DeprecationWarning)` the warn would raise, the except would swallow it, and `load()` would silently return None (losing the registry). Move the warn outside the catch-all so the strict-warning path no longer drops valid configs.

Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and `_warn_on_explicit_default` use `stacklevel=2`, which lands inside pydantic v2's validator dispatch rather than at the user's `ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke both attribution (the source line was unhelpful) and Python's once-per-location dedup (every call collapsed to the same pydantic-internal key, suppressing all but the first warning). Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`, which walks past the helper, validator, and any pydantic frames to find the user's call site and emits via `warnings.warn_explicit` with the user frame's `__warningregistry__`. Keeps attribution accurate and dedup keyed on the user's (filename, lineno).

johnnygreco: align the `provider_repository.py` warning copy with the sibling site in `default_model_settings.py` ("specify provider= explicitly on each ModelConfig instead") so both YAML-default warning sites give the same migration instruction. The previous wording pointed users at "ModelConfig entries" inside `model_providers.yaml`, where ModelConfig entries don't actually live.

johnnygreco: dedup the cascade in `DataDesigner.__init__`. With `model_providers=None` and a YAML `default:`, the user previously saw two DeprecationWarnings for the same root cause: `get_default_provider_name()` warns about the YAML key, then `resolve_model_provider_registry(...)` re-warns from `_warn_on_explicit_default`. Suppress the registry-level duplicate in the YAML-fallback branch via `warnings.catch_warnings()` so users see exactly one warning per user action.

johnnygreco: tighten `_warn_on_explicit_default` to fire only when `default is not None`. Passing `default=None` explicitly is semantically equivalent to omitting it (caller is opting *out* of a registry-level default), and shouldn't trigger the deprecation nudge.

johnnygreco: add a `model_validate({...})` regression test for `ModelConfig` so the deserialization path (legacy on-disk configs) is pinned alongside the construction path.

Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the roundtrip stops exercising the deprecated YAML-default path unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`, `test_get_provider`, and `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default` in `pytest.warns` so the suite stays green under `-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning` to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to pin the cascade-dedup behavior.

Refs #589
Made-with: Cursor

* fix(models): make deprecation warnings visible under default filters

andreatgretel (PR #594): the YAML-default warning in `get_default_provider_name` and the registry-default warning emitted from inside DataDesigner helpers were attributing to data_designer library frames, not user code. Python's default filter chain includes `ignore::DeprecationWarning`, so library-attributed entries are silenced, meaning a normal `DataDesigner()` call with a YAML `default:` set showed nothing, and `resolve_model_provider_registry` warnings were similarly invisible.

Two related changes:
1. `warn_at_caller`: extend the default skip-list from `("pydantic",)` to `("pydantic", "pydantic_core", "data_designer")` so the walk escapes both pydantic's validator-dispatch frames and data_designer helper frames before attributing. Also tighten the prefix predicate to exact-or-dotted-prefix matching (`name == p or name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not falsely matched as part of `pydantic` (johnnygreco nit). Allow callers to pass a custom `skip_prefixes` for flexibility. Drop the "skip frame 0+1 unconditionally" guard now that prefix matching covers it.
2. `get_default_provider_name`: switch from `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous stacklevel pointed into `default_model_settings.py`, which is a library file → silenced under default filters. Verified the fix empirically with `python -W default`: warning is now attributed to the user's call site and rendered.

johnnygreco (PR #594): add the missing `test_explicit_default_none_does_not_emit_deprecation_warning` regression for the `self.default is not None` predicate landed in the prior round.

Tests:
- New `test_warning_helpers.py` pins prefix-matching precision (rejects `pydantic_helpers` / `data_designer_other`), default skip-list contents, attribution past skip-prefix frames, and per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame` pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the multi-frame case: construction goes through `resolve_model_provider_registry`, so the walk has to escape both pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning` pins johnnygreco's predicate-tightening regression.

3,124 tests pass (540 config + 1,923 engine + 653 interface; +10 net from this round).

Refs #589
Made-with: Cursor

* fix(models): apply warn_at_caller to remaining deprecation sites

greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`, which attributes to whichever data_designer frame called `load()`: controllers, services, list/reset commands, agent introspection. Every real call path lands on `data_designer.cli.*`, which falls under Python's default `ignore::DeprecationWarning` filter and is silenced.

Audit found two more sites with the same problem:
- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` / issue #552) was using `stacklevel=4` to walk past `_resolve_async_compatibility -> build/build_preview -> interface -> user`. Brittle: any added frame (decorator, async wrapping, the `try/except DeprecationWarning: raise` boundary) shifts attribution silently. The existing test passed only because it used `simplefilter("always") + record=True`, which records warnings regardless of attribution.
- `ProviderController._handle_change_default` was using `stacklevel=2`, which lands on the menu dispatcher in the same controller module. `print_warning` already shows the message visually, but programmatic observers (`pytest.warns`, `filterwarnings("error", ...)`) saw a library-attributed entry that default filters silenced.

All three migrated to `warn_at_caller` (the helper from 247fa30) so attribution lands on the user's call site regardless of internal chain shape. `data_designer` is already in `DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library in one pass. Added attribution regression tests at each site asserting `warning.filename == __file__`.

A future regression to `warnings.warn(stacklevel=N)` now fails CI instead of silently silencing the user-facing nudge:
- `test_load_with_yaml_default_attributes_warning_to_caller` (test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten from `pytest.warns(...)` to a `catch_warnings(record=True)` block that filters for the message and asserts `filename == __file__` (`pytest.warns` does not check attribution, so the rewrite is required to actually catch the regression).

3,125 tests pass (548 config + 1,923 engine + 654 interface).

Refs #589
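As a rough illustration of the approach these commit messages describe, the sketch below walks the stack past frames whose module matches a skip prefix and emits through `warnings.warn_explicit` against the first remaining frame. It is not the actual `data_designer` helper: the signature, the fallback branch, and the choice to start the walk at the helper's caller are assumptions. Only the prefix tuple and the exact-or-dotted-prefix predicate come from the text above.

```python
import sys
import warnings

# Skip-list quoted in the commit message above.
DEFAULT_INTERNAL_PREFIXES = ("pydantic", "pydantic_core", "data_designer")


def is_internal(module_name, prefixes=DEFAULT_INTERNAL_PREFIXES):
    """Exact-or-dotted-prefix match, so `pydantic_helpers` is not treated as `pydantic`."""
    return any(module_name == p or module_name.startswith(p + ".") for p in prefixes)


def warn_at_caller(message, category=DeprecationWarning, skip_prefixes=DEFAULT_INTERNAL_PREFIXES):
    """Attribute the warning to the first frame outside `skip_prefixes` (assumed signature)."""
    frame = sys._getframe(1)  # start at whoever called this helper
    while frame is not None and is_internal(frame.f_globals.get("__name__", ""), skip_prefixes):
        frame = frame.f_back
    if frame is None:
        # Assumption: if every frame is internal, fall back to a plain warn().
        warnings.warn(message, category, stacklevel=2)
        return
    warnings.warn_explicit(
        message,
        category,
        frame.f_code.co_filename,
        frame.f_lineno,
        module=frame.f_globals.get("__name__", "<unknown>"),
        # Dedup is keyed on the attributed module's registry, not a library-internal one.
        registry=frame.f_globals.setdefault("__warningregistry__", {}),
        module_globals=frame.f_globals,
    )
```

A regression test in the spirit of the ones listed above would record warnings with `warnings.catch_warnings(record=True)` and assert that the recorded entry's `filename` equals `__file__`, since `pytest.warns` does not check attribution.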
…, fingerprint feature available
- Update allow_resize framing: now logs DeprecationWarning and falls back to sync (#553), no longer hard-rejected. Async is default as of #592.
- Reference DataDesignerConfig.fingerprint() (#587) as the per-stage hash for resume invalidation.
- Rename _validate_async_compatibility() to _resolve_async_compatibility() to match current code.
- Mark Phase 2 step 1 as done; list the concrete docs that still need updates.
…ant, on-disk handoffs, DAG-ready, acreate sidecar
- Resolve in-memory vs on-disk handoff to always-on-disk inside Pipeline; reserve in-memory for to_config_builder() notebook ergonomic.
- Add Composability section: parent DataDesigner reuse is a load-bearing API contract for throttle coordination across stages and parallel branches.
- Add Engine API surface section: acreate() as a small additive sidecar, independent of chaining v1 but a hard dependency for Phase 4.
- Promote DAG semantics from "future work" to "designed-in"; add Phase 4 (parallel branches via asyncio.gather over acreate); demote auto-chaining to Phase 5.
- New Resolved decisions section captures the three load-bearing API decisions; trim the Open questions list accordingly.
- Mention possible future external orchestration only as a vague composability constraint, no commitment.
- Soften "Door open for external orchestration" - drop throttle-backend-as-seam framing; cross-reference Future considerations.
- Make acreate() scope explicit (in-process); cross-process orchestration is not the same problem.
- Add Phase 4 scope clarifier - branch parallelism, not stage pipelining.
- New Future considerations section: external orchestration (vague, uncommitted) and pipelined execution of dependent stages.
Review: PR #552 - docs: add plan for workflow chaining
Summary
This PR adds a single planning document at
No code changes.
Findings
Architectural alignment - strong
Completeness - mostly complete, with a few areas to tighten before implementation
Feasibility - feasible as structured
Minor observations
Verdict
Strong plan. Architecturally aligned with the project's layering and invariants, feasibility-grounded in existing primitives, and thoughtful about forward compatibility (DAG-internal-v1, acreate-as-sidecar). The one place the plan would benefit from a tightening pass before implementation is the stage data contract and fingerprint scope: callback output layout, callback fingerprinting for resume, and inclusion of … Recommend approving the plan with a note to resolve items 1–3 above before starting Phase 1 implementation.
Greptile Summary
This PR adds a design plan for workflow chaining in DataDesigner: a

| Filename | Overview |
|---|---|
| plans/workflow-chaining/workflow-chaining.md | New design plan for composite workflow chaining. Most design gaps from earlier reviews (resume identity, duplicate names, DAG fingerprint ordering, callback output validation, allow_empty contract) are explicitly addressed. Two unresolved items remain: Phase 4's sync→async bridge inside workflow.run() is unspecified, and the behavior of a failed stage during resume is ambiguous. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[DataDesigner.compose_workflow name=required] --> B[CompositeWorkflow]
B --> C[add_stage name, config, num_records]
C --> D{Duplicate name?}
D -- Yes --> E[raise DataDesignerWorkflowError]
D -- No --> F[Stage added to DAG]
F --> G[workflow.run]
G --> H{Phase 3: check workflow-metadata.json}
H -- fingerprint match + completed --> I[Skip stage]
H -- fingerprint match + partial --> J[create resume=ALWAYS via #526]
H -- fingerprint mismatch --> K[Run stage fresh]
K --> L[DataDesigner.create]
J --> L
L --> M{on_success callback?}
M -- Yes --> N[callback writes resolved path]
M -- No --> O[stage parquet output]
N --> O
O --> P{allow_empty?}
P -- No, 0 rows --> Q[raise DataDesignerWorkflowError]
P -- Yes, 0 rows --> R[completed_empty downstream skipped_empty_upstream]
P -- rows > 0 --> S[completed seeds next stage]
S --> T[Next stage]
T --> G
S --> U[workflow-metadata.json]
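The first branch of the flowchart (unique stage names) could look roughly like the sketch below. The class body, the constructor, and the `StageSpec` record are hypothetical; only the names `CompositeWorkflow`, `add_stage`, and `DataDesignerWorkflowError` appear in the review.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class DataDesignerWorkflowError(Exception):
    """Stand-in for the error named in the flowchart."""


@dataclass
class StageSpec:
    name: str
    config: Any
    num_records: Optional[int] = None  # None: default to the previous stage's output size


class CompositeWorkflow:
    """Hypothetical skeleton: stores stages in insertion order and rejects duplicate names."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._stages: Dict[str, StageSpec] = {}

    def add_stage(self, name: str, config: Any, num_records: Optional[int] = None) -> "CompositeWorkflow":
        if name in self._stages:
            raise DataDesignerWorkflowError(f"duplicate stage name: {name!r}")
        self._stages[name] = StageSpec(name, config, num_records)
        return self  # allow chained add_stage() calls

    @property
    def stage_names(self) -> List[str]:
        return list(self._stages)
```

Failing at `add_stage()` time rather than at `run()` keeps stage names usable as stable keys for per-stage directories and metadata entries.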
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
plans/workflow-chaining/workflow-chaining.md:393-408
**`workflow.run()` sync→async bridge unspecified for Phase 4**
Phase 4 says `workflow.run()` will gather parallel branches via `asyncio.gather` over `acreate()` calls internally, but `workflow.run()` is synchronous in v1. The sidecar section says `acreate()` bridges the background-loop future "into the caller's loop via `asyncio.wrap_future`", which requires a running event loop at the call site. A synchronous `workflow.run()` calling `asyncio.gather(...)` would need `asyncio.run(...)` or `loop.run_until_complete(...)`, both of which raise `RuntimeError` if called inside Jupyter's already-running event loop — exactly the environment highlighted as a primary use case.
The plan should specify whether Phase 4 introduces a complementary `workflow.arun()` (leaving `workflow.run()` sync-safe), or whether `workflow.run()` becomes async (a breaking change), or whether it falls back to the same singleton-background-loop pattern already used by `_build_async` for its concurrent.futures bridging. Leaving this unresolved will force the decision at implementation time when API compatibility is harder to change.
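To make the third option concrete, here is a minimal sketch of a background-loop bridge that keeps a synchronous entry point usable even when the caller already has a running event loop, as in Jupyter. `BackgroundLoop`, `fake_acreate`, and `run_branches_sync` are hypothetical names, and the real `_build_async` bridging may differ.

```python
import asyncio
import threading


class BackgroundLoop:
    """Owns an event loop on a daemon thread; sync callers submit coroutines to it."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Blocks the calling thread until the coroutine finishes on the background loop.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_acreate(stage_name: str) -> str:
    """Placeholder for an async stage run; sleeps instead of generating data."""
    await asyncio.sleep(0.01)
    return stage_name


async def _gather_branches(stage_names):
    # Independent branches run concurrently on the background loop.
    return await asyncio.gather(*(fake_acreate(n) for n in stage_names))


def run_branches_sync(bg: BackgroundLoop, stage_names):
    """Synchronous entry point that never calls asyncio.run() in the caller's thread."""
    return bg.run(_gather_branches(stage_names))
```

The trade-off is that the calling thread blocks, so a caller's own event loop makes no progress while the workflow runs; a separate `arun()` would avoid that at the cost of a second entry point.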
### Issue 2 of 2
plans/workflow-chaining/workflow-chaining.md:381-391
**`failed` stage resume treatment is unspecified**
Phase 3 says "skip stages whose fingerprints match and are complete" and delegates partial stages to `create(..., resume=ResumeMode.ALWAYS)`, but it never defines how a `failed` stage is classified. A stage can fail after writing several batches of parquet output (partial) or before writing any output (empty). In the first case `ResumeMode.ALWAYS` would recover intra-stage progress via #526; in the second it's effectively a clean restart. Without an explicit rule, implementers must make this call ad-hoc. The Phase 3 bullet list should add: a `failed` stage with existing partial output in its directory is treated the same as a partial stage (delegate to `ResumeMode.ALWAYS`); a `failed` stage with an empty or missing directory is started fresh.
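The rule proposed in Issue 2 can be written down as a small decision function. This is a sketch under assumptions: the status strings `"partial"` and `"failed"`, the `ResumeAction` enum, and the "any parquet file under the stage directory" test for partial output are all hypothetical, and statuses such as `completed_empty` are not covered.

```python
from enum import Enum
from pathlib import Path


class ResumeAction(Enum):
    SKIP = "skip"
    RESUME = "resume"  # would map to create(..., resume=ResumeMode.ALWAYS)
    FRESH = "fresh"


def has_partial_output(stage_dir: Path) -> bool:
    """True if the stage directory exists and holds at least one parquet file."""
    return stage_dir.is_dir() and any(stage_dir.rglob("*.parquet"))


def decide_resume_action(status: str, fingerprint_matches: bool, stage_dir: Path) -> ResumeAction:
    if not fingerprint_matches:
        return ResumeAction.FRESH  # config or upstream changed: recorded output is stale
    if status == "completed":
        return ResumeAction.SKIP
    if status in {"partial", "failed"} and has_partial_output(stage_dir):
        return ResumeAction.RESUME  # recover intra-stage progress
    return ResumeAction.FRESH  # failed before writing anything, or directory missing
```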
Reviews (9): Last reviewed commit: "docs: require unique composite workflow ..."
### Part 1: Pipeline class

A new `Pipeline` class in `data_designer.interface` that orchestrates multi-stage generation.
What do you think about using a name like WorkflowChain instead? I'm a bit concerned that Pipeline sounds too much like we are exposing the "Data Designer pipeline builder" (which isn't a thing haha).
Is it strictly going to be a linked list, or can it also be a DAG?
good call - changed the public api to CompositeWorkflow via data_designer.compose_workflow(name=...). the plan now says v1 exposes linear composition, but the internal stage model is dag-shaped, so this is not strictly a linked list.
johnnygreco left a comment
Can't wait for this to be built!
Are you thinking that each stage can have a different RunConfig? Something to think about for the future is a world where different stages can run on different compute, either in parallel or serially.
pipeline.add_stage("personas", config_personas, num_records=100)
pipeline.add_stage("conversations", config_convos, num_records=1000)  # explode: 100 -> 1000
pipeline.add_stage("judged", config_judge)  # defaults to previous stage's output size
Here we assume the seed for the conversations stage is what's in the parquet-files folder generated by personas. What if conversations actually depended on the outcome of the schema transform post-processor?
clarified the v1 contract: downstream stages seed from the upstream final dataset only. named processor outputs, schema-transform artifacts, dropped columns, and media need an on_success bridge in v1, with first-class artifact seeding called out as future work.
result.to_config_builder(columns=["name", "age", "background"])  # optional column selection
.add_column(name="conversation", column_type="llm_text", prompt="...")
)
result_2 = dd.create(config_convos, num_records=1000)
wouldn't this re-do all the columns in the config_personas?
another way, perhaps, is another seed source which can operate from the PreviewResults class to connect the dots?
clarified that to_config_builder() starts a new config seeded from existing results, rather than mutating or resuming the original config. kept PreviewResults in phase 1, and richer result/preview seed ergonomics can build on that.
**Auto-chaining from a single config (future):**

The engine detects columns that were previously `allow_resize=True` (or a new marker like `stage_boundary=True`) and auto-splits the DAG into stages. This is a convenience layer on top of the explicit API - not required for v1.
This is probably overkill... even for the future!?
agreed, demoted this out of the phase plan. it is now just a future consideration and explicitly not part of the initial roadmap.
Each stage seeds from the **previous stage's final dataset** - the post-processor output with dropped columns excluded. This is the same DataFrame returned by `DatasetCreationResults.load_dataset()`.

Processor outputs (named processor artifacts) and media assets (images stored on disk with relative paths in the DataFrame) are NOT automatically forwarded. If a downstream stage needs image columns from an upstream stage, the pipeline must resolve image paths relative to the upstream stage's artifact directory. This needs explicit handling - TBD in implementation.
As a bridge, we can document how the downstream stage can use expression column configs to modify the relative paths of the media files.
added this bridge to the media open question: v1 can document using on_success or expression columns to rewrite relative media paths against the upstream artifact directory.
pipeline.add_stage(
    "enriched",
    config_enrich,
    after=filter_high_quality,  # runs on stage output before next stage seeds from it
Perhaps name these to look like a callback?
on_success=filter_high_quality
done, renamed the hook to on_success / on_success_version.
)

The callback receives the path to the completed stage's artifact directory (containing `parquet-files/`, `metadata.json`, etc.) and returns a path that the next stage will seed from. This keeps large DataFrames on disk and gives users full control.
Where the callback should dump files seems open-ended. Should we enforce some structure, like another sub-folder that lives alongside parquet_files?
Another question: how do we envision push to hf working in this scenario? Can only the last stage push?
added a managed callback output convention under <stage-dir>/callback-outputs/<callback-name>/. also made export/push a v1 decision: helpers default to the final stage dataset, while selected-stage or full workflow bundle export stays future work.
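A callback that follows this convention might look like the sketch below. It uses only the standard library and copies files as a stand-in for real row-level filtering. `managed_output_dir` and `keep_first_batch` are hypothetical names; the directory layout is the `<stage-dir>/callback-outputs/<callback-name>/` convention from the reply above.

```python
import shutil
from pathlib import Path


def managed_output_dir(stage_dir: Path, callback_name: str) -> Path:
    """Create and return <stage-dir>/callback-outputs/<callback-name>/."""
    out_dir = stage_dir / "callback-outputs" / callback_name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def keep_first_batch(stage_dir: Path) -> Path:
    """Hypothetical on_success callback: takes the stage artifact dir, returns the next seed path."""
    out_dir = managed_output_dir(stage_dir, "keep_first_batch")
    batches = sorted((stage_dir / "parquet-files").glob("*.parquet"))
    for batch in batches[:1]:  # stand-in for a real filter over rows
        shutil.copy2(batch, out_dir / batch.name)
    return out_dir
```

Keeping callback output under the stage directory means the upstream `parquet-files/` stay untouched, so a rerun of the callback does not require regenerating the stage.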
**Callback resume policy**: The pipeline does not hash arbitrary Python source or bytecode in v1. `after_version` is the explicit callback identity recorded in `pipeline-metadata.json` and included in the next stage's fingerprint. If `after` is set without `after_version`, that stage is treated as dirty on every resume so a changed callback cannot silently reuse stale transformed data. The resolved path returned by the callback is also recorded as the dependent stage's seed path; a stage seeded from callback output is skippable only if that recorded path still exists and is readable by `LocalFileSeedSource`.

**Empty stage policy**: If a callback filters all rows (or a stage produces zero rows), the pipeline raises `DataDesignerPipelineError` by default. Stages can opt in to empty output with `allow_empty=True` on `add_stage()`, in which case the pipeline marks that stage as `completed_empty` and all downstream stages as `skipped_empty_upstream`. `PipelineResults` still contains every declared stage name: executed stages map to `DatasetCreationResults`, while skipped downstream stages map to `SkippedStageResult` with `status="skipped_empty_upstream"` and `upstream_stage=<stage_name>`. This avoids `KeyError`/`None` ambiguity and gives resume a durable state distinct from normal completion.
What if we added an on_failure callback? By default it's a no-op, but folks can choose how they want to handle situations like all rows being filtered, etc.
added this as future scope. v1 keeps failure behavior simple and raises by default; a later on_failure hook can support cleanup or custom recovery.
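The empty-stage policy quoted in this thread can be sketched as a single pass over the stages in order. Everything here is illustrative: `StageOutcome` is a hypothetical stand-in for `DatasetCreationResults`, row counts are supplied directly instead of running stages, and the sketch assumes `upstream_stage` names the stage that produced zero rows.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


class DataDesignerPipelineError(Exception):
    """Stand-in for the error named in the plan excerpt."""


@dataclass
class StageOutcome:  # hypothetical stand-in for DatasetCreationResults
    status: str
    num_rows: int


@dataclass
class SkippedStageResult:
    status: str
    upstream_stage: str


def resolve_outcomes(
    row_counts: Dict[str, int], allow_empty: Dict[str, bool]
) -> Dict[str, Union[StageOutcome, SkippedStageResult]]:
    """Walk stages in declaration order; `row_counts` fakes what each stage would produce."""
    results: Dict[str, Union[StageOutcome, SkippedStageResult]] = {}
    empty_upstream: Optional[str] = None
    for name, rows in row_counts.items():
        if empty_upstream is not None:
            # Every stage after an empty one is recorded, not silently dropped.
            results[name] = SkippedStageResult("skipped_empty_upstream", empty_upstream)
        elif rows > 0:
            results[name] = StageOutcome("completed", rows)
        elif allow_empty.get(name, False):
            results[name] = StageOutcome("completed_empty", 0)
            empty_upstream = name
        else:
            raise DataDesignerPipelineError(f"stage {name!r} produced zero rows")
    return results
```

Because every declared stage name ends up as a key, callers can index the results without guarding against `KeyError` or `None`.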
The `Pipeline` is constructed via `dd.pipeline(name=...)` and holds a reference to the parent `DataDesigner`. Every stage runs `dd.create()` (or `dd.acreate()` once available - see Engine API surface below) on that same instance. This is a load-bearing API contract for two reasons.

**Throttle coordination across stages.** A `DataDesigner` owns one `ModelRegistry`, which owns one `ThrottleManager`. AIMD rate-limit state is per-instance. If the pipeline constructed a fresh `DataDesigner` per stage, each stage would adapt independently and the aggregate request rate against a provider could exceed the configured cap by a multiple of the stage count. The same hazard applies to parallel branches in Phase 4: branches sharing one `DataDesigner` automatically share throttling; branches each holding their own `DataDesigner` silently fragment it. Reusing one instance is the simple, correct default.

**Door open for external orchestration.** The pipeline's choice to reuse one `DataDesigner` is the in-process strategy: shared throttling across stages, branches gathered in the orchestrator process. A cross-process strategy is a separate but compatible model - see Future considerations. v1 only needs to avoid encoding assumptions that would prevent it.
Based on the snippets above, the pipeline looks like config code:
pipeline = dd.pipeline # because we do import data_designer.config as dd
pipeline.run
What is the contract between the pipeline and the DataDesigner instance, since it's planned to be shared?
Something seems off here.
good catch, the examples were confusing because dd usually means data_designer.config. changed them to use data_designer.compose_workflow(...) and clarified that CompositeWorkflow is created from, and holds onto, the parent DataDesigner instance.
@johnnygreco added this under future considerations and clarified the split: v1 stages can already use different |
Summary
`allow_resize` and simplification of sync/async engine convergence (deprecation already shipped in chore: async engine readiness - blockers and polish before default #553).

What's in the plan
- `add_stage()`, `run()`, between-stage callbacks. Reuses the parent `DataDesigner` so all stages share one `ModelRegistry`/`ThrottleManager`.
- `to_config_builder()` convenience on results for lightweight notebook chaining.
- `LocalFileSeedSource`. In-memory `DataFrameSeedSource` is reserved for the `to_config_builder()` notebook ergonomic and is explicitly not a Pipeline.
- `depends_on=[...]`).
- `acreate()` engine sidecar. Small additive async API on `DataDesigner`. Independent of chaining v1; hard dependency for Phase 4. Enables in-process parallel-independent workflows via `asyncio.gather`.
- `allow_resize` removal following the deprecation already in `main` from chore: async engine readiness - blockers and polish before default #553.
- `DataDesignerConfig.fingerprint()` (feat(config): add deterministic fingerprint for workflow configs #587) composed with `num_records`, DD version, and upstream stage fingerprint.

Phases
- `to_config_builder()` (can ship independently).
- `acreate()` on `DataDesigner` (independent track; can land before/alongside/after Phase 1).
- `allow_resize` (deprecation already shipped in chore: async engine readiness - blockers and polish before default #553; this phase finishes the removal).
- `asyncio.gather` over `acreate()`. Hard dependency on the sidecar.

Resolved decisions
- `Pipeline`. In-memory mode reserved for `to_config_builder()`.
- `Pipeline` is constructed via `dd.pipeline()` and reuses the parent `DataDesigner` across all stages - load-bearing for throttle coordination.

Future considerations (uncommitted)
- `DataDesigner` reuse, on-disk handoffs, no new engine surface) compose naturally with such a system.

Open questions
Preview support, config serialization for auto-chaining, naming, image/media column forwarding, downstream seeding scope.
No code changes - plan document only.
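For illustration only, the stage fingerprint composition listed in the plan summary (config fingerprint, `num_records`, DD version, upstream stage fingerprint) could be sketched as follows. The function name, the JSON-plus-SHA-256 encoding, and the string inputs are assumptions; the real `DataDesignerConfig.fingerprint()` output format is not shown in this PR.

```python
import hashlib
import json
from typing import Optional


def stage_fingerprint(
    config_fingerprint: str,
    num_records: int,
    dd_version: str,
    upstream_fingerprint: Optional[str],
) -> str:
    """Hash the four inputs named in the plan into one hex digest (assumed encoding)."""
    payload = json.dumps(
        {
            "config": config_fingerprint,
            "num_records": num_records,
            "dd_version": dd_version,
            "upstream": upstream_fingerprint,  # None for a root stage
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Folding the upstream fingerprint into each stage's hash means a change to an early stage invalidates every stage downstream of it, which is the property resume needs.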