Priority Level
High
Task Summary
Transform the dataset builder from sequential column-by-column processing into an async task queue with dependency-aware scheduling. Generators become async-first, and the builder dispatches individual cell/batch tasks as soon as their upstream dependencies are satisfied — enabling pipeline parallelism across columns and rows.
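As a rough illustration of the scheduling idea in the summary above, here is a minimal asyncio sketch, not the actual builder code. The dependency map `REQUIRED_COLUMNS`, the stand-in `generate_cell`, and the helpers `build_row` and `build_dataset` are all hypothetical names invented for this example. It also uses one `asyncio.Event` per cell instead of an explicit task queue, which is a simplification of the design described here.

```python
import asyncio

# Hypothetical dependency map: column name -> upstream columns it reads.
# In the plan this would be derived from each config's required_columns.
REQUIRED_COLUMNS = {
    "topic": [],
    "question": ["topic"],
    "answer": ["question"],
    "summary": ["topic", "answer"],
}


async def generate_cell(column: str, row: dict) -> str:
    """Stand-in for an async generator call (e.g. an LLM request)."""
    await asyncio.sleep(0.01)
    upstream = ",".join(row[c] for c in REQUIRED_COLUMNS[column])
    return f"{column}({upstream})"


async def build_row(row_id: int, semaphore: asyncio.Semaphore, log: list) -> dict:
    row: dict = {}
    # One event per column, set once that cell has been written for this row.
    done = {col: asyncio.Event() for col in REQUIRED_COLUMNS}

    async def run(column: str) -> None:
        # Wait only for this cell's own upstream cells, not for whole columns.
        for dep in REQUIRED_COLUMNS[column]:
            await done[dep].wait()
        async with semaphore:  # bound the number of in-flight generator calls
            row[column] = await generate_cell(column, row)
        log.append((row_id, column))
        done[column].set()

    await asyncio.gather(*(run(col) for col in REQUIRED_COLUMNS))
    return row


async def build_dataset(num_rows: int, max_concurrency: int = 4):
    # The semaphore is shared across all rows and columns.
    semaphore = asyncio.Semaphore(max_concurrency)
    log: list = []  # completion order, as (row_id, column) pairs
    rows = await asyncio.gather(
        *(build_row(i, semaphore, log) for i in range(num_rows))
    )
    return rows, log


if __name__ == "__main__":
    rows, log = asyncio.run(build_dataset(num_rows=3))
    print(rows[0])
    print(log)
```

Because each cell waits only on its own upstream cells, one row's `question` can start while another row's `topic` is still in flight, which is the pipeline parallelism across columns and rows that the summary refers to. The shared semaphore caps total concurrency regardless of how many cells are ready.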
Technical Details & Implementation Plan
Dependency map: built from each column config's required_columns property (Jinja2 template introspection); no config schema changes needed
[...] _run_batch loop; dispatches tasks as dependencies are met, bounded by semaphore
[...] agenerate methods (LLM generators already have native async from "feat(engine): env-var switch for async-first models experiment" #280)
Dependencies
Part of #260. Builds on #280 (merged). Related: #269, #344.