refactor: unify duplicate DAG construction (dag.py + ExecutionGraph)

## Summary

The codebase constructs a column-dependency DAG in two separate places using two different graph implementations:

1. **`dag.py`** (`topologically_sort_column_configs`) — uses **networkx** (`lazy.nx.DiGraph`) to build a graph from `required_columns`, `side_effect_columns`, and `skip.columns`, then returns a topologically sorted list of `ColumnConfigT`. Called once during config compilation in `config_compiler.py`.

2. **`ExecutionGraph`** (`execution_graph.py`) — uses a **hand-rolled** adjacency-list graph with Kahn's algorithm. Builds edges from the same three dependency sources. Created later at generator-init time by `_initialize_generators_and_graph`. Used by both sequential (skip logic) and async (scheduler) paths.

Both construct a DAG from the same dependency sources, both validate acyclicity, and both compute topological order. The only difference is timing (pre-compilation vs post-compilation) and what metadata they carry.

## Problems

1. **Duplicated edge logic** — `_add_dependency_edges` in `dag.py` and the second pass in `ExecutionGraph.create` implement the same side-effect resolution and skip-column edge logic. A bug fix in one must be mirrored in the other.
2. **Unnecessary networkx dependency** — networkx is a heavy library used in the DAG path for exactly one function (topological sort) that `ExecutionGraph` already implements natively.
3. **Two cycle checks** run on the same data at different times.

## Proposed Solution

Make `ExecutionGraph` the single graph abstraction. Add a `topologically_sort_column_configs` classmethod that accepts raw `ColumnConfigT` (like `dag.py` does today) without requiring strategies:

- Filters DAG vs non-DAG columns using `column_type_used_in_execution_dag`
- Builds edges from `required_columns`, `side_effect_columns`, `skip.columns`
- Runs Kahn's algorithm (already implemented in `get_topological_order`)
- Returns the sorted `ColumnConfigT` list

Then:
1. Update `config_compiler.py` to call `ExecutionGraph.topologically_sort_column_configs` instead of importing from `dag.py`.
2. Delete `dag.py`.
3. Update `test_dag.py` to import from `ExecutionGraph`.

## Files Involved

- `packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/dag.py` — delete
- `packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/execution_graph.py` — add classmethod
- `packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/config_compiler.py` — update import
- `packages/data-designer-engine/tests/engine/dataset_builders/utils/test_dag.py` — update import
- `architecture/dataset-builders.md` — update docs to reflect single graph

## Alternative

Extract a shared `ColumnDependencyGraph` base class with pure graph operations (nodes, edges, topo sort, cycle check), then have `ExecutionGraph` extend it with strategy/skip metadata. Cleaner separation but more refactoring.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) #510

Summary

Problems

Proposed Solution

Files Involved

Alternative

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) #510

Description

Summary

Problems

Proposed Solution

Files Involved

Alternative

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions