Comparing changes

* fix: cache notebook builds to avoid failures from flaky upstream models

  The build-notebooks CI executes all tutorial notebooks on every run. When an upstream model (e.g. black-forest-labs/flux.2-pro) is down, the entire docs build fails even if no notebooks changed.

  Add per-notebook caching based on source file SHA-256 hashes. Unchanged notebooks are served from cache, and only modified ones are re-executed. On the first CI run (empty cache), the workflow seeds the cache from the last successful build artifact.

  Also add a minimal test script (test_flux_image_gen.py) to reproduce the flux.2-pro health check failure locally.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments on notebook caching
  - Don't write .sha256 during seeding so changed notebooks are detected
  - Rename TMPDIR to SEED_TMPDIR to avoid shadowing the POSIX env var
  - Use portable sha256 helper (sha256sum with shasum fallback)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only seed cache when truly empty, restore hash writing

  Skip artifact seeding when a partial cache was restored (it already has correct per-file hashes). Only seed + write current hashes when the cache dir is completely empty (true bootstrapping).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restrict artifact seed lookup to main branch

  Prevents seeding from feature branch runs that may have different notebook sources.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add actions:read permission for artifact seeding

  The seed step uses gh run list and gh run download, which require actions:read. Without it, these calls silently fail and the cold-start cache bootstrapping never executes.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only use notebook cache when called from build-docs

  Scheduled Monday runs and manual workflow_dispatch should execute all notebooks to catch regressions (e.g. library changes that break a notebook). Caching is only used via workflow_call (from build-docs) where the goal is fast, resilient doc deployment.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use jq // empty to avoid "null" string on empty run list

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add use_cache input flag to notebook and docs workflows

  Replace event_name-based cache logic with an explicit use_cache boolean input. Defaults:
  - build-notebooks: workflow_call=true, dispatch=false, schedule=false
  - build-docs: dispatch=true (toggleable), release=false

  This gives full control over caching from the GitHub Actions UI.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
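The per-notebook cache check described in these commits can be sketched in Python. This is only an illustration of the idea, not the workflow's actual shell logic: the helper names and the one-`.sha256`-file-per-notebook layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(notebook: Path, cache_dir: Path) -> bool:
    """True when no cached hash exists or the notebook source has changed."""
    # Assumed layout: one "<notebook name>.sha256" stamp file per notebook.
    stamp = cache_dir / f"{notebook.name}.sha256"
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != sha256_of(notebook)


def record_build(notebook: Path, cache_dir: Path) -> None:
    """Store the current source hash after a successful execution."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    stamp = cache_dir / f"{notebook.name}.sha256"
    stamp.write_text(sha256_of(notebook))
```

A notebook is re-executed only when `needs_rebuild` returns True, and `record_build` is called after it runs successfully, so an outage in an upstream model cannot fail a build whose sources are unchanged.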

…pter (#359)

* plans for model facade overhaul
* update plan
* add review
* address feedback + add more details after several self reviews
* update plan doc
* address nits
* Add canonical objects
* self-review feedback + address
* add LiteLLMRouter protocol to strongly type bridge router param

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* simplify some things
* add a protocol for http response like object
* move HttpResponse
* update PR-1 architecture notes for lifecycle and router protocol

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR #359 feedback: exception wrapping, shared parsing, test improvements
  - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item)
  - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters
  - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image
  - Add retry_after field to ProviderError for future retry engine support
  - Fix _to_int_or_none to parse numeric strings from providers
  - Create test conftest.py with shared mock_router/bridge_client fixtures
  - Parametrize duplicate image generation and error mapping tests
  - Add tests for exception wrapping across all bridge methods

* Use contextlib to dry out some code

* Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity
  - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds
  - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding
  - Add tests for HTTP-date and garbage Retry-After values

* Address Greptile feedback: FastAPI detail parsing, comment fixes
  - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
  - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors
  - Document scope boundary for vision content in collect_raw_image_candidates

* address feedback

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
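The Retry-After handling mentioned above (numeric delay-seconds plus RFC 7231 HTTP-date, with garbage values ignored) might look roughly like the following. The function name and the injectable `now` parameter are assumptions for illustration; the project's actual helper may differ.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value: str | None, now: datetime | None = None) -> float | None:
    """Return the delay in seconds, or None if the header is absent or unparseable."""
    if value is None:
        return None
    text = value.strip()
    # Form 1: delay-seconds, a non-negative decimal integer.
    if text.isascii() and text.isdigit():
        return float(text)
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None  # garbage value
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    # A date in the past means "retry now", so clamp at zero.
    return max((when - current).total_seconds(), 0.0)
```

Passing `now` explicitly keeps the HTTP-date branch deterministic in tests.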

* fix: processor artifacts type annotation, discovery, and loading
  - Fix PreviewResults.processor_artifacts type from dict[str, list[str] | str] to dict[str, list[dict]], matching the actual data produced by df.to_dict(orient="records")
  - Add ArtifactStorage.load_processor_dataset() to centralize loading logic, handling both directory-based (batched) and single-file (preview) layouts
  - Add ArtifactStorage.list_processor_names() to discover processor outputs from disk rather than iterating config, fixing a bug where processors that don't write artifacts (e.g. DropColumnsProcessor) would crash the library's preview
  - Simplify DatasetCreationResults.load_processor_dataset() to delegate to ArtifactStorage

* fix: deduplicate list_processor_names when both dir and file exist

* refactor: extract standalone functions for processor discovery and loading

  Extract list_processor_names() and load_processor_dataset() as module-level functions that accept a Path, so consumers can use them without constructing an ArtifactStorage. The ArtifactStorage methods now delegate to these.

* refactor: move standalone functions to io_helpers in config package

  Move list_processor_names() and load_processor_dataset() to data_designer.config.utils.io_helpers so they're usable by any consumer that depends on the lightweight config package, without needing the engine's ArtifactStorage. The standalone functions raise FileNotFoundError; the ArtifactStorage methods wrap this as ArtifactStorageError.

* chore: trim docstrings and deduplicate tests

* fix: align get_processor_file_paths with list_processor_names

  Reuse list_processor_names so both methods discover directories and single parquet files consistently.
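As a rough sketch of disk-based discovery, the function below lists processor names from a directory that may contain subdirectories (batched layout) and single parquet files (preview layout), deduplicating when both exist. The function name, the `.parquet` suffix check, and the sorted return order are assumptions; the real `list_processor_names()` may behave differently.

```python
from __future__ import annotations

from pathlib import Path


def list_processor_names_sketch(processors_dir: Path) -> list[str]:
    """Discover processor names from disk instead of iterating config."""
    if not processors_dir.is_dir():
        raise FileNotFoundError(f"No processor outputs found at {processors_dir}")
    names: set[str] = set()
    for entry in processors_dir.iterdir():
        if entry.is_dir():
            names.add(entry.name)  # directory-based (batched) layout
        elif entry.suffix == ".parquet":
            names.add(entry.stem)  # single-file (preview) layout
    # The set removes duplicates when both "name/" and "name.parquet" exist.
    return sorted(names)
```

Because discovery starts from what was actually written, a processor that produces no artifacts simply does not appear in the result.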

… scheduler (#356)

* feat: add ExecutionGraph, CompletionTracker, and Task model for async scheduler

  Add the foundational data structures for the async task-queue dataset builder (plan #346, PR 1/4):
  - ExecutionGraph: column-level static DAG with topological ordering, critical path, task counts, cell-dependency resolution, Mermaid output, and side-effect column mapping (__trace, __reasoning_content).
  - CompletionTracker: lightweight (column, row_group, row_index) completion state with row dropping and ready-task enumeration.
  - Task/TaskResult/TaskTrace: frozen hashable task dataclass, result container, and opt-in tracing record.

  All three are pure data structures with no side effects on the existing codebase. They live in new modules under engine/dataset_builders/utils/ and are only imported by code introduced in later PRs.

  56 unit tests covering graph construction, validation, dependency resolution, completion tracking, row drops, and task model semantics.

  Refs #346

* refactor: extract readiness helpers and cache topological order

  Add `is_ready` and `is_batch_ready` methods to CompletionTracker to simplify `ready_tasks`. Cache topological order in ExecutionGraph since the graph is immutable after construction. Move DatasetBuilderColumnConfigT type alias to multi_column_configs. Fix license header years.

* refactor: address PR review feedback
  - Rename all_complete → is_all_complete for boolean method convention
  - Add ColumnName, RowGroup, RowIndex type aliases for readability
  - Add public mutation API to ExecutionGraph (add_column, add_edge, set_side_effect, resolve_side_effect) and rewrite build_execution_graph to use it instead of private attributes
  - Change TaskTrace.from_task from @staticmethod to @classmethod

* refactor: address remaining PR review feedback
  - Rename RowGroup type alias to RowGroupIndex for consistency
  - Convert ExecutionGraph from dataclass to plain class
  - Move build_execution_graph logic to ExecutionGraph.create() classmethod

* refactor: event-driven frontier for CompletionTracker

  Replace the poll-based get_ready_tasks (O(C × R × G) per tick) with an event-driven frontier maintained on mark_complete/mark_batch_complete/drop_row. get_ready_tasks now returns O(frontier) instead of scanning all columns × rows × row groups.

* refactor: extract ready_ctx fixture in completion tracker tests
  - Add ReadyTasksFixture dataclass and ready_ctx pytest fixture to deduplicate graph/tracker/dispatched setup across get_ready_tasks tests
  - Align test with ExecutionGraph.create API rename
  - Remove redundant inline comments

* fix: validate tracker args and resolve side-effect name collisions
  - CompletionTracker now raises ValueError when graph/row_groups are provided without each other
  - resolve_side_effect prefers real columns over aliases when a name collision exists

* refactor: address PR review feedback: naming, CellRef, batch semantics
  - Fix critical_path() crash on empty graph (early return)
  - Fix is_all_complete batch semantics via _batch_complete tracking set
  - Add row-group size mismatch validation in mark_row_range_complete
  - Add unknown row_group validation in mark_cell_complete
  - Rename methods for verb-prefix convention: upstream → get_upstream_columns, downstream → get_downstream_columns, critical_path → get_longest_dependency_chain, mark_complete → mark_cell_complete, mark_batch_complete → mark_row_range_complete
  - Introduce CellRef NamedTuple, remove ColumnName/RowGroupIndex/RowIndex aliases
  - Delete deprecated build_execution_graph() wrapper
  - Return defensive copy from topological_order()
  - Add regression tests for fixed bugs

* fix: prevent completed tasks from re-entering the frontier

  Skip adding downstream tasks to the frontier when they are already marked complete, avoiding redundant work in CompletionTracker.

* harden completion tracker and execution graph APIs
  - Enforce strategy-safe completion: mark_cell_complete rejects non-CELL_BY_CELL columns, mark_row_range_complete rejects CELL_BY_CELL columns (ValueError in graph mode)
  - Return defensive copies from ExecutionGraph public API (columns, get_upstream/downstream_columns)
  - Add re-enqueue regression tests for cell and batch paths
  - Add immutability tests for ExecutionGraph collections

* address second round of review feedback
  - Reject duplicate column names in add_column with ValueError
  - Validate buffer_size > 0 in task_count
  - Use _batch_complete for batch upstream readiness checks
  - Remove duplicate section header in test file

* fix AGENTS.md compliance violations
  - Add `from __future__ import annotations` to 5 files missing it
  - Rename ExecutionGraph methods to start with action verbs (strategy → get_strategy, topological_order → get_topological_order, upstream_by_strategy → split_upstream_by_strategy, task_count → compute_task_count, cell_dependencies → compute_cell_dependencies)
  - Reorder methods in CompletionTracker and ExecutionGraph: __init__ → properties → classmethods → public → private

* address review feedback: CellRef dataclass, batch done-guard, is_complete API
  - Convert CellRef from NamedTuple to frozen dataclass
  - Change is_complete to accept CellRef instead of 3 positional args
  - Unify batch done-guards in _enqueue_downstream and _reevaluate_batch_tasks to use rg_batch_complete instead of rg_completed

* fix test to use cell_by_cell column for row-group validation test

* address review feedback: constructor, assertions, root columns, test fix
  - Split CompletionTracker into __init__() + with_graph() classmethod
  - Replace assert with RuntimeError in private methods
  - Add get_root_columns() to ExecutionGraph
  - Remove "no locks needed" from docstring
  - Fix re-enqueue regression test to exercise the actual scenario
  - Remove unused ready_ctx fixture parameter

* rename CellRef to SliceRef

  A slice naturally represents both a single cell and a full row group, removing the semantic mismatch of CellRef representing batches.

* add missing tests for drop_row unblock, buffer_size, and duplicate column
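To illustrate the event-driven frontier idea, here is a deliberately simplified, column-level sketch: completing a node updates only its downstream nodes, and the ready set is read directly rather than recomputed by scanning everything. The class and method names are hypothetical, and the real CompletionTracker also tracks row groups, rows, and dropped rows, which this sketch omits.

```python
from __future__ import annotations


class FrontierSketch:
    """Maintains the set of ready nodes incrementally as nodes complete."""

    def __init__(self, upstream: dict[str, set[str]]) -> None:
        # upstream maps every node to the nodes it depends on.
        self._pending = {node: set(deps) for node, deps in upstream.items()}
        self._downstream: dict[str, set[str]] = {node: set() for node in upstream}
        for node, deps in upstream.items():
            for dep in deps:
                self._downstream[dep].add(node)
        self._complete: set[str] = set()
        # Roots (no dependencies) are ready from the start.
        self._frontier = {n for n, deps in self._pending.items() if not deps}

    def get_ready(self) -> list[str]:
        """Cost is proportional to the frontier size, not the whole graph."""
        return sorted(self._frontier)

    def mark_complete(self, node: str) -> None:
        self._complete.add(node)
        self._frontier.discard(node)
        for child in self._downstream[node]:
            self._pending[child].discard(node)
            # Guard: a node that is already complete must not re-enter the frontier.
            if not self._pending[child] and child not in self._complete:
                self._frontier.add(child)
```

The `child not in self._complete` check mirrors the fix in the commit list that prevents completed tasks from being enqueued again.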

* fix: handle discriminated unions in oneOf pruning validator

  The pruning validator modifies instances in-place during oneOf validation. When trying a wrong variant, it strips properties needed by the correct variant, causing all variants to fail.

  Add a discriminator-aware oneOf validator that reads the discriminator mapping to select the correct variant directly, skipping the try-all-variants loop that causes the corruption.

  Fixes #375

* test: add regression test for non-discriminated oneOf fallback
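The variant selection step can be sketched as follows, assuming an OpenAPI-style `discriminator` object with `propertyName` and `mapping` keys and `$ref`-based `oneOf` entries. The function name is hypothetical and this is not the project's validator; returning None stands for "fall back to the non-discriminated path".

```python
from __future__ import annotations

from typing import Any


def select_one_of_variant(schema: dict[str, Any], instance: Any) -> dict[str, Any] | None:
    """Pick the oneOf variant named by the discriminator, or None to fall back."""
    discriminator = schema.get("discriminator")
    if not isinstance(discriminator, dict) or not isinstance(instance, dict):
        return None
    tag = instance.get(discriminator.get("propertyName"))
    if not isinstance(tag, str):
        return None
    ref = discriminator.get("mapping", {}).get(tag)
    if ref is None:
        return None
    # Only the matching variant is returned, so a mutating validator never
    # touches the instance while trying a wrong variant.
    for variant in schema.get("oneOf", []):
        if variant.get("$ref") == ref:
            return variant
    return None
```

Validating only against the selected variant avoids the in-place pruning by wrong variants that the commit describes.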

* Added skill for code review
* Address CR feedback
* more feedback from greptile

DuckDB 1.5.0 (released 2026-03-09) removed the record_batch() method from the Python Relation API, breaking SeedDatasetColumnGenerator. Migrate to the stable to_arrow_reader() API and bump the minimum DuckDB version to >=1.5.0. Fixes #379

… (#385)

* monkey patch litellm to make index optional
* address greptile feedback
* fix tests
* fix: add model_rebuild calls for proper test isolation

  Rebuild the Pydantic Message model after restoring the TypedDict to unpatched state and in the finally block so cached schemas don't leak across tests. Also assert that Message construction fails without the patch to prove it is necessary.

  Made-with: Cursor

* fix greptile's weak suggestion

Add explicit rules for testing public APIs only, requiring type annotations on tests/fixtures, and keeping imports at module level. Fix fixture reference to reflect the actual pytest_plugins layout. Made-with: Cursor

…#383)

* fix: raise clear error when all records are dropped during generation (#382)

  When every record fails during generation (e.g. LLM or image model errors), the empty dataset previously reached the profiler, which raised a misleading DatasetProfilerConfigurationError about missing columns. Now both preview() and create() check for an empty dataset before profiling and raise a DataDesignerGenerationError with an actionable message instead.

  Made-with: Cursor

* fix: keep load_dataset_with_dropped_columns inside profiling try/except

  Moves the call back inside the try/except block so ArtifactStorageError is caught and re-raised as DataDesignerProfilingError, preserving the documented API contract. Also reduces test num_records to 1.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address Andre's feedback
* more small feedback

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
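The empty-dataset guard amounts to failing early with a specific error before any profiling runs. A minimal sketch follows; the exception class is a local stand-in for the DataDesignerGenerationError named in the commit, and the helper name and message wording are assumptions.

```python
from __future__ import annotations


class GenerationErrorSketch(Exception):
    """Local stand-in for the DataDesignerGenerationError named in the commit."""


def ensure_records_remain(records: list[dict], stage: str) -> list[dict]:
    """Raise an actionable error if every record was dropped during generation."""
    if len(records) == 0:
        raise GenerationErrorSketch(
            f"All records were dropped during {stage}. "
            "Check the generation logs for model or validation errors."
        )
    return records
```

Calling such a guard before profiling keeps the root cause (failed generation) from being masked by a downstream error about missing columns.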

…fulness (#378)

* feat: add async generator migration with symmetric bridging and statefulness
  - Symmetric generate/agenerate bridging in base ColumnGenerator
  - is_stateful property; SeedDatasetColumnGenerator declares True
  - Async wrappers for FromScratchColumnGenerator and ColumnGeneratorFullColumn
  - Native async paths for ImageCellGenerator and EmbeddingCellGenerator
  - CustomColumnGenerator.agenerate with full validation parity
  - Extract _postprocess_result for shared sync/async output validation

* fix: avoid blocking caller on sync bridge timeout

  Use explicit pool lifecycle instead of context manager so that a TimeoutError releases the caller immediately via shutdown(wait=False) rather than blocking on pool.__exit__.

* fix: widen agenerate type signature to match generate

  Add @overload declarations so the base agenerate accepts both dict and pd.DataFrame, mirroring the existing generate pattern.

* fix: ensure pool shutdown on sync bridge success path

  The else clause after return was unreachable, leaking the ThreadPoolExecutor on every successful call. Capture the result first, shut down the pool, then return.

* fix: use try/finally for pool shutdown in sync bridge

  Ensures ThreadPoolExecutor is shut down on all exit paths, including non-TimeoutError exceptions from the coroutine.

* refactor: extract shared validation in ImageCellGenerator

  Move duplicated input validation and prompt rendering into _prepare_image_inputs, shared by generate and agenerate.

* refactor: extract shared input prep in EmbeddingCellGenerator

* address PR review feedback
  - add _is_overridden helper for symmetric generate/agenerate guards
  - move defensive .copy() into base agenerate, remove subclass overrides
  - re-raise as builtin TimeoutError for Python 3.10 compat
  - rename is_stateful to is_order_dependent with improved docstring
  - replace brittle .fget test with object.__new__
  - add async tests for ImageCellGenerator and EmbeddingCellGenerator
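The sync-bridge fixes above converge on one pattern: an explicitly managed ThreadPoolExecutor that is shut down in a `finally` block with `wait=False`, and a timeout re-raised as the builtin TimeoutError. The sketch below shows that pattern in isolation; the function name is hypothetical and the project's bridge may be structured differently.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Any, Coroutine


def run_coroutine_sync(coro: Coroutine[Any, Any, Any], timeout: float) -> Any:
    """Run a coroutine from synchronous code on a worker thread."""
    # Explicit lifecycle instead of "with": pool.__exit__ would wait for the
    # worker and block the caller even after a timeout.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(asyncio.run, coro)
        try:
            return future.result(timeout=timeout)
        except FutureTimeoutError as exc:
            # On Python 3.10 this is a distinct class from the builtin one.
            raise TimeoutError(f"coroutine did not finish within {timeout}s") from exc
    finally:
        # Runs on success, timeout, and any other exception from the coroutine.
        pool.shutdown(wait=False)
```

Note that `shutdown(wait=False)` releases the caller but does not cancel the worker, which keeps running until the coroutine finishes.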

feat: add Nemotron Super Text-to-SQL and Search Agent recipes

Add two new recipes derived from the Nemotron Super post-training pipelines:

Nemotron Super Text-to-SQL:
- Five-stage pipeline: seeding, prompt generation, schema with distractors, dialect-specific SQL, validation + quality scoring
- 14 conditional samplers (10 industries, 50 topics, complexity-gated task types, data quality concepts, knowledge dependencies, 100 style combos)
- Dialect-specific prompts for SQLite, MySQL, and PostgreSQL
- 5 LLM judges (prompt, SQL, context, data quality, knowledge) with 15 scoring dimensions and flat score extraction columns
- Per-dialect syntax validation via CodeValidatorParams

Nemotron Super Search Agent:
- Four-stage pipeline: Wikidata KG seed paths, two-stage riddle generation (draft + BrowseComp-style obfuscation), Tavily web search trajectories via MCP, structured JSON formatting
- Tavily hosted MCP endpoint (streamable_http) -- no local server or extra dependencies beyond data-designer
- Full tool-call trace capture (with_trace=ALL_MESSAGES) for SFT data
- Built-in demo seeds (3 Wikidata paths) for quick testing

Both recipes include ASCII pipeline diagrams, Nemotron Super context in docstrings, dev note links in the markdown pages, and follow existing recipe conventions (PEP 723 metadata, --model-alias/--num-records/--artifact-path CLI args).

…373)

* plans for model facade overhaul
* update plan
* add review
* address feedback + add more details after several self reviews
* update plan doc
* address nits
* Add canonical objects
* self-review feedback + address
* add LiteLLMRouter protocol to strongly type bridge router param

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* simplify some things
* add a protocol for http response like object
* move HttpResponse
* update PR-1 architecture notes for lifecycle and router protocol

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR #359 feedback: exception wrapping, shared parsing, test improvements
  - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item)
  - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters
  - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image
  - Add retry_after field to ProviderError for future retry engine support
  - Fix _to_int_or_none to parse numeric strings from providers
  - Create test conftest.py with shared mock_router/bridge_client fixtures
  - Parametrize duplicate image generation and error mapping tests
  - Add tests for exception wrapping across all bridge methods

* Use contextlib to dry out some code

* Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity
  - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds
  - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding
  - Add tests for HTTP-date and garbage Retry-After values

* Address Greptile feedback: FastAPI detail parsing, comment fixes
  - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
  - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors
  - Document scope boundary for vision content in collect_raw_image_candidates

* add PR-2 architecture notes for model facade overhaul
* save progress on pr2
* small refactor
* address feedback
* Address greptile comment in pr1

* refactor ProviderError from dataclass to regular Exception
  - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency
  - Store cause via __cause__ only, removing the redundant .cause attr
  - Update match pattern in handle_llm_exceptions for non-dataclass type
  - Rename shadowed local `fields` to `optional_fields` in TransportKwargs

* Address greptile feedback
* PR feedback
* track usage tracking in finally block for images
* pr feedback
* wrap facade close in try/catch
* clean up stray params
* fix stray inclusion of metadata
* small regression fix
* address more feedback

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(cli): bootstrap default configs on command run
* fix(cli): use active interpreter in bootstrap warning
* refactor(cli): simplify bootstrap warning flow
* refactor(cli): bootstrap defaults in main entrypoint
* refactor(cli): keep bootstrap ownership in main
* test(cli): cover lazy dispatch and runtime failure flag
* refactor(cli): remove redundant bootstrap state
* test(cli): assert bootstrap warning includes error
* test: address cli bootstrap review feedback

requests 2.32.5 asserts chardet<6 at import time, but sqlfluff and diff_cover pull in chardet without an upper bound, resolving to 7.1.0 on fresh installs. Add a workspace constraint to cap chardet until a new requests release ships the fix from psf/requests#7220. Closes #404

* fix: add chardet<6 constraint to published engine package

  The workspace-level constraint-dependencies in [tool.uv] is not included in published wheel metadata, so PyPI consumers still get chardet>=6 via sqlfluff, triggering RequestsDependencyWarning from requests<2.33. Move the pin to an explicit dependency in data-designer-engine so it ships with the package.

* chore: remove redundant workspace-level chardet constraint

  Now that chardet<6 is an explicit dependency of data-designer-engine, the workspace constraint-dependencies entry is no longer needed.

Commits on Mar 5, 2026

Commits on Mar 6, 2026

Commits on Mar 9, 2026

Commits on Mar 10, 2026

Commits on Mar 11, 2026

Commits on Mar 12, 2026
