Comparing changes

base repository: NVIDIA-NeMo/DataDesigner
base: v0.5.7
head repository: NVIDIA-NeMo/DataDesigner
compare: v0.5.8
  • 10 commits
  • 44 files changed
  • 7 contributors

Commits on Apr 20, 2026

  1. chore: bump pillow and python-multipart for CVEs, add SECURITY.md (#564)

    - pillow 12.1.1 -> 12.2.0 fixes CVE-2026-40192 (FITS GZIP decompression bomb)
    - python-multipart 0.0.22 -> 0.0.26 via workspace constraint (transitive via mcp)
    - add NVIDIA SECURITY.md disclosure policy
    johnnygreco authored Apr 20, 2026
    9648154

Commits on Apr 21, 2026

  1. fix(ci): grant permissions to reusable workflow calls in build-docs and pack-tutorials (#561)
    
    The top-level `permissions: {}` added in #517 restricts all jobs to zero
    permissions by default. The `build-notebooks` jobs that call the reusable
    workflow did not override this, so GitHub Actions refused to start them
    (startup_failure). Add the required `actions: read` and `contents: write`
    permissions to both calling jobs.
    
    Fixes the v0.5.7 release docs build failure.
    andreatgretel authored Apr 21, 2026
    addece9
  2. ci: bump the all-actions group across 1 directory with 5 updates (#558)

    * ci: bump the all-actions group with 5 updates
    
    Bumps the all-actions group with 5 updates:
    
    | Package | From | To |
    | --- | --- | --- |
    | [actions/checkout](https://github.com/actions/checkout) | `4` | `6` |
    | [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) | `7.6.0` | `8.1.0` |
    | [actions/cache](https://github.com/actions/cache) | `5.0.4` | `5.0.5` |
    | [cloudflare/wrangler-action](https://github.com/cloudflare/wrangler-action) | `3.14.1` | `3.15.0` |
    | [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates) | `0.88.1` | `0.93.0` |
    
    
    Updates `actions/checkout` from 4 to 6
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Commits](actions/checkout@v4...v6)
    
    Updates `astral-sh/setup-uv` from 7.6.0 to 8.1.0
    - [Release notes](https://github.com/astral-sh/setup-uv/releases)
    - [Commits](astral-sh/setup-uv@v7.6...0880764)
    
    Updates `actions/cache` from 5.0.4 to 5.0.5
    - [Release notes](https://github.com/actions/cache/releases)
    - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
    - [Commits](actions/cache@6682284...27d5ce7)
    
    Updates `cloudflare/wrangler-action` from 3.14.1 to 3.15.0
    - [Release notes](https://github.com/cloudflare/wrangler-action/releases)
    - [Changelog](https://github.com/cloudflare/wrangler-action/blob/main/CHANGELOG.md)
    - [Commits](cloudflare/wrangler-action@da0e0df...9acf94a)
    
    Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.88.1 to 0.93.0
    - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases)
    - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md)
    - [Commits](NVIDIA-NeMo/FW-CI-templates@2a49420...38cee3a)
    
    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-version: '6'
      dependency-type: direct:production
      update-type: version-update:semver-major
      dependency-group: all-actions
    - dependency-name: astral-sh/setup-uv
      dependency-version: 8.1.0
      dependency-type: direct:production
      update-type: version-update:semver-major
      dependency-group: all-actions
    - dependency-name: actions/cache
      dependency-version: 5.0.5
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: all-actions
    - dependency-name: cloudflare/wrangler-action
      dependency-version: 3.15.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: all-actions
    - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml
      dependency-version: 0.93.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: all-actions
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    
    * ci: pin actions/checkout to SHA in agentic-ci-issue-triage
    
    ---------
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: Andre Manoel <amanoel@nvidia.com>
    dependabot[bot] and andreatgretel authored Apr 21, 2026
    8266eb7

Commits on Apr 22, 2026

  1. refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) (#511)
    
    * refactor: unify DAG construction by moving topological sort into execution_graph.py
    
    Eliminates dag.py and its networkx dependency by moving
    topologically_sort_column_configs into execution_graph.py as a
    module-level function. Side-effect resolution is now O(1) via a
    side_effect_map dict (previously O(n²) linear scan). Kahn's algorithm
    is reused in-place rather than leaning on networkx.topological_sort.
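
The approach described above can be sketched roughly as follows. This is a minimal illustration of Kahn's algorithm combined with a dict-based side-effect lookup, not the code from `execution_graph.py`; the function name `kahn_topological_sort`, the helper `resolve`, and the example column names are all hypothetical.

```python
from __future__ import annotations

from collections import deque


def kahn_topological_sort(deps: dict[str, set[str]]) -> list[str]:
    """Order nodes so every node appears after the nodes it depends on.

    `deps` maps each node to the set of nodes it depends on.
    """
    in_degree = {node: len(parents) for node, parents in deps.items()}
    children: dict[str, list[str]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    # Sorting the ready nodes keeps the output deterministic across runs.
    ready = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(children[node]):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical side-effect map: side-effect column -> column that produces it.
side_effect_map = {"answer__reasoning_trace": "answer"}


def resolve(name: str) -> str:
    # Single dict lookup instead of scanning every column's side effects.
    return side_effect_map.get(name, name)


required = {
    "topic": set(),
    "answer": {"topic"},
    "judge": {"answer__reasoning_trace"},
}
deps = {col: {resolve(r) for r in reqs} for col, reqs in required.items()}
order = kahn_topological_sort(deps)
print(order)  # ['topic', 'answer', 'judge']
```

Resolving side-effect names before building the edges means the sort itself only ever sees real column names.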
    
    Closes #510
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    
    * test: relax non-deterministic ordering assertion in test_dag
    
    test_judge and test_code_and_depends_on_validation_reasoning_traces have
    no mutual dependency and reach in-degree 0 simultaneously in Kahn's
    algorithm. Set iteration order varies with PYTHONHASHSEED, making the
    strict list assertion flaky. Assert only the topological invariants.
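
One way to assert only the topological invariants, rather than a single exact list, is sketched below. The helper name `assert_topological` and the example graph are assumptions for illustration; the actual test in the repository may be structured differently.

```python
from __future__ import annotations


def assert_topological(order: list[str], deps: dict[str, set[str]]) -> None:
    """Check that `order` is a valid topological ordering of `deps`."""
    position = {name: i for i, name in enumerate(order)}
    assert len(position) == len(order), "duplicate node in ordering"
    assert set(order) == set(deps), "ordering must cover every node exactly"
    for node, parents in deps.items():
        for parent in parents:
            assert position[parent] < position[node], f"{parent} must precede {node}"


# Two nodes with no mutual dependency may legally appear in either order.
deps = {"base": set(), "judge": {"base"}, "traces": {"base"}}
assert_topological(["base", "judge", "traces"], deps)
assert_topological(["base", "traces", "judge"], deps)
```

Both orderings pass, so the check no longer depends on set iteration order or `PYTHONHASHSEED`.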
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    
    * docs(engine): document intentional skip.columns omission in topologically_sort_column_configs
    
    ExecutionGraph.create handles skip.when ordering edges in its own
    two-pass build; the pre-sort function only needs required_columns
    to produce a valid ColumnConfigT ordering for config compilation.
    
    * fix(engine): restore skip.columns edges in topologically_sort_column_configs
    
    The sync builder executes generators in compile-time sort order (from
    _column_configs, populated via this function), not ExecutionGraph order.
    Dropping skip.columns edges caused evaluate_skip_when to hit UndefinedError
    when the referenced column hadn't been generated yet, silently skipping rows.
    
    Also:
    - Refactor edge building into _add_edge() helper with a label parameter to
      distinguish "required" from "skip.when" edges in debug output
    - Rename test_dag.py -> test_topological_sort.py to match the new module location
    - Add from __future__ import annotations (required by AGENTS.md)
    - Add test_side_effect_column_ordering covering the side_effect_map.get() path
    - Add test_skip_when_column_ordering covering the skip.columns edge path
    
    * fix(tests): move SkipConfig import to module level in test_topological_sort
    
    * refactor(execution-graph): extract nested closures to module-level helpers, fix docs and test style
    
    - Extract `resolve`/`_add_edge` nested closures in `topologically_sort_column_configs`
      to module-level `_resolve_dag_column` and `_add_dag_edge` per STYLEGUIDE.md
    - Add self-edge guard in `_add_dag_edge` (consistent with `ExecutionGraph.create`)
    - Update `architecture/dataset-builders.md` to remove stale `dag.py`/NetworkX references
    - Fix import order in `test_topological_sort.py` (SkipConfig before column_configs)
    - Add `-> None` return annotations to legacy test functions
    
    * refactor(execution-graph): extract shared Kahn helper, inline resolve, fix module-level ordering
    
    - Extract `_kahns_topological_sort` shared helper used by both
      `ExecutionGraph.get_topological_order` and `topologically_sort_column_configs`
    - Inline `_resolve_dag_column` into `_add_dag_edge` (no other call sites)
    - Move private helpers after the public function (public-before-private per STYLEGUIDE)
    - Add docstring to `topologically_sort_column_configs`
    - Update architecture/dataset-builders.md to mention both sort sites for skip.when edges
    
    ---------
    
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
    3 people authored Apr 22, 2026
    956b8cd
  2. fix: prevent sticky progress bar ghost lines from terminal wrapping (#565)
    
    * fix: prevent sticky progress bar ghost lines from terminal wrapping
    
    When a formatted bar line exceeded the terminal width, it wrapped to
    multiple physical lines but _drawn_lines only counted 1 per bar.
    Subsequent cursor-up clears missed the wrapped portions, leaving ghost
    lines that accumulated every redraw cycle.
    
    - Count physical lines in _redraw based on visible width vs terminal width
    - Cap rate at 9999.9 and eta at 999s to prevent stats field overflow
    - Remove max(10, ...) bar_width floor; degrade gracefully on narrow terminals
    - Add update_many() for batch updates with a single redraw cycle
    - Use update_many() in AsyncProgressReporter to reduce N redraws to 1
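
The physical-line counting and stat capping listed above might look something like this. It is a sketch under assumptions: the names `visible_width`, `physical_lines`, and `clamp_stats` are hypothetical, visible width is approximated by stripping ANSI escape sequences and counting characters (wide characters are ignored), and the real `_redraw` may compute this differently.

```python
from __future__ import annotations

import re

# Matches CSI escape sequences such as colors and cursor movement.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

MAX_RATE = 9999.9  # cap from the commit message
MAX_ETA_SECONDS = 999  # cap from the commit message


def visible_width(line: str) -> int:
    """Number of characters that occupy a terminal cell (escapes removed)."""
    return len(ANSI_RE.sub("", line))


def physical_lines(line: str, terminal_width: int) -> int:
    """How many terminal rows `line` occupies once wrapping is accounted for."""
    if terminal_width <= 0:
        return 1
    width = visible_width(line)
    # Ceiling division; an empty line still occupies one row.
    return max(1, -(-width // terminal_width))


def clamp_stats(rate: float, eta_seconds: float) -> tuple[float, float]:
    """Keep the stats fields within a fixed width so they cannot overflow."""
    return min(rate, MAX_RATE), min(eta_seconds, MAX_ETA_SECONDS)


bar = "\x1b[32m" + "#" * 100 + "\x1b[0m"
print(physical_lines(bar, 80))  # 2
```

Summing `physical_lines` over all bars gives the number of cursor-up clears needed on the next redraw.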
    
    * test: strengthen wrapping and degradation tests per review feedback
    
    - Monkeypatch shutil.get_terminal_size to force narrow terminal in tests
    - Inject oversized lines via _format_bar patch to exercise physical line
      counting (ceiling division) code path
    - Assert output line width <= width-1 in graceful degradation test
    - Assert _drawn_lines == 1 in degradation mode (no false wrapping)
    
    * test: flatten test classes and replace private attribute assertions
    
    Address review feedback: use flat pytest functions per DEVELOPMENT.md
    conventions instead of class-based test suites. Move inline imports
    to module level.
    
    Replace most _drawn_lines and _bars assertions with public output
    proxies (counting CURSOR_UP_CLEAR sequences and checking rendered
    bar content). Keep _drawn_lines access only where no clean public
    proxy exists (multi-checkpoint add/remove test, zero-bars-remaining
    after log_final).
    
    * refactor: expose drawn_lines as public read-only property
    
    Replace _drawn_lines access in tests with a public property,
    consistent with the existing is_active pattern.
    andreatgretel authored Apr 22, 2026
    f612822
  3. fix: Updated default reasoning model for nvidia (#568)

    * Updated default reasoning model for nvidia
    
    * Updated inference params for super
    
    * Add reasoning_effort to Nemotron Super params, update stale docs
    
    - Add extra_body.reasoning_effort=medium to
      NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS (mirrors GPT-5 config)
    - Update README telemetry example and model-configs.md to use
      nvidia/nemotron-3-super-120b-a12b instead of openai/gpt-oss-20b
    - Broaden inference-parameters.md reasoning effort tip to cover
      Nemotron Super
    
    * Remove build-time README accidentally tracked
    
    ---------
    
    Co-authored-by: Andre Manoel <amanoel@nvidia.com>
    Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
    3 people authored Apr 22, 2026
    4c6823c
  4. chore: async engine readiness - blockers and polish before default (#553)
    
    * chore: async engine readiness blockers (#462)
    
    - Processor callback failures (pre-batch and post-batch) now raise
      DatasetGenerationError instead of silently dropping row groups
    - Early shutdown and all error paths drain in-flight workers via a
      finally block in AsyncTaskScheduler.run()
    - Pre-batch and post-batch processors that change row count in async
      mode raise immediately (strict_row_count guard)
    - Partial completion logs a warning when actual < target records
    - allow_resize=True auto-falls back to sync engine with a deprecation
      warning instead of raising, using a per-run _use_async flag
    - Preview path mirrors the trace check from the full build path;
      PreviewResults exposes task_traces
    
    Closes #462
    
    * fix: address review findings for async engine readiness
    
    - Prevent double-wrapping of DatasetGenerationError in scheduler callbacks
    - Fix stacklevel in allow_resize DeprecationWarning to point at user code
    - Update stale comment to reflect fail-fast behavior
    - Rename misleading test and remove unused caplog fixture
    - Add zero-warnings assertion for happy-path case
    - Move warnings import to module level
    
    * fix: address review comments on async engine readiness
    
    - Extract _is_async_trace_enabled() helper to deduplicate trace check
    - Post-batch row-count guard now raises DatasetProcessingError (not
      DatasetGenerationError) so the scheduler wraps it with rg_id
      symmetrically with the pre-batch path
    - Add test_dropped_rows_reduce_actual_record_count for partial
      completion path
    
    * fix: address second-round review feedback on async engine readiness
    
    - DeprecationWarning no longer swallowed by interface error wrapper
    - Incomplete-RG log only fires on clean scheduler exits
    - Post-batch row-count guard moved into ProcessorRunner (strict_row_count)
    - Expose active_worker_count property on AsyncTaskScheduler
    - Drop unused monkeypatch fixture and pytest import
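
A row-count guard of the kind mentioned above might be sketched like this. The `DatasetProcessingError` class here is a local stand-in for the library's exception, and `run_processor` is a hypothetical helper, not the `ProcessorRunner` API.

```python
from __future__ import annotations

from typing import Callable

Rows = list[dict]


class DatasetProcessingError(Exception):
    """Local stand-in for the library's exception of the same name."""


def run_processor(
    processor: Callable[[Rows], Rows], rows: Rows, *, strict_row_count: bool
) -> Rows:
    """Apply `processor`, failing fast if it changes the row count in strict mode."""
    result = processor(rows)
    if strict_row_count and len(result) != len(rows):
        raise DatasetProcessingError(
            f"processor changed row count from {len(rows)} to {len(result)}"
        )
    return result


rows = [{"id": 1}, {"id": 2}]
same = run_processor(lambda r: [dict(x, seen=True) for x in r], rows, strict_row_count=True)
print(len(same))  # 2
```

Raising a processing-level error leaves it to the caller, such as a scheduler, to wrap it with row-group context.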
    
    * test: fold metadata-count test into dropped-rows test
    
    Remove test_write_metadata_records_actual_and_target_counts (poked
    _actual_num_records directly) and assert metadata counts in
    test_dropped_rows_reduce_actual_record_count instead, which exercises
    the same path through the public API.
    andreatgretel authored Apr 22, 2026
    bfa7a46

Commits on Apr 23, 2026

  1. chore: remove obsolete Cerebro ignore entries (#570)

    Drop stale .gitignore patterns tied to Cerebro now that the tool is no longer used in this repository.
    johnnygreco authored Apr 23, 2026
    d75113c

Commits on Apr 24, 2026

  1. chore: add ko_KR locale to nemotron personas datasets (#572)

    * chore: add ko_KR locale to nemotron personas datasets
    
    Register Korean (ko_KR, 2.66 GB) as an available managed persona
    dataset locale, update related CLI/repository tests, and document the
    new locale and its NGC download command.
    
    * update person fields
    
    * update fr_FR size
    
    * docs: reconcile personas field tables with installed parquet schemas
    
    Remove stale per-locale fields that no longer exist in any managed
    parquet (commune, departement, prefecture), drop district from the
    India-specific section since it's already listed in Core Fields,
    rename digital_skills → digital_skill to match the actual ja_JP
    column, and add sections for ko_KR, en_SG, and the en_US/en_SG
    shared ethnic_background. Corrects the religion-family membership
    to include en_SG.
    
    * test: add missing fr_FR assertion in test_run_personas_with_all_flag
    
    The test asserts all 9 locales were downloaded but only enumerates 8
    in its per-locale checks — fr_FR has been missing since before the
    ko_KR addition. Align the enumeration with the count.
    
    * docs: add ko_KR to locale parameter list
    johnnygreco authored Apr 24, 2026
    a65903e

Commits on Apr 27, 2026

  1. chore: bump lxml and nbconvert to address security advisories (#574)

    Bump lxml floor to 6.1.0 (direct dep in data-designer-engine) and add
    nbconvert>=7.17.1 to workspace constraint-dependencies (transitive via
    jupyter in the notebooks group).
    johnnygreco authored Apr 27, 2026
    4662288