Comparing changes
base repository: NVIDIA-NeMo/DataDesigner
base: v0.5.7
head repository: NVIDIA-NeMo/DataDesigner
compare: v0.5.8
- 10 commits
- 44 files changed
- 7 contributors
Commits on Apr 20, 2026
chore: bump pillow and python-multipart for CVEs, add SECURITY.md (#564)
- pillow 12.1.1 -> 12.2.0 fixes CVE-2026-40192 (FITS GZIP decompression bomb)
- python-multipart 0.0.22 -> 0.0.26 via workspace constraint (transitive via mcp)
- add NVIDIA SECURITY.md disclosure policy
Commit: 9648154
Commits on Apr 21, 2026
fix(ci): grant permissions to reusable workflow calls in build-docs and pack-tutorials (#561)
The top-level `permissions: {}` added in #517 restricts all jobs to zero permissions by default. The `build-notebooks` jobs that call the reusable workflow did not override this, so GitHub Actions refused to start them (startup_failure). Add the required `actions: read` and `contents: write` permissions to both calling jobs.
Fixes the v0.5.7 release docs build failure.
Commit: addece9
ci: bump the all-actions group across 1 directory with 5 updates (#558)
* ci: bump the all-actions group with 5 updates

  Bumps the all-actions group with 5 updates:

  | Package | From | To |
  | --- | --- | --- |
  | [actions/checkout](https://github.com/actions/checkout) | `4` | `6` |
  | [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) | `7.6.0` | `8.1.0` |
  | [actions/cache](https://github.com/actions/cache) | `5.0.4` | `5.0.5` |
  | [cloudflare/wrangler-action](https://github.com/cloudflare/wrangler-action) | `3.14.1` | `3.15.0` |
  | [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates) | `0.88.1` | `0.93.0` |

  Updates `actions/checkout` from 4 to 6
  - [Release notes](https://github.com/actions/checkout/releases)
  - [Commits](actions/checkout@v4...v6)

  Updates `astral-sh/setup-uv` from 7.6.0 to 8.1.0
  - [Release notes](https://github.com/astral-sh/setup-uv/releases)
  - [Commits](astral-sh/setup-uv@v7.6...0880764)

  Updates `actions/cache` from 5.0.4 to 5.0.5
  - [Release notes](https://github.com/actions/cache/releases)
  - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
  - [Commits](actions/cache@6682284...27d5ce7)

  Updates `cloudflare/wrangler-action` from 3.14.1 to 3.15.0
  - [Release notes](https://github.com/cloudflare/wrangler-action/releases)
  - [Changelog](https://github.com/cloudflare/wrangler-action/blob/main/CHANGELOG.md)
  - [Commits](cloudflare/wrangler-action@da0e0df...9acf94a)

  Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.88.1 to 0.93.0
  - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases)
  - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md)
  - [Commits](NVIDIA-NeMo/FW-CI-templates@2a49420...38cee3a)

  ---
  updated-dependencies:
  - dependency-name: actions/checkout
    dependency-version: '6'
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: astral-sh/setup-uv
    dependency-version: 8.1.0
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: actions/cache
    dependency-version: 5.0.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
    dependency-group: all-actions
  - dependency-name: cloudflare/wrangler-action
    dependency-version: 3.15.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: all-actions
  - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml
    dependency-version: 0.93.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: all-actions
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* ci: pin actions/checkout to SHA in agentic-ci-issue-triage

---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andre Manoel <amanoel@nvidia.com>
Commit: 8266eb7
Commits on Apr 22, 2026
refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) (#511)
* refactor: unify DAG construction by moving topological sort into execution_graph.py

  Eliminates dag.py and its networkx dependency by moving topologically_sort_column_configs into execution_graph.py as a module-level function. Side-effect resolution is now O(1) via a side_effect_map dict (previously O(n²) linear scan). Kahn's algorithm is reused in-place rather than leaning on networkx.topological_sort.

  Closes #510

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: relax non-deterministic ordering assertion in test_dag

  test_judge and test_code_and_depends_on_validation_reasoning_traces have no mutual dependency and reach in-degree 0 simultaneously in Kahn's algorithm. Set iteration order varies with PYTHONHASHSEED, making the strict list assertion flaky. Assert only the topological invariants.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(engine): document intentional skip.columns omission in topologically_sort_column_configs

  ExecutionGraph.create handles skip.when ordering edges in its own two-pass build; the pre-sort function only needs required_columns to produce a valid ColumnConfigT ordering for config compilation.
* fix(engine): restore skip.columns edges in topologically_sort_column_configs

  The sync builder executes generators in compile-time sort order (from _column_configs, populated via this function), not ExecutionGraph order. Dropping skip.columns edges caused evaluate_skip_when to hit UndefinedError when the referenced column hadn't been generated yet, silently skipping rows.

  Also:
  - Refactor edge building into _add_edge() helper with a label parameter to distinguish "required" from "skip.when" edges in debug output
  - Rename test_dag.py -> test_topological_sort.py to match the new module location
  - Add from __future__ import annotations (required by AGENTS.md)
  - Add test_side_effect_column_ordering covering the side_effect_map.get() path
  - Add test_skip_when_column_ordering covering the skip.columns edge path
* fix(tests): move SkipConfig import to module level in test_topological_sort
* refactor(execution-graph): extract nested closures to module-level helpers, fix docs and test style
  - Extract `resolve`/`_add_edge` nested closures in `topologically_sort_column_configs` to module-level `_resolve_dag_column` and `_add_dag_edge` per STYLEGUIDE.md
  - Add self-edge guard in `_add_dag_edge` (consistent with `ExecutionGraph.create`)
  - Update `architecture/dataset-builders.md` to remove stale `dag.py`/NetworkX references
  - Fix import order in `test_topological_sort.py` (SkipConfig before column_configs)
  - Add `-> None` return annotations to legacy test functions
* refactor(execution-graph): extract shared Kahn helper, inline resolve, fix module-level ordering
  - Extract `_kahns_topological_sort` shared helper used by both `ExecutionGraph.get_topological_order` and `topologically_sort_column_configs`
  - Inline `_resolve_dag_column` into `_add_dag_edge` (no other call sites)
  - Move private helpers after the public function (public-before-private per STYLEGUIDE)
  - Add docstring to `topologically_sort_column_configs`
  - Update architecture/dataset-builders.md to mention both sort sites for skip.when edges

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
Commit: 956b8cd
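The ordering logic described in #511 can be illustrated with a minimal sketch. This is not the repository's code: the column shape (`required`, `skip`, `side_effects` lists) and the names `kahn_sort` and `sort_columns` are hypothetical, chosen only to show Kahn's algorithm combined with an O(1) side-effect lookup, a self-edge guard, and edges for both required and skip dependencies.

```python
from __future__ import annotations

from collections import deque


def kahn_sort(nodes: list[str], edges: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm; `edges` maps an upstream node to its downstream nodes."""
    in_degree = {node: 0 for node in nodes}
    for downstreams in edges.values():
        for downstream in downstreams:
            in_degree[downstream] += 1
    # Seed in declaration order and visit neighbours sorted, so the result
    # does not depend on set iteration order (PYTHONHASHSEED).
    ready = deque(node for node in nodes if in_degree[node] == 0)
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for downstream in sorted(edges.get(node, ())):
            in_degree[downstream] -= 1
            if in_degree[downstream] == 0:
                ready.append(downstream)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


def sort_columns(columns: dict[str, dict[str, list[str]]]) -> list[str]:
    """Order columns so every required/skip dependency comes first.

    Hypothetical input shape: name -> {"required": [...], "skip": [...],
    "side_effects": [...]}; all keys optional.
    """
    # Side-effect name -> producing column, so resolution is a dict lookup.
    side_effect_map = {
        side_effect: name
        for name, spec in columns.items()
        for side_effect in spec.get("side_effects", [])
    }
    edges: dict[str, set[str]] = {name: set() for name in columns}
    for name, spec in columns.items():
        for dep in [*spec.get("required", []), *spec.get("skip", [])]:
            producer = side_effect_map.get(dep, dep)
            if producer == name:
                continue  # self-edge guard
            if producer not in columns:
                raise KeyError(f"{name!r} depends on unknown column {dep!r}")
            edges[producer].add(name)
    return kahn_sort(list(columns), edges)
```

As in the relaxed test from the commit, callers are better off asserting topological invariants (each dependency precedes its dependents) than an exact list, since several valid orderings can exist.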
fix: prevent sticky progress bar ghost lines from terminal wrapping (#565)
* fix: prevent sticky progress bar ghost lines from terminal wrapping

  When a formatted bar line exceeded the terminal width, it wrapped to multiple physical lines but _drawn_lines only counted 1 per bar. Subsequent cursor-up clears missed the wrapped portions, leaving ghost lines that accumulated every redraw cycle.
  - Count physical lines in _redraw based on visible width vs terminal width
  - Cap rate at 9999.9 and eta at 999s to prevent stats field overflow
  - Remove max(10, ...) bar_width floor; degrade gracefully on narrow terminals
  - Add update_many() for batch updates with a single redraw cycle
  - Use update_many() in AsyncProgressReporter to reduce N redraws to 1
* test: strengthen wrapping and degradation tests per review feedback
  - Monkeypatch shutil.get_terminal_size to force narrow terminal in tests
  - Inject oversized lines via _format_bar patch to exercise physical line counting (ceiling division) code path
  - Assert output line width <= width-1 in graceful degradation test
  - Assert _drawn_lines == 1 in degradation mode (no false wrapping)
* test: flatten test classes and replace private attribute assertions

  Address review feedback: use flat pytest functions per DEVELOPMENT.md conventions instead of class-based test suites. Move inline imports to module level. Replace most _drawn_lines and _bars assertions with public output proxies (counting CURSOR_UP_CLEAR sequences and checking rendered bar content). Keep _drawn_lines access only where no clean public proxy exists (multi-checkpoint add/remove test, zero-bars-remaining after log_final).
* refactor: expose drawn_lines as public read-only property

  Replace _drawn_lines access in tests with a public property, consistent with the existing is_active pattern.
Commit: f612822
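The physical-line counting in #565 comes down to ceiling division of each line's visible width by the terminal width. The sketch below is an illustration under stated assumptions, not the project's implementation: `visible_width`, `physical_line_count`, and `clamp_stats` are hypothetical names, and visible width is approximated by stripping ANSI CSI sequences and taking `len()`, which ignores double-width characters.

```python
import math
import re

# Matches ANSI CSI sequences such as "\x1b[32m" (colour) or "\x1b[2K" (clear line).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def visible_width(line: str) -> int:
    """Approximate printed width: characters left after removing escape codes."""
    return len(ANSI_CSI.sub("", line))


def physical_line_count(lines: list[str], terminal_width: int) -> int:
    """Number of terminal rows the logical lines occupy once wrapped."""
    width = max(1, terminal_width)
    # Every logical line takes at least one row, even when empty.
    return sum(max(1, math.ceil(visible_width(line) / width)) for line in lines)


def clamp_stats(rate: float, eta_seconds: float) -> tuple[float, float]:
    """Cap rate and ETA at the limits named in the commit so stats fields keep a fixed width."""
    return min(rate, 9999.9), min(eta_seconds, 999.0)
```

A redraw loop would move the cursor up by `physical_line_count(...)` rows instead of by the number of bars. The terminal width itself can be read with `shutil.get_terminal_size().columns`, which is also what the commit's tests monkeypatch.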
fix: Updated default reasoning model for nvidia (#568)
* Updated default reasoning model for nvidia
* Updated inference params for super
* Add reasoning_effort to Nemotron Super params, update stale docs
  - Add extra_body.reasoning_effort=medium to NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS (mirrors GPT-5 config)
  - Update README telemetry example and model-configs.md to use nvidia/nemotron-3-super-120b-a12b instead of openai/gpt-oss-20b
  - Broaden inference-parameters.md reasoning effort tip to cover Nemotron Super
* Remove build-time README accidentally tracked

---------
Co-authored-by: Andre Manoel <amanoel@nvidia.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
Commit: 4c6823c
chore: async engine readiness - blockers and polish before default (#553)
* chore: async engine readiness blockers (#462)
  - Processor callback failures (pre-batch and post-batch) now raise DatasetGenerationError instead of silently dropping row groups
  - Early shutdown and all error paths drain in-flight workers via a finally block in AsyncTaskScheduler.run()
  - Pre-batch and post-batch processors that change row count in async mode raise immediately (strict_row_count guard)
  - Partial completion logs a warning when actual < target records
  - allow_resize=True auto-falls back to sync engine with a deprecation warning instead of raising, using a per-run _use_async flag
  - Preview path mirrors the trace check from the full build path; PreviewResults exposes task_traces

  Closes #462
* fix: address review findings for async engine readiness
  - Prevent double-wrapping of DatasetGenerationError in scheduler callbacks
  - Fix stacklevel in allow_resize DeprecationWarning to point at user code
  - Update stale comment to reflect fail-fast behavior
  - Rename misleading test and remove unused caplog fixture
  - Add zero-warnings assertion for happy-path case
  - Move warnings import to module level
* fix: address review comments on async engine readiness
  - Extract _is_async_trace_enabled() helper to deduplicate trace check
  - Post-batch row-count guard now raises DatasetProcessingError (not DatasetGenerationError) so the scheduler wraps it with rg_id symmetrically with the pre-batch path
  - Add test_dropped_rows_reduce_actual_record_count for partial completion path
* fix: address second-round review feedback on async engine readiness
  - DeprecationWarning no longer swallowed by interface error wrapper
  - Incomplete-RG log only fires on clean scheduler exits
  - Post-batch row-count guard moved into ProcessorRunner (strict_row_count)
  - Expose active_worker_count property on AsyncTaskScheduler
  - Drop unused monkeypatch fixture and pytest import
* test: fold metadata-count test into dropped-rows test

  Remove test_write_metadata_records_actual_and_target_counts (poked _actual_num_records directly) and assert metadata counts in test_dropped_rows_reduce_actual_record_count instead, which exercises the same path through the public API.
Commit: bfa7a46
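Two of the behaviours listed in #553, draining in-flight workers in a `finally` block and failing fast when a processor changes the row count, can be sketched with plain asyncio. Everything here is an assumption for illustration: `run_with_drain`, `run_processor`, and `RowCountChangedError` are hypothetical names and do not reflect the actual `AsyncTaskScheduler` or `ProcessorRunner` code.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


class RowCountChangedError(Exception):
    """Hypothetical error for a processor that added or dropped rows."""


def run_processor(
    rows: list[dict],
    processor: Callable[[list[dict]], list[dict]],
    strict_row_count: bool = True,
) -> list[dict]:
    """Apply a processor; in strict mode, raise if the row count changes."""
    result = processor(rows)
    if strict_row_count and len(result) != len(rows):
        raise RowCountChangedError(
            f"processor changed row count from {len(rows)} to {len(result)}"
        )
    return result


async def run_with_drain(coros: list[Coroutine[Any, Any, Any]]) -> list[Any]:
    """Run workers concurrently; on any exit path, leave no task in flight."""
    tasks = [asyncio.create_task(coro) for coro in coros]
    try:
        # Return as soon as one worker fails, or when all have finished.
        done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
        for task in done:
            task.result()  # re-raises the first failure, if any
        return [task.result() for task in tasks]
    finally:
        # Drain: cancel whatever is still running and wait for it to settle.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Putting the drain in `finally` means the same cleanup runs for early shutdown, worker failure, and normal completion, so no worker outlives the call.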
Commits on Apr 23, 2026
chore: remove obsolete Cerebro ignore entries (#570)
Drop stale .gitignore patterns tied to Cerebro now that the tool is no longer used in this repository.
Commit: d75113c
Commits on Apr 24, 2026
chore: add ko_KR locale to nemotron personas datasets (#572)
* chore: add ko_KR locale to nemotron personas datasets

  Register Korean (ko_KR, 2.66 GB) as an available managed persona dataset locale, update related CLI/repository tests, and document the new locale and its NGC download command.
* update person fields
* update fr_FR size
* docs: reconcile personas field tables with installed parquet schemas

  Remove stale per-locale fields that no longer exist in any managed parquet (commune, departement, prefecture), drop district from the India-specific section since it's already listed in Core Fields, rename digital_skills → digital_skill to match the actual ja_JP column, and add sections for ko_KR, en_SG, and the en_US/en_SG shared ethnic_background. Corrects the religion-family membership to include en_SG.
* test: add missing fr_FR assertion in test_run_personas_with_all_flag

  The test asserts all 9 locales were downloaded but only enumerates 8 in its per-locale checks; fr_FR has been missing since before the ko_KR addition. Align the enumeration with the count.
* docs: add ko_KR to locale parameter list
Commit: a65903e
Commits on Apr 27, 2026
chore: bump lxml and nbconvert to address security advisories (#574)
Bump lxml floor to 6.1.0 (direct dep in data-designer-engine) and add nbconvert>=7.17.1 to workspace constraint-dependencies (transitive via jupyter in the notebooks group).
Commit: 4662288
You can try running this command locally to see the comparison on your machine:
git diff v0.5.7...v0.5.8