Comparing changes

The docs-preview workflow triggered on all source code changes due to the broad `packages/*/src/data_designer/**` path glob. This caused unnecessary Cloudflare Pages deployments on code-only PRs like #505. Remove the source code path filter so the workflow only triggers on actual docs content changes (docs/**, mkdocs.yml, and the workflow file itself).

* ci: harden CI supply chain

  Pin all GitHub Actions to commit SHAs to prevent tag-based supply chain attacks (same class as CVE-2025-30066). Replace softprops/action-gh-release (single-maintainer, no security policy) with gh CLI. Add top-level permissions: {} to all workflows that lacked it, enforcing least-privilege by default. Enable Dependabot for GitHub Actions and pip dependencies.

  Closes #471

* fix: add dependabot pip entries for each sub-package

  The root directory has no pyproject.toml; the actual packages live under packages/data-designer-config, packages/data-designer-engine, and packages/data-designer.
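The SHA-pinning rule described above can be checked mechanically. The following is a minimal sketch, not the repository's tooling: the function name `find_unpinned_actions` and the regular expressions are assumptions made for illustration. It flags any `uses:` reference whose ref is not a full 40-character commit SHA.

```python
import re

# Hypothetical helper for illustration; not part of the repository.
# Captures "owner/repo[/path]" and the ref after "@" on a `uses:` line.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+?)@(\S+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` strings whose ref is not a 40-char commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            continue  # not a `uses:` line, or a local action without "@"
        action, ref = match.groups()
        if not SHA_RE.match(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned
```

A trailing comment such as `# v6.0.2` after the SHA is ignored because the ref capture stops at the first whitespace.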

* fix: bump pytest, aiohttp, and cryptography for security CVEs

  - pytest 9.0.2 → 9.0.3 (CVE-2025-71176, High: RCE via symlink TOCTOU)
  - aiohttp 3.13.3 → 3.13.5 (10 Medium CVEs: DoS, CRLF injection, credential theft, request smuggling)
  - cryptography 46.0.6 → 46.0.7 (CVE-2026-39892, Medium: buffer overflow on Python >3.11)

  Add constraint-dependencies for transitive deps (aiohttp, cryptography) to enforce minimum safe versions across both workspace and e2e lockfiles.

* style: fix indentation in tests_e2e/pyproject.toml

  Match the 2-space indentation used throughout the file.

* fix: restrict Dependabot pip updates to security-only

  The Dependabot config added in #517 included weekly version-bump PRs for all three pip packages. This would generate noisy PRs for routine dep updates we don't need. Set open-pull-requests-limit: 0 on the pip ecosystems so only CVE-triggered security updates open PRs. GitHub Actions weekly bumps are kept as-is to keep SHA pins current.

* fix: group Dependabot Actions PRs and fix DCO allowlist

  - Add a Dependabot group to bundle all GitHub Actions updates into a single weekly PR instead of one per action
  - Fix DCO allowlist: dependabot -> dependabot[bot] to match the actual GitHub username (the old value never matched, but there were no Dependabot PRs before #517 to expose the bug)

* fix: align DCO assistant if-condition with custom sign-off text

  The step's if-condition checked for the default sign-off text but custom-pr-sign-comment uses different wording. This meant the issue_comment trigger was always skipped - sign-offs only worked by accident when a subsequent push re-triggered the action via pull_request_target.

* ci: add workflow to publish devnotes independently of releases

  Adds a GitHub Actions workflow that rebuilds the `latest` docs alias when devnotes change on main, so blog posts go live without cutting a package release.

* ci: pin actions to commit SHAs and restrict default permissions

  Address Greptile review findings:
  - Pin checkout, setup-uv, and download-artifact to commit SHAs matching the pattern from #517
  - Add top-level permissions: {} to restrict default token scope

* ci: build devnotes from last deployed state, not main

  Instead of building the full site from main (which could include unreleased docs), checkout the commit that latest was last built from (tracked in gh-pages commit messages) and overlay only docs/devnotes/ from main. Download notebooks from the last successful build-docs run instead of rebuilding them.

* ci: add actions:read permission for notebook download

  The gh run list/download calls need actions:read on GITHUB_TOKEN, which is denied by the top-level permissions: {} block.

…tion (#509)

* fix: async engine side-effect column propagation and collision resolution

  ExecutionGraph.set_side_effect() now uses first-writer-wins instead of last-writer-wins, matching sync engine semantics where earlier consumers see the first producer's value. This prevents false DAGCircularDependencyError when multiple generators declare the same side-effect column at different pipeline stages.

  AsyncTaskScheduler now includes side-effect columns in _instance_to_columns so their values are written to the RowGroupBufferManager and available to downstream prompt templates.

  Fixes #508

* fix: separate side-effect columns from completion tracking in async scheduler

  Side-effect columns added to _instance_to_columns caused KeyError in CompletionTracker._validate_strategy() because they are not registered in the execution graph. Split into _instance_to_write_columns (buffer writes, includes side-effects) and _instance_to_columns (completion tracking, real columns only).

* fix: warn on side-effect collision and clarify scheduler column maps

  Log a warning when multiple producers register the same side-effect column (first-writer-wins still applies). Rename _instance_to_columns and _instance_to_write_columns per review feedback for clarity.

* fix: raise ConfigCompilationError on duplicate side-effect producers

  Replace first-writer-wins collision handling with a hard error. Each side-effect column must have exactly one producer; duplicates are a configuration issue to be fixed at the source.

* fix: reject duplicate side-effect producers in sync DAG path

  Mirror the async path check: raise ConfigCompilationError when two custom columns declare the same side-effect column name during topological sort.

…#521) * ci: add PR hygiene automation (linked issue check + stale PR cleanup) Add two workflows to enforce contribution quality and clean up abandoned PRs: - pr-linked-issue.yml: required status check that validates external PRs reference a triaged issue. Collaborators bypass. Re-triggers automatically when a maintainer adds the `triaged` label to the linked issue. - pr-stale.yml: daily cron that reminds authors of failing checks after 7/14 days of inactivity and auto-closes after 14/28 days (external/collaborator). Respects `keep-open` label. New labels created: `triaged`, `task`, `keep-open`. Closes #518 Signed-off-by: Andrea Manoel <amanoel@nvidia.com> * ci: add agentic repository triage workflow Add a weekly scheduled workflow that uses Claude to triage all open issues and PRs, producing a combined dashboard report on a pinned tracking issue. - New recipe (.agents/recipes/issue-triage/) classifies issues, checks staleness, cross-references merged PRs, detects duplicates, and flags PR health problems (missing linked issues, failing checks, orphaned PRs) - New workflow (.github/workflows/agentic-ci-issue-triage.yml) runs every Monday 10:00 UTC on the agentic-ci runner, with manual dispatch support - pr-stale.yml now adds needs-attention label to linked issues when a PR is auto-closed, bridging the two workflows via labels * docs: document stale PR policy and auto-retrigger in CONTRIBUTING.md * fix: address review findings in PR hygiene workflows - pr-linked-issue: fix comment gate so failure comments are posted - pr-stale: upgrade issues permission to write for labeling - pr-stale: compare reminder timestamp against last activity so push/comment actually resets the stale timer * fix: use --body-file in retrigger job to avoid shell quoting issues PR bodies with backticks or unmatched quotes would break the gh pr edit --body "$NEW_BODY" call. Write to a temp file and use --body-file instead. 
* fix: retrigger job drops PRs after the first jq outputs newline-separated numbers but GITHUB_OUTPUT only preserves the first line. Convert to space-separated so the for loop processes all matching PRs. * fix: harden workflows against shell injection - Move attacker-influenced values (${{ user.login }}, step outputs) from expression interpolation in run: blocks to env vars - Replace echo "$PR_BODY" | grep with write-to-file + grep-file to avoid shell expansion of untrusted PR body content - Same treatment for PR body handling in retrigger and stale jobs * refactor: replace peter-evans actions with gh api calls Remove peter-evans/find-comment and peter-evans/create-or-update-comment third-party action dependencies. Replace with gh api calls for finding, creating, updating, and deleting bot comments. Eliminates supply chain risk from unpinned third-party actions. * docs: add pull_request_target security comment --------- Signed-off-by: Andrea Manoel <amanoel@nvidia.com>

* ci: bump the all-actions group with 5 updates

  Bumps the all-actions group with 5 updates:

  | Package | From | To |
  | --- | --- | --- |
  | [actions/checkout](https://github.com/actions/checkout) | `4.3.1` | `6.0.2` |
  | [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) | `7.6.0` | `8.0.0` |
  | [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` |
  | [actions/upload-artifact](https://github.com/actions/upload-artifact) | `6.0.0` | `7.0.1` |
  | [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates) | `0.65.12` | `0.88.1` |

  Updates `actions/checkout` from 4.3.1 to 6.0.2
  - [Release notes](https://github.com/actions/checkout/releases)
  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
  - [Commits](actions/checkout@v4.3.1...de0fac2)

  Updates `astral-sh/setup-uv` from 7.6.0 to 8.0.0
  - [Release notes](https://github.com/astral-sh/setup-uv/releases)
  - [Commits](astral-sh/setup-uv@37802ad...cec2083)

  Updates `actions/download-artifact` from 7.0.0 to 8.0.1
  - [Release notes](https://github.com/actions/download-artifact/releases)
  - [Commits](actions/download-artifact@37930b1...3e5f45b)

  Updates `actions/upload-artifact` from 6.0.0 to 7.0.1
  - [Release notes](https://github.com/actions/upload-artifact/releases)
  - [Commits](actions/upload-artifact@b7c566a...043fb46)

  Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.65.12 to 0.88.1
  - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases)
  - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md)
  - [Commits](NVIDIA-NeMo/FW-CI-templates@21f18ae...2a49420)

  ---
  updated-dependencies:
  - dependency-name: actions/checkout
    dependency-version: 6.0.2
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: astral-sh/setup-uv
    dependency-version: 8.0.0
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: actions/download-artifact
    dependency-version: 8.0.1
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: actions/upload-artifact
    dependency-version: 7.0.1
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: all-actions
  - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml
    dependency-version: 0.88.1
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: all-actions
  ...

  Signed-off-by: dependabot[bot] <support@github.com>

* ci: skip docs preview deploy for Dependabot PRs

  GitHub does not expose repository secrets to Dependabot PRs, so the Cloudflare Pages deploy always fails with a missing API token. Skip the entire job when the actor is dependabot[bot].

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andre Manoel <amanoel@nvidia.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>

* docs: add text-to-sql devnote * add diagram, update content * correct inconsistencies * docs: address PR #349 feedback and add BIRD benchmark results PR feedback fixes: - Fix Window Functions contradiction: Key Takeaway #1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate) - Fix score-0 truthiness bug: use `is not none` instead of truthy check in Jinja2 expression columns (inline example + production pipeline) - Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..." - Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description - Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose - Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck - Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig - Remove placeholder dataset link (#), add "Dataset: Internal" note New content: - Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B) - Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0 - Add Project Summary table at end of post * docs: address second round of PR #349 feedback - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Add companion file note and recipe link to production pipeline details block for prompts.py, 
rubrics.py, text2sql_seed.json (nabinchha) * docs: address round 2 PR #349 feedback, replace production block with recipe - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with snippet include of enterprise_text_to_sql.py recipe — self-contained and runnable, consistent with other merged dev notes (nabinchha) * docs: polish Try It Yourself and Summary sections - Wrap minimal inline example in collapsible <details> dropdown - Rename "A Team Effort" section to "Summary" - Remove redundant Scale/Dialects/Dataset line * docs: add missing sql_dialect sampler to Step 1 code snippet The Step 3/4 prompt templates reference {{ sql_dialect }} but the Step 1 seeding code never defined it, leaving an unresolved Jinja2 variable for readers following along. Add the sql_dialect sampler with a comment explaining the pipeline runs once per dialect. 
* fix ascii diagram * docs: fix BIRD score framing and MySQL dialect wording - Remove specific "60-70%" BIRD claim from intro to avoid contradiction with the 41.80%/38.25% direct-generation results shown later (those higher figures come from specialized systems with schema linking) - Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and CONVERT_TZ are valid MySQL functions; the pipeline excluded them for portability, not because the dialect forbids them * docs: move text-to-sql images to assets/ convention and update refs * docs: address text-to-sql devnote review comments - Add devnote to mkdocs nav after Async All the Way Down - Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe) - Fix score extraction truthy check to use 'is not none' (preserves score-0 values) - Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect) - Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y - Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe) --------- Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Co-authored-by: Yev Meyer <ymeyer@nvidia.com>
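The score-0 truthiness bug mentioned in these commits is easy to reproduce outside Jinja2. The sketch below is a plain-Python analogue, not the devnote's actual template: the function names are hypothetical, and Python's `is not None` plays the role of Jinja2's `is not none` test.

```python
from typing import Optional


def extract_score_truthy(score: Optional[int], default: int = -1) -> int:
    """Buggy variant: a legitimate score of 0 is falsy and gets replaced."""
    return score if score else default


def extract_score_not_none(score: Optional[int], default: int = -1) -> int:
    """Fixed variant: only a missing score (None) falls back to the default."""
    return score if score is not None else default
```

With a judge score of 0, the first function silently returns the default while the second preserves the 0, which is the distinction the fix relies on.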

- Update post date from 2026-03-11 to 2026-04-14 so it appears as the newest post on the devnotes page.
- Replace raw <img> tags with markdown image syntax so mkdocs rewrites relative paths correctly for the blog plugin's slug-based URLs.
- Overlay mkdocs.yml from HEAD in publish-devnotes workflow so new nav entries are included in devnotes-only rebuilds.

The yq JSON roundtrip was mangling the entire mkdocs.yml file (indentation, quoting, comments), causing mike deploy to fail. Extract a Python script that surgically replaces only the Dev Notes nav block, leaving all other content byte-identical.
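A line-based replacement of this kind might be sketched as follows. This is not the extracted script itself: the function names, the `- Dev Notes:` header string, and the assumption that both files indent the nav block identically are all illustrative. The idea is to treat the file as text, locate the block by indentation, and splice in the block from the newer file so every other byte is untouched.

```python
def find_block(lines: list[str], header: str) -> tuple[int, int]:
    """Return [start, end) for the line equal to `header` plus its deeper-indented children."""
    for start, line in enumerate(lines):
        if line.strip() != header:
            continue
        indent = len(line) - len(line.lstrip())
        end = start + 1
        while end < len(lines):
            nxt = lines[end]
            # A non-blank line at the same or shallower indent ends the block.
            if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= indent:
                break
            end += 1
        # Leave trailing blank lines outside the block so they stay untouched.
        while end > start + 1 and not lines[end - 1].strip():
            end -= 1
        return start, end
    raise ValueError(f"block {header!r} not found")


def replace_block(base_text: str, head_text: str, header: str = "- Dev Notes:") -> str:
    """Copy the `header` block from head_text into base_text; keep the rest verbatim."""
    base_lines = base_text.splitlines(keepends=True)
    head_lines = head_text.splitlines(keepends=True)
    b_start, b_end = find_block(base_lines, header)
    h_start, h_end = find_block(head_lines, header)
    return "".join(base_lines[:b_start] + head_lines[h_start:h_end] + base_lines[b_end:])
```

Because nothing is parsed and re-serialized, comments, quoting, and indentation outside the block cannot be altered, which is the property the YAML-to-JSON roundtrip lacked.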

* plan: add skip_when for conditional column generation (#479) Adds implementation plan for a `skip_when` field on `SingleColumnConfig` that enables conditional column generation. When the Jinja2 expression evaluates truthy, the cell is set to None and the generator is skipped. Skips auto-propagate through the DAG to downstream columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: remove HopChain example from skip_when plan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: replace HopChain example with generic product review example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: add open questions on skip sentinel value and row filtering Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: major revision — SkipConfig model, sync engine support, decouple propagation - Introduce SkipConfig(when, value) as nested model on SingleColumnConfig - Move propagate_skip to SingleColumnConfig as independent field, fixing bug where columns with no SkipConfig couldn't participate in propagation - Add full sync engine implementation (Steps 4a-4d) covering both _fan_out_with_threads and _run_full_column_generator dispatch paths - Add serialization boundary stripping for both DatasetBatchManager (sync) and RowGroupBufferManager (async) - Simplify architecture diagrams for readability - Update all references, design decisions, verification plan Made-with: Cursor * updates * plan: document get_required_columns for skip propagation - Explain why propagation must not use get_upstream_columns() once skip.when adds DAG edges; add _required_columns and get_required_columns() to the execution graph plan - Point async _run_cell at get_required_columns for parity with sync - Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for DataFrames; tighten resolved-questions wording - Extend DAG/graph verification with gating_col regression case Refs #479 Made-with: Cursor * 
plan: centralize __skipped__ handling in skip_provenance - Document new skip_provenance.py (key constant, read/write/strip API) - Point sync builder, async scheduler, and batch buffers at shared helpers - Strip metadata before every DataFrame from buffer dicts, including FULL_COLUMN active subsets - Split §3 into skip_evaluator vs skip_provenance; extend verification Refs #479 Made-with: Cursor * plan: align doc title with SkipConfig / skip.when Drop legacy skip_when naming in headings and #362 cross-reference. Refs #479 Made-with: Cursor * plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization - SkipConfig._validate_when_syntax now checks find_undeclared_variables is non-empty, rejecting expressions without {{ }} delimiters that would silently skip every row - evaluate_skip_when centralizes try/except so both sync and async engines get identical fail-safe behavior on eval errors - evaluate_skip_when takes a single pre-deserialized record; caller runs deserialize_json_values once and passes to both skip eval and generator (no double deserialization, no redundant parameter) - Update _should_skip_cell, async _run_cell, Files Modified table, and verification section accordingly Refs #479 Made-with: Cursor * plan: add get_side_effect_columns accessor to execution graph spec Document _side_effects_by_producer inverse map and get_side_effect_columns() accessor on ExecutionGraph, needed by _write_skip_to_record / apply_skip_to_record to clear __trace, __reasoning_content, etc. on skip. Added to both Step 2b metadata section and Files Modified table. The __skipped__ leak into active_df (greptile's other P1) was already fixed in 7046378 via strip_skip_metadata_from_records. Refs #479 Made-with: Cursor * add skip.when conditional column generation Introduce SkipConfig on SingleColumnConfig to gate column generation with a Jinja2 expression. Columns can be skipped by expression or by upstream propagation (propagate_skip flag). 
- SkipConfig: Pydantic model with config-time syntax/delimiter/variable validation and cached column extraction from the Jinja2 AST - skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment with fail-safe error handling (skip on expected failures) - skip_provenance: centralized __skipped__ record tracking shared by sync builder, async scheduler, and buffer managers - DAG/ExecutionGraph: skip.columns wired as dependency edges in both topological sort and static execution graph - Validation: validate_skip_references checks reference existence, sampler/seed scope, and allow_resize conflicts - Sync builder: cell-by-cell and full-column skip with merge-back - Async scheduler: cell and batch skip with live-buffer provenance Made-with: Cursor * fix review findings for skip.when implementation - Add skip evaluation to _fan_out_with_async (was missing, causing skipped rows to still be sent to the LLM) - Preserve __skipped__ provenance on non-skipped records after full-column generation so multi-hop propagation works - Use single live-buffer reference in _run_batch skip loop for consistency with _run_cell - Move Template import to TYPE_CHECKING and reorder import blocks - Replace O(n²) sum() with itertools.chain in dag.py - Add set_required_columns/set_propagate_skip/set_skip_config setters to ExecutionGraph for symmetry with existing API Made-with: Cursor * add conditional generation with skip recipe and refactor skip helpers Add a new recipe demonstrating skip.when patterns (expression gate, propagation, opt-out) with a customer support ticket pipeline. Also extract _should_skip_record in async_scheduler, remove the redundant propagate_skip param from should_skip_by_propagation, and pass a precomputed all_side_effects set through the DAG sort. 
Made-with: Cursor * updates * fixes * remove recipe > inject conditional gen into existing tutorial * regen colab notebooks * fix: handle missing execution graph in _column_can_skip Return False when the graph has not been initialized instead of raising, since skip logic cannot apply before generators are set up. Made-with: Cursor * parametrize some tests * public before private * slight refactor for readability * parametrize some tests * minor fixes * rename internal skip tracker key name * clarify intent in comment * when skipped _run_cell should return skipped value even though the consumer doesn't currently care about it * remove inline import * minor refactor for clarity * fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch Two bugs in the sequential engine's _run_full_column_generator: 1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False fallthrough), breaking propagate_skip for downstream columns when an independent FULL_COLUMN generator ran between skip-setting and propagating columns. 2. _column_can_skip returned True for allow_resize=True columns via propagation, causing the skip-aware merge path to raise on the 1:1 row-count check for 1:N generators.
- Add restore_skip_metadata helper to skip_tracker.py - Guard _column_can_skip against allow_resize=True columns - Refactor _run_full_column_generator into three focused methods - Remove dead allow_resize / _log_resize_if_changed from skip path - Remove redundant _require_graph() calls in skip helpers - Add single_column_config_by_name cached property - Add integration tests for both bugs and unit tests for the helper Made-with: Cursor * address review comments on skip.when PR (#502) - Extract shared skip decision logic (_should_skip_cell / _should_skip_record) into should_skip_column_for_record() in skip_evaluator.py so both sync and async engines call the same function (andreatgretel review comment) - Extend SkipConfig self-reference validation to cover side-effect columns (e.g. review__trace on the review column) — previously only checked self.name, now checks self.name | self.side_effect_columns - Add async engine integration tests for skip paths: cell-by-cell with propagation and full-column batch skip (exercises _run_cell / _run_batch) - Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default propagate_skip=True so it actually exercises the allow_resize guard - Move get_skipped_column_names from skip_tracker to skip_evaluator (sole production consumer) Made-with: Cursor * address cr feedback * Fix issue with full column generating messing up order of skipped rows * add skip conditional generation edge case tests - test_skip_evaluator: parametrized should_skip_column_for_record covering propagation, expression gates, short-circuiting, and disabled propagation - test_execution_graph: skip metadata accessors (get_skip_config, should_propagate_skip, get_required_columns, get_side_effect_columns, resolve_side_effect, skip.when DAG edges) - test_dataset_builder: chained transitive propagation (4 levels), two independent skip gates, custom skip.value, row count preservation Made-with: Cursor * fix: make expression jinja validator private Rename 
assert_expression_valid_jinja to _assert_expression_valid_jinja to match the private naming convention used by other model validators. Made-with: Cursor --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
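The skip decision described across these commits (skip by upstream propagation first, then by the column's own gate, failing safe on evaluation errors) can be sketched as below. This is an assumption-laden illustration, not the project's code: `decide_skip`, `mark_skipped`, and the `__skipped__` key are hypothetical names, a plain Python callable stands in for the Jinja2 `skip.when` expression, and treating `KeyError`/`TypeError` as the "expected failures" is a guess.

```python
from typing import Any, Callable, Iterable, Optional

SKIPPED_KEY = "__skipped__"  # hypothetical provenance key for this sketch
Record = dict[str, Any]


def mark_skipped(record: Record, column: str, value: Any = None) -> None:
    """Write the skip value and remember that `column` was skipped for this row."""
    record[column] = value
    record.setdefault(SKIPPED_KEY, set()).add(column)


def decide_skip(
    record: Record,
    required_columns: Iterable[str],
    propagate_skip: bool,
    skip_when: Optional[Callable[[Record], bool]] = None,
) -> bool:
    """Return True if the column should be skipped for this record."""
    skipped = record.get(SKIPPED_KEY, set())
    # 1. Propagation: an upstream dependency was skipped for this row.
    if propagate_skip and any(col in skipped for col in required_columns):
        return True
    # 2. Expression gate: the column's own condition.
    if skip_when is None:
        return False
    try:
        return bool(skip_when(record))
    except (KeyError, TypeError):
        # Fail safe: skip rather than call the generator on bad input.
        return True
```

Keeping propagation independent of the gate means a column with no gate of its own can still be skipped because of an upstream skip, and a column can opt out by passing `propagate_skip=False`.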

* fix: use pull_request_target for agentic CI on fork PRs

* fix: read recipe files from base branch to prevent prompt injection

  Recipe files define the agent's prompt. When using pull_request_target, the fork's HEAD is checked out, so a malicious fork could craft recipe files to exfiltrate API secrets via prompt injection. Fix by adding a second sparse checkout from the base branch for .agents/recipes/ and reading prompts from there instead of the fork tree.

* fix: align actions/checkout version for base-recipes checkout

  Match the base-branch recipe checkout to v6.0.2 (same SHA as the PR branch checkout) for consistency.

* fix: move expression interpolations to env vars in gate and review jobs

  Replace direct ${{ }} interpolation in run: blocks with env vars. Most values are GitHub-controlled, but github.event.label.name can contain arbitrary characters and could break shell quoting. Moving everything to env: is consistent with the injection-hardening pattern applied in the rest of the workflow.

* Added starter dev notes on push to huggingface hub

* fix: move excerpt marker to intro and remove redundant markers

  Move the single <!-- more --> to after the intro paragraph for a shorter blog teaser and remove the 6 redundant markers throughout the post.

* Update docs/devnotes/posts/push-datasets-to-hugging-face-hub.md

  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* docs: add HF ecosystem context to push-to-hub dev notes (#474)

  * docs: add HF ecosystem context to push-to-hub dev notes

    Add section on what datasets get on the Hub (Dataset Viewer, streaming, Viewer API), link to Hub search for DataDesigner datasets, and note that private datasets can be flipped to public.

  * Update docs/devnotes/posts/push-datasets-to-hugging-face-hub.md

    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

  * fix: remove doubled library: prefix in Hub search URL

  ---------

  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update date

* fix date for text-to-sql

* update hero images

* updates

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

…nc engine (#545)

* feat: bridge model.generate() to agenerate() for custom columns in async engine

  Custom column generators that call model.generate() fail under the async engine because the sync HTTP client is unavailable. Add an _AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the sync-client RuntimeError and schedules agenerate() on the engine's persistent event loop via run_coroutine_threadsafe. Includes a deadlock guard for async custom columns running on the event loop.

* refactor: wrap facades at sync call site, not in _build_models_dict

  Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into _invoke_generator_function() so the async path gets raw facades. The bridge proxy is only needed for sync custom columns; async columns already have direct access to model.agenerate().

* fix: address review feedback - typed exception, timeout cleanup, kwargs test

  - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1)
  - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2)
  - Assert kwargs forwarding in the async bridge test (review comment #4)

* fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions

  The decorator catches all exceptions and wraps them into DataDesignerError, which prevented the async bridge proxy from ever seeing the original error. Add an early match case that re-raises SyncClientUnavailableError directly.

* refactor: make SYNC_BRIDGE_TIMEOUT a public constant

  Drop the underscore prefix since the constant is exported and used across modules (base.py and custom.py).
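The bridge pattern these commits describe can be sketched with the standard library alone. Everything below is an illustration under assumptions: `AsyncBridgedModel`, `FakeModel`, and `start_background_loop` are hypothetical, the exception class is a local stand-in, and the timeout value is invented. Only `asyncio.run_coroutine_threadsafe`, `Future.result(timeout=...)`, and `Future.cancel()` are real standard-library calls.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeoutError

SYNC_BRIDGE_TIMEOUT = 5.0  # seconds; illustrative value, not the project's


class SyncClientUnavailableError(RuntimeError):
    """Local stand-in for the typed exception named in the commits."""


class FakeModel:
    """Toy model whose sync path is unavailable, as under an async engine."""

    def generate(self, prompt: str, **kwargs):
        raise SyncClientUnavailableError("sync client unavailable")

    async def agenerate(self, prompt: str, **kwargs):
        await asyncio.sleep(0)
        return {"prompt": prompt, "kwargs": kwargs}


class AsyncBridgedModel:
    """Proxy that retries a failed sync generate() via agenerate() on `loop`."""

    def __init__(self, model, loop: asyncio.AbstractEventLoop) -> None:
        self._model = model
        self._loop = loop

    def generate(self, prompt: str, **kwargs):
        try:
            return self._model.generate(prompt, **kwargs)
        except SyncClientUnavailableError:
            pass  # fall through to the async bridge
        # Deadlock guard: blocking on the loop from inside the loop never returns.
        try:
            running = asyncio.get_running_loop()
        except RuntimeError:
            running = None
        if running is self._loop:
            raise RuntimeError("called on the engine loop; use agenerate() instead")
        future = asyncio.run_coroutine_threadsafe(
            self._model.agenerate(prompt, **kwargs), self._loop
        )
        try:
            return future.result(timeout=SYNC_BRIDGE_TIMEOUT)
        except FutureTimeoutError:
            future.cancel()  # do not leave the coroutine running after giving up
            raise


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run a persistent event loop on a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop
```

Catching a dedicated exception type rather than matching an error string keeps the bridge from swallowing unrelated `RuntimeError`s, which is the motivation given in the review-feedback commit.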

…flow (#543)

* ci: add daily audit suites with 5 recipes and scheduled workflow

  Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI plan). A new workflow runs one audit suite per weekday via day-of-week rotation, with runner memory persisted via actions/cache.

  Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed), code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff don't cover: cross-reference validation, transitive dep analysis, lazy import compliance, complexity trends, and test-to-source mapping.

  Reports go to the Actions step summary. Code changes use /create-pr.

* ci: add executable smoke checks and harden runner memory

  Add executable smoke checks to test-health and code-quality recipes that exercise real code paths (config build, validate, import timing, registry completeness, error hierarchy, input rejection) without needing an LLM provider. Checks are split into fixed canaries (same every run) and creative checks (agent varies inputs each run).

  Harden runner memory: define JSON schema in _runner.md with TTL and size rules, validate state file after agent runs, only update last_run on success, drop unused audit-log.md.

  Add make install-dev workflow step so recipes can run Python against the installed packages.

* ci: fix codex review findings - test paths, provider check, step gating

  Fix issues found by Codex review:
  - Fix test paths: tests/ does not exist at repo root, use packages/*/tests/ and packages/data-designer/tests/test_import_perf.py
  - Remove DataDesigner(model_providers=[]) from smoke checks - raises NoModelProvidersError; keep config-layer checks only
  - Fix audit step gating: remove continue-on-error, use step outcome to gate runner memory update (|| true + continue-on-error made the step always "succeed", defeating the success() condition)

* ci: fix review findings - heredoc, state validation, lazy import wording

  Fix heredoc with indented EOF terminator that never terminates - replace with printf. Run state validation on all outcomes (not just success) so corrupted state from a failed audit is caught before caching. Only stamp last_run when audit succeeds. Align test-health lazy import section with its own Constraints (report count only, don't duplicate structure audit). Also fixes datetime.utcnow() deprecation and shell variable injection in Python string by using os.environ instead.
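The day-of-week rotation and the two Python fixes named above (timezone-aware "now" instead of `datetime.utcnow()`, and reading values from the environment instead of interpolating shell variables into Python source) can be sketched together. The suite names come from the commit message; the function names, the `AUDIT_SUITE` override variable, and the state layout are assumptions for illustration.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

# Monday=0 ... Friday=4, matching datetime.weekday().
SUITE_BY_WEEKDAY = {
    0: "docs-and-references",
    1: "dependencies",
    2: "structure",
    3: "code-quality",
    4: "test-health",
}


def pick_suite(now: datetime, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the suite for `now`, or None on weekends.

    AUDIT_SUITE is a hypothetical override, read from the environment rather
    than spliced into the script text by the shell.
    """
    override = env.get("AUDIT_SUITE")
    if override:
        return override
    return SUITE_BY_WEEKDAY.get(now.weekday())


def stamp_last_run(state: dict, suite: str, succeeded: bool) -> dict:
    """Record a timezone-aware UTC timestamp, but only for successful audits."""
    if succeeded:
        # datetime.now(timezone.utc) replaces the deprecated datetime.utcnow().
        state.setdefault("last_run", {})[suite] = datetime.now(timezone.utc).isoformat()
    return state
```

Passing `env` as a parameter keeps the function testable without mutating the real process environment.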

Commits on Apr 9, 2026

Commits on Apr 13, 2026

Commits on Apr 14, 2026

Commits on Apr 15, 2026

Commits on Apr 16, 2026

Commits on Apr 17, 2026
