docs: adding initial mkdocs structure by andreatgretel · Pull Request #1 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2025-10-29T13:53:58Z

Just adding some initial structure. Can add some more scaffold if needed, maybe we can think about sessions we want etc.

PR feedback fixes: - Fix Window Functions contradiction: Key Takeaway #1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate) - Fix score-0 truthiness bug: use `is not none` instead of truthy check in Jinja2 expression columns (inline example + production pipeline) - Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..." - Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description - Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose - Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck - Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig - Remove placeholder dataset link (#), add "Dataset: Internal" note New content: - Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B) - Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0 - Add Project Summary table at end of post

- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Add companion file note and recipe link to production pipeline details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha)

… recipe - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with snippet include of enterprise_text_to_sql.py recipe — self-contained and runnable, consistent with other merged dev notes (nabinchha)

* docs: add text-to-sql devnote * add diagram, update content * correct inconsistencies * docs: address PR #349 feedback and add BIRD benchmark results PR feedback fixes: - Fix Window Functions contradiction: Key Takeaway #1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate) - Fix score-0 truthiness bug: use `is not none` instead of truthy check in Jinja2 expression columns (inline example + production pipeline) - Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..." - Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description - Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose - Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck - Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig - Remove placeholder dataset link (#), add "Dataset: Internal" note New content: - Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B) - Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0 - Add Project Summary table at end of post * docs: address second round of PR #349 feedback - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Add companion file note and recipe link to production pipeline details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha) * docs: address round 2 PR #349 feedback, replace production block with recipe - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with snippet include of enterprise_text_to_sql.py recipe — self-contained and runnable, consistent with other merged dev notes (nabinchha) * docs: polish Try It Yourself and Summary sections - Wrap minimal inline example in collapsible <details> dropdown - Rename "A Team Effort" section to "Summary" - Remove redundant Scale/Dialects/Dataset line * docs: add missing sql_dialect sampler to Step 1 code snippet The Step 3/4 prompt templates reference {{ sql_dialect }} but the Step 1 seeding code never defined it, leaving an unresolved Jinja2 variable for readers following along. Add the sql_dialect sampler with a comment explaining the pipeline runs once per dialect. * fix ascii diagram * docs: fix BIRD score framing and MySQL dialect wording - Remove specific "60-70%" BIRD claim from intro to avoid contradiction with the 41.80%/38.25% direct-generation results shown later (those higher figures come from specialized systems with schema linking) - Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and CONVERT_TZ are valid MySQL functions; the pipeline excluded them for portability, not because the dialect forbids them * docs: move text-to-sql images to assets/ convention and update refs * docs: address text-to-sql devnote review comments - Add devnote to mkdocs nav after Async All the Way Down - Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe) - Fix score extraction truthy check to use 'is not none' (preserves score-0 values) - Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect) - Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y - Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe) --------- Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Co-authored-by: Yev Meyer <ymeyer@nvidia.com>

* docs: add text-to-sql devnote * add diagram, update content * correct inconsistencies * docs: address PR NVIDIA-NeMo#349 feedback and add BIRD benchmark results PR feedback fixes: - Fix Window Functions contradiction: Key Takeaway NVIDIA-NeMo#1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate) - Fix score-0 truthiness bug: use `is not none` instead of truthy check in Jinja2 expression columns (inline example + production pipeline) - Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..." - Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description - Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose - Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck - Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig - Remove placeholder dataset link (#), add "Dataset: Internal" note New content: - Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B) - Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0 - Add Project Summary table at end of post * docs: address second round of PR NVIDIA-NeMo#349 feedback - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway NVIDIA-NeMo#1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Add companion file note and recipe link to production pipeline details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha) * docs: address round 2 PR NVIDIA-NeMo#349 feedback, replace production block with recipe - Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway NVIDIA-NeMo#1 to match the exact taxonomy string in the code example (greptile) - Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha) - Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha) - Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with snippet include of enterprise_text_to_sql.py recipe — self-contained and runnable, consistent with other merged dev notes (nabinchha) * docs: polish Try It Yourself and Summary sections - Wrap minimal inline example in collapsible <details> dropdown - Rename "A Team Effort" section to "Summary" - Remove redundant Scale/Dialects/Dataset line * docs: add missing sql_dialect sampler to Step 1 code snippet The Step 3/4 prompt templates reference {{ sql_dialect }} but the Step 1 seeding code never defined it, leaving an unresolved Jinja2 variable for readers following along. Add the sql_dialect sampler with a comment explaining the pipeline runs once per dialect. * fix ascii diagram * docs: fix BIRD score framing and MySQL dialect wording - Remove specific "60-70%" BIRD claim from intro to avoid contradiction with the 41.80%/38.25% direct-generation results shown later (those higher figures come from specialized systems with schema linking) - Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and CONVERT_TZ are valid MySQL functions; the pipeline excluded them for portability, not because the dialect forbids them * docs: move text-to-sql images to assets/ convention and update refs * docs: address text-to-sql devnote review comments - Add devnote to mkdocs nav after Async All the Way Down - Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe) - Fix score extraction truthy check to use 'is not none' (preserves score-0 values) - Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect) - Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y - Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe) --------- Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Co-authored-by: Yev Meyer <ymeyer@nvidia.com>

…gs test - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1) - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2) - Assert kwargs forwarding in the async bridge test (review comment #4)

…nc engine (#545) * feat: bridge model.generate() to agenerate() for custom columns in async engine Custom column generators that call model.generate() fail under the async engine because the sync HTTP client is unavailable. Add an _AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the sync-client RuntimeError and schedules agenerate() on the engine's persistent event loop via run_coroutine_threadsafe. Includes a deadlock guard for async custom columns running on the event loop. * refactor: wrap facades at sync call site, not in _build_models_dict Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into _invoke_generator_function() so the async path gets raw facades. The bridge proxy is only needed for sync custom columns; async columns already have direct access to model.agenerate(). * fix: address review feedback - typed exception, timeout cleanup, kwargs test - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1) - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2) - Assert kwargs forwarding in the async bridge test (review comment #4) * fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions The decorator catches all exceptions and wraps them into DataDesignerError, which prevented the async bridge proxy from ever seeing the original error. Add an early match case that re-raises SyncClientUnavailableError directly. * refactor: make SYNC_BRIDGE_TIMEOUT a public constant Drop the underscore prefix since the constant is exported and used across modules (base.py and custom.py).

…585) * fix(async): exclude all retryable errors from early-shutdown gate The gate previously only excluded `ModelRateLimitError`, leaving `ModelTimeoutError`, `ModelInternalServerError`, and `ModelAPIConnectionError` to count toward the sliding-window error rate. Under provider degradation these errors cluster in time (concurrent in-flight requests time out together), so 5/10 in a row is easy and trips the gate even when salvage could recover the rows. Refs #575. * feat(async): WARN log when provider showing degraded performance Diagnostic A/Bs against build.nvidia.com showed runs failing silently under provider degradation - no log indication that retryable errors were piling up until the early-shutdown gate fired (or, post-fix, until salvage exhaustion). Surfacing this earlier helps users distinguish "DataDesigner is broken" from "the upstream provider is slow today." Tracks a separate sliding window over retryable-vs-not for every task outcome (independent of the early-shutdown gate's window) and emits a throttled WARN when the rolling fraction crosses the threshold. Refs #575. * fix(async): salvage partial row groups on early shutdown Before: when the early-shutdown gate fired, any row group still in flight stayed in `_rg_states` un-checkpointed. The buffer manager later raised `FileNotFoundError` when the builder tried to read the finalized parquet. User-visible result: `0 records produced`. After: a new `_finalize_after_shutdown` step runs in `run()`'s finally block, after `_cancel_workers` has drained in-flight tasks (Codex caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to write into a buffer that's being finalized). For each remaining row group it drops rows that aren't fully complete, then delegates to the existing `_checkpoint_completed_row_groups` so the buffer manager's zero-survivor handling (skip empty parquet, free buffer) kicks in unchanged. Also surfaces partial completion as a structured signal: scheduler exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]` properties so callers can detect partial completion programmatically rather than parsing log lines. Builder uses this to emit a more specific WARN distinguishing early shutdown from non-shutdown drops. Refs #575. * fix(throttle): reset consecutive_429s on non-rate-limit failure In `release_failure`, the cascade counter wasn't reset, so a sequence like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade counter feeds AIMD's reduce-once-per-cascade logic; the second 429 should start a fresh cascade and trigger another concurrency reduction, but currently doesn't. Standalone bug surfaced during #575 investigation; not on the failure path that drives the gate-trip outcome but worth fixing while we're in this code. * fix(custom): preserve retryability through CustomColumnGenerator wrap A real-workload run of #575 showed the early-shutdown gate still trips even with the gate-exclusion fix in place: the trigger is 10 timeouts inside Anonymizer's QA-repair custom columns, all wrapped in CustomColumnGenerationError (non-retryable) by the catch-all in CustomColumnGenerator. Two fixes here: 1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the scheduler's _is_retryable correctly classifies them. 2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError instead of stdlib TimeoutError. Without this the sync bridge times out as the wrong exception type and is still classified non-retryable even after fix #1. Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the scheduler and the wrap site need it, and models/errors is the appropriate home alongside the error class definitions. Refs #575. * feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs When the async scheduler hits early shutdown and produces zero records, the buffer manager skips writing parquet (correctly), so ArtifactStorage.load_dataset_with_dropped_columns() raises FileNotFoundError. Previously this surfaced as a generic DataDesignerGenerationError wrapping the FileNotFoundError, which is ambiguous (could be missing files for any reason). This commit: - Adds DataDesignerEarlyShutdownError as a subclass of DataDesignerGenerationError so existing handlers still match while callers that want to react programmatically (retry on different alias, surface a degraded-provider message, etc.) can catch the specific type. - Plumbs the scheduler's structured signals (early_shutdown, partial_row_groups) up through the builder so they're available at data_designer.create() time without re-introspecting the scheduler. - create() raises the typed error in both failure modes (load fails or empty DataFrame returned) when builder.early_shutdown is True. Refs #575. * fix(async): emit first degraded-provider WARN regardless of clock state Initialize _last_degraded_warn_at to -inf so the first WARN is always emitted. The previous initialization to 0.0 suppressed the first WARN on fresh CI runners where time.monotonic() returns a small value (system boot uptime), making the throttle interval check (now - 0.0 < interval) true on the first attempt. * fix(async): address review findings on early-shutdown salvage PR Five real correctness issues caught in review of the original PR, plus a few smaller cleanups and test simplifications. Throttle - cascade reset (regression of existing AIMD invariant): release_failure() now resets consecutive_429s only when in_flight == 0. Resetting unconditionally broke "reduce once per cascade" when 429/500/429 arrived interleaved within a single in-flight burst - the second 429 was treated as a new cascade and the limit got halved twice for what was effectively one rate-limit event. Interface - typed-error gating: DataDesignerEarlyShutdownError now fires only when early_shutdown is true AND actual_num_records == 0. Without this, a partial-salvage run that fails to load for unrelated reasons (corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero records produced," hiding the real cause. Async - WARN window scope: the degraded-provider warning was fed by every task outcome, including samplers and non-LLM customs. In realistic pipelines (one model column, several non-model columns) the rate stayed under threshold even when every model call was failing, silencing the WARN exactly when it mattered. Now gated on is_llm. Async/builder - signal preservation across raises: scheduler.early_shutdown and partial_row_groups are captured in a try/finally around future.result(), so a processor failure during the salvage path doesn't drop the structured signal. Both build() and build_preview() now reset per-run state at the start so reused builders don't leak prior-run flags. Async - dead code: dispatch_error capture in run() was unread (the post- finally check is unreachable on the exception path). Removed. Smaller cleanups: - early-shutdown WARN says "non-retryable error rate exceeded threshold" - bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces it; the throttled degraded-provider WARN is the user-facing signal) - TODO note for threading degraded_warn_* through RunConfig - doc note in _finalize_after_shutdown clarifying that pre-batch processor isn't re-run on partial-salvage row groups Tests: - new regression tests for the cascade burst case, partial-salvage error gating, and LLM-only WARN window - direct unit test for _reset_run_state - dedup via _make_storage / _seed_plus_cell_setup helpers - WARN emission cases parametrized into a single test - shared parametrize lists hoisted to module-level constants - redundant cascade test dropped in favor of the more thorough drain variant; redundant healthy-baseline test folded into the zero-survivor test * chore(async): address Nabin's review comments Style cleanups, parametrization, docstring polish, and one consistency fix in the typed-error path. All non-blocking ("Ship it (with nits)"). interface/data_designer.py: - preview() now raises DataDesignerEarlyShutdownError when shutdown produced zero records (parity with create()), and also gates on actual_num_records == 0 so partial-salvage runs that fail to load don't get misdiagnosed - create()'s defensive empty-DF guard mirrors the load-failure guard with the same actual_num_records == 0 check async_scheduler.py: - _record_retryable_outcome docstring clarifies that the call site filters by is_llm; the function alone reads as if every outcome feeds the window dataset_builder.py: - moved _reset_run_state() down past the public methods to match the project's public-before-private convention test_custom.py: - flattened TestAsyncBridgedModelFacade class into module-level test functions (matches the rest of the file) - hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade, SyncClientUnavailableError) to top of file - driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly instead of the hand-rolled factory list, so new retryable types pick up coverage automatically - dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_ model_timeout_error - pytest.raises already enforces the type, the duplicate block was running the same slow scenario twice test_async_scheduler.py: - parametrize over RETRYABLE_MODEL_ERRORS directly (same as above) test_data_designer.py: - added preview-path tests for the typed-error and partial-salvage fall-through cases - updated the existing empty-DF test to also patch actual_num_records=0 (otherwise the new gating in the empty-DF guard skips the typed error) * test(interface): consolidate create() error-dispatch tests into a matrix Five separate tests (two existing, three new from earlier in this PR) all probed the same dispatch logic in create(): "given a load outcome and a builder state, which error type should fire?" Pulled them into a single parametrized matrix indexed by (load_side_effect, early_shutdown, actual_num_records). Net result: 5 named tests → 1 parametrized test with 6 cells, and the previously-missing empty_df + shutdown + partial salvage cell is now covered. Test names retain readable IDs (load_fails_shutdown_zero_records etc.) so failures still pinpoint the exact case in pytest output.

andreatgretel requested a review from johnnygreco October 29, 2025 13:53

johnnygreco reviewed Oct 29, 2025

View reviewed changes

Comment thread pyproject.toml Outdated

johnnygreco approved these changes Nov 3, 2025

View reviewed changes

andreatgretel added 5 commits November 3, 2025 13:48

first test

c1fdd4c

don't autopublish for now

06f7018

linting etc

90d3258

added headers

c0b6ddc

adding group for docs dependencies

f556250

andreatgretel force-pushed the andreatgretel/docs branch from 6c66bcf to f556250 Compare November 3, 2025 16:48

andreatgretel merged commit 19e5e46 into main Nov 3, 2025
10 checks passed

andreatgretel deleted the andreatgretel/docs branch November 19, 2025 16:37

greptile-apps Bot mentioned this pull request Mar 9, 2026

docs: add text-to-sql dev note #349

Merged

This was referenced Apr 7, 2026

ci: add PR review workflow and recipe for agentic CI #498

Merged

ci: publish devnotes independently of releases #536

Merged

ci: add PR hygiene automation (linked issue check + stale PR cleanup) #521

Merged

github-actions Bot mentioned this pull request Apr 14, 2026

fix: bridge model.generate() to agenerate() for custom columns in async engine #545

Merged

6 tasks

nabinchha mentioned this pull request Apr 17, 2026

ci: add daily audit suites with 5 rotating recipes and scheduled workflow #543

Merged

6 tasks

andreatgretel mentioned this pull request Apr 29, 2026

feat: resume interrupted dataset generation runs (sync + async engine) #526

Merged

40 tasks

johnnygreco mentioned this pull request Apr 30, 2026

feat(models): deprecate implicit default provider routing #594

Merged

7 tasks

github-actions Bot mentioned this pull request May 29, 2026

docs: re-adopt global theme; components self-inject CSS #716

Merged

5 tasks

github-actions Bot mentioned this pull request Jun 11, 2026

feat: add workflow stage resume #747

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: adding initial mkdocs structure#1

docs: adding initial mkdocs structure#1
andreatgretel merged 5 commits into
mainfrom
andreatgretel/docs

andreatgretel commented Oct 29, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andreatgretel commented Oct 29, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants