docs: add structured outputs SDG dev notes by dhruvnathawani · Pull Request #338 · NVIDIA-NeMo/DataDesigner

dhruvnathawani · 2026-02-19T07:01:42Z

Summary

Add a dev note documenting the structured outputs SDG pipeline used to generate training data for Nemotron Nano v3's structured output capabilities.

What's in the post

Motivation: Why structured output reliability matters for agentic AI applications (14.81% error rate on baseline)
Benchmark results: JSONSchemaBench (80.2% → 86.9%) and StructEval-Text (64.5% → 72.1%) with per-format breakdown (CSV, JSON, TOML, XML, YAML)
Pipeline walkthrough: Seed data → diversity samplers → schema generation → conversation → structured output → rejection sampling (3x rollouts) → programmatic validation
ASCII pipeline diagram showing the 4-stage flow
Screenshot of display_sample_record() output showing a complete generated record
Discussion of LLMStructuredColumnConfig vs dynamic per-record schemas
Published dataset: nvidia/Nemotron-RL-instruction_following-structured_outputs (9,949 samples, CC BY 4.0)
Caveats: TOML/XML challenges, schema depth diminishing returns
Collapsible demo script using default DD config (pip install + run)
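
The rejection-sampling and programmatic-validation steps in the walkthrough can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the dev note: `generate` is a hypothetical stand-in for the model call, and `is_valid` only checks that the output parses as JSON and contains the required top-level keys, whereas a real pipeline would validate against the full schema (for example with the `jsonschema` package). `validity_after_rollouts` additionally assumes rollouts fail independently, which real model samples may not.

```python
import json
from typing import Callable, Optional, Set

MAX_ROLLOUTS = 3  # the PR description mentions 3x rollouts per record


def is_valid(output: str, required_keys: Set[str]) -> bool:
    """Simplified stand-in validator (an assumption, not the post's code).

    Checks only that the text parses as a JSON object and that every
    required top-level key is present.
    """
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def rejection_sample(
    generate: Callable[[int], str],  # hypothetical model call, one per rollout
    required_keys: Set[str],
    max_rollouts: int = MAX_ROLLOUTS,
) -> Optional[str]:
    """Return the first rollout that validates, or None if all of them fail."""
    for attempt in range(max_rollouts):
        candidate = generate(attempt)
        if is_valid(candidate, required_keys):
            return candidate
    return None


def validity_after_rollouts(p_valid: float, rollouts: int) -> float:
    """Chance that at least one of `rollouts` independent attempts is valid."""
    return 1.0 - (1.0 - p_valid) ** rollouts


# Hypothetical rollouts: the first is truncated, the second is well-formed.
fake_rollouts = [
    '{"name": "widget", "price": ',
    '{"name": "widget", "price": 9.5}',
    "{}",
]
accepted = rejection_sample(lambda i: fake_rollouts[i], {"name", "price"})
rejected = rejection_sample(lambda i: "not json", {"name"})
```

Here `accepted` holds the second rollout and `rejected` is `None`, so a record with no valid rollout is dropped rather than kept. Under the independence assumption, `validity_after_rollouts(0.8, 3)` comes out at about 0.992.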

Files changed

docs/devnotes/posts/structured-outputs.md (new)
docs/devnotes/posts/images/structured-outputs-sample-record.png (new)
docs/devnotes/.authors.yml (added dnathawani)

greptile-apps · 2026-02-19T07:42:59Z

Greptile Summary

Adds comprehensive documentation for the structured outputs SDG pipeline used to generate training data for Nemotron Nano v3's structured output capabilities.

The post includes:

Benchmark results showing significant improvements (JSONSchemaBench: 80.2% → 86.9%, StructEval-Text: 64.5% → 72.1%)
Detailed 4-stage pipeline architecture with ASCII diagram
Technical walkthrough of schema generation, diversity sampling, multi-rollout generation, and rejection sampling
Working demo code with installation instructions
Link to published dataset on HuggingFace (9,949 samples, CC BY 4.0)
Discussion of design choices and caveats

The documentation is well-structured, technically accurate, and follows the existing devnotes format established in other posts.

Confidence Score: 5/5

This PR is safe to merge with no risk
Documentation-only PR with three well-formed files: a properly formatted markdown devnote, a valid screenshot image, and a clean author registry update. No code changes or functional modifications.
No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/.authors.yml | Adds new author `dnathawani` to the documentation authors list with proper formatting |
| docs/devnotes/posts/images/structured-outputs-sample-record.png | Screenshot showing sample Data Designer output with seed columns, schema, conversation, and validation results |
| docs/devnotes/posts/structured-outputs.md | Comprehensive dev note documenting structured output SDG pipeline with benchmarks, architecture diagram, and working demo code |

Last reviewed commit: f3765f3

mvansegbroeck · 2026-02-19T17:40:32Z

+                             │
+                             ▼
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                  STAGE 2: DIVERSITY CONTROLS                    │


I'd suggest having your AI make the boxes a bit wider to avoid the warping. That would make them look nicer.

Good suggestion, done

mvansegbroeck · 2026-02-19T17:45:18Z

+
+---
+
+## **Step 1: Seed Data and Schema Generation**


Many headings use ##. Maybe not all of them need to be headings; some could just be bold.

mvansegbroeck · 2026-02-19T17:48:12Z

+3. **Diversity at every level.** Diverse topics, diverse schemas (depth/width/rigidity), diverse formats, diverse prompts. Each dimension independently improves robustness.
+4. **Rejection sampling is cheap insurance.** 3x rollouts push per-record validity from ~80% to >95%. The marginal token cost is small compared to the quality gain.
+5. **Validation must be programmatic.** LLM judges assess *design quality* but cannot reliably detect *schema violations*. `jsonschema` + format parsers are non-negotiable.
+6. **The hardest formats need the most data.** TOML and XML lag behind JSON and YAML. The pipeline makes it easy to oversample hard formats.


Your demo script only does JSON, right? Maybe add a brief note here on what is needed to extend this to TOML/XML, etc.

Good point, made a note
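
One way to picture the extension discussed in this thread is a small registry of per-format parsers. The sketch below is an assumption, not the note that was added to the post: the `PARSERS` registry and `parses_as` helper are hypothetical names, it uses only standard-library parsers, and it checks well-formedness only, not conformance to a per-record schema. YAML is left out because it needs a third-party parser; registering PyYAML's `yaml.safe_load` and catching `yaml.YAMLError` would be the analogous step.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

try:
    import tomllib  # in the standard library from Python 3.11
except ModuleNotFoundError:
    tomllib = None


def _parse_csv(text: str) -> None:
    """Reject empty CSV and rows whose column counts disagree."""
    rows = list(csv.reader(io.StringIO(text), strict=True))
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("empty or ragged CSV")


# Hypothetical registry: format name -> callable that raises on malformed text.
PARSERS = {
    "json": json.loads,
    "xml": ET.fromstring,
    "csv": _parse_csv,
}
if tomllib is not None:
    PARSERS["toml"] = tomllib.loads


def parses_as(fmt: str, text: str) -> bool:
    """Return True if `text` is well-formed in the given format."""
    parser = PARSERS.get(fmt.lower())
    if parser is None:
        raise KeyError(f"no parser registered for {fmt!r}")
    try:
        parser(text)
    except (ValueError, SyntaxError, csv.Error):
        # JSON and TOML decode errors subclass ValueError;
        # ET.ParseError subclasses SyntaxError.
        return False
    return True
```

A check like `parses_as("xml", "<a><b></a>")` returns `False` because of the mismatched tag, so such a rollout would be rejected before any schema-level validation runs.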

mvansegbroeck · 2026-02-19T17:52:45Z

+
+The stakes are high. When an LLM serves as a backend for tool-calling agents, a single malformed JSON response doesn't just produce a bad answer; it crashes the entire agentic pipeline. The function call fails, the agent can't recover, and the user sees an error. OpenAI, Anthropic, and Google have all invested heavily in structured output guarantees for exactly this reason.
+
+When we measured our baseline model, roughly 1 in 5 structured outputs was malformed. For an API serving thousands of requests, that's hundreds of failures per hour. Our goal was to reduce this as much as possible through targeted synthetic data.


Roughly 20% on JSONSchemaBench and 35% on StructEval-Text, right?

mvansegbroeck

Great blog post. A few minor changes requested, but approving already.

greptile-apps

3 files reviewed, 1 comment

nabinchha · 2026-02-23T18:56:37Z

+
+config.with_seed_dataset(
+    dd.DataFrameSeedSource(df=seed_df),
+    sampling_strategy=SamplingStrategy.SHUFFLE,


nit: this can just be dd.SamplingStrategy.SHUFFLE? Then you wouldn't need to explicitly import SamplingStrategy up top.

Changed, thanks

nabinchha · 2026-02-23T18:57:38Z

+
+**Key Resources:**
+
+- **Dataset (download):** [nvidia/Nemotron-RL-instruction_following-structured_outputs](https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following-structured_outputs) (CC BY 4.0)


This dataset doesn't yet have a datadesigner tag. Can it be added?

Nor a link in references

Good point, will reach out to add these

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

nabinchha

🚀

dhruvnathawani added 2 commits February 18, 2026 22:59

devnotes: add structured outputs SDG blog post

dbf77d4

Add author

0043db0

dhruvnathawani changed the title ~~devnotes: Add Structured Outputs SDG Blog Post~~ docs: Add Structured Outputs SDG dev notes Feb 19, 2026

dhruvnathawani changed the title ~~docs: Add Structured Outputs SDG dev notes~~ docs: add structured outputs SDG dev notes Feb 19, 2026

dhruvnathawani added 3 commits February 18, 2026 23:14

Add author

9989a7d

Add author

afcaf71

Merge branch 'dhruv/devnotes/structured-outputs' of https://github.com/NVIDIA-NeMo/DataDesigner into dhruv/devnotes/structured-outputs

97a374c

dhruvnathawani marked this pull request as ready for review February 19, 2026 07:41

dhruvnathawani requested a review from a team as a code owner February 19, 2026 07:41

dhruvnathawani added 2 commits February 18, 2026 23:52

docs: add benchmark links, clean up flowchart, remove em dashes

377492c

docs: add collapsible demo script, use default DD config, clean up formatting

12e5f72

dhruvnathawani requested a review from mvansegbroeck February 19, 2026 08:10

docs: update baseline error rate, remove specific percentage targets

6b74504

mvansegbroeck reviewed Feb 19, 2026

docs: widen ASCII pipeline diagram, update baseline error rate

259ce2a

mvansegbroeck reviewed Feb 19, 2026

dhruvnathawani added 2 commits February 19, 2026 09:49

docs: reduce heading levels per review feedback

ed98dac

docs: add note on extending demo to YAML/XML formats

91188b0

mvansegbroeck reviewed Feb 19, 2026

mvansegbroeck previously approved these changes Feb 19, 2026

docs: clarify baseline error rate range (20-35% depending on benchmark)

947529b

dhruvnathawani dismissed mvansegbroeck’s stale review via 947529b February 19, 2026 17:59

dhruvnathawani and others added 3 commits February 19, 2026 14:27

docs: increase diagram spacing

e64b797

Merge branch 'main' into dhruv/devnotes/structured-outputs

180ec32

Merge branch 'main' into dhruv/devnotes/structured-outputs

0b3d652

greptile-apps Bot reviewed Feb 23, 2026

Comment thread docs/devnotes/posts/structured-outputs.md Outdated

nabinchha reviewed Feb 23, 2026

nabinchha previously approved these changes Feb 23, 2026

Update typo

d59582d

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

dhruvnathawani dismissed nabinchha’s stale review via d59582d February 23, 2026 23:29

docs: use dd.SamplingStrategy instead of explicit import

bce36ef

dhruvnathawani requested review from kirit93, mvansegbroeck and nabinchha February 24, 2026 02:01

Merge branch 'main' into dhruv/devnotes/structured-outputs

f3765f3

nabinchha approved these changes Feb 25, 2026

dhruvnathawani merged commit f07624b into main Feb 25, 2026
47 checks passed

