feat: Plumb LLM retry controls through RunConfig by eric-tramel · Pull Request #208 · NVIDIA-NeMo/DataDesigner

eric-tramel · 2026-01-13T19:48:16Z

Summary

This PR introduces global, user-configurable controls for LLM generation retry effort via RunConfig and removes the hardcoded module-level retry defaults inside generation tasks (notably the llm_completion generators and the LLM judge). Users can now explicitly bound or scale the effort spent per sample (latency, cost, stability) across all tasks that rely on ModelFacade.generate().

What changed

Added new RunConfig fields (single source of truth):
- max_conversation_restarts: int (default 5, ge=0)
- max_conversation_correction_steps: int (default 0, ge=0)
Removed generator-local hardcoded retry defaults in llm_completion generators
- Generators now read retry settings from self.resource_provider.run_config
Updated LLM judge behavior
- Previously, LLMJudgeCellGenerator used a 2× restart multiplier relative to the module default.
- Now, LLM judge uses the same global RunConfig values as other LLM completion generators.
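
The two new fields and their `ge=0` bounds can be pictured with a small stand-in. This is only a sketch: `RunConfigSketch` is a hypothetical stdlib dataclass, not the project's `RunConfig`, and it leaves out any other fields the real class carries. The validation in `__post_init__` is an assumption meant to mimic what `ge=0` expresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfigSketch:
    """Hypothetical stand-in for the two retry fields added in this PR."""

    max_conversation_restarts: int = 5
    max_conversation_correction_steps: int = 0

    def __post_init__(self) -> None:
        # Mimic the ge=0 constraint: negative budgets are rejected.
        for name in ("max_conversation_restarts", "max_conversation_correction_steps"):
            value = getattr(self, name)
            if value < 0:
                raise ValueError(f"{name} must be >= 0, got {value}")
```

Constructing `RunConfigSketch()` yields the documented defaults, while a negative value for either field raises `ValueError`.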

Why this is needed (user impact)

LLM generation effort per sample was previously determined implicitly by internal defaults (and the judge applied an additional multiplier), which made it difficult for users to:

Cap worst-case latency for dataset generation
Cap cost (token usage) per sample
Tune quality vs throughput consistently

With this change, users can control these tradeoffs in one place.

Behavioral impact / compatibility notes

Defaults preserve current non-judge behavior: RunConfig.max_conversation_restarts=5 and RunConfig.max_conversation_correction_steps=0 match the previous LLM generator defaults.
LLM judge behavior changes: judge no longer gets extra restart budget. Users who relied on higher judge persistence can now set higher global values explicitly via RunConfig.
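
As a rough illustration of the judge change, the helper below computes the restart budget the judge would have had before this PR, assuming the 2× multiplier applied to the previous module default of 5. The function name and its parameters are hypothetical and exist only for this arithmetic.

```python
def previous_judge_restart_budget(module_default: int = 5, judge_multiplier: int = 2) -> int:
    """Restart budget the judge had before this PR, under the stated assumptions."""
    return module_default * judge_multiplier


# Value a user might pass as RunConfig(max_conversation_restarts=...) to
# recover the old judge persistence.
restored_budget = previous_judge_restart_budget()
```

Because the setting is now global, raising `max_conversation_restarts` to this value would raise the budget for every LLM completion generator, not only the judge.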

Plumbing details

DataDesigner already threads RunConfig into the engine via ResourceProvider.run_config.
LLM completion generators (including judge) now pass run_config values into:
- ModelFacade.generate(max_correction_steps=..., max_conversation_restarts=...)
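
The plumbing described above can be sketched with stubs. Everything prefixed `Stub` is hypothetical and stands in for the real classes. Only the attribute path `resource_provider.run_config` and the two keyword names passed to `generate` come from this PR; the `prompt` argument and the return value are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StubRunConfig:
    max_conversation_restarts: int = 5
    max_conversation_correction_steps: int = 0


@dataclass
class StubResourceProvider:
    run_config: StubRunConfig


class StubModel:
    """Records the retry keyword arguments instead of calling an LLM."""

    def __init__(self):
        self.calls = []

    def generate(self, prompt, *, max_correction_steps, max_conversation_restarts):
        self.calls.append(
            {
                "max_correction_steps": max_correction_steps,
                "max_conversation_restarts": max_conversation_restarts,
            }
        )
        return f"response to: {prompt}"


class StubGenerator:
    def __init__(self, resource_provider, model):
        self.resource_provider = resource_provider
        self.model = model

    def generate(self, prompt):
        # Read the retry settings from the shared run config rather than
        # from generator-local constants.
        run_config = self.resource_provider.run_config
        return self.model.generate(
            prompt,
            max_correction_steps=run_config.max_conversation_correction_steps,
            max_conversation_restarts=run_config.max_conversation_restarts,
        )
```

In this sketch a judge would be just another `StubGenerator` sharing the same `StubResourceProvider`, so it would receive the same budget as every other generator.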

Example usage

```python
from data_designer.essentials import DataDesigner, RunConfig

dd = DataDesigner()
# Lower the restart budget and allow one correction step per conversation.
dd.set_run_config(
    RunConfig(
        max_conversation_restarts=2,
        max_conversation_correction_steps=1,
    )
)
```

Files changed

src/data_designer/config/run_config.py
src/data_designer/engine/column_generators/generators/llm_completion.py
tests/engine/column_generators/generators/test_llm_completion_generators.py
tests/interface/test_data_designer.py

Tests

Updated unit tests verify defaults/persistence and that generators/judge pass run_config values into model.generate(...).
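
A check of this kind might look roughly like the following. It is not the PR's test code: `generate_cell` is a hypothetical stand-in for a generator, and the run config is faked with `SimpleNamespace`, so only the assertion pattern is illustrated.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def generate_cell(resource_provider, model, prompt):
    # Hypothetical generator body: forward run_config values to the model.
    run_config = resource_provider.run_config
    return model.generate(
        prompt,
        max_correction_steps=run_config.max_conversation_correction_steps,
        max_conversation_restarts=run_config.max_conversation_restarts,
    )


def check_run_config_values_reach_model():
    run_config = SimpleNamespace(
        max_conversation_restarts=7,
        max_conversation_correction_steps=3,
    )
    resource_provider = SimpleNamespace(run_config=run_config)
    model = MagicMock()

    generate_cell(resource_provider, model, "Rate this answer.")

    # Fails with AssertionError if the kwargs differ from the run config.
    model.generate.assert_called_once_with(
        "Rate this answer.",
        max_correction_steps=3,
        max_conversation_restarts=7,
    )
    return model.generate.call_args.kwargs
```

Using non-default values (7 and 3 here) means the check cannot pass by accident if a generator falls back to hardcoded defaults.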

Control retry logic with RunConfig

128451f

eric-tramel changed the title ~~Plumb LLM retry controls through RunConfig (remove hardcoded retries; unify LLM judge + generators)~~ feat: Plumb LLM retry controls through RunConfig (remove hardcoded retries; unify LLM judge + generators) Jan 13, 2026

eric-tramel changed the title ~~feat: Plumb LLM retry controls through RunConfig (remove hardcoded retries; unify LLM judge + generators)~~ feat: Plumb LLM retry controls through RunConfig Jan 13, 2026

eric-tramel requested review from johnnygreco and nabinchha January 13, 2026 19:50

eric-tramel self-assigned this Jan 13, 2026

eric-tramel added the enhancement New feature or request label Jan 13, 2026

nabinchha approved these changes Jan 13, 2026

johnnygreco reviewed Jan 13, 2026

Comment thread src/data_designer/engine/column_generators/generators/llm_completion.py

johnnygreco approved these changes Jan 13, 2026

eric-tramel merged commit b18fc57 into main Jan 13, 2026
21 of 25 checks passed

github-actions Bot mentioned this pull request Apr 28, 2026

docs: add VLM long-document understanding dev note and recipes #579

Merged

4 tasks

eric-tramel merged 1 commit into main from ewt/configure-restarts
