Summary
DataDesigner.__init__ always reads the default: key from ~/.data-designer/model_providers.yaml and applies it to the runtime ModelProviderRegistry, even when the user supplies their own model_providers list. This causes two related problems:
- Hard failure: if the YAML's default names a provider that isn't in the user-supplied list, construction raises ValidationError: Specified default 'X' not found in providers list.
- Silent override: if the YAML's default happens to match a provider in the user-supplied list (but not the first one), the documented "first wins" behavior is silently overridden.
Dormant on fresh installs (seed YAML is written without a default: key), but hit immediately by anyone who uses dd config providers "Change default provider", hand-edits the YAML, or relies on a service/plugin that programmatically writes a default.
Repro 1: hard failure
import os, tempfile, yaml
from pathlib import Path

# Point Data Designer at a throwaway home whose YAML sets default: nvidia.
tmp_home = Path(tempfile.mkdtemp(prefix="dd_home_"))
os.environ["DATA_DESIGNER_HOME"] = str(tmp_home)
(tmp_home / "model_providers.yaml").write_text(yaml.safe_dump({
    "default": "nvidia",
    "providers": [{
        "name": "nvidia",
        "endpoint": "https://integrate.api.nvidia.com/v1",
        "provider_type": "openai",
        "api_key": "NVIDIA_API_KEY",
    }],
}))

from data_designer.config.models import ModelProvider
from data_designer.interface.data_designer import DataDesigner

# User-supplied list that does not contain the YAML's default provider.
custom_providers = [
    ModelProvider(name="my-vllm", endpoint="https://my-vllm.example.com/v1",
                  provider_type="openai", api_key="MY_VLLM_API_KEY"),
]
DataDesigner(model_providers=custom_providers)  # raises the ValidationError below
ValidationError: 1 validation error for ModelProviderRegistry
Value error, Specified default 'nvidia' not found in providers list
Repro 2: silent override
Same setup, but the YAML has default: foo and the user passes [bar, foo] (in that order). Expected default (per the documented "first wins" behavior): bar. Actual: foo.
Root cause
DataDesigner.__init__ passes get_default_provider_name() (which reads the YAML) unconditionally:
https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/packages/data-designer/src/data_designer/interface/data_designer.py#L153-L157
self._model_providers = self._resolve_model_providers(model_providers)
self._mcp_providers = mcp_providers or []
self._model_provider_registry = resolve_model_provider_registry(
    self._model_providers, get_default_provider_name()
)
get_default_provider_name() reads the YAML:
https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/packages/data-designer-config/src/data_designer/config/default_model_settings.py#L97-L98
The resolver then sets it as the registry's default, trusted over model_providers[0].name:
https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/packages/data-designer-engine/src/data_designer/engine/model_provider.py#L70-L78
And the registry's validator hard-rejects a default that isn't in the providers list:
https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/packages/data-designer-engine/src/data_designer/engine/model_provider.py#L47-L51
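To make the interaction concrete, here is a minimal stdlib-only sketch of the chain described above. It is not the library's code: RegistryModel, resolve_registry, and try_resolve are hypothetical stand-ins that track provider names only, and they assume the validator and the "first wins" fallback behave as this issue describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryModel:
    """Hypothetical stand-in for ModelProviderRegistry (provider names only)."""

    providers: list[str]
    default: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validator behavior: an explicit default must be in the list.
        if self.default is not None and self.default not in self.providers:
            raise ValueError(
                f"Specified default '{self.default}' not found in providers list"
            )

    def effective_default(self) -> str:
        # "First wins" applies only when no explicit default was set.
        return self.default or self.providers[0]


def resolve_registry(providers: list[str], yaml_default: Optional[str]) -> RegistryModel:
    # Models the current behavior: the YAML default is applied unconditionally.
    return RegistryModel(providers=providers, default=yaml_default)


def try_resolve(providers: list[str], yaml_default: Optional[str]) -> str:
    # Returns the effective default, or the error text if validation fails.
    try:
        return resolve_registry(providers, yaml_default).effective_default()
    except ValueError as exc:
        return f"error: {exc}"


hard_failure = try_resolve(["my-vllm"], "nvidia")     # Repro 1
silent_override = try_resolve(["bar", "foo"], "foo")  # Repro 2
no_yaml_default = try_resolve(["bar", "foo"], None)   # fresh install

print(hard_failure)
print(silent_override)
print(no_yaml_default)
```

In this model the first call reports the validation error, the second yields foo instead of bar, and only the third (no YAML default) follows "first wins".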
Existing tests confirm the friction is felt
Two tests in packages/data-designer/tests/interface/test_data_designer.py already work around this by patching get_default_provider_name (lines 861-867 and 901-907). The stub_model_providers fixture has exactly one provider named stub-model-provider and the patch exists purely to prevent the YAML's default from leaking in. No test asserts the buggy behavior — a fix would let those patches drop away.
Suggested fix (minimal, non-breaking)
Only consult the YAML default when the user didn't supply their own providers:
if model_providers is None:
    self._model_providers = self._resolve_model_providers(None)
    default_name = get_default_provider_name()
else:
    self._model_providers = self._resolve_model_providers(model_providers)
    default_name = None  # User-supplied list owns its default (first wins)
self._mcp_providers = mcp_providers or []
self._model_provider_registry = resolve_model_provider_registry(
    self._model_providers, default_name
)
Closes both repros. Doesn't break anything — the two test patches become unnecessary but still pass. No public API change.
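The decision in the suggested fix can also be isolated as a pure function and checked against both repros. The sketch below is illustrative only: choose_default_name and effective_default are hypothetical helpers operating on provider names, and the "openai" entry in the YAML-only case is an invented example name.

```python
from typing import Optional, Sequence


def choose_default_name(
    user_providers: Optional[Sequence[str]],
    yaml_default: Optional[str],
) -> Optional[str]:
    """Return the default name to pass to the resolver (None means "first wins")."""
    if user_providers is None:
        # No user list: the YAML-defined providers are in use, so its default applies.
        return yaml_default
    # User-supplied list owns its default; the YAML default is ignored.
    return None


def effective_default(providers: Sequence[str], default_name: Optional[str]) -> str:
    # Stand-in for the resolver plus validator, on names only.
    if default_name is not None and default_name not in providers:
        raise ValueError(
            f"Specified default '{default_name}' not found in providers list"
        )
    return default_name or providers[0]


# Repro 1: YAML default "nvidia", user passes only "my-vllm".
repro_1 = effective_default(["my-vllm"], choose_default_name(["my-vllm"], "nvidia"))

# Repro 2: YAML default "foo", user passes [bar, foo].
repro_2 = effective_default(["bar", "foo"], choose_default_name(["bar", "foo"], "foo"))

# No user list: the YAML providers and the YAML default are both honored.
yaml_providers = ["nvidia", "openai"]  # hypothetical YAML provider names
yaml_only = effective_default(yaml_providers, choose_default_name(None, "openai"))

print(repro_1, repro_2, yaml_only)
```

Under these assumptions the user-supplied cases resolve to the first provider in the list, while the YAML-only path keeps its configured default.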
Severity
- Fresh installs: dormant (seed YAML has no default: key).
- CLI users who set a default: hit immediately.
- Service/plugin scenarios: high impact. Anywhere a service writes a YAML default and a plugin then constructs DataDesigner with its own providers, this lands.
Related
Architectural follow-ups are tracked separately:
ModelProviderRegistry.default entirely