Skip to content

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625

Merged
nabinchha merged 5 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider
Jun 12, 2026
Merged

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625
nabinchha merged 5 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider

Conversation

@nabinchha

@nabinchha nabinchha commented May 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Final cleanup after the #589 deprecation cycle. Providers are now a named list referenced explicitly by each ModelConfig: no hidden registry default, no YAML default: flowing into DataDesigner.__init__, and no ambiguity about which provider wins.

The original release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes #594, so users have had a release containing the deprecation warnings before this removal lands.

Related Issue

Closes #590. Depends on #588 (merged as #591) and #589 (merged as #594, released in v0.6.0).

Changes

Removed

  • Removed ModelProviderRegistry.default, default-provider validators, and get_default_provider_name() from the engine registry.
  • Removed the config-layer get_default_provider_name() helper.
  • Removed the CLI default-provider field, set_default / get_default service methods, default-tracking in update/delete, the "Change default provider" menu option, and the "Default" column from data-designer config list.
  • Removed the top-level default_provider and per-item configured_provider / effective_provider fields from agent introspection output in favor of a single explicit per-item provider field.

Changed

  • Made ModelConfig.provider required.
  • Changed ModelProviderRegistry.get_provider(name) to require a concrete provider name.
  • Changed resolve_model_provider_registry(providers) to drop default_provider_name.
  • Simplified DataDesigner.__init__ so it resolves providers without consulting YAML defaults.
  • Split the CLI model-form error for "no providers configured" versus "multiple providers configured but none selected."
  • Updated legacy docs and Fern docs to remove the deprecated default-provider workflow, mark provider as required, and call out the upgrade impact for stale model_configs.yaml entries and agent context consumers.

Compatibility

  • Existing model_providers.yaml files with a legacy default: key still load cleanly because both ModelProviderRegistry classes inherit directly from pydantic.BaseModel, preserving pydantic v2's default extra="ignore" behavior. This invariant is now documented on the classes and covered by tests.
  • Existing model_configs.yaml entries that omit provider or set it to null must be updated with an explicit provider name.

Attention Areas

Reviewers should pay special attention to:

Testing

  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer/tests/cli/forms/test_model_builder.py packages/data-designer-engine/tests/engine/test_model_provider.py packages/data-designer/tests/cli/repositories/test_provider_repository.py packages/data-designer/tests/interface/test_data_designer.py packages/data-designer-config/tests/config/test_models.py packages/data-designer-config/tests/config/test_default_model_settings.py -q (162 passed before the final docs-wording amend)
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff check ...
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff format --check ...
  • Full GitHub Actions on rebased branch: restarted after the final docs-wording amend
  • fern check: not run locally because the fern CLI is not installed in this environment

Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Unit tests added/updated
  • Docs updated

@nabinchha nabinchha requested a review from a team as a code owner May 9, 2026 00:19
@github-actions

github-actions Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

MkDocs preview: https://95f6d2d7.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-625.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@github-actions

github-actions Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

Review: PR #625refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

Summary

Final cleanup of the #589 deprecation cycle: drops the implicit "default provider" concept end-to-end. ModelConfig.provider goes from str | None = None to required str; ModelProviderRegistry.default (and both its validators + get_default_provider_name()) is removed; the CLI's set_default/get_default/"Change default provider" workflow is gone; agent introspection drops default_provider/effective_provider. Net −722 lines across 31 files, test suite updated in lockstep (deprecation-warning tests → strict-validation tests). Migration for on-disk YAMLs carrying a legacy default: relies on pydantic v2's extra="ignore" default, locked in by two new tests.

The PR is well-scoped, well-tested, and architecturally clean. The dependency direction (interface → engine → config) is preserved and the two ModelProviderRegistry classes (engine + CLI repo) are updated consistently. CI is green on the Python 3.10–3.12 matrix; 3.13 runs are still in progress at review time.

The primary risk is a hard public-API break, gated on the #594 deprecation cycle having shipped in a release — which the PR author has correctly flagged as a blocker and the release history confirms has not yet happened (last release v0.5.9, 2026-04-28; #594 merged 2026-05-05).

Findings

⛔ Blocker (already called out by author)

⚠️ Breaking changes worth an explicit user-facing note

  • ModelConfig.provider becomes required. Anyone constructing ModelConfig(alias=..., model=...) without provider= now gets ValidationError. Likewise any on-disk ~/.data-designer/model_configs.yaml entry that omitted provider: or serialised it as null will fail to load after upgrade. The PR body notes "Migration is one-line per call site" for code, but on-disk configs from pre-feat(models): deprecate implicit default provider routing #594 users have no migration helper — they will hit ValidationError on load. Consider either (a) adding a CHANGELOG/release-notes entry calling this out, or (b) catching this specific ValidationError in ProviderRepository.load / ModelRepository.load with a targeted error message pointing at the fix. I lean toward (a) since users on stale model configs are a shrinking set and a clear CHANGELOG entry is cheaper than a bespoke migration path.

  • Agent introspection shape change. get_model_aliases_state() drops default_provider from the top level and collapses configured_provider + effective_provider into a single provider field (see packages/data-designer/src/data_designer/cli/utils/agent_introspection.py). This is observable output that agents consuming data-designer agent context parse. The PR description lists this under "Removed" but doesn't flag it as a schema change. If any downstream agent scaffolding keys off those three field names, it will silently produce missing-field errors or empty strings. Worth an explicit callout in release notes alongside the ModelConfig change.

✅ Migration path for legacy YAML default: — correct and well-tested

  • Both ModelProviderRegistry classes (engine model_provider.py and CLI provider_repository.py) inherit from BaseModel directly, not ConfigBase (which overrides extra="forbid"). The PR correctly locks this in:
    • test_registry_ignores_legacy_default_key (engine)
    • test_load_silently_ignores_legacy_default_key (CLI repo)
  • Subtle-but-important invariant: if someone later rebases these classes onto ConfigBase, the migration silently breaks (old YAMLs would start raising ValidationError on the stale default: key). A short comment on the class itself — "inherits BaseModel directly so extra='ignore' lets us drop legacy default: keys" — would protect against this. The docstring on ProviderRepository.load already says this; the class itself doesn't.

🟡 Minor / nits

  • provider_controller.py menu order change. The pre-refactor code appended "exit" after the conditional "change_default" so it always rendered last. The new code inlines "exit" in the initial dict literal, which keeps it last only because there is now nothing else to append. Behaviour is identical today; flagging only because a future "add menu option" patch could accidentally push exit out of its conventional last-position slot. Not worth changing.

  • Unused import cleanup is thoroughwarn_at_caller removed from models.py, default_model_settings.py, model_provider.py, provider_controller.py, provider_repository.py. Good. No orphaned warnings import.

  • scripts/benchmarks/benchmark_engine_v2.py correctly updated to drop default_provider_name=. Nice catch — benchmark scripts are easy to miss.

  • Docs parity: Four docs/concepts/models/*.md files had their deprecation warnings stripped. Consistent. Didn't spot any orphaned #589 references in the docs tree, but a quick grep -r "589" docs/ before merging wouldn't hurt.

  • CLI README.md drops the "default: nvidia" line from the example YAML. Good — keeps user-visible examples aligned with the new schema.

Test coverage observations

  • Previously-added warning-emission tests (test_model_config_provider_none_emits_deprecation_warning, test_explicit_default_emits_deprecation_warning, test_handle_change_default_emits_deprecation_warning, etc.) have been correctly replaced by their post-cycle counterparts (ValidationError-based). No test appears to have been silently dropped.
  • test_init_no_user_providers_loads_from_yaml is the right shape for the remaining DataDesigner.__init__ YAML-fallback path, replacing the older three-test cluster (test_init_user_supplied_providers_ignore_unrelated_yaml_default, test_init_user_supplied_providers_preserve_first_wins_over_yaml_default, test_init_yaml_default_emits_single_deprecation_warning). The first two were bug: default model provider leaks from YAML into DataDesigner constructor #588 regressions — those bugs can't reoccur once the default field is gone, so removal is correct.
  • No new e2e tests, but the PR description's "N/A (no runtime behavior change beyond stricter validation)" is accurate: this is a pure API-surface tightening.

Security / performance

  • No security implications. No credentials handling changed.
  • No performance implications. One fewer model_validator on ModelConfig and three fewer on ModelProviderRegistry — negligible but nominally a micro-win.

Verdict

Approve pending release gate. The code change is clean, the migration strategy for existing YAMLs is sound and tested, and the CI signal is strong. The two items worth addressing before merge are non-blocking:

  1. (Recommended) Add a CHANGELOG / release-notes entry explicitly calling out (a) ModelConfig.provider becoming required, and (b) the agent-introspection schema change. Users upgrading from v0.5.9 → next release with stale on-disk configs or downstream agent tooling will otherwise hit an undocumented break.
  2. (Nit) Drop a one-line comment on both ModelProviderRegistry classes pinning the BaseModel-not-ConfigBase invariant that makes the legacy-default: migration work.

Both are worth doing but neither should block merge once the release gate lifts.

@greptile-apps

greptile-apps Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR completes the removal of the deprecated ModelProviderRegistry.default registry-level default provider that was deprecated in v0.6.0. ModelConfig.provider is now a required str field, all implicit-default routing is removed, and a migration guard (LegacyModelConfigMigrationError) is added to the ModelRepository so existing on-disk configs missing provider fail fast with a clear message rather than silently returning None.

  • Engine/config layer: ModelProviderRegistry.default, its validators, and get_default_provider_name() are fully removed; resolve_model_provider_registry drops default_provider_name; pydantic v2 extra="ignore" silently drops legacy default: keys on load.
  • CLI layer: Provider service drops set_default/get_default; provider controller drops the "Change default provider" menu entry; the model-form builder now raises distinct errors for "no providers" vs "multiple providers but none selected"; the agent introspection schema exposes only a single provider field per model alias item.
  • Test suite: 162 tests pass; deprecated-path tests are replaced with new regression tests covering the required-provider enforcement and the legacy-YAML compatibility invariant.

Confidence Score: 5/5

Safe to merge. The removal is mechanically complete — every call site has been updated, deprecated validators are gone, and the pydantic extra-ignore invariant for legacy YAML compatibility is both documented and tested.

The deprecation cycle was completed cleanly: ModelConfig.provider is now a required field enforced at construction time by pydantic, legacy on-disk configs missing provider surface a clear LegacyModelConfigMigrationError instead of silently returning None, and the YAML backward-compatibility path is verified by regression tests. All 162 test cases pass and the changed contracts are consistently reflected across all layers.

No files require special attention. The three attention areas called out by the PR author — models.py, model_provider.py, and agent_introspection.py — all look correct.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/model_provider.py Removes ModelProviderRegistry.default, all related validators, and get_default_provider_name(); get_provider now requires a concrete name; resolve_model_provider_registry drops default_provider_name parameter; clean removal with correct pydantic extra-ignore docstring added.
packages/data-designer-config/src/data_designer/config/models.py Makes ModelConfig.provider a required str; removes _warn_on_implicit_provider validator and the deprecation-warning import; docstring updated to reflect required status.
packages/data-designer/src/data_designer/cli/repositories/model_repository.py Adds LegacyModelConfigMigrationError and a pre-validation raw-dict scan (_aliases_missing_provider) that raises a descriptive error when on-disk entries omit provider; file-load failure path correctly split from validation failure path.
packages/data-designer/src/data_designer/interface/data_designer.py Removes YAML-default consulted path; resolve_model_provider_registry called without default_provider_name; all warnings.catch_warnings suppression blocks removed; simplified provider init to a single line.
packages/data-designer/src/data_designer/cli/utils/agent_introspection.py Removes default_provider, configured_provider, and effective_provider fields from introspection state; now reports only provider per item; adds LegacyModelConfigMigrationError handling in _load_registry to surface migration errors as a structured AgentIntrospectionError.
packages/data-designer/src/data_designer/cli/forms/model_builder.py Replaces silent provider=None fallback with two distinct ValueError raises — one for no providers configured, one for multiple providers without a selection — correctly addressing the previously reported misleading error message.
packages/data-designer/src/data_designer/cli/services/provider_service.py Removes set_default, get_default, and the registry.default tracking in update/delete; remaining logic is clean and correct.
packages/data-designer/src/data_designer/cli/repositories/provider_repository.py Drops default field from CLI-layer ModelProviderRegistry; removes deprecation-warning emission; relies on pydantic v2 extra=ignore to silently drop legacy default: key, with that invariant documented and tested.
packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py Removes default_provider: header line from formatted model-alias output; _format_table simplified by removing column_labels parameter now that provider is the direct column key.
packages/data-designer/tests/interface/test_data_designer.py Removes ~175 lines of deprecation-path tests; replaces with a single clean YAML-fallback test; all deprecated-provider construction paths correctly removed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before["Before (deprecated path)"]
        A1["ModelConfig(alias=..., model=...)"] -->|"provider=None"| B1["ModelProviderRegistry.get_provider(None)"]
        B1 -->|"fallback to registry.default or providers[0]"| C1["ModelProvider resolved"]
    end

    subgraph After["After (this PR)"]
        A2["ModelConfig(alias=..., model=..., provider='openai')"] -->|"provider required"| B2["ModelProviderRegistry.get_provider('openai')"]
        B2 -->|"exact name lookup"| C2["ModelProvider resolved"]
        B2 -->|"name not found"| E2["UnknownProviderError"]
    end

    subgraph CLI["CLI model_repository.load()"]
        F["load_config_file()"] --> G{{"provider field present & non-null?"}}
        G -->|"No"| H["LegacyModelConfigMigrationError (descriptive message)"]
        G -->|"Yes"| I["ModelConfigRegistry.model_validate()"]
        I -->|"OK"| J["Registry returned"]
        I -->|"Other error"| K["return None (silent)"]
    end

    subgraph Compat["Legacy YAML Compatibility"]
        L["model_providers.yaml with 'default: foo'"] -->|"pydantic extra=ignore"| M["ModelProviderRegistry loaded, default key dropped"]
    end
Loading

Reviews (6): Last reviewed commit: "fix: block model config writes when lega..." | Re-trigger Greptile

Comment on lines 294 to +299
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in form_data. When len(self.available_providers) > 1 and no "provider" key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

Suggested change
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
else:
if self.available_providers:
raise ValueError(
"Cannot build ModelConfig: multiple providers are configured but none "
"was specified in form data. Specify provider= explicitly on each ModelConfig."
)
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/forms/model_builder.py
Line: 294-299

Comment:
The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in `form_data`. When `len(self.available_providers) > 1` and no `"provider"` key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

```suggestion
        else:
            if self.available_providers:
                raise ValueError(
                    "Cannot build ModelConfig: multiple providers are configured but none "
                    "was specified in form data. Specify provider= explicitly on each ModelConfig."
                )
            raise ValueError(
                "Cannot build ModelConfig: no provider specified in form data and no "
                "available providers configured. Add a provider first via "
                "`data-designer config providers`."
            )
```

How can I resolve this? If you propose a fix, please make it concise.

model: str
inference_parameters: InferenceParamsT = Field(default_factory=ChatCompletionInferenceParams)
provider: str | None = None
provider: str

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude and Codex both caught a migration edge here: once provider is required, an existing model_configs.yaml with one provider-less entry makes ModelRepository.load() return None, so CLI and agent flows can treat the whole model file as missing. Could we add a migration/error path in ModelRepository.load() so users see "add provider to alias X" instead of silently losing the registry?

@andreatgretel

Copy link
Copy Markdown
Contributor

docs/concepts/models/model-configs.md:18

provider | str | No | Reference to the name of the Provider to use ... If not specified, one set as the default provider ...

model-configs.md still says provider is not required and falls back to a default. Since this PR removes that path, this doc should say Required: Yes and drop the default-provider fallback text.

@andreatgretel

Copy link
Copy Markdown
Contributor

Looks like the release gate in the PR body is still active: latest GitHub release is v0.5.9, published before #594 merged, so users have not yet had a released version with the deprecation warning. Assuming that gate still applies, this should wait for the next release before merging.

@andreatgretel andreatgretel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on the migration issue and stale docs noted above. The release gate in the PR body also still looks active.

@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch 2 times, most recently from f97431c to eab4026 Compare May 20, 2026 15:52
…g.provider (#590)

Final cleanup after the #589 deprecation cycle. Providers are now a
named list referenced by name end-to-end — no hidden registry default,
no YAML default leaking into DataDesigner construction (#588), and no
ambiguity about which provider wins.

ModelConfig.provider becomes a required str. ModelProviderRegistry.default
and the associated validators, get_default_provider_name() helper, CLI
"Change default provider" workflow, and the "Default" column in
`data-designer config list` are removed. resolve_model_provider_registry()
drops its default_provider_name kwarg, and DataDesigner.__init__ no
longer threads a YAML default through registry construction.

Pydantic v2's extra="ignore" default lets existing on-disk YAMLs with
default: keys load cleanly, so users on the deprecation cycle migrate
without breakage.
@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch from eab4026 to 09eeae7 Compare May 20, 2026 15:54
@nabinchha

Copy link
Copy Markdown
Contributor Author

Thanks for the careful review. I pushed an updated branch and believe the comments are now addressed:

  • docs/concepts/models/model-configs.md now marks provider as required and removes the default-provider fallback text. I also made the same update in the Fern docs page.
  • The release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes feat(models): deprecate implicit default provider routing #594, so users have had a release containing the deprecation warnings before this removal lands. I updated the PR body accordingly.
  • The breaking-change notes are now explicit in both the PR body and docs: stale model_configs.yaml entries that omit provider or set it to null need an explicit provider name.
  • The agent-context schema change is called out more precisely: consumers should read each model alias item's provider; the top-level default_provider and per-item configured_provider / effective_provider fields are no longer emitted.
  • I added class docstrings on both ModelProviderRegistry classes documenting the BaseModel / extra="ignore" invariant for legacy default: YAML keys.
  • I addressed Greptile's CLI diagnostic note by splitting the ModelFormBuilder.build_config error messages for "no providers configured" versus "multiple providers configured but none selected", with a focused test.

The branch is rebased onto current main. GitHub Actions, docs preview, DCO, and Greptile are all green on the latest commit. Could you re-review when you have a chance?

@nabinchha nabinchha requested a review from andreatgretel May 20, 2026 16:09
@andreatgretel

Copy link
Copy Markdown
Contributor

hey, sorry for the slow re-review here, I was out on PTO and this slipped. One migration edge still looks open: with an older model_configs.yaml entry that omits provider, ModelRepository.load() returns None, so ModelService.add() treats the file like an empty registry and overwrites it. I reproduced that with a legacy alias plus one new add, and the saved file only kept the new alias. Could we make this a clear migration error and block writes until the stale aliases are fixed?

Raise LegacyModelConfigMigrationError from ModelRepository.load() when
on-disk model_configs.yaml entries lack a required provider field, so
CLI and agent flows fail fast instead of treating the file as empty and
overwriting legacy aliases on add.
@nabinchha

Copy link
Copy Markdown
Contributor Author

Thanks for catching that — pushed a fix in 5615847.

ModelRepository.load() now raises a new LegacyModelConfigMigrationError when it encounters a model_configs.yaml entry missing the required provider field, instead of returning None. That stops ModelService.add() (and update/delete/list) from treating the file as an empty registry and overwriting stale aliases. The error message names the offending alias(es) and points users at editing the file or re-running data-designer config models after the fix.

Call sites that load the registry surface it directly:

  • ModelController.run() catches it and prints the error before bailing out, so the CLI never reaches the add/update/delete handlers with a stale-but-non-empty file.
  • agent_introspection._load_registry() wraps it as AgentIntrospectionError(code=\"legacy_model_config\") so agent context consumers see a structured failure instead of a silent empty registry.

Coverage added in test_model_repository.py::test_load_legacy_missing_provider_raises and test_model_service.py::test_add_blocks_when_legacy_config_missing_provider (the latter reproduces your scenario: legacy alias on disk + one new `add` → assert it raises and the on-disk file still contains the legacy alias). PTAL.

@nabinchha nabinchha merged commit 241b0f8 into main Jun 12, 2026
62 checks passed
@nabinchha nabinchha deleted the nmulepati/fix-590-remove-implicit-default-provider branch June 12, 2026 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

2 participants