Skip to content

Conversation

@zerob13
Copy link
Collaborator

@zerob13 zerob13 commented Nov 21, 2025

Summary by CodeRabbit

  • New Features
    • Added support for explicit classification of different model types: chat, embedding, reranking, and image generation.
    • Implemented intelligent model type inference based on model capabilities when explicit type information is unavailable.
    • Enhanced model database schema to include model type metadata for improved organization and categorization.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Adds model type normalization and inference throughout the data pipeline. The provider database fetch script normalizes model types by lowercasing and standardizing format. A new schema field and type helper support model type metadata. A new inference method enables dynamic type determination based on explicit values or inferred modalities.

Changes

Cohort / File(s) Summary
Provider DB Normalization
scripts/fetch-provider-db.mjs
Introduces normalization for model.type by lowercasing, removing underscores/dashes, and mapping variants (image_generation, image-gen) to standard enum values (chat, embedding, rerank, imageGeneration). Assigns normalized type to each sanitized model.
Model Type Schema
src/shared/types/model-db.ts
Adds optional type field to ModelSchema with enum values ['chat', 'embedding', 'rerank', 'imageGeneration']. Introduces internal getModelTypeValue helper to parse and normalize input forms. Updates sanitizeAggregate to populate the new type field.
Config Building with Type Inference
src/main/presenter/configPresenter/modelConfig.ts
Adds private inferModelType method with three-tier priority: (1) explicit model.type, (2) inference from modalities.output containing 'image', (3) default to Chat. Updates buildConfigFromProviderModel to use inferred type instead of hardcoded ModelType.Chat.

Sequence Diagram(s)

sequenceDiagram
    participant DB as Provider DB
    participant Fetch as fetch-provider-db.mjs
    participant Schema as model-db.ts
    participant Config as modelConfig.ts
    
    DB->>Fetch: Raw model data
    Fetch->>Fetch: Normalize type field<br/>(lowercase, remove _, -)
    Fetch->>Schema: Sanitized model with type
    Schema->>Schema: Apply getModelTypeValue()<br/>parse & validate type
    Schema->>Schema: Populate ModelSchema.type
    Config->>Config: inferModelType() priority:<br/>1) explicit type<br/>2) modalities inference<br/>3) default Chat
    Config->>Config: buildConfigFromProviderModel<br/>uses inferred type
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Type normalization logic in scripts/fetch-provider-db.mjs requires verification that all variant mappings (image_generation, image-gen, etc.) are correctly handled
  • Priority-based inference in inferModelType() method needs validation that the three-tier logic correctly handles edge cases and missing modalities data
  • Schema integration in model-db.ts should be reviewed to ensure getModelTypeValue handles all input variants robustly

Possibly related PRs

  • PR #971: Also modifies src/main/presenter/configPresenter/modelConfig.ts — adds provider ID alias resolution logic in the same module, complementing the model type inference changes introduced here.

Poem

🐰 Types aligned, the models now shine bright,
Normalized forms, from left to right,
Chat, embedding, images too—
Inferred with care, made fresh and new!
~CodeRabbit's whisker twitch of delight

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: provider db parse image type' directly relates to the main change: adding model type normalization and inference across the provider database parsing pipeline, including specific handling for image generation types.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/provider-db-parse-image

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/main/presenter/configPresenter/modelConfig.ts (1)

54-83: Type inference logic is sound; consider optional richer heuristics for non-image models

The new inferModelType correctly:

  • Respects normalized model.type from the provider DB (chat, embedding, rerank, imageGeneration).
  • Falls back to ImageGeneration when modalities.output includes 'image'.
  • Defaults to Chat when nothing is specified, preserving previous behavior for unknown models.

This should fix the image-type parsing issue for provider DB entries.

If you later see providers that omit type for embeddings or rerankers, you might consider extending the modalities-based inference (e.g., output containing 'embedding' or some rerank-specific marker) before defaulting to chat. Not required for this bugfix, but worth keeping in mind.

Also applies to: 93-93

src/shared/types/model-db.ts (1)

56-57: Schema + normalization pipeline for model.type/cost is coherent; just note coupling with the fetch script

The additions here are internally consistent:

  • ModelSchema.cost as z.record(z.union([z.string(), z.number()])).optional() matches getStringNumberRecord, ensuring cost is a flat map of primitive values.
  • ModelSchema.type plus ModelTypeValue ('chat' | 'embedding' | 'rerank' | 'imageGeneration') lines up with ModelConfigHelper.inferModelType and the values used in the UI.
  • getModelTypeValue correctly:
    • Accepts exact standard values including 'imageGeneration'.
    • Normalizes variants like image_generation, image-generation, image-gen, and differing case to 'imageGeneration'.
    • Returns undefined for unknowns so type remains optional.

sanitizeAggregate then uses both getStringNumberRecord and getModelTypeValue, so runtime data passed around as ProviderModel should always match ModelSchema.

The only thing to keep an eye on is the duplication of this normalization with sanitizeAggregateJson in scripts/fetch-provider-db.mjs. If a new model type or alias is introduced, both places will need to be updated together.

Also applies to: 134-134, 161-188, 301-302

scripts/fetch-provider-db.mjs (1)

103-119: Model type normalization is correct but duplicated; document coupling or consider centralizing

The type normalization from m.type to 'chat' | 'embedding' | 'rerank' | 'imageGeneration' (including image_generation / image-generation variants) is working correctly, as confirmed by the generated providers.json output showing normalized types. However, this logic is duplicated in two places—here and in getModelTypeValue from src/shared/types/model-db.ts—which increases drift risk if new types are added later.

Additionally, cost is passed through as any object without type validation, while the shared sanitizer expects Record<string, string | number>. This creates a potential type-shape mismatch.

Suggested actions:

  • Add a comment referencing getModelTypeValue to document the coupling and warn future maintainers to keep both branches in sync.
  • After running the fetch script, verify that cost values in providers.json conform to the expected shape or align validation here with the shared schema.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4a3ff4 and 41192ee.

📒 Files selected for processing (3)
  • scripts/fetch-provider-db.mjs (2 hunks)
  • src/main/presenter/configPresenter/modelConfig.ts (2 hunks)
  • src/shared/types/model-db.ts (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/main/presenter/configPresenter/modelConfig.ts (1)
src/shared/types/model-db.ts (1)
  • ProviderModel (60-60)
src/shared/types/model-db.ts (1)
scripts/fetch-provider-db.mjs (1)
  • normalized (108-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-check (x64)

@zerob13 zerob13 merged commit bba8cea into dev Nov 21, 2025
2 checks passed
@zerob13 zerob13 deleted the bugfix/provider-db-parse-image branch November 23, 2025 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants