feat: add modalities field to /v1/models API response #1772

Merged
looplj merged 3 commits into looplj:unstable from MoshiCoCo:unstable
Jun 3, 2026

@MoshiCoCo

Contributor

Summary

Add a modalities field to the /v1/models API extended response, returning the supported input/output types for each model (e.g. text, image, audio, video, pdf).

Background

Coding agents like PI need to understand a model's capability boundaries before making API calls. The /v1/models endpoint already returns capabilities (vision, tool_call, reasoning) and pricing metadata, but it does not say which input/output modality types a model supports.

In typical agent scenarios, the coding agent must dynamically select an appropriate model based on the user's content type:

  • Text-only input → any chat model works
  • Input with images → requires a model with image in its input modalities
  • Image generation needed → requires a model with image in its output modalities

Without modalities, a coding agent can only hardcode model capabilities or rely on trial and error, which makes dynamic routing impossible.
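
The routing rules above could be applied on the client side roughly as follows. This is a minimal sketch, not code from this PR: the `Model` type, the `pickModel` and `supportsAll` helpers, and the model ID `text-only-model` are hypothetical names introduced for illustration; only the `input`/`output` shape of `modalities` comes from the PR.

```go
package main

import "fmt"

// Modalities mirrors the "modalities" object described in this PR.
type Modalities struct {
	Input  []string `json:"input"`
	Output []string `json:"output"`
}

// Model is a hypothetical client-side view of one /v1/models entry.
type Model struct {
	ID         string      `json:"id"`
	Modalities *Modalities `json:"modalities,omitempty"`
}

// supportsAll reports whether every required type appears in supported.
func supportsAll(supported, required []string) bool {
	for _, want := range required {
		found := false
		for _, have := range supported {
			if have == want {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// pickModel returns the ID of the first model that accepts all required
// input types and produces all required output types.
func pickModel(models []Model, inputs, outputs []string) (string, bool) {
	for _, m := range models {
		if m.Modalities == nil {
			continue // modalities were not requested or are unknown
		}
		if supportsAll(m.Modalities.Input, inputs) && supportsAll(m.Modalities.Output, outputs) {
			return m.ID, true
		}
	}
	return "", false
}

func main() {
	models := []Model{
		{ID: "text-only-model", Modalities: &Modalities{Input: []string{"text"}, Output: []string{"text"}}},
		{ID: "gpt-4o", Modalities: &Modalities{Input: []string{"text", "image", "audio"}, Output: []string{"text"}}},
	}

	// Input with images: needs "image" among the input modalities.
	id, ok := pickModel(models, []string{"text", "image"}, []string{"text"})
	fmt.Println(id, ok)

	// Image generation: needs "image" among the output modalities.
	id, ok = pickModel(models, []string{"text"}, []string{"image"})
	fmt.Println(id, ok)
}
```

Models without a `modalities` object are skipped here rather than assumed capable, which is one possible policy; a client could equally fall back to a default model.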

API Response

{
  "id": "gpt-4o",
  "object": "model",
  "modalities": {
    "input": ["text", "image", "audio"],
    "output": ["text"]
  }
}
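
A client might decode this response along these lines. The sketch assumes a hypothetical `ModelEntry` type and `decodeModel` helper; the pointer field reflects the review's note that `modalities` is an optional pointer field, so a response without it decodes to `nil`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Modalities mirrors the "modalities" object in the example response.
type Modalities struct {
	Input  []string `json:"input"`
	Output []string `json:"output"`
}

// ModelEntry is a hypothetical client-side type holding only the fields
// shown in the example response.
type ModelEntry struct {
	ID         string      `json:"id"`
	Object     string      `json:"object"`
	Modalities *Modalities `json:"modalities,omitempty"`
}

// decodeModel parses a single model object from JSON.
func decodeModel(data []byte) (ModelEntry, error) {
	var m ModelEntry
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	body := []byte(`{
	  "id": "gpt-4o",
	  "object": "model",
	  "modalities": {"input": ["text", "image", "audio"], "output": ["text"]}
	}`)

	m, err := decodeModel(body)
	if err != nil {
		panic(err)
	}
	if m.Modalities == nil {
		// The field is omitted unless requested, so handle its absence.
		fmt.Println(m.ID, "has no modalities in this response")
		return
	}
	fmt.Println(m.ID, "input:", m.Modalities.Input, "output:", m.Modalities.Output)
}
```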

Behavior

The field follows the same rules as existing extended fields (capabilities, pricing, etc.):

| Scenario | Modalities returned |
|---|---|
| Default request (default_model_api_include_all=false) | ❌ No |
| ?include=modalities | ✅ Yes |
| ?include=all | ✅ Yes |
| default_model_api_include_all=true | ✅ Yes |

Data source: Read from the existing ModelCard.Modalities field — no database schema changes required.
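
The four scenarios above can be expressed as a small decision function. This is an illustrative sketch only, not the PR's actual parsing code: `includesModalities` is a hypothetical name, and two details are assumptions the PR does not confirm, namely that `include` accepts a comma-separated list and that the server default applies only when no `include` parameter is given.

```go
package main

import (
	"fmt"
	"strings"
)

// includesModalities reports whether the modalities field should be
// returned for a given ?include= value and server default.
// Assumptions (not confirmed by the PR): include is comma-separated, and
// defaultIncludeAll is consulted only when include is empty.
func includesModalities(includeParam string, defaultIncludeAll bool) bool {
	if strings.TrimSpace(includeParam) == "" {
		return defaultIncludeAll
	}
	for _, field := range strings.Split(includeParam, ",") {
		switch strings.TrimSpace(field) {
		case "all", "modalities":
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(includesModalities("", false))           // default request
	fmt.Println(includesModalities("modalities", false)) // ?include=modalities
	fmt.Println(includesModalities("all", false))        // ?include=all
	fmt.Println(includesModalities("", true))            // default_model_api_include_all=true
}
```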

Files Changed

  • internal/server/api/openai.go — Add Modalities struct, extend field parsing and conversion logic
  • internal/server/api/openai_retrieve_test.go — Add modalities assertions to existing tests
  • docs/zh/api-reference/openai-api.md — Update Chinese documentation
  • docs/en/api-reference/openai-api.md — Update English documentation

Add modalities (input/output types) to the extended model metadata in the
/v1/models endpoint. This allows clients to discover which input types
(text, image, audio, video, pdf) and output types (text) a model supports.

Changes:
- Add Modalities struct and field to OpenAIModel response
- Include modalities in extended fields list and model conversion logic
- Update API documentation (en/zh) with modalities field description
- Add and update test cases for modalities coverage
@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Contributor

Greptile Summary

This PR adds a modalities field to the /v1/models extended API response, exposing the supported input/output types (e.g. text, image, audio) for each model. It follows the existing pattern for optional extended fields — omitted by default, included via ?include=modalities or ?include=all.

  • openai.go: Adds the Modalities struct, wires it into OpenAIModel, extends the extendedFields list, and safely normalises nil Input/Output slices to []string{} in the conversion path to prevent null arrays in the response.
  • openai_retrieve_test.go: Populates Modalities in existing test fixtures, adds assertions for the populated case, and adds a dedicated new test verifying that a ModelCard with a zero-value ModelCardModalities still returns {"input":[],"output":[]} rather than null.
  • Docs: Both English and Chinese API reference pages are updated with the new field description, example JSON, and possible values.

Confidence Score: 5/5

Safe to merge — the change is additive, all nil-slice edge cases are handled, and the new test verifies zero-value behaviour end-to-end.

The conversion code explicitly guards every nil Input/Output slice before constructing the response struct, so the null-array problem that existed for zero-value ModelCardModalities is fully addressed. The new TestOpenAIHandlers_RetrieveModel_ReturnsEmptyModalitiesWhenZeroValue test validates this path end-to-end through the full HTTP handler stack. The change is purely additive (new optional field, omitted by default), so existing callers that do not request modalities are unaffected.
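
The null-array problem comes from how Go's `encoding/json` treats slices: a nil slice marshals to `null`, while an empty non-nil slice marshals to `[]`. The following standalone sketch shows the effect and the kind of guard described above; `normalize` and `mustJSON` are hypothetical helpers, not the functions in openai.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Modalities mirrors the response struct described in this PR.
type Modalities struct {
	Input  []string `json:"input"`
	Output []string `json:"output"`
}

// normalize returns a copy in which nil slices are replaced by empty,
// non-nil slices so they serialize as [] instead of null.
func normalize(m Modalities) Modalities {
	if m.Input == nil {
		m.Input = []string{}
	}
	if m.Output == nil {
		m.Output = []string{}
	}
	return m
}

// mustJSON marshals v and panics on error; fine for a demo.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var zero Modalities // both slices are nil

	fmt.Println(mustJSON(zero))            // {"input":null,"output":null}
	fmt.Println(mustJSON(normalize(zero))) // {"input":[],"output":[]}
}
```

Returning `[]` keeps the field's type stable for clients, which can iterate over the arrays without a separate null check.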

No files require special attention.

Important Files Changed

Filename Overview
internal/server/api/openai.go Adds Modalities struct, extends OpenAIModel with modalities pointer field, updates extendedFields list, and correctly guards nil slices with []string{} in the conversion path.
internal/server/api/openai_retrieve_test.go Adds modalities assertions to existing tests and a new test covering the zero-value ModelCardModalities case, ensuring empty arrays are returned rather than null.
docs/en/api-reference/openai-api.md Updates English docs to document the new modalities field, its sub-keys, and possible values; includes a JSON example in the extended response block.
docs/zh/api-reference/openai-api.md Mirrors the English documentation changes in Chinese, consistent with existing bilingual doc pattern.

Sequence Diagram

sequenceDiagram
    participant Client
    participant OpenAIHandlers
    participant parseOpenAIModelInclude
    participant convertModelToOpenAIExtended
    participant ModelCard

    Client->>OpenAIHandlers: "GET /v1/models?include=modalities"
    OpenAIHandlers->>parseOpenAIModelInclude: "includeParam=modalities, defaultIncludeAll=false"
    parseOpenAIModelInclude-->>OpenAIHandlers: "include={modalities:true}, needFullData=true"

    OpenAIHandlers->>convertModelToOpenAIExtended: model, include
    convertModelToOpenAIExtended->>ModelCard: "m.ModelCard != nil?"
    alt ModelCard present
        ModelCard-->>convertModelToOpenAIExtended: Modalities.Input, Modalities.Output
        Note over convertModelToOpenAIExtended: nil slice to []string{}
        convertModelToOpenAIExtended-->>OpenAIHandlers: OpenAIModel with Modalities Input Output
    else ModelCard nil
        convertModelToOpenAIExtended-->>OpenAIHandlers: OpenAIModel Modalities nil omitted
    end

    OpenAIHandlers-->>Client: JSON with modalities input output arrays

Reviews (3): Last reviewed commit: "style: fix gci import grouping - testify..."

Comment thread internal/server/api/openai.go
Comment thread internal/server/api/openai_retrieve_test.go
MoshiCoCo added 2 commits June 3, 2026 19:29
…o-value ModelCard

- Add nil checks for Modalities.Input and Modalities.Output, initializing
  empty slices when nil to prevent JSON null serialization
- Add test case for ModelCard with zero-value Modalities (nil slices)
- Fix godot linter error: add period to comment
@looplj merged commit c0899a6 into looplj:unstable on Jun 3, 2026
4 checks passed
junjiangao pushed a commit to junjiangao/axonhub that referenced this pull request Jun 5, 2026
* feat: add modalities field to /v1/models API response

* fix: ensure modalities arrays serialize as [] instead of null for zero-value ModelCard

* style: fix gci import grouping - testify before internal packages