feat: add modalities field to /v1/models API response #1772
Add modalities (input/output types) to the extended model metadata in the /v1/models endpoint. This allows clients to discover which input types (text, image, audio, video, pdf) and output types (text) a model supports.

Changes:
- Add Modalities struct and field to OpenAIModel response
- Include modalities in extended fields list and model conversion logic
- Update API documentation (en/zh) with modalities field description
- Add and update test cases for modalities coverage
Greptile Summary
This PR adds a
Confidence Score: 5/5
Safe to merge: the change is additive, all nil-slice edge cases are handled, and the new test verifies zero-value behaviour end-to-end.
The conversion code explicitly guards every nil Input/Output slice before constructing the response struct, so the null-array problem that existed for zero-value ModelCardModalities is fully addressed. The new TestOpenAIHandlers_RetrieveModel_ReturnsEmptyModalitiesWhenZeroValue test validates this path end-to-end through the full HTTP handler stack. The change is purely additive (new optional field, omitted by default), so existing callers that do not request modalities are unaffected. No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant OpenAIHandlers
participant parseOpenAIModelInclude
participant convertModelToOpenAIExtended
participant ModelCard
Client->>OpenAIHandlers: "GET /v1/models?include=modalities"
OpenAIHandlers->>parseOpenAIModelInclude: "includeParam=modalities, defaultIncludeAll=false"
parseOpenAIModelInclude-->>OpenAIHandlers: "include={modalities:true}, needFullData=true"
OpenAIHandlers->>convertModelToOpenAIExtended: model, include
convertModelToOpenAIExtended->>ModelCard: "m.ModelCard != nil?"
alt ModelCard present
ModelCard-->>convertModelToOpenAIExtended: Modalities.Input, Modalities.Output
Note over convertModelToOpenAIExtended: nil slice to []string{}
convertModelToOpenAIExtended-->>OpenAIHandlers: OpenAIModel with Modalities Input Output
else ModelCard nil
convertModelToOpenAIExtended-->>OpenAIHandlers: OpenAIModel Modalities nil omitted
end
OpenAIHandlers-->>Client: JSON with modalities input output arrays
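The include-parsing step in the diagram can be pictured with a minimal Go sketch. This is not the code from the PR: `parseInclude` and `knownFields` are hypothetical stand-ins for `parseOpenAIModelInclude`, and the comma-separated parameter format and the handling of `all` are assumptions made only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// knownFields lists extended fields named in this PR; the real list may be longer.
var knownFields = []string{"capabilities", "pricing", "modalities"}

// parseInclude is a hypothetical stand-in for parseOpenAIModelInclude.
// Assumption: the include parameter is comma-separated, and either "all"
// or defaultIncludeAll selects every known field.
func parseInclude(includeParam string, defaultIncludeAll bool) (map[string]bool, bool) {
	include := make(map[string]bool)
	includeAll := defaultIncludeAll
	for _, field := range strings.Split(includeParam, ",") {
		field = strings.TrimSpace(field)
		if field == "all" {
			includeAll = true
		}
		for _, known := range knownFields {
			if field == known {
				include[known] = true
			}
		}
	}
	if includeAll {
		for _, known := range knownFields {
			include[known] = true
		}
	}
	// Extended data is needed only when at least one field was selected.
	return include, len(include) > 0
}

func main() {
	include, needFullData := parseInclude("modalities", false)
	fmt.Println(include["modalities"], include["pricing"], needFullData)
}
```

With an empty parameter and `defaultIncludeAll=false`, the sketch selects nothing, which matches the "omitted by default" behaviour noted in the review.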
Reviews (3): Last reviewed commit: "style: fix gci import grouping - testify..."
* feat: add modalities field to /v1/models API response

  Add modalities (input/output types) to the extended model metadata in the /v1/models endpoint. This allows clients to discover which input types (text, image, audio, video, pdf) and output types (text) a model supports.

  Changes:
  - Add Modalities struct and field to OpenAIModel response
  - Include modalities in extended fields list and model conversion logic
  - Update API documentation (en/zh) with modalities field description
  - Add and update test cases for modalities coverage

* fix: ensure modalities arrays serialize as [] instead of null for zero-value ModelCard

  - Add nil checks for Modalities.Input and Modalities.Output, initializing empty slices when nil to prevent JSON null serialization
  - Add test case for ModelCard with zero-value Modalities (nil slices)
  - Fix godot linter error: add period to comment

* style: fix gci import grouping - testify before internal packages
Summary
Add `modalities` field to the `/v1/models` API extended response, returning the supported input/output types for each model (e.g. `text`, `image`, `audio`, `video`, `pdf`).
Background
Coding agents like PI need to understand model capability boundaries before making API calls. While the `/v1/models` endpoint already returns `capabilities` (vision, tool_call, reasoning) and `pricing` metadata, it lacks information about the input/output modality types a model supports.

In typical agent scenarios, the coding agent must dynamically select the appropriate model based on the user's content type:
- a model with `image` in its `input` modalities
- a model with `image` in its `output` modalities

Without `modalities`, a coding agent can only hardcode model capabilities or rely on trial and error, which makes dynamic routing impossible.
API Response
{
  "id": "gpt-4o",
  "object": "model",
  "modalities": {
    "input": ["text", "image", "audio"],
    "output": ["text"]
  }
}
Behavior
The field follows the same rules as existing extended fields (`capabilities`, `pricing`, etc.):
- Default (`default_model_api_include_all=false`): `modalities` is omitted
- `?include=modalities`: `modalities` is returned
- `?include=all`: `modalities` is returned
- `default_model_api_include_all=true`: `modalities` is returned

Data source: read from the existing `ModelCard.Modalities` field; no database schema changes are required.
Files Changed
- `internal/server/api/openai.go`: Add `Modalities` struct, extend field parsing and conversion logic
- `internal/server/api/openai_retrieve_test.go`: Add modalities assertions to existing tests
- `docs/zh/api-reference/openai-api.md`: Update Chinese documentation
- `docs/en/api-reference/openai-api.md`: Update English documentation