Support for Nested Pydantic Models in Schemas by billpugh · Pull Request #107 · simonw/llm-gemini

billpugh · 2025-10-11T00:24:38Z

Fix: Support for Nested Pydantic Models in Schemas

Problem

When using Pydantic schemas with nested models (i.e., models that reference other models), the Gemini API would reject requests with errors like:

Invalid JSON payload received. Unknown name "$defs" at 'generation_config.response_schema': Cannot find field.
Invalid JSON payload received. Unknown name "$ref" at 'generation_config.response_schema.properties[0].value.items': Cannot find field.

This affected several common Pydantic patterns:

Direct model references: A model with a field that is another model

from typing import List, Optional  # used by the patterns below
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    name: str
    address: Address  # ❌ Would fail

Lists of models: A model containing a list of another model

class Dog(BaseModel):
    name: str

class Dogs(BaseModel):
    dogs: List[Dog]  # ❌ Would fail

Optional model fields: A model with an optional reference to another model

class Company(BaseModel):
    name: str

class Person(BaseModel):
    name: str
    employer: Optional[Company]  # ❌ Would fail

Deeply nested compositions: Multiple levels of model references

class Item(BaseModel):
    name: str

class Order(BaseModel):
    items: List[Item]

class Customer(BaseModel):
    orders: List[Order]  # ❌ Would fail

Root Cause

When Pydantic generates a JSON schema for nested models, it defines each nested model once under $defs (definitions) and points to it with $ref (references) wherever the model is used:

{
  "properties": {
    "dogs": {
      "items": {"$ref": "#/$defs/Dog"}
    }
  },
  "$defs": {
    "Dog": {
      "properties": {
        "name": {"type": "string"}
      }
    }
  }
}

The Gemini API's response_schema field accepts neither $defs nor $ref, so every referenced definition must be inlined before the request is sent.
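To make the failure concrete, here is a minimal sketch (not part of the patch) that walks a schema and reports where the two rejected keywords appear. The function name find_unsupported, the dotted path format, and the hand-written schema are illustrative assumptions; the schema is a slightly fuller version of the Dogs example above.

```python
def find_unsupported(schema, path="response_schema", keys=("$defs", "$ref")):
    """Return the dotted path of every unsupported keyword found in a schema."""
    found = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}.{key}"
            if key in keys:
                found.append(child)
            # Keep descending: definitions may themselves contain references.
            found.extend(find_unsupported(value, child, keys))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            found.extend(find_unsupported(item, f"{path}[{index}]", keys))
    return found


# Hand-written schema for illustration, not captured Pydantic output.
dogs_schema = {
    "type": "object",
    "properties": {"dogs": {"type": "array", "items": {"$ref": "#/$defs/Dog"}}},
    "$defs": {"Dog": {"type": "object", "properties": {"name": {"type": "string"}}}},
}

offending = find_unsupported(dogs_schema)
print(offending)
```

The two paths it reports correspond to the two locations named in the API error messages: the top-level $defs and the $ref under the dogs items.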

Solution

Modified the cleanup_schema function in llm_gemini.py to resolve $ref references before sending the schema to Gemini:

1. Added a new helper function _resolve_refs() that recursively finds $ref references and replaces them with their actual definitions
2. Updated cleanup_schema() to extract the $defs section (if present) and resolve all references using _resolve_refs()
3. Kept the existing cleanup logic, which then runs to remove other unsupported keys

Code Changes

File: llm_gemini.py (lines 206-249)

Added _resolve_refs() helper function:

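import copy

def _resolve_refs(schema, defs):
    """Recursively resolve $ref references in schema using definitions.

    Mutates schema in place; defs maps definition names to their schemas.
    """
    if isinstance(schema, dict):
        if "$ref" in schema:
            # Drop the reference and splice the referenced definition in its place.
            ref_path = schema.pop("$ref")
            if ref_path.startswith("#/$defs/"):
                def_name = ref_path.split("/")[-1]
                if def_name in defs:
                    # Deep-copy so each use site gets its own independent copy.
                    schema.update(copy.deepcopy(defs[def_name]))

        # Recurse into the values, including any definition just inlined,
        # so references nested inside definitions are resolved too.
        for value in schema.values():
            _resolve_refs(value, defs)
    elif isinstance(schema, list):
        # Lists occur under keys such as anyOf; resolve each element.
        for item in schema:
            _resolve_refs(item, defs)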
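The list branch matters for the Optional pattern, where the reference sits inside an anyOf list rather than directly under a property. The sketch below repeats the same resolution logic under the illustrative name resolve_refs and applies it to a hand-written schema in the shape Pydantic v2 typically emits for employer: Optional[Company]. The schema is an assumption written for illustration (titles and "required" omitted), not output captured from the PR's tests.

```python
import copy


def resolve_refs(schema, defs):
    """Inline every "#/$defs/<name>" reference in place."""
    if isinstance(schema, dict):
        if "$ref" in schema:
            ref_path = schema.pop("$ref")
            name = ref_path.split("/")[-1]
            if ref_path.startswith("#/$defs/") and name in defs:
                schema.update(copy.deepcopy(defs[name]))
        for value in schema.values():
            resolve_refs(value, defs)
    elif isinstance(schema, list):
        for item in schema:
            resolve_refs(item, defs)


# Assumed shape of the schema for a model with `employer: Optional[Company]`.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "employer": {"anyOf": [{"$ref": "#/$defs/Company"}, {"type": "null"}]},
    },
    "$defs": {
        "Company": {"type": "object", "properties": {"name": {"type": "string"}}}
    },
}

defs = person_schema.pop("$defs")
resolve_refs(person_schema, defs)
print(person_schema["properties"]["employer"])
```

After the call, the first anyOf entry holds the inlined Company definition and no $ref or $defs key remains anywhere in person_schema.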

Updated cleanup_schema() to use it:

def cleanup_schema(schema, in_properties=False):
    "Gemini supports only a subset of JSON schema"
    keys_to_remove = ("$schema", "additionalProperties", "title")

    # First pass: resolve $ref references using $defs
    if isinstance(schema, dict) and "$defs" in schema:
        defs = schema.pop("$defs")
        _resolve_refs(schema, defs)

    # Continue with existing cleanup logic...

The fix handles arbitrary nesting depth and uses copy.deepcopy() to avoid mutating the original definitions.
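The deep copy matters most when one definition is referenced from more than one place, because the later cleanup pass removes keys such as "title" in place. A small sketch of the difference, using a hand-written Address definition and the hypothetical field names home and work:

```python
import copy

address_def = {
    "title": "Address",
    "type": "object",
    "properties": {"street": {"type": "string"}},
}

# Without a copy: both fields point at the very same dict object.
aliased = {"home": address_def, "work": address_def}
# With deepcopy: each field gets its own independent copy.
copied = {"home": copy.deepcopy(address_def), "work": copy.deepcopy(address_def)}

# Simulate a later in-place cleanup step that strips "title" from one field.
aliased["home"].pop("title")
copied["home"].pop("title")

print("title" in aliased["work"])  # False: the edit leaked into "work" and the definition
print("title" in copied["work"])   # True: the other copy is untouched
```

With aliasing, an edit to one use site silently changes every other use site and the definition itself; with independent copies, each inlined schema can be cleaned up separately.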

Tests Added

Unit Tests (all passing)

Added test_cleanup_schema_with_refs - a parametrized test with 4 test cases validating $ref resolution for each pattern:

Direct model reference (Person with Address)
List of models (Dogs with List[Dog])
Optional model field (Person with Optional[Company])
Nested composition (Customer → List[Order] → List[Item])

Integration Tests (skipped in CI, pass with real API key)

Added 4 integration tests using real Pydantic models with the Gemini API:

test_nested_model_direct_reference - Pattern 1
test_prompt_with_multiple_dogs - Pattern 2 (list of models)
test_nested_model_optional - Pattern 3
test_nested_model_deep_composition - Pattern 4

These tests validate that the schemas work end-to-end with the Gemini API.

VCR Cassette Recording Issue

Problem Encountered

When adding the new integration tests, we encountered an issue with pytest-recording not creating VCR cassettes for the new tests. The tests pass when run with a real API key (PYTEST_GEMINI_API_KEY set), but the cassettes are not being recorded to disk.

Attempted Solutions

Tried various record modes: --record-mode=once, --record-mode=new_episodes, --record-mode=rewrite
Verified VCR configuration in tests/conftest.py
Checked cassette file paths and permissions
Created empty cassette files manually (they were deleted during test runs)

Current Status

The unit tests for $ref resolution all pass ✅
The integration tests pass when run with a real API key ✅
The integration tests are marked with @pytest.mark.skip for CI until cassettes can be recorded
All existing tests continue to pass ✅

Workaround for Testing

Developers can verify the integration tests work by running:

PYTEST_GEMINI_API_KEY="$(llm keys get gemini)" pytest tests/test_gemini.py::test_nested_model_direct_reference -v

This appears to be a pytest-recording environment issue unrelated to the schema fix itself. The important validation is that:

The unit tests prove the schema transformation is correct
The tests pass with real API calls when run manually

Test Results

16 passed, 5 skipped in 0.57s

All existing tests pass
All new unit tests for schema transformation pass
Integration tests are skipped in CI but work with real API key

Breaking Changes

None. This is a backward-compatible bug fix that enables previously broken functionality.

Related Issues

Fixes issue with nested Pydantic models in schemas.


simonw · 2025-10-11T05:15:09Z

This is a great patch, thank you.

simonw · 2025-10-11T05:19:40Z

I'm going to land it and then add the VCR test.

simonw · 2025-10-11T05:28:44Z

My own demo of this fix. Before applying the change:

python -c 'import llm
from pydantic import BaseModel

class Dog(BaseModel):
    name: str

class Dogs(BaseModel):
    dogs: list[Dog]

model = llm.get_model("gemini-2.5-flash")

print(model.prompt("invent 3 dogs", schema=Dogs))
'

Outputs:

  File "/Users/simon/Dropbox/Development/llm-gemini/llm_gemini.py", line 549, in execute
    raise llm.ModelError(event["error"]["message"])
llm.errors.ModelError: Invalid JSON payload received. Unknown name "$defs" at 'generation_config.response_schema': Cannot find field.
Invalid JSON payload received. Unknown name "$ref" at 'generation_config.response_schema.properties[0].value.items': Cannot find field.

After applying the change:

{"dogs":[{"name":"Buddy"},{"name":"Max"},{"name":"Bella"}]}

Refs #107

werdnum · 2025-10-13T01:06:52Z

It looks like this fix needs to be applied to tool schemas too.

The fix from PR simonw#107 resolved $ref references in response schemas, but tool schemas were still passing input_schema directly without cleanup. This applies cleanup_schema() to tool.input_schema, ensuring nested Pydantic models work correctly in tool parameters.

Adds test_tools_with_nested_pydantic_models() to verify that tools with nested models (PersonInput containing Address) properly resolve $ref references and work with the Gemini API.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

billpugh · 2025-10-13T13:54:53Z

I've got travel coming up, so it might take me a week to get to this. If someone else wants to jump on it, go ahead.

The fix from PR #107 resolved $ref references in response schemas, but tool schemas were still passing input_schema directly without cleanup. This applies cleanup_schema() to tool.input_schema, ensuring nested Pydantic models work correctly in tool parameters.

Adds test_tools_with_nested_pydantic_models() to verify that tools with nested models (PersonInput containing Address) properly resolve $ref references and work with the Gemini API.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Andrew Garrett <andrewgarrett@google.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Simon Willison <swillison@gmail.com>

Refs #107, #108, #109, #110, #112, #113, #114

billpugh added 2 commits October 10, 2025 20:14

simonw merged commit 0d7fead into simonw:main Oct 11, 2025
5 checks passed

simonw added a commit that referenced this pull request Oct 11, 2025

VCR tests for #107, plus ran Black

d2e8dae

simonw added a commit that referenced this pull request Oct 11, 2025

Release 0.26.1

7398500

Refs #107

werdnum mentioned this pull request Oct 13, 2025

Apply PR #107 fix to tool schemas for nested Pydantic models #110

Merged

3 tasks

simonw added a commit that referenced this pull request Nov 18, 2025

Release 0.27

27bd09e

Refs #107, #108, #109, #110, #112, #113, #114

NickCrews mentioned this pull request Nov 20, 2025

Inline schemas that use $defs using jsonref #93

Closed
