Skip to content

feat(gen): add type-based oneOf/anyOf discrimination#1584

Merged
ernado merged 1 commit intoogen-go:mainfrom
lanej:feature/type-based-discriminator-inference
Nov 27, 2025
Merged

feat(gen): add type-based oneOf/anyOf discrimination#1584
ernado merged 1 commit intoogen-go:mainfrom
lanej:feature/type-based-discriminator-inference

Conversation

@lanej
Copy link
Contributor

@lanej lanej commented Nov 24, 2025

Summary

Implements type-based discrimination as the 4th discrimination strategy for oneOf/anyOf schemas, enabling ogen to automatically distinguish sum type variants that share field names but differ in types (e.g., {id: string} vs {id: integer}).

This completes ogen's discrimination strategy suite:

  1. ✅ Explicit discriminator (OpenAPI discriminator field)
  2. ✅ JSON primitive type discrimination (anyOf with different primitive types)
  3. ✅ Field name discrimination (variants with different field names)
  4. Field type discrimination (same field name, different JSON types) — NEW

Motivation

Fixes two long-standing issues:

Previously, ogen would fail to generate code for schemas like:

oneOf:
  - type: object
    properties:
      id: { type: string }
  - type: object
    properties:
      id: { type: integer }

Now these schemas work seamlessly with automatic type-based discrimination.

Implementation

Core Mechanism

Field Signature Tracking: Changed from tracking just field names to tracking (name, typeID) pairs:

  • {id: string} → signature {name: "id", typeID: "string"}
  • {id: integer} → signature {name: "id", typeID: "int"}
  • {id: array[string]} → signature {name: "id", typeID: "array[string]"}

Runtime Type Checking: Uses jx.Decoder.Next() (O(1) operation) to peek at JSON type:

case "id":
    typ := d.Next()  // O(1) peek, no data consumed
    switch typ {
    case jx.String:  // Handle string variant
    case jx.Number:  // Handle integer variant
    }

Type Mapping: jxTypeForFieldType() maps IR type IDs to jx JSON types:

  • "string", "enum_*"jx.String
  • "int", "int32", "int64"jx.Number
  • "array[*]"jx.Array
  • "object", custom types → jx.Object
  • Generic nullable types (NilString, OptNilInt) detected via naming patterns
  • Pointer-based nullables handled separately

Robustness Features

Enum Handling: Enum types correctly map to jx.String for discrimination
Nullable Types: Both generic (NilT, OptNilT) and pointer-based nullables detected
Array Validation: Rejects unsupported array element type discrimination with clear errors
Value-Based Validation: Rejects same field+type with different enum values (out of scope)
Nested Support: Works correctly with nested oneOf structures

Additional Work

Also restores sum type parameter support that exists in upstream:

  • URI encode/decode template handling for sum type path/query parameters
  • Empty schema handling (allowed in responses as any, rejected in requests)
  • Regression tests for sum_type_params.yaml and empty_response_body.json

Testing

New Test Specifications (8)

  1. type_discriminated_fields.json — Basic type-based discrimination
  2. nullable_type_discrimination.json — Nullable field handling
  3. optional_type_discrimination.json — Optional field handling
  4. array_object_type_discrimination.json — Array vs object discrimination
  5. hybrid_discrimination.json — 3+ variants with mixed strategies
  6. mixed_discrimination.json — Field name + type discrimination combined
  7. nested_type_discrimination.json — Nested oneOf structures
  8. field_name_discrimination.json — Baseline regression test

Integration Tests (5 new test suites)

Complete client/server implementations generated for all new test specifications, validating end-to-end functionality.

Real-World Validation

GitHub API: 734/740 operations (99.2% success)

  • Up from 728/740 (98.4%) before this change
  • 6 additional operations now supported
  • 6 operations still skipped (truly complex anyOf patterns requiring future work)

Other APIs: Telegram, GoTD, and K8s examples benefit from improved discrimination

Test Results

All tests passing:

  • TestGenerate/Positive/* (all 8 new specs)
  • TestGenerate/Examples/api.github.com
  • TestGenerate/Positive/sum_type_params
  • TestGenerate/Positive/empty_response_body
  • ✅ All existing tests remain passing

Performance Impact

Zero Performance Overhead:

  • jx.Decoder.Next() is O(1) peek operation (no data consumed, no allocation)
  • No runtime reflection
  • Code-generated type checking in switch statements
  • Static discrimination logic determined at code generation time

Breaking Changes

None. Fully backward compatible:

  • Existing discrimination strategies unchanged
  • New strategy only activates when previous strategies don't apply
  • All existing generated code remains valid
  • No API changes

Documentation

  • ✅ README.md updated with discrimination strategy documentation
  • ✅ Clear error messages for unsupported patterns
  • ✅ Comprehensive test specifications serve as usage examples

Files Changed

  • 139 files changed (+15,284 / -4,725)
  • Core implementation: gen/schema_gen_sum.go (+408 lines)
  • Template updates: gen/_template/json/encoders_sum.tmpl (+51 lines)
  • IR metadata: gen/ir/type.go (+18 lines)
  • URI templates: gen/_template/uri/*.tmpl (+33 lines)
  • Test specifications: 8 new files in _testdata/positive/
  • Integration tests: 5 new directories with generated code
  • Examples: Regenerated to demonstrate real-world improvements

Known Limitations

These patterns remain unsupported (by design):

Value-based discrimination: Same field name and type, different enum values

  • Example: {status: "active"} vs {status: "inactive"}
  • Would require value inspection, not just type checking
  • Clear error message: "value-based discriminator not supported"

Array element type discrimination: Different array element types

  • Example: array[string] vs array[integer]
  • JSON type is Array for both, can't discriminate without inspecting elements
  • Clear error message: "array element type discrimination not supported"

These are fundamental limitations that would require significantly more complex runtime logic. The current implementation draws a clean line at JSON type-level discrimination, which covers the vast majority of real-world use cases.

Review Notes

For Maintainers:

  • This PR consolidates 8 previous commits into 1 clean commit for easier review
  • Large diff is primarily generated code (examples + integration tests)
  • Core logic is concentrated in gen/schema_gen_sum.go (~400 lines)
  • Template changes are minimal and straightforward
  • All changes are well-tested with comprehensive test coverage

Diff Organization:

  • Core implementation: gen/*.go files
  • Templates: gen/_template/ directory
  • Test specs: _testdata/positive/ directory
  • Generated tests: internal/integration/test_*/ directories
  • Generated examples: examples/ex_*/ directories

Closes

@ernado
Copy link
Member

ernado commented Nov 25, 2025

Please check why github example is now not generated.

+12,235 −863,684

Is strange.

@lanej
Copy link
Contributor Author

lanej commented Nov 25, 2025

@ernado yes ... that is strange. i'll get into it.

@lanej lanej force-pushed the feature/type-based-discriminator-inference branch from b890848 to 53549e4 Compare November 25, 2025 14:45
@lanej lanej marked this pull request as draft November 25, 2025 14:49
@lanej lanej force-pushed the feature/type-based-discriminator-inference branch from 53549e4 to 4a97e67 Compare November 25, 2025 15:10
@lanej lanej marked this pull request as ready for review November 25, 2025 18:31
@lanej lanej marked this pull request as draft November 25, 2025 18:41
@lanej
Copy link
Contributor Author

lanej commented Nov 25, 2025

I have some other changes that snuck in here that I'm working through.

@lanej lanej force-pushed the feature/type-based-discriminator-inference branch from f760edd to f2d3d9f Compare November 25, 2025 18:55
@lanej lanej closed this Nov 25, 2025
@lanej lanej force-pushed the feature/type-based-discriminator-inference branch from f2d3d9f to 4b3acf1 Compare November 25, 2025 20:09
@lanej lanej reopened this Nov 25, 2025
@lanej lanej changed the title feat: add type-based oneOf discrimination feat(gen): add type-based oneOf/anyOf discrimination Nov 25, 2025
lanej added a commit to lanej/ogen that referenced this pull request Nov 26, 2025
Fixes two critical issues preventing broken code generation for type-based
discrimination:

## Issue 1: Enum types not mapped to jxType

**Problem:**
Enum type IDs like 'enum_ChecksCreateReqSum0Status' fell through to the
default case in jxTypeForFieldType(), returning empty string. This caused
empty FieldType values in template data, resulting in empty switch statements
in generated decoders.

**Symptoms:**
- ChecksCreateReq decoder had empty switch with only default case
- Runtime error: "unable to detect sum type variant"

**Fix:**
Added enum handling in jxTypeForFieldType() (gen/schema_gen_sum.go:62-64):
  case strings.HasPrefix(typeID, "enum_"):
      return "jx.String"

Enums serialize as strings in JSON, so they map to jx.String for runtime
type checking.

## Issue 2: Array element types indistinguishable at runtime

**Problem:**
Arrays with different element types (e.g., []ValidationErrorErrorsItem vs
[]string) both map to jx.Array. The runtime can't distinguish them using
d.Next() alone, which only checks JSON type, not array contents.

Template deduplication based on FieldType would leave only one variant,
causing incorrect discrimination.

**Symptoms:**
- OrgsUpdateUnprocessableEntity decoder generated with broken logic
- ProjectsCreateCardUnprocessableEntity decoder generated with broken logic
- Tests failed with "unable to detect sum type variant"

**Fix:**
Added validation in oneOf() to detect when all variants for a field have
the same jxType after mapping (gen/schema_gen_sum.go:864-906). Returns
clear error instead of generating broken code:
  "field 'errors' cannot discriminate variants (requires unsupported
   discrimination): [ValidationError: array[object] ValidationErrorSimple:
   array[string]]"

This allows schemas to be properly skipped with ignore_not_implemented
configuration rather than generating broken decoders.

## Test Updates

Updated gen_test.go to include "type-based discrimination with same jxType"
in the GitHub API ignore list, as this error now properly catches the
unsupported array discrimination cases.

## Verification

- ✅ All unit tests passing
- ✅ All example tests passing
- ✅ GitHub API: 728 operations generated (12 properly skipped)
- ✅ No broken decoders in generated code
- ✅ Clear error messages for unsupported cases

Related to PR ogen-go#1584
@lanej lanej force-pushed the feature/type-based-discriminator-inference branch 3 times, most recently from 2ccbeb9 to 06a0fee Compare November 26, 2025 14:05
Implement type-based discrimination as the 4th discrimination strategy
for oneOf/anyOf schemas, enabling automatic variant selection based on
JSON type signatures without explicit discriminator fields.

Core Implementation:
- Field signature tracking using (name, typeID) pairs
- Runtime type checking via O(1) jx.Decoder.Next() operation
- IR metadata for FieldType and Nullable detection
- jxTypeForFieldType() mapping IR types to jx JSON types

Robustness Enhancements:
- Enum type handling (maps enum_* to jx.String)
- Nullable type detection (generic NilT and pointer-based)
- Array element discrimination validation
- Value-based discriminator validation with clear error messages

Sum Type Parameter Support:
- Restore sum type URI encode/decode capabilities
- Handle empty schemas appropriately for requests vs responses
- Add parameter handling for sum types

Testing:
- 8 new test specifications covering type discrimination scenarios
- Regression tests for sum_type_params and empty_response_body
- Examples regeneration showing GitHub API improvement (734/740 ops)

Impact:
- Fixes ogen-go#1013 (nested sum type discrimination)
- Fixes ogen-go#1185 (unique fields incorrectly rejected)
- GitHub API: 99.2% operation success (up from 98.4%)
- Telegram/GoTD APIs benefit from improved discrimination

Files changed: 160+ files, +16,000 lines
@lanej lanej force-pushed the feature/type-based-discriminator-inference branch from 06a0fee to 40e0719 Compare November 26, 2025 14:28
@lanej lanej marked this pull request as ready for review November 26, 2025 14:52
@lanej
Copy link
Contributor Author

lanej commented Nov 26, 2025

@ernado a lot of trashing on my end, but I think I got it now.

@ernado ernado merged commit da23baf into ogen-go:main Nov 27, 2025
15 checks passed
@lanej lanej deleted the feature/type-based-discriminator-inference branch November 27, 2025 16:37
lanej added a commit to lanej/ogen that referenced this pull request Nov 27, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
lanej added a commit to lanej/ogen that referenced this pull request Nov 27, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 27, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
lanej added a commit to lanej/ogen that referenced this pull request Nov 30, 2025
Enable automatic discrimination between oneOf variants that have array
fields with different element types (e.g., string[] vs integer[] vs
boolean[]).

This extends the type-based discrimination added in PR ogen-go#1584 to support
cases where variants share the same field name with array types but
differ in their element types.

Implementation:
- Add ArrayElementType and ArrayElementTypeID fields to UniqueFieldVariant
- Add getArrayElementTypeInfo() to extract element type from array type IDs
- Update validation to allow discrimination when array element types differ
- Generate decoder code that peeks into arrays using d.Capture() and
  d.ArrIter() to check first element type without consuming

Supported cases:
- Basic primitives: string[] vs integer[] vs boolean[]
- Object vs primitive: object[] vs string[]
- Mixed: array type combined with unique field discrimination

Limitations (future work):
- Nested arrays (array[array[string]] vs array[array[integer]])
- Complex object arrays (User[] vs Product[] with same object type)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Discriminator Inference: Not generating despite completely unique fields Support discriminator inference for oneOf with a nested sum type

2 participants