
fix: fix normalizing model ids #498

Merged
DoganK01 merged 2 commits into master from normalize-model-ids
Dec 16, 2025

Conversation

@DoganK01
Member

  • Generalize the Agent process by changing output_mode from "native" to "auto"
  • Add output_tools support so tools can produce the final response (e.g., structured output cases with Bedrock)
  • Normalize model IDs so the correct full model ID is resolved

@claude

claude bot commented Dec 16, 2025

Pull Request Review: Fix normalizing model ids

Thank you for this PR! I've conducted a thorough review of the changes. Overall, this is a solid improvement that addresses important functionality, but there are several issues that should be addressed before merging.


🎯 Summary

This PR introduces three major changes:

  1. Model ID normalization - Allows simplified model IDs (e.g., bedrock/claude-3-5-sonnet:v2) to be normalized to full IDs
  2. Output mode generalization - Changes from native to auto mode for structured outputs
  3. Output tools support - Adds tool-based structured output for models that don't support native JSON schemas (like Bedrock)

✅ Code Quality & Best Practices

Positives:

  • Well-structured normalization logic: The model ID normalization system with caching is well-designed
  • Good separation of concerns: The _build_output_tools method is appropriately extracted
  • Documentation: The normalization functions include helpful docstrings with examples
  • Backwards compatibility: The changes maintain compatibility with existing code

Issues:

1. Missing Test Coverage ❗ CRITICAL

The PR adds significant new functionality but includes zero tests. This is concerning for:

  • Model ID normalization logic (complex regex patterns, edge cases)
  • Output tools functionality
  • The auto mode behavior
  • Bedrock-specific model ID handling

Recommendation: Add comprehensive unit tests covering:

# Test cases needed:
- normalize_model_id('bedrock/claude-3-5-sonnet:v2') → correct full ID
- normalize_model_id('openai/gpt-4o:latest') → correct handling
- normalize_model_id('ollama/llama3.1:8b') → pass-through for unknown providers
- Edge cases: malformed IDs, regional prefixes, version handling
- Output tool extraction from structured responses
- Filtering of output tools from regular tool calls

2. Parameter Order Change ⚠️ BREAKING

In agent.py:1828, the parameter order for _extract_output is changed from:

def _extract_output(self, response: "ModelResponse", task: "Task") -> Any:

to:

def _extract_output(self, task: "Task", response: "ModelResponse") -> Any:

Issue: This is a breaking change if anyone has subclassed Direct and overridden this method. While it's likely an internal method, the change should be:

  • Documented in the PR description as a breaking change
  • OR use keyword arguments to maintain compatibility
  • OR keep the original order

3. Potential Bug in Tool Filtering 🐛

In agent.py:1226-1230:

# If all tool calls are output tools, return response directly (structured output)
if tool_calls and not regular_tool_calls:
    return response

Issue: This returns the raw response without extracting the structured output. The caller then needs to handle this case. Looking at the _extract_output method, it does handle this, but the flow is confusing.

Recommendation: Add a comment explaining why we return early here, or refactor to make the flow clearer.
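The confusing early-return flow described above can be made explicit in a self-contained sketch. This is a hypothetical illustration of the partitioning idea, not the actual `_handle_model_response` code; `ToolCall`, `OUTPUT_TOOL_NAMES`, and `partition_tool_calls` are made-up names:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str

# Hypothetical registry of tool names reserved for structured output.
OUTPUT_TOOL_NAMES = {"final_result"}

def partition_tool_calls(tool_calls):
    """Split tool calls into output tools (structured output) and regular tools."""
    output_calls = [c for c in tool_calls if c.name in OUTPUT_TOOL_NAMES]
    regular_calls = [c for c in tool_calls if c.name not in OUTPUT_TOOL_NAMES]
    return output_calls, regular_calls

calls = [ToolCall("final_result"), ToolCall("search")]
output_calls, regular_calls = partition_tool_calls(calls)
# When every call is an output tool, the response IS the structured output:
# return it unmodified and let _extract_output parse it, instead of executing tools.
all_output = bool(calls) and not regular_calls
```

A comment like the one above, placed at the early return, would make the contract with `_extract_output` explicit.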

4. Regex Complexity Without Tests ⚠️

The model ID normalization code in models/__init__.py:1655-1690 contains complex regex patterns:

base_match = re.match(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$', model_part)

Issue: Complex regex patterns are error-prone without comprehensive test coverage.

Recommendation:

  • Add unit tests with various Bedrock model formats
  • Consider documenting the expected format patterns in comments

5. Global Cache Management ⚠️

_KNOWN_PROVIDERS_CACHE = None
_MODEL_ALIAS_INDEX_CACHE = None

Issue: Global mutable state without a clear invalidation strategy could cause issues in:

  • Testing (tests might affect each other)
  • Dynamic model registry updates
  • Long-running processes with model updates

Recommendation:

  • Add a cache invalidation function for testing
  • Consider using @lru_cache decorator instead of manual caching
  • Document cache lifetime expectations

🔒 Security Concerns

Low Risk Issues:

  1. No input validation on model IDs: The normalize_model_id function doesn't validate input format

    • Could cause unexpected behavior with malformed inputs
    • Recommendation: Add basic validation or handle exceptions gracefully
  2. Regex DoS potential: Complex regex patterns could be vulnerable to ReDoS attacks with crafted inputs

    • Risk is low since model IDs are typically controlled
    • Recommendation: Consider timeout or simpler patterns
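A lightweight guard can address both points at once by bounding input length and characters before any regex-heavy parsing. This is a sketch under an assumed allow-list, not the library's actual validation rule:

```python
import re

# Conservative allow-list and length cap; an assumption for illustration.
_MODEL_ID_RE = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._\-/:]{0,255}$')

def is_plausible_model_id(model_id) -> bool:
    """Reject oversized or oddly-charactered IDs before normalization runs."""
    return isinstance(model_id, str) and bool(_MODEL_ID_RE.match(model_id))

is_plausible_model_id("bedrock/claude-3-5-sonnet:v2")  # True
is_plausible_model_id("x" * 10_000)                    # False: exceeds length cap
```

Capping length also removes most practical ReDoS exposure, since backtracking cost grows with input size.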

⚡ Performance Considerations

Positives:

  • Caching strategy for model alias index is good for performance
  • Early return for output tool filtering avoids unnecessary processing

Minor Concerns:

  1. Cache building on first use: _build_model_alias_index() processes all known models on first call

    • Could cause a slight delay on first model inference
    • Impact: Minimal, but worth noting in docs
  2. Regex matching in hot path: Model ID normalization happens on every model inference

    • The caching helps, but the lookup still happens every time
    • Recommendation: Profile to ensure this doesn't become a bottleneck

🔍 Logic & Correctness Issues

Issue 1: Version Precedence Logic Unclear

In _build_model_alias_index, lines 1718-1727:

# Prefer non-prefixed models (direct models) over regional inference profiles
if version not in alias_index[alias_key]:
    alias_index[alias_key][version] = full_id
elif not has_region_prefix:
    # Non-prefixed model takes priority - overwrite prefixed one
    alias_index[alias_key][version] = full_id

Question: What happens when multiple non-prefixed models exist with the same simplified name? The last one wins, but is this intentional?
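The "last one wins" outcome follows directly from dict assignment; a minimal illustration of the quoted precedence logic with made-up IDs:

```python
alias_index: dict = {}

def register(alias: str, version: str, full_id: str, has_region_prefix: bool) -> None:
    # Mirrors the quoted logic: first writer wins, unless a non-prefixed
    # (direct) model arrives later and overwrites a prefixed one.
    versions = alias_index.setdefault(alias, {})
    if version not in versions or not has_region_prefix:
        versions[version] = full_id

# Two non-prefixed models sharing a simplified name: the second overwrites.
register("claude-3-5-sonnet", "v2", "anthropic.claude-a", has_region_prefix=False)
register("claude-3-5-sonnet", "v2", "anthropic.claude-b", has_region_prefix=False)
```

If iteration order over known models is not guaranteed, the winner is effectively arbitrary, which is worth either documenting or tie-breaking deterministically.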

Issue 2: Error Handling Missing

The _extract_output method (line 1892) has a bare except Exception: pass:

try:
    args = tool_call.args_as_dict()
    if hasattr(task.response_format, 'model_validate'):
        return task.response_format.model_validate(args)
    return args
except Exception:
    pass

Issue: Silent failure could hide important errors. At minimum, log the exception.
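A sketch of the suggested change, with the swallowed exception logged. The function and logger names are assumptions for illustration; the validation logic follows the quoted snippet:

```python
import logging

logger = logging.getLogger("upsonic.agent")

def extract_structured_args(tool_call, response_format):
    """Validate tool-call args against the response format, logging failures."""
    try:
        args = tool_call.args_as_dict()
        if hasattr(response_format, "model_validate"):
            return response_format.model_validate(args)
        return args
    except Exception:
        # Don't hide the failure: record it so bad schemas stay debuggable.
        logger.exception("Failed to extract structured output from tool call")
        return None
```

`logger.exception` preserves the traceback at error level while keeping the non-raising behavior of the original code.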


📝 CLI Changes Review

The changes to CLI commands (init command) are good:

  • ✅ Simplifies project structure (removes src/ directory)
  • ✅ Adds pip to dependencies list
  • ✅ Improves error handling for pip installation

Minor suggestion: The _ensure_pip_available function could benefit from more specific error messages about what went wrong.


🧪 Test Coverage Assessment

Current test coverage: 0% for new functionality ❌

Required tests:

  1. Model ID normalization with various formats
  2. Output tools creation and filtering
  3. Structured output extraction via tools
  4. CLI dependency installation error scenarios
  5. Edge cases: malformed model IDs, missing providers, etc.

📋 Recommendations

Must Fix (Blocking):

  1. Add comprehensive unit tests for model ID normalization
  2. Add tests for output tools functionality
  3. ⚠️ Document the parameter order change in _extract_output as breaking (or revert it)
  4. 🐛 Improve error handling in _extract_output - don't silently catch all exceptions

Should Fix:

  1. Add cache invalidation function for testing
  2. Add input validation to normalize_model_id
  3. Document the regional prefix preference logic
  4. Add logging for debugging model ID normalization

Nice to Have:

  1. Performance profiling of normalization in hot path
  2. Consider @lru_cache instead of manual caching
  3. Add examples in docstrings for the new auto output mode

🎓 Architectural Notes

The auto mode design is smart - it allows the framework to choose the best structured output method based on the model's capabilities. However:

  1. The relationship between output_mode='auto' and ModelProfile.default_structured_output_mode should be documented
  2. The PR description mentions this is for Bedrock, but the implementation is generic - clarify the use case

✨ Conclusion

This PR introduces valuable functionality, but should not be merged without tests. The model ID normalization logic is complex enough that it will be difficult to maintain and debug without proper test coverage.

Overall Assessment:

  • Code quality: B+ (well-structured but missing tests)
  • Functionality: A- (solves real problems)
  • Risk: Medium (breaking changes, complex regex, no tests)

Recommendation: Request changes - add tests and address the critical issues above.


Review performed by Claude Code with adherence to Upsonic coding standards from CLAUDE.md

@claude

claude bot commented Dec 16, 2025

Code Review for PR #498: Fix normalizing model ids

Summary

This PR introduces three major improvements:

  1. Model ID normalization - Simplifies model identifiers (e.g., bedrock/claude-3-5-sonnet:v2 → full Bedrock ARN)
  2. Output mode generalization - Changes from native to auto for better structured output handling
  3. Output tools support - Adds tool-based structured output as a fallback when native structured output isn't supported

✅ Strengths

  1. Well-designed model normalization system

    • The alias index approach in src/upsonic/models/__init__.py:1604-1720 is elegant and maintainable
    • Caching (_KNOWN_PROVIDERS_CACHE, _MODEL_ALIAS_INDEX_CACHE) prevents redundant computation
    • Good handling of Bedrock's complex model ID format with regional prefixes
  2. Improved flexibility for structured output

    • The auto output mode allows models to choose the best structured output method
    • Fallback to tool-based output when native JSON schema isn't supported (important for Bedrock)
    • Clean separation of output tools from regular tools in _handle_model_response
  3. CLI improvements

    • _ensure_pip_available() is a good defensive measure
    • Moving from src/main.py to main.py simplifies project structure

🐛 Potential Bugs

CRITICAL: Breaking change in _extract_output signature

Location: src/upsonic/agent/agent.py:1871

# OLD signature
def _extract_output(self, response: "ModelResponse", task: "Task") -> Any:

# NEW signature  
def _extract_output(self, task: "Task", response: "ModelResponse") -> Any:

Impact: The parameter order was swapped, which WILL BREAK existing unit tests:

  • tests/unit_tests/agent/test_direct.py:416 - calls direct._extract_output(response, task)
  • tests/unit_tests/agent/test_direct.py:434 - calls direct._extract_output(response, task)
  • tests/unit_tests/agent/test_direct.py:452 - calls direct._extract_output(response, task)

All three test calls use the old signature (response, task) but the method now expects (task, response).

Recommendation: Either revert the parameter order OR update all test files. The new order (task, response) is arguably more intuitive (task comes first semantically), but this is a breaking change that needs test updates.

Missing @dataclass decorator

Location: src/upsonic/tools/base.py:191

The ToolDefinition class uses field(default_factory=...) but the @dataclass decorator is missing in the visible code. If this was removed, it would cause runtime errors.

Verify: Check that line ~191 has @dataclass decorator above the class definition.
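For reference, the decorator is what makes field(default_factory=...) behave as a per-instance default. A minimal sketch of the assumed shape (the real ToolDefinition has more fields; parameters_json_schema is a guess for illustration):

```python
from dataclasses import dataclass, field

@dataclass  # Without this decorator, field(...) objects leak through as class attributes.
class ToolDefinition:
    name: str
    parameters_json_schema: dict = field(default_factory=dict)

tool = ToolDefinition(name="final_result")
```

Each instance gets its own fresh dict, so mutating one definition's schema cannot bleed into another's.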

Potential KeyError in normalize_model_id

Location: src/upsonic/models/__init__.py:1774-1791

if alias_key in alias_index:
    versions = alias_index[alias_key]
    if version in versions:
        return versions[version]
    elif 'latest' in versions:
        return versions['latest']

Issue: If a user specifies a known provider with an unknown model variant (e.g., bedrock/claude-9000:v1), it returns the original ID. This is correct behavior, but it's not immediately clear.

Recommendation: Add a comment explaining this fallback behavior for clarity.
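The fallback chain can be made explicit in a self-contained sketch. The index contents are illustrative (taken from the IDs quoted in this review), and `resolve` is a hypothetical name for the lookup step:

```python
ALIAS_INDEX = {
    "bedrock/claude-3-5-sonnet": {
        "v2": "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
        "latest": "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    }
}

def resolve(alias_key: str, version: str):
    """Exact version first, then 'latest'; None means the caller keeps the original ID."""
    versions = ALIAS_INDEX.get(alias_key)
    if versions is None:
        return None  # Unknown alias: the user-supplied ID passes through unchanged.
    return versions.get(version) or versions.get("latest")

resolve("bedrock/claude-3-5-sonnet", "v2")  # full normalized ID
resolve("bedrock/claude-9000", "v1")        # None: original ID passes through
```

Making the pass-through an explicit `None` return (or a comment at the fallback) would remove the ambiguity flagged above.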

⚡ Performance Considerations

  1. Cache initialization timing

    • The _build_model_alias_index() function uses get_args(KnownModelName.__value__) which could be slow on first call
    • Consider lazy initialization on first use rather than at module import time
    • Current caching is good, but document the one-time startup cost
  2. Regex compilation

    • In _build_model_alias_index(), regex patterns are compiled on every call
    • Consider pre-compiling patterns at module level:
    _BEDROCK_MODEL_PATTERN = re.compile(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$')
    _DATE_SUFFIX_PATTERN = re.compile(r'^(.+?)(?:-\d{4}-\d{2}-\d{2})?(-\d{8})?$')
  3. Tool filtering in _handle_model_response

    • List comprehensions at src/upsonic/agent/agent.py:1226-1227 are efficient
    • No performance concerns here
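The pre-compiled Bedrock pattern suggested above can also be sanity-checked in isolation. As I read the regex (worth confirming with real tests), it splits a Bedrock model part into a base name and an optional version suffix:

```python
import re

# Pattern quoted from the PR, compiled once at module level as suggested.
_BEDROCK_MODEL_PATTERN = re.compile(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$')

m = _BEDROCK_MODEL_PATTERN.match("claude-3-5-sonnet-20241022-v2:0")
base, version = m.group(1), m.group(2)
# base == "claude-3-5-sonnet", version == "-v2": the date stamp and the
# trailing ":0" are dropped, while the "-v2" revision is captured separately.
```

Pinning behavior like this in a test would catch regressions when the pattern is next touched.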

🔒 Security Considerations

  1. Pip installation in CLI (src/upsonic/cli/commands/shared/dependencies.py:50)

    • The _ensure_pip_available() function runs ensurepip which is safe
    • Good defensive programming against missing pip in venvs
    • No security concerns
  2. Model ID normalization

    • No user input validation issues - the function safely handles unexpected formats
    • Returns original ID for unknown providers (fail-safe behavior)
  3. Tool execution filtering

    • Properly filters output tools from execution (line 1226-1227)
    • No risk of unintended tool execution

🧪 Test Coverage

MISSING TEST COVERAGE:

  1. Model ID normalization - No tests found for:

    • normalize_model_id()
    • _build_model_alias_index()
    • _get_cached_known_providers()

    Recommended tests:

    def test_normalize_bedrock_model_ids():
        assert normalize_model_id("bedrock/claude-3-5-sonnet:v2") == "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
        assert normalize_model_id("bedrock/claude-3-5-sonnet:latest") == "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
    
    def test_normalize_unknown_provider():
        assert normalize_model_id("ollama/llama3.1:8b") == "ollama/llama3.1:8b"  # Pass-through
  2. Output tools - No tests for:

    • _build_output_tools()
    • Tool-based structured output extraction in _extract_output()
    • Filtering of output tools in _handle_model_response()
  3. CLI pip installation - No tests for:

    • _ensure_pip_available()

BROKEN TESTS (must fix):

The signature change in _extract_output breaks 3 existing tests:

  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_text
  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_structured
  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_invalid_json

Fix required: Update all calls from direct._extract_output(response, task) to direct._extract_output(task, response)

📋 Additional Recommendations

  1. Documentation

    • Add docstring examples to normalize_model_id() showing supported formats
    • Document the output_mode='auto' behavior and when it chooses tool vs native
  2. Version bump

    • Version bump from 0.69.0 → 0.69.1 is appropriate for a bug fix
    • Consider if the _extract_output signature change warrants 0.70.0 (breaking change)
  3. Error handling

    • Consider adding validation in _build_output_tools() for edge cases
    • Add try-except around task.response_format.model_validate(args) at line 1893
  4. Code organization

    • The 200+ line normalize_model_id and helper functions could be moved to a separate module like src/upsonic/models/normalization.py for better organization

🎯 Action Items

Must fix before merge:

  1. ✅ Update test calls in test_direct.py to use new _extract_output(task, response) signature
  2. ✅ Verify @dataclass decorator exists on ToolDefinition

Recommended for follow-up:

  1. Add unit tests for normalize_model_id() with various providers
  2. Add tests for output tools functionality
  3. Pre-compile regex patterns for performance
  4. Add docstring examples for model ID normalization

Overall Assessment

Quality: Good ⭐⭐⭐⭐☆
Risk Level: Medium (due to breaking test signature change)
Recommendation: Fix broken tests, then APPROVE

The core functionality is well-designed and solves real problems (Bedrock model IDs, structured output flexibility). The main issue is the breaking change in _extract_output that needs test updates.

@DoganK01 merged commit 0b6a9f5 into master Dec 16, 2025
4 checks passed
@DoganK01 deleted the normalize-model-ids branch December 16, 2025 09:15