
fix: fix normalizing model ids #498

Merged
DoganK01 merged 2 commits into master from normalize-model-ids
Dec 16, 2025

Conversation

@DoganK01
Member

  • Generalize the Agent process by changing output_mode from "native" to "auto"
  • Add output_tools support so tools can produce the final response (e.g., structured output cases with Bedrock)
  • Normalize model IDs so the correct full model ID is resolved

@claude

claude bot commented Dec 16, 2025

Pull Request Review: Fix normalizing model ids

Thank you for this PR! I've conducted a thorough review of the changes. Overall, this is a solid improvement that addresses important functionality, but there are several issues that should be addressed before merging.


🎯 Summary

This PR introduces three major changes:

  1. Model ID normalization - Allows simplified model IDs (e.g., bedrock/claude-3-5-sonnet:v2) to be normalized to full IDs
  2. Output mode generalization - Changes from native to auto mode for structured outputs
  3. Output tools support - Adds tool-based structured output for models that don't support native JSON schemas (like Bedrock)

✅ Code Quality & Best Practices

Positives:

  • Well-structured normalization logic: The model ID normalization system with caching is well-designed
  • Good separation of concerns: The _build_output_tools method is appropriately extracted
  • Documentation: The normalization functions include helpful docstrings with examples
  • Backwards compatibility: The changes maintain compatibility with existing code

Issues:

1. Missing Test Coverage ❗ CRITICAL

The PR adds significant new functionality but includes zero tests. This is concerning for:

  • Model ID normalization logic (complex regex patterns, edge cases)
  • Output tools functionality
  • The auto mode behavior
  • Bedrock-specific model ID handling

Recommendation: Add comprehensive unit tests covering:

# Test cases needed:
- normalize_model_id('bedrock/claude-3-5-sonnet:v2') → correct full ID
- normalize_model_id('openai/gpt-4o:latest') → correct handling
- normalize_model_id('ollama/llama3.1:8b') → pass-through for unknown providers
- Edge cases: malformed IDs, regional prefixes, version handling
- Output tool extraction from structured responses
- Filtering of output tools from regular tool calls

2. Parameter Order Change ⚠️ BREAKING

In agent.py:1828, the parameter order for _extract_output is changed from:

def _extract_output(self, response: "ModelResponse", task: "Task") -> Any:

to:

def _extract_output(self, task: "Task", response: "ModelResponse") -> Any:

Issue: This is a breaking change if anyone has subclassed Direct and overridden this method. While it's likely an internal method, the change should be:

  • Documented in the PR description as a breaking change
  • OR use keyword arguments to maintain compatibility
  • OR keep the original order

3. Potential Bug in Tool Filtering 🐛

In agent.py:1226-1230:

# If all tool calls are output tools, return response directly (structured output)
if tool_calls and not regular_tool_calls:
    return response

Issue: This returns the raw response without extracting the structured output. The caller then needs to handle this case. Looking at the _extract_output method, it does handle this, but the flow is confusing.

Recommendation: Add a comment explaining why we return early here, or refactor to make the flow clearer.
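The confusing early-return flow described above can be made explicit in a self-contained sketch. This is a hypothetical illustration of the partitioning idea, not the actual `_handle_model_response` code; `ToolCall`, `OUTPUT_TOOL_NAMES`, and `partition_tool_calls` are made-up names:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str

# Hypothetical registry of tool names reserved for structured output.
OUTPUT_TOOL_NAMES = {"final_result"}

def partition_tool_calls(tool_calls):
    """Split tool calls into output tools (structured output) and regular tools."""
    output_calls = [c for c in tool_calls if c.name in OUTPUT_TOOL_NAMES]
    regular_calls = [c for c in tool_calls if c.name not in OUTPUT_TOOL_NAMES]
    return output_calls, regular_calls

calls = [ToolCall("final_result"), ToolCall("search")]
output_calls, regular_calls = partition_tool_calls(calls)
# When every call is an output tool, the response IS the structured output:
# return it unmodified and let _extract_output parse it, instead of executing tools.
all_output = bool(calls) and not regular_calls
```

A comment like the one above, placed at the early return, would make the contract with `_extract_output` explicit.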

4. Regex Complexity Without Tests ⚠️

The model ID normalization code in models/__init__.py:1655-1690 contains complex regex patterns:

base_match = re.match(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$', model_part)

Issue: Complex regex patterns are error-prone without comprehensive test coverage.

Recommendation:

  • Add unit tests with various Bedrock model formats
  • Consider documenting the expected format patterns in comments

5. Global Cache Management ⚠️

_KNOWN_PROVIDERS_CACHE = None
_MODEL_ALIAS_INDEX_CACHE = None

Issue: Global mutable state without a clear invalidation strategy could cause issues in:

  • Testing (tests might affect each other)
  • Dynamic model registry updates
  • Long-running processes with model updates

Recommendation:

  • Add a cache invalidation function for testing
  • Consider using @lru_cache decorator instead of manual caching
  • Document cache lifetime expectations

🔒 Security Concerns

Low Risk Issues:

  1. No input validation on model IDs: The normalize_model_id function doesn't validate input format

    • Could cause unexpected behavior with malformed inputs
    • Recommendation: Add basic validation or handle exceptions gracefully
  2. Regex DoS potential: Complex regex patterns could be vulnerable to ReDoS attacks with crafted inputs

    • Risk is low since model IDs are typically controlled
    • Recommendation: Consider timeout or simpler patterns
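A lightweight guard can address both points at once by bounding input length and characters before any regex-heavy parsing. This is a sketch under an assumed allow-list, not the library's actual validation rule:

```python
import re

# Conservative allow-list and length cap; an assumption for illustration.
_MODEL_ID_RE = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._\-/:]{0,255}$')

def is_plausible_model_id(model_id) -> bool:
    """Reject oversized or oddly-charactered IDs before normalization runs."""
    return isinstance(model_id, str) and bool(_MODEL_ID_RE.match(model_id))

is_plausible_model_id("bedrock/claude-3-5-sonnet:v2")  # True
is_plausible_model_id("x" * 10_000)                    # False: exceeds length cap
```

Capping length also removes most practical ReDoS exposure, since backtracking cost grows with input size.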

⚡ Performance Considerations

Positives:

  • Caching strategy for model alias index is good for performance
  • Early return for output tool filtering avoids unnecessary processing

Minor Concerns:

  1. Cache building on first use: _build_model_alias_index() processes all known models on first call

    • Could cause a slight delay on first model inference
    • Impact: Minimal, but worth noting in docs
  2. Regex matching in hot path: Model ID normalization happens on every model inference

    • The caching helps, but the lookup still happens every time
    • Recommendation: Profile to ensure this doesn't become a bottleneck

🔍 Logic & Correctness Issues

Issue 1: Version Precedence Logic Unclear

In _build_model_alias_index, lines 1718-1727:

# Prefer non-prefixed models (direct models) over regional inference profiles
if version not in alias_index[alias_key]:
    alias_index[alias_key][version] = full_id
elif not has_region_prefix:
    # Non-prefixed model takes priority - overwrite prefixed one
    alias_index[alias_key][version] = full_id

Question: What happens when multiple non-prefixed models exist with the same simplified name? The last one wins, but is this intentional?
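The "last one wins" outcome follows directly from dict assignment; a minimal illustration of the quoted precedence logic with made-up IDs:

```python
alias_index: dict = {}

def register(alias: str, version: str, full_id: str, has_region_prefix: bool) -> None:
    # Mirrors the quoted logic: first writer wins, unless a non-prefixed
    # (direct) model arrives later and overwrites a prefixed one.
    versions = alias_index.setdefault(alias, {})
    if version not in versions or not has_region_prefix:
        versions[version] = full_id

# Two non-prefixed models sharing a simplified name: the second overwrites.
register("claude-3-5-sonnet", "v2", "anthropic.claude-a", has_region_prefix=False)
register("claude-3-5-sonnet", "v2", "anthropic.claude-b", has_region_prefix=False)
```

If iteration order over known models is not guaranteed, the winner is effectively arbitrary, which is worth either documenting or tie-breaking deterministically.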

Issue 2: Error Handling Missing

The _extract_output method (line 1892) has a bare except Exception: pass:

try:
    args = tool_call.args_as_dict()
    if hasattr(task.response_format, 'model_validate'):
        return task.response_format.model_validate(args)
    return args
except Exception:
    pass

Issue: Silent failure could hide important errors. At minimum, log the exception.
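A sketch of the suggested change, with the swallowed exception logged. The function and logger names are assumptions for illustration; the validation logic follows the quoted snippet:

```python
import logging

logger = logging.getLogger("upsonic.agent")

def extract_structured_args(tool_call, response_format):
    """Validate tool-call args against the response format, logging failures."""
    try:
        args = tool_call.args_as_dict()
        if hasattr(response_format, "model_validate"):
            return response_format.model_validate(args)
        return args
    except Exception:
        # Don't hide the failure: record it so bad schemas stay debuggable.
        logger.exception("Failed to extract structured output from tool call")
        return None
```

`logger.exception` preserves the traceback at error level while keeping the non-raising behavior of the original code.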


📝 CLI Changes Review

The changes to CLI commands (init command) are good:

  • ✅ Simplifies project structure (removes src/ directory)
  • ✅ Adds pip to dependencies list
  • ✅ Improves error handling for pip installation

Minor suggestion: The _ensure_pip_available function could benefit from more specific error messages about what went wrong.


🧪 Test Coverage Assessment

Current test coverage: 0% for new functionality ❌

Required tests:

  1. Model ID normalization with various formats
  2. Output tools creation and filtering
  3. Structured output extraction via tools
  4. CLI dependency installation error scenarios
  5. Edge cases: malformed model IDs, missing providers, etc.

📋 Recommendations

Must Fix (Blocking):

  1. Add comprehensive unit tests for model ID normalization
  2. Add tests for output tools functionality
  3. ⚠️ Document the parameter order change in _extract_output as breaking (or revert it)
  4. 🐛 Improve error handling in _extract_output - don't silently catch all exceptions

Should Fix:

  1. Add cache invalidation function for testing
  2. Add input validation to normalize_model_id
  3. Document the regional prefix preference logic
  4. Add logging for debugging model ID normalization

Nice to Have:

  1. Performance profiling of normalization in hot path
  2. Consider @lru_cache instead of manual caching
  3. Add examples in docstrings for the new auto output mode

🎓 Architectural Notes

The auto mode design is smart - it allows the framework to choose the best structured output method based on the model's capabilities. However:

  1. The relationship between output_mode='auto' and ModelProfile.default_structured_output_mode should be documented
  2. The PR description mentions this is for Bedrock, but the implementation is generic - clarify the use case

✨ Conclusion

This PR introduces valuable functionality, but should not be merged without tests. The model ID normalization logic is complex enough that it will be difficult to maintain and debug without proper test coverage.

Overall Assessment:

  • Code quality: B+ (well-structured but missing tests)
  • Functionality: A- (solves real problems)
  • Risk: Medium (breaking changes, complex regex, no tests)

Recommendation: Request changes - add tests and address the critical issues above.


Review performed by Claude Code with adherence to Upsonic coding standards from CLAUDE.md

@claude

claude bot commented Dec 16, 2025

Code Review for PR #498: Fix normalizing model ids

Summary

This PR introduces three major improvements:

  1. Model ID normalization - Simplifies model identifiers (e.g., bedrock/claude-3-5-sonnet:v2 → full Bedrock ARN)
  2. Output mode generalization - Changes from native to auto for better structured output handling
  3. Output tools support - Adds tool-based structured output as a fallback when native structured output isn't supported

✅ Strengths

  1. Well-designed model normalization system

    • The alias index approach in src/upsonic/models/__init__.py:1604-1720 is elegant and maintainable
    • Caching (_KNOWN_PROVIDERS_CACHE, _MODEL_ALIAS_INDEX_CACHE) prevents redundant computation
    • Good handling of Bedrock's complex model ID format with regional prefixes
  2. Improved flexibility for structured output

    • The auto output mode allows models to choose the best structured output method
    • Fallback to tool-based output when native JSON schema isn't supported (important for Bedrock)
    • Clean separation of output tools from regular tools in _handle_model_response
  3. CLI improvements

    • _ensure_pip_available() is a good defensive measure
    • Moving from src/main.py to main.py simplifies project structure

🐛 Potential Bugs

CRITICAL: Breaking change in _extract_output signature

Location: src/upsonic/agent/agent.py:1871

# OLD signature
def _extract_output(self, response: "ModelResponse", task: "Task") -> Any:

# NEW signature  
def _extract_output(self, task: "Task", response: "ModelResponse") -> Any:

Impact: The parameter order was swapped, which WILL BREAK existing unit tests:

  • tests/unit_tests/agent/test_direct.py:416 - calls direct._extract_output(response, task)
  • tests/unit_tests/agent/test_direct.py:434 - calls direct._extract_output(response, task)
  • tests/unit_tests/agent/test_direct.py:452 - calls direct._extract_output(response, task)

All three test calls use the old signature (response, task) but the method now expects (task, response).

Recommendation: Either revert the parameter order OR update all test files. The new order (task, response) is arguably more intuitive (task comes first semantically), but this is a breaking change that needs test updates.

Missing @dataclass decorator

Location: src/upsonic/tools/base.py:191

The ToolDefinition class uses field(default_factory=...) but the @dataclass decorator is missing in the visible code. If this was removed, it would cause runtime errors.

Verify: Check that line ~191 has @dataclass decorator above the class definition.
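For reference, the decorator is what makes field(default_factory=...) behave as a per-instance default. A minimal sketch of the assumed shape (the real ToolDefinition has more fields; parameters_json_schema is a guess for illustration):

```python
from dataclasses import dataclass, field

@dataclass  # Without this decorator, field(...) objects leak through as class attributes.
class ToolDefinition:
    name: str
    parameters_json_schema: dict = field(default_factory=dict)

tool = ToolDefinition(name="final_result")
```

Each instance gets its own fresh dict, so mutating one definition's schema cannot bleed into another's.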

Potential KeyError in normalize_model_id

Location: src/upsonic/models/__init__.py:1774-1791

if alias_key in alias_index:
    versions = alias_index[alias_key]
    if version in versions:
        return versions[version]
    elif 'latest' in versions:
        return versions['latest']

Issue: If a user specifies a known provider with an unknown model variant (e.g., bedrock/claude-9000:v1), it returns the original ID. This is correct behavior, but it's not immediately clear.

Recommendation: Add a comment explaining this fallback behavior for clarity.
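The fallback chain can be made explicit in a self-contained sketch. The index contents are illustrative (taken from the IDs quoted in this review), and `resolve` is a hypothetical name for the lookup step:

```python
ALIAS_INDEX = {
    "bedrock/claude-3-5-sonnet": {
        "v2": "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
        "latest": "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    }
}

def resolve(alias_key: str, version: str):
    """Exact version first, then 'latest'; None means the caller keeps the original ID."""
    versions = ALIAS_INDEX.get(alias_key)
    if versions is None:
        return None  # Unknown alias: the user-supplied ID passes through unchanged.
    return versions.get(version) or versions.get("latest")

resolve("bedrock/claude-3-5-sonnet", "v2")  # full normalized ID
resolve("bedrock/claude-9000", "v1")        # None: original ID passes through
```

Making the pass-through an explicit `None` return (or a comment at the fallback) would remove the ambiguity flagged above.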

⚡ Performance Considerations

  1. Cache initialization timing

    • The _build_model_alias_index() function uses get_args(KnownModelName.__value__) which could be slow on first call
    • Consider lazy initialization on first use rather than at module import time
    • Current caching is good, but document the one-time startup cost
  2. Regex compilation

    • In _build_model_alias_index(), regex patterns are compiled on every call
    • Consider pre-compiling patterns at module level:
    _BEDROCK_MODEL_PATTERN = re.compile(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$')
    _DATE_SUFFIX_PATTERN = re.compile(r'^(.+?)(?:-\d{4}-\d{2}-\d{2})?(-\d{8})?$')
  3. Tool filtering in _handle_model_response

    • List comprehensions at src/upsonic/agent/agent.py:1226-1227 are efficient
    • No performance concerns here
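The pre-compiled Bedrock pattern suggested above can also be sanity-checked in isolation. As I read the regex (worth confirming with real tests), it splits a Bedrock model part into a base name and an optional version suffix:

```python
import re

# Pattern quoted from the PR, compiled once at module level as suggested.
_BEDROCK_MODEL_PATTERN = re.compile(r'^(.+?)(?:-\d{8})?(-v\d+)?(?::\d+)?$')

m = _BEDROCK_MODEL_PATTERN.match("claude-3-5-sonnet-20241022-v2:0")
base, version = m.group(1), m.group(2)
# base == "claude-3-5-sonnet", version == "-v2": the date stamp and the
# trailing ":0" are dropped, while the "-v2" revision is captured separately.
```

Pinning behavior like this in a test would catch regressions when the pattern is next touched.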

🔒 Security Considerations

  1. Pip installation in CLI (src/upsonic/cli/commands/shared/dependencies.py:50)

    • The _ensure_pip_available() function runs ensurepip which is safe
    • Good defensive programming against missing pip in venvs
    • No security concerns
  2. Model ID normalization

    • No user input validation issues - the function safely handles unexpected formats
    • Returns original ID for unknown providers (fail-safe behavior)
  3. Tool execution filtering

    • Properly filters output tools from execution (line 1226-1227)
    • No risk of unintended tool execution

🧪 Test Coverage

MISSING TEST COVERAGE:

  1. Model ID normalization - No tests found for:

    • normalize_model_id()
    • _build_model_alias_index()
    • _get_cached_known_providers()

    Recommended tests:

    def test_normalize_bedrock_model_ids():
        assert normalize_model_id("bedrock/claude-3-5-sonnet:v2") == "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
        assert normalize_model_id("bedrock/claude-3-5-sonnet:latest") == "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
    
    def test_normalize_unknown_provider():
        assert normalize_model_id("ollama/llama3.1:8b") == "ollama/llama3.1:8b"  # Pass-through
  2. Output tools - No tests for:

    • _build_output_tools()
    • Tool-based structured output extraction in _extract_output()
    • Filtering of output tools in _handle_model_response()
  3. CLI pip installation - No tests for:

    • _ensure_pip_available()

BROKEN TESTS (must fix):

The signature change in _extract_output breaks 3 existing tests:

  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_text
  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_structured
  • tests/unit_tests/agent/test_direct.py::TestDirectExtractOutput::test_direct_extract_output_invalid_json

Fix required: Update all calls from direct._extract_output(response, task) to direct._extract_output(task, response)

📋 Additional Recommendations

  1. Documentation

    • Add docstring examples to normalize_model_id() showing supported formats
    • Document the output_mode='auto' behavior and when it chooses tool vs native
  2. Version bump

    • Version bump from 0.69.0 → 0.69.1 is appropriate for a bug fix
    • Consider if the _extract_output signature change warrants 0.70.0 (breaking change)
  3. Error handling

    • Consider adding validation in _build_output_tools() for edge cases
    • Add try-except around task.response_format.model_validate(args) at line 1893
  4. Code organization

    • The 200+ line normalize_model_id and helper functions could be moved to a separate module like src/upsonic/models/normalization.py for better organization

🎯 Action Items

Must fix before merge:

  1. ✅ Update test calls in test_direct.py to use new _extract_output(task, response) signature
  2. ✅ Verify @dataclass decorator exists on ToolDefinition

Recommended for follow-up:

  1. Add unit tests for normalize_model_id() with various providers
  2. Add tests for output tools functionality
  3. Pre-compile regex patterns for performance
  4. Add docstring examples for model ID normalization

Overall Assessment

Quality: Good ⭐⭐⭐⭐☆
Risk Level: Medium (due to breaking test signature change)
Recommendation: Fix broken tests, then APPROVE

The core functionality is well-designed and solves real problems (Bedrock model IDs, structured output flexibility). The main issue is the breaking change in _extract_output that needs test updates.

@DoganK01 merged commit 0b6a9f5 into master Dec 16, 2025
4 checks passed
@DoganK01 deleted the normalize-model-ids branch December 16, 2025 09:15