-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: azure asr fatal error #1946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code Review: PR #1946 - Fix Azure ASR Fatal ErrorOverviewThis PR introduces proper fatal error handling for Azure ASR by distinguishing between fatal errors (authentication failures, bad requests) that should stop retries and non-fatal errors that allow reconnection attempts. Strengths
Issues and Concerns1. Incomplete Fatal Error Coverage (Medium Priority)Location: const.py:15-18 The PR only handles two error codes: AuthenticationFailure and BadRequest. Consider adding other fatal error codes like Forbidden for authorization issues. Documentation on which specific Azure error codes should be fatal vs. retryable would help future maintainers. 2. Reconnection Loop Termination (Important)Location: extension.py:435 Setting self.stopped = True prevents reconnection, but there's a potential race condition. The canceled event handler may fire AFTER session_stopped has already initiated a reconnect attempt, since the order of events from Azure SDK is not guaranteed. Recommendation: Consider adding a check at the start of _handle_reconnect() to respect the stopped flag immediately and prevent race conditions. 3. Test Coverage Gap (Important)Location: tests/test_vendor_error.py:60 The existing test expects a NON_FATAL_ERROR (code 1000) for any cancellation. With this PR, fatal errors should return code -1000. Required: Update test_vendor_error.py or create test_fatal_vendor_error.py to verify fatal error codes produce FATAL_ERROR (-1000), non-fatal codes produce NON_FATAL_ERROR (1000), self.stopped flag prevents reconnection after fatal errors, and reconnection still works for non-fatal errors. 4. Logging Enhancement (Minor)Location: extension.py:433-435 The code sets self.stopped = True but doesn't explicitly log that reconnection has been disabled. Add a log message when stopping reconnection attempts due to fatal error to improve debugging. Security ConsiderationsNo security issues identified. The PR properly handles authentication failures and prevents resource exhaustion from infinite retries on auth failures. Performance ConsiderationsPositive performance impact: Eliminates infinite retry loops for unrecoverable errors, reduces unnecessary API calls to Azure when credentials are invalid, and properly fails fast for configuration errors. Additional Recommendations
SummaryThis is a good fix that addresses a real issue with infinite retry loops on fatal errors. The implementation is clean and follows existing patterns in the codebase. Before merging, please address:
Approval Status: Approve with recommendations for test coverage. |
Code Review: Azure ASR Fatal Error HandlingSummaryThis PR introduces fatal error handling for Azure ASR to prevent unnecessary reconnection attempts when encountering authentication failures or bad requests. The changes properly differentiate between fatal and non-fatal errors, with corresponding test coverage. ✅ Positive Aspects
🔍 Issues & Concerns1. Critical Bug: Incorrect Error Code in ReconnectManager
|
ai_agents/agents/integration_tests/asr_guarder/tests/test_reconnection.py
Outdated
Show resolved
Hide resolved
ai_agents/agents/integration_tests/asr_guarder/tests/test_reconnection.py
Outdated
Show resolved
Hide resolved
Code Review for PR #1946: Fix Azure ASR Fatal ErrorSummaryThis PR adds fatal error handling to the Azure ASR extension to prevent infinite retry loops on non-recoverable errors like authentication failures and bad requests. The changes include categorizing error codes, stopping retries for fatal errors, and adding comprehensive test coverage. Strengths
Code Quality Observationsextension.py:431-435 - The fatal error handling logic is clean and well-placed. Setting self.stopped = True is the correct mechanism to prevent reconnection attempts. const.py:14-18 - The constant definition is clear. Consider adding inline documentation explaining why each error code is considered fatal. Potential IssuesTest Logic Error (test_reconnection.py:396-399) - There appears to be a logic error in the non-fatal error validation. The current code filters for codes that are NOT NON_FATAL_ERROR and asserts this equals the total length, which seems backwards. Should verify that NO fatal errors are present instead. Missing Coverage
Security & PerformanceNo security issues identified. Positive performance impact - prevents wasteful infinite retry loops. RecommendationsMust Fix: Fix the test logic error in test_reconnection.py:396-399 Should Consider: Review more fatal error codes, add documentation, notify reconnect_manager Nice to Have: Improve error message clarity, add test verifying reconnection prevention Overall AssessmentThis is a solid fix that addresses a real problem with appropriate error handling. The main issue is a logic error in the test validation. Once corrected, this should be good to merge. |
PR Review: Fix Azure ASR Fatal ErrorSummaryThis PR fixes Azure ASR error handling by distinguishing between fatal and non-fatal errors, preventing unnecessary retry attempts for authentication/bad request failures, and adding connection delay metrics. Code Quality and Best PracticesStrengths:
Issues:
Potential Bugs
Performance Considerations
Security Concerns
Test CoverageExcellent test coverage:
Suggestions:
Additional Observations
RecommendationsCritical:
High Priority: Medium Priority: Low Priority: Overall AssessmentThis is a well-structured PR that addresses an important issue with fatal error handling. The code quality is good, tests are comprehensive, and the implementation is mostly sound. The main concern is the logic error in the test validation which should be fixed before merging. The other issues are minor and can be addressed in follow-up PRs if needed. Recommendation: Request changes for the critical logic error, then approve after fix. |
No description provided.