Add distributed tracing and structured error handling for extensions #6321
wbreza left a comment
I worry that adding these tracing fields directly to all of our messages will cause long-term maintenance issues: it will require many updates, and each new service will need to include the same values.
We should consider leveraging gRPC metadata, which is passed along via HTTP/2 headers, similar to the approach we are taking with authorization.
Authorization is sent transparently to and from extensions via this method and so far has been easy to maintain and update.
Fixes #6292
This PR enhances the extension framework by adding support for distributed tracing propagation and improving error handling telemetry. It ensures that trace contexts are preserved across the gRPC boundary between azd and extensions, and that service-specific errors from extensions are correctly captured in telemetry.
A follow-up PR will be made to update the agents extension to use these new capabilities.
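To make "preserving trace context" across a process boundary more concrete, here is a minimal sketch of how a child process might read a `TRACEPARENT` environment variable. It assumes the value follows the W3C Trace Context `traceparent` format, version `00` (`version-traceid-parentid-flags`), which this PR does not state explicitly. The `TraceParent` type and `parseTraceParent` function are hypothetical names for illustration; this is not the implementation of `azdext.NewContext()`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// TraceParent holds the four fields of a W3C traceparent value.
// Hypothetical type for illustration only.
type TraceParent struct {
	Version string
	TraceID string
	SpanID  string
	Flags   string
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// allZero reports whether s consists only of '0' characters.
func allZero(s string) bool { return strings.Trim(s, "0") == "" }

// parseTraceParent validates a version-00 traceparent string.
func parseTraceParent(s string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(s), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	tp := TraceParent{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}
	switch {
	case tp.Version != "00":
		return TraceParent{}, fmt.Errorf("traceparent: unsupported version %q", tp.Version)
	case !isLowerHex(tp.TraceID, 32) || allZero(tp.TraceID):
		return TraceParent{}, errors.New("traceparent: invalid trace-id")
	case !isLowerHex(tp.SpanID, 16) || allZero(tp.SpanID):
		return TraceParent{}, errors.New("traceparent: invalid parent-id")
	case !isLowerHex(tp.Flags, 2):
		return TraceParent{}, errors.New("traceparent: invalid trace-flags")
	}
	return tp, nil
}

func main() {
	raw := os.Getenv("TRACEPARENT")
	if raw == "" {
		fmt.Println("no TRACEPARENT set; starting a new trace")
		return
	}
	tp, err := parseTraceParent(raw)
	if err != nil {
		fmt.Println("ignoring malformed TRACEPARENT:", err)
		return
	}
	fmt.Printf("continuing trace %s under parent span %s\n", tp.TraceID, tp.SpanID)
}
```

Treating a missing or malformed value as "start a new trace" rather than a hard failure is a choice made for this sketch: it keeps an extension usable when launched outside of azd.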
Changes
Distributed Tracing
TRACEPARENT and TRACESTATE environment variables when launching extension process and added azdext.NewContext() helper to hydrate extension context from TRACEPARENT/TRACESTATE.
gRPC Metadata Propagation: Implemented traceparent/tracestate header injection and extraction using gRPC client and server interceptors. (Realized that this wasn't needed to fix "[ai agent extension] add result code to telemetry of azd deploy extension events" #6292 and "azure.ai.agents - Fix correlation id header propagation in agent_api" #6316.)
Structured Error Handling
azdext.ServiceError to capture detailed error information including ErrorCode, StatusCode, and ServiceName.
MapError to recognize these structured errors and emit precise telemetry signals (e.g., ext.service.openai.429 instead of generic errors).
errors.proto with ExtensionError and ServiceErrorDetail message types, and updated FrameworkServiceErrorMessage and ServiceTargetErrorMessage to use ExtensionError (not a breaking change).
Bug Fixes
SendAndWait and SendAndWaitWithProgress in MessageBroker to prevent race conditions during concurrent stream writes.
Validation
Tracing Comparison:
{
  "Status": {
    "Code": "Error",
    "Description": "UnknownError"
  },
  "Attributes": { ... }
  ...
}
{
  "Status": {
    "Code": "Error",
    "Description": "ext.service.ai.500"
  },
  "Attributes": {
    "error.service.name": "ai",
    "error.service.host": "services.ai.azure.com",
    "error.service.statusCode": "500",
    ...
  }
  ...
}