The UnknownError result code accounts for 2,352,993 errors per 28 days (6,403 users) — making it the single largest error bucket in azd telemetry. These errors provide zero diagnostic signal because they lack any classification, error code, or service attribution.
Telemetry Evidence (rolling 28 days ending Mar 18, 2026)

| Result Code | Count | Users | % of All Failures |
| --- | --- | --- | --- |
| UnknownError | 2,352,993 | 6,403 | 65.2% |
| auth.login_required | 547,007 | 1,984 | 15.2% |
| internal.errors_errorString | 462,344 | 16,978 | 12.8% |
| All other codes | 249,656 | - | 6.8% |

69.5% of auth token failures are classified as UnknownError. This inflates the unknown bucket and makes telemetry dashboards unreliable for error analysis.
Where It Happens
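UnknownError originates when errors reach MapError() in cli/azd/internal/cmd/errors.go but do not match any of the following:
Typed errors (*azcore.ResponseError, *auth.AuthFailedError)
classifySentinel()
isNetworkError()
Generic fallback (which produces internal.* codes, not UnknownError)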
This means UnknownError is likely set before MapError() is called, possibly by middleware or the command framework itself when errors are not propagated through the telemetry path.
Root Cause Hypotheses
Errors bypassing MapError() — Some error paths may return errors before the telemetry middleware has a chance to classify them, resulting in the default UnknownError status.
MSAL/credential chain errors — Token acquisition failures from the MSAL library may produce composite error types (e.g., ChainedTokenCredentialError) that are not caught by any type assertion in MapError().
HTTP transport errors — Network timeouts, TLS failures, or connection resets that do not match the isNetworkError() patterns.
Expired cached tokens — The credential cache may return errors that are not ReLoginRequiredError or AuthFailedError, falling through without classification.
internal.errors_errorString overlap — The 462K errors_errorString entries suggest widespread use of bare errors.New() without typed sentinels. Some of these may also contribute to UnknownError when they bypass MapError() entirely.
Proposed Investigation
Phase 1: Sampling (understand what is in the bucket)
Add error type sampling — Temporarily log the Go error type chain (errorType(err)) for UnknownError events to surface the actual types hitting this bucket.
Query Kusto for error patterns — Analyze the UnknownError entries for common AzdErrorType values, command paths, and execution environments to identify clusters.
Trace the UnknownError origin — Find all code paths where the span status is set to UnknownError (or where MapError() is not called).
Phase 2: Classification (add handlers for top types)
Add typed handlers — For the top error types found in Phase 1, add corresponding branches in MapError() with proper error codes and ServiceName attributes.
Add test cases — Ensure each new handler has test coverage following the existing Test_MapError and Test_ClassifySuggestionType_MatchesMapError patterns.
Phase 3: Prevention (reduce errors_errorString)
Audit hot paths — Identify the highest-volume errors.New() call sites in auth token, provision, and deploy code paths.
Add typed sentinels — Replace bare errors with typed sentinels or structured error types.
Expand test enforcement — The test suite already has allowedCatchAll enforcement (errors_test.go line 791). Expand this pattern to prevent regressions.
Expected Impact
Telemetry clarity: Moving even 50% of UnknownError into proper categories would transform the error analysis dashboard
Actionable insights: Classified errors can trigger targeted fixes, UX improvements, and agent-specific optimizations
[...] ErrorWithSuggestion), the agent error categorization (classifyError()), and the pre-flight validation work (Feature: Pre-flight auth validation for provision/deploy/up to prevent downstream failures #7234)
Related Issues
[...] unknown to aad)
[...] errorType())