Ability to easily provide custom error messages/suggestions/docs for errors. #6827
Pull request overview
This PR adds a pattern-based error suggestion system that transforms cryptic Azure errors into actionable, user-friendly guidance. The system matches error messages against patterns defined in resources/error_suggestions.yaml and displays helpful suggestions before falling back to AI-based error handling.
Changes:
- Introduces `ErrorSuggestionService`, which matches error messages against 30+ predefined patterns for common Azure errors (quota limits, authentication failures, Bicep errors, networking issues, container problems, and missing tools)
- Adds a pattern matching engine with support for simple substring matching and regex patterns
- Integrates error suggestion logic into `ErrorMiddleware` to wrap errors with user-friendly messages before AI processing
- Enhances `UxMiddleware` to display structured error suggestions with a message, actionable steps, and documentation links
- Includes comprehensive test coverage for pattern matching, service integration, and UX display
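The two matching modes named in the overview can be sketched in Go as follows. This is only an illustration under stated assumptions: `matchPattern` and `matchAny` are hypothetical names, the boolean regex flag and the case-insensitive substring default follow the conventions described later in this conversation, and none of this is the code in `matcher.go`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchPattern reports whether errText matches pattern.
// With useRegex false it is a case-insensitive substring check;
// with useRegex true the pattern is compiled as a Go regular expression.
// (Hypothetical helper, not the PR's matcher.)
func matchPattern(errText, pattern string, useRegex bool) (bool, error) {
	if !useRegex {
		return strings.Contains(strings.ToLower(errText), strings.ToLower(pattern)), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("compiling pattern %q: %w", pattern, err)
	}
	return re.MatchString(errText), nil
}

// matchAny returns true when at least one pattern matches (OR semantics).
// Patterns that fail to compile are skipped.
func matchAny(errText string, patterns []string, useRegex bool) bool {
	for _, p := range patterns {
		if ok, err := matchPattern(errText, p, useRegex); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	msg := "ERROR: QuotaExceeded for resource in eastus"
	fmt.Println(matchAny(msg, []string{"quotaexceeded"}, false))         // true
	fmt.Println(matchAny(msg, []string{`(?i)quota\s*exceeded`}, true))   // true
	fmt.Println(matchAny(msg, []string{"AADSTS"}, false))                // false
}
```

Compiling the regex on every call keeps the sketch short; a real matcher would likely cache compiled expressions.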
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cli/azd/resources/error_suggestions.yaml | Configuration file with 30+ error patterns mapping Azure errors to user-friendly messages and suggestions |
| cli/azd/resources/resources.go | Embeds error_suggestions.yaml into the binary |
| cli/azd/pkg/errorhandler/types.go | Defines ErrorSuggestionRule and MatchedSuggestion data structures |
| cli/azd/pkg/errorhandler/service.go | Service that loads YAML config and matches errors against patterns |
| cli/azd/pkg/errorhandler/matcher.go | Pattern matching engine with substring and regex support |
| cli/azd/pkg/errorhandler/matcher_test.go | Comprehensive tests for pattern matching logic |
| cli/azd/pkg/output/ux/error_with_suggestion.go | UX component for displaying errors with suggestions |
| cli/azd/pkg/output/ux/error_with_suggestion_test.go | Tests for error display formatting |
| cli/azd/cmd/middleware/error.go | Integrates ErrorSuggestionService to wrap errors before AI processing |
| cli/azd/cmd/middleware/error_test.go | Tests for error middleware integration including pattern matching |
| cli/azd/cmd/middleware/ux.go | Enhanced to display ErrorWithSuggestion with proper formatting |
| cli/azd/cmd/container.go | Registers ErrorSuggestionService as a singleton |
| cli/azd/internal/errors.go | Updated ErrorWithSuggestion to include Message and DocUrl fields |
| cli/azd/docs/error-suggestions.md | Comprehensive documentation for adding and maintaining error patterns |
- Add ErrorHandlerPipeline that evaluates YAML rules with three matching strategies: text patterns, error type via reflection, and property dot-path matching
- Add ErrorHandler interface for named IoC-registered handlers that compute dynamic suggestions
- Move ErrorWithSuggestion to pkg/errorhandler for extension visibility
- Add errorType, properties, and handler fields to YAML schema
- Add ARM deployment error rules (soft-delete, quota, SKU, auth)
- Update ErrorMiddleware to use pipeline instead of direct service
- Comprehensive tests for reflection matching and pipeline
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
, Azure#6801 Remove placeholder/generated error rules and replace with real error scenarios backed by telemetry data:
- ARM soft-delete conflicts (PR Azure#6810): FlagMustBeSetForRestore, ConflictError, Conflict/RequestConflict with soft-delete keywords
- ARM root-cause hints (PR Azure#6801): InsufficientQuota, SkuNotAvailable, SubscriptionIsOverQuotaForSku, LocationIsOfferRestricted, AuthorizationFailed, InvalidTemplate, ValidationError, ResourceNotFound
- PowerShell hook failures (PR Azure#6804): ExitError type matching with stderr patterns for module loading, Az module, execution policy, and error action preference issues
- Keep only validated text patterns: AADSTS, BCP codes, QuotaExceeded, azure.yaml parsing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Property values now support the same matching conventions as patterns: case-insensitive substring by default, or regex: prefix for regular expressions. This enables filtering ExitError by Cmd field using regex:(?i)pwsh|powershell to avoid false positives on non-PowerShell commands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 'regex: true' boolean field on rules instead of 'regex:' prefix convention on individual patterns. Cleaner YAML, consistent behavior across patterns and properties.
- Consolidate soft-delete Conflict/RequestConflict keyword rules into single regex rules covering all keywords from PR Azure#6810: soft delete, soft-delete, purge, deleted vault, deleted resource, recover or purge
- Add missing PowerShell hook scenarios from PR Azure#6804: Connect-AzAccount auth expired, login token expired
- Update docs, tests, and matcher to use useRegex flag
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make AzureDeploymentError and DeploymentErrorLine idiomatic Go errors:
- DeploymentErrorLine now implements error interface (Error() string)
- DeploymentErrorLine.Unwrap() []error returns Inner children,
enabling errors.As to traverse the full ARM error tree
- AzureDeploymentError.Unwrap() []error returns both Inner error
and Details tree for complete error chain traversal
- findErrorByTypeName now supports multi-unwrap (Unwrap() []error)
via depth-first stack traversal
This means YAML rules like:
errorType: DeploymentErrorLine
properties: { Code: FlagMustBeSetForRestore }
now match error codes buried 3-4 levels deep in ARM deployment
error trees without any special traversal logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switch YAML rules from 'errorType: AzureDeploymentError' with 'Details.Code' to 'errorType: DeploymentErrorLine' with 'Code'. Since DeploymentErrorLine now implements Unwrap() []error, the pipeline's findErrorByTypeName traverses the full ARM error tree and finds DeploymentErrorLine nodes at any depth. This means error codes like FlagMustBeSetForRestore buried 3-4 levels deep are now matched correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 7 integration tests using real AzureDeploymentError JSON that verify the full pipeline finds DeploymentErrorLine codes at any depth in the ARM error tree:
- FlagMustBeSetForRestore 3 levels deep
- InsufficientQuota under DeploymentFailed
- Conflict code + soft-delete keyword in message
- No match when code differs
- First matching rule wins with multiple codes
- Matching through fmt.Errorf wrapper
- ValidationError 4 levels deep
Also fix findErrorByTypeName to check properties during traversal rather than only on the first type match. This ensures that when multiple DeploymentErrorLine nodes exist in the tree (some with empty Code due to DeploymentFailed stripping), the search continues until it finds one where both type AND properties match.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
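The fix described in this commit, checking properties during the traversal instead of stopping at the first type match, might look roughly like the following depth-first walk. The function name `findByTypeAndProps`, the flat (non dot-path) property lookup, and the simplified `DeploymentErrorLine` are assumptions made for this sketch; the real `findErrorByTypeName` may differ.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// DeploymentErrorLine is a simplified stand-in for the real type.
type DeploymentErrorLine struct {
	Code  string
	Inner []*DeploymentErrorLine
}

func (e *DeploymentErrorLine) Error() string { return "deployment error: " + e.Code }

func (e *DeploymentErrorLine) Unwrap() []error {
	if e == nil {
		return nil
	}
	errs := make([]error, 0, len(e.Inner))
	for _, c := range e.Inner {
		if c != nil {
			errs = append(errs, c)
		}
	}
	return errs
}

// findByTypeAndProps walks the error tree depth-first with an explicit stack and
// returns the first node whose type name AND string properties all match.
func findByTypeAndProps(err error, typeName string, props map[string]string) error {
	stack := []error{err}
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if cur == nil {
			continue
		}
		if typeNameOf(cur) == typeName && propsMatch(cur, props) {
			return cur
		}
		switch u := cur.(type) {
		case interface{ Unwrap() error }:
			stack = append(stack, u.Unwrap())
		case interface{ Unwrap() []error }:
			children := u.Unwrap()
			// Push in reverse so the first child is visited first.
			for i := len(children) - 1; i >= 0; i-- {
				stack = append(stack, children[i])
			}
		}
	}
	return nil
}

func typeNameOf(err error) string {
	t := reflect.TypeOf(err)
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	return t.Name()
}

// propsMatch requires every named string field to contain the wanted value
// (case-insensitive substring). Only top-level fields are handled here.
func propsMatch(err error, props map[string]string) bool {
	v := reflect.ValueOf(err)
	for v.Kind() == reflect.Pointer {
		if v.IsNil() {
			return false
		}
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return len(props) == 0
	}
	for name, want := range props {
		f := v.FieldByName(name)
		if !f.IsValid() || f.Kind() != reflect.String {
			return false
		}
		if !strings.Contains(strings.ToLower(f.String()), strings.ToLower(want)) {
			return false
		}
	}
	return true
}

// sampleTree has a root with an empty Code, so a type-only match would stop too early.
func sampleTree() error {
	root := &DeploymentErrorLine{Inner: []*DeploymentErrorLine{
		{Code: "DeploymentFailed", Inner: []*DeploymentErrorLine{{Code: "FlagMustBeSetForRestore"}}},
	}}
	return fmt.Errorf("deploy: %w", root)
}

func main() {
	props := map[string]string{"Code": "FlagMustBeSetForRestore"}
	fmt.Println(findByTypeAndProps(sampleTree(), "DeploymentErrorLine", props))
	// deployment error: FlagMustBeSetForRestore
}
```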
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add ordering header explaining the specificity principle
- Move bare 'Conflict' code rule AFTER Conflict + keyword rules
- Move broad text patterns (AADSTS, quota) to very bottom
- Remove overly broad 'OperationNotAllowed' text pattern (too generic, could match non-quota errors)
- Group text patterns: specific first, broad/generic last
- Typed error rules (errorType + properties) naturally come first since they are more specific than text-only patterns
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Demonstrates the named ErrorHandler pattern:
- Handler registered in IoC as 'skuNotAvailableHandler'
- YAML rule references it via handler: 'skuNotAvailableHandler'
- Handler dynamically includes current AZURE_LOCATION in suggestion and provides az CLI command to list available SKUs
- Falls back to generic guidance when no location is set
- Unit tests for both with/without AZURE_LOCATION scenarios
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Update examples to use DeploymentErrorLine instead of AzureDeploymentError
- Update property paths from Details.Code to Code
- Add real SkuNotAvailableHandler example with code
- Add specificity ordering best practice
- Add sku_handler.go to file layout table
- Note multi-unwrap traversal in architecture section
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Query ARM Providers API for available regions per resource type
- Extract resource type from error message via regex
- Move ARM SDK implementation to pkg/azapi/resource_type_locations.go
- Use ResourceTypeLocationResolver interface to avoid import cycles
- Both SkuNotAvailable and LocationIsOfferRestricted use new handler
- Add comprehensive tests for resource type extraction and suggestions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dback
- Remove cmd/middleware/soft_delete_hint.go and tests (superseded by YAML rules)
- Fix concurrency: add sync.RWMutex to PatternMatcher regex cache
- Fix misleading test comment about LLM feature disabled vs no-prompt mode
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
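The concurrency fix mentioned above (guarding the regex cache with a `sync.RWMutex`) follows a common Go pattern, sketched here. The `regexCache` type and its method names are hypothetical; only the use of `sync.RWMutex` around a compiled-regex cache comes from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache stores compiled patterns and is safe for concurrent use.
type regexCache struct {
	mu    sync.RWMutex
	cache map[string]*regexp.Regexp
}

func newRegexCache() *regexCache {
	return &regexCache{cache: make(map[string]*regexp.Regexp)}
}

// get returns the compiled regex for pattern, compiling it at most once per entry.
func (c *regexCache) get(pattern string) (*regexp.Regexp, error) {
	// Fast path: many readers may hold the read lock at once.
	c.mu.RLock()
	re, ok := c.cache[pattern]
	c.mu.RUnlock()
	if ok {
		return re, nil
	}

	// Compile outside the lock so slow patterns do not block readers.
	compiled, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Another goroutine may have stored the entry in the meantime; keep that one.
	if existing, ok := c.cache[pattern]; ok {
		return existing, nil
	}
	c.cache[pattern] = compiled
	return compiled, nil
}

func main() {
	c := newRegexCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			re, err := c.get(`(?i)pwsh|powershell`)
			if err != nil {
				panic(err)
			}
			_ = re.MatchString("pwsh -File hook.ps1")
		}()
	}
	wg.Wait()
	fmt.Println(len(c.cache)) // 1
}
```

Without the lock, concurrent writes to the map would be a data race, which the Go runtime can report as a fatal error.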
0fc6ce9 to d5b8736
- Add YAML rule matching DeploymentErrorLine with Code LocationNotAvailableForResourceType using resourceNotAvailableHandler
- Add integration test with real ARM validation error JSON
- Add unit test with mock resolver for the full handler flow
- Covers the exact error returned when deploying Static Web Apps to an unsupported region (e.g. eastus)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When BeginValidateAtSubscriptionScope fails immediately (before polling), the error is an azcore.ResponseError wrapped in fmt.Errorf — not an AzureDeploymentError. Add a rule matching ResponseError.ErrorCode so the handler fires for both code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
extractResourceType now first looks for 'resource type Microsoft.X/Y' in the error message before falling back to bare matches. This prevents matching Microsoft.Resources/deployments from the ARM URL instead of the actual resource type like Microsoft.Web/staticSites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
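The two-step lookup described in this commit can be sketched as below. The regular expressions and the sample message are assumptions chosen for illustration, and the sketch only handles two-segment types such as `Microsoft.Web/staticSites`; the real `extractResourceType` may use different expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Preferred: the type named right after the words "resource type".
	explicitTypeRe = regexp.MustCompile(`(?i)resource type\s+'?(Microsoft\.[A-Za-z0-9]+/[A-Za-z0-9]+)`)
	// Fallback: the first bare Microsoft.X/Y token anywhere in the message.
	bareTypeRe = regexp.MustCompile(`Microsoft\.[A-Za-z0-9]+/[A-Za-z0-9]+`)
)

// extractResourceType returns the resource type mentioned in msg,
// or an empty string when none is found.
func extractResourceType(msg string) string {
	if m := explicitTypeRe.FindStringSubmatch(msg); m != nil {
		return m[1]
	}
	return bareTypeRe.FindString(msg)
}

func main() {
	// Illustrative message: the ARM URL comes first and mentions Microsoft.Resources/deployments.
	msg := "POST https://management.azure.com/subscriptions/xxx/providers/Microsoft.Resources/deployments/demo/validate: " +
		"The provided location 'eastus' is not available for resource type 'Microsoft.Web/staticSites'."
	fmt.Println(extractResourceType(msg)) // Microsoft.Web/staticSites

	// With no "resource type" phrase, the bare match is used.
	fmt.Println(extractResourceType("quota exceeded for Microsoft.Compute/virtualMachines")) // Microsoft.Compute/virtualMachines
}
```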
### Packages impacted by this PR

- `@azure/identity`

### Issues associated with this PR

Fixes Azure/azure-dev#7857 (parent: Azure/azure-dev#7728)

### Describe the problem that is addressed by this PR

Starting with [azd v1.23.7](https://github.com/Azure/azure-dev/releases/tag/azure-dev-cli_1.23.7) (PR [Azure/azure-dev#6827](Azure/azure-dev#6827)), `azd auth token` changed its stderr error format from the legacy `consoleMessage` JSON to a structured `{"error":"..."}` JSON object. The stderr output may also include an extraneous empty `consoleMessage` line preceding the error (fixed in v1.24.0 via [Azure/azure-dev#7701](Azure/azure-dev#7701)).

This PR updates `AzureDeveloperCliCredential` error parsing to handle all three formats:

| azd version | stderr format |
|---|---|
| pre-v1.23.7 | `{"type":"consoleMessage","data":{"message":"..."}}` |
| v1.23.7 – v1.23.15 | `{"type":"consoleMessage",...}\n{"error":"..."}` (two lines) |
| v1.24.0+ | `{"error":"..."}` (single line) |

`AzureDeveloperCliCredential.parseAzdStderr` previously only handled the legacy single-line `consoleMessage` shape. On azd v1.23.7+ it would fail to extract the message and surface the raw JSON blob in the credential's error message instead of the underlying AAD error.

The parser splits stderr by newline, prefers the structured `error` field, and falls back to the first non-empty `data.message` from a legacy `consoleMessage` line. If neither is found, the raw text is returned unchanged, preserving existing behaviour for plain-text and malformed output.

### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

Considered approaches:

1. **Single-pass split-by-newline parser (chosen).** Splits stderr on `\n`, attempts `JSON.parse` per line, prefers the structured `error` field, and falls back to the first non-empty `data.message`. Handles all three formats in one pass with minimal allocations. Returns the raw stderr unchanged when neither field is present, preserving existing behaviour for plain-text/malformed output.
2. **Try the new format only and drop legacy support.** Rejected: `@azure/identity` supports users on older azd installs; silently regressing them is unacceptable.

The chosen design also matches the equivalent fixes in [`azure-sdk-for-go`](Azure/azure-sdk-for-go#26590) and [`azure-sdk-for-net`](Azure/azure-sdk-for-net#58616), keeping behaviour consistent across the three SDKs.

### Are there test cases added in this PR? _(If not, why?)_

Yes, added unit tests in `sdk/identity/identity/test/internal/node/azureDeveloperCliCredential.spec.ts`.

Also manually validated with a small Node sample app:

```js
// Run as an ES module so that top-level await is available.
import { AzureDeveloperCliCredential } from "@azure/identity";

const credential = new AzureDeveloperCliCredential({
  // Well-formed UUID so client-side validation passes; AAD will reject it.
  tenantId: "00000000-0000-0000-0000-000000000001",
});
console.log("Requesting token for scope: https://management.azure.com/.default");
try {
  const token = await credential.getToken("https://management.azure.com/.default");
  console.log("Unexpected success:", token);
} catch (err) {
  console.log("---");
  console.log("Caught:", err.name);
  console.log(err.message);
}
```

#### Without changes (azd v1.23.7+)

```
Caught: CredentialUnavailableError
{"type":"consoleMessage","timestamp":"2026-05-04T16:48:58.6753928-07:00","data":{"message":"\n"}}
{"error":"fetching token: failed to authenticate:\n(invalid_tenant) AADSTS90002: Tenant '00000000-0000-0000-0000-000000000001' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant. Trace ID: 7397ba12-327a-4555-8fa2-9accb81d9200 Correlation ID: c050f408-3a84-4359-b403-07f072c6e1b0 Timestamp: 2026-05-04 23:48:58Z\n","links":[{"title":"azd auth login reference","url":"https://learn.microsoft.com/azure/developer/azure-developer-cli/reference#azd-auth-login"}],"message":"Authentication with Azure failed.","suggestion":"Run 'azd auth login' to sign in again."}
```

#### With changes

```
Caught: CredentialUnavailableError
(invalid_tenant) AADSTS90002: Tenant '00000000-0000-0000-0000-000000000001' not found. Check to make sure you have the correct tenant ID and are signing into the correct cloud. Check with your subscription administrator, this may happen if there are no active subscriptions for the tenant. Trace ID: f8110e9d-7fe8-4367-9731-5c0c512daa00 Correlation ID: 65c635ef-5963-4d83-9cac-bd99197f1d0d Timestamp: 2026-05-04 23:47:44Z
```

### Provide a list of related PRs _(if any)_

- [Azure/azure-sdk-for-go#26590](Azure/azure-sdk-for-go#26590): equivalent Go fix
- [Azure/azure-sdk-for-net#58616](Azure/azure-sdk-for-net#58616): equivalent .NET fix

### Command used to generate this PR:**_(Applicable only to SDK release request PRs)_

N/A

### Checklists

- [x] Added impacted package name to the issue description
- [x] Does this PR need any fixes in the SDK Generator?** _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_ N/A, hand-written client code, no codegen involved.
- [x] Added a changelog (if necessary)
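The single-pass, split-by-newline design chosen in the linked PR can be illustrated in Go, the language of the rest of this conversation. This is a sketch of the described algorithm only: the `stderrLine` struct and the Go `parseAzdStderr` function are hypothetical and are not the TypeScript implementation or the `azure-sdk-for-go` fix, and any extra trimming of the extracted message (such as dropping a `fetching token:` prefix) is not reproduced.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stderrLine covers both shapes: {"error":"..."} and
// {"type":"consoleMessage","data":{"message":"..."}}.
type stderrLine struct {
	Error string `json:"error"`
	Data  struct {
		Message string `json:"message"`
	} `json:"data"`
}

// parseAzdStderr prefers a structured "error" field, falls back to the first
// non-blank legacy data.message, and otherwise returns stderr unchanged.
func parseAzdStderr(stderr string) string {
	var legacy string
	for _, line := range strings.Split(stderr, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var parsed stderrLine
		if err := json.Unmarshal([]byte(line), &parsed); err != nil {
			continue // not JSON, or not an object of the expected shape
		}
		if parsed.Error != "" {
			return parsed.Error
		}
		if legacy == "" && strings.TrimSpace(parsed.Data.Message) != "" {
			legacy = parsed.Data.Message
		}
	}
	if legacy != "" {
		return legacy
	}
	return stderr
}

func main() {
	legacy := `{"type":"consoleMessage","data":{"message":"run azd auth login"}}`
	twoLine := `{"type":"consoleMessage","data":{"message":"\n"}}` + "\n" + `{"error":"tenant not found"}`

	fmt.Println(parseAzdStderr(legacy))       // run azd auth login
	fmt.Println(parseAzdStderr(twoLine))      // tenant not found
	fmt.Println(parseAzdStderr("plain text")) // plain text
}
```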
Summary
Adds a YAML-driven error handling pipeline that matches raw Azure errors against well-known patterns and wraps them with user-friendly messages, actionable suggestions, and reference links. The goal: anyone can improve the error experience by editing a single YAML file.
For errors that need runtime context (like querying Azure for available regions), the pipeline supports named handlers registered in the IoC container that compute suggestions dynamically while still pulling static data (links, etc.) from the YAML rule that matched.
This PR also removes `soft_delete_hint.go` (from #6810) since those scenarios are now expressed declaratively as YAML rules.
How It Works
Error middleware flow:
YAML Rule Format
Rules live in `resources/error_suggestions.yaml` (embedded at build time). Each rule can use text patterns, typed error matching, or both.
Matching logic:
- `patterns` - OR (any pattern matches); case-insensitive substring by default, regex when `regex: true`
- `properties` - AND (all must match); resolved via reflection on the matched error type
Response fields:
- `message` - user-friendly explanation
- `suggestion` - actionable next steps
- `links` - list of `{url, title?}` rendered as hyperlinked bullets
- `handler` - name of a registered ErrorHandler for dynamic suggestions (receives the matching rule)
Custom Handlers
When a rule specifies `handler`, the pipeline resolves the named ErrorHandler from the IoC container and invokes it with the error and the matching rule. The handler can use rule data (e.g., links) or ignore it.
Built-in: ResourceNotAvailableHandler handles LocationNotAvailableForResourceType:
Example
ARM Error Refactoring
DeploymentErrorLine now implements the Go error interface and Unwrap() []error, enabling:
Covered Scenarios
Files Changed
Related PRs
Subsumes error handling from: