[BUG] authentication context canceled #7170

@hemarina

Bug Report

Follow-up to #6883 (review)

Description

When running the error handling workflow with the new Copilot SDK changes, the retry of the provision step fails with an authentication context cancellation error. The authentication request to the Microsoft identity platform is canceled before it can complete.

Error Message

ERROR: error executing step command 'provision': failed to authenticate: server response error:
Get "https://login.microsoftonline.com/common/discovery/instance?api-version=1.1&authorization_endpoint=https%3A%2F%2Flogin.microsoftonline.com%2Forganizations%2Foauth2%2Fv2.0%2Fauthorize": context canceled

Steps to Reproduce

  1. Enable the LLM alpha feature: azd config set alpha.llm on
  2. Run azd up (or a command that triggers provisioning)
  3. Trigger the error handling workflow (e.g., a failing provision step that engages the Copilot agent)
  4. After the Copilot agent generates troubleshooting guidance and the user opts to retry, the retried provisioning command fails with the authentication context canceled error

Expected Behavior

The retry loop in the error handling middleware should properly re-establish authentication context. The authentication HTTP request to login.microsoftonline.com should complete successfully without context cancellation.

Actual Behavior

The authentication request is canceled mid-flight, suggesting the context passed to the authentication layer during the retry is already canceled or gets canceled prematurely.

Analysis

The error originates from the MSAL HTTP client making a discovery request to login.microsoftonline.com. The context canceled message indicates that the request's context was canceled at the HTTP layer; it was not caused by a user-initiated Ctrl+C.

Potential root causes:

  1. Context lifecycle in retry loop — In cmd/middleware/error.go, when the error middleware retries the original command (line ~287: actionResult, err = next(ctx)), the context may inherit cancellation from the previous failed attempt or the agent session teardown.
  2. Agent session context leakage — The azdAgent.Stop() is deferred, but the Copilot agent session may share or affect the parent context used for the retry. If the agent's internal context propagates cancellation upstream, the retried command's auth flow would see a canceled context.
  3. shouldSkipErrorAnalysis classification — The error contains the string context canceled but is wrapped as an auth/HTTP error, not as context.Canceled. This means errors.Is(err, context.Canceled) in shouldSkipErrorAnalysis() may not match, causing the middleware to attempt AI analysis on a fundamentally broken context, compounding the issue.
  4. classifyError handling — If the error is not wrapped as *auth.AuthFailedError or *auth.ReLoginRequiredError, it may be classified as AzureContextAndOtherError instead of UserContextError, leading to an inappropriate automated fix attempt.

Environment

  • OS: Windows
  • Feature flags: alpha.llm on
  • Command: azd up (provision step)

Relevant Code

  • cli/azd/cmd/middleware/error.go — Error handling middleware and retry loop
  • cli/azd/pkg/auth/ — Authentication flow and token acquisition
  • cli/azd/internal/agent/copilot_agent.go — Copilot agent session management

Metadata

  • Assignees: No one assigned
  • Labels: bug