Enhance MSAL cache logging and add tenant-specific re-authentication guidance#7578
Conversation
d9f63e4 to
5ec8884
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds opt-in MSAL cache metadata tracing to help diagnose refresh-token/cache issues (issue #7541), and improves refresh-token-expired error guidance to be tenant-specific (including classifying AADSTS700082 as re-login-required).
Changes:
- Introduces
AZD_DEBUG_MSAL_CACHEtracing that logs MSAL cache metadata snapshots before/after interactive/device-code login and after the first silent token acquisitions. - Enhances AADSTS70043/AADSTS700082 handling to provide tenant-specific re-authentication guidance while preserving existing
ErrorWithSuggestionfields. - Updates documentation and expands unit test coverage for the new tracing and error-handling paths.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/pkg/auth/manager.go | Wires a shared MSAL cache store into both MSAL and the new cache tracer; logs snapshots around login flows; passes tracer into credentials. |
| cli/azd/pkg/auth/final_coverage_test.go | Updates credential construction to include the new tracer parameter. |
| cli/azd/pkg/auth/errors.go | Treats AADSTS700082 as “login expired” alongside AADSTS70043. |
| cli/azd/pkg/auth/claims_test.go | Adds coverage ensuring AADSTS700082 results in “login expired” suggestion text. |
| cli/azd/pkg/auth/cache_windows.go | Refactors MSAL cache creation into newMsalCacheStore for Windows so it can be shared with tracer. |
| cli/azd/pkg/auth/cache_unix.go | Adds newMsalCacheStore on Unix builds for shared cache + tracer usage. |
| cli/azd/pkg/auth/cache_debug.go | Adds the msalCacheTracer implementation and AZD_DEBUG_MSAL_CACHE env-var parsing. |
| cli/azd/pkg/auth/cache_debug_test.go | Adds test to ensure cache tracing hashes token secrets and logs only once per phase. |
| cli/azd/pkg/auth/azd_credential.go | Logs cache snapshots once after first silent token success/failure (tenant and non-tenant variants). |
| cli/azd/pkg/auth/azd_credential_test.go | Adds coverage for logging behavior after failure then success; updates constructor calls. |
| cli/azd/pkg/auth/additional_coverage_test.go | Updates credential construction to include the new tracer parameter. |
| cli/azd/pkg/account/credentials.go | Enhances AADSTS refresh-token-expired errors with tenant-specific ErrorWithSuggestion message/suggestion (preserving fields if already wrapped). |
| cli/azd/pkg/account/credentials_test.go | Adds/extends tests validating tenant-specific messages and wrapper preservation. |
| cli/azd/docs/environment-variables.md | Documents AZD_DEBUG_MSAL_CACHE behavior and what metadata is logged/hashed. |
jongio
left a comment
There was a problem hiding this comment.
Issues to address:
- cache_debug.go:161 -
sortedContractKeysreimplementsslices.Sorted(maps.Keys(m)), already used in 14+ files across the codebase
The shared cache store design, mutex addition to memoryCache, and AADSTS700082 classification are all clean.
|
is there a reason why we could not use the same DEBUG flag we already have in azd? or introcuce a debug-level config (info/error/etc). ?? |
jongio
left a comment
There was a problem hiding this comment.
Solid PR - the nil-safe tracer design, thread-safe memoryCache, and tenant-specific re-auth guidance all look correct. One minor doc nit below.
Yeah the debug flag is meant to be temporary until we can get more information and land a proper fix |
wbreza
left a comment
There was a problem hiding this comment.
Code Review — PR #7578
Enhance MSAL cache logging and add tenant-specific re-authentication guidance by @JeffreyCA
Summary
Clean, well-tested auth debugging infrastructure. The MSAL cache tracer correctly hashes token secrets via SHA-256, uses proper strconv.ParseBool for opt-in enablement, and the RWMutex addition on memoryCache cleanly resolves the prior race condition concern. AADSTS700082 classification as "login expired" is correct. All 5 prior review items resolved. One supplementary finding below.
Prior Review Status
| # | Prior Finding | Author | Status |
|---|---|---|---|
| 1 | Race condition in memoryCache | @copilot-bot | ✅ Resolved (RWMutex) |
| 2 | sortedContractKeys reimplements slices.Sorted(maps.Keys(m)) |
@jongio | ✅ Resolved |
| 3 | Hash comment mismatch (8 vs 12 hex chars) | @jongio | ✅ Resolved |
| 4 | t.Context() / errors.AsType in tests | @copilot-bot + @jongio | ✅ Resolved |
🟡 Medium (1 finding)
- Debug log output should include a PII warning banner — When
AZD_DEBUG_MSAL_CACHE=true, the tracer logsHomeAccountIDandUsernamein plain text. While intended for debugging and documented in environment-variables.md, the log output itself has no warning. Users piping to files or sharing sessions may not realize PII is included. Add a one-time startup warning:"WARNING: MSAL cache tracing enabled — output contains account identifiers. Do not share without redaction."
✅ What Looks Good
- Correct AADSTS700082 classification as "login expired" alongside AADSTS70043
strconv.ParseBoolprevents accidental tracing enablement- SHA-256 hashing on cache keys and token values before logging
- RWMutex on memoryCache cleanly resolves race condition
LogSnapshotOncemutex-protected deduplication is correct- Tenant-specific error guidance preserves wrapped ErrorWithSuggestion fields
- Comprehensive table-driven tests for error classification and tracer lifecycle
- All 5 prior review items fully resolved
| Priority | Count |
|---|---|
| Medium | 1 |
| Total | 1 |
Overall Assessment: Comment — clean implementation with no bugs found. The PII warning finding is defense-in-depth.
Review performed with GitHub Copilot CLI
jongio
left a comment
There was a problem hiding this comment.
All previous inline feedback has been addressed - nice work on the mutex fix, the slices.Sorted(maps.Keys()) cleanup, errors.AsType usage, and the shortDigest comment/code alignment.
The code itself looks solid:
- Nil-safe tracer design with proper secret hashing
- Thread-safe
memoryCachewithRWMutex - Clean deduplication of tenant-specific
--tenant-idguidance in error wrapping - AADSTS700082 correctly classified alongside 70043
However, CI is failing across the board - golangci-lint reports a typecheck error in credentials_test.go (undefined: subscriptionTenantResolverFunc at lines 144 and 168), go-fix wants code modernization, and all platform builds fail. The type IS defined in the file (around line 210+), so this might be a cascading failure from another issue - worth investigating locally with golangci-lint run ./pkg/account/... to see the full error chain.
Once CI is green, this is ready for another look.
…guidance Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c035a89 to
25fa779
Compare
Fixed some conflicts after #7549 got merged. Added a PII warning as suggested. |
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
jongio
left a comment
There was a problem hiding this comment.
All previous findings resolved. Code is clean - RWMutex on memoryCache correctly prevents races, nil-safe tracer design with SHA-256 hashing, tenant-ID deduplication in error wrapping works correctly, AADSTS700082 classification is correct. Tests are thorough. CI is green. No new issues found.
- Add language-hooks.md documenting JavaScript and TypeScript hook executor support (Phase 2 and Phase 3). Includes configuration reference, examples for JS/TS hooks with auto-detection, package.json dependency installation, explicit kind, dir overrides, and platform overrides. Also documents the Prepare→Execute→Cleanup lifecycle and known limitations (PR Azure#7626). - Add AZD_DEBUG_MSAL_CACHE debug variable to environment-variables.md. When enabled with --debug, logs MSAL cache metadata to help diagnose stale refresh token issues (PR Azure#7578). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR adds opt-in MSAL cache tracing to help investigate issue #7541 and improves tenant-specific guidance for expired refresh-token errors.
What changed
AZD_DEBUG_MSAL_CACHEto log MSAL cache metadata before/after login and around the first silent token acquisitionsAADSTS70043/AADSTS700082handling to return tenant-specific re-authentication guidance while preserving wrapped error message metadataAADSTS700082as a re-login-required auth errorWhy
We need better diagnostics for stale or unexpectedly changing MSAL refresh tokens without changing the core login flow, and we want subscription credential failures to point users to the correct tenant-specific remediation.
User-facing instructions
Important
Do not run
azd auth logoutor delete~/.azdfirst. We want to preserve the existing cached auth state.Install PR build from
<TBD>Reproduce login with MSAL cache tracing enabled:
Immediately run the command that fails for you:
Also collect these follow-up silent token traces:
Save the terminal output from those commands, but do not share the raw token value returned by
azd auth token. If needed, redact the"token"field or the final access token string before sending the logs.Notes:
AZD_DEBUG_MSAL_CACHE=trueenables extra MSAL cache metadata tracing.--debugis required so those log lines are printed to the terminal.