Make MountHelper.Unmount resilient to expected cleanup errors in mount volume tests#125625
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-io |
|
@copilot can you double-check that the changes will fix the failed scenarios. Here's the console logs for both issues: |
The Helix log SAS tokens in your comment have expired (1-hour window) so I can't fetch them, but here's my code-level analysis: Log 1 — The 4 Log 2 — Scenarios 3.3–3.5 used My fix removes the Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
/azp run runtime-nativeaot-outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot the allowlist was updated, can you try #125625 (comment) again. |
After fetching the first log, I found the root cause: the When Fix (commit Changed 11
|
|
/azp run runtime-nativeaot-outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Done in commit
All call sites updated to use |
…irectory path Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/0396228f-d7e0-4954-8a31-983a8744e5b5 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
adamsitnik
left a comment
There was a problem hiding this comment.
LGTM assuming the tests are going to pass
|
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
ReparsePointUtilities.cs is some Windows pinvokes and some cross platform code in a single file. This is why we have raw numbers here and can't use constants out of Interop.Errors.cs. That should be fixed at some point. Meanwhile I have added error 2 which is morally equivalent to error 3 in this context. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Copilot Code Review — PR #125625Note This review was generated by Copilot and validated across multiple models (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.2). Holistic AssessmentMotivation: The PR addresses test flakiness in mount volume tests (Fixes #125624) where Approach: Centralizing the "unmount + delete directory" pattern into Summary: Detailed Findings
|
|
@copilot is there an issue that ought to have been closed by this fix. |
…126660) > [!NOTE] > This PR was created with Copilot assistance. ## Fix deterministic MountVolume test failures on ARM64 Helix machines Fixes #125295, fixes #125624, fixes #126627 ### Problem `Directory_Delete_MountVolume.RunTest` and `Directory_ReparsePoints_MountVolume.runTest` fail deterministically (~100% of the time, ~750ms duration) on the `Windows.11.Arm64.Open` Helix machine pool. This is **not timing-related** and was not addressed by the delay/polling fixes in #125914 or the Unmount resilience fix in #125625 (those PRs fixed real timing issues -- pre-fix failures on other configurations have since expired from AzDO retention, so we can't verify directly, but there is no evidence they were ineffective for their intended purpose). **Root cause**: The ARM64 Helix machines have an E:\ drive (likely an Azure resource/temp disk) that passes all `DriveInfo` checks -- `DriveType=Fixed`, `DriveFormat=NTFS`, `IsReady=True` -- but `GetVolumeNameForVolumeMountPoint` fails with `ERROR_INVALID_PARAMETER` (87). The drive has no volume GUID and doesn't support volume mount point operations. `IOServices.GetNtfsDriveOtherThanCurrent()` returns this drive, and the test crashes trying to use it. Some ARM64 Helix machines have only C:\ and a CD-ROM (no second drive at all). On those machines, the cross-drive scenarios already skip gracefully and only same-drive scenarios 3.x run. ### Evidence Analyzed Helix console logs from 5 post-fix builds (all `arm64-NativeAOT-Win11`, same C:\ volume GUID). Every failure shows the identical pattern: - Scenario 1: `GetVolumeNameForVolumeMountPoint("E:\")` -> error 87 - Scenario 2: `SetVolumeMountPoint` onto E:\ succeeds but path traversal through the mount point fails with `DirectoryNotFoundException` - Scenarios 3.x (same-drive): Always pass Reproduced locally by removing the real E: drive letter and creating `SUBST E:` which exhibits identical error 87 behavior. ### Changes 1. **`IOServices.GetNtfsDriveOtherThan()`**: After the existing Fixed/Ready/NTFS checks, also verify the drive has a volume GUID via `GetVolumeNameForVolumeMountPoint`. Drives without one (SUBST drives, Azure resource disks) are skipped. 2. **`DumpDriveInformation` diagnostic test**: New Helix-only test (following the `DescriptionNameTests.DumpRuntimeInformationToConsole` pattern) that dumps all drives with their volume GUIDs to the console log. Makes future drive-related CI issues immediately diagnosable from the same Helix work item log. 3. **`GetVolumeNameForVolumeMountPoint` P/Invoke in DllImports.cs**: Uses `char[]` (not `StringBuilder`) because this file uses `LibraryImport` which does not support `StringBuilder`. ### Local validation | Scenario | Before fix | After fix | |---|---|---| | SUBST E: (no volume GUID) | Error 87 / DirectoryNotFoundException | Pass (SUBST filtered, scenarios 3.x run) | | Real NTFS E: | Pass (all scenarios) | Pass (all scenarios) | | Single-drive machine | Scenarios 1/2 skip, 3.x pass | Same -- no change | --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The
finallyblocks inReparsePoints_MountVolume.runTest()unconditionally calledMountHelper.Unmount(mountedDirName), which throws Win32 error 4390 ("not a reparse point") or error 3 ("path not found") when the directory was never successfully mounted or is on a drive that is no longer accessible. The exception propagated to the scenario'scatchblock, sets_pass = false, and failed the test — even though the actual test logic succeeded.Description
ReparsePointUtilities.csMountHelper.Unmountnow accepts an optionalbool deleteDirectory = falseparameter and silently ignores the two expected cleanup-time Win32 errors instead of throwing:ERROR_NOT_A_REPARSE_POINT): directory exists but was never successfully made a mount point, or the mount binding was already removed.ERROR_PATH_NOT_FOUND): the mount point path is on a drive that is no longer accessible (e.g., the other NTFS drive used in tests was removed).When
deleteDirectory: trueis passed, the method also deletes the directory after unmounting. Directory removal errors are not suppressed and propagate normally.This consolidates the repetitive cleanup pattern into the shared helper so callers don't need to repeat
if (Directory.Exists) { try { Unmount } catch { } DeleteDir }everywhere.ReparsePoints_MountVolume.csAll 4
finallycleanup blocks simplified from:to:
Delete_MountVolume.csThe same simplification applied to scenarios 1, 2, 3.1, and 3.2
finallycleanup blocks (which previously usedif (Directory.Exists) { Unmount; DeleteDir }). Scenarios 3.3–3.5 are unchanged as their cleanup is intertwined withEvalassertions andmountedDirNamemay be null if inner setup did not complete.Delete.Windows.csThe
Delete_VolumeMountPointcleanup block simplified in the same way.