Fix navigation data race that crashes pages on ARM64 #212
Merged
ViewModel- and page-initiated navigation called NavigateTo directly, running NavigateToInternal on whatever thread the caller was on (often an async continuation on the thread pool). That mutated _currentPage concurrently with the render loop and published a not-yet-bound page, which the render loop could render before OnBound() had run. x86-64's TSO memory model masks the unsafe publication; ARM64's weak memory model exposes it as a NullReferenceException in the page's BuildLayout() (root cause of netclaw-dev/netclaw#1069 — netclaw init crash on Apple Silicon). Route VM/page navigation through the event channel via RequestNavigation so NavigateToInternal — and every read/write of _currentPage — runs only on the render-loop thread. This mirrors the existing NavigationRequested plumbing already used for input-driven navigation. Also add macos-latest (Apple Silicon) to the Test matrix for ARM64 baseline coverage.
Aaronontheweb commented on May 18, 2026
Aaronontheweb (Owner, Author) left a comment:
Design makes sense - make the navigation part of the event model we use for bubbling everything else
netclaw pins macos-26 (Apple Silicon) across its workflows, including the Native Smoke (macOS) leg that surfaced the navigation race. Pin the same image here instead of the floating macos-latest for reproducible ARM64 coverage aligned with netclaw.
The AOT job's osx-arm64 leg still used the floating macos-latest. Pin it to macos-26 like the Test matrix for consistent ARM64 coverage.
Summary
ViewModel- and page-initiated navigation called NavigateTo directly, running NavigateToInternal on whatever thread the caller was on, commonly an async continuation on the thread pool. That mutated _currentPage concurrently with the render loop and published a not-yet-bound page, which the render loop could render (BuildLayout()) before OnBound() had run.
x86-64's TSO memory model masks the unsafe publication, so it works by accident on Linux/Windows. ARM64's weak memory model exposes it as a NullReferenceException in the page's BuildLayout(). This is the root cause of netclaw-dev/netclaw#1069 (netclaw init crashes on Apple Silicon when the wizard hands off to the chat page).
Fix
- Route VM/page navigation through the RequestNavigation helper, which posts a NavigationRequested event. NavigateToInternal, and every read/write of _currentPage / _currentViewModel / _layoutRoot, now runs only on the render-loop thread. This mirrors the NavigationRequested plumbing already used for input-driven navigation, so the race is eliminated structurally (no locks, no volatile, no memory-model reasoning).
- NavigateTo is unchanged and still used for the initial startup navigation (single-threaded, before the render loop exists).
- Add macos-latest (Apple Silicon) to the Test matrix for ARM64 baseline coverage.
Behavior change
VM/page navigation is now processed on the next event-loop iteration rather than
synchronously. Every caller is fire-and-forget, so this is the correct model.
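The routing described under Fix, and the deferral it causes, can be illustrated with a minimal sketch. This is not the library's actual code: the NavigationHost and Page types, the string route, and the DrainEvents method are hypothetical stand-ins, and the use of System.Threading.Channels as the event channel is an assumption. Only the names RequestNavigation, NavigationRequested, NavigateToInternal, OnBound and _currentPage come from the description above.

```csharp
#nullable enable
using System.Threading.Channels;

// Hypothetical stand-ins for illustration; not the library's real types.
public sealed record NavigationRequested(string Route);

public sealed class Page
{
    public Page(string route) => Route = route;
    public string Route { get; }
    public bool IsBound { get; private set; }
    public void OnBound() => IsBound = true;
}

public sealed class NavigationHost
{
    // Assumed event channel: many writers (any thread), one reader (render loop).
    private readonly Channel<NavigationRequested> _events =
        Channel.CreateUnbounded<NavigationRequested>(
            new UnboundedChannelOptions { SingleReader = true });

    // Read and written only by the thread that calls DrainEvents.
    private Page? _currentPage;

    public Page? CurrentPage => _currentPage;

    // Safe from any thread: it only enqueues and never touches _currentPage.
    public void RequestNavigation(string route) =>
        _events.Writer.TryWrite(new NavigationRequested(route));

    // Called once per render-loop iteration, on the render-loop thread.
    // Returns the number of navigation requests handled.
    public int DrainEvents()
    {
        var handled = 0;
        while (_events.Reader.TryRead(out var request))
        {
            NavigateToInternal(request.Route);
            handled++;
        }
        return handled;
    }

    private void NavigateToInternal(string route)
    {
        var page = new Page(route);
        page.OnBound();      // bind first
        _currentPage = page; // then publish, on the same thread that renders
    }
}
```

In this sketch a caller on a thread-pool continuation only enqueues a request, so CurrentPage does not change until the render loop next calls DrainEvents. Binding and publishing then happen on the one thread that also renders, which is why no lock or volatile is needed.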
Tests
NavigationThreadSafetyTests: 2 deterministic tests asserting that VM- and page-initiated navigation is posted to the event channel and not run synchronously on the caller's thread.
Follow-up
#211 — add a navigation concurrency stress test (deterministic tests cannot catch
race regressions; that needs a stress test run on the Apple Silicon leg).
Test plan
- dotnet test: 1004 passed
- macos-latest leg
- Termina in netclaw Directory.Packages.props and confirm the Native Smoke (macOS) init-wizard leg passes