gpui_wgpu: Add surface lifecycle methods for mobile platforms #50815
    // Prefer Mailbox (triple-buffering) to avoid blocking in
    // get_current_texture() during mobile lifecycle transitions
    // (e.g. Android rotation, background/foreground). Fifo blocks
    // on VSync and can deadlock if the compositor is frozen.
    present_mode: if surface_caps
        .present_modes
        .contains(&wgpu::PresentMode::Mailbox)
    {
        wgpu::PresentMode::Mailbox
    } else if surface_caps
        .present_modes
        .contains(&wgpu::PresentMode::AutoNoVsync)
    {
        wgpu::PresentMode::AutoNoVsync
    } else {
        wgpu::PresentMode::Fifo
    },
I think we should let the caller decide on the preferred modes, as this can have performance and energy consumption implications, and I do not think we want to use Mailbox by default.
@Veykril Good point, updated to let the caller decide. Added `preferred_present_mode: Option<wgpu::PresentMode>` to `WgpuSurfaceConfig`. When `None` (all existing callers), it defaults to `Fifo`, so there is no behavior change for desktop. Mobile platforms can pass `Some(PresentMode::Mailbox)` to opt into triple-buffering.
Also rebased on latest main and adapted to the WgpuResources refactor + device_lost field.
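For readers following the thread, the caller-driven selection described above can be sketched roughly as below. This is a minimal illustration, not the code in the PR: `PresentMode` here is a local stand-in for `wgpu::PresentMode` (so the snippet compiles without the `wgpu` crate), and `resolve_present_mode` is a hypothetical helper name. It assumes the behavior stated in the reply and later commit messages: an unset or unsupported preference falls back to `Fifo`.

```rust
/// Local stand-in for `wgpu::PresentMode`, limited to the variants
/// mentioned in this thread (assumption: the real code uses the wgpu enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PresentMode {
    Fifo,
    Mailbox,
    AutoNoVsync,
}

/// Hypothetical helper: use the caller's preferred mode only when the
/// surface reports support for it; otherwise fall back to `Fifo`.
pub fn resolve_present_mode(
    preferred: Option<PresentMode>,
    supported: &[PresentMode],
) -> PresentMode {
    preferred
        .filter(|mode| supported.contains(mode))
        .unwrap_or(PresentMode::Fifo)
}
```

Under these assumptions, existing desktop callers that pass `None` keep `Fifo`, while a mobile caller passing `Some(PresentMode::Mailbox)` gets `Mailbox` only on surfaces that advertise it.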
Add `unconfigure_surface()` and `replace_surface()` to `WgpuRenderer` to support Android's native window lifecycle where the surface can be destroyed and recreated (e.g. background/foreground transitions, orientation changes) without losing the GPU device, atlas, or pipelines.

Without these methods, Android apps using gpui_wgpu must destroy the entire renderer when the native window is terminated, losing all cached AtlasTextureIds. When GPUI's scene cache references those stale IDs on resume, it causes index-out-of-bounds panics in the atlas.

Changes:
- `unconfigure_surface()`: Marks the surface as unconfigured so `draw()` skips rendering, but keeps the renderer alive.
- `replace_surface()`: Creates a new wgpu surface from fresh window handles and reconfigures it, preserving all GPU state.
- Prefer PresentMode::Mailbox (triple-buffering) over Fifo to avoid blocking in get_current_texture() during lifecycle transitions.
- Early return in draw() when surface is unconfigured to prevent blocking on some drivers (e.g. Adreno).
Address review feedback: instead of hardcoding Mailbox as the default present mode, add `preferred_present_mode` to `WgpuSurfaceConfig` so the caller can choose. Defaults to Fifo (VSync) when None, preserving existing desktop behavior. Mobile platforms can pass Mailbox to avoid blocking during lifecycle transitions. Also add the missing `surface_configured` field to the WgpuRenderer struct.
Head branch was pushed to by a user without write access
@Veykril Rebased on latest main, fixed cargo fmt. Formatting verified on all crates for the failing tests. Ready for another look when you get a chance!

@Veykril can you please take another look and merge this?
…dustries#50815) ## Summary - Add `unconfigure_surface()` and `replace_surface()` methods to `WgpuRenderer` for mobile platform window lifecycle management - Prefer `PresentMode::Mailbox` (triple-buffering) over `Fifo` to avoid blocking during lifecycle transitions - Early return in `draw()` when surface is unconfigured to prevent driver hangs ## Motivation On Android, the native window (`ANativeWindow`) is destroyed when the app goes to the background and recreated when it returns to the foreground. The same happens during orientation changes. Without surface lifecycle methods, the only option is to destroy the entire `WgpuRenderer` and create a new one on resume. The problem: GPUI's scene cache holds `AtlasTextureId` references from the old renderer's atlas. A new renderer has an empty atlas, so those cached IDs cause index-out-of-bounds panics. The fix: Keep the renderer (device, queue, atlas, pipelines) alive across surface destruction. Only the wgpu `Surface` needs to be replaced. ### `unconfigure_surface()` Marks the surface as unconfigured so `draw()` skips rendering via the existing `surface_configured` guard. Drops intermediate textures that reference the old surface dimensions. The renderer stays fully alive. ### `replace_surface()` Creates a new `wgpu::Surface` from fresh window handles using the **same** `wgpu::Instance` that created the original adapter/device. Reconfigures the surface and marks it as configured so rendering resumes. All cached atlas textures remain valid. ### PresentMode::Mailbox `Fifo` (VSync) blocks in `get_current_texture()` and can deadlock if the compositor is frozen during a lifecycle transition (e.g. `TerminateWindow` → `InitWindow` on Android). Mailbox (triple-buffering) avoids this. Falls back to `AutoNoVsync` → `Fifo` if unsupported. ### draw() early return Some drivers (notably Adreno) block indefinitely when acquiring a texture from an unconfigured surface. The early return prevents this. 
## Context This is needed by [gpui-mobile](https://github.com/itsbalamurali/gpui-mobile), a project bringing GPUI to Android and iOS. The Android implementation needs these methods to handle: 1. **Background/foreground transitions** — `TerminateWindow` destroys the native window, `InitWindow` recreates it 2. **Orientation changes** — Surface is destroyed and recreated with new dimensions 3. **Split-screen transitions** — Similar surface recreation Without this change, we maintain a local fork of `gpui_wgpu` with just these additions. Upstreaming them would let mobile platform implementations use the official crate directly. ## Test plan - [x] Tested on Android (Motorola, Adreno 720 GPU) — 3 consecutive background/foreground cycles, zero panics, atlas textures preserved - [x] Tested orientation changes (portrait→landscape→portrait) — surface replacement completed in <40ms per rotation - [x] Verified `draw()` correctly skips rendering when surface is unconfigured - [x] Verified no regression on desktop — methods are additive, existing code paths unchanged - [x] PresentMode fallback chain works on devices that don't support Mailbox ## Release Notes - N/A
The WgpuRenderer defaults to VK_PRESENT_MODE_FIFO_KHR (vsync), which blocks vkQueuePresentKHR until the compositor releases a buffer via wl_surface.frame. On some Wayland compositor+driver combinations (notably NVIDIA proprietary + Hyprland, but also observed on KDE/GNOME + AMD RADV), these frame callbacks can be delayed or lost, stalling the entire calloop event loop for tens of seconds. This manifests as multi-second UI freezes, all background tasks piling up, and eventual "Broken pipe" crashes. VK_PRESENT_MODE_MAILBOX_KHR does not block on vblank — it replaces the pending frame in a single-entry queue. This avoids the stall entirely. The renderer already falls back to Fifo automatically if Mailbox is unsupported by the driver. The WgpuSurfaceConfig has had a preferred_present_mode field since zed-industries#50815 (added for Android lifecycle transitions with the same rationale). This commit sets it to Mailbox in the Wayland window creation path only. X11 is not affected — X11 does not use wl_surface.frame for presentation pacing. Refs: zed-industries#50229, zed-industries#55345, zed-industries#38497, zed-industries#52009, zed-industries#50195, zed-industries#50283, zed-industries#50734, zed-industries#52403, zed-industries#50574, zed-industries#52403, zed-industries#49961, zed-industries#47750, zed-industries#46203, zed-industries#42164, zed-industries#39097, zed-industries#39156, zed-industries#39234, zed-industries#35948, zed-industries#32618
The WgpuRenderer defaults to VK_PRESENT_MODE_FIFO_KHR (vsync), which blocks vkQueuePresentKHR until the compositor releases a buffer via wl_surface.frame. On some Wayland compositor+driver combinations (notably NVIDIA proprietary + Hyprland, but also observed on KDE/GNOME + AMD RADV), these frame callbacks can be delayed or lost, stalling the entire calloop event loop for tens of seconds. VK_PRESENT_MODE_MAILBOX_KHR does not block on vblank — it replaces the pending frame in a single-entry queue. This avoids the stall entirely. The renderer already falls back to Fifo automatically if Mailbox is unsupported by the driver. The WgpuSurfaceConfig has had a preferred_present_mode field since zed-industries#50815 (added for Android lifecycle transitions with the same rationale). This commit sets it to Mailbox in the Wayland window creation path only. X11 is not affected. Closes: zed-industries#50229 Closes: zed-industries#55345 Closes: zed-industries#39097 Closes: zed-industries#50734 Refs: zed-industries#38497, zed-industries#52009, zed-industries#52403, zed-industries#50574, zed-industries#49961, zed-industries#47750, zed-industries#46203, zed-industries#50195, zed-industries#50283, zed-industries#42164, zed-industries#39156, zed-industries#39234, zed-industries#35948, zed-industries#32618 Release Notes: - Fixed multi-second UI freezes on Linux Wayland when using certain GPU/driver combinations (NVIDIA + Hyprland, AMD RADV + Mutter, and similar). Blocks in VK_PRESENT_MODE_FIFO_KHR on delayed wl_surface.frame callbacks caused the entire event loop to stall.
…d-industries#57077) The WgpuRenderer defaults to VK_PRESENT_MODE_FIFO_KHR (vsync), which blocks vkQueuePresentKHR until the compositor releases a buffer via wl_surface.frame. On some Wayland compositor+driver combinations (notably NVIDIA proprietary + Hyprland, but also observed on KDE/GNOME + AMD RADV), these frame callbacks can be delayed or lost, stalling the entire calloop event loop for tens of seconds. VK_PRESENT_MODE_MAILBOX_KHR does not block on vblank: it replaces the pending frame in a single-entry queue. This avoids the stall entirely. The renderer already falls back to Fifo automatically if Mailbox is unsupported by the driver. The WgpuSurfaceConfig has had a preferred_present_mode field since zed-industries#50815 (added for Android lifecycle transitions with the same rationale). This commit sets it to Mailbox in the Wayland window creation path only. X11 is not affected. Self-Review Checklist: - [x] I've reviewed my own diff for quality, security, and reliability - [x] Unsafe blocks (if any) have justifying comments - [x] The content is consistent with the [UI/UX checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist) - [x] Tests cover the new/changed behavior - [x] Performance impact has been considered and is acceptable Note on tests: This change is in the Wayland platform's window creation path (WaylandWindowState::new). The surface configuration is delegated to WgpuRenderer which already has test coverage for preferred_present_mode fallback logic. A full integration test would require a running Wayland compositor in CI. Verified manually and tested against the renderer's unwrap_or(Fifo) safety net by inspecting surface_caps.present_modes on both NVIDIA proprietary and Mesa RADV drivers. 
Closes: zed-industries#50229 Closes: zed-industries#55345 Closes: zed-industries#39097 Closes: zed-industries#50734 Refs: zed-industries#38497, zed-industries#52009, zed-industries#52403, zed-industries#50574, zed-industries#49961, zed-industries#47750, zed-industries#46203, zed-industries#50195, zed-industries#50283, zed-industries#42164, zed-industries#39156, zed-industries#39234, zed-industries#35948, zed-industries#32618 Release Notes: - Fixed UI freezes on Linux (Wayland) when on certain GPU/driver combinations --------- Co-authored-by: Neel <neel@zed.dev>
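The difference between the two present modes described in this commit message can be illustrated with a toy model. This is only a sketch of the queueing behavior, not Vulkan or wgpu code: `FifoQueue` and `MailboxSlot` are hypothetical types, frames are plain integers, and "blocking" is modeled as `try_present` returning `false`.

```rust
use std::collections::VecDeque;

/// Toy FIFO present queue with a fixed number of slots.
pub struct FifoQueue {
    frames: VecDeque<u32>,
    capacity: usize,
}

impl FifoQueue {
    pub fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::new(), capacity }
    }

    /// Returns `false` when every slot is occupied. A real FIFO swapchain
    /// would block the caller here until the compositor consumes a frame.
    pub fn try_present(&mut self, frame: u32) -> bool {
        if self.frames.len() >= self.capacity {
            return false;
        }
        self.frames.push_back(frame);
        true
    }

    /// The compositor takes the oldest queued frame.
    pub fn consume(&mut self) -> Option<u32> {
        self.frames.pop_front()
    }
}

/// Toy mailbox: a single pending slot that is overwritten on present.
#[derive(Default)]
pub struct MailboxSlot {
    pending: Option<u32>,
}

impl MailboxSlot {
    /// Never blocks; returns the frame that was displaced, if any.
    pub fn present(&mut self, frame: u32) -> Option<u32> {
        self.pending.replace(frame)
    }

    /// The compositor takes the most recent frame.
    pub fn consume(&mut self) -> Option<u32> {
        self.pending.take()
    }
}
```

In this model a stalled consumer eventually makes `try_present` fail for the FIFO queue, while the mailbox keeps accepting frames and discards the older one. The discarded frames are work that was rendered but never shown, which is the performance and energy trade-off raised earlier in the review.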
## Summary

- Add `unconfigure_surface()` and `replace_surface()` methods to `WgpuRenderer` for mobile platform window lifecycle management
- Prefer `PresentMode::Mailbox` (triple-buffering) over `Fifo` to avoid blocking during lifecycle transitions
- Early return in `draw()` when surface is unconfigured to prevent driver hangs

## Motivation

On Android, the native window (`ANativeWindow`) is destroyed when the app goes to the background and recreated when it returns to the foreground. The same happens during orientation changes. Without surface lifecycle methods, the only option is to destroy the entire `WgpuRenderer` and create a new one on resume.

The problem: GPUI's scene cache holds `AtlasTextureId` references from the old renderer's atlas. A new renderer has an empty atlas, so those cached IDs cause index-out-of-bounds panics.

The fix: Keep the renderer (device, queue, atlas, pipelines) alive across surface destruction. Only the wgpu `Surface` needs to be replaced.

### `unconfigure_surface()`

Marks the surface as unconfigured so `draw()` skips rendering via the existing `surface_configured` guard. Drops intermediate textures that reference the old surface dimensions. The renderer stays fully alive.

### `replace_surface()`

Creates a new `wgpu::Surface` from fresh window handles using the **same** `wgpu::Instance` that created the original adapter/device. Reconfigures the surface and marks it as configured so rendering resumes. All cached atlas textures remain valid.

### PresentMode::Mailbox

`Fifo` (VSync) blocks in `get_current_texture()` and can deadlock if the compositor is frozen during a lifecycle transition (e.g. `TerminateWindow` → `InitWindow` on Android). Mailbox (triple-buffering) avoids this. Falls back to `AutoNoVsync` → `Fifo` if unsupported.

### draw() early return

Some drivers (notably Adreno) block indefinitely when acquiring a texture from an unconfigured surface. The early return prevents this.

## Context

This is needed by [gpui-mobile](https://github.com/itsbalamurali/gpui-mobile), a project bringing GPUI to Android and iOS. The Android implementation needs these methods to handle:

1. **Background/foreground transitions**: `TerminateWindow` destroys the native window, `InitWindow` recreates it
2. **Orientation changes**: Surface is destroyed and recreated with new dimensions
3. **Split-screen transitions**: Similar surface recreation

Without this change, we maintain a local fork of `gpui_wgpu` with just these additions. Upstreaming them would let mobile platform implementations use the official crate directly.

## Test plan

- [x] Tested on Android (Motorola, Adreno 720 GPU): 3 consecutive background/foreground cycles, zero panics, atlas textures preserved
- [x] Tested orientation changes (portrait→landscape→portrait): surface replacement completed in <40ms per rotation
- [x] Verified `draw()` correctly skips rendering when surface is unconfigured
- [x] Verified no regression on desktop: methods are additive, existing code paths unchanged
- [x] PresentMode fallback chain works on devices that don't support Mailbox

## Release Notes

- N/A
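The lifecycle described in this PR can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the actual `WgpuRenderer`: `SurfaceStandIn` and `RendererSketch` are hypothetical types, the atlas is reduced to a list of integer IDs, and the method signatures are invented for the example. Only the method names `unconfigure_surface`, `replace_surface`, `draw` and the `surface_configured` flag come from the description.

```rust
/// Stand-in for a platform surface (assumption: real code holds a `wgpu::Surface`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SurfaceStandIn {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical, heavily simplified renderer used only for illustration.
pub struct RendererSketch {
    surface: SurfaceStandIn,
    surface_configured: bool,
    /// Stands in for long-lived GPU state (device, atlas, pipelines).
    atlas_texture_ids: Vec<u32>,
    frames_drawn: u32,
}

impl RendererSketch {
    pub fn new(surface: SurfaceStandIn) -> Self {
        Self {
            surface,
            surface_configured: true,
            atlas_texture_ids: Vec::new(),
            frames_drawn: 0,
        }
    }

    pub fn cache_atlas_texture(&mut self, id: u32) {
        self.atlas_texture_ids.push(id);
    }

    /// Called when the native window goes away: stop drawing, keep GPU state.
    pub fn unconfigure_surface(&mut self) {
        self.surface_configured = false;
    }

    /// Called when a new native window arrives: swap the surface, resume drawing.
    pub fn replace_surface(&mut self, surface: SurfaceStandIn) {
        self.surface = surface;
        self.surface_configured = true;
    }

    /// Returns `true` if a frame was rendered, `false` if skipped by the guard.
    pub fn draw(&mut self) -> bool {
        if !self.surface_configured {
            return false; // early return: never touch an unconfigured surface
        }
        self.frames_drawn += 1;
        true
    }

    pub fn atlas_texture_ids(&self) -> &[u32] {
        &self.atlas_texture_ids
    }

    pub fn surface_size(&self) -> (u32, u32) {
        (self.surface.width, self.surface.height)
    }

    pub fn frames_drawn(&self) -> u32 {
        self.frames_drawn
    }
}
```

In this sketch, a platform layer would call `unconfigure_surface()` on `TerminateWindow` and `replace_surface(...)` on `InitWindow`. The cached atlas IDs are never touched by either call, which is the property that avoids the stale-ID panics described in the Motivation section.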