Avoid panicking in wgpu rendering during resize#50169
Conversation
Fixes Zed-5AW Fixes Zed-5AP
|
Looks like that could work, I can take a closer look this evening (CET). |
zortax
left a comment
There was a problem hiding this comment.
I left a few comments. Other than that, if creating the textures after ensuring the surface is valid fixes the issue, I think its fine, though I'm not entirely sure why it even happens (in my initial PR I didn't come across that when testing on X11). Disclaimer: I have neither reproduced the issue nor tested this PR myself.
| let (path_intermediate_texture, path_intermediate_view) = Self::create_path_intermediate( | ||
| &device, | ||
| surface_format, | ||
| config.size.width.0 as u32, | ||
| config.size.height.0 as u32, | ||
| ); | ||
| let (path_intermediate_texture, path_intermediate_view) = { | ||
| let (t, v) = Self::create_path_intermediate( | ||
| &device, | ||
| surface_format, | ||
| config.size.width.0 as u32, | ||
| config.size.height.0 as u32, | ||
| ); | ||
| (Some(t), Some(v)) | ||
| }; |
There was a problem hiding this comment.
The second stacktrace you posted says the panic happens inside this create_path_intermediate(...) call, so this PR won't fix that I think. Maybe you should start with both intermediates set to None here and let first actual draw call handle the creation.
| fn ensure_intermediate_textures(&mut self) { | ||
| if self.path_intermediate_texture.is_some() { | ||
| return; | ||
| } | ||
|
|
||
| let (path_intermediate_texture, path_intermediate_view) = { | ||
| let (t, v) = Self::create_path_intermediate( | ||
| &self.device, | ||
| self.surface_config.format, | ||
| self.surface_config.width, | ||
| self.surface_config.height, | ||
| self.rendering_params.path_sample_count, | ||
| ) | ||
| .map(|(t, v)| (Some(t), Some(v))) | ||
| .unwrap_or((None, None)); | ||
| self.path_msaa_texture = path_msaa_texture; | ||
| self.path_msaa_view = path_msaa_view; | ||
| } | ||
| ); | ||
| (Some(t), Some(v)) | ||
| }; | ||
| self.path_intermediate_texture = path_intermediate_texture; | ||
| self.path_intermediate_view = path_intermediate_view; | ||
|
|
||
| let (path_msaa_texture, path_msaa_view) = Self::create_msaa_if_needed( | ||
| &self.device, | ||
| self.surface_config.format, | ||
| self.surface_config.width, | ||
| self.surface_config.height, | ||
| self.rendering_params.path_sample_count, | ||
| ) | ||
| .map(|(t, v)| (Some(t), Some(v))) | ||
| .unwrap_or((None, None)); | ||
| self.path_msaa_texture = path_msaa_texture; | ||
| self.path_msaa_view = path_msaa_view; | ||
| } |
There was a problem hiding this comment.
Maybe its worth it to use something like device.push_error_scope(...) to handle any validation errors more gracefully instead of panicking?
There was a problem hiding this comment.
yeah. I looked at it, but it wasn't obvious how to get the error out (they give you a future, and it seemed unwise to wait on the future in the success case).
If we can't figure out some longer tail stuff, that seems like the next path though
There was a problem hiding this comment.
I'd have to look into this but I believe we can actually just block on this with pollster, the future should just immediately complete. If I remember correctly the API is this way, because on some platforms this may require a GPU roundtrip (webgpu?). But probably we should just prevent the error from happening in the first place...
There was a problem hiding this comment.
yeah, Claude agrees with you.
Going to try and cook up a patch that uses this (and the device_lost_callback) to try and increase the reliability.
We see a bunch of errors right now (mostly "out of memory" but occasional "Texture with label is invalid").
One thing I don't know is:
- If we see out of memory, should we assume that will resolve? How long do we wait before crashing/making it clear to the user (and our telemetry) that Zed is broken.
- For invalid labels, sounds like those won't resolve? How do we recover usefully..
Not sure if you have any ideas or can link me to apps that use wgpu and have good error handling
There was a problem hiding this comment.
Going to try and cook up a patch that uses this (and the device_lost_callback) to try and increase the reliability.
Looking at this again, if the panic still happens, it would be very weird if that would help with actual recovery. That would mean the device successfully acquired the frame without errors, but then fails to create the textures for the surface, that would likely be a bug in the vulkan driver itself, unless I fundamentally miss something here. But if you have the telemetry for this, please let me know if your PR fixed this issue at least.
We see a bunch of errors right now (mostly "out of memory" but occasional "Texture with label is invalid").
For vulkan OOM errors, I do believe that guarding against them causing a panic is worth it. On a resize when we rapidly reconfigure the surface again and again and constantly create new textures for it, a spike in VRAM usage is kinda expected, and it is a reasonable assumption that these errors are transient and we can render frames again a few ms later. It would be interesting to see though if this is just an issue of spikes in VRAM usage in certain scenarios, or if we leak something somewhere or a growing buffers too large.
I unfortunately won't be able to play around with this until maybe end of coming week, but if you try and cook something, don't forget to guard other resource allocations points (e.g. grow_instance_buffer()) too. :)
For invalid labels, sounds like those won't resolve? How do we recover usefully..
As I said, if this PR doesn't fully resolve this, we can of course try to use an error scope to handle it gracefully and try again next frame, but if my understanding is correct, that would mean that wgpu/vulkan behaves fully out of spec. Maybe just trying again works, maybe that means we fully lost the device and the only chance to recover is fully reinitialize the full rendering context? Since I couldn't reproduce this yet, only thing I can really recommend is to look closely at any telemetry you get, maybe try the error scope thing and monitor if retrying helps recovering.
Not sure if you have any ideas or can link me to apps that use wgpu and have good error handling
If I find the time I can try and find the parts where/if e.g. Iced or Bevy handle this kinda stuff (although I believe the Bevy renderer is much more complicated).
| }; | ||
|
|
||
| let Some(path_intermediate_view) = self.path_intermediate_view.as_ref() else { | ||
| return false; |
There was a problem hiding this comment.
I think the semantics are wrong here. Returning false means instance buffer overflow and would cause the calling side to grow the instance buffer and retry the frame. I believe this should just return true if don't have path_intermediate_view to skip paths.
| }); | ||
|
|
||
| let Some(path_intermediate_view) = self.path_intermediate_view.as_ref() else { | ||
| return false; |
|
/cherry-pick preview |
Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize
Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize
… to preview) (#50343) Cherry-pick of #50169 to preview ---- Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
… to stable) (#50344) Cherry-pick of #50169 to stable ---- Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize
Fixes Zed-5AW Fixes Zed-5AP Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax? The panic is: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::update_drawable_size (wgpu_renderer.rs:784) gpui::platform::linux::x11::window::X11WindowStatePtr::set_bounds (window.rs:1169) gpui::platform::linux::x11::client::X11Client::handle_event (client.rs:902) ``` or: ``` wgpu error: Validation Error Caused by: In Texture::create_view Texture with 'path_intermediate' label is invalid gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::create_path_intermediate (wgpu_renderer.rs:742) gpui::platform::wgpu::wgpu_renderer::WgpuRenderer::new (wgpu_renderer.rs:274) gpui::platform::linux::x11::window::X11WindowState::new::{{closure}} (window.rs:698) gpui::platform::linux::x11::window::X11WindowState::new (window.rs:488) gpui::platform::linux::x11::window::X11Window::new (window.rs:814) gpui::platform::linux::x11::client::X11Client::open_window (client.rs:1514) gpui::platform::linux::platform::<T>::open_window (platform.rs:289) gpui::window::Window::new (window.rs:1119) gpui::app::App::open_window::{{closure}} (app.rs:1025) gpui::app::App::update (app.rs:835) gpui::app::App::open_window (app.rs:1022) ``` I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet) Release Notes: - Linux: Fixed a panic in the new WPGU renderer during resize
Fixes Zed-5AW
Fixes Zed-5AP
Claude believes this is the right fix, but would love someone who knows more about graphics than me to take a look: @reflectronic / @zortax?
The panic is:
or:
I haven't seen a Wayland equivalent (not sure if that's because it doesn't happen on Wayalnd or because I havent' seen it yet)
Release Notes: