Render Recovery #22761

Merged
alice-i-cecile merged 19 commits into bevyengine:main from atlv24:ad/render-error-handling
Feb 5, 2026

@atlv24 atlv24 commented Feb 1, 2026

Objective

Solution

  • Use wgpu::Device::set_device_lost_callback and wgpu::Device::on_uncaptured_error to listen for errors.
  • Add a state machine for the renderer
  • Update it on error
  • Add a RenderErrorHandler to let users specify behavior on error by returning a specific RenderErrorPolicy
  • This lets us, for example, ignore validation errors, delete the responsible entities, or reload the renderer if the device was lost.
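
As a rough illustration of the handler-and-policy idea in the list above, here is a dependency-free sketch. It does not use Bevy's or wgpu's real types: `ErrorKind`, `Policy`, `Handler`, and `choose_policy` are hypothetical stand-ins, and only the `StopRendering` and `Recover` variant names echo the ones that appear in this PR description.

```rust
/// Hypothetical stand-in for the kinds of error a renderer might report.
/// The real PR receives wgpu error types instead.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    Validation(String),
    OutOfMemory,
    DeviceLost,
}

/// Hypothetical stand-in for the policy returned by an error handler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    /// Carry on as if nothing happened.
    Ignore,
    /// Stop rendering but keep the app alive.
    StopRendering,
    /// Tear down and rebuild the renderer.
    Recover,
}

/// Handler signature assumed for this sketch only.
pub type Handler = fn(&ErrorKind) -> Policy;

/// One possible handler: tolerate validation errors, rebuild the renderer
/// when the device is lost, and stop rendering on anything else.
pub fn choose_policy(error: &ErrorKind) -> Policy {
    match error {
        ErrorKind::Validation(_) => Policy::Ignore,
        ErrorKind::DeviceLost => Policy::Recover,
        ErrorKind::OutOfMemory => Policy::StopRendering,
    }
}
```

A caller would store `choose_policy` as a `Handler` and invoke it whenever an error is reported; the point of the design is that the decision lives in user code rather than in the renderer.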

Testing

    // Either stop rendering when an error is reported:
    .insert_resource(bevy_render::error_handler::RenderErrorHandler(|_, _, _| {
        bevy_render::error_handler::RenderErrorPolicy::StopRendering
    }))
    // or attempt recovery, passing a default-constructed argument:
    .insert_resource(bevy_render::error_handler::RenderErrorHandler(|_, _, _| {
        bevy_render::error_handler::RenderErrorPolicy::Recover(default())
    }))

Note: no release note yet, because recovery does not work well so far. This PR gets us to the point where we can start caring about it, but we currently crash instantly on recovery because the GPU resources no longer exist. We need to build more resilience before publicizing this, IMO.

@atlv24 atlv24 added A-Rendering Drawing game state to the screen S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 1, 2026
@atlv24 atlv24 force-pushed the ad/render-error-handling branch from b896165 to c848450 Compare February 1, 2026 01:27
    pub(crate) fn update(main_world: &mut World, render_world: &mut World) -> bool {
        match render_world.resource::<RenderState>() {
            RenderState::Initializing => {
                render_world.insert_resource(RenderState::Ready);

Contributor

Can we really just instantly transition from Initializing into Ready?

Contributor

Maybe the transition into Ready should be done by RenderStartup itself (in case it has its own fallibility).

Contributor Author

I think it's better to keep all the state-machine-y things in the function together.

Member

I'm fine with keeping this together for now, but I suspect we may need to refactor this later.
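
To make the trade-off in this thread concrete, here is a minimal sketch of a state machine whose transitions all live in one `update` function. Only the `Initializing` and `Ready` states come from the diff above; the `Errored` and `Stopped` states, the `Policy` enum, and the meaning given to the returned `bool` (whether to render this frame) are assumptions made for illustration, not the PR's actual code.

```rust
/// Renderer states. `Initializing` and `Ready` appear in the PR;
/// `Errored` and `Stopped` are hypothetical additions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Initializing,
    Ready,
    Errored,
    Stopped,
}

/// Hypothetical policy chosen by a user-supplied error handler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    Ignore,
    StopRendering,
    Recover,
}

/// Advances the state machine by one step.
/// Returns `true` when the renderer should run this frame (an assumption).
pub fn update(state: &mut State, policy: Policy) -> bool {
    *state = match (*state, policy) {
        // The transition questioned in the review: straight to `Ready`.
        (State::Initializing, _) => State::Ready,
        (State::Ready, _) => State::Ready,
        // On error, the policy decides where to go next.
        (State::Errored, Policy::Ignore) => State::Ready,
        (State::Errored, Policy::Recover) => State::Initializing,
        (State::Errored, Policy::StopRendering) => State::Stopped,
        // Once stopped, stay stopped.
        (State::Stopped, _) => State::Stopped,
    };
    *state == State::Ready
}
```

Keeping every arm in one `match` makes the full transition table visible at a glance, which is the argument for the current layout. Moving the `Initializing` to `Ready` step into startup code would trade that for the ability to fail during initialization.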

@IQuick143 IQuick143 added C-Feature A new feature, making something new possible P-Crash A sudden unexpected crash D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes labels Feb 1, 2026
Co-authored-by: Kristoffer Søholm <k.soeholm@gmail.com>
Contributor

valaphee commented Feb 3, 2026

This should also solve #21407

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Feb 5, 2026
Merged via the queue into bevyengine:main with commit 71dd9ea Feb 5, 2026
38 checks passed
@atlv24 atlv24 mentioned this pull request Feb 18, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 16, 2026
# Objective

- Reduce panics in bevy renderer
- Continuation of Render Recovery efforts #22761 (can't reinitialize if
we panic. I hit every single one of these on the device lost path.)

## Solution

- Use early outs and error!s instead of hard panics.
- If the panics are needed to maintain some kind of invariant, that
should be documented. I don't see any comment saying they do, so I am
assuming they do not. Correct me if I am wrong.
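
A minimal sketch of the "early out instead of panic" pattern described above. The `Pipelines` map and both functions are hypothetical; Bevy code would log through its `error!` macro, for which `eprintln!` stands in here to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table of GPU pipelines keyed by id.
pub type Pipelines = HashMap<u32, String>;

/// Before: a missing pipeline panics and takes the whole app down,
/// so the renderer never gets a chance to reinitialize.
pub fn queue_draw_panicking(pipelines: &Pipelines, id: u32, draws: &mut Vec<String>) {
    let pipeline = pipelines.get(&id).expect("pipeline should exist");
    draws.push(pipeline.clone());
}

/// After: log the problem and skip this draw, leaving the app alive.
pub fn queue_draw(pipelines: &Pipelines, id: u32, draws: &mut Vec<String>) {
    let Some(pipeline) = pipelines.get(&id) else {
        // Stand-in for Bevy's `error!` macro.
        eprintln!("pipeline {id} is missing, skipping draw");
        return;
    };
    draws.push(pipeline.clone());
}
```

With the second version, a lookup that fails on the device-lost path costs one skipped draw rather than the process.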

## Testing

- CI

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Brennan Paciorek <50780403+BrennanPaciorek@users.noreply.github.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 20, 2026
# Objective

- Continue render recovery effort #22761
- Part of goal #23029
- Reload gpu resources easily

## Solution

- Add an extension trait to insert resources on RenderStartup using
from_world
- Replace almost every usage of init_resource that holds anything
derived from a RenderDevice with init_gpu_resource so that it may be
reinitialized on recovery
- Note: "almost every" because there are a handful of slightly more
involved cases that I will leave to follow-ups.
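
The idea behind the extension trait can be sketched without Bevy. Everything below is a simplified, hypothetical model: `Device`, `World`, `App`, `FromDevice`, `FallbackBuffer`, and `run_render_startup` are invented for illustration, and the real `init_gpu_resource` builds resources with `from_world` on the RenderStartup schedule rather than from a device handle.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical GPU device; `generation` changes each time it is recreated.
pub struct Device {
    pub generation: u32,
}

/// Hypothetical, very small resource store.
#[derive(Default)]
pub struct World {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl World {
    pub fn insert<R: Any>(&mut self, resource: R) {
        self.resources.insert(TypeId::of::<R>(), Box::new(resource));
    }

    pub fn get<R: Any>(&self) -> Option<&R> {
        self.resources.get(&TypeId::of::<R>())?.downcast_ref()
    }
}

/// Simplified stand-in for constructing a resource via `from_world`.
pub trait FromDevice {
    fn from_device(device: &Device) -> Self;
}

type Initializer = Box<dyn Fn(&mut World, &Device)>;

#[derive(Default)]
pub struct App {
    pub world: World,
    startup: Vec<Initializer>,
}

impl App {
    /// Runs every registered initializer; called at startup and after recovery.
    pub fn run_render_startup(&mut self, device: &Device) {
        for init in &self.startup {
            init(&mut self.world, device);
        }
    }
}

/// Extension trait: registers a constructor instead of building immediately.
pub trait InitGpuResourceExt {
    fn init_gpu_resource<R: FromDevice + Any>(&mut self) -> &mut Self;
}

impl InitGpuResourceExt for App {
    fn init_gpu_resource<R: FromDevice + Any>(&mut self) -> &mut Self {
        self.startup.push(Box::new(|world: &mut World, device: &Device| {
            world.insert(R::from_device(device))
        }));
        self
    }
}

/// Example resource that remembers which device created it.
pub struct FallbackBuffer {
    pub device_generation: u32,
}

impl FromDevice for FallbackBuffer {
    fn from_device(device: &Device) -> Self {
        Self { device_generation: device.generation }
    }
}
```

Because the constructor is stored rather than run once, calling `run_render_startup` again with a new device replaces the stale resource. It also shows the behavior change noted under Testing: nothing exists until startup has run.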

## Testing

- The render recovery example still crashes; I have a branch with it
working that I am just pulling reviewable bits out of.
- We should also verify that this doesn't break examples, as it does
slightly modify behavior: GPU resources are initialized slightly later
than they used to be, because they wait until RenderStartup instead of
being initialized immediately.
- I mostly want to get the largest part of the change out of the way
first.
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2026
# Objective

- Fallback image samplers need to be recreated on RenderDevice reset.
- Continue Render Recovery efforts #23350 #22761
- Part of goal #23029

## Solution

- They have slightly more involved ordering requirements, which is why
I split them out into this PR.

## Testing

- ran some examples, and ambiguity detection
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2026
# Objective

- Continue Render Recovery efforts #23350 #22761 #23433 #23458
- Part of goal #23029
- Make indirect parameter buffers and batched instance buffers
reinitialize on recovery

## Solution

- Split out the stuff that shouldn't be reinitialized from them
- Use init_gpu_resource

## Testing

- examples run
- In combination with the rest of the fixes and a couple of other local
changes I haven't PR'd yet, the render_recovery example works.
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2026
# Objective

- Continue Render Recovery efforts #23350 #22761 #23433
- Part of goal #23029

## Solution

- Clear view upscaling pipelines on reload

## Testing

- Existing examples run fine. The render_recovery example does not work
yet, but in combination with the other fixes and some not-yet-PR'd fixes
this is a load-bearing change needed for it to work.

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2026
# Objective

- Continue Render Recovery efforts #23350 #22761 #23433 #23458 #23459
- Part of goal #23029
- Make shaders work after reload

## Solution

- This is a kinda ugly hack. I explored about five different ways of
doing this; none of them are satisfying, and they are all much larger
diffs than this. The crux of the problem is that the composer's
capabilities may differ on the new device, due to switching from a
dedicated to an integrated GPU. This means that we cannot even retain
the composed modules. The ShaderCache and PipelineCache are quite
annoyingly tangled, and I have some glimpses of how to fix that in the
future, but I don't want to block render recovery on it.
- For now, we just do the brute-force thing: reinsert all the shaders,
recompose them, etc.
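
The brute-force approach can be pictured with a small, dependency-free sketch. `ShaderStore`, `Capabilities`, and the `#define F16` header are hypothetical placeholders and say nothing about how ShaderCache or the composer actually work. The sketch only shows why raw sources can be kept across a device reset while composed output cannot.

```rust
use std::collections::HashMap;

/// Hypothetical device capabilities that influence shader composition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    pub supports_f16: bool,
}

/// Hypothetical store that separates device-independent sources
/// from device-specific composed output.
#[derive(Default)]
pub struct ShaderStore {
    /// Raw sources: safe to keep across a device reset.
    sources: HashMap<String, String>,
    /// Composed modules: depend on capabilities, so they must be rebuilt.
    composed: HashMap<String, String>,
}

impl ShaderStore {
    pub fn insert_source(&mut self, name: &str, source: &str) {
        self.sources.insert(name.to_string(), source.to_string());
    }

    /// Discards every composed module and recomposes all sources for `caps`.
    /// "Composition" here is only a placeholder header prepended to the source.
    pub fn compose_all(&mut self, caps: Capabilities) {
        self.composed.clear();
        for (name, source) in &self.sources {
            let header = if caps.supports_f16 { "#define F16\n" } else { "" };
            self.composed.insert(name.clone(), format!("{header}{source}"));
        }
    }

    pub fn composed(&self, name: &str) -> Option<&str> {
        self.composed.get(name).map(String::as_str)
    }
}
```

Calling `compose_all` again after a device change yields different output for the same source, which is why retaining the old composed modules would be incorrect.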

## Testing

- examples run
- In combination with the rest of the fixes and a couple of other local
changes I haven't PR'd yet, the render_recovery example works.
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2026
# Objective

- Continue Render Recovery efforts #23350 #22761 #23433 #23458 #23459
#23461
- Part of goal #23029
- Make render assets exist again after reload

## Solution

- We re-extract everything from the main world. This assumes things
*exist* on the main world, which is not actually always true. This is
enough for most examples and simple usage to be recoverable, but it's
really butting up against bevy_asset deficiencies. The next step to
making this truly production grade is asset streaming, which will
probably be my next goal.

## Testing

- examples run
- in combination with the rest of the fixes, render_recovery example
works.
github-merge-queue bot pushed a commit that referenced this pull request Mar 23, 2026
# Objective

- Completes goal and closes #23029
- Culmination of #22761, #23350, #23349, #23433, #23458, #23444, #23459,
#23461, #23463, #22714, #22759, #16481

## Solution

- Add a release note.
- Re-export a wgpu type that you need to match on to handle errors.

## Testing

- cargo run --example render_recovery with all the other PRs merged in.
Press 5 and then V, the app will not crash. Note that D for "destroy
device" will still crash: this is a WGPU problem resolved by
gfx-rs/wgpu#9281.

# Note

I opted not to change the default recovery behavior yet. I believe we
need testing in user projects and just general trodding of this code
path before committing to a new default. It works in a simple example,
it might not work in a complex project. We need to field test this and
likely iterate to really call this ready IMO.
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
…gine#23349)

splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026

Labels

A-Rendering Drawing game state to the screen C-Feature A new feature, making something new possible D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes P-Crash A sudden unexpected crash S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
