Flush commands after every mutation in `WorldEntityMut` by nakedible · Pull Request #16219 · bevyengine/bevy

nakedible · 2024-11-03T16:01:11Z

Objective

Currently adding observers spawns an entity which implicitly flushes the command queue, which can cause undefined behaviour if the WorldEntityMut is used after this
The reason WorldEntityMut attempted to (unsuccessfully) avoid flushing commands until finished was that such commands may move or despawn the entity being referenced, invalidating the cached location.
With the introduction of hooks and observers, this isn't sensible anymore as running the commands generated by hooks immediately is required to maintain correct ordering of operations and to not expose the world in an inconsistent state
Objective is to make command flushing deterministic and fix the related issues
Fixes observe on WorldEntityMut may cause the reference to be invalidated #16212
Fixes Commands apply at unexpected times with exclusive World access #14621
Fixes Commands queued by component hooks are not applied during exclusive world access #16034

Solution

Allow WorldEntityMut to exist even when it refers to a despawned entity by allowing EntityLocation to be marked invalid
Add checks to all methods to panic if trying to access a despawned entity
Flush command queue after every operation that might trigger hooks or observers
Update entity location always after flushing command queue

Testing

Added test cases for currently broken behaviour
Added test cases that flushes happen in all operations
Added test cases to ensure hooks and commands are run exactly in correct order when nested

Todo:

Write migration guide
Add tests that using EntityWorldMut on a despawned entity panics
Add tests that commands are flushed after every operation that is supposed to flush them
Add tests that hooks, observers and their spawned commands are run in the correct order when nested

Migration Guide

Previously EntityWorldMut triggered command queue flushes in unpredictable places, which could interfere with hooks and observers. Now the command queue is flushed always immediately after any call in EntityWorldMut that spawns or despawns an entity, or adds, removes or replaces a component. This means hooks and observers will run their commands in the correct order.

As a side effect, there is a possibility that a hook or observer could despawn the entity that is being referred to by EntityWorldMut. This could already currently happen if an observer was added while keeping an EntityWorldMut referece and would cause unsound behaviour. If the entity has been despawned, calling any methods which require the entity location will panic. This matches the behaviour that Commands will panic if called on an already despawned entity. In the extremely rare case where taking a new EntityWorldMut reference or otherwise restructuring the code so that this case does not happen is not possible, there's a new is_despawned method that can be used to check if the referred entity has been despawned.

alice-i-cecile · 2024-11-03T16:09:37Z

Yes please, I'd like a small Migration Guide note here.

maniwani

Yes, this is pretty much what I was thinking. I assume branch prediction should hide any overhead from this since the condition should basically never be true.

I think hitting this case should issue a warning. And we might want to have a #[track_caller] on the affected EntityWorldMut methods so that anyone who sees the warning knows where to start looking.

Something like...

#[inline(never)]
#[cold]
fn on_entity_invalid(entity: Entity, caller: &'static Location) {
    warn!("error[B<insert-error-number-here>]: {caller} tried to modify the entity {:?}, but it no longer exists. See: https://bevyengine.org/learn/errors/b<insert-error-number-here>", entity);
}

maniwani · 2024-11-03T18:12:40Z

Ah, wait. Since EntityWorldMut can be converted into the other kinds of entity references, we may need to add the invalid location check to them as well (or make the conversion fallible).

Some of the UnsafeEntityCell methods those references use probably assume its location is valid.

nakedible · 2024-11-04T10:20:44Z

warn or warn_once?

maximetinu · 2024-11-05T23:32:58Z

Hello! 👋

I lack context about the changes in this PR, but by petition of @alice-i-cecile , I checked out this branch to see if I can still reproduce the crash of flush() needs to be called before this operation is legal that I was hitting in my code

The answer is that no, I cannot.

The thing is, I can't repro it in bevy main neither, so this bug must have been fixed at some point between Bevy 0.14.2 (the version where I encountered it) and current Bevy main atow

Here, for reference:

this is my minimal reproduceable example, on top of Bevy 0.14.2:

https://github.com/Maximetinu/bevy/commits/minimal-reproduceable-bug-on-remove-hook/

see changes in latest commit

running that modified example does reproduce the bug

notice that the cause is just spawning an empty entity during a on_remove hook

This is my minimal reproduceable example cherry-picked to this branch:

https://github.com/Maximetinu/bevy/commits/world-entity-mut-flushes-checking-if-panic-keeps-happening/

see changes in latest commit, they are the same as in the branch above

running that modified example does not reproduce the bug

so, this branch works 🥳

However, bevy main also works:

https://github.com/Maximetinu/bevy/commits/reproducing-on-remove-hook-panic-on-bevy-main/

which means that, well, this branch does not fix per se this bug, because it is already fixed on Bevy main

but at least it means that it's not re-introducing it neither 🙂

Maybe my minimal reproduceable example can help to write a test covering from this specific scenario

I hope this helps!

alice-i-cecile · 2024-11-05T23:34:25Z

I'd love to see that turned into a test, either in this PR or separately :)

maximetinu · 2024-11-06T00:16:01Z

I'd love to see that turned into a test, either in this PR or separately :)

I have made a test for this, that fails at 0.14.2 but passes in current bevy main, so it should cover from this regression

#[test]
fn test_spawn_entities_during_on_remove_hook() {
    #[derive(Component)]
    struct MyComponent;

    App::new()
        .add_plugins(MinimalPlugins)
        .add_systems(Startup, |world: &mut World| {
            world.register_component_hooks::<MyComponent>().on_remove(
                |mut world, _entity, _component_id| {
                    world.commands().spawn_empty();
                },
            );
        })
        .add_systems(
            Update,
            |mut commands: Commands,
                my_component_q: Query<Entity, With<MyComponent>>,
                mut exit_event: EventWriter<AppExit>,
                mut after_first_frame: Local<bool>| {
                for entity in &my_component_q {
                    commands.entity(entity).despawn();
                }

                commands.spawn(MyComponent);

                if *after_first_frame {
                    exit_event.send(AppExit::Success);
                }

                *after_first_frame = true;
            },
        )
        .run();
}

It's a bit of an unusual test (I'm not even sure if MinimalPlugins can be used in a test like this). It's more an integration test rather than a unit test. I don't know if there is some place inside Bevy repo where more of these tests exist. If so, I can open a PR to add it.

alice-i-cecile · 2024-11-06T13:17:44Z

Let's spin this out into a new issue :)

nakedible · 2024-11-09T10:04:38Z

@maniwani Updated design (just quick notes for now):

Handle also UnsafeEntityCell
Instead of setting location entirely to invalid, instead set both archetype_id and table_id to "empty", but keep archetype_row and table_row as invalid. That allows us to behave more like an empty entity for most APIs.

nakedible · 2024-11-17T13:03:05Z

I changed this pull to instead panic rather than warn and try to behave if nothing is wrong. The reason for this is that if we only try to warn, there's a quite wide ranging knock-off effect into UnsafeEntityCell and a lot of other places which store EntityLocation. If we instead panic, the effects (and the extra runtime check) are limited to just EntityWorldMut. Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned (https://bevyengine.org/learn/errors/b0003/) so this is similar behaviour. There should be nobody depending on this behaviour currently as calling any of the methods after the entity has been despawned by hooks would have already been unsound in the past – so if it happens, then the panics will reveal places where there has been already been a bug.

nakedible · 2024-11-17T13:19:02Z

@alice-i-cecile @maniwani I'd love to get feedback if the new panicing approach is good or bad. Also, is it reasonable to add a panics section to all the methods which might panic for this. It should truly be a corner case when this happens, so I'm not sure if it's sensible to spam it everywhere (and it cannot be put into some of the non-fallible From conversions).

nakedible · 2024-11-17T13:36:20Z

Implementation of panic is still placeholderish, just to note.

alice-i-cecile · 2024-11-17T13:53:50Z

Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned

For what it's worth, this behavior is widely hated :)

I think I'm broadly on board with panicking here though, especially as a first pass. This really is an edge case, and I think that trying to figure out better ways to prevent this bad state from ever arising is more useful than bubbling the error up.

nakedible · 2024-11-25T01:21:46Z

If #16499 gets merged before this one, one of the test cases will break, as it should when the ordering of hooks and observers is changed.

nakedible · 2024-11-25T17:53:41Z

@alice-i-cecile -S-Waiting-on-Author +S-Needs-Review 🙇

alice-i-cecile

Excellent tests and good analysis. I'd like to clean up the (very niche) panics here by swapping to results, but that's best left for follow-up.

chescock

I don't entirely understand what world.flush() does, but I do believe this PR calls assert_not_despawned(); before every read or write of the entity or its components and calls world.flush() and update_location() after every write, so self.location should always be correct.

The one exception is that the world_scope() method calls update_location() but not world.flush(). Should that call world.flush() as well?

chescock · 2024-12-05T18:30:45Z

crates/bevy_ecs/src/world/entity_ref.rs

+    #[inline(always)]
+    #[track_caller]
+    fn assert_not_despawned(&self) {
+        if self.location.archetype_id == ArchetypeId::INVALID {


This could use the is_despawned() method if you wanted.

Suggested change

if self.location.archetype_id == ArchetypeId::INVALID {

if self.is_despawned() {

hymm · 2024-12-05T18:50:48Z

An alternative to checking in every method if the entity has been despawned would be to panic after calling world.flush instead of checking in each method. This would disallow despawning the entity in hooks, but I think that makes sense that you wouldn't be able to despawn the EntityWorldMut while holding it.

nakedible · 2024-12-05T19:11:42Z

I don't entirely understand what world.flush() does, but I do believe this PR calls assert_not_despawned(); before every read or write of the entity or its components and calls world.flush() and update_location() after every write, so self.location should always be correct.

The one exception is that the world_scope() method calls update_location() but not world.flush(). Should that call world.flush() as well?

World flush runs any Commands that have been queued. Commands are queued by hooks and observers if they want to do archetype changes, as they only get a DeferredWorld.

The world_scope function just allows a function to run with the world, but it doesn't necessarily queue any commands. If it does, it's the responsibility of the function to flush if necessary.

There has been talk of having commands apply immediately when holding a &mut World, like in this world_scope function, which would solve the whole issue of asking the user to flush.

nakedible · 2024-12-05T19:17:48Z

An alternative to checking in every method if the entity has been despawned would be to panic after calling world.flush instead of checking in each method. This would disallow despawning the entity in hooks, but I think that makes sense that you wouldn't be able to despawn the EntityWorldMut while holding it.

It would be an alternative, but instead the aim is to go to another direction, to make the methods fallible instead.

Also, there are two other reasons.

First, there aren't any other ways to make archetype changes to entities but to take WorldEntityMut via world.entity_mut(ent). So if despawning the current entity would be totally disallowed, then there would not be a way to safely trigger such a change as you can't not hold a reference while doing it.

Second, you can't really prevent hooks and observers from making these types of changes in some special cases. So there isn't a workaround for that. Where as just dropping a WorldEntityMut after an archetype mutation operation is a trivial thing to do by the user.

NthTensor · 2024-12-05T19:46:23Z

FWIW i think panicking is fine here; there's similar stuff in the engine that already works this way. We can reconsider it in a broader review of bevy's error handling. But especially for guarding invariant going into unsafe code, panicking is probably the way to go.

…6219) # Objective - Currently adding observers spawns an entity which implicitly flushes the command queue, which can cause undefined behaviour if the `WorldEntityMut` is used after this - The reason `WorldEntityMut` attempted to (unsuccessfully) avoid flushing commands until finished was that such commands may move or despawn the entity being referenced, invalidating the cached location. - With the introduction of hooks and observers, this isn't sensible anymore as running the commands generated by hooks immediately is required to maintain correct ordering of operations and to not expose the world in an inconsistent state - Objective is to make command flushing deterministic and fix the related issues - Fixes bevyengine#16212 - Fixes bevyengine#14621 - Fixes bevyengine#16034 ## Solution - Allow `WorldEntityMut` to exist even when it refers to a despawned entity by allowing `EntityLocation` to be marked invalid - Add checks to all methods to panic if trying to access a despawned entity - Flush command queue after every operation that might trigger hooks or observers - Update entity location always after flushing command queue ## Testing - Added test cases for currently broken behaviour - Added test cases that flushes happen in all operations - Added test cases to ensure hooks and commands are run exactly in correct order when nested --- Todo: - [x] Write migration guide - [x] Add tests that using `EntityWorldMut` on a despawned entity panics - [x] Add tests that commands are flushed after every operation that is supposed to flush them - [x] Add tests that hooks, observers and their spawned commands are run in the correct order when nested --- ## Migration Guide Previously `EntityWorldMut` triggered command queue flushes in unpredictable places, which could interfere with hooks and observers. Now the command queue is flushed always immediately after any call in `EntityWorldMut` that spawns or despawns an entity, or adds, removes or replaces a component. This means hooks and observers will run their commands in the correct order. As a side effect, there is a possibility that a hook or observer could despawn the entity that is being referred to by `EntityWorldMut`. This could already currently happen if an observer was added while keeping an `EntityWorldMut` referece and would cause unsound behaviour. If the entity has been despawned, calling any methods which require the entity location will panic. This matches the behaviour that `Commands` will panic if called on an already despawned entity. In the extremely rare case where taking a new `EntityWorldMut` reference or otherwise restructuring the code so that this case does not happen is not possible, there's a new `is_despawned` method that can be used to check if the referred entity has been despawned.

nakedible force-pushed the world-entity-mut-flushes branch from 90f42bc to 82340bf Compare November 3, 2024 16:03

alice-i-cecile requested review from alice-i-cecile and maniwani November 3, 2024 16:09

alice-i-cecile added this to the 0.16 milestone Nov 3, 2024

alice-i-cecile added C-Bug An unexpected or incorrect behavior A-ECS Entities, components, systems, and events P-High This is particularly urgent, and deserves immediate attention M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Nov 3, 2024

alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed P-High This is particularly urgent, and deserves immediate attention labels Nov 3, 2024

maniwani reviewed Nov 3, 2024

View reviewed changes

alice-i-cecile mentioned this pull request Nov 6, 2024

Add regression test for spawning entities during remove hooks #16260

Open

alice-i-cecile mentioned this pull request Nov 13, 2024

ChildBuilder does not respect required components #16380

Closed

nakedible force-pushed the world-entity-mut-flushes branch 4 times, most recently from 12d4e58 to ef4fad4 Compare November 17, 2024 13:02

nakedible force-pushed the world-entity-mut-flushes branch from 59f2798 to 5075420 Compare November 24, 2024 18:56

nakedible force-pushed the world-entity-mut-flushes branch from edc5cd2 to eddf012 Compare November 24, 2024 22:40

nakedible marked this pull request as ready for review November 24, 2024 22:40

nakedible force-pushed the world-entity-mut-flushes branch from eddf012 to aca60b6 Compare November 24, 2024 22:47

Add test for EntityWorldMut despawned entity causing panic

4668d66

nakedible force-pushed the world-entity-mut-flushes branch 2 times, most recently from b90ed0b to 4668d66 Compare November 24, 2024 23:38

nakedible requested a review from maniwani November 25, 2024 17:57

alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Nov 25, 2024

alice-i-cecile requested a review from chescock December 5, 2024 18:17

alice-i-cecile approved these changes Dec 5, 2024

View reviewed changes

alice-i-cecile requested a review from NthTensor December 5, 2024 18:20

alice-i-cecile added the P-Unsound A bug that results in undefined compiler behavior label Dec 5, 2024

hymm self-requested a review December 5, 2024 18:21

chescock approved these changes Dec 5, 2024

View reviewed changes

alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 5, 2024

alice-i-cecile added this pull request to the merge queue Dec 5, 2024

Merged via the queue into bevyengine:main with commit 76d610d Dec 5, 2024

chescock mentioned this pull request Dec 10, 2024

Added modify_component to EntityWorldMut, DeferredWorld, and World #16668

Merged

alice-i-cecile mentioned this pull request Dec 11, 2024

Panic when calling EntityWorldMut::insert after adding two observers #16760

Closed

nicopap mentioned this pull request Aug 3, 2025

Allow freezing of command execution or capturing commands #20395

Open

	if self.location.archetype_id == ArchetypeId::INVALID {
	if self.is_despawned() {

Uh oh!

Conversation

nakedible commented Nov 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Objective

Solution

Testing

Migration Guide

Uh oh!

alice-i-cecile commented Nov 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maniwani left a comment

Choose a reason for hiding this comment

Uh oh!

maniwani commented Nov 3, 2024

Uh oh!

nakedible commented Nov 4, 2024

Uh oh!

maximetinu commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alice-i-cecile commented Nov 5, 2024

Uh oh!

maximetinu commented Nov 6, 2024

Uh oh!

alice-i-cecile commented Nov 6, 2024

Uh oh!

nakedible commented Nov 9, 2024

Uh oh!

nakedible commented Nov 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nakedible commented Nov 17, 2024

Uh oh!

nakedible commented Nov 17, 2024

Uh oh!

alice-i-cecile commented Nov 17, 2024

Uh oh!

nakedible commented Nov 25, 2024

Uh oh!

nakedible commented Nov 25, 2024

Uh oh!

alice-i-cecile left a comment

Choose a reason for hiding this comment

Uh oh!

chescock left a comment

Choose a reason for hiding this comment

Uh oh!

chescock Dec 5, 2024

Choose a reason for hiding this comment

Uh oh!

hymm commented Dec 5, 2024

Uh oh!

nakedible commented Dec 5, 2024

Uh oh!

nakedible commented Dec 5, 2024

Uh oh!

NthTensor commented Dec 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

nakedible commented Nov 3, 2024 •

edited

Loading

alice-i-cecile commented Nov 3, 2024 •

edited

Loading

maximetinu commented Nov 5, 2024 •

edited

Loading

nakedible commented Nov 17, 2024 •

edited

Loading

NthTensor commented Dec 5, 2024 •

edited

Loading