Skip to content

Flush commands after every mutation in WorldEntityMut#16219

Merged
alice-i-cecile merged 10 commits intobevyengine:mainfrom
nakedible:world-entity-mut-flushes
Dec 5, 2024
Merged

Flush commands after every mutation in WorldEntityMut#16219
alice-i-cecile merged 10 commits intobevyengine:mainfrom
nakedible:world-entity-mut-flushes

Conversation

@nakedible
Copy link
Copy Markdown
Contributor

@nakedible nakedible commented Nov 3, 2024

Objective

Solution

  • Allow WorldEntityMut to exist even when it refers to a despawned entity by allowing EntityLocation to be marked invalid
  • Add checks to all methods to panic if trying to access a despawned entity
  • Flush command queue after every operation that might trigger hooks or observers
  • Update entity location always after flushing command queue

Testing

  • Added test cases for currently broken behaviour
  • Added test cases that flushes happen in all operations
  • Added test cases to ensure hooks and commands are run exactly in correct order when nested

Todo:

  • Write migration guide
  • Add tests that using EntityWorldMut on a despawned entity panics
  • Add tests that commands are flushed after every operation that is supposed to flush them
  • Add tests that hooks, observers and their spawned commands are run in the correct order when nested

Migration Guide

Previously EntityWorldMut triggered command queue flushes in unpredictable places, which could interfere with hooks and observers. Now the command queue is flushed always immediately after any call in EntityWorldMut that spawns or despawns an entity, or adds, removes or replaces a component. This means hooks and observers will run their commands in the correct order.

As a side effect, there is a possibility that a hook or observer could despawn the entity that is being referred to by EntityWorldMut. This could already currently happen if an observer was added while keeping an EntityWorldMut referece and would cause unsound behaviour. If the entity has been despawned, calling any methods which require the entity location will panic. This matches the behaviour that Commands will panic if called on an already despawned entity. In the extremely rare case where taking a new EntityWorldMut reference or otherwise restructuring the code so that this case does not happen is not possible, there's a new is_despawned method that can be used to check if the referred entity has been despawned.

@nakedible nakedible force-pushed the world-entity-mut-flushes branch from 90f42bc to 82340bf Compare November 3, 2024 16:03
@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Nov 3, 2024
@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-ECS Entities, components, systems, and events P-High This is particularly urgent, and deserves immediate attention M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Nov 3, 2024
@alice-i-cecile
Copy link
Copy Markdown
Member

alice-i-cecile commented Nov 3, 2024

Yes please, I'd like a small Migration Guide note here.

@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed P-High This is particularly urgent, and deserves immediate attention labels Nov 3, 2024
Copy link
Copy Markdown
Contributor

@maniwani maniwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is pretty much what I was thinking. I assume branch prediction should hide any overhead from this since the condition should basically never be true.

I think hitting this case should issue a warning. And we might want to have a #[track_caller] on the affected EntityWorldMut methods so that anyone who sees the warning knows where to start looking.

Something like...

#[inline(never)]
#[cold]
fn on_entity_invalid(entity: Entity, caller: &'static Location) {
    warn!("error[B<insert-error-number-here>]: {caller} tried to modify the entity {:?}, but it no longer exists. See: https://bevyengine.org/learn/errors/b<insert-error-number-here>", entity);
}

@maniwani
Copy link
Copy Markdown
Contributor

maniwani commented Nov 3, 2024

Ah, wait. Since EntityWorldMut can be converted into the other kinds of entity references, we may need to add the invalid location check to them as well (or make the conversion fallible).

Some of the UnsafeEntityCell methods those references use probably assume its location is valid.

@nakedible
Copy link
Copy Markdown
Contributor Author

warn or warn_once?

@maximetinu
Copy link
Copy Markdown
Contributor

maximetinu commented Nov 5, 2024

Hello! 👋

I lack context about the changes in this PR, but by petition of @alice-i-cecile , I checked out this branch to see if I can still reproduce the crash of flush() needs to be called before this operation is legal that I was hitting in my code

The answer is that no, I cannot.

The thing is, I can't repro it in bevy main neither, so this bug must have been fixed at some point between Bevy 0.14.2 (the version where I encountered it) and current Bevy main atow

Here, for reference:


this is my minimal reproduceable example, on top of Bevy 0.14.2:

https://github.com/Maximetinu/bevy/commits/minimal-reproduceable-bug-on-remove-hook/

see changes in latest commit

running that modified example does reproduce the bug

notice that the cause is just spawning an empty entity during a on_remove hook


This is my minimal reproduceable example cherry-picked to this branch:

https://github.com/Maximetinu/bevy/commits/world-entity-mut-flushes-checking-if-panic-keeps-happening/

see changes in latest commit, they are the same as in the branch above

running that modified example does not reproduce the bug

so, this branch works 🥳


However, bevy main also works:

https://github.com/Maximetinu/bevy/commits/reproducing-on-remove-hook-panic-on-bevy-main/

which means that, well, this branch does not fix per se this bug, because it is already fixed on Bevy main

but at least it means that it's not re-introducing it neither 🙂


Maybe my minimal reproduceable example can help to write a test covering from this specific scenario

I hope this helps!

@alice-i-cecile
Copy link
Copy Markdown
Member

I'd love to see that turned into a test, either in this PR or separately :)

@maximetinu
Copy link
Copy Markdown
Contributor

I'd love to see that turned into a test, either in this PR or separately :)

I have made a test for this, that fails at 0.14.2 but passes in current bevy main, so it should cover from this regression

#[test]
fn test_spawn_entities_during_on_remove_hook() {
    #[derive(Component)]
    struct MyComponent;

    App::new()
        .add_plugins(MinimalPlugins)
        .add_systems(Startup, |world: &mut World| {
            world.register_component_hooks::<MyComponent>().on_remove(
                |mut world, _entity, _component_id| {
                    world.commands().spawn_empty();
                },
            );
        })
        .add_systems(
            Update,
            |mut commands: Commands,
                my_component_q: Query<Entity, With<MyComponent>>,
                mut exit_event: EventWriter<AppExit>,
                mut after_first_frame: Local<bool>| {
                for entity in &my_component_q {
                    commands.entity(entity).despawn();
                }

                commands.spawn(MyComponent);

                if *after_first_frame {
                    exit_event.send(AppExit::Success);
                }

                *after_first_frame = true;
            },
        )
        .run();
}

It's a bit of an unusual test (I'm not even sure if MinimalPlugins can be used in a test like this). It's more an integration test rather than a unit test. I don't know if there is some place inside Bevy repo where more of these tests exist. If so, I can open a PR to add it.

@alice-i-cecile
Copy link
Copy Markdown
Member

Let's spin this out into a new issue :)

@nakedible
Copy link
Copy Markdown
Contributor Author

@maniwani Updated design (just quick notes for now):

  • Handle also UnsafeEntityCell
  • Instead of setting location entirely to invalid, instead set both archetype_id and table_id to "empty", but keep archetype_row and table_row as invalid. That allows us to behave more like an empty entity for most APIs.

@nakedible nakedible force-pushed the world-entity-mut-flushes branch 4 times, most recently from 12d4e58 to ef4fad4 Compare November 17, 2024 13:02
@nakedible
Copy link
Copy Markdown
Contributor Author

nakedible commented Nov 17, 2024

I changed this pull to instead panic rather than warn and try to behave if nothing is wrong. The reason for this is that if we only try to warn, there's a quite wide ranging knock-off effect into UnsafeEntityCell and a lot of other places which store EntityLocation. If we instead panic, the effects (and the extra runtime check) are limited to just EntityWorldMut. Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned (https://bevyengine.org/learn/errors/b0003/) so this is similar behaviour. There should be nobody depending on this behaviour currently as calling any of the methods after the entity has been despawned by hooks would have already been unsound in the past – so if it happens, then the panics will reveal places where there has been already been a bug.

@nakedible
Copy link
Copy Markdown
Contributor Author

@alice-i-cecile @maniwani I'd love to get feedback if the new panicing approach is good or bad. Also, is it reasonable to add a panics section to all the methods which might panic for this. It should truly be a corner case when this happens, so I'm not sure if it's sensible to spam it everywhere (and it cannot be put into some of the non-fallible From conversions).

@nakedible
Copy link
Copy Markdown
Contributor Author

Implementation of panic is still placeholderish, just to note.

@alice-i-cecile
Copy link
Copy Markdown
Member

Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned

For what it's worth, this behavior is widely hated :)

I think I'm broadly on board with panicking here though, especially as a first pass. This really is an edge case, and I think that trying to figure out better ways to prevent this bad state from ever arising is more useful than bubbling the error up.

@nakedible nakedible force-pushed the world-entity-mut-flushes branch from 59f2798 to 5075420 Compare November 24, 2024 18:56
@nakedible nakedible force-pushed the world-entity-mut-flushes branch from edc5cd2 to eddf012 Compare November 24, 2024 22:40
@nakedible nakedible marked this pull request as ready for review November 24, 2024 22:40
@nakedible nakedible force-pushed the world-entity-mut-flushes branch from eddf012 to aca60b6 Compare November 24, 2024 22:47
@nakedible nakedible force-pushed the world-entity-mut-flushes branch 2 times, most recently from b90ed0b to 4668d66 Compare November 24, 2024 23:38
@nakedible
Copy link
Copy Markdown
Contributor Author

If #16499 gets merged before this one, one of the test cases will break, as it should when the ordering of hooks and observers is changed.

@nakedible
Copy link
Copy Markdown
Contributor Author

@alice-i-cecile -S-Waiting-on-Author +S-Needs-Review 🙇

@nakedible nakedible requested a review from maniwani November 25, 2024 17:57
@alice-i-cecile alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Nov 25, 2024
Copy link
Copy Markdown
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent tests and good analysis. I'd like to clean up the (very niche) panics here by swapping to results, but that's best left for follow-up.

@alice-i-cecile alice-i-cecile added the P-Unsound A bug that results in undefined compiler behavior label Dec 5, 2024
@hymm hymm self-requested a review December 5, 2024 18:21
Copy link
Copy Markdown
Contributor

@chescock chescock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't entirely understand what world.flush() does, but I do believe this PR calls assert_not_despawned(); before every read or write of the entity or its components and calls world.flush() and update_location() after every write, so self.location should always be correct.

The one exception is that the world_scope() method calls update_location() but not world.flush(). Should that call world.flush() as well?

#[inline(always)]
#[track_caller]
fn assert_not_despawned(&self) {
if self.location.archetype_id == ArchetypeId::INVALID {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use the is_despawned() method if you wanted.

Suggested change
if self.location.archetype_id == ArchetypeId::INVALID {
if self.is_despawned() {

@hymm
Copy link
Copy Markdown
Contributor

hymm commented Dec 5, 2024

An alternative to checking in every method if the entity has been despawned would be to panic after calling world.flush instead of checking in each method. This would disallow despawning the entity in hooks, but I think that makes sense that you wouldn't be able to despawn the EntityWorldMut while holding it.

@nakedible
Copy link
Copy Markdown
Contributor Author

I don't entirely understand what world.flush() does, but I do believe this PR calls assert_not_despawned(); before every read or write of the entity or its components and calls world.flush() and update_location() after every write, so self.location should always be correct.

The one exception is that the world_scope() method calls update_location() but not world.flush(). Should that call world.flush() as well?

World flush runs any Commands that have been queued. Commands are queued by hooks and observers if they want to do archetype changes, as they only get a DeferredWorld.

The world_scope function just allows a function to run with the world, but it doesn't necessarily queue any commands. If it does, it's the responsibility of the function to flush if necessary.

There has been talk of having commands apply immediately when holding a &mut World, like in this world_scope function, which would solve the whole issue of asking the user to flush.

@nakedible
Copy link
Copy Markdown
Contributor Author

An alternative to checking in every method if the entity has been despawned would be to panic after calling world.flush instead of checking in each method. This would disallow despawning the entity in hooks, but I think that makes sense that you wouldn't be able to despawn the EntityWorldMut while holding it.

It would be an alternative, but instead the aim is to go to another direction, to make the methods fallible instead.

Also, there are two other reasons.

First, there aren't any other ways to make archetype changes to entities but to take WorldEntityMut via world.entity_mut(ent). So if despawning the current entity would be totally disallowed, then there would not be a way to safely trigger such a change as you can't not hold a reference while doing it.

Second, you can't really prevent hooks and observers from making these types of changes in some special cases. So there isn't a workaround for that. Where as just dropping a WorldEntityMut after an archetype mutation operation is a trivial thing to do by the user.

@NthTensor
Copy link
Copy Markdown
Contributor

NthTensor commented Dec 5, 2024

FWIW i think panicking is fine here; there's similar stuff in the engine that already works this way. We can reconsider it in a broader review of bevy's error handling. But especially for guarding invariant going into unsafe code, panicking is probably the way to go.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 5, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 5, 2024
Merged via the queue into bevyengine:main with commit 76d610d Dec 5, 2024
ecoskey pushed a commit to ecoskey/bevy that referenced this pull request Jan 6, 2025
…6219)

# Objective

- Currently adding observers spawns an entity which implicitly flushes
the command queue, which can cause undefined behaviour if the
`WorldEntityMut` is used after this
- The reason `WorldEntityMut` attempted to (unsuccessfully) avoid
flushing commands until finished was that such commands may move or
despawn the entity being referenced, invalidating the cached location.
- With the introduction of hooks and observers, this isn't sensible
anymore as running the commands generated by hooks immediately is
required to maintain correct ordering of operations and to not expose
the world in an inconsistent state
- Objective is to make command flushing deterministic and fix the
related issues
- Fixes bevyengine#16212
- Fixes bevyengine#14621 
- Fixes bevyengine#16034

## Solution

- Allow `WorldEntityMut` to exist even when it refers to a despawned
entity by allowing `EntityLocation` to be marked invalid
- Add checks to all methods to panic if trying to access a despawned
entity
- Flush command queue after every operation that might trigger hooks or
observers
- Update entity location always after flushing command queue

## Testing

- Added test cases for currently broken behaviour
- Added test cases that flushes happen in all operations
- Added test cases to ensure hooks and commands are run exactly in
correct order when nested

---

Todo:

- [x] Write migration guide
- [x] Add tests that using `EntityWorldMut` on a despawned entity panics
- [x] Add tests that commands are flushed after every operation that is
supposed to flush them
- [x] Add tests that hooks, observers and their spawned commands are run
in the correct order when nested

---

## Migration Guide

Previously `EntityWorldMut` triggered command queue flushes in
unpredictable places, which could interfere with hooks and observers.
Now the command queue is flushed always immediately after any call in
`EntityWorldMut` that spawns or despawns an entity, or adds, removes or
replaces a component. This means hooks and observers will run their
commands in the correct order.

As a side effect, there is a possibility that a hook or observer could
despawn the entity that is being referred to by `EntityWorldMut`. This
could already currently happen if an observer was added while keeping an
`EntityWorldMut` referece and would cause unsound behaviour. If the
entity has been despawned, calling any methods which require the entity
location will panic. This matches the behaviour that `Commands` will
panic if called on an already despawned entity. In the extremely rare
case where taking a new `EntityWorldMut` reference or otherwise
restructuring the code so that this case does not happen is not
possible, there's a new `is_despawned` method that can be used to check
if the referred entity has been despawned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide P-Unsound A bug that results in undefined compiler behavior S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

None yet

7 participants