Skip to content

Compute the commit order (inserts/deletes) on the entity level#10547

Merged
greg0ire merged 35 commits intodoctrine:2.16.xfrom
mpdude:commit-order-entity-level
Aug 1, 2023
Merged

Compute the commit order (inserts/deletes) on the entity level#10547
greg0ire merged 35 commits intodoctrine:2.16.xfrom
mpdude:commit-order-entity-level

Conversation

@mpdude
Copy link
Copy Markdown
Contributor

@mpdude mpdude commented Feb 27, 2023

Fixes some long-standing issues that have to do with commit order computation and cyclic associations.

When performing inserts or deletes of entities, the ORM tries to find a sequence of operations so that all foreign key constraints can be satisfied. Basically, before inserting an entity, all other entities it refers to have to be inserted. And before deleting an entity, all other entities that refer to it have to be deleted, or database-level SET NULL has to be used.

The current implementation for this works at the entity class level. It checks which entity class refers to which other class(es), and based on this information, computes an ordering of entity classes. Then, all entities of the first class will be processed, then those of the second class and so on. The same ordering is used for deletions.

In the PRs #10531 and #10532, two examples are given why this is not sufficient: There may be references between entities of the same class, so that we can only insert those in the database by looking at their associations at the entity level. Or, we may have references between entities of two different classes that require us to go back and forth – insert one entity of the first class, before we can insert one for the second class, before we can go back to the first class for a third entity.

Then, there is a shortcoming in the current CommitOrderCalculator: It does not detect cycles at all. This refers to situations where we cannot find a commit order because every single entity to be processed requires another one to come first. Cycles can be broken at NULLable foreign keys, where a NULL value can temporarily be used and the value updated at a later stage. This is called "extra updates". Places where such extra updates can be used to save the day can be found by backtracking in the commit order computation, in case cycles are found.

Last, the commit order for INSERTs and DELETEs needs to be based on different criteria: INSERTs can make use of extra updates and require NULLable foreign key definitions for this to work. DELETEs, however, currently do not make use of such extra updates. For them, whether the database can use "SET NULL" on foreign key deletions is the important factor.

To make it possible to review the changes, they were broken down into a series of independent PRs against a temporary branch. This PR here does a final merge of the result into 2.16.x.

The individual PRs are:

(Potentially) fixed issues/closed PRs:

Not fixed issues/PRs:

(Probably) related issues/PR with unclear state and/or missing reproducers:

Definitely too big for this PR:

  • Performance: Persisters no longer being able to batch inserts
  • Performance: Optimizing the DFS to find a minimal feedback arc set/minimize the number of scheduled extra updates

@rvanlaak
Copy link
Copy Markdown
Contributor

rvanlaak commented Feb 28, 2023

I am a bit rusty on the topic regarding the debugging of the commit order, but based on looking at the CommitOrderCalculator your suggested changes look extremely promising in solving the long standing commit order bug. 👏🏻

Copy link
Copy Markdown
Member

@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll need to review this with more time than I have right now. Awesome work. 🤯

@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Mar 1, 2023

@derrabus Yes, this PR is much bigger than what I'd like to see to make reviews easy.

I did not find a good way to break it down into chunks that would make sense on their own and could be merged. Computing the order on the entity level with a broken CommitOrderCalculator does not work. Fixing the COC will break use cases that currently work as long as we don't consider individual entities.

What I could offer is that I try to break this down into pieces and guide you and everyone interested through reviews in dedicated PRs, with smaller scope each.

We'd need to merge all those PRs into a dedicated branch when they are approved. Tests for individual features probably can be working, but the full test suite won't pass until we're done.

In the end, the temporary branch could be merged and should roughly be equivalent to this PR here.

WDYT – would that help, or do you have other suggestions?

@derrabus
Copy link
Copy Markdown
Member

derrabus commented Mar 1, 2023

I could also offer to setup a Google Meet call where you can guide me and maybe @greg0ire and @beberlei through the changes. That might be more effective.

jhony1104
jhony1104 previously approved these changes Mar 7, 2023
Copy link
Copy Markdown

@jhony1104 jhony1104 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no complete review, but since @mpdude asked if I could look at it:

Only looking at the CommitOrderCalculator, unlike #8349, #7866, #8067, #7180, #9377 and others this PR correctly deals with circles using backtracing and if needed ignores nullable relations. Unlike some of the listet PRs it does not only work reliably for (simple) testcases, but I have also testet it in our product with ~130 tables (where for example #8349 didn't work).

Merging this would allow us to start using unmodified doctrine.

derrabus pushed a commit that referenced this pull request Jul 8, 2023
…hen using application-provided IDs (#10735)

This change improves scheduling of extra updates in the `BasicEntityPersister`.

Extra updates can be avoided when
* the referred-to entity has already been inserted during the current insert batch/transaction
* we have a self-referencing entity with application-provided ID values (the `NONE` generator strategy).

As a corollary, with this change applications that provide their own IDs can define self-referencing associations as not NULLable.

I am considering this a bugfix since the ORM previously executed additional queries that were not strictly necessary, and that required users to work with NULLable columns where conceptually a non-NULLable column would be valid and more expressive.

One caveat, though:

In the absence of entity-level commit ordering (#10547), it is not guaranteed that entities with self-references (at the class level) will be inserted in a suitable order. The order depends on the sequence in which the entities were added with `persist()`.

Fixes #7877, closes #7882.

Co-authored-by: Sylvain Fabre <sylvain.fabre@assoconnect.com>
@sylfabre
Copy link
Copy Markdown
Contributor

Hello @mpdude

Thank you for addressing this long-standing issue 👍

I tried your branch as per your comments on #10714 & #10716
Unfortunately, I get an exception in Statement.php with this message:

 [PDOException (42000)] SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '?, ?, ?, ?, ?)' at line 1

This comes from the insert statement INSERT INTO payment_settings (updated_at, retribution_method, id, created_at, organization_id) VALUES (?, ?, ?, ?, ?) which is done with an empty params array 😬

The same code works using doctrine/orm v2.15.3

So I cannot confirm if your PR solves my issues #10713 & #10715

Let me know how I may help you.
Our database contains over 250 tables so it can be a strong real-world test!

@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Jul 16, 2023

@sylfabre Can you provide a test (as a PR against the entity-level-commit-order branch) that demonstrates the issue?

The test outlined in #10713 works directly with the CommitOrderCalculator, a class replaced by this PR here.

@sylfabre
Copy link
Copy Markdown
Contributor

Hello @mpdude

I'm sorry my last message was a false positive 👌

After checking my codebase, I discovered a postPersist listener triggering a flush operation. It leads to trying to insert two times the same entity: the first time works, the second fails because the entity changeset is now empty so the insert query is missing a lot of params. I guess flushing within a postPersist listener is a bad practice and we are just lucky it works today!

Thank you again for your work

…cation-provided IDs

This excludes such associations from the commit order computation, since the foreign key constraint will be satisfied when inserting the row.

See doctrine#10735 for more details about this edge case.
@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Aug 1, 2023

Added f01d107 to address a new edge case supported in 2.16 since #10735.

@greg0ire greg0ire merged commit fd7a14a into doctrine:2.16.x Aug 1, 2023
@greg0ire
Copy link
Copy Markdown
Member

greg0ire commented Aug 1, 2023

Congrats on this huge piece of work, and welcome back!

@rvanlaak
Copy link
Copy Markdown
Contributor

rvanlaak commented Aug 1, 2023

Congrats! This is an awesome step in the reliability of the commit order.

@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Aug 1, 2023

Thanks to everyone involved and supporting this over the last months with many reviews in a very constructive and responsive manner – especially @greg0ire, @SenseException and @derrabus!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

7 participants