-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Context
I'm currently working on a codebase where I identified a repeated INSERT operation being performed.
As you may know, I've been a long time advocate for using UUIDs (and specifically, UUID5 / UUID6) for synthetic entity identifiers.
Due to recent advancements in profiling technology (Sentry, Tideways, etc.), I've managed to find a typical N+1 query:
INSERT INTO foo (bar, baz) VALUES (?, ?);
INSERT INTO foo (bar, baz) VALUES (?, ?);
-- ...
-- repeats until entities of the same type are goneWhat can be done?
When AUTO_INCREMENT, SERIAL or post-insert identifiers in general aren't necessary, we can perform bulk / batch INSERT operations: this reduces the number of roundtrips to the DB, which is mostly network time.
Current behavior
I've traced the operation as follows:
UnitOfWorkcomputes insert order:Lines 1038 to 1040 in 05c8c5f
private function executeInserts(): void { $entities = $this->computeInsertExecutionOrder(); computeInsertExecutionOrderproduces a sorted list ofINSERToperations:Lines 1191 to 1246 in 05c8c5f
private function computeInsertExecutionOrder(): array { $sort = new TopologicalSort(); // First make sure we have all the nodes foreach ($this->entityInsertions as $entity) { $sort->addNode($entity); } // Now add edges foreach ($this->entityInsertions as $entity) { $class = $this->em->getClassMetadata($entity::class); foreach ($class->associationMappings as $assoc) { // We only need to consider the owning sides of to-one associations, // since many-to-many associations are persisted at a later step and // have no insertion order problems (all entities already in the database // at that time). if (! $assoc->isToOneOwningSide()) { continue; } $targetEntity = $class->getFieldValue($entity, $assoc->fieldName); // If there is no entity that we need to refer to, or it is already in the // database (i. e. does not have to be inserted), no need to consider it. if ($targetEntity === null || ! $sort->hasNode($targetEntity)) { continue; } // An entity that references back to itself _and_ uses an application-provided ID // (the "NONE" generator strategy) can be exempted from commit order computation. // See https://github.com/doctrine/orm/pull/10735/ for more details on this edge case. // A non-NULLable self-reference would be a cycle in the graph. if ($targetEntity === $entity && $class->isIdentifierNatural()) { continue; } // According to https://www.doctrine-project.org/projects/doctrine-orm/en/2.14/reference/annotations-reference.html#annref_joincolumn, // the default for "nullable" is true. Unfortunately, it seems this default is not applied at the metadata driver, factory or other // level, but in fact we may have an undefined 'nullable' key here, so we must assume that default here as well. // // Same in \Doctrine\ORM\Tools\EntityGenerator::isAssociationIsNullable or \Doctrine\ORM\Persisters\Entity\BasicEntityPersister::getJoinSQLForJoinColumns, // to give two examples. $joinColumns = reset($assoc->joinColumns); $isNullable = ! isset($joinColumns->nullable) || $joinColumns->nullable; // Add dependency. The dependency direction implies that "$entity depends on $targetEntity". The // topological sort result will output the depended-upon nodes first, which means we can insert // entities in that order. $sort->addEdge($entity, $targetEntity, $isNullable); } } return $sort->sort(); } EntityPersister#addInsert()is called in a loop:Lines 1038 to 1048 in 05c8c5f
private function executeInserts(): void { $entities = $this->computeInsertExecutionOrder(); $eventsToDispatch = []; foreach ($entities as $entity) { $oid = spl_object_id($entity); $class = $this->em->getClassMetadata($entity::class); $persister = $this->getEntityPersister($class->name); $persister->addInsert($entity); - in the same loop,
EntityPersister#executeInserts()is called:Lines 1048 to 1052 in 05c8c5f
$persister->addInsert($entity); unset($this->entityInsertions[$oid]); $persister->executeInserts(); EntityPersister#executeInserts()creates a prepared statement, and fires it at the DB:orm/src/Persisters/Entity/BasicEntityPersister.php
Lines 237 to 285 in 05c8c5f
public function executeInserts(): void { if (! $this->queuedInserts) { return; } $uow = $this->em->getUnitOfWork(); $idGenerator = $this->class->idGenerator; $isPostInsertId = $idGenerator->isPostInsertGenerator(); $stmt = $this->conn->prepare($this->getInsertSQL()); $tableName = $this->class->getTableName(); foreach ($this->queuedInserts as $key => $entity) { $insertData = $this->prepareInsertData($entity); if (isset($insertData[$tableName])) { $paramIndex = 1; foreach ($insertData[$tableName] as $column => $value) { $stmt->bindValue($paramIndex++, $value, $this->columnTypes[$column]); } } $stmt->executeStatement(); if ($isPostInsertId) { $generatedId = $idGenerator->generateId($this->em, $entity); $id = [$this->class->identifier[0] => $generatedId]; $uow->assignPostInsertId($entity, $generatedId); } else { $id = $this->class->getIdentifierValues($entity); } if ($this->class->requiresFetchAfterChange) { $this->assignDefaultVersionAndUpsertableValues($entity, $id); } // Unset this queued insert, so that the prepareUpdateData() method knows right away // (for the next entity already) that the current entity has been written to the database // and no extra updates need to be scheduled to refer to it. // // In \Doctrine\ORM\UnitOfWork::executeInserts(), the UoW already removed entities // from its own list (\Doctrine\ORM\UnitOfWork::$entityInsertions) right after they // were given to our addInsert() method. unset($this->queuedInserts[$key]); } }
The plan
The idea is to either change UnitOfWork#executeInserts() or UnitOfWork#computeInsertExecutionOrder() to produce a multidimensional array of micro-batches, where batches are only possible:
- for the same entity type
- for entities with an already assigned identifier
I would then extract the EntityPersister#executeInserts() call, so that it does not get performed in a loop (where applicable), and therefore internally reuses the same prepared statement.
I can then also optimize the EntityPersister#executeInserts() logic to generate SQL for a full batch (single statement).
The plan does NOT involve any DBAL API addition/change.
The question
Is there interest in this optimization?
Specifically, can I work on it, or is the plan flawed? This is mostly probing whether there is interest in this.