Skip to content

Add EAGER_BATCHED fetchmode to OneToMany association#1569

Closed
woutervanvliet wants to merge 4 commits intodoctrine:masterfrom
woutervanvliet:EAGER_BATCHED
Closed

Add EAGER_BATCHED fetchmode to OneToMany association#1569
woutervanvliet wants to merge 4 commits intodoctrine:masterfrom
woutervanvliet:EAGER_BATCHED

Conversation

@woutervanvliet
Copy link
Copy Markdown
Contributor

In regular EAGER mode, doctrine adds a JOIN for a OneToMany association
forcing the database to load more data in memory than essentially
needed. In a classic Article => Tag concept, each article would be
loaded one extra time for every addition tag.

LAZY mode issues a query for every article to load all associated tags,
also adding more overhead to the database than needed.

This new fetchmode addreses this issue by loading all tags associated to
an article into one additional query.

In regular EAGER mode, doctrine adds a JOIN for a OneToMany association
forcing the database to load more data in memory than essentially
needed. In a classic Article => Tag concept, each article would be
loaded one extra time for every addition tag.

LAZY mode issues a query for every article to load all associated tags,
also adding more overhead to the database than needed.

This new fetchmode addreses this issue by loading all tags associated to
an article into one additional query.
@doctrinebot
Copy link
Copy Markdown

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DDC-4017

We use Jira to track the state of pull requests and the versions they got
included in.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use tabs

@EVODelavega
Copy link
Copy Markdown

Please follow the coding standards: indent with 4 spaces, replace Array() with array() or [], and put opening brackets for methods/classes on a separate line. (check doctrine coding standards for details)

    public function scheduleCollectionForBatchLoading(PersistentCollection $collection) {
        $mapping = $collection->getMapping();
        $name = $mapping['sourceEntity'].$mapping['fieldName'];

        if (!isset($this->batchedEagerLoadingEntities[$name])) {
            $this->batchedEagerLoadingEntities[$name] = Array(
                'items'     => Array(),
                'mapping'   => $mapping,
            );
        }

        $this->batchedEagerLoadingEntities[$name]['items'][] = $collection;
    }

Should be:

    //indentation changed to spaces
    public function scheduleCollectionForBatchLoading(PersistentCollection $collection)
    {//<-- new line
        $mapping = $collection->getMapping();
        //surround with spaces
        $name = $mapping['sourceEntity'] . $mapping['fieldName'];

        if (!isset($this->batchedEagerLoadingEntities[$name])) {
            $this->batchedEagerLoadingEntities[$name] = array(//array or [
                'items'     => array(),//array or [],
                'mapping'   => $mapping,
            );
        }

        $this->batchedEagerLoadingEntities[$name]['items'][] = $collection;
    }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing doc-block

And other responses to the PR
@woutervanvliet
Copy link
Copy Markdown
Contributor Author

@EVODelavega Thanks! Now I also understand what you meant with your comment on Array, back on Stackoverflow. I've amended as good as possible.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$tags needs to be initialized in the constructor to a new instance of the Doctrine\Common\Collections\ArrayCollection class. See doctrine documentation for details, especially 5.13 as to why this is required

@EVODelavega
Copy link
Copy Markdown

I must say that the test-case isn't really a good fit here. I'd expect Articles and Tags to be either a unidirectional one-to-many relation, or a bidirectional many-to-many relation (A collection of tags, zero or more of which can be linked to articles, tags not being linked to articles OR articles being linked to zero or more tags, each tag being linked to one or more articles).
The idea that each tag can have only one article linked to it seems to me to be quite absurd.

This in turn begs the question, if we apply this EAGER_BATCHED mode to many-to-many relation entities, what the possible headaches would be when cascading things like DETACH:

$article1 = $em->find('Article', 1);
$article2 = $em->find('Article', 2);
//if $article2 shares tags with $article1, this might throw an exception:

$em->detach($article1);
//if this tag is shared, and detach cascades, wouldn't the tag instance be unmanaged here?
$article2->getTags()->get(1)->setName('new tag name');
$em->persist($article2);
$em->flush();//if the altered tag is indeed unmanaged, this throws an exception

@woutervanvliet
Copy link
Copy Markdown
Contributor Author

The idea that each tag can have only one article linked to it seems to me to be quite absurd.

That depends on how one defines Tag. To me, a Tag is a word linked to an Article. The same word may be linked to more articles, in which case it is their own Tag instance. Hence the shared primary key, over $name and $article. And, as you can see, the test case actually creates 100 articles all with the exact same 7 words as tag.

As per the doctrine documentation you linked, one-to-many has to be bidirectional.

@EVODelavega
Copy link
Copy Markdown

@interpotential One-to-many doesn't have to be bidirectional (5.6). But aside of that, the current way your entities are set up, you might end up with a series of tags like this:

Name article
foo 1
foo 2
foo 3

These are the exact same words, but the're unique keys (because of the article id). That, to me, seems wasteful. I'd probably use a relational table just storing an article id + tag id like so:

article_id tag_id
1 1
1 2
2 1
2 3

To avoid tag duplication, but this is turning into a DB design discussion, instead of a review of a PR

@woutervanvliet
Copy link
Copy Markdown
Contributor Author

True - though a discussion about DB design is also interesting ;-).

Btw, with all your very useful feedback I'm surprised you overlooked me having misnamed the unit test. Update coming up, once travis-ci has told me that it's ok.

@bas-noorlander
Copy link
Copy Markdown

Any updates on this?

@Koc
Copy link
Copy Markdown
Contributor

Koc commented Jul 23, 2017

ping @Ocramius

@Ocramius
Copy link
Copy Markdown
Member

@Koc @Laniax I didn't look into this at all, sorry.

@natebrunette
Copy link
Copy Markdown

Is there a plan to review this? It would be very useful for us.

@Ocramius
Copy link
Copy Markdown
Member

Sorry, I really never got around checking this, but overall my focus is on removing features before we add anything new.

@antonmedv
Copy link
Copy Markdown

This is must have feature. It will make everything so easier.

@beberlei
Copy link
Copy Markdown
Member

The simplicity of this is beautiful, as it just needs one point in the "hotpath" to hook into to get it working. The additional code to triggerEagerLoads looks overly complex to me and use some non-Doctrineisms, but I believe this is something we should look into.

In SQL Alchemy this strategy is called "SELECT IN loading": https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html

@beberlei beberlei added this to the 2.8.0 milestone Feb 16, 2020
@arthurlauck
Copy link
Copy Markdown

This would be a amazing functionallity, I used a lot of Eloquent from laravel, and this is a nice way to load relations. Im developing a new applications using doctrine and I stumbled in a problem.

I have a one-to-many relation, but the one is a pagination, so the solution would be use a joined eager load, so it could run in 1 queries like:

select * from left join product_image product;

The would be ok, but is hard to cache, if any of the product_image change, I need to invalidate all this collection, in the select in method this would be so much easier.

select * from product; and select * from product_image where id in ({ids}), so I could separeted in two differentes queries, if any image change, I wouldn't need to invalidate product collection cache, just the product image cache.

@beberlei beberlei modified the milestones: 2.8.0, 2.8.1 Dec 4, 2020
@beberlei beberlei modified the milestones: 2.8.1, 2.8.2 Dec 4, 2020
@beberlei beberlei modified the milestones: 2.8.2, 2.9.0 Dec 13, 2020
beberlei added a commit to beberlei/doctrine2 that referenced this pull request Dec 13, 2020
beberlei added a commit to beberlei/doctrine2 that referenced this pull request Dec 13, 2020
…ions.

Co-authored-by: Wouter M. van Vliet <wouter@interpotential.com>
@beberlei
Copy link
Copy Markdown
Member

Superseded by #8391

@beberlei beberlei closed this Dec 13, 2020
@beberlei beberlei removed this from the 2.9.0 milestone Dec 13, 2020
maddanio pushed a commit to maddanio/orm that referenced this pull request Feb 24, 2023
…ions.

Co-authored-by: Wouter M. van Vliet <wouter@interpotential.com>
maddanio pushed a commit to maddanio/orm that referenced this pull request Feb 24, 2023
…ions.

Co-authored-by: Wouter M. van Vliet <wouter@interpotential.com>
beberlei added a commit to beberlei/doctrine2 that referenced this pull request Oct 10, 2023
…ions.

Co-authored-by: Wouter M. van Vliet <wouter@interpotential.com>
beberlei added a commit to beberlei/doctrine2 that referenced this pull request Oct 14, 2023
greg0ire added a commit that referenced this pull request Nov 14, 2023
[GH-1569] Optimize eager fetch for collections to batch query
derrabus added a commit to derrabus/orm that referenced this pull request Nov 15, 2023
* 2.17.x:
  Deprecate annotation classes for named queries
  Fix typos
  Housekeeping: Revert change to AbstractExporter, not needed without subselect fetch.
  Address review comments.
  Explain internals of eager loading in a bit more detail and how its configured.
  1:1 and M:1 associations also use fetch batch size configuration now.
  Add another testcase for DQL based fetch eager of collection.
  last violation hopefully
  Static analysis
  Housekeeping: phpcs
  Directly load many to many collections, batching not supported yet. fix tests.
  Avoid new fetch mode, use this strategy with fetch=EAGER for collections.
  Make sure to many assocatinos are also respecting AbstractQuery::setFetchMode
  Disallow use of fetch=SUBSELECT on to-one associations.
  Go through Persister API instead of indirectly through repository.
  Introduce configuration option for subselect batch size.
  Houskeeping: phpcs
  Disallow WITH keyword on fetch joined associatiosn via subselect.
  [doctrineGH-1569] Add new SUBSELECT fetch mode for OneToMany associations.
derrabus added a commit to derrabus/orm that referenced this pull request Nov 15, 2023
* 2.17.x:
  Deprecate annotation classes for named queries
  Fix typos
  Housekeeping: Revert change to AbstractExporter, not needed without subselect fetch.
  Address review comments.
  Explain internals of eager loading in a bit more detail and how its configured.
  1:1 and M:1 associations also use fetch batch size configuration now.
  Add another testcase for DQL based fetch eager of collection.
  last violation hopefully
  Static analysis
  Housekeeping: phpcs
  Directly load many to many collections, batching not supported yet. fix tests.
  Avoid new fetch mode, use this strategy with fetch=EAGER for collections.
  Make sure to many assocatinos are also respecting AbstractQuery::setFetchMode
  Disallow use of fetch=SUBSELECT on to-one associations.
  Go through Persister API instead of indirectly through repository.
  Introduce configuration option for subselect batch size.
  Houskeeping: phpcs
  Disallow WITH keyword on fetch joined associatiosn via subselect.
  [doctrineGH-1569] Add new SUBSELECT fetch mode for OneToMany associations.
derrabus added a commit to derrabus/orm that referenced this pull request Nov 15, 2023
* derrabus/3.0.x:
  Deprecate annotation classes for named queries
  Fix typos
  Housekeeping: Revert change to AbstractExporter, not needed without subselect fetch.
  Address review comments.
  Explain internals of eager loading in a bit more detail and how its configured.
  1:1 and M:1 associations also use fetch batch size configuration now.
  Add another testcase for DQL based fetch eager of collection.
  last violation hopefully
  Static analysis
  Housekeeping: phpcs
  Directly load many to many collections, batching not supported yet. fix tests.
  Avoid new fetch mode, use this strategy with fetch=EAGER for collections.
  Make sure to many assocatinos are also respecting AbstractQuery::setFetchMode
  Disallow use of fetch=SUBSELECT on to-one associations.
  Go through Persister API instead of indirectly through repository.
  Introduce configuration option for subselect batch size.
  Houskeeping: phpcs
  Disallow WITH keyword on fetch joined associatiosn via subselect.
  [doctrineGH-1569] Add new SUBSELECT fetch mode for OneToMany associations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.