Skip to content

Add syntax for CTE materialization hint to ActiveRecord#48305

Open
97jaz wants to merge 1 commit intorails:mainfrom
97jaz:activerecord-materialized-cte
Open

Add syntax for CTE materialization hint to ActiveRecord#48305
97jaz wants to merge 1 commit intorails:mainfrom
97jaz:activerecord-materialized-cte

Conversation

@97jaz
Copy link
Contributor

@97jaz 97jaz commented May 26, 2023

Motivation / Background

Following on the Arel work in #48261, this commit adds the ability to provide a materialization hint to a CTE in ActiveRecord. To support the hint, a more verbose syntax for CTE values is introduced:

Post.with(posts_with_comments: { query: Post.where("comments_count > ?", 0), materialized: true })

The CTE value is described as a hash that contains a required :query key and an optional :materialized key. The set of available options could be expanded in the future to accommodate, for example, postgres's various options for recursive CTEs.

Additional information

I don't know whether there's any appetite within the core team to expose this functionality in AR's public API. But if there is, this approach seemed like the most plausible choice, given the current #with interface. That said, I'm more than happy to talk about alternatives.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

Following on the Arel work in rails#48261, this commit adds the ability
to provide a materialization hint to a CTE in ActiveRecord. To
support the hint, a more verbose syntax for CTE values is
introduced:

   Post.with(t: { query: Post.where("comments_count > ?", 0), materialized: true })

The CTE value is described as a hash that contains a required
:query key and an optional :materialized key. The set of available
options could be expanded in the future to accomodate, for example,
postgres's various options for recursive CTEs.
@benedikt
Copy link

I'm not a huge fan of the nested query/materialized options, but I totally get why you chose them.

Here's an alternative idea:

Post.with(posts_with_comments: Post.where("comments_count > ?", 0), materialized: true)

This would materialize all CTEs mentioned in this particular with call. Mixing and matching materialized would require multiple separate calls:

Post.with(posts_with_comments: Post.where("comments_count > ?", 0), materialized: true)
    .with(posts_without_comments: Post.where("comments_count = ?", 0), materialized: false)

It reads a bit nicer in my opinion. However, it also makes it impossible to use materialized (or any future options) as a name for a CTE.

@97jaz
Copy link
Contributor Author

97jaz commented May 26, 2023

However, it also makes it impossible to use materialized (or any future options) as a name for a CTE.

Yeah, this is why I chose the more verbose syntax (which, I agree is ungainly). In particular, conflating the namespace for CTE names with the namespace for options makes adding new options backwards-incompatible. Strictly speaking, we could use a mixture of hash and keyword arguments, like:

def with(*args, materialized: nil)
  [...]
end

In which case, a caller can distinguish between use of the keyword and use of the hash by being more explicit with the hash syntax:

x.with({ materialized: [...] }, materialized: true)

... but, then, it would still be dangerous to add new options, since we wouldn't expect programmers to use the explicit (braced) syntax most of the time.

@97jaz
Copy link
Contributor Author

97jaz commented May 26, 2023

Another option in the design space is to change the definition of #with to:

def with(cte_hash, opts = {})

That is, change #with so that it only takes a single hash of name-value pairs that describe CTEs. It then takes a second, optional hash of options. Passing options is uglier in this version, but that seems preferable to the previous version, since (a) options will be less frequently used, and (b) adding new ones doesn't break existing code:

x.with(materialized: Post.all, { materialized: true })

This is, however, incompatible with the current definition, which takes a variable number of CTE hashes.

@benedikt
Copy link

I feel like this has already been implicitly ruled out by discussions in #47951 and #48261, but this would also work:

Post.with(posts_with_comments: Post.where("comments_count > ?", 0).materialized)

@97jaz
Copy link
Contributor Author

97jaz commented Jun 12, 2023

@matthewd Don't know if you have an opinion on this, but since you commented on the arel part, I figured I'd ask.

@matthewd
Copy link
Member

So I've been pondering this, and I find myself drawn in somewhat the opposite direction I pushed with #48261... 🤔

I kinda like this:

Post.with(posts_with_comments: Post.where("comments_count > ?", 0).materialized)

If we did go this way (a Relation#materialized property), we should definitely check that with accepts a raw Arel::Nodes::Cte, for cases where you want to include materialization on a non-Relation argument.

The particular reason I'm revisiting the idea of a Relation#materialized is not CTEs at all. Rather, I'm exploring the thought of trying to support it, where feasible, as a general optimization barrier.

Where the query is being passed into a CTE, we can obviously achieve that via our new materialized option... otherwise, it seems like a pretty common thing for SQL dialects to support in practice, even if not officially. I don't know if it's still true, but historically in PostgreSQL this was spelled OFFSET 0; I understand the same effect can be produced in MySQL by double-nesting the subquery (id IN (SELECT * FROM (SELECT id ..) q)).

Thoughts? Does this sound useful, or just a fruitless mental detour?

@97jaz
Copy link
Contributor Author

97jaz commented Jun 27, 2023

I see the appeal of ActiveRecord::Relation#materialized as a general optimization barrier, but in most contexts this would be a binary option; you're either adding the barrier or you're not. When using #with, however, it's a ternary option; you're either requesting a barrier, requesting no barrier, or not making any request at all. ActiveRecord::Relation#materialized could take an argument (true, false, or nil), and it could default to true, resulting in the example syntax. But in non-WITH contexts, there's no distinction to be drawn between the false and nil cases. So it seems to me that the use cases really aren't quite the same.

@erick-c-mc
Copy link

Hello @97jaz! Are there plans to move this forward?

@97jaz
Copy link
Contributor Author

97jaz commented Jul 17, 2024

@erick-c-mc After my last comment, I was waiting to see if @matthewd had a response, and then I kind of forgot about it.

@Jell
Copy link

Jell commented Jul 26, 2024

I ended up needing this feature in one of my projects recently, reading through the conversations I was wondering if we could consider accepting Arel::Nodes::Cte as argument to ActiveRecord::QueryMethods#with? And then maybe add a ActiveRecord::QueryMethods#as_cte to get an instance of Arel::Nodes::Cte from a relation.

It would look like this:

Post.with(Post.where("comments_count > ?", 0).as_cte(:post_with_comment))
Post.with(Post.where("comments_count > ?", 0).as_cte(:post_with_comment, materialized: true))
Post.with(Post.where("comments_count > ?", 0).as_cte(:post_with_comment, materialized: false))
Post.with_recursive(Post.where("comments_count > ?", 0).as_cte(:post_with_comment))
Post.with_recursive(Post.where("comments_count > ?", 0).as_cte(:post_with_comment, materialized: true))
Post.with_recursive(Post.where("comments_count > ?", 0).as_cte(:post_with_comment, materialized: false))

I have a PoC here: https://github.com/Jell/rails/pull/1/files if you think that could be a good approach I could try to polish it and make a PR, or feel free to steal my code if you think it's a reasonable approach!

@Jell
Copy link

Jell commented Jul 26, 2024

that approach would be totally backward compatible, and allow mixing Hash options with Arel::Nodes::Cte arguments:

Post.with(
  Post.where("comments_count > ?", 0).as_cte(:post_with_comment, materialized: true),
  posts_without_comments: Post.where("comments_count = ?", 0)
)

@alexandreabeh
Copy link

Hi @97jaz , any update on plans to move forward with this, considering recent information from @Jell ? Thanks!

@97jaz
Copy link
Contributor Author

97jaz commented Jan 15, 2025

@Jell Sorry -- I haven't looked at this, but I like the idea.

@97jaz
Copy link
Contributor Author

97jaz commented Jan 15, 2025

@Jell Okay, now I have looked at it, and I like it even more.

@97jaz
Copy link
Contributor Author

97jaz commented Jan 21, 2025

@Jell Are you planning to put up a PR?

@Jell
Copy link

Jell commented Jan 21, 2025

@Jell Are you planning to put up a PR?

I wasn't but I can give it a shot now! It should just be a matter of rebasing the PR I made, split the change into a better commit history and write some tests...

@Jell Jell mentioned this pull request Jan 21, 2025
4 tasks
@Jell
Copy link

Jell commented Jan 21, 2025

@97jaz I made a best effort attempt at a PR with my proposal here: #54322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants