Skip to content

Add support in Arel for materialized subqueries in CTEs#47951

Closed
97jaz wants to merge 1 commit intorails:mainfrom
97jaz:arel-materialized-ctes
Closed

Add support in Arel for materialized subqueries in CTEs#47951
97jaz wants to merge 1 commit intorails:mainfrom
97jaz:arel-materialized-ctes

Conversation

@97jaz
Copy link
Contributor

@97jaz 97jaz commented Apr 15, 2023

Motivation / Background

Postgres and SQLite both support a non-standard extension to the CTE syntax to indicate that a CTE subquery should be materialized, i.e., not folded into the main query but evaluated separately. This can be useful in cases where the query planner would otherwise make poor decisions.

The syntax, in both databases, is:
WITH foo AS MATERIALIZED (...) ...

Detail

This PR adds support by introducing a new unary Materialized node that can wrap a subquery like so:

posts = Arel::Table.new(:posts)
comments = Arel::Table.new(:comments)
good_comments = Arel::Table.new(:good_comments)
subquery = comments.project(Arel.star).where(comments[:rating].gt(7))
materialized_subquery = Arel::Nodes::Materialized.new(subquery)

posts.
  project(Arel.star).
  with(Arel::Nodes::As.new(good_comments, materialized_subquery)).
  where(posts[:id].in(good_comments.project(:post_id))).
  to_sql

# "WITH \"good_comments\" AS MATERIALIZED (SELECT * FROM \"comments\" WHERE \"comments\".\"rating\" > 7) SELECT * FROM \"posts\" WHERE \"posts\".\"id\" IN (SELECT post_id FROM \"good_comments\")"

MySQL doesn't support this SQL extension. I've taken the approach of simply omitting the MATERIALIZED keyword when generating SQL for MySQL.

Checklist

Before submitting the PR make sure the following are checked:

  • [ x ] This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • [ x ] Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • [ x ] Tests are added or updated if you fix a bug or add a feature.
  • [ x ] CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@97jaz 97jaz force-pushed the arel-materialized-ctes branch 2 times, most recently from 7d0caec to 2e52c16 Compare April 15, 2023 07:04
Postgres and SQLite both support a non-standard extension to the
CTE syntax to indicate that a CTE subquery should be materialized,
i.e., not folded into the main query but evaluated separately.
This can be useful in cases where the query planner would
otherwise make poor decisions.

The syntax, in both databases, is:
  `WITH foo AS MATERIALIZED (...) ...`

This PR adds support by introducing a new unary Materialized node
that can wrap a subquery like so:

   ```ruby
   posts = Arel::Table.new(:posts)
   comments = Arel::Table.new(:comments)
   good_comments = Arel::Table.new(:good_comments)
   subquery = comments.project(Arel.star).where(comments[:rating].gt(7))
   materialized_subquery = Arel::Nodes::Materialized.new(subquery)

   posts.
     project(Arel.star).
     with(Arel::Nodes::As.new(good_comments, materialized_subquery)).
     where(posts[:id].in(good_comments.project(:post_id))).
     to_sql

   # "WITH \"good_comments\" AS MATERIALIZED (SELECT * FROM \"comments\" WHERE \"comments\".\"rating\" > 7) SELECT * FROM \"posts\" WHERE \"posts\".\"id\" IN (SELECT post_id FROM \"good_comments\")"
   ```
@97jaz 97jaz force-pushed the arel-materialized-ctes branch from 2e52c16 to b572c64 Compare April 18, 2023 15:53
@97jaz
Copy link
Contributor Author

97jaz commented Apr 25, 2023

Anyone willing to review this?

@benedikt
Copy link

benedikt commented May 4, 2023

Looks good to me and would indeed be useful! 👍

@97jaz
Copy link
Contributor Author

97jaz commented May 8, 2023

Thanks @benedikt

@simi
Copy link
Contributor

simi commented May 13, 2023

Hello!

If I understand it well, this adds only to Arel private API and in theory it should not be used outside of the Arel codebase. Are there any plans to expose this on ActiveRecord public API side as well?

@97jaz
Copy link
Contributor Author

97jaz commented May 18, 2023

@simi That's right, this PR is arel-only. I figure this is a first step towards exposing the functionality in AR.

@demillir
Copy link

I could definitely use CTE materialization in my current application, which dynamically constructs deeply nested CTEs on very large data tables.

So consider this my vote for merging.

@matthewd
Copy link
Member

Arel has a bit of a mixture of nodes that represent semantics and nodes that directly correspond to keywords.

The node proposed in this PR is the latter, but I'd like to explore the idea of shifting more to prefer the former: keyword-shaped nodes force deeper node trees, and feel like they miss out on the theoretical translatability of the more "original" nodes. (If MySQL were to introduce this feature with a different spelling -- or indeed, if we were to try to implement Oracle's WITH cte AS (SELECT /*+ MATERIALIZE */ ..), the visitors would likely need to work around, rather than with, the extra node layer.)

I don't know whether this is a good or bad idea, but I'm wondering about instead introducing a CTE(name, subquery, materialized: nil) node or so (absorbing the also-keywordy and already-awkward AS node along the way). That feels like a small step in shifting the node tree back to focusing on semantics, and leaving syntax to the (variously dialect-specific) visitors.

That particular spelling would also leave open the option of materialized: false -> NOT MATERIALIZED... which seems like something we'd eventually want to support, and neither of NotMaterialized(q) or Not(Materialized(q)) seems remotely palatable to me.

This is all very much thinking-out-loud, not Canonical Core Opinion, though... wdyt?

@97jaz
Copy link
Contributor Author

97jaz commented May 19, 2023

@matthewd I agree that this would be a nicer approach, both because it requires less nesting of nodes and because it rules out more nonsensical syntactic constructions. A couple of questions/comments:

  • Are you thinking about this as a backwards-incompatible change to the current way of building CTEs in Arel? Or do you think that #with should (also) accept a new kind of node?
  • The materialized value seems to be tri-state, not binary. That is, there's a semantic distinction (in PG, at least) between WITH foo AS NOT MATERIALIZED (<relation>) and WITH foo AS (<relation>). I think that a materialized keyword argument that distinguishes between true, false and nil might be a bit awkward of an API. Is there an existing convention in Arel for some kind of enumerated type? Symbols?

[Edit]
Well, after thinking about this for a bit, I'm not sure if the tri-state boolean is so awkward. At any rate, I'm currently putting together a PR that implements this.

@97jaz
Copy link
Contributor Author

97jaz commented May 19, 2023

@matthewd I implemented a version of what you suggested in #48261.

@97jaz
Copy link
Contributor Author

97jaz commented May 24, 2023

Closing in favor of #48261

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants