Add LogicalSystemLimit automatically for data-intensive operations #3749

Closed
LantaoJin wants to merge 13 commits into opensearch-project:main from LantaoJin:pr/issues/3731

Member

@LantaoJin LantaoJin commented Jun 9, 2025

Description

Starting with v3.0.0, PPL introduces commands that may increase data volume. To prevent out-of-memory problems, the system automatically enforces a LogicalSystemLimit operator for such commands.

plugins.query.system_limit: Configures the maximum number of rows in the subsearch that data-intensive operations (e.g. join, lookup) run against. The default value is 50000. The valid range is 0 to 2147483647 (Int.MaxValue).

Update

All PPL join/lookup/expand commands (the data-bloating ones) are now affected by this PR. In the future, we can add more command arguments to control specific commands.

For join, when the join type is

  • SEMI, ANTI: no effect
  • RIGHT: add a LogicalSystemLimit operator to the left side (main-search)
  • Others: add a LogicalSystemLimit operator to the right side (sub-search)

For lookup

  • add a LogicalSystemLimit operator to the right side (sub-search)

For expand

  • add a LogicalSystemLimit operator to the right side (sub-search)
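
The join rules above can be sketched as a small decision function. This is only an illustration of the rule as described, not the PR's actual code: the class `SystemLimitPlacement`, both enums, and the method `limitedSide` are hypothetical names, and the join types other than SEMI, ANTI, RIGHT and INNER are assumed here to stand in for "Others".

```java
public class SystemLimitPlacement {
    /** Join types; SEMI, ANTI, RIGHT and INNER appear in the description, the rest are assumed. */
    enum JoinType { INNER, LEFT, RIGHT, FULL, CROSS, SEMI, ANTI }

    /** Which join input would receive the LogicalSystemLimit operator (hypothetical enum). */
    enum LimitedSide { NONE, LEFT, RIGHT }

    /** Mirrors the rule in the description: SEMI/ANTI untouched, RIGHT limits the left side, others the right. */
    static LimitedSide limitedSide(JoinType type) {
        switch (type) {
            case SEMI:
            case ANTI:
                return LimitedSide.NONE;   // no effect
            case RIGHT:
                return LimitedSide.LEFT;   // limit the main-search
            default:
                return LimitedSide.RIGHT;  // limit the sub-search
        }
    }

    public static void main(String[] args) {
        for (JoinType t : JoinType.values()) {
            System.out.println(t + " -> " + limitedSide(t));
        }
    }
}
```

Under this reading, lookup and expand would behave like the default branch, since both limit the right side.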

The results of an impacted search (for example, the lookup table of a lookup command, or the right side of an inner join)
cannot exceed the limit (50000 rows by default). If the actual number of rows in the lookup table or the right side
of an inner join is greater than the system limit, only the number of rows specified by the configuration will be searched.
You can set the configuration to the maximum integer value (2147483647) if you are certain resources are not a concern.
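
To make the consequence of the cap concrete, here is a minimal in-memory sketch of an inner join whose right side is truncated before joining. It is a toy model under stated assumptions, not the plugin's execution path: `SystemLimitEffect` and `innerJoinWithLimit` are hypothetical names, rows are modeled as plain integer keys, and the cap is assumed to keep the first `systemLimit` rows in input order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SystemLimitEffect {
    /** Inner-joins left keys against right keys after capping the right side at systemLimit rows. */
    static List<Integer> innerJoinWithLimit(List<Integer> left, List<Integer> right, int systemLimit) {
        // Keep only the first systemLimit rows of the right side (the sub-search).
        List<Integer> cappedRight = right.stream().limit(systemLimit).collect(Collectors.toList());
        // Emit each left key that still has a match in the capped right side.
        return left.stream().filter(cappedRight::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 5, 9);
        List<Integer> right = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(innerJoinWithLimit(left, right, 10)); // prints [1, 5, 9]
        System.out.println(innerJoinWithLimit(left, right, 6));  // prints [1, 5]
    }
}
```

With the cap at 6, key 9 silently drops out of the result because its matching row was never read from the right side, which is the trade-off the limit makes in exchange for bounded memory.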

Related Issues

Resolves #3731

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

LantaoJin added 2 commits June 9, 2025 15:42
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin requested a review from penghuo June 10, 2025 09:04
@qianheng-aws
Collaborator

  1. Will this change lead to incorrect results?

  2. To avoid data bloating, why not add a limit operator to each child of the join operator? It should have a similar effect.

penghuo previously approved these changes Jun 10, 2025
@penghuo penghuo dismissed their stale review June 10, 2025 15:33

new comments

@LantaoJin LantaoJin marked this pull request as draft June 11, 2025 08:56
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin changed the title from "Pushdown system limit automatically for data-intensive operations" to "Add LogicalSystemLimit automatically for data-intensive operations" Jun 11, 2025
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin
Member Author

@penghuo @qianheng-aws @dai-chen I have updated the description to reflect the new code refactoring and docs. Please take another look.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@RyanL1997
Collaborator

Hi @LantaoJin, is this still required?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@LantaoJin
Member Author

Closing, as there is a new alternative PR: #4501

@LantaoJin LantaoJin closed this Oct 10, 2025
Labels

backport 2.19-dev calcite calcite migration releated stalled

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Set operator limitation for data-intensive operators

4 participants