Skip to content

PPL fillnull command enhancement#4421

Merged
dai-chen merged 11 commits intoopensearch-project:mainfrom
ahkcs:feat/fillnull_cmd
Oct 6, 2025
Merged

PPL fillnull command enhancement#4421
dai-chen merged 11 commits intoopensearch-project:mainfrom
ahkcs:feat/fillnull_cmd

Conversation

@ahkcs
Copy link
Copy Markdown
Collaborator

@ahkcs ahkcs commented Sep 30, 2025

What does this PR do?

Adds an alternative, option-style form for fillnull (available since 3.4) alongside existing with/using syntax. Semantics remain strict and schema-stable (no permissive casting in this PR).

New syntax (option-style):

source=accounts | fillnull value=0              // apply to all fields (subject to type constraints)
source=accounts | fillnull value=0 age balance  // apply to specific fields

Existing syntax (still supported):

source=accounts | fillnull with 0 in age, balance
source=accounts | fillnull using age=0, balance=0

Notes / Limitations (intentional in this PR)

  • value is mandatory (no default value yet).
  • Mixed-type fills with a single replacement still error by design (strict typing preserved).
  • Semantics remain strict and schema-preserving

Future Work

Related Issues

Partially resolves #4419

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

// Track if value= syntax was used (added in 3.4). Only needed to distinguish from with...in
// since both apply same value to all fields. using syntax is detected by checking if all
// replacement values are the same.
private final boolean useValueSyntax;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only for anonymizer?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this change is debatable, you can refer to this comment: #4421 (comment)

Copy link
Copy Markdown
Collaborator

@Swiddis Swiddis Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, we should just have the anonymizer output a consistent format if the underlying parse rule is irrecoverable without special tracking.

Personally I'd actually prefer the anonymizer defaulted to value= syntax, since it's consistent and more in line with other command syntax.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong preference on this. I think anonimized query log is needed for operator to check the input query in case of trouble shooting(I suppose we don't use it for analyzing use case statistics, but not quite sure), while we don't want to log sensitive information.

Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change @ahkcs . I just left some comments.

ahkcs added 5 commits October 2, 2025 12:12
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feat/fillnull_cmd branch from 1a6898a to 4712ec2 Compare October 2, 2025 19:14
ahkcs added 4 commits October 2, 2025 12:57
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

// Check if the replacement type is compatible with the field type
// Allow NULL type family as it's compatible with any type
if (fieldFamily != replacementFamily
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what are the semantics of family equality? Will it automatically down-cast?

Wondering what would happen if you tried to fillnull an int with 8589934592 (2^33)

Copy link
Copy Markdown
Collaborator Author

@ahkcs ahkcs Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the replacement value is 2^33, then the replacement value would be LONG, which will throw error if the field type is INT
Let me add an IT test for this case

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type family equality check allows it. Both INT and BIGINT belong to SqlTypeFamily.NUMERIC, so the validation passes. Calcite's COALESCE function then handles the type widening automatically - it will widen the INT field to BIGINT to accommodate the larger value. No error is thrown; The result type becomes BIGINT.

I added an IT test (testFillNullWithLargeIntegerOnIntField) that confirms this behavior works correctly at runtime.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

@dai-chen dai-chen merged commit e1a1bd8 into opensearch-project:main Oct 6, 2025
33 checks passed
@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4421-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e1a1bd88dfd92d9fe4c954387583c52c0bf6c29b
# Push it to GitHub
git push --set-upstream origin backport/backport-4421-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4421-to-2.19-dev.

ahkcs added a commit to ahkcs/sql that referenced this pull request Oct 6, 2025
Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit e1a1bd8)
penghuo pushed a commit that referenced this pull request Oct 7, 2025
* PPL `fillnull` command enhancement (#4421)

Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit e1a1bd8)

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix IT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix anonymizer

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Oct 9, 2025
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 10, 2025
* main-apple: (218 commits)
  Add ignorePrometheus Flag for integTest and docTest (opensearch-project#4442)
  Create fab-radar.yml
  PPL `fillnull` command enhancement (opensearch-project#4421)
  reverting to _doc + _id (opensearch-project#4435)
  Support `multisearch` command in calcite (opensearch-project#4332)
  Add 3.3 release notes (opensearch-project#4422) (opensearch-project#4423)
  [SQL/PPL] Fix the `count(*)` and `dc(field)` to be capped at MAX_INTEGER opensearch-project#4416 (opensearch-project#4418)
  Change the default search sort tiebreaker to `_shard_doc` for PIT search (opensearch-project#4378)
  [Enhancement] Add error handling for known limitation of sql `JOIN` (opensearch-project#4344)
  Bugfix: SQL type mapping for legacy JDBC output (opensearch-project#3613)
  Version bump: 3.3 (opensearch-project#4417)
  Add max/min eval functions (opensearch-project#4333)
  Support time modifiers in search command  (opensearch-project#4224)
  Fix numbered token bug and make it optional output in patterns command (opensearch-project#4402)
  refactor span (opensearch-project#4334)
  Move release notes categories (opensearch-project#3818)
  [Doc] Enable doctest with Calcite (opensearch-project#4379)
  Mod function should return decimal instead of float when handle the operands are decimal literal (opensearch-project#4407)
  Scale of decimal literal should always be positive in Calcite (opensearch-project#4401)
  Enable Calcite by default and implicit fallback the unsupported commands (opensearch-project#4372)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. calcite calcite migration releated enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] Enhancement for PPL fillnull command

6 participants