Skip to content

Add documents on how to develop a UDF / UDAF#4094

Merged
LantaoJin merged 8 commits intoopensearch-project:mainfrom
yuancu:issues/4043
Aug 28, 2025
Merged

Add documents on how to develop a UDF / UDAF#4094
LantaoJin merged 8 commits intoopensearch-project:mainfrom
yuancu:issues/4043

Conversation

@yuancu
Copy link
Copy Markdown
Collaborator

@yuancu yuancu commented Aug 21, 2025

Description

This PR creates a dev doc Developing PPL Functions under folder docs/dev.

Additionally, it refactors the registration of aggregation functions to simplify its logic, making it similar to the creation and registration of non-aggregation functions to reduce learning barriers.

Related Issues

Resolves #4043

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuancu added 3 commits August 21, 2025 12:37
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu yuancu added bug Something isn't working documentation Improvements or additions to documentation and removed bug Something isn't working labels Aug 21, 2025

New PPL Command Checklist
=========================
Developing PPL Commands and Functions
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break the the link.

I think we you can add a new section out of New PPL Command Checklist

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was too heavy to have two sections for developing functions and commands, considering most of other sections are high level instructions.

But somehow I cannot add lower level sections to this document. I updated them to separate sections.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Comment on lines +230 to +231
- Consult repository maintainers if PM is unavailable
- Be prepared for meetings to discuss syntax and usage details
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified by LLM unintentionally, should be ok though

qianheng-aws
qianheng-aws previously approved these changes Aug 22, 2025
- Adapt existing static methods: Convert Java static methods to UDFs using utility functions like ``UserDefinedFunctionUtils.adaptExprMethodToUDF``
- Implement from scratch

* Implement the ``ImplementorUDF`` interface
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please call out that developers should ensure their UDF calculating logic is efficient by implementing data-irrelevant logic during the compilation phase rather than at runtime. In other words, that part of logic should preferably use ling4j's expression instead of internal static method calls.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added such comments. I'm thinking about adding an example so that it will be easier to comprehend. Does the following example fit the case?

@Override
public Expression implement(RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
     // Don't
    SqlTypeName type = call.getOperands().getFirst().getType().getTypeName();
    List<Expression> operands = Lists.concat(translatedOperands, List.of(Expressions.constant(type));
    return Expressions.call(Implementor.class, "allInOneOp", operands)

    // Do
    String opName = type == FLOAT ? "floatOp" : "integerOp";
    return Expressions.call(Implementor.class, , translatedOperands);
}

public static int allInOneOp(Number n, SqlTypeFamily t){
    if (t == FLOAT) { ... }
    else if (t == INTEGER) { .... }
}

public static int floatOp(float n){
    ...
}

public static int integerOp(integer n) {
   ...
}

yuancu added 2 commits August 22, 2025 18:07
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Copy Markdown
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this PR includes code refactoring?

- Include at least syntax definition, usage and examples
- Implementation options are welcome if you have multiple ways to implement it
| ✅ Open an RFC issue describing the command:
- Specify the purpose and need for the new command
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new description really better than the original?

The LLM generated content changed the level of requirements. Some info I tried to callout is missing. For example, offline meeting is not necessary, syntax, usage and examples are minimal requirements of RFC, implementation details are not necessary in RFC, etc.

Can you revert the LLM generated suggestions except grammar fixing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the unnecessary change. I have reverted them.

@Swiddis
Copy link
Copy Markdown
Collaborator

Swiddis commented Aug 23, 2025

This should go in the dev docs at docs/dev instead of being put directly in the developer guide.

Dev guide should be more high level on how to build the project and maybe what the core entrypoints are. Making this its own doc will also give more room to add detail in e.g. lines like "Creating UDFs: A user-defined function is an instance of SqlOperator" (What is SqlOperator?)

yuancu added 2 commits August 25, 2025 13:47
…GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…cs/dev

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu
Copy link
Copy Markdown
Collaborator Author

yuancu commented Aug 25, 2025

This should go in the dev docs at docs/dev instead of being put directly in the developer guide.

Dev guide should be more high level on how to build the project and maybe what the core entrypoints are. Making this its own doc will also give more room to add detail in e.g. lines like "Creating UDFs: A user-defined function is an instance of SqlOperator" (What is SqlOperator?)

That makes sense. I have moved both the guidance for commands and functions from DEVELOPER_GUIDE to docs/dev

@LantaoJin LantaoJin merged commit 1ae4f28 into opensearch-project:main Aug 28, 2025
25 checks passed
@ykmr1224
Copy link
Copy Markdown
Collaborator

@yuancu
Should we backport this to 2.19-dev?
It is causing backport failure for #4100

@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4094-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1ae4f288a3d6ec59fb5234b551e5cd6293cc3c8f
# Push it to GitHub
git push --set-upstream origin backport/backport-4094-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4094-to-2.19-dev.

ykmr1224 pushed a commit to ykmr1224/sql that referenced this pull request Sep 2, 2025
* Refactor: simplify registration of user-defined aggregation functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Draft guidance on developing user-defined functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Improve the developer guide with the help of LLM

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Revert changes to the section New PPL Command Checklist in DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Move ppl command and function dev guidance from DEVELOEPR_GUIDE to docs/dev

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
ykmr1224 pushed a commit to ykmr1224/sql that referenced this pull request Sep 2, 2025
* Refactor: simplify registration of user-defined aggregation functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Draft guidance on developing user-defined functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Improve the developer guide with the help of LLM

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Revert changes to the section New PPL Command Checklist in DEVELOPER_GUIDE.rst

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Move ppl command and function dev guidance from DEVELOEPR_GUIDE to docs/dev

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
yuancu added a commit that referenced this pull request Sep 3, 2025
* Refactor: simplify registration of user-defined aggregation functions



* Draft guidance on developing user-defined functions



* Improve the developer guide with the help of LLM



* Update DEVELOPER_GUIDE.rst



* Update DEVELOPER_GUIDE.rst



* Revert changes to the section New PPL Command Checklist in DEVELOPER_GUIDE.rst



* Move ppl command and function dev guidance from DEVELOEPR_GUIDE to docs/dev



---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Sep 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add documents on how to develop a new UDF/UDAF

5 participants