Add compare_ip operator udfs#3821
Conversation
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
|
I think we should implement After that, for Ip comparison, we can convert string to ExprIP. And then we can implement some generic comparison functions for types of comparable, via calling their But current approach is still acceptable if we won't have other UDT in the future. |
qianheng-aws
left a comment
There was a problem hiding this comment.
Please add IT for ip comparison.
| // EXPR_IP is mapped to SqlTypeFamily.NULL in | ||
| // UserDefinedFunctionUtils.convertRelDataTypeToSqlTypeName | ||
| return UDFOperandMetadata.wrap(OperandTypes.STRING_STRING); | ||
| return UDFOperandMetadata.wrap( |
There was a problem hiding this comment.
Maybe we should use new PPLTypeChecker here and override its checkOperandTypes method to only allow accept string or ip.
| OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER)); | ||
| case ANY, IGNORE -> List.of( | ||
| OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.ANY)); | ||
| // We borrow SqlTypeFamily.NULL to represent EXPR_IP. This is a workaround |
There was a problem hiding this comment.
If we stop use CompositeOperandTypeChecker and create a new TypeChecker ourself, maybe we can avoid using SqlTypeFamily.NULL to represent EXPR_IP and check operands on RelDataType level.
There was a problem hiding this comment.
@qianheng-aws How about this implementation ishaoxy#1
It doesn't use NULL for IP, but created two other classes for IP type checking.
There was a problem hiding this comment.
It just came to me that there's not only compare that checks IP type. CIDR_MATCH also has to validate IP types. If we make a specific IP type checker for compare operators, we may also have to create one for each other functions like cidrmatch and geoip.
|
|
||
| void populate() { | ||
| // register operators for IP comparing | ||
| registerOperator(NOTEQUAL, PPLBuiltinOperators.NOT_EQUALS_IP); |
There was a problem hiding this comment.
Questions, Do we need to support IP as UDT in PPL engine?
In OpenSearch PPL, IP handling depends on index field type. CIDR-based filtering should use ip_range queries for ip fields, script pushdown / in-memory processing for keyword, text, and runtime string fields.
| Field Type | Use Case | Expectation |
|---|---|---|
| IP Field | search index=log ip="192.168.0.0/16" | Rewrite as term query |
| search index | where cidrmatch("192.168.0.0/16", ip) | Rewrite as term query | |
| Keyword Field | search index=log ip="192.168.0.0/16" | Rewrite as term query, extactally keyword match |
| search index | where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. | |
| Text Field | search index=log ip="192.168.0.0/16" | Rewrite as query_string query, full text search |
| search index | where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. | |
| Runtime Field | search index=log | parse ip=regex(...)| where ip="192.168.0.0/16" | Script pushdown — ip field is a string, it is a string comparsion query |
| search index=log | parse ip=regex(...)| where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. |
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Yes, I think that's what we need. In the future when we have type conversion, that IPTypeChecker should only accept type of IP, and type conversion should convert
Yes since we treat |
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Make IpComparisonOperators an inner enum of CompareIPFunction
| registerOperator(EQUAL, PPLBuiltinOperators.EQUALS_IP); | ||
| registerOperator(GREATER, PPLBuiltinOperators.GREATER_IP); | ||
| registerOperator(GTE, PPLBuiltinOperators.GTE_IP); | ||
| registerOperator(LESS, PPLBuiltinOperators.LESS_IP); | ||
| registerOperator(LTE, PPLBuiltinOperators.LTE_IP); |
There was a problem hiding this comment.
SqlStdOperatorTable.EQUALS` is a very basic std operator which widely used in calcite internal, I am worry that it may introduce potential bugs and performance regression for pushdown.
There was a problem hiding this comment.
But Calcite's built-in operators like SqlStdOperatorTable.EQUALS does not handle our UDT like IP. These new operators are only effective to IP comparison, controlled via type checkers. Comparison between other types will still falls to Calcite's built-in comparison operators.
There was two solutions:
- one is to convert IP UDT to a type that is comparable by calcite's comparators.
- another is to add new operators exclusively for IP comparision
We opted the latter.
There was a problem hiding this comment.
In order to keep the comparison logic the same as v2, I chose to add these specific operator udfs for IP comparison instead of converting it to string type that calcite can compare.
There was a problem hiding this comment.
These new operators are only effective to IP comparison, controlled via type checkers. Comparison between other types will still falls to Calcite's built-in comparison operators.
Can we move above logic to a specific method registerOverrideOperator, it quite confused me.
There was a problem hiding this comment.
Or modify to
registerOperator(EQUAL, check(PPLBuiltinOperators.EQUALS_IP,SqlStdOperatorTable.EQUALS));
and leverage check() to manage the specific register logic via typeChecker.
check() can name to case() or set() or queue()...
There was a problem hiding this comment.
Or modify to
registerOperator(EQUAL, check(PPLBuiltinOperators.EQUALS_IP,SqlStdOperatorTable.EQUALS));and leverage
check()to manage the specific register logic via typeChecker.
check()can name tocase()orset()orqueue()...
Modified, improving codes readability.
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
add type checker for cidr
| case EXPR_IP -> SqlTypeName.VARCHAR; | ||
| // EXPR_IP is mapped to SqlTypeName.OTHER since there is no | ||
| // corresponding SqlTypeName in Calcite. | ||
| case EXPR_IP -> SqlTypeName.OTHER; |
There was a problem hiding this comment.
please update the PR description to align changes.
There was a problem hiding this comment.
Thanks for reminding me.
| } else { | ||
| typeChecker = operator.getOperandTypeChecker(); | ||
| } | ||
| public void registerOperator(BuiltinFunctionName functionName, SqlOperator... operators) { |
There was a problem hiding this comment.
please add a javadoc for this method to explain what are the operators for
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
| IPAddress addr1 = IPUtils.toAddress(ip1); | ||
| IPAddress addr2 = IPUtils.toAddress(ip2); | ||
| int result = IPUtils.compare(addr1, addr2); | ||
| return switch (comparisonType) { |
There was a problem hiding this comment.
For efficiency consideration, we'd better move this switch case to the implement method
|
|
||
| public static boolean compare(Object obj1, Object obj2, ComparisonType comparisonType) { | ||
| try { | ||
| String ip1 = extractIpString(obj1); |
There was a problem hiding this comment.
Why not leverage ExprIpValue::compare? Then we don't need to do extra transformation for ExprIPValue
There was a problem hiding this comment.
Sorry for only focusing on IPUtils before and ignoring the functions encapsulated in ExprIpValue. Now fixed.
Signed-off-by: Xinyu Hao <haoxinyu@amazon.com>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3821-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6c3efa1475f815fc9025a2e3c533b7b21d9ea8cf
# Push it to GitHub
git push --set-upstream origin backport/backport-3821-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
|
@ishaoxy please follow the instructions to backport manually |
* ip_compare operator added Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * only type checker issue left Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix by modifying ip.sqlTypeName from OTHER to NULL in type checker Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix less Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify the CalcitePPLFunctionTypeTest text Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * allow CalciteIPComparisonIT in CalciteNoPushdownIT Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * Modify the signature description in udf Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix some typing errors Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify the udfs for better style Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * Make IpComparisonOperators an inner enum of CompareIPFunction Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * modify registerOperator Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify registerOperator Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * add type checker for cidr Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * add javadoc Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * move switch case to the implement method Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> --------- Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Co-authored-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit 6c3efa1)
* Add compare_ip operator udfs (#3821) * ip_compare operator added Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * only type checker issue left Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix by modifying ip.sqlTypeName from OTHER to NULL in type checker Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix less Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify the CalcitePPLFunctionTypeTest text Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * allow CalciteIPComparisonIT in CalciteNoPushdownIT Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * Modify the signature description in udf Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * fix some typing errors Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify the udfs for better style Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * Make IpComparisonOperators an inner enum of CompareIPFunction Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * modify registerOperator Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * modify registerOperator Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * add type checker for cidr Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * add javadoc Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * move switch case to the implement method Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> --------- Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Co-authored-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit 6c3efa1) * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> * backport to java11 Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> --------- Signed-off-by: Xinyu Hao <haoxinyu@amazon.com> Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-manually manually
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-manually
# Create a new branch
git switch --create backport/backport-3821-to-manually
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6c3efa1475f815fc9025a2e3c533b7b21d9ea8cf
# Push it to GitHub
git push --set-upstream origin backport/backport-3821-to-manually
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-manuallyThen, create a pull request where the |
Description
CompareIpFunctionfor IP address comparisons.SqlTypeName.OTHER.registerOperatormethod to support registering one or multiple operators under a single function name, enabling function overloading based on operand types.Related Issues
Resolves #3776
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.