
fix: correct decimal detection in range filtering (issue #17247) #17372

Open
DanielSLDev wants to merge 2 commits into opensearch-project:main from DanielSLDev:opensearch-bug-decimalLoss

Conversation

@DanielSLDev

Description

Previously, hasDecimalPart(parsedLow) and hasDecimalPart(parsedHigh) were used after converting values to Long, causing loss of decimal precision and incorrect range adjustments.

Now, hasDecimalPart(rawLow) and hasDecimalPart(rawHigh) are used instead, ensuring proper detection before conversion and preventing off-by-one errors in range filters.
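The ordering issue can be illustrated with a minimal standalone sketch. The names here (hasDecimalPart and friends) mirror the PR's helper but are illustrative stand-ins, not the actual OpenSearch implementation:

```java
// Minimal sketch of the bug, assuming a helper like the PR's hasDecimalPart;
// names and signatures here are hypothetical, not the real OpenSearch code.
public class DecimalDetectionSketch {

    // Correct ordering: inspect the raw user input before any numeric conversion.
    static boolean hasDecimalPart(Object rawValue) {
        double d = Double.parseDouble(rawValue.toString());
        return d != Math.floor(d);
    }

    // Buggy ordering: converting to long first discards the fraction,
    // so this check can never return true.
    static boolean hasDecimalPartAfterParsing(Object rawValue) {
        long parsed = (long) Double.parseDouble(rawValue.toString());
        return (double) parsed != Math.floor((double) parsed); // always false
    }

    public static void main(String[] args) {
        System.out.println(hasDecimalPart("9200.65"));             // true
        System.out.println(hasDecimalPartAfterParsing("9200.65")); // false: the bug
    }
}
```

Because the parsed long can never carry a fractional part, the buggy variant silently skips the range adjustment that fractional bounds require.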

Related Issues

Resolves #17247

Check List

  • Functionality includes testing. (Working on them...)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Signed-off-by: Daniel Salas <daniel.salas@encora.com>
@github-actions
Contributor

❕ Gradle check result for 3791e74: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov

codecov bot commented Feb 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.44%. Comparing base (2d2b41d) to head (3791e74).
Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17372      +/-   ##
============================================
+ Coverage     72.34%   72.44%   +0.10%     
- Complexity    65480    65552      +72     
============================================
  Files          5291     5291              
  Lines        304313   304318       +5     
  Branches      44176    44176              
============================================
+ Hits         220142   220453     +311     
+ Misses        66133    65767     -366     
- Partials      18038    18098      +60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@expani
Contributor

expani commented Feb 18, 2025

Thanks for opening the PR @DanielSEncora

Please add a UT under org.opensearch.search.aggregations.startree.MetricAggregatorTests#testStarTreeDocValues which will fail on the absence of this fix.

…ecimals validates the behavior of the StarTree index when it works with decimal values (both double and float types). testStarTreeDocValuesDecimalPrecision validates that the StarTree index maintains decimal precision, particularly with high precision values like 10.9999 for double and 99.999f for float.

Signed-off-by: Daniel Salas <daniel.salas@encora.com>
@DanielSLDev
Author

Hi @expani, the requested UTs have been added! Let me know if any other changes need to be made.

@github-actions
Contributor

❌ Gradle check result for 264a58c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


public void testStarTreeDocValuesWithDecimals() throws IOException {
final List<DimensionFieldData> dimensionFieldDatum = List.of(
new DimensionFieldData("double_field", () -> 10.75, DimensionTypes.DOUBLE),
Contributor


We can generate a random float having precision and change the fields to int_field and long_field, essentially non-decimal numeric fields.

Author


Hi @expani , my apologies. Could you elaborate on what changes should be made to my contribution?

Contributor

@expani expani Mar 1, 2025


@DanielSLDev I should be the one apologizing for not elaborating on the issue before :)

Here it goes:

Let's consider the below aggregation with a filter query:

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/logs/_search -d '{
  "size": 0,
  "query": {
    "term": {
      "port": 9200
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "size"
      }
    }
  }
}'

In this case, port is an integer field and is being passed as an integer by the user. All good here.

However, OpenSearch also allows users to pass a floating point number instead of an integer in the search query. In such cases, we round it off to an integer before executing the query.

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/logs/_search -d '{
  "size": 0,
  "query": {
    "term": {
      "port": 9200.65
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "size"
      }
    }
  }
}'

We were not aware of this feature, which caused the bug: the user input is not rounded off when the Star Tree gets used for the aggregation.

This is applicable to all non-decimal fields like integer, long, short, byte, etc.

So, I suggested the change of converting

            new DimensionFieldData("double_field", () -> 10.75, DimensionTypes.DOUBLE),
            new DimensionFieldData("float_field", () -> 5.25f, DimensionTypes.FLOAT)

to

            new DimensionFieldData("integer_field", () -> randomFloat(), DimensionTypes.INTEGER),
            new DimensionFieldData("long_field", () -> randomFloat(), DimensionTypes.LONG)

Also, randomFloat makes the test cover more cases than hardcoding a specific value.
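The rounding semantics described above can be sketched for inclusive range bounds on a non-decimal field. This is a hedged illustration of the expected behavior, not the actual OpenSearch range-adjustment code; adjustLow/adjustHigh are hypothetical names:

```java
// Illustrative sketch: rounding a fractional bound for a non-decimal
// (integer/long) field. Method names are hypothetical, not OpenSearch APIs.
public class RangeBoundSketch {

    // A fractional inclusive lower bound must round UP:
    // "field >= 9200.65" on a long field means field >= 9201.
    static long adjustLow(String rawLow) {
        return (long) Math.ceil(Double.parseDouble(rawLow));
    }

    // A fractional inclusive upper bound must round DOWN:
    // "field <= 9300.5" on a long field means field <= 9300.
    static long adjustHigh(String rawHigh) {
        return (long) Math.floor(Double.parseDouble(rawHigh));
    }

    public static void main(String[] args) {
        System.out.println(adjustLow("9200.65"));  // 9201
        System.out.println(adjustHigh("9300.5"));  // 9300
    }
}
```

Rounding in the wrong direction (or not at all, as when the decimal check runs after the long conversion) shifts the effective bound by one, which is exactly the off-by-one error described in the PR description.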

@Shailesh-Kumar-Singh
Contributor

Hi @DanielSLDev, I see that you have reverted the fix in your latest commit. You can proceed with the fix; I have reverted the fix in my PR - link

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Apr 1, 2025
@sandeshkr419
Member

@Shailesh-Kumar-Singh @DanielSLDev - trying to figure out if we are trying to do anything in this PR as I only see unit tests in the code changes.


Labels

bug (Something isn't working) · Search (Search query, autocomplete ...etc) · stalled (Issues that have stalled)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Use rawLow/rawHigh instead of parsedLow/parsedHigh in function - hasDecimalPart()

4 participants