Skip to content

Fix flaky TPC-H Q1 test due to bugs in MatcherUtils.closeTo()#5283

Merged
yuancu merged 2 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/5282
Mar 30, 2026
Merged

Fix flaky TPC-H Q1 test due to bugs in MatcherUtils.closeTo()#5283
yuancu merged 2 commits intoopensearch-project:mainfrom
LantaoJin:pr/issues/5282

Conversation

@LantaoJin
Copy link
Copy Markdown
Member

@LantaoJin LantaoJin commented Mar 30, 2026

Description

Fixes intermittent failures in CalcitePPLTpchIT.testQ1 and CalcitePPLTpchPaginatingIT.testQ1 caused by two bugs in MatcherUtils.closeTo():

  1. Incorrect element pairing via indexOf: The old code used actualValues.indexOf(v) to find each element's position, then looked up the corresponding expected value. indexOf returns the index of the first occurrence matching via equals(), so duplicate values would silently compare against the wrong expected element. This also meant no size mismatch was ever detected.
  2. Made closeTo() ULP-aware to handle floating-point precision differences.

Related Issues

Resolves #5282

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 9357beb)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Null Safety

In the new loop, if either actual or expected is null, calling actual.equals(expected) will throw a NullPointerException. The original code had the same issue, but the new code should handle null values explicitly (e.g., using Objects.equals(actual, expected) for the non-numeric branch).

for (int i = 0; i < actualValues.size(); i++) {
  Object actual = actualValues.get(i);
  Object expected = expectedValues.get(i);
  if (actual instanceof Number && expected instanceof Number) {
    if (!valuesAreClose((Number) actual, (Number) expected)) {
      return false;
    }
  } else if (!actual.equals(expected)) {
    return false;
  }
Missing Approx

The PR description mentions adding an Approx wrapper with an approx(value, error) factory method for per-value custom tolerance, but this feature does not appear in the diff. If this was intended to be part of the fix, it is missing from the implementation.

public static TypeSafeMatcher<JSONArray> closeTo(Object... values) {
  return new TypeSafeMatcher<JSONArray>() {
    @Override
    protected boolean matchesSafely(JSONArray item) {
      List<Object> expectedValues = new ArrayList<>(Arrays.asList(values));
      List<Object> actualValues = new ArrayList<>();
      item.iterator().forEachRemaining(v -> actualValues.add((Object) v));
      if (actualValues.size() != expectedValues.size()) {
        return false;
      }
      for (int i = 0; i < actualValues.size(); i++) {
        Object actual = actualValues.get(i);
        Object expected = expectedValues.get(i);
        if (actual instanceof Number && expected instanceof Number) {
          if (!valuesAreClose((Number) actual, (Number) expected)) {
            return false;
          }
        } else if (!actual.equals(expected)) {
          return false;
        }
      }
      return true;
    }

    @Override
    public void describeTo(Description description) {
      description.appendText(Arrays.toString(values));
    }

    /**
     * ULP-aware comparison: tolerates up to {@link #ULP_TOLERANCE_FACTOR} ULPs or {@link
     * #ABSOLUTE_TOLERANCE}, whichever is larger.
     */
    private boolean valuesAreClose(Number v1, Number v2) {
      double d1 = v1.doubleValue();
      double d2 = v2.doubleValue();
      double diff = Math.abs(d1 - d2);
      double ulpTolerance = ULP_TOLERANCE_FACTOR * Math.max(Math.ulp(d1), Math.ulp(d2));
      return diff <= Math.max(ABSOLUTE_TOLERANCE, ulpTolerance);
    }
  };
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to 9357beb
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against NaN and Infinity inputs

When either d1 or d2 is 0.0, Math.ulp(0.0) returns the smallest positive double
(~5e-324), making ulpTolerance effectively zero and falling back only to
ABSOLUTE_TOLERANCE. However, if one value is zero and the other is very small but
non-zero, the ULP of the non-zero value could be much larger. Consider using
Math.max of both ULPs to handle near-zero comparisons, but also guard against NaN or
Infinity inputs which would cause ulp to return NaN or Infinity.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java [347]

+if (Double.isNaN(d1) || Double.isNaN(d2) || Double.isInfinite(d1) || Double.isInfinite(d2)) {
+  return Double.compare(d1, d2) == 0;
+}
 double ulpTolerance = ULP_TOLERANCE_FACTOR * Math.max(Math.ulp(d1), Math.ulp(d2));
Suggestion importance[1-10]: 4

__

Why: While guarding against NaN and Infinity is a valid concern, this is a test utility class where such edge cases are unlikely to occur in practice. The suggestion adds defensive code that may not be needed for the typical use case of comparing numeric test results.

Low
Handle null elements to prevent NullPointerException

If any element in actualValues or expectedValues is null, calling
actual.equals(expected) or casting to Number will throw a NullPointerException. Add
null checks before performing the comparison to avoid unexpected test failures.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java [316-330]

 item.iterator().forEachRemaining(v -> actualValues.add((Object) v));
 if (actualValues.size() != expectedValues.size()) {
   return false;
 }
 for (int i = 0; i < actualValues.size(); i++) {
   Object actual = actualValues.get(i);
   Object expected = expectedValues.get(i);
+  if (actual == null || expected == null) {
+    if (actual != expected) return false;
+    continue;
+  }
   if (actual instanceof Number && expected instanceof Number) {
     if (!valuesAreClose((Number) actual, (Number) expected)) {
       return false;
     }
   } else if (!actual.equals(expected)) {
     return false;
   }
 }
Suggestion importance[1-10]: 4

__

Why: Adding null checks is a valid defensive programming practice, but in this test utility context, null values in JSONArray comparisons are an edge case. The improved_code accurately reflects the suggested change, but the impact is limited for a test helper class.

Low

Previous suggestions

Suggestions up to commit 3743dc5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix boundary condition in approximate comparison

The Approx comparison uses a strict greater-than (>) check, which means a difference
exactly equal to the error is considered a failure. For consistency with
valuesAreClose (which uses <=), the check should use >= or the condition should be
inverted to <= error for the passing case.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java [333-336]

 if (!(actual instanceof Number)
     || Math.abs(((Number) actual).doubleValue() - ((Approx) expected).value)
-        > ((Approx) expected).error) {
+        >= ((Approx) expected).error) {
Suggestion importance[1-10]: 4

__

Why: The suggestion points out a real but minor boundary inconsistency: valuesAreClose uses <= (passes when difference equals error), while Approx uses > (fails when difference equals error). In practice, exact equality at the boundary is extremely unlikely with floating-point values, making this a low-impact edge case.

Low
General
Explicitly mark captured local variable as final

The variable defaultError is declared as a local variable inside the closeTo method
but is only used inside the anonymous inner class valuesAreClose. While this works
due to Java's effectively-final capture, it would be clearer and safer to ensure it
is explicitly declared as final to make the intent explicit and prevent accidental
reassignment.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java [319]

+final double defaultError = 1e-10;
 
-
Suggestion importance[1-10]: 1

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The variable defaultError is already effectively final, so this suggestion offers no functional improvement.

Low

@LantaoJin LantaoJin added bug Something isn't working flaky-test Flaky build or test issue backport 3.6 bugFix testing Related to improving software testing and removed bug Something isn't working labels Mar 30, 2026
@LantaoJin LantaoJin marked this pull request as ready for review March 30, 2026 03:42
isPushdownDisabled() ? 35676192.096999995 : 35676192.097,
isPushdownDisabled()
? 35676192.096999995
: approx(35676192.097, 1e-6), // workaround for arm64
Copy link
Copy Markdown
Collaborator

@songkant-aws songkant-aws Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approx(35676192.097, 1e-6) change in CalcitePPLTpchIT.testQ1() looks too targeted for #5282. The issue shows at least two observed arm64 failures in IEEE-754 double standard:

  • 35676192.096999995 vs 35676192.097 in CalcitePPLTpchIT.testQ1()
  • 1041301.0700000001 vs 1041301.07 (and also 999060.8979999999 vs 999060.898) in CalcitePPLTpchPaginatingIT.testQ1()

Would it make sense to change the default numeric comparison in closeTo() to something ULP-aware instead of hardcoding a single absolute threshold? For example, something along the lines of:

abs(actual - expected) <= max(1e-10, k * max(Math.ulp(actual), Math.ulp(expected))), we could start with k = 4, aka 4 ULP error tolerance. Though the failed case has around 1 ULP error, we'd better set a conservative threshold.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9357beb

@yuancu yuancu merged commit 246ed0d into opensearch-project:main Mar 30, 2026
40 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 30, 2026
* Fix the flaky tpch Q1

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Change to ULP-aware to handle floating-point precision differences

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 246ed0d)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@LantaoJin LantaoJin deleted the pr/issues/5282 branch March 30, 2026 09:00
peterzhuamazon pushed a commit that referenced this pull request Mar 30, 2026
… (#5284)

* Fix the flaky tpch Q1



* Change to ULP-aware to handle floating-point precision differences



---------


(cherry picked from commit 246ed0d)

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ahkcs added a commit that referenced this pull request Mar 30, 2026
* Init CLAUDE.md (#5259)

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add label to exempt specific PRs from stalled labeling (#5263)

* Implement `reverse` performance optimization (#4775)

Co-authored-by: Jialiang Liang <jiallian@amazon.com>

* Add songkant-aws as maintainer (#5244)

* Move some maintainers from active to Emeritus (#5260)

* Move inactive current maintainers to Emeritus

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Remove affiliation column for emeritus maintainers

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* formatted

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix formatting in MAINTAINERS.md

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>

* Add query cancellation support via _tasks/_cancel API for PPL queries (#5254)

* Add query cancellation support via _tasks/_cancel API for PPL queries

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Refactor PPL query cancellation to cooperative model and other PR suggestions.

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

---------

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Add Calcite native SQL planning in UnifiedQueryPlanner (#5257)

* feat(api): Add Calcite native SQL planning path in UnifiedQueryPlanner

Add SQL support to the unified query API using Calcite's native parser
pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode),
bypassing the ANTLR parser used by PPL.

Changes:
- UnifiedQueryPlanner: use PlanningStrategy to dispatch
  CalciteNativeStrategy vs CustomVisitorStrategy
- CalciteNativeStrategy: Calcite Planner with try-with-resources
  for ANSI SQL
- CustomVisitorStrategy: ANTLR-based path for PPL (and future SQL V2)
- UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to
  preserve lowercase OpenSearch index names

Signed-off-by: Chen Dai <daichen@amazon.com>

* test(api): Add SQL planner tests and refactor test base for multi-language support

- Refactor UnifiedQueryTestBase with queryType() hook for subclass override
- Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN,
  ORDER BY, subquery, case sensitivity, namespaces, and error handling
- Update UnifiedQueryContextTest to verify SQL context creation

Signed-off-by: Chen Dai <daichen@amazon.com>

* perf(benchmarks): Add SQL queries to UnifiedQueryBenchmark

Add language (PPL/SQL) and queryPattern param dimensions for
side-by-side comparison of equivalent queries across both languages.
Remove separate UnifiedSqlQueryBenchmark in favor of unified class.

Signed-off-by: Chen Dai <daichen@amazon.com>

* docs(api): Update README to reflect SQL support in UnifiedQueryPlanner

Signed-off-by: Chen Dai <daichen@amazon.com>

* fix(api): Normalize trailing whitespace in assertPlan comparison

RelOptUtil.toString() appends a trailing newline after the last plan
node, which doesn't match Java text block expectations. Also add
\r\n normalization for Windows CI compatibility, consistent with
the existing pattern in core module tests.

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* [Feature] Support graphLookup with literal value as its start (#5253)

* [Feature] Support graphLookup as top-level PPL command (#5243)

Add support for graphLookup as the first command in a PPL query with
literal start values, instead of requiring piped input from source=.

Syntax:
  graphLookup table start="value" edge=from-->to as output
  graphLookup table start=("v1", "v2") edge=from-->to as output

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Spotless check

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Ignore child pipe if using start value

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add graphLookup integration tests per PPL command checklist

- Add explain plan tests in CalciteExplainIT with YAML assertions
- Add v2-unsupported tests in NewAddedCommandsIT
- Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite
- Skip graphLookup tests when pushdown is disabled (required by impl)
- Add expected plan YAML files for piped and top-level graphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove brace of start value list

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Apply docs website feedback to ppl functions (#5207)

* apply doc website feedback to ppl functions

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* take out comments

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix links

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* feat(api): Add profiling support to unified query API (#5268)

Add query profiling infrastructure that measures time spent in each
query phase (analyze, optimize, execute, format). Profiling is opt-in
via UnifiedQueryContext.builder().profiling(true) and uses thread-local
context to avoid passing profiling state through every method.

Key changes:
- QueryProfiling/ProfileContext for thread-local profiling lifecycle
- UnifiedQueryContext.measure() API for timing arbitrary phases
- Auto-profiling in UnifiedQueryPlanner (analyze) and compiler (optimize)
- UnifiedQueryTestBase shared test fixture for unified query tests
- Comprehensive profiling tests with non-flaky >= 0 timing assertions

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UnifiedQueryParser with language-specific implementations (#5274)

Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser
interface with language-specific implementations: PPLQueryParser (returns
UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode).

UnifiedQueryContext owns the parser instance, created eagerly by the
builder which has direct access to query type and future SQL config.
Each implementation receives only its required dependencies:
PPLQueryParser takes Settings, CalciteSqlQueryParser takes
CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains
the parser from the context via the interface type.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix flaky TPC-H Q1 test due to bugs in `MatcherUtils.closeTo()` (#5283)

* Fix the flaky tpch Q1

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Change to ULP-aware to handle floating-point precision differences

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 3.6 bugFix flaky-test Flaky build or test issue testing Related to improving software testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG][Test] Flaky CalcitePPLTpchIT on Linux arm64 when pushdown enabled

3 participants