ESQL: Golden tests by GalLalouche · Pull Request #139598 · elastic/elasticsearch

GalLalouche · 2025-12-16T12:30:03Z

This PR adds golden tests to ES|QL!

● I added a GoldenTestsReadme.md with instructions on how to add new golden tests.
● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc.
● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework.
○ When I recently worked on these tests (#138023,#138765), the majority of my time was spent mechanically fixing the expected output in the tests.
○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests.
○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down.
○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.

elasticsearchmachine · 2025-12-16T12:30:27Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

luigidellaquila

Thanks @GalLalouche, I had a first look at the PR and it looks awesome.

The output is a bit more verbose than the old tests, but definitely clear and manageable.

I left a comment, but I'll need some time to check all the details in depth.

luigidellaquila · 2025-12-23T16:13:38Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+                    new EsqlFunctionRegistry(),
+                    CsvTests.loadIndexResolution(CsvTests.testDatasets(parsedStatement)),
+                    defaultLookupResolution(),
+                    new EnrichResolution(),


I like the idea of reusing CSV tests datasets.

nit: The only thing that concerns me a bit here is that we are mixing resolutions from three different places (CSV indices, AnalyzerTestUtils lookup resolution and an empty EnrichResolution), making it harder to understand what data we have when we write new tests, or how to add new datasets.

Long term, it would be good to get all of them from a single place, taking as a single source the datasets defined in CsvTestsDataLoader.

This is for a follow-up PR for sure.

luigidellaquila

I left two more hints to make the CI green

luigidellaquila · 2025-12-30T14:05:01Z

...ugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java

        "mapping-default-incompatible.json",
        "employees_incompatible.csv"
    ).noSubfields();
+    private static final TestDataset ALL_TYPES = new TestDataset("test", "mapping-all-types.json", "test.csv");


test.csv file doesn't seem to exist, that's the reason for the CI failures.
I tried to add an empty file with that name and the tests seem to pass, but I don't know if that was the intention (or if you had some additional data)

luigidellaquila · 2025-12-30T14:12:26Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+import org.elasticsearch.xpack.esql.session.Versioned;
+import org.elasticsearch.xpack.esql.stats.SearchStats;
+
+import java.io.File;


Some APIs are forbidden in ES (eg. java.io.File), you'll have to replace them (Eg. with java.nio.file) to make the CI happy.

You can check API usage with

./gradlew :x-pack:plugin:esql:forbiddenApisTest

it will give you a full list of violations

luigidellaquila

Added another round of comments.

I'll wait for the CI to be green, but apart from that I think it's very good 🚀

luigidellaquila · 2025-12-30T15:28:59Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/tree/Node.java

-     * </pre>
-     */
-    final StringBuilder treeString(StringBuilder sb, int depth, BitSet hasParentPerDepth) {
+    public String goldenTestToString() {


nit: It would be good to have some javadocs here, explaining the difference between goldenTestToString and goldenTestNodeString

This code doesn't exist anymore but, I added some docs to the relevant format enum.

luigidellaquila · 2025-12-30T15:34:17Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/tree/Node.java

        List<Object> props = nodeProperties();
        for (Object prop : props) {
+            if (prop instanceof NameId) {
+                continue;


Maybe it's irrelevant, but instead of skipping, we could add a constant placeholder (eg. "$NameId$"), just to distinguish the cases where this element is present vs absent.

I've changed this to use a # prefix. This way, I'm removing them in the golden tests using regex, as suggested by @astefan, but we don't lose this information.

luigidellaquila · 2025-12-30T15:56:21Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestsReadme.MD

+When a golden test fails, the test runner will create a new `.actual` file, in the same directory as
+the `.expected` file. `.actual` files are git ignored by default, you can manually delete them after
+inspection, or if you wish to avoid the `.actual` file creation, you can use `-Dgolden.noactual`.
+If you want to automatically delete them on test success, you can use `-Dgolden.cleanactuals`.


-Dgolden.cleanactuals should be -Dgolden.cleanactual

Nice catch, thanks!

astefan

Didn't go through all the changes, I have a big ask before I continue reviewing: don't change the core classes of ESQL for something that is inherently testing related. Externalize whatever you need from Node and all related Node classes in code that is testing related only.

astefan · 2025-12-31T09:42:21Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/expression/Expression.java

+
+    @Override
+    protected String propertiesToString(boolean skipIfChild) {
+        return super.propertiesToString(false /* skipIfChild */);


What's with the inline comment? This code style is inconsistent with the rest of the code in ESQL (maybe ES itself, not sure).

My way of avoiding Boolean blindness :) Any way, I refactored this code, and the new version uses an enum instead.

astefan · 2025-12-31T09:45:33Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/tree/Node.java

+     * Like the above, but ensures consistent tree rendering for golden tests. Overrides of this method should avoid adding information
+     * that can change between repeated test runs, such as object identities or pseudo-random generated names.
+     */
+    public String goldenTestNodeString() {


Why are you modifying all these core classes of ESQL for something that is inherently testing related?
This is production code that should be transparent to whatever testing infrastructure needs.

I've refactored the code so the changes to production classes isn't as bad, but some classes their production code had to change since they truncated their lists. I opted to make the old nodeString method final, to make implemented who wish to override this implementation take the new format enum, as it would ensure that future classes would be aware of the requested format, and also made sure I haven't missed any cases.

As for the named identifiers, I'm removing those via regex instead of doing in production, so that code is much shorter now!

… also sourceText by default.

GalLalouche · 2026-01-05T19:20:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/tree/Node.java

    }

    @Override
    public String toString() {


@astefan I think here it would have been safer to toString() final to ensure all implementers are updated and pass the format to recursive call correctly, but I've avoided that for now to minimize production classes changes, as there's 28 classes to update at least!

GalLalouche · 2026-01-06T11:09:26Z

...qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/EsqlQueryGenerator.java

            return null;
        }
-        if (randomBoolean() && result.contains("*") == false) {
+        if (randomBoolean() && result.contains("*") == false && result.startsWith("`") == false) {


@luigidellaquila Since I changed the name to already be wrapped with `, I had to fix this.

GalLalouche · 2026-01-06T11:09:58Z

...server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java

            .collect(Collectors.toList());
    }

+    private static String normalizeColumnName(String name) {


@luigidellaquila Since I added a new column with - in it, I changed this.

It makes sense.
I think we never get quoted column names here so we should be OK for now, but maybe it makes sense to add an && name.startsWith("`") == false just to be sure.

luigidellaquila · 2026-01-07T08:27:13Z

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestReproduceInfoPrinter.java

+import java.nio.charset.StandardCharsets;
+
+/** Adds -Dgolden.bulldoze to the reproduction line for golden test failures. */
+public class GoldenTestReproduceInfoPrinter extends RunListener {


I'm afraid the CI won't accept classes that extend RunListener directly

It's actually dumber than that: it thinks it's a test class because it has method that start with test 🤦. Apparently, making this a nested static class works though.

luigidellaquila

LGTM, thanks!

astefan

I've went through another round with the PR. I think the idea of the golden tests is good, it needs some polishing though.
Also, I really believe that at least some of the examples provided (and their expected files) shouldn't be merged. Some of the tests are not, imho, good examples of golden testing.

astefan · 2026-01-08T12:20:29Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestsReadme.MD

+cause the golden test to fail nonetheless if you are verifying that the logical plan hasn't changed.
+
+One should feel comfortable bulldozing failed tests if one expects changes in the underlying tree.
+Golden tests should be the *first* line of defense, not the only one!


I would expand on this idea by encouraging to write additional unit tests to check for some specific things that cannot be verified through the plan String representation. In these optimization steps we don't always check only the string output of the resulting plan.

For example, LogicalPlanOptimizerTests:355:

assertThat(agg.aggregates().get(0).id(), equalTo(Expressions.attribute(agg.groupings().get(0)).id()));

LogicalPlanOptimizerTests:548:

assertWarnings( "No limit defined, adding default limit of [1000]", "Line 2:28: Field 'b' shadowed by field at line 2:47", "Line 2:9: Field 'b' shadowed by field at line 2:47" );

And there are also some inline stats related tests which, due to the specific characteristics of this command, they have an additional "mocking" step to help with some more advanced checks. See LogicalPlanOptimizerTests.testDoubleInlineStatsPrunning_With_MV_Functions.

astefan · 2026-01-08T12:35:07Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/core/tree/Node.java

+        int maxProperties = switch (format) {
+            case LIMITED -> TO_STRING_MAX_PROP;
+            case FULL -> Integer.MAX_VALUE;
+        };


Pass max width and max props values as parameters to NodeStringFormat enum values and here and down below use that enum's value directly without this switch here and the one below.

Something like this

public enum NodeStringFormat { /** No list truncation, no line breaks due to string width. */ FULL(Integer.MAX_VALUE, Integer.MAX_VALUE), /** List truncation and line breaks due to string width applied. */ LIMITED(TO_STRING_MAX_PROP, TO_STRING_MAX_WIDTH); private int toStringMaxPropsCount; private int toStringMaxWidth; NodeStringFormat(int toStringMaxPropsCount, int toStringMaxWidth) { this.toStringMaxPropsCount = toStringMaxPropsCount; this.toStringMaxWidth = toStringMaxWidth; } }

Ah, of course! Thank you!

astefan · 2026-01-08T13:36:57Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+                logger.info("Bulldozing file {}", outputPath);
+                return createNewOutput(outputPath, plan);
+            } else {
+                if (Files.exists(outputPath)) {


It's not clear to me what happens if the expected output file doesn't exist and in fact it should. I see in this piece of code that it's auto-generated if it doesn't exist.
Shouldn't there be the case that tells the author of the test to "generate the output file first or use this option to auto-generate it"? Or there shouldn't be a need for this and always auto-creating it is fine?

It's always created if it doesn't exist. From the README:

The test runner will pick up the "orphan" test and create the required outputs (see next section) based on the test name. You can also specify which plans you want to verify (all plans are output by default)

astefan · 2026-01-08T14:24:00Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+    protected enum Stage {
+        ANALYZER,
+        LOGICAL,
+        PHYSICAL,
+        // There's no LOCAL_LOGICAL here since in product we use PlannerUtils.localPlan to produce the local physical plan directly from
+        // non-local physical plan.
+        LOCAL_PHYSICAL
+    }


Since this is called a Stage, I think it would be more appropriate to have the values named as ANALYSIS, LOGICAL_OPTIMIZATION, PHYSICAL_OPTIMIZATION, LOCAL_PHYSICAL_OPTIMIZATION.

astefan · 2026-01-08T14:56:30Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+        return s.lines().map(String::strip).collect(Collectors.joining("\n"));
+    }
+
+    private static String extractTestName() {


I don't have experience with this low-level way of dealing with tests, but gut-feeling wise I'd say the ES code base should have something for finding out test names, test classes names etc.

I quickly searched and I found that ESTestCase has something called getTestName(). I'd look into that to see maybe reinventing the wheel here is not needed.

astefan · 2026-01-08T15:07:13Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+        private <T extends QueryPlan<T>> TestResult verifyOrWrite(T plan, Stage stage) throws IOException {
+            var outputPath = outputPath(stage);
+            if (System.getProperty("golden.bulldoze") != null) {
+                logger.info("Bulldozing file {}", outputPath);


Do you maybe have an alternative to "Bulldoze", something more techy/IT-specific? I'd resonate more with a word like that. But, if you feel strongly about this one, we can keep it.

This was just the term we used at my previous place :) Renamed to overwrite.

astefan · 2026-01-08T15:20:16Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/GoldenTestCase.java

+                return;
+            }
+            if (isGoldenTest(failure)) {
+                printToErr(captureDelegate(failure).replace("REPRODUCE WITH:", "BULLDOZE WITH:") + " -Dgolden.bulldoze");


I would have used the logger everywhere if I were to write such a test class. Why the use of System.err.println?

This follows the printing semantics of ReproduceInfoPrinter, which also writes to System.err.println: https://github.com/elastic/elasticsearch/blob/main/test/framework/src/main/java/org/elasticsearch/test/junit/listeners/ReproduceInfoPrinter.java#L113

astefan · 2026-01-08T15:27:21Z

...orm_to_query_and_tags_with_customized_upper_limit/cluster_0/query_60/local_physical.expected

@@ -0,0 +1,1793 @@
+LimitExec[1000[INTEGER],null]


If I were to search and double check the test itself and the test output file, I think it would have been easier for this folder to have the name of the test itself, meaning testRoundToTransformToQueryAndTagsWithCustomizedUpperLimit no underscores anywhere, just the same name as the test. In the IDE I would simply copy the test name and then search for the file with the same name (pasted from the test class).

Also, tbh this expected file is much much harder to grasp and reason about than the actual test and imho this is not a good example of golden testing being a better alternative. The file is huge I don't humanly see how someone would scroll the entire thing and double check it.

Renamed the paths to use camelCase instead of underscore_case.

I agree that some of the golden tests end up with a huge amount of filler, rendering them useless. I think it might be salvageable by allowing tests to specify how the output is being printed: for example, in this case most of the lines are just from the round_to buckets being printed. This test would still have merit even if it ignored that portion, but that's probably a task for another PR! I've remove all the tests that ended up in huge outputs.

astefan

LGTM. Thank you, @GalLalouche!

This PR adds golden tests to ES|QL! ● I added a GoldenTestsReadme.md with instructions on how to add new golden tests. ● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc. ● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework. ○ When I recently worked on these tests (elastic#138023,elastic#138765), the majority of my time was spent mechanically fixing the expected output in the tests. ○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests. ○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down. ○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.

ESQL: Golden tests

fa96100

GalLalouche added >test Issues or PRs that are addressing/adding tests Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Dec 16, 2025

elasticsearchmachine added the v9.3.0 label Dec 16, 2025

Remove test.policy

371d234

elasticsearchmachine added v9.4.0 and removed v9.3.0 labels Dec 17, 2025

Merge branch 'main' into feature/golden_tests

b0dcef9

luigidellaquila self-requested a review December 23, 2025 08:18

luigidellaquila reviewed Dec 23, 2025

View reviewed changes

luigidellaquila reviewed Dec 30, 2025

View reviewed changes

astefan requested changes Dec 31, 2025

View reviewed changes

GalLalouche added 11 commits December 31, 2025 12:42

Merge branch 'main' into feature/golden_tests

d8633d8

TEMP: Undo production changes

f004409

Simplify golden test overriding

84cab3a

Replace enum with boolean

1e380cc

Refactor NodeUtils

56209f4

Small CR fixes and forbidden API shenanigans

c15347b

Fix missing test file, rename to all_types

d962862

Add docs to format

f5e1392

Merge branch 'main' into feature/golden_tests

32141d1

Move all-types.csv to esql

7702f83

Unignore .md files, since those are no longer generated.

38e6012

GalLalouche requested review from astefan and luigidellaquila January 5, 2026 17:12

GalLalouche added 2 commits January 5, 2026 21:11

Small refactors, fix Literal bug

30e9f75

Small fix to Expression to make it so its default toString(Format) is…

283b83d

… also sourceText by default.

GalLalouche commented Jan 5, 2026

View reviewed changes

Fix generative test

cf0e5d5

GalLalouche commented Jan 6, 2026

View reviewed changes

luigidellaquila reviewed Jan 7, 2026

View reviewed changes

GalLalouche added 2 commits January 7, 2026 12:51

Fix GoldenTestReproduceInfoPrinter

3d9501d

CR fix

3c3a97c

luigidellaquila approved these changes Jan 8, 2026

View reviewed changes

astefan reviewed Jan 8, 2026

View reviewed changes

GalLalouche added 4 commits January 15, 2026 22:30

Some CR fixes, not done

770225a

More fixes

b4d6dd2

Merge branch 'main' into feature/golden_tests

619e6d6

Merge

f687e49

GalLalouche requested a review from astefan January 16, 2026 14:22

astefan approved these changes Jan 19, 2026

View reviewed changes

Merge branch 'main' into feature/golden_tests

a4a4d87

GalLalouche merged commit b35518e into elastic:main Jan 19, 2026
35 checks passed

Conversation

GalLalouche commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticsearchmachine commented Dec 16, 2025

Uh oh!

luigidellaquila left a comment

Choose a reason for hiding this comment

Uh oh!

luigidellaquila Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luigidellaquila left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luigidellaquila left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

astefan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luigidellaquila left a comment

Choose a reason for hiding this comment

Uh oh!

astefan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

GalLalouche commented Dec 16, 2025 •

edited

Loading

luigidellaquila Dec 23, 2025 •

edited

Loading