Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
luigidellaquila
left a comment
There was a problem hiding this comment.
Thanks @GalLalouche, I had a first look at the PR and it looks awesome.
The output is a bit more verbose than the old tests, but definitely clear and manageable.
I left a comment, but I'll need some time to check all the details in depth.
| new EsqlFunctionRegistry(), | ||
| CsvTests.loadIndexResolution(CsvTests.testDatasets(parsedStatement)), | ||
| defaultLookupResolution(), | ||
| new EnrichResolution(), |
There was a problem hiding this comment.
I like the idea of reusing CSV tests datasets.
nit: The only thing that concerns me a bit here is that we are mixing resolutions from three different places (CSV indices, AnalyzerTestUtils lookup resolution and an empty EnrichResolution), making it harder to understand what data we have when we write new tests, or how to add new datasets.
Long term, it would be good to get all of them from a single place, taking as a single source the datasets defined in CsvTestsDataLoader.
This is for a follow-up PR for sure.
luigidellaquila
left a comment
There was a problem hiding this comment.
I left two more hints to make the CI green
| "mapping-default-incompatible.json", | ||
| "employees_incompatible.csv" | ||
| ).noSubfields(); | ||
| private static final TestDataset ALL_TYPES = new TestDataset("test", "mapping-all-types.json", "test.csv"); |
There was a problem hiding this comment.
test.csv file doesn't seem to exist, that's the reason for the CI failures.
I tried to add an empty file with that name and the tests seem to pass, but I don't know if that was the intention (or if you had some additional data)
| import org.elasticsearch.xpack.esql.session.Versioned; | ||
| import org.elasticsearch.xpack.esql.stats.SearchStats; | ||
|
|
||
| import java.io.File; |
There was a problem hiding this comment.
Some APIs are forbidden in ES (eg. java.io.File), you'll have to replace them (Eg. with java.nio.file) to make the CI happy.
You can check API usage with
./gradlew :x-pack:plugin:esql:forbiddenApisTest
it will give you a full list of violations
luigidellaquila
left a comment
There was a problem hiding this comment.
Added another round of comments.
I'll wait for the CI to be green, but apart from that I think it's very good 🚀
| * </pre> | ||
| */ | ||
| final StringBuilder treeString(StringBuilder sb, int depth, BitSet hasParentPerDepth) { | ||
| public String goldenTestToString() { |
There was a problem hiding this comment.
nit: It would be good to have some javadocs here, explaining the difference between goldenTestToString and goldenTestNodeString
There was a problem hiding this comment.
This code doesn't exist anymore but, I added some docs to the relevant format enum.
| List<Object> props = nodeProperties(); | ||
| for (Object prop : props) { | ||
| if (prop instanceof NameId) { | ||
| continue; |
There was a problem hiding this comment.
Maybe it's irrelevant, but instead of skipping, we could add a constant placeholder (eg. "$NameId$"), just to distinguish the cases where this element is present vs absent.
There was a problem hiding this comment.
I've changed this to use a # prefix. This way, I'm removing them in the golden tests using regex, as suggested by @astefan, but we don't lose this information.
| When a golden test fails, the test runner will create a new `.actual` file, in the same directory as | ||
| the `.expected` file. `.actual` files are git ignored by default, you can manually delete them after | ||
| inspection, or if you wish to avoid the `.actual` file creation, you can use `-Dgolden.noactual`. | ||
| If you want to automatically delete them on test success, you can use `-Dgolden.cleanactuals`. |
There was a problem hiding this comment.
-Dgolden.cleanactuals should be -Dgolden.cleanactual
There was a problem hiding this comment.
Nice catch, thanks!
astefan
left a comment
There was a problem hiding this comment.
Didn't go through all the changes, I have a big ask before I continue reviewing: don't change the core classes of ESQL for something that is inherently testing related. Externalize whatever you need from Node and all related Node classes in code that is testing related only.
|
|
||
| @Override | ||
| protected String propertiesToString(boolean skipIfChild) { | ||
| return super.propertiesToString(false /* skipIfChild */); |
There was a problem hiding this comment.
What's with the inline comment? This code style is inconsistent with the rest of the code in ESQL (maybe ES itself, not sure).
There was a problem hiding this comment.
My way of avoiding Boolean blindness :) Any way, I refactored this code, and the new version uses an enum instead.
| * Like the above, but ensures consistent tree rendering for golden tests. Overrides of this method should avoid adding information | ||
| * that can change between repeated test runs, such as object identities or pseudo-random generated names. | ||
| */ | ||
| public String goldenTestNodeString() { |
There was a problem hiding this comment.
Why are you modifying all these core classes of ESQL for something that is inherently testing related?
This is production code that should be transparent to whatever testing infrastructure needs.
There was a problem hiding this comment.
I've refactored the code so the changes to production classes isn't as bad, but some classes their production code had to change since they truncated their lists. I opted to make the old nodeString method final, to make implemented who wish to override this implementation take the new format enum, as it would ensure that future classes would be aware of the requested format, and also made sure I haven't missed any cases.
As for the named identifiers, I'm removing those via regex instead of doing in production, so that code is much shorter now!
… also sourceText by default.
| } | ||
|
|
||
| @Override | ||
| public String toString() { |
There was a problem hiding this comment.
@astefan I think here it would have been safer to toString() final to ensure all implementers are updated and pass the format to recursive call correctly, but I've avoided that for now to minimize production classes changes, as there's 28 classes to update at least!
| return null; | ||
| } | ||
| if (randomBoolean() && result.contains("*") == false) { | ||
| if (randomBoolean() && result.contains("*") == false && result.startsWith("`") == false) { |
There was a problem hiding this comment.
@luigidellaquila Since I changed the name to already be wrapped with `, I had to fix this.
| .collect(Collectors.toList()); | ||
| } | ||
|
|
||
| private static String normalizeColumnName(String name) { |
There was a problem hiding this comment.
@luigidellaquila Since I added a new column with - in it, I changed this.
There was a problem hiding this comment.
It makes sense.
I think we never get quoted column names here so we should be OK for now, but maybe it makes sense to add an && name.startsWith("`") == false just to be sure.
| import java.nio.charset.StandardCharsets; | ||
|
|
||
| /** Adds -Dgolden.bulldoze to the reproduction line for golden test failures. */ | ||
| public class GoldenTestReproduceInfoPrinter extends RunListener { |
There was a problem hiding this comment.
I'm afraid the CI won't accept classes that extend RunListener directly
There was a problem hiding this comment.
It's actually dumber than that: it thinks it's a test class because it has method that start with test 🤦. Apparently, making this a nested static class works though.
astefan
left a comment
There was a problem hiding this comment.
I've went through another round with the PR. I think the idea of the golden tests is good, it needs some polishing though.
Also, I really believe that at least some of the examples provided (and their expected files) shouldn't be merged. Some of the tests are not, imho, good examples of golden testing.
| cause the golden test to fail nonetheless if you are verifying that the logical plan hasn't changed. | ||
|
|
||
| One should feel comfortable bulldozing failed tests if one expects changes in the underlying tree. | ||
| Golden tests should be the *first* line of defense, not the only one! |
There was a problem hiding this comment.
I would expand on this idea by encouraging to write additional unit tests to check for some specific things that cannot be verified through the plan String representation. In these optimization steps we don't always check only the string output of the resulting plan.
For example, LogicalPlanOptimizerTests:355:
assertThat(agg.aggregates().get(0).id(), equalTo(Expressions.attribute(agg.groupings().get(0)).id()));
LogicalPlanOptimizerTests:548:
assertWarnings(
"No limit defined, adding default limit of [1000]",
"Line 2:28: Field 'b' shadowed by field at line 2:47",
"Line 2:9: Field 'b' shadowed by field at line 2:47"
);
And there are also some inline stats related tests which, due to the specific characteristics of this command, they have an additional "mocking" step to help with some more advanced checks. See LogicalPlanOptimizerTests.testDoubleInlineStatsPrunning_With_MV_Functions.
| int maxProperties = switch (format) { | ||
| case LIMITED -> TO_STRING_MAX_PROP; | ||
| case FULL -> Integer.MAX_VALUE; | ||
| }; |
There was a problem hiding this comment.
Pass max width and max props values as parameters to NodeStringFormat enum values and here and down below use that enum's value directly without this switch here and the one below.
Something like this
public enum NodeStringFormat {
/** No list truncation, no line breaks due to string width. */
FULL(Integer.MAX_VALUE, Integer.MAX_VALUE),
/** List truncation and line breaks due to string width applied. */
LIMITED(TO_STRING_MAX_PROP, TO_STRING_MAX_WIDTH);
private int toStringMaxPropsCount;
private int toStringMaxWidth;
NodeStringFormat(int toStringMaxPropsCount, int toStringMaxWidth) {
this.toStringMaxPropsCount = toStringMaxPropsCount;
this.toStringMaxWidth = toStringMaxWidth;
}
}
There was a problem hiding this comment.
Ah, of course! Thank you!
| logger.info("Bulldozing file {}", outputPath); | ||
| return createNewOutput(outputPath, plan); | ||
| } else { | ||
| if (Files.exists(outputPath)) { |
There was a problem hiding this comment.
It's not clear to me what happens if the expected output file doesn't exist and in fact it should. I see in this piece of code that it's auto-generated if it doesn't exist.
Shouldn't there be the case that tells the author of the test to "generate the output file first or use this option to auto-generate it"? Or there shouldn't be a need for this and always auto-creating it is fine?
There was a problem hiding this comment.
It's always created if it doesn't exist. From the README:
The test runner will pick up the "orphan" test and create the required outputs (see next section)
based on the test name.
You can also specify which plans you want to verify (all plans are output by default)
| protected enum Stage { | ||
| ANALYZER, | ||
| LOGICAL, | ||
| PHYSICAL, | ||
| // There's no LOCAL_LOGICAL here since in product we use PlannerUtils.localPlan to produce the local physical plan directly from | ||
| // non-local physical plan. | ||
| LOCAL_PHYSICAL | ||
| } |
There was a problem hiding this comment.
Since this is called a Stage, I think it would be more appropriate to have the values named as ANALYSIS, LOGICAL_OPTIMIZATION, PHYSICAL_OPTIMIZATION, LOCAL_PHYSICAL_OPTIMIZATION.
| return s.lines().map(String::strip).collect(Collectors.joining("\n")); | ||
| } | ||
|
|
||
| private static String extractTestName() { |
There was a problem hiding this comment.
I don't have experience with this low-level way of dealing with tests, but gut-feeling wise I'd say the ES code base should have something for finding out test names, test classes names etc.
I quickly searched and I found that ESTestCase has something called getTestName(). I'd look into that to see maybe reinventing the wheel here is not needed.
| private <T extends QueryPlan<T>> TestResult verifyOrWrite(T plan, Stage stage) throws IOException { | ||
| var outputPath = outputPath(stage); | ||
| if (System.getProperty("golden.bulldoze") != null) { | ||
| logger.info("Bulldozing file {}", outputPath); |
There was a problem hiding this comment.
Do you maybe have an alternative to "Bulldoze", something more techy/IT-specific? I'd resonate more with a word like that. But, if you feel strongly about this one, we can keep it.
There was a problem hiding this comment.
This was just the term we used at my previous place :) Renamed to overwrite.
| return; | ||
| } | ||
| if (isGoldenTest(failure)) { | ||
| printToErr(captureDelegate(failure).replace("REPRODUCE WITH:", "BULLDOZE WITH:") + " -Dgolden.bulldoze"); |
There was a problem hiding this comment.
I would have used the logger everywhere if I were to write such a test class. Why the use of System.err.println?
There was a problem hiding this comment.
This follows the printing semantics of ReproduceInfoPrinter, which also writes to System.err.println: https://github.com/elastic/elasticsearch/blob/main/test/framework/src/main/java/org/elasticsearch/test/junit/listeners/ReproduceInfoPrinter.java#L113
| @@ -0,0 +1,1793 @@ | |||
| LimitExec[1000[INTEGER],null] | |||
There was a problem hiding this comment.
If I were to search and double check the test itself and the test output file, I think it would have been easier for this folder to have the name of the test itself, meaning testRoundToTransformToQueryAndTagsWithCustomizedUpperLimit no underscores anywhere, just the same name as the test. In the IDE I would simply copy the test name and then search for the file with the same name (pasted from the test class).
Also, tbh this expected file is much much harder to grasp and reason about than the actual test and imho this is not a good example of golden testing being a better alternative. The file is huge I don't humanly see how someone would scroll the entire thing and double check it.
There was a problem hiding this comment.
- Renamed the paths to use camelCase instead of underscore_case.
- I agree that some of the golden tests end up with a huge amount of filler, rendering them useless. I think it might be salvageable by allowing tests to specify how the output is being printed: for example, in this case most of the lines are just from the
round_tobuckets being printed. This test would still have merit even if it ignored that portion, but that's probably a task for another PR! I've remove all the tests that ended up in huge outputs.
astefan
left a comment
There was a problem hiding this comment.
LGTM. Thank you, @GalLalouche!
This PR adds golden tests to ES|QL! ● I added a GoldenTestsReadme.md with instructions on how to add new golden tests. ● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local node_reduce), etc. ● As a proof of concept, I've re-implemented (most) of the tests in SubstitudeRoundToTests using the golden testing framework. ○ When I recently worked on these tests (elastic#138023,elastic#138765), the majority of my time was spent mechanically fixing the expected output in the tests. ○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests. ○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down. ○ One of the tests (testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.
This PR adds golden tests to ES|QL!
● I added a GoldenTestsReadme.md with instructions on how to add new golden tests.
● I tried to keep this (relatively) minimal for the first PR. In the future we can add more features, configurations, stages (e.g., local
node_reduce), etc.● As a proof of concept, I've re-implemented (most) of the tests in
SubstitudeRoundToTestsusing the golden testing framework.○ When I recently worked on these tests (#138023,#138765), the majority of my time was spent mechanically fixing the expected output in the tests.
○ This was a good use case since I needed to add more support for test nesting due to the loops in the original tests.
○ While more text is being produced when using golden tests (70k lines!), fixing the expected output is a lot simpler, since you can just bulldoze the output! In this case, eyeballing the difference is enough to be convinced that the change is correct. So the maintenance cost goes down.
○ One of the tests (
testRoundToWithTimeSeriesIndices) was skipped because its output isn't actually consistent across runs. The original test only verifies a very specific subset of the AST, so it's not a good candidate for golden tests.