ESQL: Different approach to keeping track of fields coming from indices#143944
ESQL: Different approach to keeping track of fields coming from indices#143944elasticsearchmachine merged 16 commits intoelastic:mainfrom
Conversation
full-text functions limitations
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
luigidellaquila
left a comment
There was a problem hiding this comment.
Thanks @astefan, I like the approach very much.
This is not an easy topic, probably as complex as field resolution.
I left a couple of comments regarding how we deal with existing columns, I think there is room for improvement, but this is already a step forward compared to the previous implementation.
| } | ||
| default -> { | ||
| // For commands that don't create named columns (KEEP, DROP, SORT, LIMIT, WHERE, etc.), | ||
| // any column not in previous is from the command (e.g. ENRICH, LOOKUP_JOIN, CHANGE_POINT) |
There was a problem hiding this comment.
Unfortunately this is only half of the story: a column could be in the previous output and still be overwritten by these commands.
To be safe, we should let single commands define which columns they "touched" and invalidate all of them; worst case, we can invalidate all the columns for commands we don't know. This will restrict the test surface a lot, but it will avoid false negatives.
There was a problem hiding this comment.
Thank you for thinking more about this aspect. I appreciate it.
I think this all boils down to the strategy we want/each adopts with generative testing: I tend to start with fewer restrictions and adjust along the way. This surfaces issues for new things we add (like full-text functions support, like new commands being added). If the adjustments become a burden, then more like a shotgun approach is needed (like the one you mentioned - columns "touched" become invalid). The drawback is the constant monitoring of generative testing.
For the moment, I'll keep the code here as is and see what/if things become unmanageable.
There was a problem hiding this comment.
I think it makes sense 👍
| Set<String> createdColumns = new HashSet<>(); | ||
|
|
||
| switch (commandName) { | ||
| case "eval" -> { |
There was a problem hiding this comment.
We also had a special check for MV_EXPAND before.
Is an expanded column still valid for full text functions?
There was a problem hiding this comment.
No, it is not. I rushed into removing the mv_expand check seeing that the issue is partially fixed, but that specific issue had a fix that forbade mv_expand with full-text functions. I added back the check.
…generativeIT_refine5
…sticsearch into generativeIT_refine5
…es (elastic#143944) This updates the Generative IT infra to keep track of fields that come from indices, helpful especially for full-text functions which, for now, can work only on fields that come from real indices. Also, there is some cleanup of errors that shouldn't be valid anymore, as the corresponding issues have been closed. AI-assisted PR.
…es (elastic#143944) This updates the Generative IT infra to keep track of fields that come from indices, helpful especially for full-text functions which, for now, can work only on fields that come from real indices. Also, there is some cleanup of errors that shouldn't be valid anymore, as the corresponding issues have been closed. AI-assisted PR.
This updates the Generative IT infra to keep track of fields that come from indices, helpful especially for full-text functions which, for now, can work only on fields that come from real indices.
Also, there is some cleanup of errors that shouldn't be valid anymore, as the corresponding issues have been closed.
AI-assisted PR.