[ES|QL] add reverse function#113297
Conversation
|
Hi @drewdaemon, I've created a changelog YAML for you. |
…n/elasticsearch into 98545/reverse-function
…98545/reverse-function
| message:keyword | message_reversed:keyword | ||
| 😂😁😅😐🥺😢😭 | 😭😢🥺😐😅😁😂 | ||
| // end::reverseEmoji-result[] |
There was a problem hiding this comment.
Nice - small comment, I would use different emoticons to better signal visually the reversal: 🏡🚌✈ becomes ✈ 🚌🏡
There was a problem hiding this comment.
Additionally add also a string that combines and intertwines regular chars + emoticons + extended ones (こんにちは) and check they are properly reversed (namely that the encoding is preserved).
There was a problem hiding this comment.
Nice - small comment, I would use different emoticons to better signal visually the reversal
I think of the current ones like a sort of easter egg in the docs... you have to squint a bit and then you're rewarded with "getting it."
That said, I defer to you — LMK
cc @leemthompo in case he has input
There was a problem hiding this comment.
I think Costin's right that it would be easier with a bit more visual variance in the emojis but we're knee deep in nit territory here 😆
There was a problem hiding this comment.
Actually Liam this is the most important conversation going on in this PR right now 😆
Okay, I'll take another stab at emoji art 🖼️
There was a problem hiding this comment.
My battle for these emojis is met with opposition at every turn... first it was our handling of unicode graphemes and now it's a custom font on the docs website... will I prevail?....
Screen.Recording.2024-09-26.at.4.01.39.PM.mov
| def(Locate.class, Locate::new, "locate"), | ||
| def(Repeat.class, Repeat::new, "repeat"), | ||
| def(Space.class, Space::new, "space") }, | ||
| def(Trim.class, Trim::new, "trim") }, |
There was a problem hiding this comment.
All JetBrains :)
…98545/reverse-function
bpintea
left a comment
There was a problem hiding this comment.
Some minor/optional drive-by comments only.
| return reversed.toString(); | ||
| } | ||
|
|
||
| private static boolean isOneByteUTF8(BytesRef ref) { |
There was a problem hiding this comment.
I understood ASCII vs unicode to be distinct from the UTF encoding so I tried to be specific.
Let me know if I misunderstood!
There was a problem hiding this comment.
I've used words like "low ascii" or "one bytes utf-8" for the same thing. Either is fine by me. But in grand java tradition - what you've written is quite descriptive.
There was a problem hiding this comment.
Maybe isUTF8ASCII or something?
There was a problem hiding this comment.
I find isOneByteUTF8 clearer than other alternatives.
UTF8 encoding code points corresponding to ASCII on one byte is known, but maybe not "well enough" known. Let's leave it as is.
There was a problem hiding this comment.
I think isOneByteUTF8 the best, yeah.
| private static void reverseArray(byte[] array, int start, int end) { | ||
| while (start < end) { | ||
| byte temp = array[start]; | ||
| array[start] = array[end]; | ||
| array[end] = temp; | ||
| start++; | ||
| end--; | ||
| } | ||
| } |
There was a problem hiding this comment.
Would be nice to add this back to org.elasticsearch.common.util.ArrayUtils.
| if (isOneByteUTF8(val)) { | ||
| // this is the fast path. we know we can just reverse the bytes. | ||
| BytesRef reversed = BytesRef.deepCopyOf(val); | ||
| reverseArray(reversed.bytes, reversed.offset, reversed.offset + reversed.length - 1); |
There was a problem hiding this comment.
This scans an ASCII text for three times (even though not necessarily byte-by-byte).
I guess the first pass could attempt a reverse already, though that would imply allocation for non-ASCII text.
I guess it's fine as is, probably not making a practical difference either way, can be optimised if ever needed.
There was a problem hiding this comment.
True. At least we're in O(n).
can be optimised if ever needed
Agreed
|
Hmmm, these failing tests appear to be a Catch-22 centered on this CSV test. When I run I'm not really sure what to do about this. Could someone take a look? |
That's the best catch. I'll look. |
|
Okay, now it looks like the warning returned in the multi-node scenario is not identifying the line and column number correctly:
In the CSV test, the warning is |
|
Ah! That's you actually! I'll link. |
|
|
||
| @Override | ||
| public void writeTo(StreamOutput out) throws IOException { | ||
| out.writeNamedWriteable(field()); |
There was a problem hiding this comment.
Add source.writeTo(out) and read it with this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class). Then you'll need to flip the test - it is setup with that alwaysempty thing.
After that the source should appear.
This is a left over from when serializing Source was really expensive. Back then we didn't serialize it. We should serialize it because it's really cheap now - just a few bytes.
There was a problem hiding this comment.
I realized UnaryScalarFunction already takes care of this, so I simplified the constructor in ad7c68f
…98545/reverse-function
…98545/reverse-function
💚 Backport successful
|
Adds a REVERSE string function
Adds a REVERSE string function
Part of #98545
Adds a
REVERSEfunction to the ES|QL language.REVERSEreverses the characters in a string. If you want to know more, read the docs!