Big arrays sliced from netty buffers (double)#90745
Big arrays sliced from netty buffers (double)#90745elasticsearchmachine merged 11 commits intoelastic:mainfrom
Conversation
Based on elastic#89668 but for doubles. This should allow aggregations down the road to read doubles values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
|
|
||
| static { | ||
| if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) { | ||
| throw new Error("The deserialization assumes this class is written with little-endian ints."); |
There was a problem hiding this comment.
yes, this was a copy paste error...
| for (int i = 0; i < pages.length - 1; i++) { | ||
| out.write(pages[i]); | ||
| } | ||
| out.write(pages[pages.length - 1], 0, lastPageEnd * Double.BYTES); |
There was a problem hiding this comment.
I wonder if we should share this code with the int/long/whatever other types. It's nearly the same code. 🤷 sounds like a good follow up change.
| public void testGetDoubleLE() { | ||
| // first bytes array = 1.2, second bytes array = 1.4, third bytes array = 1.6 | ||
| BytesReference[] refs = new BytesReference[] { | ||
| new BytesArray(new byte[] { 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, -0xD, 0x3F }), |
There was a problem hiding this comment.
Maybe we should make one big byte array and then randomly break it into smaller arrays?
… to reuse ArrayIndexOutOfBoundsException exception.
| // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception, | ||
| // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case. | ||
| // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task. | ||
| expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5)); |
There was a problem hiding this comment.
The additional the testGetDoubleLE() test sometimes causes the jvm to reuse the same AIOOB exception. These reused exceptions have no message and no stacktrace.
This reproduces when running:
./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8
And also failed in PR CI.
Running with OmitStackTraceInFastThrow disabled (is enabled by default) stops the exception reuse and test case doesn't fail without this modification:
./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8 -Dtests.jvm.argline="-XX:-OmitStackTraceInFastThrow"
We either need to ensure -XX:-OmitStackTraceInFastThrow is set when running gradle test task or adjust the test, which I have done now. I don't think asserting the exception message is that important?
There was a problem hiding this comment.
I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.
| -0x67, | ||
| -0x67, | ||
| -0x7, | ||
| 0x3F }; |
There was a problem hiding this comment.
I wonder if you could do:
byte[] data = new byte[3 * Double.BYTES];
ByteUtils.writeDoubleLE(data, 1.2, 0);
ByteUtils.writeDoubleLE(data, 1.4, Double.BYTES);
ByteUtils.writeDoubleLE(data, 1.6, 2 * Double.BYTES);
Would that be more readable? I'm not really sure.
| int length = Math.min(bytesPerChunk, data.length - offset); | ||
| refs.add(new BytesArray(data, offset, length)); | ||
| } | ||
| BytesReference comp = CompositeBytesReference.of(refs.toArray(BytesReference[]::new)); |
| // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception, | ||
| // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case. | ||
| // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task. | ||
| expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5)); |
There was a problem hiding this comment.
I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.
| public double getDoubleLE(int index) { | ||
| long bits = (long) (get(index + 7) & 0xFF) << 56 | (long) (get(index + 6) & 0xFF) << 48 | (long) (get(index + 5) & 0xFF) << 40 | ||
| | (long) (get(index + 4) & 0xFF) << 32 | (long) (get(index + 3) & 0xFF) << 24 | (get(index + 2) & 0xFF) << 16 | (get(index + 1) | ||
| & 0xFF) << 8 | get(index) & 0xFF; |
There was a problem hiding this comment.
I don't know if we want to do it in this PR or in a follow up, but we'll want this same bit twiddling for the long version, and might as well make this reusable for that case.
There was a problem hiding this comment.
👍 I will add getLongLE method to this class and the interface and a unit test for it.
| -0x67, | ||
| -0x67, | ||
| -0x7, | ||
| 0x3F }; |
There was a problem hiding this comment.
This seems like a good use case for the // tag::noformat - // end::noformat syntax to disable automatic formatting for a section. See the note in CONTRIBUTING.md. Personally, I would put the bytes for each double on one line.
|
|
||
| @Override | ||
| public double get(long index) { | ||
| if (index > Integer.MAX_VALUE / 8) { |
There was a problem hiding this comment.
Nit, but...
| if (index > Integer.MAX_VALUE / 8) { | |
| if (index > Integer.MAX_VALUE / Long.BYTES) { |
|
|
||
| @Override | ||
| public long size() { | ||
| return ref.length() / 8; |
There was a problem hiding this comment.
| return ref.length() / 8; | |
| return ref.length() / Long.BYTES; |
| // We can't serialize messages longer than 2gb anyway | ||
| throw new ArrayIndexOutOfBoundsException(); | ||
| } | ||
| return ref.getDoubleLE((int) index * 8); |
There was a problem hiding this comment.
| return ref.getDoubleLE((int) index * 8); | |
| return ref.getDoubleLE((int) index * Long.BYTES); |
|
|
||
| @Override | ||
| public void close() { | ||
| ref.decRef(); |
There was a problem hiding this comment.
Maybe this is obvious, but I'm not used to looking at this part of the code - where's the corresponding incRef for this?
There was a problem hiding this comment.
You'll love this. It's in the ctor - in readReleasableBytesReference. The way this links into the rest of the world is that ReleasableBytesReference#streamInput returns a specialized input that incs the ref when you call readReleasableBytesReference.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
original-brownbear
left a comment
There was a problem hiding this comment.
LGTM :) thanks Martijn! Just one nit about the endianness check ... but we can look into this whenever, just figured I'd point it out.
| */ | ||
| final class BigDoubleArray extends AbstractBigArray implements DoubleArray { | ||
|
|
||
| static { |
There was a problem hiding this comment.
Do we actually need this here as well? Maybe we can just make a static method for this somewhere at least since we really only use it once? Or put this in a bootstrap check? Seems strange to duplicate this check doesn't it?
There was a problem hiding this comment.
Or put this in a bootstrap check?
I like this idea. I will attempt this in a followup pr.
not-napoleon
left a comment
There was a problem hiding this comment.
LGTM, thank you for taking this!
Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
Move little endian byte order checks to a single bootstrap check. Originated from elastic#90745
Move little endian byte order checks to a single bootstrap check. Originated from #90745
Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
Based on #89668 but for doubles. This should allow aggregations down the road to read doubles values
directly from netty buffer, rather than copying it from the netty buffer.
Relates to #89437