Use a buffer to do character to byte conversion in StreamOutput#writeString by s1monw · Pull Request #21680 · elastic/elasticsearch

s1monw · 2016-11-19T14:10:37Z

Today we call writeByte up to 3x per character in each string written via
StreamOutput#writeString. This can add significant overhead when strings
are long or when many strings are written. This change adds a local buffer
into which chars are converted to bytes. The converted bytes are then
written via writeBytes, reducing the overhead of this operation.

Closes #21660
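
As a rough illustration of the approach described above, here is a minimal sketch, not the code from this PR. The class `BufferedStringWriter` and its method `writeChars` are hypothetical names, a `ByteArrayOutputStream` stands in for the stream's bulk `writeBytes`, and the length prefix written before the characters is omitted.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for illustration; not the actual StreamOutput implementation.
final class BufferedStringWriter {
    // Scratch buffer reused across calls; its old contents never matter.
    private byte[] buffer = new byte[0];

    /** Converts each char to 1-3 bytes in a local buffer and writes the buffer in bulk. */
    void writeChars(String str, ByteArrayOutputStream out) {
        final int charCount = str.length();
        // At most 3 bytes per char are needed; cap the scratch buffer at 1024 bytes.
        final int bufferSize = Math.min(3 * charCount, 1024);
        if (buffer.length < bufferSize) {
            buffer = new byte[bufferSize];
        }
        int offset = 0;
        for (int i = 0; i < charCount; i++) {
            final int c = str.charAt(i);
            if (c <= 0x007F) {
                buffer[offset++] = (byte) c;
            } else if (c > 0x07FF) {
                buffer[offset++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                buffer[offset++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buffer[offset++] = (byte) (0x80 | (c & 0x3F));
            } else {
                buffer[offset++] = (byte) (0xC0 | ((c >> 6) & 0x1F));
                buffer[offset++] = (byte) (0x80 | (c & 0x3F));
            }
            // Flush once fewer than 3 bytes are free, so the next char always fits.
            if (offset > buffer.length - 3) {
                out.write(buffer, 0, offset);
                offset = 0;
            }
        }
        // One bulk write for whatever is left instead of one call per byte.
        out.write(buffer, 0, offset);
    }
}
```

The point of the sketch is that the per-byte calls are replaced by a handful of bulk writes, at the cost of one small reusable array.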

rjernst

Lgtm.

jpountz

Looks good. Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks. I don't mind doing it in this PR or a different one.

jpountz · 2016-11-19T16:32:05Z

core/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java

+        final int bufferSize = Math.min(3 * charCount, 1024); // at most 3 bytes per character is needed here
+        if (convertStringBuffer.length < bufferSize) {
+            convertStringBuffer = new byte[ArrayUtil.oversize(bufferSize, Byte.BYTES)];
+        }


I think you can replace the three above lines with just convertStringBuffer = ArrayUtil.grow(convertStringBuffer, bufferSize);

I had this before, but there is no need to copy the array since we are trashing its contents anyway. That's why I used oversize.
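
The distinction in this exchange can be sketched with plain Java arrays. This is only an illustration under assumptions: `ScratchBuffer` and `ensureCapacity` are hypothetical names, and the one-eighth over-allocation is an arbitrary growth factor chosen for the example, not what Lucene's `ArrayUtil.oversize` computes.

```java
// Hypothetical scratch buffer; illustrates "allocate fresh" versus "grow and copy".
final class ScratchBuffer {
    private byte[] bytes = new byte[0];

    /**
     * Returns an array of at least minSize bytes. When the current array is too
     * small, a fresh one is allocated and the old contents are dropped, because a
     * scratch buffer is overwritten before it is read. A grow-style helper would
     * additionally copy the old bytes, which is wasted work here.
     */
    byte[] ensureCapacity(int minSize) {
        if (bytes.length < minSize) {
            // Over-allocate a little (illustrative factor) to reduce future reallocations.
            bytes = new byte[minSize + (minSize >> 3)];
        }
        return bytes;
    }
}
```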

jpountz · 2016-11-19T18:47:32Z

core/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java

+            // make sure any possible char can fit into the buffer in any possible iteration
+            // we need at most 3 bytes so we flush the buffer once we have less than 3 bytes
+            // left before we start another iteration
+            if (offset >= buffer.length-3) {


I think it should be a strict greater than?
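
The arithmetic behind this remark can be checked with a small sketch. `FlushCheck` and its two methods are hypothetical helpers written for this illustration. With `offset == buffer.length - 3` exactly three bytes are still free, so one more char of any width fits and a flush is not yet required. The non-strict comparison is still safe, it only flushes one step earlier than necessary.

```java
// Hypothetical helpers comparing the two flush conditions discussed above.
final class FlushCheck {
    /** Strict variant: flush only when fewer than 3 bytes remain free. */
    static boolean flushStrict(int offset, int bufferLength) {
        return offset > bufferLength - 3;
    }

    /** Variant from the diff: also flushes when exactly 3 bytes remain free. */
    static boolean flushNonStrict(int offset, int bufferLength) {
        return offset >= bufferLength - 3;
    }
}
```

For a 1024-byte buffer, offset 1021 leaves bytes 1021, 1022 and 1023 free, which is enough for a 3-byte char, so only the non-strict variant flushes there.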

jpountz · 2016-11-19T18:48:37Z

core/src/test/java/org/elasticsearch/common/io/stream/BytesStreamsTests.java

+                }
+            }
+        }
+    }


maybe also test an explicit big string that only contains chars that are stored on 3 bytes?
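
One possible way to build such an input is sketched below. `ThreeByteStrings.threeByteString` is a hypothetical helper, and the chosen range U+0800 to U+17FF is just one convenient slice of the 3-byte range that avoids surrogates. The actual test would presumably write the string with `writeString` and read it back, which is not shown here.

```java
// Hypothetical test helper; produces only chars that need 3 bytes each.
final class ThreeByteStrings {
    /** Builds a string of the given length whose chars all lie in U+0800..U+17FF. */
    static String threeByteString(int length) {
        final StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // 0x0800 is the first char that no longer fits in 2 bytes;
            // staying below U+D800 keeps surrogates out of the string.
            sb.append((char) (0x0800 + (i % 0x1000)));
        }
        return sb.toString();
    }
}
```

A string longer than the 1024-byte conversion buffer, for example 5000 chars, forces several intermediate flushes and exercises the boundary condition discussed in the previous comment.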

s1monw · 2016-11-21T08:21:22Z

Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks.

I don't think we can unless we change the way we write these strings to the wire. We only know how many characters we need to read, but not how many bytes they are composed of. I think we can change the way we read strings down the road, but then we have to do it on both ends. Makes sense?

I mean we can read the min number of characters but this might make the loop quite complicated
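
To make the constraint concrete, here is a minimal decoding sketch under assumptions: `CharCountReader.readChars` is a hypothetical name, a `ByteArrayInputStream` stands in for the stream, and the char count is passed in rather than read from the wire. Only the number of chars is known up front, so the width of each char is discovered from its lead byte as decoding proceeds.

```java
import java.io.ByteArrayInputStream;

// Hypothetical reader for illustration; not the actual StreamInput implementation.
final class CharCountReader {
    /** Decodes charCount chars, learning each char's byte width from its first byte. */
    static String readChars(ByteArrayInputStream in, int charCount) {
        final char[] chars = new char[charCount];
        for (int i = 0; i < charCount; i++) {
            final int b0 = readByte(in);
            if ((b0 & 0x80) == 0) {
                // 0xxxxxxx: single byte
                chars[i] = (char) b0;
            } else if ((b0 & 0xE0) == 0xC0) {
                // 110xxxxx 10xxxxxx: two bytes
                final int b1 = readByte(in);
                chars[i] = (char) (((b0 & 0x1F) << 6) | (b1 & 0x3F));
            } else if ((b0 & 0xF0) == 0xE0) {
                // 1110xxxx 10xxxxxx 10xxxxxx: three bytes
                final int b1 = readByte(in);
                final int b2 = readByte(in);
                chars[i] = (char) (((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
            } else {
                throw new IllegalStateException("unexpected lead byte 0x" + Integer.toHexString(b0));
            }
        }
        return new String(chars);
    }

    private static int readByte(ByteArrayInputStream in) {
        final int b = in.read();
        if (b < 0) {
            throw new IllegalStateException("unexpected end of stream");
        }
        return b;
    }
}
```

The only safe bulk read is `charCount` bytes, since every char takes at least one byte. Anything beyond that has to be fetched incrementally, which is the loop complexity mentioned above.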

s1monw · 2016-11-21T08:23:29Z

pushed some new commits @jpountz

jpountz

Fair enough. LGTM.

tlrx · 2016-11-21T08:38:04Z

I'm wondering if we could make a similar change for the other methods that are often used like writeInt, writeVInt etc?

s1monw · 2016-11-21T08:46:39Z

I'm wondering if we could make a similar change for the other methods that are often used like writeInt, writeVInt etc?

In those methods the number of writeByte calls is constant, I think. Unless we inline all the complexity, it won't help, nor would it be sustainable.
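
For comparison, a sketch of what such fixed-width and variable-length integer writes typically look like. `IntWrites` is a hypothetical class, `ByteArrayOutputStream.write(int)` stands in for `writeByte`, and the exact byte layout is an assumption for illustration. The number of single-byte writes is 4 for a fixed-width int and between 1 and 5 for a variable-length int, regardless of how much data is being serialized overall.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-ins; the byte order and vint layout are assumptions.
final class IntWrites {
    /** Fixed-width int: always exactly 4 single-byte writes (big-endian here). */
    static void writeInt(ByteArrayOutputStream out, int i) {
        out.write(i >> 24);
        out.write(i >> 16);
        out.write(i >> 8);
        out.write(i);
    }

    /** Variable-length int: 7 payload bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }
}
```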

jpountz

LGTM

habdank · 2016-11-21T09:20:48Z

Dears,

I see you reacted very fast to my ticket #21660. Thanks!
I have one small question: is it possible to get this fix into the 2.4.1 release as well?
If not, may I get the patch separately, so I can patch my copy of Elasticsearch?
Thanks in advance.

Best regards,
Seweryn.

s1monw · 2016-11-21T09:30:59Z

@jpountz I think we can backport this to 2.4.x WDYT?

@habdank we are discussing the backport, but I don't know yet when the next release will happen.

habdank · 2016-11-21T09:38:40Z

@s1monw: Thanks for the info.

jpountz · 2016-11-22T07:50:26Z

@s1monw backporting looks safe, +1 to do it

habdank · 2016-11-22T09:25:24Z

@s1monw and @jpountz

May I ask (a bit directly) when it could be done?
I would really appreciate it: I tried to port it myself but failed, because I do not know the Elasticsearch code well. Once it is done, I will take it and test it in our environment.

Thanks in advance and best regards.

s1monw · 2016-11-22T16:46:50Z

@habdank I pushed this to 2.4 a second ago - this will be in 2.4.3

habdank · 2016-11-24T07:33:46Z

Thanks for all the help! The new patched 2.4.3 library works much better for us.

s1monw · 2016-11-24T19:36:10Z

@habdank happy to help

s1monw added the :Core/Infra/Core, >bug, v5.0.2, v5.1.1, v6.0.0-alpha1 labels Nov 19, 2016

s1monw assigned jpountz Nov 19, 2016

s1monw added the review label Nov 19, 2016

rjernst approved these changes Nov 19, 2016

jpountz approved these changes Nov 19, 2016

jpountz reviewed Nov 19, 2016

s1monw added 2 commits November 21, 2016 09:21

apply feedback

a65de33

add comment

665feec

jpountz approved these changes Nov 21, 2016

simplify loop in StreamInput#readString

cad5308

make readString consistent with writeString and add comments

b5d94e3

jpountz approved these changes Nov 21, 2016

s1monw merged commit d913242 into elastic:master Nov 21, 2016

s1monw deleted the issues/21660 branch November 21, 2016 09:47

s1monw added the v2.4.3 label Nov 22, 2016
