Use a buffer to do character to byte conversion in StreamOutput#writeString#21680
Use a buffer to do character to byte conversion in StreamOutput#writeString#21680s1monw merged 5 commits intoelastic:masterfrom
Conversation
…String Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes elastic#21660
jpountz
left a comment
There was a problem hiding this comment.
Looks good. Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks. I don't mind doing it in this PR or a different one.
| final int bufferSize = Math.min(3 * charCount, 1024); // at most 3 bytes per character is needed here | ||
| if (convertStringBuffer.length < bufferSize) { | ||
| convertStringBuffer = new byte[ArrayUtil.oversize(bufferSize, Byte.BYTES)]; | ||
| } |
There was a problem hiding this comment.
I think you can replace the three above lines with just convertStringBuffer = ArrayUtil.grow(convertStringBuffer, bufferSize);
There was a problem hiding this comment.
so I had this before but there is no need to copy the array since we are trashing it that's why I used oversize?
| // make sure any possible char can fit into the buffer in any possible iteration | ||
| // we need at most 3 bytes so we flush the buffer once we have less than 3 bytes | ||
| // left before we start another iteration | ||
| if (offset >= buffer.length-3) { |
There was a problem hiding this comment.
I think it should be a strict greater than?
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
maybe also test an explicit big string that only contains chars that are stored on 3 bytes?
I don't think we can unless we change the way we write these strings to the wire. We only know how many characters we need to read but don't know how of how many bytes they are composed. I think we can down the road change the way we read strings but then we have to do it on both ends. makes sense? I mean we can read the min number of characters but this might make the loop quite complicated |
|
pushed some new commits @jpountz |
|
I'm wondering if we could make a similar change for the other methods that are often used like |
in those methods the number of writeByte is constant I think unless we inline all the complexity it' won't help nor be sustainable |
|
Dears, I see, you react very fast on my ticket #21660. Thanks! Best regards, |
|
@s1monw: Thanks for the info. |
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
* master: (42 commits) Add support for merging custom meta data in tribe node (elastic#21552) [DOCS] Show EC2's auto attribute (elastic#21474) Add information about the removal of store throttling to the migration guide. Add a recommendation against large documents to the docs. (elastic#21652) Add indices options tests to search api REST tests (elastic#21701) Fixing indentation in geospatial querying example. (elastic#21682) Fix typo in filters aggregation docs (elastic#21690) Add BWC layer for Exceptions (elastic#21694) Add checkstyle rule to forbid empty javadoc comments (elastic#20881) Docs: Added offline install link for discovery-file plugin remove pointless catch exception in TransportSearchAction (elastic#21689) Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686) Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680) Fix integer overflows when dealing with templates. (elastic#21628) Fix highlighting on a stored keyword field (elastic#21645) Set execute permissions for native plugin programs (elastic#21657) adjust visibility of DiscoveryNodes.Delta constructor Remove unused DiscoveryNodes.Delta constructor Remove unused DiscoveryNode#removeDeadMembers public method Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes ...
|
@s1monw backporting looks safe, +1 to do it |
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
|
@habdank I pushed this to 2.4 a second ago - this will be in |
|
Thanks for all the help! New patched 2.4.3 library works for us much better. |
|
@habdank happy to help |
Today we call
writeByteup to 3x per character in each string written viaStreamOutput#writeStringthis can have quite some overhead when stringsare long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via
writeBytesinstead reducing the overhead of this operation.Closes #21660