Skip to content

Revert Serializing Outbound Transport Messages on IO Threads#64632

Merged
original-brownbear merged 3 commits intoelastic:masterfrom
original-brownbear:revert-serialize-outbound-on-io-threads
Nov 5, 2020
Merged

Revert Serializing Outbound Transport Messages on IO Threads#64632
original-brownbear merged 3 commits intoelastic:masterfrom
original-brownbear:revert-serialize-outbound-on-io-threads

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

Serializing outbound transport message on the IO loop was introduced in #56961. Unfortunately it turns out that this is incompatible with assumptions made by CCR code here:

// This is currently safe to do because calling `onResponse` will serialize the bytes to the network layer data
// structure on the same thread. So the bytes will be copied before the reference is released.
and that are not easy to work around on short notice.

Raising reverting this move (as a temporary solution, it's still a valuable change long-term) as a blocker therefore as this seriously affects the stability of the initial phase of the CCR following by causing corrupted bytes to be send to the follower.

@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Nov 5, 2020
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the revert myself fairly naively and found just a couple of small differences, see below.

@Override
public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
assert msg instanceof OutboundHandler.SendContext;
assert Transports.assertDefaultThreadContext(transport.getThreadPool().getThreadContext());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we can keep this assertion? It relates to #57792.

private void internalSend(TcpChannel channel, SendContext sendContext) {
private void internalSend(TcpChannel channel, SendContext sendContext) throws IOException {
channel.getChannelStats().markAccessed(threadPool.relativeTimeInMillis());
// stash thread context so that channel event loop is not polluted by thread context
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we can keep this comment? It relates to #57792.

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks for spotting those two, sorry for the oversight!

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks David!

@original-brownbear original-brownbear merged commit 9b4a151 into elastic:master Nov 5, 2020
@original-brownbear original-brownbear deleted the revert-serialize-outbound-on-io-threads branch November 5, 2020 13:41
original-brownbear added a commit that referenced this pull request Nov 5, 2020
…#64653)

Serializing outbound transport message on the IO loop was introduced in #56961. Unfortunately it turns out that this is incompatible with assumptions made by CCR code here: https://github.com/elastic/elasticsearch/blob/f22ddf822e24bb17f1772c3bacb7ee97a00339b8/x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/repositories/GetCcrRestoreFileChunkAction.java#L60-L61 and that are not easy to work around on short notice.

Raising reverting this move (as a temporary solution, it's still a valuable change long-term) as a blocker therefore as this seriously affects the stability of the initial phase of the CCR following by causing corrupted bytes to be send to the follower.
original-brownbear added a commit that referenced this pull request Nov 5, 2020
…#64654)

Serializing outbound transport message on the IO loop was introduced in #56961. Unfortunately it turns out that this is incompatible with assumptions made by CCR code here: https://github.com/elastic/elasticsearch/blob/f22ddf822e24bb17f1772c3bacb7ee97a00339b8/x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/repositories/GetCcrRestoreFileChunkAction.java#L60-L61 and that are not easy to work around on short notice.

Raising reverting this move (as a temporary solution, it's still a valuable change long-term) as a blocker therefore as this seriously affects the stability of the initial phase of the CCR following by causing corrupted bytes to be send to the follower.
@original-brownbear original-brownbear restored the revert-serialize-outbound-on-io-threads branch January 6, 2021 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocker >bug :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. v7.10.0 v7.11.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants