Broken stats serialization #22285

@jasontedor

We have a serialization bug somewhere in the stats serialization code. I've now seen six independent reports (2, 4, 5 and three more that are not linkable) of:

[2016-12-12T09:26:50,081][WARN ][o.e.t.n.Netty4Transport  ] [...] exception caught on transport layer [[id: 0xcbdaf621, L:/...:35678 - R:.../...:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [...], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@44aa70c], error [false]; resetting
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1257) ~[elasticsearch-5.1.1.jar:5.1.1]

and the related

Caused by: java.io.EOFException: tried to read: 91755306 bytes but only 114054 remaining

and

Caused by: java.lang.IllegalStateException: No routing state mapped for [103]
    at org.elasticsearch.cluster.routing.ShardRoutingState.fromValue(ShardRoutingState.java:71) ~[elasticsearch-5.1.1.jar:5.1.1]

It always seems to be in some stats response, either a node stats response or a cluster stats response, and it's coming from TransportBroadcastByNodeAction and the single action defined by a lambda in TransportNodesAction$AsyncAction. We are misreading the stream somewhere, which leaves the reader misaligned, and everything read after that point is garbage.
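To illustrate the failure mode (not the actual bug, which is still unknown), here is a minimal sketch of how a single writer/reader asymmetry can produce both kinds of symptoms. It uses plain `java.io` streams rather than Elasticsearch's own stream classes, and the message layout, the class `MisalignedReadDemo`, and its methods are all hypothetical. The writer emits an optional int that the reader forgets to consume, so the reader treats that int as a length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names exist in Elasticsearch.
class MisalignedReadDemo {

    // Writer: [flag byte][optional int, only if flag is set][int length][UTF-8 bytes].
    static byte[] write(boolean hasExtra, int extra, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeBoolean(hasExtra);
            if (hasExtra) {
                out.writeInt(extra);
            }
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Buggy reader: reads the flag but never consumes the optional int,
    // so when the flag is set, the optional int is interpreted as the length.
    static String readBuggy(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        in.readBoolean(); // the bug: the result is ignored
        int length = in.readInt();
        int remaining = in.available();
        if (length < 0 || length > remaining) {
            throw new EOFException(
                "tried to read: " + length + " bytes but only " + remaining + " remaining");
        }
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        // Post-read check: a correct reader consumes the whole message.
        if (in.available() != 0) {
            throw new IllegalStateException(
                "Message not fully read: " + in.available() + " bytes remaining");
        }
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Returns the decoded value, or the exception type and message on failure.
    static String tryRead(byte[] message) {
        try {
            return "ok: " + readBuggy(message);
        } catch (IOException | IllegalStateException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Flag unset: writer and reader agree, so the bug stays hidden.
        System.out.println(tryRead(write(false, 0, "node-1")));
        // Flag set, large optional value: it is misread as an impossible length.
        System.out.println(tryRead(write(true, 1_000_000, "node-1")));
        // Flag set, small optional value: the read "succeeds" on garbage
        // and only the leftover-bytes check notices.
        System.out.println(tryRead(write(true, 2, "node-1")));
    }
}
```

The point of the sketch is that the same bug looks different depending on the data: it is invisible when the optional field is absent, it fails loudly when the misread length is large, and it returns garbage when the misread length happens to fit. That would be consistent with a bug that has no reliable reproduction, though it is only one possible explanation.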

Whatever it is, it's pesky. So far, there is not a reliable reproduction and finding the bug is tricky since these responses serialize the entire world.

The first instance of this led to #21478 so that we know the handler name, #22152 so we can detect corruption earlier, and #22223 to clean up some serialization code. Right now, I do not think we've squashed the issue.
