We have a serialization bug somewhere in the stats serialization code. I've now seen six independent reports (2, 4, 5 and three more that are not linkable) of:
[2016-12-12T09:26:50,081][WARN ][o.e.t.n.Netty4Transport ] [...] exception caught on transport layer [[id: 0xcbdaf621, L:/...:35678 - R:.../...:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [...], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@44aa70c], error [false]; resetting
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1257) ~[elasticsearch-5.1.1.jar:5.1.1]
and the related
Caused by: java.io.EOFException: tried to read: 91755306 bytes but only 114054 remaining
and
Caused by: java.lang.IllegalStateException: No routing state mapped for [103]
at org.elasticsearch.cluster.routing.ShardRoutingState.fromValue(ShardRoutingState.java:71) ~[elasticsearch-5.1.1.jar:5.1.1]
It seems to always be in some stats response, either a node stats response or a cluster stats response, and it's coming from TransportBroadcastByNodeAction and the single action defined by a lambda in TransportNodesAction$AsyncAction. We are misreading the stream somewhere and then reading garbage subsequently.
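To illustrate the failure mode (this is a minimal sketch with made-up fields, not the actual stats code): if a writer and reader disagree about even one field, every later read happens at the wrong offset, so a subsequent value such as a routing-state ordinal comes back as garbage, and unread bytes remain at the end of the message, which is exactly the "Message not fully read" symptom.

```java
import java.io.*;

public class StreamMismatch {
    // Hypothetical round trip: the writer serializes three fields, but the
    // reader skips field B, mimicking a writeTo/readFrom mismatch.
    static int[] roundTrip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(42);   // field A
        out.writeLong(7L);  // field B -- the reader below forgets to read it
        out.writeInt(103);  // field C, e.g. a routing-state ordinal

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        int a = in.readInt();          // correct: 42
        // BUG: the readLong() for field B is missing, so this read consumes
        // the high bytes of B and interprets garbage as field C.
        int c = in.readInt();
        int leftover = in.available(); // unread bytes => "message not fully read"
        return new int[] {a, c, leftover};
    }

    public static void main(String[] args) throws IOException {
        int[] r = roundTrip();
        System.out.println("a=" + r[0] + " c=" + r[1] + " leftover=" + r[2]);
        // prints "a=42 c=0 leftover=8" -- field C is corrupted and bytes remain
    }
}
```

The hard part in our case is that the mismatch could be in any of the many nested objects these responses serialize, and everything downstream of the first bad read is plausible-looking garbage.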
Whatever it is, it's pesky. So far, there is not a reliable reproduction and finding the bug is tricky since these responses serialize the entire world.
The first instance of this led to #21478 so that we know the handler name, #22152 so we can detect corruption earlier, and #22223 to clean up some serialization code. Right now, I do not think we've squashed the issue.