Elasticsearch Version
7.17.2, 8.1.1 and 8.1.2
Installed Plugins
Various
Java Version
bundled
OS Version
Cloud
Problem Description
I've observed a few new-ish clusters in the wild that are closing transport connections: constructing the circuit-breaker message throws an exception because the circuit breaker's used value is negative. It seems a little extreme to fail like this, but then again this situation would trip an assertion if assertions were enabled.
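To make the failure mode concrete, here is a minimal sketch of how a negative byte count could turn message formatting into an exception. It does not use the Elasticsearch classes: `NegativeUsedSketch`, `formatBytes` and `formatBytesLenient` are hypothetical stand-ins, the validation only mirrors the message seen in the logs, and the unmatched release is an assumed way of getting a negative counter, not a known cause.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a breaker's byte accounting; NOT the Elasticsearch classes.
final class NegativeUsedSketch {

    // Mirrors the validation seen in the log: anything below -1 is rejected.
    static String formatBytes(long bytes) {
        if (bytes < -1) {
            throw new IllegalArgumentException(
                "Values less than -1 bytes are not supported: " + bytes + "b");
        }
        return bytes + "b";
    }

    // Lenient variant (an assumption, not the project's fix): never throws
    // while building a diagnostic message, it reports the raw number instead.
    static String formatBytesLenient(long bytes) {
        return bytes < 0 ? "[negative: " + bytes + "]" : formatBytes(bytes);
    }

    public static void main(String[] args) {
        AtomicLong used = new AtomicLong();

        // Assumed scenario: bytes are released without a matching reservation,
        // so the counter drops below zero.
        used.addAndGet(-370832L);

        try {
            // Building the message fails before any breaker decision is reported.
            System.out.println("used=" + formatBytes(used.get()));
        } catch (IllegalArgumentException e) {
            System.out.println("formatting failed: " + e.getMessage());
        }

        // The lenient variant keeps the message usable for debugging.
        System.out.println("used=" + formatBytesLenient(used.get()));
    }
}
```

The point of the sketch is only that the exception comes from rendering the value, so a more tolerant formatter would keep the connection open while still surfacing the bad accounting.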
Steps to Reproduce
Unknown at time of writing 😞
Seems pretty rare: looking at the past week's data I only see this in a handful of clusters, running 7.17.2, 8.1.1 and 8.1.2.
Not even sure how to label this. I suspect something networky but there are other places that interact with circuit breakers.
Logs (if relevant)
[instance-0000000xxx] exception caught on transport layer [Netty4TcpChannel{localAddress=/xx.xx.xx.xx:xxxxx, remoteAddress=/xx.xx.xx.xx:xxxxx, profile=default}], closing connection
java.lang.IllegalArgumentException: Values less than -1 bytes are not supported: -370832b
at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:85) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:80) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.lambda$checkParentLimit$8(HierarchyCircuitBreakerService.java:431) ~[elasticsearch-8.1.2.jar:8.1.2]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
at java.util.Iterator.forEachRemaining(Iterator.java:133) ~[?:?]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:432) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:108) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:215) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:119) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86) ~[elasticsearch-8.1.2.jar:8.1.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[transport-netty4-8.1.2.jar:8.1.2]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1371) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1234) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
at java.lang.Thread.run(Thread.java:833) [?:?]