I looked into transport worker slow logging in 7.17, and one outstanding and recurring issue is logs like the below:
[instance-0000000000] handling inbound transport message [InboundMessage{Header{1325}{7.17.0}{241879}{true}{false}{false}{false}{indices:data/read/search[free_context/scroll]}}] took [6004ms] which is above the warn threshold of [5000ms]
I believe this is caused by the fact that the underlying action decrements the store ref count. If it turns out to be the last to decrement the ref count here, then the store close (including acquiring the shard lock) runs on a transport thread.
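
To illustrate the mechanism, here is a minimal sketch of a ref-counted resource in which whichever thread releases the last reference also runs the close logic. `RefCountedStoreSketch`, `closedOnThread`, `demo` and the thread name are hypothetical names for illustration only; this is not the actual `Store` implementation, and the real close does far more work (IO, shard lock).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for a ref-counted store (not the Elasticsearch Store class).
public class RefCountedStoreSketch {
    // Starts at 1: the reference held by the owner of the store.
    private final AtomicInteger refCount = new AtomicInteger(1);
    // Records the name of the thread that ended up running the close logic.
    final AtomicReference<String> closedOnThread = new AtomicReference<>();

    // Simplified: a real implementation would reject incRef on an already closed store.
    public void incRef() {
        refCount.incrementAndGet();
    }

    // Returns true if this call released the last reference and therefore ran the close.
    public boolean decRef() {
        if (refCount.decrementAndGet() == 0) {
            closeInternal();
            return true;
        }
        return false;
    }

    // In the real system this is where blocking work would happen.
    private void closeInternal() {
        closedOnThread.set(Thread.currentThread().getName());
    }

    // Simulates the suspected sequence and returns the name of the closing thread.
    public static String demo() {
        RefCountedStoreSketch store = new RefCountedStoreSketch();
        store.incRef(); // reference held on behalf of a search context
        store.decRef(); // owner releases first, e.g. during a relocation
        // The free-context handler releases the last reference on a "transport" thread.
        Thread t = new Thread(store::decRef, "transport_worker-sketch");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.closedOnThread.get();
    }
}
```

In this sketch the close runs on the simulated transport thread only because the owner released its reference first, which matches the suspicion that a concurrent relocation is needed to trigger the slow log.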

I think this can only happen (but happens quite a bit in Cloud logs) if there's a concurrent relocation or so, but regardless, IO should never run on transport workers.
I wonder if we have other spots where the last decrement for the store happens via a search action on a transport thread. It might be worth adding an assertion for not running the store close on a transport worker when fixing this.
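
A rough sketch of what such an assertion could look like, written without any Elasticsearch classes so it stands alone. It assumes transport worker threads can be recognised by a `transport_worker` marker in the thread name; `StoreCloseAssertionSketch`, `isTransportThread` and `closeStore` are hypothetical names, and the real check should reuse whatever thread-identification helper the codebase already provides.

```java
// Hypothetical sketch of guarding a blocking close against running on a transport worker.
public class StoreCloseAssertionSketch {
    // Assumption: transport worker thread names contain this marker.
    static final String TRANSPORT_WORKER_MARKER = "transport_worker";

    // Returns true if the given thread looks like a transport worker, judged by its name.
    static boolean isTransportThread(Thread thread) {
        return thread.getName().contains(TRANSPORT_WORKER_MARKER);
    }

    // Stand-in for the store close. The assert only fires when assertions are enabled (-ea),
    // so it would trip in tests without affecting production behaviour.
    static void closeStore() {
        assert isTransportThread(Thread.currentThread()) == false
            : "store close must not run on a transport worker";
        // Blocking work (IO, acquiring the shard lock) would happen here.
    }
}
```

The assertion only detects the problem. The fix itself would presumably hand the close off to a different thread pool, which this sketch does not attempt to show.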