Cherry pick #89740 to 25.8: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation #90734
Merged
robot-ch-test-poll4 merged 2 commits into backport/25.8/89740 on Nov 24, 2025
Previously, only sendCancel() was protected against disconnection, but
RemoteQueryExecutor may still read after sendCancel(), which leads to a
NULL pointer dereference:
2025.11.07 14:05:20.159362 [ 73554 ] {b7ebe401-aa59-4306-9663-278657eac110} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 241. DB::Exception: Query memory tracker: fault injected. Would use 2.00 MiB (attempt to allocate chunk of 1.00 MiB), maximum: 9.31 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c88fd80
3. ./src/Common/Exception.h:141: DB::Exception::Exception<char const*, char const*, String, String, String>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, char const*&&, char const*&&, String&&, String&&, String&&) @ 0x0000000013cbcc21
4. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:333: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cba10e
5. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:476: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cb9dc4
6. ./ci/tmp/build/./src/Common/CurrentMemoryTracker.cpp:80: CurrentMemoryTracker::allocImpl(long, bool) @ 0x0000000013c0c079
7. ./src/Common/CurrentMemoryTracker.cpp:103: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x0000000013c096c7
8. ./src/Common/PODArray.h:167: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 0ul, 0ul>::reallocPowerOfTwoElements<>(unsigned long) @ 0x0000000013cc7f0e
9. ./src/Common/PODArray.h:243: DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) @ 0x000000001af881d7
10. ./ci/tmp/build/./src/Compression/CompressedReadBuffer.cpp:10: non-virtual thunk to DB::CompressedReadBuffer::nextImpl() @ 0x000000001af87922
11. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
12. ./src/IO/ReadBuffer.h:81: DB::NativeReader::read() @ 0x000000001a4e2e4f
13. ./ci/tmp/build/./src/Client/Connection.cpp:1422: DB::Connection::receiveDataImpl(DB::NativeReader&) @ 0x000000001a2edb59
14. ./ci/tmp/build/./src/Client/Connection.cpp:1434: DB::Connection::receivePacket() @ 0x000000001a2ed154
15. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
16. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:67: DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x000000001779498d
17. ./ci/tmp/build/./src/Common/AsyncTaskExecutor.cpp:89: void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000017793e83
(version 25.10.1.3832 (official build))
2025.11.07 14:05:20.159570 [ 28135 ] {} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 210. DB::NetException: Connection reset by peer, while reading from socket (peer: [::ffff:127.0.0.1]:57678, local: [::ffff:127.0.0.10]:9000). (NETWORK_ERROR), Stack trace (when copying this message, always include the lines below):
0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
2. ./src/Common/NetException.h:26: DB::NetException::NetException<String, String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&, String&&) @ 0x0000000013e6d433
3. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:83: DB::ReadBufferFromPocoSocketBase::socketReceiveBytesImpl(char*, unsigned long) @ 0x0000000013e6f17b
4. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:107: DB::ReadBufferFromPocoSocketBase::nextImpl() @ 0x0000000013e6fa75
5. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
6. ./src/IO/ReadBuffer.h:81: DB::TCPHandler::runImpl() @ 0x000000001a46b7b0
7. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2836: DB::TCPHandler::run() @ 0x000000001a48ec99
8. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x000000001f54aa87
9. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x000000001f54af19
10. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x000000001f5116c7
11. ./base/poco/Foundation/src/Thread_POSIX.cpp:341: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f50fac1
12. ? @ 0x0000000000094ac3
13. ? @ 0x00000000001268c0
(version 25.10.1.3832 (official build))
2025.11.07 14:05:20.159568 [ 63583 ] {b7ebe401-aa59-4306-9663-278657eac110} <Trace> ReadFromParallelRemoteReplicasStep: (127.0.0.10:9000) Cancelling query because enough data has been read
BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
BaseDaemon: Stack trace: 0x000000001a2ebf7b 0x000000001a335768 0x000000001a33547e 0x000000001777d565 0x000000001a53597a 0x000000001a5300e6 0x000000001a534103 0x0000000013d908eb 0x0000000013d97d66 0x0000000013d8d812 0x0000000013d9549a 0x00007f521e393ac3 0x00007f521e4258c0
BaseDaemon: 2.0. inlined from ./src/IO/VarInt.h:98: DB::readVarUInt(unsigned long&, DB::ReadBuffer&)
BaseDaemon: 2. ./ci/tmp/build/./src/Client/Connection.cpp:1315: DB::Connection::receivePacket() @ 0x000000001a2ebf7b
BaseDaemon: 3. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
BaseDaemon: 4. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:253: DB::MultiplexedConnections::receivePacket() @ 0x000000001a33547e
BaseDaemon: 5. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutor.cpp:808: DB::RemoteQueryExecutor::finish() @ 0x000000001777d565
So let's simply throw from sendCancel() if the connection was lost.
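A minimal sketch of the idea, not the actual patch: every name below (SketchReadBuffer, SketchConnection, finishSketch) is hypothetical and only models the shape of the problem. It assumes a connection whose input buffer is destroyed on disconnect, and a finish routine that sends a cancel and then keeps reading. Throwing from sendCancel() means the read that would dereference the null buffer is never reached.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are the real ClickHouse classes.
struct SketchReadBuffer
{
    std::string data = "packet";
};

class SketchConnection
{
public:
    SketchConnection() : in(std::make_unique<SketchReadBuffer>()) {}

    // Models a lost connection: the input buffer is destroyed.
    void disconnect() { in.reset(); }

    bool isConnected() const { return in != nullptr; }

    // The guard described above: refuse to cancel on a lost connection,
    // so the caller never proceeds to the read that follows.
    void sendCancel()
    {
        if (!isConnected())
            throw std::runtime_error("Connection lost, cannot send cancel");
        cancel_sent = true;
    }

    // Has no guard of its own: calling this after disconnect()
    // would dereference a null pointer.
    std::string receivePacket() { return in->data; }

    bool cancel_sent = false;

private:
    std::unique_ptr<SketchReadBuffer> in;
};

// Models a finish routine that cancels and then drains remaining packets.
// Returns true if the drain completed, false if the cancel was rejected.
bool finishSketch(SketchConnection & connection)
{
    try
    {
        connection.sendCancel();
        connection.receivePacket(); // only reached if the connection is alive
        return true;
    }
    catch (const std::runtime_error &)
    {
        return false;
    }
}
```

In this sketch the caller handles the exception locally; how the real RemoteQueryExecutor propagates or handles it is not shown here.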
Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
1ef3da3
138 of 145 checks passed
Original pull-request #89740
Do not merge this PR manually
This pull-request is the first step of an automated backport.
It contains changes similar to calling git cherry-pick locally.
If you intend to continue backporting the changes, then resolve all conflicts, if any.
Otherwise, if you do not want to backport them, just close this pull-request.
The check results do not matter at this step - you can safely ignore them.
Troubleshooting
If the conflicts were resolved in a wrong way
If this cherry-pick PR is completely screwed up by a wrong conflict resolution and you want to recreate it:
Remove the pr-cherrypick label from the PR.
You also need to check the Original pull-request for the pr-backports-created label, and delete it if it is present there.
The PR source
The PR is created in the CI job