Skip to content

Cherry pick #89740 to 25.10: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation#90738

Merged
robot-ch-test-poll4 merged 2 commits intobackport/25.10/89740from
cherrypick/25.10/89740
Nov 24, 2025
Merged

Cherry pick #89740 to 25.10: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation#90738
robot-ch-test-poll4 merged 2 commits intobackport/25.10/89740from
cherrypick/25.10/89740

Conversation

@robot-ch-test-poll4
Copy link
Copy Markdown
Contributor

Original pull-request #89740

Do not merge this PR manually

This pull-request is a first step of an automated backporting.
It contains changes similar to calling git cherry-pick locally.
If you intend to continue backporting the changes, then resolve all conflicts if any.
Otherwise, if you do not want to backport them, then just close this pull-request.

The check results does not matter at this step - you can safely ignore them.

Troubleshooting

If the conflicts were resolved in a wrong way

If this cherry-pick PR is completely screwed by a wrong conflicts resolution, and you want to recreate it:

  • delete the pr-cherrypick label from the PR
  • delete this branch from the repository

You also need to check the Original pull-request for pr-backports-created label, and delete if it's presented there

The PR source

The PR is created in the CI job

azat added 2 commits November 7, 2025 18:49
…uring cancellation

Previously only sendCancel() was protected against disconnection, but
RemoteQueryExecutor may read after sendCancel() and this will lead to
NULL pointer dereference:

    2025.11.07 14:05:20.159362 [ 73554 ] {b7ebe401-aa59-4306-9663-278657eac110} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 241. DB::Exception: Query memory tracker: fault injected. Would use 2.00 MiB (attempt to allocate chunk of 1.00 MiB), maximum: 9.31 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

    0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
    1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c88fd80
    3. ./src/Common/Exception.h:141: DB::Exception::Exception<char const*, char const*, String, String, String>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, char const*&&, char const*&&, String&&, String&&, String&&) @ 0x0000000013cbcc21
    4. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:333: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cba10e
    5. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:476: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cb9dc4
    6. ./ci/tmp/build/./src/Common/CurrentMemoryTracker.cpp:80: CurrentMemoryTracker::allocImpl(long, bool) @ 0x0000000013c0c079
    7. ./src/Common/CurrentMemoryTracker.cpp:103: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x0000000013c096c7
    8. ./src/Common/PODArray.h:167: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 0ul, 0ul>::reallocPowerOfTwoElements<>(unsigned long) @ 0x0000000013cc7f0e
    9. ./src/Common/PODArray.h:243: DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) @ 0x000000001af881d7
    10. ./ci/tmp/build/./src/Compression/CompressedReadBuffer.cpp:10: non-virtual thunk to DB::CompressedReadBuffer::nextImpl() @ 0x000000001af87922
    11. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
    12. ./src/IO/ReadBuffer.h:81: DB::NativeReader::read() @ 0x000000001a4e2e4f
    13. ./ci/tmp/build/./src/Client/Connection.cpp:1422: DB::Connection::receiveDataImpl(DB::NativeReader&) @ 0x000000001a2edb59
    14. ./ci/tmp/build/./src/Client/Connection.cpp:1434: DB::Connection::receivePacket() @ 0x000000001a2ed154
    15. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
    16. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:67: DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x000000001779498d
    17. ./ci/tmp/build/./src/Common/AsyncTaskExecutor.cpp:89: void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000017793e83
     (version 25.10.1.3832 (official build))
    2025.11.07 14:05:20.159570 [ 28135 ] {} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 210. DB::NetException: Connection reset by peer, while reading from socket (peer: [::ffff:127.0.0.1]:57678, local: [::ffff:127.0.0.10]:9000). (NETWORK_ERROR), Stack trace (when copying this message, always include the lines below):

    0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
    1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
    2. ./src/Common/NetException.h:26: DB::NetException::NetException<String, String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&, String&&) @ 0x0000000013e6d433
    3. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:83: DB::ReadBufferFromPocoSocketBase::socketReceiveBytesImpl(char*, unsigned long) @ 0x0000000013e6f17b
    4. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:107: DB::ReadBufferFromPocoSocketBase::nextImpl() @ 0x0000000013e6fa75
    5. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
    6. ./src/IO/ReadBuffer.h:81: DB::TCPHandler::runImpl() @ 0x000000001a46b7b0
    7. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2836: DB::TCPHandler::run() @ 0x000000001a48ec99
    8. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x000000001f54aa87
    9. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x000000001f54af19
    10. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x000000001f5116c7
    11. ./base/poco/Foundation/src/Thread_POSIX.cpp:341: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f50fac1
    12. ? @ 0x0000000000094ac3
    13. ? @ 0x00000000001268c0
     (version 25.10.1.3832 (official build))
    2025.11.07 14:05:20.159568 [ 63583 ] {b7ebe401-aa59-4306-9663-278657eac110} <Trace> ReadFromParallelRemoteReplicasStep: (127.0.0.10:9000) Cancelling query because enough data has been read

    BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
    BaseDaemon: Stack trace: 0x000000001a2ebf7b 0x000000001a335768 0x000000001a33547e 0x000000001777d565 0x000000001a53597a 0x000000001a5300e6 0x000000001a534103 0x0000000013d908eb 0x0000000013d97d66 0x0000000013d8d812 0x0000000013d9549a 0x00007f521e393ac3 0x00007f521e4258c0
    BaseDaemon: 2.0. inlined from ./src/IO/VarInt.h:98: DB::readVarUInt(unsigned long&, DB::ReadBuffer&)
    BaseDaemon: 2. ./ci/tmp/build/./src/Client/Connection.cpp:1315: DB::Connection::receivePacket() @ 0x000000001a2ebf7b
    BaseDaemon: 3. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
    BaseDaemon: 4. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:253: DB::MultiplexedConnections::receivePacket() @ 0x000000001a33547e
    BaseDaemon: 5. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutor.cpp:808: DB::RemoteQueryExecutor::finish() @ 0x000000001777d565

So let's simply throw in sendCancel() if the connection was lost
Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
@robot-ch-test-poll4 robot-ch-test-poll4 added pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only! do not test disable testing on pull request pr-bugfix Pull request with bugfix, not backported by default labels Nov 24, 2025
@robot-ch-test-poll4 robot-ch-test-poll4 merged commit 801b969 into backport/25.10/89740 Nov 24, 2025
138 of 145 checks passed
@robot-ch-test-poll4 robot-ch-test-poll4 deleted the cherrypick/25.10/89740 branch November 24, 2025 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not test disable testing on pull request pr-bugfix Pull request with bugfix, not backported by default pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants