Skip to content

Conversation

@promag
Copy link
Contributor

@promag promag commented May 20, 2020

This fixes a race between http_request_cb and StopHTTPServer where
the work queue is used after release.

Fixes #18856.

@promag
Copy link
Contributor Author

promag commented May 20, 2020

@MarcoFalke can't find the issue where this was discussed 😞

@maflcko
Copy link
Member

maflcko commented May 20, 2020

#18856

@promag
Copy link
Contributor Author

promag commented May 20, 2020

Thanks! I was filtering for your issues...

@maflcko
Copy link
Member

maflcko commented May 20, 2020

Concept ACK, assuming that it does what we do in the python code. Haven't looked at the code here.

self.network_event_loop.call_soon_threadsafe(self.network_event_loop.stop)

@maflcko maflcko added this to the 0.20.1 milestone May 20, 2020
@promag promag force-pushed the 2020-05-fix-race-interrupt-http-server branch from 4c0a617 to c8c1f24 Compare May 21, 2020 08:56
@promag
Copy link
Contributor Author

promag commented May 21, 2020

@hebasto included 38e2be4 from #17457. If you run QT_QPA_PLATFORM=cocoa src/qt/test/test_bitcoin-qt without this commit you will see the warning.

@tarboss
Copy link

tarboss commented May 21, 2020

setting env variable: QT_QPA_PLATFORM=windows ->same warning
QWARN : AppTests::appTests() QObject::connect: Cannot queue arguments of type 'size_t'
(Make sure 'size_t' is registered using qRegisterMetaType().)

also on windows 10 & msvc the first test exits silent becoz
fs::remove_all(m_path_root); in BasicTestingSetup::~BasicTestingSetup() says cannot remove dir,its not empty, but u can catch with try statement, if u want do more small fixes 👍

@promag
Copy link
Contributor Author

promag commented May 21, 2020

setting env variable: QT_QPA_PLATFORM=windows ->same warning
QWARN : AppTests::appTests() QObject::connect: Cannot queue arguments of type 'size_t'
(Make sure 'size_t' is registered using qRegisterMetaType().)

@tarboss 38e2be4be00914e8be7ae9c7215985c9b481b0ae fixes it right?

@tarboss
Copy link

tarboss commented May 21, 2020

at least that warning

********* Start testing of AppTests *********
Config: Using QtTest library 5.9.8, Qt 5.9.8 (x86_64-little_endian-llp64 static debug build; by MSVC 2017)
PASS : AppTests::initTestCase()
QINFO : AppTests::appTests() Backing up GUI settings to "C:\Users\tester\AppData\Local\Temp\test_common_Bitcoin Core\b516affdfbc77f1640d7e3e97f95db107556a19c4535813c07f5a21862418b89\regtest\guisettings.ini.bak"
QDEBUG : AppTests::appTests() requestInitialize : Requesting initialize
QDEBUG : AppTests::appTests() initialize : Running initialization in thread
QDEBUG : AppTests::appTests() initializeResult : Initialization result: true
QINFO : AppTests::appTests() Platform customization: "windows"
QDEBUG : AppTests::appTests() TransactionTablePriv::refreshWallet
QWARN : AppTests::appTests() QWindowsWindow::setGeometry: Unable to set geometry 751x430+598+336 on QWidgetWindow/'RPCConsoleWindow'. Resulting geometry: 751x431+598+336 (frame: 8, 31, 8, 8, custom margin: 0, 0, 0, 0, minimum size: 751x365, maximum size: 16777215x16777215).
QDEBUG : AppTests::appTests() requestShutdown : Requesting shutdown
QDEBUG : AppTests::appTests() shutdown : Running Shutdown in thread
QDEBUG : AppTests::appTests() NotifyUnload
QDEBUG : AppTests::appTests() shutdown : Shutdown finished

E:\win10dev\bc\bitcoin\build_msvc\x64\Debug\test_bitcoin-qt.exe (process 11012) exited with code 3.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

@promag promag force-pushed the 2020-05-fix-race-interrupt-http-server branch from c8c1f24 to 1d39b7c Compare May 21, 2020 17:20
@promag promag changed the title http: Fix workQueue race on InterruptHTTPServer http: Release work queue after event base finish May 21, 2020
@promag
Copy link
Contributor Author

promag commented May 21, 2020

This might result in a connection reset by peer or something like that because one request job can be enqueued but not handled (no response sent).

@laanwj
Copy link
Member

laanwj commented May 26, 2020

I'm confused as to why the Qt change is in here. Please add an explanation to the opening post, or open a separate PR for it.

ACK 1d39b7cf00f9d4a64d93caa9d47a177d759c2d6a

ACK fec477712f9b37dc16b61a54e037073974633f67

@promag
Copy link
Contributor Author

promag commented May 26, 2020

@laanwj see #19033 (comment). I can drop the commit, again.

@maflcko
Copy link
Member

maflcko commented May 26, 2020

I prefer a new pull as well. The gui test thing doesn't need backport

@promag promag force-pushed the 2020-05-fix-race-interrupt-http-server branch from 1d39b7c to fec4777 Compare May 26, 2020 13:21
@promag
Copy link
Contributor Author

promag commented May 26, 2020

Done, and rebased.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fec477712f9b37dc16b61a54e037073974633f67, I have not tested the code, but I have reviewed it and it looks OK, I agree it can be merged.

@maflcko
Copy link
Member

maflcko commented Jun 6, 2020

Testing with diff:

diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index 5e78fd1d71..fe3a3a8aab 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -264,6 +264,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
     // Dispatch to worker thread
     if (i != iend) {
         std::unique_ptr<HTTPWorkItem> item(new HTTPWorkItem(std::move(hreq), path, i->handler));
+        UninterruptibleSleep(std::chrono::milliseconds{55});
         assert(workQueue);
         if (workQueue->Enqueue(item.get()))
             item.release(); /* if true, queue took ownership */

Before:

$ ./test/functional/feature_uacomment.py --valgrind 
2020-06-06T19:12:04.028000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_rjp_1t0x
2020-06-06T19:12:14.514000Z TestFramework (INFO): test multiple -uacomment
2020-06-06T19:12:25.066000Z TestFramework (INFO): test -uacomment max length
2020-06-06T19:13:04.021000Z TestFramework (INFO): test -uacomment unsafe characters
2020-06-06T19:14:39.777000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 484, in assert_start_raises_init_error
    self.wait_for_rpc_connection()
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 219, in wait_for_rpc_connection
    raise FailedToStartError(self._node_msg(
test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status -6 during initialization

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_framework.py", line 119, in main
    self.run_test()
  File "./test/functional/feature_uacomment.py", line 36, in run_test
    self.nodes[0].assert_start_raises_init_error(["-uacomment=" + unsafe_char], expected, match=ErrorMatch.FULL_REGEX)
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 501, in assert_start_raises_init_error
    self._raise_assertion_error(
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 161, in _raise_assertion_error
    raise AssertionError(self._node_msg(msg))
AssertionError: [node 0] Expected message "Error: User Agent comment \(\)\) contains unsafe characters." does not fully match stderr:
"Error: User Agent comment ()) contains unsafe characters.
bitcoind: httpserver.cpp:268: void http_request_cb(struct evhttp_request *, void *): Assertion `workQueue' failed."

After:

$ while ./test/functional/feature_uacomment.py --valgrind ; do true ; done 
2020-06-06T19:17:01.430000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_oy_7aci5
2020-06-06T19:17:12.355000Z TestFramework (INFO): test multiple -uacomment
2020-06-06T19:17:23.166000Z TestFramework (INFO): test -uacomment max length
2020-06-06T19:18:02.112000Z TestFramework (INFO): test -uacomment unsafe characters
2020-06-06T19:19:56.602000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 484, in assert_start_raises_init_error
    self.wait_for_rpc_connection()
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 219, in wait_for_rpc_connection
    raise FailedToStartError(self._node_msg(
test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_framework.py", line 119, in main
    self.run_test()
  File "./test/functional/feature_uacomment.py", line 36, in run_test
    self.nodes[0].assert_start_raises_init_error(["-uacomment=" + unsafe_char], expected, match=ErrorMatch.FULL_REGEX)
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 501, in assert_start_raises_init_error
    self._raise_assertion_error(
  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 161, in _raise_assertion_error
    raise AssertionError(self._node_msg(msg))
AssertionError: [node 0] Expected message "Error: User Agent comment \(🏃\) contains unsafe characters." does not fully match stderr:
"Error: User Agent comment (🏃) contains unsafe characters.
==3931735== Invalid read of size 8
==3931735==    at 0x4ACA354: evhttp_request_get_output_buffer (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x3E67AC: HTTPRequest::WriteReply(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (httpserver.cpp:581)
==3931735==    by 0x3E62C5: HTTPRequest::~HTTPRequest() (httpserver.cpp:526)
==3931735==    by 0x3E9417: operator() (unique_ptr.h:84)
==3931735==    by 0x3E9417: ~unique_ptr (unique_ptr.h:360)
==3931735==    by 0x3E9417: ~HTTPWorkItem (httpserver.cpp:47)
==3931735==    by 0x3E9417: HTTPWorkItem::~HTTPWorkItem() (httpserver.cpp:47)
==3931735==    by 0x3EA42D: operator() (unique_ptr.h:84)
==3931735==    by 0x3EA42D: ~unique_ptr (unique_ptr.h:360)
==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > (stl_construct.h:140)
==3931735==    by 0x3EA42D: __destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *> (stl_construct.h:152)
==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *> (stl_construct.h:184)
==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > (alloc_traits.h:738)
==3931735==    by 0x3EA42D: std::deque<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::allocator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > >::_M_destroy_data_aux(std::_Deque_iterator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >&, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >*>, std::_Deque_iterator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >&, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >*>) (deque.tcc:872)
==3931735==    by 0x3EA1F9: _M_destroy_data (stl_deque.h:2046)
==3931735==    by 0x3EA1F9: std::deque<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::allocator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > >::~deque() (stl_deque.h:1004)
==3931735==    by 0x3E5D7D: ~WorkQueue (httpserver.cpp:89)
==3931735==    by 0x3E5D7D: StopHTTPServer() (httpserver.cpp:480)
==3931735==    by 0x14329B: Shutdown(NodeContext&) (init.cpp:190)
==3931735==    by 0x13C627: AppInit (bitcoind.cpp:164)
==3931735==    by 0x13C627: main (bitcoind.cpp:180)
==3931735==  Address 0x1b92c238 is 152 bytes inside a block of size 216 free'd
==3931735==    at 0x483B9F5: free (vg_replace_malloc.c:538)
==3931735==    by 0x4ACA8BC: evhttp_connection_free (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4ACACD4: evhttp_free (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x3E5D41: StopHTTPServer() (httpserver.cpp:472)
==3931735==    by 0x14329B: Shutdown(NodeContext&) (init.cpp:190)
==3931735==    by 0x13C627: AppInit (bitcoind.cpp:164)
==3931735==    by 0x13C627: main (bitcoind.cpp:180)
==3931735==  Block was alloc'd at
==3931735==    at 0x483CAE9: calloc (vg_replace_malloc.c:760)
==3931735==    by 0x4ACA66C: evhttp_request_new (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4ACA7AC: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4ACB074: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4AB8850: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4AB04A9: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x4AB0BA6: event_base_loop (in /usr/lib64/libevent-2.1.so.6.0.2)
==3931735==    by 0x3E59A6: ThreadHTTP(event_base*) (httpserver.cpp:292)
==3931735==    by 0x3E9D17: __invoke_impl<bool, bool (*)(event_base *), event_base *> (invoke.h:60)
==3931735==    by 0x3E9D17: __invoke<bool (*)(event_base *), event_base *> (invoke.h:95)
==3931735==    by 0x3E9D17: _M_invoke<0, 1> (thread:264)
==3931735==    by 0x3E9D17: operator() (thread:271)
==3931735==    by 0x3E9D17: std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (*)(event_base*), event_base*> > >::_M_run() (thread:215)
==3931735==    by 0x4C6FAC3: ??? (in /usr/lib64/libstdc++.so.6.0.28)
==3931735==    by 0x48C0431: start_thread (in /usr/lib64/libpthread-2.31.so)
==3931735==    by 0x4FEB9D2: clone (in /usr/lib64/libc-2.31.so)
==3931735== 
{
   <insert_a_suppression_name_here>
   Memcheck:Addr8
   fun:evhttp_request_get_output_buffer
   fun:_ZN11HTTPRequest10WriteReplyEiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
   fun:_ZN11HTTPRequestD1Ev
   fun:operator()
   fun:~unique_ptr
   fun:~HTTPWorkItem
   fun:_ZN12HTTPWorkItemD0Ev
   fun:operator()
   fun:~unique_ptr
   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > >
   fun:__destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *>
   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *>
   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > >
   fun:_ZNSt5dequeISt10unique_ptrI11HTTPClosureSt14default_deleteIS1_EESaIS4_EE19_M_destroy_data_auxESt15_Deque_iteratorIS4_RS4_PS4_ESA_
   fun:_M_destroy_data
   fun:_ZNSt5dequeISt10unique_ptrI11HTTPClosureSt14default_deleteIS1_EESaIS4_EED2Ev
   fun:~WorkQueue
   fun:_Z14StopHTTPServerv
   fun:_Z8ShutdownR11NodeContext
   fun:AppInit
   fun:main
}
==3931735== 
==3931735== Exit program on first error (--exit-on-first-error=yes)"

@promag
Copy link
Contributor Author

promag commented Jun 11, 2020

Thanks for these instructions @MarcoFalke.

@promag promag force-pushed the 2020-05-fix-race-interrupt-http-server branch from 93eefaa to d54596a Compare June 23, 2020 00:36
@promag
Copy link
Contributor Author

promag commented Jun 23, 2020

@MarcoFalke updated, see #19033 (comment).

@promag
Copy link
Contributor Author

promag commented Sep 7, 2020

Ping.

@maflcko maflcko modified the milestones: 0.20.2, 22.0 Nov 16, 2020
Copy link
Contributor

@LarryRuane LarryRuane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm able to reproduce the problem consistently by adding the artificial delay suggested by @MarcoFalke and then starting a constant stream of RPC requests (of any kind):

$ while src/bitcoin-cli listbanned ; do :; done

and then stopping bitcoind with either control-c or the stop RPC.

bitcoind: httpserver.cpp:271: void http_request_cb(evhttp_request*, void*): Assertion `workQueue' failed.
Aborted (core dumped)

I verified that with this PR, this doesn't happen. Nice PR! My suggestions are optional, although guarding the LogPrintf is probably worth doing.

ACK 4e353cb

{
LOCK(cs);
if (queue.size() >= maxDepth) {
if (!running || queue.size() >= maxDepth) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a necessary change, but because of this it, the following can appear in the debug.log:

2020-12-14T19:03:41Z WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting

which is harmless, but a bit confusing and definitely misleading. I suggest adding this guard around that LogPrintf:

        if (workQueue->Enqueue(item.get()))
            item.release(); /* if true, queue took ownership */
        else {
            if (!ShutdownRequested()) {
                LogPrintf("WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting\n");
            }
            item->req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, "Work queue depth exceeded");

(I did manually test that change.) Or you could change Enqueue() to return more than just a bool to indicate the reason the request was rejected, but it's probably not worth it. Or, you could check ShutdownRequested() before calling Enqueue() and there would be no need to check running in Enqueue. But I think your change is best because the WorkQueue object knows if it's running or not, makes sense to take advantage of that knowledge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in that case the RPC error is also misleading. I'd like to avoid more calls to ShutdownRequested. Maybe just reword the error to something along "work queue full or shutdown in progress".

while (running && queue.empty())
cond.wait(lock);
if (!running)
if (!running && queue.empty())
Copy link
Contributor

@LarryRuane LarryRuane Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change! This ensures that the queue is empty before exiting the thread, so there are no leftover items in the work queue when the destructor runs; without this, I got the following to appear in debug.log during testing:

2020-12-14T19:11:40Z ~HTTPRequest: Unhandled request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I would have to test to make sure but I think without this the client wouldn't get a response. I also think that very last response would now have connection: close header, see HTTPRequest::WriteReply - another case I'd like to remove the call to ShutdownRequested.

Copy link
Contributor

@LarryRuane LarryRuane Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tested this, and without the changes to Enqueue and Run, bitcoin-cli prints the following:

error: Could not connect to the server 127.0.0.1:8332 (error code 1 - "EOF reached")

Make sure the bitcoind server is running and that you are connecting to the correct RPC port.

(I think this is the client not getting a response.)

With your changes to Enqueue and Run:

error: couldn't parse reply from server

This is a slight improvement, I guess. This error seems familiar... If I cherry-pick the 2-line change from #18335 then we get:

error: Server response: Work queue depth exceeded

It's the same wrong error message that I mentioned for the LogPrintf 😆. I guess that PR should change to say something like work queue depth exceeded or client or shutdown in progress, as you suggested. (Nothing for this needed by your PR here.)

if (workQueue) {
delete workQueue;
workQueue = nullptr;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this is the correct fix, because this now occurs after the call to g_thread_http.join() (just above, line 468). This join ensures that no new items can be enqueued.

Suggestion, you may want to keep all of the workQueue shutdown code together and move it to just after the call to g_thread_http.join(), that is, after line 469 (with the current line numbering, in other words just before if (eventHTTP)), instead of having it split between two places (beginning at line 453 above, and here). I don't think there's any reason those worker threads can't live a little longer (and there's no harm that I can think of). This would follow the pattern of tearing down the higher-level stuff before the lower-level stuff (not doing that is the source of this bug in the first place!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be 😄 this is the goal of this PR - destroy the work queue after event base finish. After g_thread_http.join() there can be stuff pending on the event base, and if you delete the workQueue before the event base finishes then all pending workItems will be destroyed (because they are unique_ptr) and then it's only problems from here on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I didn't understand about the event base; so maybe we could do all the shutdown (deconstruct) of the workQueue down here, where you now have the delete. Would this work?

suggestion
void StopHTTPServer()
{
    LogPrint(BCLog::HTTP, "Stopping HTTP server\n");
    // Unlisten sockets, these are what make the event loop running, which means
    // that after this and all connections are closed the event loop will quit.
    for (evhttp_bound_socket *socket : boundSockets) {
        evhttp_del_accept_socket(eventHTTP, socket);
    }
    boundSockets.clear();
    if (eventBase) {
        LogPrint(BCLog::HTTP, "Waiting for HTTP event thread to exit\n");
        if (g_thread_http.joinable()) g_thread_http.join();
    }
    if (eventHTTP) {
        evhttp_free(eventHTTP);
        eventHTTP = nullptr;
    }
    if (eventBase) {
        event_base_free(eventBase);
        eventBase = nullptr;
    }
    if (workQueue) {
        LogPrint(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n");
        for (auto& thread: g_thread_http_workers) {
            thread.join();
        }
        g_thread_http_workers.clear();
        delete workQueue;
        workQueue = nullptr;
    }
    LogPrint(BCLog::HTTP, "Stopped HTTP server\n");
}

(That seems slightly simpler and clearer to me, but the way you have it now is good.) If you make changes, I don't mind taking another look (and testing) and quickly giving another ack.

@promag
Copy link
Contributor Author

promag commented Dec 14, 2020

Thanks for reviewing @LarryRuane! I'll hold on your suggestion to keep your and @fjahr ACK. I also think the error and log messages can be improved in a follow to avoid dragging this. In any case I hope my above comments are clear to you, otherwise let me know.

@hebasto mind to test again? It would be great to push this early on the release cycle.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 4e353cb, tested (rebased on top of master 9313c4e) on Linux Mint 20.1 (x86_64) using MarcoFalke's patch, including different -rpcthreads/-rpcworkqueue cases. The bug is fixed. The code is correct.

Agree, that the message "error: Server response: Work queue depth exceeded" should be reworded as it covers two different cases now.

Edit (laanwj): removed the @ in the ACK line

maflcko pushed a commit that referenced this pull request May 11, 2021
…mbers

34b04ee refactor: Add TSA annotations to the WorkQueue class members (Hennadii Stepanov)

Pull request description:

  Noted while reviewing #19033, and hoping this will not conflict with it :)

ACKs for top commit:
  promag:
    Code review ACK 34b04ee.

Tree-SHA512: 4c15729acd95223263c19bc0dd64b9e7960872b48edee6eee97a5d0c2b99b8838185ac3a2ccd5bee992cb3a12498633427fe9919be5a12da9949fcf69a6275a0
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 11, 2021
…lass members

34b04ee refactor: Add TSA annotations to the WorkQueue class members (Hennadii Stepanov)

Pull request description:

  Noted while reviewing bitcoin#19033, and hoping this will not conflict with it :)

ACKs for top commit:
  promag:
    Code review ACK 34b04ee.

Tree-SHA512: 4c15729acd95223263c19bc0dd64b9e7960872b48edee6eee97a5d0c2b99b8838185ac3a2ccd5bee992cb3a12498633427fe9919be5a12da9949fcf69a6275a0
@achow101
Copy link
Member

ACK 4e353cb

Reviewed the code and it makes sense. However I was unable to reproduce the race condition so I cannot test this.

@LarryRuane
Copy link
Contributor

LarryRuane commented Jun 15, 2021

(re-)ACK 4e353cb
Tested again rebased on master d75a1df, Ubuntu 20.04, was able to reproduce the bug using the one-line patch Marco suggested, and the PR still fixes it.

@laanwj laanwj merged commit 6a67366 into bitcoin:master Jun 21, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 24, 2021
4e353cb http: Release work queue after event base finish (João Barbosa)

Pull request description:

  This fixes a race between `http_request_cb` and `StopHTTPServer` where
  the work queue is used after release.

  Fixes bitcoin#18856.

ACKs for top commit:
  fjahr:
    Code review ACK 4e353cb
  achow101:
    ACK 4e353cb
  LarryRuane:
    ACK 4e353cb
  hebasto:
    ACK 4e353cb, tested (rebased on top of master 9313c4e) on Linux Mint 20.1 (x86_64) using MarcoFalke's [patch](bitcoin#19033 (comment)), including different `-rpcthreads`/`-rpcworkqueue` cases. The bug is fixed. The code is correct.

Tree-SHA512: 185d2a9744d0d5134d782bf321ac9958ba17b11a5b3d70b4897c8243e6b146dfd3f23c57aef8e10ae9484374120b64389c1949a9cf0a21dccc47ffc934c20930
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
…lass members

34b04ee refactor: Add TSA annotations to the WorkQueue class members (Hennadii Stepanov)

Pull request description:

  Noted while reviewing bitcoin#19033, and hoping this will not conflict with it :)

ACKs for top commit:
  promag:
    Code review ACK 34b04ee.

Tree-SHA512: 4c15729acd95223263c19bc0dd64b9e7960872b48edee6eee97a5d0c2b99b8838185ac3a2ccd5bee992cb3a12498633427fe9919be5a12da9949fcf69a6275a0
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
4e353cb http: Release work queue after event base finish (João Barbosa)

Pull request description:

  This fixes a race between `http_request_cb` and `StopHTTPServer` where
  the work queue is used after release.

  Fixes bitcoin#18856.

ACKs for top commit:
  fjahr:
    Code review ACK 4e353cb
  achow101:
    ACK 4e353cb
  LarryRuane:
    ACK 4e353cb
  hebasto:
    ACK 4e353cb, tested (rebased on top of master 9313c4e) on Linux Mint 20.1 (x86_64) using MarcoFalke's [patch](bitcoin#19033 (comment)), including different `-rpcthreads`/`-rpcworkqueue` cases. The bug is fixed. The code is correct.

Tree-SHA512: 185d2a9744d0d5134d782bf321ac9958ba17b11a5b3d70b4897c8243e6b146dfd3f23c57aef8e10ae9484374120b64389c1949a9cf0a21dccc47ffc934c20930
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 15, 2021
…lass members

34b04ee refactor: Add TSA annotations to the WorkQueue class members (Hennadii Stepanov)

Pull request description:

  Noted while reviewing bitcoin#19033, and hoping this will not conflict with it :)

ACKs for top commit:
  promag:
    Code review ACK 34b04ee.

Tree-SHA512: 4c15729acd95223263c19bc0dd64b9e7960872b48edee6eee97a5d0c2b99b8838185ac3a2ccd5bee992cb3a12498633427fe9919be5a12da9949fcf69a6275a0
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 15, 2021
4e353cb http: Release work queue after event base finish (João Barbosa)

Pull request description:

  This fixes a race between `http_request_cb` and `StopHTTPServer` where
  the work queue is used after release.

  Fixes bitcoin#18856.

ACKs for top commit:
  fjahr:
    Code review ACK 4e353cb
  achow101:
    ACK 4e353cb
  LarryRuane:
    ACK 4e353cb
  hebasto:
    ACK 4e353cb, tested (rebased on top of master 9313c4e) on Linux Mint 20.1 (x86_64) using MarcoFalke's [patch](bitcoin#19033 (comment)), including different `-rpcthreads`/`-rpcworkqueue` cases. The bug is fixed. The code is correct.

Tree-SHA512: 185d2a9744d0d5134d782bf321ac9958ba17b11a5b3d70b4897c8243e6b146dfd3f23c57aef8e10ae9484374120b64389c1949a9cf0a21dccc47ffc934c20930
gwillen pushed a commit to ElementsProject/elements that referenced this pull request Jun 1, 2022
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rpc: InterruptHTTPServer race / workQueue use-after-free

9 participants