win: prevent machine hang due to tty event explosion#2308
win: prevent machine hang due to tty event explosion#2308jblazquez wants to merge 1 commit intolibuv:v1.xfrom
Conversation
The tty subsystem on Windows was listening for console events from all processes to detect when our console window was being resized. This could cause an explosion in the number of events queued by the system when running many console applications in parallel that all wrote to their console quickly. The end result was a complete machine hang. Now we determine, whenever possible, what our corresponding conhost.exe process is and listen for console events from that process only. This detection does not work in 32-bit applications running on 64-bit Windows so those default to the old behavior of listening to all processes.
|
Wow, that's a thorough bug report, thanks! The PR as-is LGTM. regarding what to do for 32bit apps on 64bit Windows... I'd say we wait for reports to come, and possible apply 3) by setting some UUID as the title only in that case. |
|
CI: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/1407/ The bug report is 11/10, great job! Originally I did not found a way to get the conhost procid, so I left 0 there. |
The tty subsystem on Windows was listening for console events from all processes to detect when our console window was being resized. This could cause an explosion in the number of events queued by the system when running many console applications in parallel that all wrote to their console quickly. The end result was a complete machine hang. Now we determine, whenever possible, what our corresponding conhost.exe process is and listen for console events from that process only. This detection does not work in 32-bit applications running on 64-bit Windows so those default to the old behavior of listening to all processes. PR-URL: #2308 Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
|
Landed in dabc737 |
|
Just so you all know: this will not work for application sessions using libuv under a "pseudoconsole." They have HWNDs to support legacy scenarios (like vim: it will crash if it doesn't find one. this is known to be silly.) but those HWNDs will never signal any accessibility events or be displayed anywhere on the screen. There's a bit more info at microsoft/terminal#1811. We'd like to offer a better window size event than the one that comes in on the input handle, but we haven't yet designed it. 😄 I am moderately horrified at the MSDN documentation saying "The Win32 API provides no direct method for obtaining the window handle associated with a console application." as that is patently false. I'll get a documentation bug going to get that fixed, because that method is absolutely atrocious. Incidentally, would |
|
For some background: this was added, because the previous version (reading @DHowett-MSFT we need the PID of the I'll make another PR that removes this hook for 32/64 bit mixture as Visual Studio bundles 32bit Node instance and is most likely run on a 64bit machine. I'll reintroduce the old method as a fallback, hopefully, this will also solve microsoft/terminal#1811. If after that the performance issues prevail, we can always switch to pooling the console size every once in a while. Or if you have any other idea how this can be implemented, please let me know. |
Continuing improvement of SIGWINCH from PR libuv#2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize() which checks if the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure to not to call the uv__tty_console_signal_resize function more than 30 times per second. The SetWinEventHook will not be installed, if the PID of the conhost.exe process that owns the console window cannot be determinated. This can happen when a 32bit libuv app is running on a 64bit Windows. For such cases PR libuv#1408 is partially reverted - when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was alos updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: libuv#2308
Continuing improvement of SIGWINCH from PR libuv#2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize() which checks if the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure to not to call the uv__tty_console_signal_resize function more than 30 times per second. The SetWinEventHook will not be installed, if the PID of the conhost.exe process that owns the console window cannot be determinated. This can happen when a 32bit libuv app is running on a 64bit Windows. For such cases PR libuv#1408 is partially reverted - when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was alos updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: libuv#2308
Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize() which checks if the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure to not to call the uv__tty_console_signal_resize function more than 30 times per second. The SetWinEventHook will not be installed, if the PID of the conhost.exe process that owns the console window cannot be determinated. This can happen when a 32bit libuv app is running on a 64bit Windows. For such cases PR #1408 is partially reverted - when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was also updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: #2308 PR-URL: #2381 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
The tty subsystem on Windows was listening for console events from all processes to detect when our console window was being resized. This could cause an explosion in the number of events queued by the system when running many console applications in parallel that all wrote to their console quickly. The end result was a complete machine hang. Now we determine, whenever possible, what our corresponding conhost.exe process is and listen for console events from that process only. This detection does not work in 32-bit applications running on 64-bit Windows so those default to the old behavior of listening to all processes. PR-URL: libuv/libuv#2308 Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize() which checks if the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure to not to call the uv__tty_console_signal_resize function more than 30 times per second. The SetWinEventHook will not be installed, if the PID of the conhost.exe process that owns the console window cannot be determinated. This can happen when a 32bit libuv app is running on a 64bit Windows. For such cases PR #1408 is partially reverted - when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was also updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: libuv/libuv#2308 PR-URL: libuv/libuv#2381 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
The tty subsystem on Windows was listening for console events from all processes to detect when our console window was being resized. This could cause an explosion in the number of events queued by the system when running many console applications in parallel that all wrote to their console quickly. The end result was a complete machine hang. Now we determine, whenever possible, what our corresponding conhost.exe process is and listen for console events from that process only. This detection does not work in 32-bit applications running on 64-bit Windows so those default to the old behavior of listening to all processes. PR-URL: libuv/libuv#2308 Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize() which checks if the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure to not to call the uv__tty_console_signal_resize function more than 30 times per second. The SetWinEventHook will not be installed, if the PID of the conhost.exe process that owns the console window cannot be determinated. This can happen when a 32bit libuv app is running on a 64bit Windows. For such cases PR #1408 is partially reverted - when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was also updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: libuv/libuv#2308 PR-URL: libuv/libuv#2381 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Problem
In #1408 the detection of console window resizing on Windows was changed to use the Windows Accessibility API (SetWinEventHook), apparently because the previous implementation couldn't reliably detect console window resizing under some conditions.
Unfortunately, the current implementation can easily lead to a total machine hang due to an explosion of console events being queued by the system. This can happen when multiple libuv-powered Windows console applications that write output to their console quickly are run in parallel, even if the applications don't care about SIGWINCH signals at all.
The problem is in the way SetWinEventHook is being used. The libuv code is passing
0for theidProcessparameter, which means "listen for events sent by any process in the system". It was doing this presumably because on Windows the console window is actually owned by a different process (conhost.exe) so libuv would never receive console resizing events if it listened only on its own process ID. Unfortunately, the event hook receives notifications when many other things happen to a console, for example when it scrolls due to a new line being written, so having many chatty console applications all listening for events from all other applications quickly leads to O(N^2) events being queued which easily brings down the whole system. It seems that Microsoft warns about this here:To demonstrate the problem, here's a simple application that writes 25 lines/sec to the console which can easily trigger the issue:
You can save that to a file called
uvhang.cand build the application as follows from an x64 Native Tools Command Prompt for VS2017:Now that you have
uvhang.exein the current directory, create a batch file calleduvhang.batwith these contents:NOTE: Running this batch file will probably hang your system. Proceed with care.
This batch file will start 50 instances of the program in parallel. On an 8-core Ryzen 2700X machine this leads to an unresponsive system within 5 seconds. We've seen 36-core production servers brought to their knees with a slightly larger number of instances (our production environment fits this profile of many console applications running in parallel).
From looking at the ETW profiling data, one can see that the
win32kfull.sysdriver spends an inordinate amount of time queueing these console events. Here's an example profile:You can see from this 23-second profile that
win32kfull.sysqueued 170,000 event messages and spent 17 seconds doing so. If you want to capture a profile like this yourself you can use a batch file like this:The problem also occurs with Node applications. Save this code to
uvhang.js:Now you can cause the system hang with this batch file:
Solution
The idea behind this PR is simply to listen for console events only from our corresponding
conhost.exeprocess, which we find usingNtQueryInformationProcesswithProcessConsoleHostProcess. With this change, we still receive console resizing events for our console and the number of events queued bywin32kfull.sysgrows only linearly with the number of processes:This works for 32-bit applications running on 32-bit Windows and 64-bit applications running on 64-bit Windows. Unfortunately it doesn't work for 32-bit applications running on 64-bit Windows because the WOW64 translation layer disallows the use of
ProcessConsoleHostProcessfor some reason. This PR currently falls back to the existing behavior of listening for events from all processes in that case, but since that means it can still cause system hangs I was wondering which of these alternatives would be preferred:conhost.exeprocess.