
win: prevent machine hang due to tty event explosion #2308

Closed
jblazquez wants to merge 1 commit into libuv:v1.x from jblazquez:fix-winevent-explosion


@jblazquez
Contributor

Problem

In #1408 the detection of console window resizing on Windows was changed to use the Windows Accessibility API (SetWinEventHook), apparently because the previous implementation couldn't reliably detect console window resizing under some conditions.

Unfortunately, the current implementation can easily lead to a total machine hang due to an explosion of console events being queued by the system. This can happen when multiple libuv-powered Windows console applications that write output to their console quickly are run in parallel, even if the applications don't care about SIGWINCH signals at all.

The problem is in the way SetWinEventHook is being used. The libuv code is passing 0 for the idProcess parameter, which means "listen for events sent by any process in the system". It was doing this presumably because on Windows the console window is actually owned by a different process (conhost.exe) so libuv would never receive console resizing events if it listened only on its own process ID. Unfortunately, the event hook receives notifications when many other things happen to a console, for example when it scrolls due to a new line being written, so having many chatty console applications all listening for events from all other applications quickly leads to O(N^2) events being queued which easily brings down the whole system. It seems that Microsoft warns about this here:

The USER component of the operating system allocates memory for events that are handled by out-of-context hook functions. The memory is freed when the hook functions return. If a hook function does not process events quickly enough, USER resources are lowered, eventually resulting in a fault or extremely slow response times. These problems may occur if:

  • Events are fired very rapidly.
  • The system is slow.
  • The hook function processes events slowly.

To demonstrate the problem, here's a simple application that writes 25 lines/sec to the console which can easily trigger the issue:

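#include <uv.h>
#include <stdio.h>

/* Prints one line per tick; every write can scroll the console. */
void timer_cb(uv_timer_t* handle)
{
    static int i = 0;
    (void) handle;
    printf("Timer: %d\n", ++i);
}

int main()
{
    uv_timer_t timer;
    uv_timer_init(uv_default_loop(), &timer);
    /* Fire immediately, then every 40 ms (25 lines/sec). */
    uv_timer_start(&timer, timer_cb, 0, 40);
    return uv_run(uv_default_loop(), UV_RUN_DEFAULT);
}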

You can save that to a file called uvhang.c and build the application as follows from an x64 Native Tools Command Prompt for VS2017:

cl /c /EHsc /Ilibuv/include uvhang.c
link uvhang.obj libuv/Release/lib/libuv.lib ws2_32.lib iphlpapi.lib advapi32.lib user32.lib userenv.lib psapi.lib

Now that you have uvhang.exe in the current directory, create a batch file called uvhang.bat with these contents:

@echo off
FOR /L %%A IN (1,1,50) DO (
	start /min uvhang.exe
)

NOTE: Running this batch file will probably hang your system. Proceed with care.

This batch file will start 50 instances of the program in parallel. On an 8-core Ryzen 2700X machine this leads to an unresponsive system within 5 seconds. We've seen 36-core production servers brought to their knees with a slightly larger number of instances (our production environment fits this profile of many console applications running in parallel).
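
As a rough back-of-the-envelope check on why this scales so badly, the following sketch compares the number of hook deliveries per second with and without a process filter. It is a simplified model, not a measurement: it assumes every written line produces exactly one console event and that an unfiltered hook delivers each event to every listening process. The function names are made up for this illustration.

```c
/* Simplified model (assumption): one console event per written line. */

/* Every process listens to every console, so each event is delivered
 * to all N processes: N * N * rate deliveries per second. */
unsigned long long hook_events_unfiltered(unsigned long long processes,
                                          unsigned long long events_per_sec)
{
    return processes * processes * events_per_sec;
}

/* Every process listens only to its own console host:
 * N * rate deliveries per second. */
unsigned long long hook_events_filtered(unsigned long long processes,
                                        unsigned long long events_per_sec)
{
    return processes * events_per_sec;
}
```

With 50 instances writing 25 lines/sec, this model gives 62,500 deliveries per second for the unfiltered hook versus 1,250 when each process only listens to its own console host.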

From looking at the ETW profiling data, one can see that the win32kfull.sys driver spends an inordinate amount of time queueing these console events. Here's an example profile:

[Image: hang]

You can see from this 23-second profile that win32kfull.sys queued 170,000 event messages and spent 17 seconds doing so. If you want to capture a profile like this yourself you can use a batch file like this:

@echo off
xperf -on Latency -f %temp%\kernel.etl -stackwalk Profile -start MyTrace -on Microsoft-Windows-Win32k -f %temp%\user.etl -BufferSize 1024 -MaxBuffers 1024 -MaxFile 1024
FOR /L %%A IN (1,1,50) DO (
	start /min uvhang.exe
)
ping -n 5 localhost
taskkill /f /im:uvhang.exe
xperf -stop MyTrace -stop
xperf -merge %temp%\kernel.etl %temp%\user.etl %temp%\merged.etl
wpa %temp%\merged.etl

The problem also occurs with Node applications. Save this code to uvhang.js:

var i = 0;

function timerFunc() {
  console.log(`Timer: ${++i}`);
}

setInterval(timerFunc, 40);

Now you can cause the system hang with this batch file:

@echo off
FOR /L %%A IN (1,1,50) DO (
	start /min node uvhang.js
)

Solution

The idea behind this PR is simply to listen for console events only from our corresponding conhost.exe process, which we find using NtQueryInformationProcess with ProcessConsoleHostProcess. With this change, we still receive console resizing events for our console and the number of events queued by win32kfull.sys grows only linearly with the number of processes:

[Image: nohang]

This works for 32-bit applications running on 32-bit Windows and 64-bit applications running on 64-bit Windows. Unfortunately it doesn't work for 32-bit applications running on 64-bit Windows because the WOW64 translation layer disallows the use of ProcessConsoleHostProcess for some reason. This PR currently falls back to the existing behavior of listening for events from all processes in that case, but since that means it can still cause system hangs I was wondering which of these alternatives would be preferred:

  1. Leave the PR as-is and add a warning somewhere about this problem for 32-bit apps on 64-bit Windows.
  2. Revert the console window resizing detection change from #1408 (win,tty: improve SIGWINCH support) to its old behavior.
  3. Use this method which involves changing the window title to some unique string to find our conhost.exe process.
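
The filtering approach described under Solution, including the fallback for 32-bit applications on 64-bit Windows, can be sketched roughly as follows. This is not the code from the PR. The names prefixed with `sketch_` and the helper `resize_hook_pid` are hypothetical, the numeric value 49 for `ProcessConsoleHostProcess` is an assumption (it is not part of the documented `winternl.h` enum), and clearing the low two bits of the returned value is also an assumption about how the raw value should be turned into a PID.

```c
#include <stdint.h>

/* Hypothetical helper: pick the idProcess argument for SetWinEventHook.
 * 0 means "all processes", which is the old (unfiltered) behavior. */
uint32_t resize_hook_pid(int query_ok, uintptr_t raw_conhost_pid)
{
    if (!query_ok)
        return 0;  /* e.g. a 32-bit process under WOW64 */
    /* Assumption: the low two bits are not part of the PID. */
    return (uint32_t) (raw_conhost_pid & ~(uintptr_t) 0x3);
}

#ifdef _WIN32
#include <windows.h>

/* Assumption: undocumented information class value. */
#define SKETCH_PROCESS_CONSOLE_HOST_PROCESS 49

typedef LONG (WINAPI *sketch_nt_query_fn)(HANDLE, int, PVOID, ULONG, PULONG);

void CALLBACK sketch_on_console_event(HWINEVENTHOOK hook, DWORD event,
                                      HWND hwnd, LONG id_object,
                                      LONG id_child, DWORD thread,
                                      DWORD time_ms)
{
    (void) hook; (void) event; (void) hwnd; (void) id_object;
    (void) id_child; (void) thread; (void) time_ms;
    /* A real handler would check whether the console size changed. */
}

/* The calling thread must run a message loop for an out-of-context
 * hook to have its callback invoked. */
HWINEVENTHOOK sketch_install_resize_hook(void)
{
    ULONG_PTR raw_pid = 0;
    int ok = 0;
    sketch_nt_query_fn query = NULL;
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");

    if (ntdll != NULL)
        query = (sketch_nt_query_fn) GetProcAddress(ntdll,
                                                    "NtQueryInformationProcess");
    if (query != NULL)
        ok = query(GetCurrentProcess(), SKETCH_PROCESS_CONSOLE_HOST_PROCESS,
                   &raw_pid, sizeof(raw_pid), NULL) >= 0;

    return SetWinEventHook(EVENT_CONSOLE_LAYOUT, EVENT_CONSOLE_LAYOUT,
                           NULL, sketch_on_console_event,
                           resize_hook_pid(ok, (uintptr_t) raw_pid),
                           0, WINEVENT_OUTOFCONTEXT);
}
#endif
```

The decision logic is kept in a separate, platform-independent function so that the fallback behavior (pass 0 when the query fails) is easy to see and to test in isolation.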

The tty subsystem on Windows was listening for console events from all
processes to detect when our console window was being resized. This
could cause an explosion in the number of events queued by the system
when running many console applications in parallel that all wrote to
their console quickly. The end result was a complete machine hang.

Now we determine, whenever possible, what our corresponding conhost.exe
process is and listen for console events from that process only. This
detection does not work in 32-bit applications running on 64-bit
Windows so those default to the old behavior of listening to all
processes.
@saghul
Member

saghul commented May 22, 2019

Wow, that's a thorough bug report, thanks! The PR as-is LGTM. Regarding what to do for 32-bit apps on 64-bit Windows: I'd say we wait for reports to come in, and possibly apply 3) by setting some UUID as the title only in that case.

Member

@saghul saghul left a comment


🚀

@bzoz
Member

bzoz commented May 22, 2019

CI: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/1407/
Node: https://ci.nodejs.org/view/libuv/job/libuv-in-node/97/

The bug report is 11/10, great job! Originally I did not find a way to get the conhost process ID, so I left 0 there.

bzoz pushed a commit that referenced this pull request Jul 12, 2019
PR-URL: #2308
Reviewed-By: Saúl Ibarra Corretgé <saghul@gmail.com>
Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
@bzoz
Member

bzoz commented Jul 12, 2019

Landed in dabc737

@bzoz bzoz closed this Jul 12, 2019
@DHowett-MSFT

Just so you all know: this will not work for application sessions using libuv under a "pseudoconsole." They have HWNDs to support legacy scenarios (like vim: it will crash if it doesn't find one. this is known to be silly.) but those HWNDs will never signal any accessibility events or be displayed anywhere on the screen. There's a bit more info at microsoft/terminal#1811.

We'd like to offer a better window size event than the one that comes in on the input handle, but we haven't yet designed it. 😄

I am moderately horrified at the MSDN documentation saying "The Win32 API provides no direct method for obtaining the window handle associated with a console application." as that is patently false. I'll get a documentation bug going to get that fixed, because that method is absolutely atrocious.

Incidentally, would GetConsoleWindow help you? 😁

@bzoz
Member

bzoz commented Jul 16, 2019

For some background: this was added, because the previous version (reading WINDOW_BUFFER_SIZE_EVENT from the console) had a number of issues - see "Notes" in the old documentation.

@DHowett-MSFT we need the PID of the conhost.exe process for the event filter. Calling GetWindowThreadProcessId on the result of GetConsoleWindow does not give that PID, so unless there is some other method this will not be useful.

I'll make another PR that removes this hook for the 32-bit-on-64-bit case, since Visual Studio bundles a 32-bit Node instance and most likely runs on a 64-bit machine. I'll reintroduce the old method as a fallback; hopefully this will also solve microsoft/terminal#1811.

If the performance issues persist after that, we can always switch to polling the console size every once in a while. Or if you have any other idea how this can be implemented, please let me know.
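
A minimal sketch of what polling the console size could look like: query the current size periodically and only signal when it differs from the last known one. The type and function names are hypothetical, and using `GetConsoleScreenBufferInfo` on the standard output handle is an assumption for illustration, not necessarily what libuv does.

```c
/* Hypothetical type holding the visible console size in character cells. */
typedef struct {
    int width;
    int height;
} console_size_t;

/* Returns 1 and remembers the new size if it differs from the last one,
 * otherwise returns 0. The caller would raise SIGWINCH on 1. */
int console_size_update(console_size_t* last, console_size_t current)
{
    if (last->width == current.width && last->height == current.height)
        return 0;
    *last = current;
    return 1;
}

#ifdef _WIN32
#include <windows.h>

/* Reads the visible window size of the console attached to stdout.
 * Returns 1 on success, 0 if stdout is not a console. */
int console_size_query(console_size_t* out)
{
    CONSOLE_SCREEN_BUFFER_INFO info;

    if (!GetConsoleScreenBufferInfo(GetStdHandle(STD_OUTPUT_HANDLE), &info))
        return 0;
    out->width = info.srWindow.Right - info.srWindow.Left + 1;
    out->height = info.srWindow.Bottom - info.srWindow.Top + 1;
    return 1;
}
#endif
```

A timer could call `console_size_query` at a fixed interval and pass the result to `console_size_update`; the trade-off is a small constant cost per process instead of a cost that depends on how chatty other consoles are.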

bzoz added a commit to JaneaSystems/libuv that referenced this pull request Jul 18, 2019
Continuing improvement of SIGWINCH from PR libuv#2308.

Running SetWinEventHook without filtering for the specific PIDs has
significant impact on the performance of the entire system. This PR
changes the way SIGWINCH is handled.

The SetWinEventHook callback now signals a separate thread,
uv__tty_console_resize_watcher_thread. This thread calls
uv__tty_console_signal_resize() which checks if the console was actually
resized. The uv__tty_console_resize_watcher_thread makes sure not to
call the uv__tty_console_signal_resize function more than 30 times per
second.

The SetWinEventHook will not be installed if the PID of the
conhost.exe process that owns the console window cannot be
determined. This can happen when a 32bit libuv app is running on a
64bit Windows.

For such cases PR libuv#1408 is partially reverted - when tty reads
WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to
uv__tty_console_signal_resize(). This will also help when the app is
running under console emulators. Documentation was also updated to
reflect that.

Refs: microsoft/terminal#1811
Refs: microsoft/terminal#410
Refs: libuv#2308
bzoz added a commit that referenced this pull request Sep 5, 2019
Continuing improvement of SIGWINCH from PR #2308.

Running SetWinEventHook without filtering for the specific PIDs has
significant impact on the performance of the entire system. This PR
changes the way SIGWINCH is handled.

The SetWinEventHook callback now signals a separate thread,
uv__tty_console_resize_watcher_thread. This thread calls
uv__tty_console_signal_resize() which checks if the console was actually
resized. The uv__tty_console_resize_watcher_thread makes sure not to
call the uv__tty_console_signal_resize function more than 30 times per
second.

The SetWinEventHook will not be installed if the PID of the
conhost.exe process that owns the console window cannot be
determined. This can happen when a 32bit libuv app is running on a
64bit Windows.

For such cases PR #1408 is partially reverted - when tty reads
WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to
uv__tty_console_signal_resize(). This will also help when the app is
running under console emulators. Documentation was also updated to
reflect that.

Refs: microsoft/terminal#1811
Refs: microsoft/terminal#410
Refs: #2308

PR-URL: #2381
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
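
The rate limit mentioned in the commit message above (no more than 30 resize checks per second) can be illustrated with a small throttle. This is only a sketch of the general technique, not the watcher thread from that commit: the type and function names are hypothetical, and the caller is assumed to supply a monotonic, non-decreasing timestamp in milliseconds.

```c
#include <stdint.h>

/* 34 ms between checks keeps the rate at or below 30 per second. */
#define RESIZE_MIN_INTERVAL_MS 34

/* Hypothetical throttle state. */
typedef struct {
    uint64_t last_allowed_ms;  /* time of the last check that was let through */
    int has_allowed;           /* 0 until the first check is let through */
} resize_throttle_t;

/* Returns 1 if a resize check may run at now_ms, 0 if it should be
 * skipped because the previous one ran too recently. */
int resize_throttle_allow(resize_throttle_t* t, uint64_t now_ms)
{
    if (t->has_allowed && now_ms - t->last_allowed_ms < RESIZE_MIN_INTERVAL_MS)
        return 0;
    t->has_allowed = 1;
    t->last_allowed_ms = now_ms;
    return 1;
}
```

A burst of hook callbacks then collapses into at most one size check per interval. A production version would also need to make sure the last event of a burst is not lost, for example by scheduling one deferred check.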