[elastic] Add Windows support for stdout/stderr redirects#176789

Closed
0xDELUXA wants to merge 1 commit into pytorch:main from 0xDELUXA:fix/elastic-redirects-windows-support

Conversation

@0xDELUXA
Contributor

@0xDELUXA 0xDELUXA commented Mar 7, 2026

Summary

redirects.py previously disabled stdout/stderr redirection on Windows with a warning, affecting all Windows GPU users (both NVIDIA CUDA and AMD ROCm).

This PR adds a proper Windows implementation that works by performing a four-layer redirect:

  1. sys.stdout/sys.stderr - rewired to a new TextIOWrapper so Python's print() writes to the destination file
  2. CRT fd via _dup2 - captures C-level writes through UCRT FILE* handles
  3. Win32 SetStdHandle - captures native code using WriteFile/WriteConsole directly, including GPU runtime output
  4. fflush before each switch - prevents lost output from CRT buffering
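
The ordering of the four layers can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code added to redirects.py: the helper names `layered_redirect` and `_sync_win32_handle` are hypothetical, the Windows branch has not been validated against the PR, and on non-Windows platforms the sketch falls back to `os.dup`/`os.dup2` and skips layer 3 entirely.

```python
import contextlib
import ctypes
import os
import sys

IS_WINDOWS = sys.platform == "win32"

if IS_WINDOWS:
    import msvcrt

    _crt = ctypes.CDLL("ucrtbase")  # the Universal CRT named in the PR text
    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _kernel32.SetStdHandle.argtypes = [ctypes.c_ulong, ctypes.c_void_p]
    _kernel32.SetStdHandle.restype = ctypes.c_int
    _dup, _dup2, _close = _crt._dup, _crt._dup2, _crt._close
    # STD_OUTPUT_HANDLE is (DWORD)-11 and STD_ERROR_HANDLE is (DWORD)-12.
    _STD_HANDLE_ID = {1: 0xFFFFFFF5, 2: 0xFFFFFFF4}
else:
    _crt = ctypes.CDLL(None)  # symbols already loaded in the process, incl. libc
    _dup, _dup2, _close = os.dup, os.dup2, os.close


def _sync_win32_handle(fd):
    """Layer 3 (Windows only): make the Win32 std handle follow the CRT fd."""
    if IS_WINDOWS:
        _kernel32.SetStdHandle(_STD_HANDLE_ID[fd], msvcrt.get_osfhandle(fd))


@contextlib.contextmanager
def layered_redirect(std_name, path):
    """Redirect 'stdout' or 'stderr' to `path` (hypothetical helper)."""
    fd = {"stdout": 1, "stderr": 2}[std_name]
    original_stream = getattr(sys, std_name)

    # Layer 4: flush Python and CRT buffers before switching targets.
    original_stream.flush()
    _crt.fflush(None)

    saved_fd = _dup(fd)  # keep a descriptor that still points at the old target
    with open(path, "wb") as dst:
        _dup2(dst.fileno(), fd)  # layer 2: CRT-level descriptor
    _sync_win32_handle(fd)  # layer 3
    # Layer 1: a fresh text wrapper over the redirected descriptor.
    wrapper = open(fd, "w", encoding="utf-8", closefd=False)
    setattr(sys, std_name, wrapper)
    try:
        yield
    finally:
        wrapper.flush()
        _crt.fflush(None)  # layer 4 again, before switching back
        setattr(sys, std_name, original_stream)
        _dup2(saved_fd, fd)  # restore the original target
        _sync_win32_handle(fd)
        _close(saved_fd)
```

Flushing on both sides of each switch is what keeps buffered output from landing in the wrong destination; the rest is bookkeeping to restore the original descriptor on exit.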

os.dup2 is intentionally not used on Windows as it silently corrupts file descriptors backed by console HANDLEs. The CRT's own _dup/_dup2 from ucrtbase are used instead, which correctly handle the console-to-file transition.
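
To see which descriptors the caveat above applies to, it can help to check what a descriptor is actually backed by. The sketch below is an illustration only and is not part of the PR: `describe_fd` is a hypothetical helper, and it assumes the Win32 `GetFileType` call, which reports `FILE_TYPE_CHAR` (2) for console handles and `FILE_TYPE_DISK` (1) for regular files. On non-Windows platforms it reports only `os.isatty`.

```python
import os
import sys

# Win32 GetFileType return values.
FILE_TYPE_NAMES = {0: "unknown", 1: "disk", 2: "char", 3: "pipe"}


def describe_fd(fd):
    """Return a small dict describing what `fd` is backed by (hypothetical helper)."""
    info = {"fd": fd, "isatty": os.isatty(fd)}
    if sys.platform == "win32":
        import ctypes
        import msvcrt

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetFileType.argtypes = [ctypes.c_void_p]
        kernel32.GetFileType.restype = ctypes.c_ulong
        handle = msvcrt.get_osfhandle(fd)  # CRT descriptor -> Win32 HANDLE
        info["handle"] = handle
        info["file_type"] = FILE_TYPE_NAMES.get(
            kernel32.GetFileType(handle), "unknown"
        )
    return info
```

Calling `describe_fd(1)` in an interactive console versus under a file redirect shows whether the descriptor is in the console-backed case the PR describes.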

The Linux code path is functionally unchanged.

Tested on

  • Windows 11, Python 3.12, ROCm 7.12.0a, PyTorch 2.12.0a0, AMD Radeon RX 9060 XT
  • Verified all four cases: Python print, C-level _write, stderr redirect, and stdout restore with the following test script:
test_redirect.py
import ctypes
import sys
sys.path.insert(0, r"path/to/torch/distributed/elastic/multiprocessing")

from redirects import redirect_stdout, redirect_stderr

with redirect_stdout("test_stdout.log"):
    print("hello from python stdout")
with open("test_stdout.log") as f:
    assert "hello from python stdout" in f.read()
print("Test 1 PASSED - Python print")

with redirect_stdout("test_cprintf.log"):
    ucrtbase = ctypes.CDLL("ucrtbase")
    msg = b"hello from C _write\n"
    ucrtbase._write(1, msg, len(msg))
with open("test_cprintf.log") as f:
    assert "hello from C _write" in f.read()
print("Test 2 PASSED - C-level _write")

with redirect_stderr("test_stderr.log"):
    print("hello from stderr", file=sys.stderr)
with open("test_stderr.log") as f:
    assert "hello from stderr" in f.read()
print("Test 3 PASSED - stderr")

with redirect_stdout("test_restore.log"):
    print("inside redirect")
print("Test 4 PASSED - stdout restored")

Output:

Test 1 PASSED - Python print
Test 2 PASSED - C-level _write
Test 3 PASSED - stderr
Test 4 PASSED - stdout restored

Does this change break backward compatibility?

No. Linux behavior is identical to before. On Windows, redirects now work instead of being skipped with a warning.

Minor housekeeping

Fixed a pre-existing typo in the Linux docstring (missing closing paren in the usage example) and normalized the Usage: block to Usage:: per RST convention.
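
For readers unfamiliar with the RST convention mentioned above: a paragraph ending in `::` introduces a literal block, so the indented usage example renders as preformatted code. A minimal sketch, using a hypothetical function and docstring rather than the actual one from redirects.py:

```python
import inspect


def redirect_example(path):
    """Redirect output to ``path`` (hypothetical function for illustration).

    Usage::

        with redirect_example("/tmp/out.log"):
            print("captured")
    """
    raise NotImplementedError("docstring-only illustration")


# inspect.getdoc strips the common indentation, as documentation tools do.
doc = inspect.getdoc(redirect_example)
print(doc)
```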

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

@pytorch-bot

pytorch-bot Bot commented Mar 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176789

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 4 Unrelated Failures

As of commit c451210 with merge base bc65f64 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla Bot commented Mar 7, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: 0xDELUXA / name: DELUXA (c451210)

@0xDELUXA 0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch 5 times, most recently from c7332ec to cd41132, March 7, 2026 20:38
@0xDELUXA
Contributor Author

0xDELUXA commented Mar 8, 2026

@pytorchbot label "topic: not user facing" "module: cuda" "module: windows" "module: rocm"

@pytorch-bot pytorch-bot Bot added the module: cuda, module: rocm, module: windows, and topic: not user facing labels Mar 8, 2026
@0xDELUXA
Contributor Author

Bumping for visibility - CI is green and no conflicts. Would appreciate a review from any of the cc'd folks when you get a chance.

@jerryzh168 jerryzh168 requested a review from fduwjj March 19, 2026 19:53
@jerryzh168 jerryzh168 added the bot-triaged and triaged labels Mar 19, 2026
Comment thread torch/distributed/elastic/multiprocessing/redirects.py Outdated
@0xDELUXA 0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch from cd41132 to 28737bf, March 23, 2026 21:16
@0xDELUXA 0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch from 28737bf to c451210, March 23, 2026 21:34
@jeffdaily
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk label Mar 24, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1)

@0xDELUXA
Contributor Author

I'm not very familiar with PyTorch CI, but the failures seem unrelated. @jeffdaily, could you please confirm if this is a CI issue?

@jeffdaily
Collaborator

@pytorchbot merge -f "all change are inside if IS_WINDOWS, there is no way this broke macos or rocm CI; stable linter is green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Copilot AI pushed a commit that referenced this pull request Mar 27, 2026
Pull Request resolved: #176789
Approved by: https://github.com/jeffdaily

Co-authored-by: Xia-Weiwen <12522207+Xia-Weiwen@users.noreply.github.com>
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
Pull Request resolved: pytorch#176789
Approved by: https://github.com/jeffdaily
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Apr 2, 2026
Pull Request resolved: pytorch#176789
Approved by: https://github.com/jeffdaily
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
Pull Request resolved: pytorch#176789
Approved by: https://github.com/jeffdaily
Labels

bot-triaged, ciflow/trunk, Merged, module: cuda, module: rocm, module: windows, open source, release notes: distributed (torchelastic), topic: not user facing, triaged

5 participants