[elastic] Add Windows support for stdout/stderr redirects by 0xDELUXA · Pull Request #176789 · pytorch/pytorch

0xDELUXA · 2026-03-07T10:15:25Z

Summary

redirects.py previously disabled stdout/stderr redirection on Windows with a warning, affecting all Windows GPU users (both NVIDIA CUDA and AMD ROCm).

This PR adds a proper Windows implementation that works by performing a four-layer redirect:

1. sys.stdout/sys.stderr - rewired to a new TextIOWrapper so Python's print() writes to the destination file
2. CRT fd via _dup2 - captures C-level writes through UCRT FILE* handles
3. Win32 SetStdHandle - captures native code using WriteFile/WriteConsole directly, including GPU runtime output
4. fflush before each switch - prevents lost output from CRT buffering

os.dup2 is intentionally not used on Windows as it silently corrupts file descriptors backed by console HANDLEs. The CRT's own _dup/_dup2 from ucrtbase are used instead, which correctly handle the console-to-file transition.
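
A minimal sketch of how the layers described above could fit together. It is not the code merged in this PR: `redirect_sketch`, `_switch_fd`, and the module-level names are hypothetical. The Windows branch is an assumption about how the `ucrtbase` functions (`_dup`, `_dup2`, `_close`, `fflush`) and `SetStdHandle` might be combined, and it has not been run on Windows. The non-Windows branch falls back to `os.dup`/`os.dup2` so the example also runs on Linux.

```python
import contextlib
import os
import sys
import tempfile

IS_WINDOWS = sys.platform == "win32"

if IS_WINDOWS:
    import ctypes
    import msvcrt

    _crt = ctypes.CDLL("ucrtbase")
    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _kernel32.SetStdHandle.argtypes = [ctypes.c_ulong, ctypes.c_void_p]
    _kernel32.SetStdHandle.restype = ctypes.c_int
    # Win32 STD_OUTPUT_HANDLE (-11) and STD_ERROR_HANDLE (-12), expressed as DWORDs.
    _STD_HANDLE_IDS = {1: -11 & 0xFFFFFFFF, 2: -12 & 0xFFFFFFFF}
    # Per the PR description, the CRT's own functions are used instead of os.dup2.
    _dup, _dup2, _close = _crt._dup, _crt._dup2, _crt._close
else:
    _dup, _dup2, _close = os.dup, os.dup2, os.close


def _switch_fd(std: str, fd: int, src_fd: int) -> None:
    """Flush pending output, then make `fd` refer to whatever `src_fd` refers to."""
    getattr(sys, std).flush()  # Python-level buffer
    if IS_WINDOWS:
        _crt.fflush(None)  # flush every CRT FILE* stream before switching
    _dup2(src_fd, fd)  # CRT file descriptor layer
    if IS_WINDOWS:
        # Win32 layer: code that calls GetStdHandle() now sees the new HANDLE.
        _kernel32.SetStdHandle(_STD_HANDLE_IDS[fd], msvcrt.get_osfhandle(fd))


@contextlib.contextmanager
def redirect_sketch(std: str, to_file: str):
    """Send sys.<std> and its file descriptor to `to_file`, then restore both."""
    fd = {"stdout": 1, "stderr": 2}[std]
    original_stream = getattr(sys, std)
    saved_fd = _dup(fd)  # keeps the original destination alive
    try:
        with open(to_file, "wb") as dst:
            _switch_fd(std, fd, dst.fileno())
        # Python layer: a fresh text wrapper over the redirected descriptor.
        wrapper = open(fd, "w", encoding="utf-8", closefd=False)
        setattr(sys, std, wrapper)
        try:
            yield
        finally:
            wrapper.flush()
            setattr(sys, std, original_stream)
            wrapper.close()  # closefd=False, so the descriptor itself stays open
            _switch_fd(std, fd, saved_fd)
    finally:
        _close(saved_fd)


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "stdout.log")
    with redirect_sketch("stdout", log):
        print("from print()")
        os.write(1, b"from a raw fd write\n")
    with open(log) as f:
        print(f.read())
```

The flush happens before every descriptor switch, both on entry and on exit, so buffered output ends up on the side of the switch where it was written.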

The Linux code path is functionally unchanged.

Tested on

Windows 11, Python 3.12, ROCm 7.12.0a, PyTorch 2.12.0a0, AMD Radeon RX 9060 XT
Verified all four cases: Python print, C-level _write, stderr redirect, and stdout restore with the following test script:

test_redirect.py

import ctypes
import sys

# Make redirects.py importable directly; replace the placeholder path with a local checkout.
sys.path.insert(0, r"path/to/torch/distributed/elastic/multiprocessing")

from redirects import redirect_stdout, redirect_stderr

# Test 1: Python-level print() lands in the log file.
with redirect_stdout("test_stdout.log"):
    print("hello from python stdout")
with open("test_stdout.log") as f:
    assert "hello from python stdout" in f.read()
print("Test 1 PASSED - Python print")

# Test 2: a C-level write to fd 1 through the UCRT is captured as well.
with redirect_stdout("test_cprintf.log"):
    ucrtbase = ctypes.CDLL("ucrtbase")
    msg = b"hello from C _write\n"
    ucrtbase._write(1, msg, len(msg))
with open("test_cprintf.log") as f:
    assert "hello from C _write" in f.read()
print("Test 2 PASSED - C-level _write")

# Test 3: stderr is redirected the same way.
with redirect_stderr("test_stderr.log"):
    print("hello from stderr", file=sys.stderr)
with open("test_stderr.log") as f:
    assert "hello from stderr" in f.read()
print("Test 3 PASSED - stderr")

# Test 4: once the block exits, stdout is restored, so the final message reaches the console.
with redirect_stdout("test_restore.log"):
    print("inside redirect")
print("Test 4 PASSED - stdout restored")

Output:

Test 1 PASSED - Python print
Test 2 PASSED - C-level _write
Test 3 PASSED - stderr
Test 4 PASSED - stdout restored

Does this change break backward compatibility?

No. Linux behavior is identical to before. On Windows, redirects now work instead of being skipped with a warning.

Minor housekeeping

Fixed a pre-existing typo in the Linux docstring (missing closing paren in the usage example) and normalized the Usage: block to Usage:: per RST convention.
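
To illustrate the `Usage::` convention: in reStructuredText, a paragraph ending in `::` introduces the indented block that follows as a literal block, so the example renders as code. The sketch below uses a hypothetical no-op context manager, `redirect_example`. It is not the docstring from redirects.py.

```python
import contextlib
import inspect


@contextlib.contextmanager
def redirect_example(to_file: str):
    """Hypothetical no-op context manager; only the docstring layout matters here.

    Usage::

        with redirect_example("out.log"):
            print("not actually captured; this is a no-op")
    """
    yield to_file


# inspect.getdoc() strips the common indentation, leaving the literal block indented.
print(inspect.getdoc(redirect_example))
```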

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

pytorch-bot · 2026-03-07T10:15:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176789

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 4 Unrelated Failures

As of commit c451210 with merge base bc65f64:

NEW FAILURES - The following jobs have failed:

trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1) (gh)
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper
trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable) (gh)
Build left local git repository checkout dirty
trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable) (gh)
export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_opaque_obj_training_ir_to_decomp_nonstrict
trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14) (gh)
Build left local git repository checkout dirty
trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m2-15) (gh)
Build left local git repository checkout dirty
trunk / macos-py3-arm64 / test (openreg, 1, 1, macos-m1-stable) (gh)
Build left local git repository checkout dirty

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx950.1) (gh) (trunk failure)
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_varlen_attn_paged_kv_cache_cuda
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable) (gh) (trunk failure)
export/test_retraceability.py::RetraceExportNonStrictTestExport::test_opaque_obj_retraceability_nonstrict

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

Lint OSDC (unstable) / lintrunner-noclang-partial / lint (gh)
Executing the custom container implementation failed. Please contact your self hosted runner administrator.
Lint OSDC (unstable) / lintrunner-pyrefly-partial / lint (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-03-07T10:15:30Z

The committers listed above are authorized under a signed CLA.

✅ login: 0xDELUXA / name: DELUXA (c451210)

0xDELUXA · 2026-03-08T12:57:11Z

@pytorchbot label "topic: not user facing" "module: cuda" "module: windows" "module: rocm"

0xDELUXA · 2026-03-12T14:32:04Z

Bumping for visibility - CI is green and no conflicts. Would appreciate a review from any of the cc'd folks when you get a chance.

jeffdaily · 2026-03-24T17:43:56Z

@pytorchbot merge

pytorchmergebot · 2026-03-24T17:46:14Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2026-03-24T18:28:48Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1)

Details for Dev Infra team

Raised by workflow job

0xDELUXA · 2026-03-24T21:04:48Z

I'm not very familiar with PyTorch CI, but the failures seem unrelated. @jeffdaily, could you please confirm if this is a CI issue?

jeffdaily · 2026-03-24T22:42:31Z

@pytorchbot merge -f "all change are inside if IS_WINDOWS, there is no way this broke macos or rocm CI; stable linter is green"

pytorchmergebot · 2026-03-24T22:44:35Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Pull Request resolved: #176789
Approved by: https://github.com/jeffdaily
Co-authored-by: Xia-Weiwen <12522207+Xia-Weiwen@users.noreply.github.com>


pytorch-bot Bot added the release notes: distributed (torchelastic) label Mar 7, 2026

pytorchbot added the open source label Mar 7, 2026

0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch 5 times, most recently from c7332ec to cd41132, on March 7, 2026 20:38

pytorch-bot Bot added the module: cuda (Related to torch.cuda, and CUDA support in general), module: rocm (AMD GPU support for Pytorch), module: windows (Windows support for PyTorch), and topic: not user facing (topic category) labels Mar 8, 2026

jerryzh168 requested a review from fduwjj March 19, 2026 19:53

jerryzh168 added the bot-triaged (This is a label only to be used by the auto triage bot) and triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module) labels Mar 19, 2026

jeffdaily reviewed Mar 23, 2026

Comment thread torch/distributed/elastic/multiprocessing/redirects.py Outdated

0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch from cd41132 to 28737bf on March 23, 2026 21:16

[elastic] Add Windows support for stdout/stderr redirects

c451210

0xDELUXA force-pushed the fix/elastic-redirects-windows-support branch from 28737bf to c451210 on March 23, 2026 21:34

jeffdaily approved these changes Mar 24, 2026

pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 24, 2026

pytorchmergebot added the merging label Mar 24, 2026

pytorchmergebot removed the merging label Mar 24, 2026

pytorchmergebot added the merging label Mar 24, 2026

pytorchmergebot closed this in 71bf2c3 Mar 24, 2026

pytorchmergebot added Merged and removed merging labels Mar 24, 2026

0xDELUXA wants to merge 1 commit into pytorch:main from 0xDELUXA:fix/elastic-redirects-windows-support
