[elastic] Add Windows support for stdout/stderr redirects #176789
0xDELUXA wants to merge 1 commit into pytorch:main
Dr. CI (automatically generated, updates every 15 minutes): ❌ 6 New Failures, 4 Unrelated Failures as of commit c451210 with merge base bc65f64. Artifacts and rendered test results are at hud.pytorch.org/pr/176789. For BROKEN TRUNK jobs (failures already present on the merge base), rebase onto the `viable/strict` branch to avoid them. Some jobs are marked UNSTABLE, possibly due to flakiness on trunk.
c7332ec to cd41132
@pytorchbot label "topic: not user facing" "module: cuda" "module: windows" "module: rocm"
Bumping for visibility - CI is green and no conflicts. Would appreciate a review from any of the cc'd folks when you get a chance.
cd41132 to 28737bf
28737bf to c451210
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx950.1)
I'm not very familiar with PyTorch CI, but the failures seem unrelated. @jeffdaily, could you please confirm if this is a CI issue?
@pytorchbot merge -f "all change are inside if IS_WINDOWS, there is no way this broke macos or rocm CI; stable linter is green"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
## Summary
`redirects.py` previously disabled stdout/stderr redirection on Windows with a warning, affecting all Windows GPU users (both NVIDIA CUDA and AMD ROCm).
This PR adds a proper Windows implementation that works by performing a four-layer redirect:
1. `sys.stdout`/`sys.stderr` - rewired to a new TextIOWrapper so Python's `print()` writes to the destination file
2. CRT fd via `_dup2` - captures C-level writes through UCRT FILE* handles
3. Win32 `SetStdHandle` - captures native code using WriteFile/WriteConsole directly, including GPU runtime output
4. `fflush` before each switch - prevents lost output from CRT buffering
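
To illustrate why more than one layer is needed, here is a minimal, portable sketch. It is not the code from this PR: it assumes a POSIX-style `os.dup2`, and `redirect_fd` and `demo` are hypothetical names. Rebinding `sys.stdout` alone captures `print()`, while writes that go straight to file descriptor 1 need the fd-level layer.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_fd(fd, path):
    """Hypothetical helper: point descriptor `fd` at `path`, then restore it."""
    saved = os.dup(fd)  # keep a duplicate of the original target
    dst = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(dst, fd)  # fd now refers to the file
        yield
    finally:
        os.dup2(saved, fd)  # restore the original target
        os.close(saved)
        os.close(dst)


def demo(tmpdir):
    """Return what each layer captured, as (python_layer, fd_layer)."""
    py_log = os.path.join(tmpdir, "python_layer.log")
    fd_log = os.path.join(tmpdir, "fd_layer.log")

    # Python layer only: rebinding sys.stdout captures print().
    with open(py_log, "w") as f, contextlib.redirect_stdout(f):
        print("from print")

    # fd layer: raw writes to descriptor 1 land in the file.
    sys.stdout.flush()  # flush buffered Python output before switching
    with redirect_fd(1, fd_log):
        os.write(1, b"from os.write\n")

    with open(py_log) as a, open(fd_log) as b:
        return a.read(), b.read()
```

Calling `demo` with a scratch directory returns the text captured by each layer, which makes it easy to check that each write reached the file intended for it.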
`os.dup2` is intentionally not used on Windows as it silently corrupts file descriptors backed by console HANDLEs. The CRT's own `_dup`/`_dup2` from `ucrtbase` are used instead, which correctly handle the console-to-file transition.
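
A rough sketch of layers 2 to 4 using `ctypes` might look like the following. This is an illustration under stated assumptions, not the code merged in this PR: `win_redirect_fd` and `STD_HANDLE_IDS` are hypothetical names, error handling is minimal, and the `sys.stdout`/`sys.stderr` rewiring from layer 1 is left out.

```python
import contextlib
import sys

# (DWORD)-11 and (DWORD)-12: the documented Win32 IDs for STD_OUTPUT_HANDLE
# and STD_ERROR_HANDLE, keyed here by CRT file descriptor.
STD_HANDLE_IDS = {1: 0xFFFFFFF5, 2: 0xFFFFFFF4}


@contextlib.contextmanager
def win_redirect_fd(fd, path):
    """Hypothetical helper: redirect CRT fd 1 or 2 to `path` (Windows only)."""
    if sys.platform != "win32":
        raise OSError("win_redirect_fd is a Windows-only sketch")
    import ctypes
    import msvcrt
    from ctypes import wintypes

    ucrt = ctypes.CDLL("ucrtbase")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.SetStdHandle.argtypes = [wintypes.DWORD, wintypes.HANDLE]
    kernel32.SetStdHandle.restype = wintypes.BOOL

    std_id = STD_HANDLE_IDS[fd]
    ucrt.fflush(None)  # fflush(NULL): flush every open CRT output stream
    saved_fd = ucrt._dup(fd)  # CRT-level duplicate of the original target
    if saved_fd == -1:
        raise OSError("_dup failed")
    with open(path, "wb") as dst:
        try:
            ucrt._dup2(dst.fileno(), fd)  # CRT fd now refers to the file
            # Point the Win32 standard handle at the same file.
            kernel32.SetStdHandle(std_id, msvcrt.get_osfhandle(fd))
            yield
        finally:
            ucrt.fflush(None)  # flush before switching back
            ucrt._dup2(saved_fd, fd)  # restore the original target
            kernel32.SetStdHandle(std_id, msvcrt.get_osfhandle(fd))
            ucrt._close(saved_fd)
```

The sketch re-reads the OS handle from the fd after each `_dup2` rather than caching the old handle, on the assumption that `_dup2` may close the handle previously attached to the fd.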
The Linux code path is functionally unchanged.
## Tested on
- Windows 11, Python `3.12`, ROCm `7.12.0a`, PyTorch `2.12.0a0`, AMD Radeon RX 9060 XT
- Verified all four cases: Python print, C-level _write, stderr redirect, and stdout restore with the following test script:
<details>
<summary>test_redirect.py</summary>
```python
import ctypes
import sys

# Make redirects.py importable directly; replace the placeholder path as needed.
sys.path.insert(0, r"path/to/torch/distributed/elastic/multiprocessing")
from redirects import redirect_stdout, redirect_stderr

# Test 1: Python-level print() is captured.
with redirect_stdout("test_stdout.log"):
    print("hello from python stdout")
with open("test_stdout.log") as f:
    assert "hello from python stdout" in f.read()
print("Test 1 PASSED - Python print")

# Test 2: a C-level write to fd 1 through the UCRT is captured.
with redirect_stdout("test_cprintf.log"):
    ucrtbase = ctypes.CDLL("ucrtbase")
    msg = b"hello from C _write\n"
    ucrtbase._write(1, msg, len(msg))
with open("test_cprintf.log") as f:
    assert "hello from C _write" in f.read()
print("Test 2 PASSED - C-level _write")

# Test 3: stderr is captured.
with redirect_stderr("test_stderr.log"):
    print("hello from stderr", file=sys.stderr)
with open("test_stderr.log") as f:
    assert "hello from stderr" in f.read()
print("Test 3 PASSED - stderr")

# Test 4: once the block exits, print() reaches the console again.
with redirect_stdout("test_restore.log"):
    print("inside redirect")
print("Test 4 PASSED - stdout restored")
```
Output:
```
Test 1 PASSED - Python print
Test 2 PASSED - C-level _write
Test 3 PASSED - stderr
Test 4 PASSED - stdout restored
```
</details>
## Does this change break backward compatibility?
No. Linux behavior is identical to before. On Windows, redirects now work instead of being skipped with a warning.
## Minor housekeeping
Fixed a pre-existing typo in the Linux docstring (missing closing paren in the usage example) and normalized the `Usage:` block to `Usage::` per RST convention.
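
For context, a trailing `::` in reStructuredText marks the indented block that follows as a literal block. The docstring below is a hypothetical illustration of that layout; the function name and body are not copied from `redirects.py`.

```python
def redirect_example():
    """Illustrative stub showing the docstring layout only.

    Usage::

        with redirect_stdout("out.log"):
            print("captured")
    """
```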
Pull Request resolved: #176789
Approved by: https://github.com/jeffdaily
Co-authored-by: Xia-Weiwen <12522207+Xia-Weiwen@users.noreply.github.com>
cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang