Skip to content

Conversation

@OutisLi
Copy link
Collaborator

@OutisLi OutisLi commented Oct 20, 2025

Summary by CodeRabbit

  • Chores
    • Enhanced platform-specific multiprocessing initialization with improved logging for better diagnostics.
    • Streamlined module logging setup to reduce code redundancy.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings October 20, 2025 11:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a compatibility issue with Python 3.14, which defaults to the 'forkserver' multiprocessing start method instead of 'fork'. The change explicitly sets the multiprocessing start method to 'fork' at the application entry point to maintain consistent behavior across Python versions.

Key Changes:

  • Added multiprocessing start method configuration in the main function to explicitly set it to 'fork'
  • Included error handling with logging for both successful configuration and failures

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 20, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

The PR adds platform-aware multiprocessing initialization to deepmd/pt/utils/env.py. On non-Windows systems, it attempts to set the multiprocessing start method to "fork" with logging for success/failure. On Windows, it logs a debug message. It also removes redundant logger initialization.

Changes

Cohort / File(s) Summary
Multiprocessing start method initialization
deepmd/pt/utils/env.py
Added sys import and module-level logging. Implemented platform-specific multiprocessing start method configuration: attempts to force "fork" method on non-Windows platforms with appropriate logging; skips fork setup on Windows with debug log. Removed redundant logger reinitialization from conditional branch while preserving existing NUM_WORKERS handling logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

  • Verify that force=True on multiprocessing.set_start_method() doesn't introduce unexpected side effects on different non-Windows platforms (e.g., macOS, Linux variants)
  • Ensure logging levels and messages align with the codebase conventions and are useful for debugging multiprocessing issues
  • Validate interaction between the new start method initialization and the existing NUM_WORKERS conditional logic to confirm no unintended behavioral changes

Possibly related PRs

  • deepmodeling/deepmd-kit#4784: Modifies the same file (deepmd/pt/utils/env.py) to handle multiprocessing start methods—that PR disables workers when start method isn't "fork", while this PR instead attempts to force the "fork" method with logging, addressing the same concern through different strategies.

Suggested reviewers

  • njzjz

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The PR title directly addresses the main change: setting multiprocessing start method to 'fork' for Python 3.14 compatibility, which aligns with the file changes and objectives.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
deepmd/main.py (1)

992-993: Past review suggestions not addressed.

Previous reviews suggested:

  1. Line 992: Change logging.info() to logging.debug() since successful initialization typically doesn't need info-level logging
  2. Line 993: Catch specific exceptions (RuntimeError, ValueError) instead of broad Exception

These suggestions remain valid. The static analysis tool also flags the broad exception catching.

As per coding guidelines, please run ruff check . before committing to address the BLE001 warning.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9764f8 and 27e58b3.

📒 Files selected for processing (1)
  • deepmd/main.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Always run ruff check . and ruff format . before committing changes to Python code

Files:

  • deepmd/main.py
🪛 Ruff (0.14.0)
deepmd/main.py

993-993: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (1)
deepmd/main.py (1)

987-994: Logging occurs before CLI configuration is applied.

This code logs messages before parse_args() (line 996) processes the --log-level and --log-path arguments. The log messages will use Python's default logging configuration, so user-specified log levels and paths won't apply to these early messages.

This is acceptable for early initialization messages, but be aware that users won't be able to control these logs via CLI arguments.

@codecov
Copy link

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.33%. Comparing base (1ee33c8) to head (c863724).
⚠️ Report is 1 commits behind head on devel.

Files with missing lines Patch % Lines
deepmd/pt/utils/env.py 66.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #5019      +/-   ##
==========================================
- Coverage   84.33%   84.33%   -0.01%     
==========================================
  Files         709      709              
  Lines       70435    70443       +8     
  Branches     3618     3619       +1     
==========================================
+ Hits        59402    59407       +5     
- Misses       9867     9869       +2     
- Partials     1166     1167       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this PR?

@OutisLi OutisLi changed the title fix: set multiprocessing start method to 'fork' in main function(python3.14 defaults to forkserver) fix: set multiprocessing start method to 'fork' in pt env (since python3.14 defaults to forkserver) Nov 27, 2025
@OutisLi OutisLi requested a review from Copilot November 27, 2025 05:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@OutisLi OutisLi requested a review from njzjz November 27, 2025 06:24
Copy link
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure forkserver is not supported by PyTorch?

@OutisLi
Copy link
Collaborator Author

OutisLi commented Nov 27, 2025

Are you sure forkserver is not supported by PyTorch?

It is supported by pytorch, but num_worker can only be 0 in this case. And python 3.14 defaults to forkserver. @caic99

@njzjz
Copy link
Member

njzjz commented Nov 27, 2025

Are you sure forkserver is not supported by PyTorch?

It is supported by pytorch, but num_worker can only be 0 in this case. And python 3.14 defaults to forkserver. @caic99

I know @caic99 made a commit that sets num_worker to 0 for forkserver, but this may not be correct. We need to figure out if pytorch supports forkserver.

@OutisLi
Copy link
Collaborator Author

OutisLi commented Nov 28, 2025

Are you sure forkserver is not supported by PyTorch?

It is supported by pytorch, but num_worker can only be 0 in this case. And python 3.14 defaults to forkserver. @caic99

I know @caic99 made a commit that sets num_worker to 0 for forkserver, but this may not be correct. We need to figure out if pytorch supports forkserver.

Okay, I will test that later

@OutisLi
Copy link
Collaborator Author

OutisLi commented Nov 30, 2025

Are you sure forkserver is not supported by PyTorch?

It is supported by pytorch, but num_worker can only be 0 in this case. And python 3.14 defaults to forkserver. @caic99

I know @caic99 made a commit that sets num_worker to 0 for forkserver, but this may not be correct. We need to figure out if pytorch supports forkserver.

I tried using default settings in python3.14, and remove limit of NUM_WORKERS. As a result, an error occurred:

Traceback (most recent call last):
  File "/home/outisli/miniforge3/envs/dpmd/bin/dp", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/outisli/Software/deepmd-kit/deepmd/main.py", line 1025, in main
    deepmd_main(args)
    ~~~~~~~~~~~^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/entrypoints/main.py", line 541, in main
    train(
    ~~~~~^
        input_file=FLAGS.INPUT,
        ^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        output=FLAGS.output,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/entrypoints/main.py", line 372, in train
    trainer.run()
    ~~~~~~~~~~~^^
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/train/training.py", line 1137, in run
    step(step_id)
    ~~~~^^^^^^^^^
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/train/training.py", line 764, in step
    input_dict, label_dict, log_dict = self.get_data(
                                       ~~~~~~~~~~~~~^
        is_train=True, task_key=task_key
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/train/training.py", line 1246, in get_data
    batch_data = next(iterator)
  File "/home/outisli/Software/deepmd-kit/deepmd/pt/train/training.py", line 186, in cycle_iterator
    it = iter(iterable)
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/site-packages/torch/utils/data/dataloader.py", line 494, in __iter__
    return self._get_iterator()
           ~~~~~~~~~~~~~~~~~~^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/site-packages/torch/utils/data/dataloader.py", line 427, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/site-packages/torch/utils/data/dataloader.py", line 1170, in __init__
    w.start()
    ~~~~~~~^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ~~~~~~~~~~~^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
    ~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/home/outisli/miniforge3/envs/dpmd/lib/python3.14/site-packages/torch/utils/data/dataloader.py", line 761, in __getstate__
    raise NotImplementedError("{} cannot be pickled", self.__class__.__name__)
NotImplementedError: ('{} cannot be pickled', '_SingleProcessDataLoaderIter')
when serializing list item 0
when serializing dict item 'iters'
when serializing deepmd.pt.utils.dataloader.DpLoaderSet state
when serializing deepmd.pt.utils.dataloader.DpLoaderSet object
when serializing tuple item 1
when serializing dict item '_args'
when serializing multiprocessing.context.Process state
when serializing multiprocessing.context.Process object

@njzjz njzjz added this pull request to the merge queue Nov 30, 2025
Merged via the queue into deepmodeling:devel with commit 34f03fc Nov 30, 2025
66 checks passed
@OutisLi OutisLi deleted the pr/multi_fork branch December 2, 2025 08:59
ChiahsinChu pushed a commit to ChiahsinChu/deepmd-kit that referenced this pull request Dec 17, 2025
…on3.14 defaults to forkserver) (deepmodeling#5019)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Enhanced platform-specific multiprocessing initialization with
improved logging for better diagnostics.
  * Streamlined module logging setup to reduce code redundancy.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants