feat: add PyTorch profiler support to LAMMPS MD #4969
Conversation
Pull Request Overview
This PR adds PyTorch profiler support to the DeepPotPT backend to help diagnose performance issues in LAMMPS MD simulations. The profiler can be enabled via the DP_PROFILER environment variable and automatically saves profiling results to a JSON file.
Key changes include:
- Environment variable-based profiler activation with configurable output file naming (see the sketch after this list)
- Automatic profiler lifecycle management (enable on init, disable on destruction)
- GPU-aware profiling with device-specific output file naming
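
To make the filename rule concrete, here is a minimal, hedged sketch of the activation logic described above (the real code lives in `DeepPotPT::init`; `gpu_enabled` and `gpu_id` are the class's device state):

```cpp
// Minimal sketch: the DP_PROFILER filename rule described in this PR.
// Not the verbatim implementation; see DeepPotPT.cc for the real logic.
#include <cstdlib>
#include <string>

std::string profiler_output_name(bool gpu_enabled, int gpu_id) {
  const char* env = std::getenv("DP_PROFILER");
  if (env == nullptr || *env == '\0') {
    return "";  // variable unset or empty: profiler stays disabled
  }
  std::string name(env);
  if (gpu_enabled) {
    name += "_gpu" + std::to_string(gpu_id);  // device-specific suffix
  }
  return name + ".json";
}
```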
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| source/api_cc/src/DeepPotPT.cc | Implements profiler initialization logic, environment variable checking, and cleanup in destructor |
| source/api_cc/include/DeepPotPT.h | Adds profiler state tracking members to the DeepPotPT class (sketched below) |
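
For orientation, a hedged sketch of what those header members might look like (names taken from the PR description; the exact declarations may differ):

```cpp
// Sketch only: profiler state tracked by DeepPotPT per this PR's description.
#include <string>

class DeepPotPT {
  // ... existing public interface unchanged ...
 private:
  bool profiler_enabled = false;  // set in init() when DP_PROFILER is present
  std::string profiler_file;      // output trace path, e.g. "<env>_gpu0.json"
};
```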
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Added Kineto-based PyTorch C++ profiling to DeepPotPT (enabled via DP_PROFILER). Tracked profiler state and output filename in the class, enabled profiling at construction when requested, disabled the profiler and saved the JSON trace in the destructor, simplified GPU selection/device fallback logic, and documented DP_PROFILER in doc/env.md.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant CUDA as CUDA Runtime
    participant Prof as Kineto Profiler
    App->>DeepPotPT: construct()
    DeepPotPT->>CUDA: torch::cuda::is_available()
    CUDA-->>DeepPotPT: gpu_enabled (true/false)
    DeepPotPT->>DeepPotPT: gpu_id = (gpu_num>0) ? (gpu_rank%gpu_num) : 0
    alt DP_PROFILER set
        DeepPotPT->>Prof: prepareProfiler(cfg, activities)
        DeepPotPT->>Prof: enableProfiler(cfg, activities)
        Prof-->>DeepPotPT: enabled
        Note right of DeepPotPT: profiler_enabled = true<br/>profiler_file set (maybe _gpuN)
    else
        Note right of DeepPotPT: profiler not enabled
    end
    App-->>DeepPotPT: run inference / model calls
```
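
The GPU-selection step in the first diagram can be read as the following sketch (assuming libtorch's `torch::cuda` API; the actual member code in `DeepPotPT.cc` may differ):

```cpp
// Hedged sketch of the device selection/fallback shown in the diagram above.
#include <torch/torch.h>

torch::Device select_device(int gpu_rank) {
  const bool gpu_enabled = torch::cuda::is_available();
  const int gpu_num =
      gpu_enabled ? static_cast<int>(torch::cuda::device_count()) : 0;
  // Round-robin local ranks over visible GPUs; fall back to CPU if none.
  const int gpu_id = (gpu_num > 0) ? (gpu_rank % gpu_num) : 0;
  return gpu_enabled ? torch::Device(torch::kCUDA, gpu_id)
                     : torch::Device(torch::kCPU);
}
```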
```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant Prof as Kineto Profiler
    participant FS as Filesystem
    App->>DeepPotPT: destructor()
    alt profiler_enabled
        DeepPotPT->>Prof: disableProfiler()
        Prof-->>DeepPotPT: trace result
        DeepPotPT->>FS: write `profiler_file` (.json)
        FS-->>DeepPotPT: write complete
    else
        Note over DeepPotPT: No profiler active
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ✅ 5 passed
Codex Review: Here are some suggestions.
Actionable comments posted: 1
♻️ Duplicate comments (3)
source/api_cc/src/DeepPotPT.cc (3)
**5-5: Use public profiler header and add required standard headers**

Avoid internal headers; they change across PyTorch versions. Also add the missing standard headers used below.
Apply this diff around includes:
```diff
-#include <torch/csrc/autograd/profiler.h>
+#include <torch/autograd/profiler.h>  // public C++ API
+#include <set>         // std::set for activities
+#include <cstdlib>     // std::getenv
+#include <filesystem>  // path handling
+#ifdef _WIN32
+# include <process.h>
+# define getpid _getpid
+#else
+# include <unistd.h>  // getpid
+#endif
```

If you want to make the profiler code optional at build time, gate the include and related code with a CMake-defined macro (e.g., `DEEPMD_ENABLE_PYTORCH_PROFILER`). This aligns with the earlier nit about conditional includes.
```cpp
#ifdef DEEPMD_ENABLE_PYTORCH_PROFILER
#include <torch/autograd/profiler.h>
#endif
```

To confirm the correct public header for `RecordProfile` for your libtorch version, please check:
Which header exposes torch::autograd::profiler::RecordProfile in the PyTorch C++ API (v2.x)? Is <torch/autograd/profiler.h> the supported public include?
**87-118: Switch to RecordProfile (RAII), append PID to filename, and harden path handling**

This block uses `impl::` types and manual enable/disable, which is fragile across libtorch releases and deviates from the issue's ask to use `RecordProfile`. RAII also fixes leaks if `init()` throws after enabling. Additionally, append a PID suffix to avoid collisions across MPI ranks/processes and ensure parent dirs exist. Only add ".json" if missing.
Replace the whole block with:
```diff
-  // Configure PyTorch profiler
-  const char* env_profiler = std::getenv("DP_PROFILER");
-  if (env_profiler && *env_profiler) {
-    using torch::profiler::impl::ActivityType;
-    using torch::profiler::impl::ExperimentalConfig;
-    using torch::profiler::impl::ProfilerConfig;
-    using torch::profiler::impl::ProfilerState;
-    std::set<ActivityType> activities{ActivityType::CPU};
-    if (gpu_enabled) {
-      activities.insert(ActivityType::CUDA);
-    }
-    profiler_file = std::string(env_profiler);
-    if (gpu_enabled) {
-      profiler_file += "_gpu" + std::to_string(gpu_id);
-    }
-    profiler_file += ".json";
-    ExperimentalConfig exp_cfg;
-    ProfilerConfig cfg(ProfilerState::KINETO,
-                       false,  // report_input_shapes,
-                       false,  // profile_memory,
-                       true,   // with_stack,
-                       false,  // with_flops,
-                       true,   // with_modules,
-                       exp_cfg,
-                       std::string()  // trace_id
-    );
-    torch::autograd::profiler::prepareProfiler(cfg, activities);
-    torch::autograd::profiler::enableProfiler(cfg, activities);
-    std::cout << "PyTorch profiler enabled, output file: " << profiler_file
-              << std::endl;
-    profiler_enabled = true;
-  }
+  // Configure PyTorch profiler (RAII)
+  const char* env_profiler = std::getenv("DP_PROFILER");
+  if (env_profiler && *env_profiler) {
+    profiler_file = std::string(env_profiler);
+    // Avoid double ".json"
+    if (!std::filesystem::path(profiler_file).has_extension()) {
+      profiler_file += ".json";
+    }
+    // Add disambiguators: GPU id (if any) and PID
+    {
+      std::string stem = std::filesystem::path(profiler_file).stem().string();
+      std::string ext = std::filesystem::path(profiler_file).extension().string();
+      if (gpu_enabled) stem += "_gpu" + std::to_string(gpu_id);
+      stem += "_pid" + std::to_string(static_cast<long long>(getpid()));
+      profiler_file = (std::filesystem::path(profiler_file).parent_path() /
+                       (stem + ext)).string();
+    }
+    // Best-effort create parent dirs
+    std::error_code ec;
+    auto parent = std::filesystem::path(profiler_file).parent_path();
+    if (!parent.empty()) std::filesystem::create_directories(parent, ec);
+    // Optional: very light path sanity to avoid surprises
+    // (keeps absolute paths, but normalizes .. segments)
+    profiler_file = std::filesystem::weakly_canonical(
+        std::filesystem::path(profiler_file), ec).string();
+    // Engage RAII guard (saves to profiler_file on destruction)
+    profiler_guard = std::make_unique<torch::autograd::profiler::RecordProfile>(profiler_file);
+    std::cout << "PyTorch profiler enabled, output file: " << profiler_file << std::endl;
+  }
```

If you still need advanced knobs (shapes, modules), consider verifying the supported C++ API for your libtorch version and toggling them behind macros; avoid `torch::profiler::impl` where possible.
PyTorch C++ profiler: how to enable Kineto tracing with `RecordProfile` (RAII) and configure stack/modules in v2.x without touching `torch::profiler::impl`?

Note: Building on a prior comment about path safety, normalizing paths and restricting them to intended directories mitigates accidental directory traversal from env input.
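
As a sketch of that path-safety note (a hypothetical helper, not part of this PR), a containment check with C++17 `std::filesystem` might look like:

```cpp
// Hypothetical helper: true iff `candidate` resolves inside `base`.
// Both paths are normalized first so ".." segments cannot escape `base`.
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

bool path_within(const fs::path& base, const fs::path& candidate) {
  std::error_code ec;
  const fs::path b = fs::weakly_canonical(base, ec);
  const fs::path c = fs::weakly_canonical(candidate, ec);
  // Check that every component of the base is a prefix of the candidate.
  auto cit = c.begin();
  for (auto bit = b.begin(); bit != b.end(); ++bit, ++cit) {
    if (cit == c.end() || *cit != *bit) return false;
  }
  return true;
}
```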
**153-162: Destructor must not throw; prefer relying on RAII or, at minimum, wrap in a noexcept-safe try/catch**

As previously noted, profiler operations can throw; throwing from a destructor during stack unwinding calls `std::terminate`. If you adopt `RecordProfile` above, you can delete this block entirely; otherwise, wrap it.
Option A (preferred, with RAII): delete the body.
```diff
-DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
-}
+DeepPotPT::~DeepPotPT() = default;
```
Option B (keep manual control): add a hard try/catch and silence all exceptions.

```diff
 DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
+  try {
+    if (profiler_enabled) {
+      auto result = torch::autograd::profiler::disableProfiler();
+      if (result) result->save(profiler_file);
+      std::cout << "PyTorch profiler result saved to " << profiler_file << std::endl;
+    }
+  } catch (const std::exception& e) {
+    std::cerr << "Profiler teardown error: " << e.what() << std::endl;
+  } catch (...) {
+    std::cerr << "Profiler teardown: unknown error" << std::endl;
+  }
 }
```
🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPT.cc (1)
**73-76: Construct the device after the capability check to avoid kCUDA construction when CUDA is unavailable**

Minor cleanup; avoids creating a CUDA device object when not needed.
```diff
-  torch::Device device(torch::kCUDA, gpu_id);
-  if (!gpu_enabled) {
-    device = torch::Device(torch::kCPU);
+  torch::Device device = gpu_enabled ? torch::Device(torch::kCUDA, gpu_id)
+                                     : torch::Device(torch::kCPU);
+  if (!gpu_enabled) {
     std::cout << "load model from: " << model << " to cpu " << std::endl;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- source/api_cc/include/DeepPotPT.h (1 hunks)
- source/api_cc/src/DeepPotPT.cc (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/api_cc/src/DeepPotPT.cc (1)
source/api_cc/include/DeepPotPT.h (1)
`DeepPotPT` (13-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff            @@
##            devel    #4969     +/-  ##
=========================================
- Coverage   84.29%   84.20%   -0.10%
=========================================
  Files         704      705       +1
  Lines       68907    69228     +321
  Branches     3572     3576       +4
=========================================
+ Hits        58085    58291     +206
- Misses       9681     9798     +117
+ Partials     1141     1139       -2
```

☔ View full report in Codecov by Sentry.
njzjz left a comment:
Documentation should be added to env.md.
@codex address this feedback.
To use Codex here, create an environment for this repo.
Co-authored-by: Jinzhe Zeng <njzjz@qq.com>
Signed-off-by: Chun Cai <amoycaic@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (5)
doc/env.md (5)
**93-100: Clarify scope and path semantics of DP_PROFILER.**

Add two short notes so users know where the file lands and which backend this affects.
Apply this diff:
```diff
 ::: {envvar} DP_PROFILER
@@
-{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend.
+{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend (DeepPotPT only).
@@
-**Type**: string (output file stem)
+**Type**: string (output file stem or path prefix)
@@
-**Default**: unset (disabled)
+**Default**: unset (disabled)
+
+If a relative path is given, the trace is written relative to the current working directory of the process; absolute paths are supported.
```
**101-105: Prevent filename collisions; define GPU ID semantics.**

On CPU multi-process runs all ranks would target the same `<ENV_VALUE>.json`. Also clarify that the GPU ID is the local CUDA ordinal after `CUDA_VISIBLE_DEVICES`.

Apply this diff:
```diff
 The final file name is constructed as:
-- `<ENV_VALUE>_gpu<ID>.json` if running on GPU
-- `<ENV_VALUE>.json` if running on CPU
+- `<ENV_VALUE>_gpu<ID>.json` if running on GPU (where `<ID>` is the local CUDA ordinal after `CUDA_VISIBLE_DEVICES`)
+- `<ENV_VALUE>.json` if running on CPU
+
+Note: On CPU multi-process runs, all ranks would otherwise write the same filename. Ensure each process uses a unique stem (or path) to avoid clobbering.
```
**106-110: Add a link to PyTorch profiler docs.**

A small cross-reference helps readers interpret traces.
Apply this diff:
```diff
-The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing). It includes:
+The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing) and tooling in [PyTorch profiler docs](https://pytorch.org/docs/stable/profiler.html). It includes:
```
**113-117: Show a per-rank filename example (CPU and GPU).**

Provide a concrete pattern to avoid clobbering in multi-process runs.
Apply this diff:
````diff
 ```bash
 export DP_PROFILER=result
 mpirun -np 4 lmp -in in.lammps
 # Produces result_gpuX.json, where X is the GPU id used by each MPI rank.
+
+# Example (OpenMPI) to ensure unique files per rank:
+# GPU:
+#   export DP_PROFILER="result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
+#   # Produces result_rank0_gpuX.json, result_rank1_gpuY.json, ...
+# CPU:
+#   export DP_PROFILER="cpu_result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
+#   # Produces cpu_result_rank0.json, cpu_result_rank1.json, ...
 ```
````

---

**119-123: Tighten wording; add an explicit CPU multi-process limitation.**

Minor grammar fix; making the limitation explicit avoids surprises.

Apply this diff:

```diff
-Tips:
+Tips:
@@
-- Large runs can generate sizable JSON files; consider limiting numbers of MD steps, like 20.
-- Currently this feature only supports single process, or multi-process runs where each process uses a distinct GPU on the same node.
+- Large runs can generate sizable JSON files; consider limiting the number of MD steps (e.g., ~20).
+- This feature currently supports: (a) single-process runs; or (b) multi-process runs where each process uses a distinct GPU on the same node. CPU multi-process runs require unique `DP_PROFILER` stems per process to avoid file overwrites.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- doc/env.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (2)
doc/env.md (2)
**86-92: DP_PLUGIN_PATH block looks consistent now.**

The layout matches other envvar blocks. No further action needed.
**93-117: Overall: nice addition and matches the implementation.**

Content is accurate and aligned with the C++ changes. With the above clarifications, users should avoid file clobbering and understand where traces are written.
This pull request adds support for enabling the PyTorch profiler in the `DeepPotPT` backend via an environment variable, making it easier to profile and debug performance issues. The profiler can be activated by setting the `DP_PROFILER` environment variable, and will save profiling results to a JSON file when the object is destroyed.

Fixes #4431

**Profiler integration:**

- Added a `profiler_enabled` flag and `profiler_file` string to the `DeepPotPT` class to track profiler state and output location (`DeepPotPT.h`).
- In the `init` method, added logic to check for the `DP_PROFILER` environment variable. If set, the PyTorch profiler is configured and enabled, with output written to a file named according to the environment variable and GPU ID (`DeepPotPT.cc`).
- On destruction of `DeepPotPT`, if profiling was enabled, the profiler is properly disabled and results are saved to the specified file (`DeepPotPT.cc`).

**Summary by CodeRabbit**

- **New Features**
  - Optional runtime profiling via the DP_PROFILER environment variable. Produces per-process JSON traces (CPU-only or CPU+GPU) and saves them automatically on shutdown; console messages report profiler status and output filename.
- **Refactor**
  - Improved device selection: auto-detects CUDA, selects a GPU per process when available, and falls back to CPU; clearer startup diagnostics with no public API changes.
- **Documentation**
  - Added DP_PROFILER usage, filename rules, examples, and operational tips to the C++ interface docs.

Signed-off-by: Chun Cai <amoycaic@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <njzjz@qq.com>