feat: add PyTorch profiler support to LAMMPS MD #4969
Conversation
Pull Request Overview
This PR adds PyTorch profiler support to the DeepPotPT backend to help diagnose performance issues in LAMMPS MD simulations. The profiler can be enabled via the DP_PROFILER environment variable and automatically saves profiling results to a JSON file.
Key changes include:
- Environment variable-based profiler activation with configurable output file naming (see the sketch after this list)
- Automatic profiler lifecycle management (enable on init, disable on destruction)
- GPU-aware profiling with device-specific output file naming
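
To make the filename rule concrete, here is a minimal, hedged sketch of the activation logic described above (the real code lives in `DeepPotPT::init`; `gpu_enabled` and `gpu_id` are the class's device state):

```cpp
// Minimal sketch: the DP_PROFILER filename rule described in this PR.
// Not the verbatim implementation; see DeepPotPT.cc for the real logic.
#include <cstdlib>
#include <string>

std::string profiler_output_name(bool gpu_enabled, int gpu_id) {
  const char* env = std::getenv("DP_PROFILER");
  if (env == nullptr || *env == '\0') {
    return "";  // variable unset or empty: profiler stays disabled
  }
  std::string name(env);
  if (gpu_enabled) {
    name += "_gpu" + std::to_string(gpu_id);  // device-specific suffix
  }
  return name + ".json";
}
```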
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| source/api_cc/src/DeepPotPT.cc | Implements profiler initialization logic, environment variable checking, and cleanup in destructor |
| source/api_cc/include/DeepPotPT.h | Adds profiler state tracking members to the DeepPotPT class (sketched below) |
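
For orientation, a hedged sketch of what those header members might look like (names taken from the PR description; the exact declarations may differ):

```cpp
// Sketch only: profiler state tracked by DeepPotPT per this PR's description.
#include <string>

class DeepPotPT {
  // ... existing public interface unchanged ...
 private:
  bool profiler_enabled = false;  // set in init() when DP_PROFILER is present
  std::string profiler_file;      // output trace path, e.g. "<env>_gpu0.json"
};
```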
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Added Kineto-based PyTorch C++ profiling to DeepPotPT (enabled via DP_PROFILER). Tracked profiler state and output filename in the class, enabled profiling at construction when requested, disabled the profiler and saved the JSON trace in the destructor, simplified GPU selection/device fallback logic, and documented DP_PROFILER in doc/env.md.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant CUDA as CUDA Runtime
    participant Prof as Kineto Profiler
    App->>DeepPotPT: construct()
    DeepPotPT->>CUDA: torch::cuda::is_available()
    CUDA-->>DeepPotPT: gpu_enabled (true/false)
    DeepPotPT->>DeepPotPT: gpu_id = (gpu_num>0) ? (gpu_rank%gpu_num) : 0
    alt DP_PROFILER set
        DeepPotPT->>Prof: prepareProfiler(cfg, activities)
        DeepPotPT->>Prof: enableProfiler(cfg, activities)
        Prof-->>DeepPotPT: enabled
        Note right of DeepPotPT: profiler_enabled = true<br/>profiler_file set (maybe _gpuN)
    else
        Note right of DeepPotPT: profiler not enabled
    end
    App-->>DeepPotPT: run inference / model calls
```
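
The GPU-selection step in the first diagram can be read as the following sketch (assuming libtorch's `torch::cuda` API; the actual member code in `DeepPotPT.cc` may differ):

```cpp
// Hedged sketch of the device selection/fallback shown in the diagram above.
#include <torch/torch.h>

torch::Device select_device(int gpu_rank) {
  const bool gpu_enabled = torch::cuda::is_available();
  const int gpu_num =
      gpu_enabled ? static_cast<int>(torch::cuda::device_count()) : 0;
  // Round-robin local ranks over visible GPUs; fall back to CPU if none.
  const int gpu_id = (gpu_num > 0) ? (gpu_rank % gpu_num) : 0;
  return gpu_enabled ? torch::Device(torch::kCUDA, gpu_id)
                     : torch::Device(torch::kCPU);
}
```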
```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant Prof as Kineto Profiler
    participant FS as Filesystem
    App->>DeepPotPT: destructor()
    alt profiler_enabled
        DeepPotPT->>Prof: disableProfiler()
        Prof-->>DeepPotPT: trace result
        DeepPotPT->>FS: write `profiler_file` (.json)
        FS-->>DeepPotPT: write complete
    else
        Note over DeepPotPT: No profiler active
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ✅ 5 passed
Codex Review: Here are some suggestions.
Actionable comments posted: 1
♻️ Duplicate comments (3)
source/api_cc/src/DeepPotPT.cc (3)
**5-5: Use public profiler header and add required standard headers**

Avoid internal headers; they change across PyTorch versions. Also add the missing standard headers used below.
Apply this diff around includes:
```diff
-#include <torch/csrc/autograd/profiler.h>
+#include <torch/autograd/profiler.h>  // public C++ API
+#include <set>         // std::set for activities
+#include <cstdlib>     // std::getenv
+#include <filesystem>  // path handling
+#ifdef _WIN32
+# include <process.h>
+# define getpid _getpid
+#else
+# include <unistd.h>  // getpid
+#endif
```

If you want to make the profiler code optional at build time, gate the include and related code with a CMake-defined macro (e.g., `DEEPMD_ENABLE_PYTORCH_PROFILER`). This aligns with the earlier nit about conditional includes.
```cpp
#ifdef DEEPMD_ENABLE_PYTORCH_PROFILER
#include <torch/autograd/profiler.h>
#endif
```

To confirm the correct public header for `RecordProfile` for your libtorch version, please check:
Which header exposes torch::autograd::profiler::RecordProfile in the PyTorch C++ API (v2.x)? Is <torch/autograd/profiler.h> the supported public include?
**87-118: Switch to RecordProfile (RAII), append PID to filename, and harden path handling**

This block uses `impl::` types and manual enable/disable, which is fragile across libtorch releases and deviates from the issue's ask to use `RecordProfile`. RAII also fixes leaks if `init()` throws after enabling. Additionally, append a PID suffix to avoid collisions across MPI ranks/processes and ensure parent dirs exist. Only add ".json" if missing.
Replace the whole block with:
```diff
-  // Configure PyTorch profiler
-  const char* env_profiler = std::getenv("DP_PROFILER");
-  if (env_profiler && *env_profiler) {
-    using torch::profiler::impl::ActivityType;
-    using torch::profiler::impl::ExperimentalConfig;
-    using torch::profiler::impl::ProfilerConfig;
-    using torch::profiler::impl::ProfilerState;
-    std::set<ActivityType> activities{ActivityType::CPU};
-    if (gpu_enabled) {
-      activities.insert(ActivityType::CUDA);
-    }
-    profiler_file = std::string(env_profiler);
-    if (gpu_enabled) {
-      profiler_file += "_gpu" + std::to_string(gpu_id);
-    }
-    profiler_file += ".json";
-    ExperimentalConfig exp_cfg;
-    ProfilerConfig cfg(ProfilerState::KINETO,
-                       false,  // report_input_shapes,
-                       false,  // profile_memory,
-                       true,   // with_stack,
-                       false,  // with_flops,
-                       true,   // with_modules,
-                       exp_cfg,
-                       std::string()  // trace_id
-    );
-    torch::autograd::profiler::prepareProfiler(cfg, activities);
-    torch::autograd::profiler::enableProfiler(cfg, activities);
-    std::cout << "PyTorch profiler enabled, output file: " << profiler_file
-              << std::endl;
-    profiler_enabled = true;
-  }
+  // Configure PyTorch profiler (RAII)
+  const char* env_profiler = std::getenv("DP_PROFILER");
+  if (env_profiler && *env_profiler) {
+    profiler_file = std::string(env_profiler);
+    // Avoid double ".json"
+    if (!std::filesystem::path(profiler_file).has_extension()) {
+      profiler_file += ".json";
+    }
+    // Add disambiguators: GPU id (if any) and PID
+    {
+      std::string stem = std::filesystem::path(profiler_file).stem().string();
+      std::string ext = std::filesystem::path(profiler_file).extension().string();
+      if (gpu_enabled) stem += "_gpu" + std::to_string(gpu_id);
+      stem += "_pid" + std::to_string(static_cast<long long>(getpid()));
+      profiler_file = (std::filesystem::path(profiler_file).parent_path() /
+                       (stem + ext)).string();
+    }
+    // Best-effort create parent dirs
+    std::error_code ec;
+    auto parent = std::filesystem::path(profiler_file).parent_path();
+    if (!parent.empty()) std::filesystem::create_directories(parent, ec);
+    // Optional: very light path sanity to avoid surprises
+    // (keeps absolute paths, but normalizes .. segments)
+    profiler_file = std::filesystem::weakly_canonical(
+        std::filesystem::path(profiler_file), ec).string();
+    // Engage RAII guard (saves to profiler_file on destruction)
+    profiler_guard = std::make_unique<torch::autograd::profiler::RecordProfile>(profiler_file);
+    std::cout << "PyTorch profiler enabled, output file: " << profiler_file << std::endl;
+  }
```

If you still need advanced knobs (shapes, modules), consider verifying the supported C++ API for your libtorch version and toggling them behind macros; avoid `torch::profiler::impl` where possible.
PyTorch C++ profiler: how to enable Kineto tracing with `RecordProfile` (RAII) and configure stack/modules in v2.x without touching `torch::profiler::impl`?

Note: Building on a prior comment about path safety, normalizing paths and restricting them to intended directories mitigates accidental directory traversal from env input.
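
As a sketch of that path-safety note (a hypothetical helper, not part of this PR), a containment check with C++17 `std::filesystem` might look like:

```cpp
// Hypothetical helper: true iff `candidate` resolves inside `base`.
// Both paths are normalized first so ".." segments cannot escape `base`.
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

bool path_within(const fs::path& base, const fs::path& candidate) {
  std::error_code ec;
  const fs::path b = fs::weakly_canonical(base, ec);
  const fs::path c = fs::weakly_canonical(candidate, ec);
  // Check that every component of the base is a prefix of the candidate.
  auto cit = c.begin();
  for (auto bit = b.begin(); bit != b.end(); ++bit, ++cit) {
    if (cit == c.end() || *cit != *bit) return false;
  }
  return true;
}
```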
**153-162: Destructor must not throw; prefer relying on RAII or, at minimum, wrap in a noexcept-safe try/catch**

As previously noted, profiler operations can throw; throwing from a destructor during stack unwinding calls `std::terminate`. If you adopt `RecordProfile` above, you can delete this block entirely; otherwise, wrap it.
Option A (preferred, with RAII): delete the body.
```diff
-DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
-}
+DeepPotPT::~DeepPotPT() = default;
```
Option B (keep manual control): add a hard try/catch and silence all exceptions.

```diff
 DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
+  try {
+    if (profiler_enabled) {
+      auto result = torch::autograd::profiler::disableProfiler();
+      if (result) result->save(profiler_file);
+      std::cout << "PyTorch profiler result saved to " << profiler_file << std::endl;
+    }
+  } catch (const std::exception& e) {
+    std::cerr << "Profiler teardown error: " << e.what() << std::endl;
+  } catch (...) {
+    std::cerr << "Profiler teardown: unknown error" << std::endl;
+  }
 }
```
🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPT.cc (1)
**73-76: Construct the device after the capability check to avoid kCUDA construction when CUDA is unavailable**

Minor cleanup; avoids creating a CUDA device object when not needed.
```diff
-  torch::Device device(torch::kCUDA, gpu_id);
-  if (!gpu_enabled) {
-    device = torch::Device(torch::kCPU);
+  torch::Device device = gpu_enabled ? torch::Device(torch::kCUDA, gpu_id)
+                                     : torch::Device(torch::kCPU);
+  if (!gpu_enabled) {
     std::cout << "load model from: " << model << " to cpu " << std::endl;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- source/api_cc/include/DeepPotPT.h (1 hunks)
- source/api_cc/src/DeepPotPT.cc (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/api_cc/src/DeepPotPT.cc (1)
source/api_cc/include/DeepPotPT.h (1)
`DeepPotPT` (13-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff            @@
##            devel    #4969     +/-  ##
=========================================
- Coverage   84.29%   84.20%   -0.10%
=========================================
  Files         704      705       +1
  Lines       68907    69228     +321
  Branches     3572     3576       +4
=========================================
+ Hits        58085    58291     +206
- Misses       9681     9798     +117
+ Partials     1141     1139       -2
```

☔ View full report in Codecov by Sentry.
njzjz left a comment:
Documentation should be added to env.md.
@codex address this feedback.
To use Codex here, create an environment for this repo.
Co-authored-by: Jinzhe Zeng <njzjz@qq.com>
Signed-off-by: Chun Cai <amoycaic@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (5)
doc/env.md (5)
**93-100: Clarify scope and path semantics of DP_PROFILER.**

Add two short notes so users know where the file lands and which backend this affects.
Apply this diff:
```diff
 ::: {envvar} DP_PROFILER
@@
-{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend.
+{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend (DeepPotPT only).
@@
-**Type**: string (output file stem)
+**Type**: string (output file stem or path prefix)
@@
-**Default**: unset (disabled)
+**Default**: unset (disabled)
+
+If a relative path is given, the trace is written relative to the current working directory of the process; absolute paths are supported.
```
**101-105: Prevent filename collisions; define GPU ID semantics.**

On CPU multi-process runs all ranks would target the same `<ENV_VALUE>.json`. Also clarify that the GPU ID is the local CUDA ordinal after `CUDA_VISIBLE_DEVICES`.

Apply this diff:
```diff
 The final file name is constructed as:
-- `<ENV_VALUE>_gpu<ID>.json` if running on GPU
-- `<ENV_VALUE>.json` if running on CPU
+- `<ENV_VALUE>_gpu<ID>.json` if running on GPU (where `<ID>` is the local CUDA ordinal after `CUDA_VISIBLE_DEVICES`)
+- `<ENV_VALUE>.json` if running on CPU
+
+Note: On CPU multi-process runs, all ranks would otherwise write the same filename. Ensure each process uses a unique stem (or path) to avoid clobbering.
```
**106-110: Add a link to PyTorch profiler docs.**

A small cross-reference helps readers interpret traces.
Apply this diff:
```diff
-The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing). It includes:
+The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing) and tooling in [PyTorch profiler docs](https://pytorch.org/docs/stable/profiler.html). It includes:
```
**113-117: Show a per-rank filename example (CPU and GPU).**

Provide a concrete pattern to avoid clobbering in multi-process runs.
Apply this diff:
````diff
 ```bash
 export DP_PROFILER=result
 mpirun -np 4 lmp -in in.lammps
 # Produces result_gpuX.json, where X is the GPU id used by each MPI rank.
+
+# Example (OpenMPI) to ensure unique files per rank:
+# GPU:
+#   export DP_PROFILER="result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
+#   # Produces result_rank0_gpuX.json, result_rank1_gpuY.json, ...
+# CPU:
+#   export DP_PROFILER="cpu_result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
+#   # Produces cpu_result_rank0.json, cpu_result_rank1.json, ...
 ```
````

---

**119-123: Tighten wording; add an explicit CPU multi-process limitation.**

Minor grammar fix; making the limitation explicit avoids surprises.

Apply this diff:

```diff
-Tips:
+Tips:
@@
-- Large runs can generate sizable JSON files; consider limiting numbers of MD steps, like 20.
-- Currently this feature only supports single process, or multi-process runs where each process uses a distinct GPU on the same node.
+- Large runs can generate sizable JSON files; consider limiting the number of MD steps (e.g., ~20).
+- This feature currently supports: (a) single-process runs; or (b) multi-process runs where each process uses a distinct GPU on the same node. CPU multi-process runs require unique `DP_PROFILER` stems per process to avoid file overwrites.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- doc/env.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (2)
doc/env.md (2)
**86-92: DP_PLUGIN_PATH block looks consistent now.**

The layout matches other envvar blocks. No further action needed.
**93-117: Overall: nice addition and matches the implementation.**

Content is accurate and aligned with the C++ changes. With the above clarifications, users should avoid file clobbering and understand where traces are written.
This pull request adds support for enabling the PyTorch profiler in the `DeepPotPT` backend via an environment variable, making it easier to profile and debug performance issues. The profiler can be activated by setting the `DP_PROFILER` environment variable, and will save profiling results to a JSON file when the object is destroyed.

Fixes #4431

**Profiler integration:**

- Added a `profiler_enabled` flag and `profiler_file` string to the `DeepPotPT` class to track profiler state and output location (`DeepPotPT.h`).
- In the `init` method, added logic to check for the `DP_PROFILER` environment variable. If set, the PyTorch profiler is configured and enabled, with output written to a file named according to the environment variable and GPU ID (`DeepPotPT.cc`).
- On destruction of `DeepPotPT`, if profiling was enabled, the profiler is properly disabled and results are saved to the specified file (`DeepPotPT.cc`).

**Summary by CodeRabbit**

- **New Features**
  - Optional runtime profiling via the DP_PROFILER environment variable. Produces per-process JSON traces (CPU-only or CPU+GPU) and saves them automatically on shutdown; console messages report profiler status and output filename.
- **Refactor**
  - Improved device selection: auto-detects CUDA, selects a GPU per process when available, and falls back to CPU; clearer startup diagnostics with no public API changes.
- **Documentation**
  - Added DP_PROFILER usage, filename rules, examples, and operational tips to the C++ interface docs.

Signed-off-by: Chun Cai <amoycaic@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <njzjz@qq.com>