
Conversation

@caic99 (Member) commented Sep 9, 2025

This pull request adds support for enabling the PyTorch profiler in the DeepPotPT backend via an environment variable, making it easier to profile and debug performance issues.
The profiler is activated by setting the DP_PROFILER environment variable, and profiling results are saved to a JSON file when the object is destroyed.

Fixes #4431

Profiler integration:

  • Added a profiler_enabled flag and profiler_file string to the DeepPotPT class to track profiler state and output location (DeepPotPT.h).
  • In the init method, added logic to check for the DP_PROFILER environment variable. If set, the PyTorch profiler is configured and enabled, with output written to a file named according to the environment variable and GPU ID (DeepPotPT.cc).
  • On destruction of DeepPotPT, if profiling was enabled, the profiler is properly disabled and results are saved to the specified file (DeepPotPT.cc). A minimal sketch of the overall pattern follows.
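
For context, the env-gated pattern described in these bullets condenses to a few lines of C++. A minimal sketch, assuming libtorch's RAII torch::autograd::profiler::RecordProfile guard (which the review below recommends); the class and member names are illustrative, not the PR's exact code:

```cpp
#include <torch/csrc/autograd/profiler.h>  // header choice is debated in the review below

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical holder; DeepPotPT wires the same idea into init() and its destructor.
class ProfiledBackend {
 public:
  ProfiledBackend(bool gpu_enabled, int gpu_id) {
    const char* env = std::getenv("DP_PROFILER");
    if (env && *env) {
      profiler_file_ = std::string(env);
      if (gpu_enabled) {
        profiler_file_ += "_gpu" + std::to_string(gpu_id);
      }
      profiler_file_ += ".json";
      // RAII guard: starts profiling now, saves a Chrome trace when destroyed.
      guard_ = std::make_unique<torch::autograd::profiler::RecordProfile>(profiler_file_);
    }
  }

 private:
  std::string profiler_file_;
  std::unique_ptr<torch::autograd::profiler::RecordProfile> guard_;
};
```

With this in place, running e.g. `DP_PROFILER=trace lmp -in in.lammps` on one GPU would produce `trace_gpu0.json`.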

Summary by CodeRabbit

  • New Features

    • Optional runtime profiling via DP_PROFILER environment variable. Produces per-process JSON traces (CPU-only or CPU+GPU) and saves them automatically on shutdown; console messages report profiler status and output filename.
  • Refactor

    • Improved device selection: auto-detects CUDA, selects a GPU per process when available, and falls back to CPU otherwise; clearer startup diagnostics with no public API changes.
  • Documentation

    • Added DP_PROFILER usage, filename rules, examples, and operational tips to the C++ interface docs.

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds PyTorch profiler support to the DeepPotPT backend to help diagnose performance issues in LAMMPS MD simulations. The profiler can be enabled via the DP_PROFILER environment variable and automatically saves profiling results to a JSON file.

Key changes include:

  • Environment variable-based profiler activation with configurable output file naming
  • Automatic profiler lifecycle management (enable on init, disable on destruction)
  • GPU-aware profiling with device-specific output file naming

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • source/api_cc/src/DeepPotPT.cc — implements profiler initialization, environment-variable checking, and cleanup in the destructor
  • source/api_cc/include/DeepPotPT.h — adds profiler state-tracking members to the DeepPotPT class

coderabbitai bot (Contributor) commented Sep 9, 2025

Note: Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Added Kineto-based PyTorch C++ profiling to DeepPotPT (enabled via DP_PROFILER). Tracked profiler state and output filename in the class, enabled profiling at construction when requested, disabled the profiler and saved the JSON trace in the destructor, simplified GPU selection/device fallback logic, and documented DP_PROFILER in doc/env.md.

Changes

  • Header: profiler state — source/api_cc/include/DeepPotPT.h
    Added private members bool profiler_enabled = false and std::string profiler_file; no public API changes.
  • Implementation: profiler wiring & device logic — source/api_cc/src/DeepPotPT.cc
    Added #include <torch/csrc/autograd/profiler.h> and logic gated by DP_PROFILER: detects CPU/CUDA activities, builds ExperimentalConfig/ProfilerConfig, calls prepareProfiler/enableProfiler, and sets profiler_enabled and profiler_file (appending _gpuN when using a GPU). Simplified GPU selection to gpu_id = (gpu_num > 0) ? (gpu_rank % gpu_num) : 0 (demonstrated below), switched to torch::cuda::is_available() for device choice, and added startup diagnostics. On destruction, calls disableProfiler() and writes the JSON trace if present.
  • Docs: environment — doc/env.md
    Documented the DP_PROFILER env var: filename construction (<value>_gpu<ID>.json on GPU, <value>.json on CPU), trace contents (CPU always, CUDA if available), a usage example, and caveats about file size and multi-process usage.
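
The round-robin GPU-selection expression above is easy to sanity-check in isolation; the rank and GPU counts here are assumed for illustration:

```cpp
#include <iostream>

int main() {
  const int gpu_num = 4;  // GPUs visible on the node (assumed for the demo)
  for (int gpu_rank = 0; gpu_rank < 6; ++gpu_rank) {
    // Round-robin mapping from process rank to a local GPU; 0 if no GPU exists.
    const int gpu_id = (gpu_num > 0) ? (gpu_rank % gpu_num) : 0;
    std::cout << "rank " << gpu_rank << " -> gpu " << gpu_id << '\n';
  }
  return 0;
}
```

Ranks 4 and 5 wrap back to GPUs 0 and 1, the intended behavior when more processes than GPUs share a node.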

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant CUDA as CUDA Runtime
    participant Prof as Kineto Profiler

    App->>DeepPotPT: construct()
    DeepPotPT->>CUDA: torch::cuda::is_available()
    CUDA-->>DeepPotPT: gpu_enabled (true/false)
    DeepPotPT->>DeepPotPT: gpu_id = (gpu_num>0) ? (gpu_rank%gpu_num) : 0
    alt DP_PROFILER set
        DeepPotPT->>Prof: prepareProfiler(cfg, activities)
        DeepPotPT->>Prof: enableProfiler(cfg, activities)
        Prof-->>DeepPotPT: enabled
        Note right of DeepPotPT: profiler_enabled = true\nprofiler_file set (maybe _gpuN)
    else
        Note right of DeepPotPT: profiler not enabled
    end
    App-->>DeepPotPT: run inference / model calls
```

```mermaid
sequenceDiagram
    autonumber
    participant App
    participant DeepPotPT
    participant Prof as Kineto Profiler
    participant FS as Filesystem

    App->>DeepPotPT: destructor()
    alt profiler_enabled
        DeepPotPT->>Prof: disableProfiler()
        Prof-->>DeepPotPT: trace result
        DeepPotPT->>FS: write `profiler_file` (.json)
        FS-->>DeepPotPT: write complete
    else
        Note over DeepPotPT: No profiler active
    end
```
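
The teardown sequence above amounts to only a few lines. A minimal sketch of the manual (non-RAII) variant, assuming the disableProfiler()/save() calls quoted later in the review, with exceptions swallowed so the destructor can never throw:

```cpp
#include <torch/csrc/autograd/profiler.h>

#include <string>

// Illustrative owner type, not the PR's exact class.
struct ProfilerOwner {
  bool profiler_enabled = false;
  std::string profiler_file;

  ~ProfilerOwner() {
    try {
      if (profiler_enabled) {
        auto result = torch::autograd::profiler::disableProfiler();
        if (result) {
          result->save(profiler_file);  // writes the Chrome-trace JSON
        }
      }
    } catch (...) {
      // Swallow everything: propagating from a destructor risks std::terminate.
    }
  }
};
```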

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (5 passed)

✅ Passed checks (5)
  • Description Check — Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — Passed. The title "feat: add PyTorch profiler support to LAMMPS MD" succinctly summarizes the primary change: adding PyTorch/Kineto profiling support to the C++ DeepPotPT/LAMMPS MD backend, controlled via an environment variable and documented in doc/env.md.
  • Linked Issues Check — Passed. The PR meets the requirements of linked issue #4431: a C++ PyTorch profiler wired into DeepPotPT, activation via the DP_PROFILER environment variable, implementation through the C++ profiling API (prepareProfiler/enableProfiler/disableProfiler with ProfilerConfig/ExperimentalConfig), and documentation in doc/env.md.
  • Out of Scope Changes Check — Passed. Changes are confined to DeepPotPT.h, DeepPotPT.cc, and doc/env.md and directly support the profiler; no unrelated modules or public API signatures were modified.
  • Docstring Coverage — Passed. No functions found in the changes; docstring coverage check skipped.


@chatgpt-codex-connector bot left a comment

Codex Review: Here are some suggestions.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
source/api_cc/src/DeepPotPT.cc (3)

5-5: Use public profiler header and add required standard headers

Avoid internal headers; they change across PyTorch versions. Also add the missing standard headers used below.

Apply this diff around includes:

```diff
-#include <torch/csrc/autograd/profiler.h>
+#include <torch/autograd/profiler.h>   // public C++ API
+#include <set>                         // std::set for activities
+#include <cstdlib>                     // std::getenv
+#include <filesystem>                  // path handling
+#ifdef _WIN32
+#  include <process.h>
+#  define getpid _getpid
+#else
+#  include <unistd.h>                  // getpid
+#endif
```

If you want to make profiler code optional at build time, gate the include and related code with a CMake-defined macro (e.g., DEEPMD_ENABLE_PYTORCH_PROFILER). This aligns with the earlier nit about conditional includes.

```cpp
#ifdef DEEPMD_ENABLE_PYTORCH_PROFILER
#include <torch/autograd/profiler.h>
#endif
```

To confirm the correct public header for RecordProfile for your libtorch version, please check:

Which header exposes torch::autograd::profiler::RecordProfile in the PyTorch C++ API (v2.x)? Is <torch/autograd/profiler.h> the supported public include?

87-118: Switch to RecordProfile (RAII), append PID to filename, and harden path handling

This block uses impl:: types and manual enable/disable, which is fragile across libtorch releases and deviates from the issue’s ask to use RecordProfile. RAII also fixes leaks if init() throws after enabling. Additionally, append a PID suffix to avoid collisions across MPI ranks/processes and ensure parent dirs exist. Only add “.json” if missing.

Replace the whole block with:

```diff
-  // Configure PyTorch profiler
-  const char* env_profiler = std::getenv("DP_PROFILER");
-  if (env_profiler && *env_profiler) {
-    using torch::profiler::impl::ActivityType;
-    using torch::profiler::impl::ExperimentalConfig;
-    using torch::profiler::impl::ProfilerConfig;
-    using torch::profiler::impl::ProfilerState;
-    std::set<ActivityType> activities{ActivityType::CPU};
-    if (gpu_enabled) {
-      activities.insert(ActivityType::CUDA);
-    }
-    profiler_file = std::string(env_profiler);
-    if (gpu_enabled) {
-      profiler_file += "_gpu" + std::to_string(gpu_id);
-    }
-    profiler_file += ".json";
-    ExperimentalConfig exp_cfg;
-    ProfilerConfig cfg(ProfilerState::KINETO,
-                       false,  // report_input_shapes,
-                       false,  // profile_memory,
-                       true,   // with_stack,
-                       false,  // with_flops,
-                       true,   // with_modules,
-                       exp_cfg,
-                       std::string()  // trace_id
-    );
-    torch::autograd::profiler::prepareProfiler(cfg, activities);
-    torch::autograd::profiler::enableProfiler(cfg, activities);
-    std::cout << "PyTorch profiler enabled, output file: " << profiler_file
-              << std::endl;
-    profiler_enabled = true;
-  }
+  // Configure PyTorch profiler (RAII)
+  const char* env_profiler = std::getenv("DP_PROFILER");
+  if (env_profiler && *env_profiler) {
+    profiler_file = std::string(env_profiler);
+    // Avoid double ".json"
+    if (!std::filesystem::path(profiler_file).has_extension()) {
+      profiler_file += ".json";
+    }
+    // Add disambiguators: GPU id (if any) and PID
+    {
+      std::string stem = std::filesystem::path(profiler_file).stem().string();
+      std::string ext  = std::filesystem::path(profiler_file).extension().string();
+      if (gpu_enabled) stem += "_gpu" + std::to_string(gpu_id);
+      stem += "_pid" + std::to_string(static_cast<long long>(getpid()));
+      profiler_file = (std::filesystem::path(profiler_file).parent_path() /
+                       (stem + ext)).string();
+    }
+    // Best-effort create parent dirs
+    std::error_code ec;
+    auto parent = std::filesystem::path(profiler_file).parent_path();
+    if (!parent.empty()) std::filesystem::create_directories(parent, ec);
+    // Optional: very light path sanity to avoid surprises
+    // (keeps absolute paths, but normalizes .. segments)
+    profiler_file = std::filesystem::weakly_canonical(std::filesystem::path(profiler_file), ec).string();
+    // Engage RAII guard (saves to profiler_file on destruction)
+    profiler_guard = std::make_unique<torch::autograd::profiler::RecordProfile>(profiler_file);
+    std::cout << "PyTorch profiler enabled, output file: " << profiler_file << std::endl;
+  }
```

If you still need advanced knobs (shapes, modules), consider verifying the supported C++ API for your libtorch version and toggling them behind macros; avoid torch::profiler::impl where possible.

PyTorch C++ profiler: how to enable Kineto tracing with RecordProfile (RAII) and configure stack/modules in v2.x without touching torch::profiler::impl?

Note: Building on a prior comment about path safety, normalizing and restricting to intended dirs mitigates accidental directory traversals from env input.
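
The filename disambiguation proposed above can be exercised standalone. A sketch under the same assumptions (DP_PROFILER=trace, GPU 0), with getpid() shimmed for Windows:

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#ifdef _WIN32
#include <process.h>
#define getpid _getpid
#else
#include <unistd.h>
#endif

int main() {
  std::string profiler_file = "trace";  // pretend DP_PROFILER=trace
  const bool gpu_enabled = true;        // assumed for the demo
  const int gpu_id = 0;

  std::filesystem::path p(profiler_file);
  if (!p.has_extension()) p += ".json";  // avoid a double ".json"
  std::string stem = p.stem().string();
  if (gpu_enabled) stem += "_gpu" + std::to_string(gpu_id);
  stem += "_pid" + std::to_string(static_cast<long long>(getpid()));
  p = p.parent_path() / (stem + p.extension().string());
  std::cout << p.string() << '\n';  // e.g. trace_gpu0_pid12345.json
  return 0;
}
```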


153-162: Destructor must not throw; prefer relying on RAII or, at minimum, wrap in noexcept-safe try/catch

As previously noted, profiler operations can throw; throwing from a destructor during stack unwinding will call std::terminate. If you adopt RecordProfile above, you can delete this block entirely; otherwise, wrap it.

Option A (preferred, with RAII): delete the body.

```diff
-DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
-}
+DeepPotPT::~DeepPotPT() = default;
```

Option B (keep manual control): add hard try/catch and silence all exceptions.

```diff
 DeepPotPT::~DeepPotPT() {
-  if (profiler_enabled) {
-    auto result = torch::autograd::profiler::disableProfiler();
-    if (result) {
-      result->save(profiler_file);
-    }
-    std::cout << "PyTorch profiler result saved to " << profiler_file
-              << std::endl;
-  }
+  try {
+    if (profiler_enabled) {
+      auto result = torch::autograd::profiler::disableProfiler();
+      if (result) result->save(profiler_file);
+      std::cout << "PyTorch profiler result saved to " << profiler_file << std::endl;
+    }
+  } catch (const std::exception& e) {
+    std::cerr << "Profiler teardown error: " << e.what() << std::endl;
+  } catch (...) {
+    std::cerr << "Profiler teardown: unknown error" << std::endl;
+  }
 }
```
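
To see why this matters, a minimal standalone illustration of the underlying C++ rule: a second exception thrown while the stack is already unwinding calls std::terminate. (This program crashes by design.)

```cpp
#include <iostream>
#include <stdexcept>

struct Throws {
  // Destructors are implicitly noexcept since C++11; opt out to demonstrate.
  ~Throws() noexcept(false) { throw std::runtime_error("from destructor"); }
};

int main() {
  try {
    Throws t;
    throw std::runtime_error("first");  // unwinding destroys t -> second throw -> std::terminate
  } catch (const std::exception& e) {
    std::cout << e.what() << '\n';  // never reached; terminate fires first
  }
  return 0;
}
```
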
🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPT.cc (1)

73-76: Construct the device after capability check to avoid kCUDA construction when CUDA is unavailable

Minor cleanup; avoids creating a CUDA device object when not needed.

```diff
-  torch::Device device(torch::kCUDA, gpu_id);
-  if (!gpu_enabled) {
-    device = torch::Device(torch::kCPU);
+  torch::Device device = gpu_enabled ? torch::Device(torch::kCUDA, gpu_id)
+                                     : torch::Device(torch::kCPU);
+  if (!gpu_enabled) {
     std::cout << "load model from: " << model << " to cpu " << std::endl;
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef5959f and 25a36d1.

📒 Files selected for processing (2)
  • source/api_cc/include/DeepPotPT.h (1 hunks)
  • source/api_cc/src/DeepPotPT.cc (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/api_cc/src/DeepPotPT.cc (1)
source/api_cc/include/DeepPotPT.h (1)
  • DeepPotPT (13-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.20%. Comparing base (db22802) to head (7991d86).
⚠️ Report is 82 commits behind head on devel.

Files with missing lines:
  • source/api_cc/src/DeepPotPT.cc — patch coverage 13.04%; 19 lines missing, 1 partial ⚠️
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##            devel    #4969      +/-   ##
==========================================
- Coverage   84.29%   84.20%   -0.10%     
==========================================
  Files         704      705       +1     
  Lines       68907    69228     +321     
  Branches     3572     3576       +4     
==========================================
+ Hits        58085    58291     +206     
- Misses       9681     9798     +117     
+ Partials     1141     1139       -2
```

☔ View full report in Codecov by Sentry.

@njzjz (Member) left a comment

Documentation should be added to env.md.

@caic99 (Member, Author) commented Sep 10, 2025

Documentation should be added to env.md.

@codex address this feedback.

@chatgpt-codex-connector: To use Codex here, create an environment for this repo.

@github-actions github-actions bot added the Docs label Sep 10, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
doc/env.md (5)

93-100: Clarify scope and path semantics of DP_PROFILER.

Add two short notes so users know where the file lands and which backend this affects.

Apply this diff:

```diff
 ::: {envvar} DP_PROFILER
@@
-{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend.
+{{ pytorch_icon }} Enable the built-in PyTorch Kineto profiler for the PyTorch C++ (inference) backend (DeepPotPT only).
@@
-**Type**: string (output file stem)
+**Type**: string (output file stem or path prefix)
@@
-**Default**: unset (disabled)
+**Default**: unset (disabled)
+
+If a relative path is given, the trace is written relative to the current working directory of the process; absolute paths are supported.
```

101-105: Prevent filename collisions; define GPU ID semantics.

On CPU multi-process runs, all ranks would target the same <ENV_VALUE>.json. Also clarify that the GPU ID is the local CUDA ordinal after CUDA_VISIBLE_DEVICES.

Apply this diff:

```diff
 The final file name is constructed as:
 
-- `<ENV_VALUE>_gpu<ID>.json` if running on GPU
-`- `<ENV_VALUE>.json` if running on CPU
+- `<ENV_VALUE>_gpu<ID>.json` if running on GPU (where `<ID>` is the local CUDA ordinal after `CUDA_VISIBLE_DEVICES`)
+- `<ENV_VALUE>.json` if running on CPU
+
+Note: On CPU multi-process runs, all ranks would otherwise write the same filename. Ensure each process uses a unique stem (or path) to avoid clobbering.
```

106-110: Add a link to PyTorch profiler docs.

A small cross-reference helps readers interpret traces.

Apply this diff:

```diff
-The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing). It includes:
+The trace can be examined with [Chrome trace viewer](https://ui.perfetto.dev/) (alternatively chrome://tracing) and tooling in [PyTorch profiler docs](https://pytorch.org/docs/stable/profiler.html). It includes:
```

113-117: Show per-rank filename example (CPU and GPU).

Provide a concrete pattern to avoid clobbering in multi-process runs.

Apply this diff:

```diff
 export DP_PROFILER=result
 mpirun -np 4 lmp -in in.lammps
 # Produces result_gpuX.json, where X is the GPU id used by each MPI rank.
+
+# Example (OpenMPI) to ensure unique files per rank:
+# GPU:
+#   export DP_PROFILER="result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
+#   # Produces result_rank0_gpuX.json, result_rank1_gpuY.json, ...
+# CPU:
+#   export DP_PROFILER="cpu_result_rank${OMPI_COMM_WORLD_RANK}"
+#   mpirun -np 4 lmp -in in.lammps
#   # Produces cpu_result_rank0.json, cpu_result_rank1.json, ...
```

---

119-123: Tighten wording; add an explicit CPU multi-process limitation.

A minor grammar fix and an explicit limitation avoid surprises.

Apply this diff:

```diff
-Tips:
+Tips:
@@
-- Large runs can generate sizable JSON files; consider limiting numbers of MD steps, like 20.
-- Currently this feature only supports single process, or multi-process runs where each process uses a distinct GPU on the same node.
+- Large runs can generate sizable JSON files; consider limiting the number of MD steps (e.g., ~20).
+- This feature currently supports: (a) single-process runs; or (b) multi-process runs where each process uses a distinct GPU on the same node. CPU multi-process runs require unique `DP_PROFILER` stems per process to avoid file overwrites.
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdf254c and 7991d86.

📒 Files selected for processing (1)
  • doc/env.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (2)
doc/env.md (2)

86-92: DP_PLUGIN_PATH block looks consistent now.

The layout matches other envvar blocks. No further action needed.


93-117: Overall: nice addition and matches the implementation.

Content is accurate and aligned with the C++ changes. With the above clarifications, users should avoid file clobbering and understand where traces are written.

Merged via the queue into deepmodeling:devel with commit 6d04648 Sep 15, 2025
60 checks passed
caic99 deleted the kineto branch September 15, 2025 10:36
ChiahsinChu pushed a commit to ChiahsinChu/deepmd-kit that referenced this pull request Dec 17, 2025