Conversation

@anyangml
Collaborator

@anyangml anyangml commented May 26, 2025

Currently, when a single str is passed as the root data directory, the expand_sys_str function automatically performs an rglob to collect all systems. However, this depends on the structure of the data folder. In some scenarios the train/val folders are nested, e.g. "root/dataset_*/trn" and "root/dataset_*/val".

A customizable rglob function is needed to provide more flexibility when constructing datasets and to avoid unnecessarily long data lists in the input file.
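
To make the nested layout concrete, here is a minimal sketch that uses only `pathlib` and a temporary directory. The folder names (`dataset_a`, `sys_0`) and the helpers `make_system` and `find_systems` are hypothetical; only the `type.raw` marker file and the `root/dataset_*/trn` and `root/dataset_*/val` layout come from this PR.

```python
import tempfile
from pathlib import Path


def make_system(path: Path) -> None:
    """Create a hypothetical system directory marked by a type.raw file."""
    path.mkdir(parents=True)
    (path / "type.raw").touch()


def find_systems(root: Path, pattern: str) -> list[str]:
    """Return directories under root that match pattern and contain type.raw."""
    return sorted(
        d.relative_to(root).as_posix()
        for d in root.rglob(pattern)
        if (d / "type.raw").is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("dataset_a", "dataset_b"):
        make_system(root / name / "trn" / "sys_0")
        make_system(root / name / "val" / "sys_0")

    # One pattern per split replaces an explicit list of every system path.
    train = find_systems(root, "dataset_*/trn/*")
    valid = find_systems(root, "dataset_*/val/*")
```

With this layout, a single pattern per split is enough to separate training systems from validation systems, whatever the number of `dataset_*` folders.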

Summary by CodeRabbit

  • New Features
    • Added support for specifying custom glob patterns to filter training and validation datasets, allowing more flexible and targeted data selection (PyTorch backend only).
    • Introduced recursive pattern matching to improve system directory selection based on user-defined criteria.
  • Tests
    • Added new test cases to validate the customized glob pattern functionality for training and validation datasets.

Copilot AI review requested due to automatic review settings May 26, 2025 05:32
@anyangml anyangml marked this pull request as draft May 26, 2025 05:32
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces support for customized recursive globbing to search system directories based on user-specified patterns.

  • Added a new function, rglob_sys_str, to recursively search directories using provided glob patterns.
  • Updated process_systems to accept an optional patterns parameter and modified the main entrypoint to pass custom rglob patterns.
  • Adjusted relevant imports and function calls to incorporate the new behavior.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| deepmd/utils/data_system.py | Updated process_systems to accept custom patterns and invoke rglob_sys_str when provided. |
| deepmd/pt/entrypoints/main.py | Modified training/validation system processing to pass rglob_patterns to process_systems. |
| deepmd/common.py | Added rglob_sys_str to support customized recursive globbing based on given patterns. |
Comments suppressed due to low confidence (1)

deepmd/utils/data_system.py:753

  • Consider clarifying in the docstring that the 'patterns' parameter only affects cases where 'systems' is a single directory string, as list inputs will bypass custom pattern filtering.
if isinstance(systems, str):
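
The point about list inputs can be illustrated with a small sketch of the dispatch. This is not the actual `process_systems` body: `expand_stub` and `rglob_stub` are hypothetical stand-ins for `expand_sys_str` and `rglob_sys_str`, and returning the list unchanged in the list branch is an assumption.

```python
from typing import Optional, Union


def expand_stub(root: str) -> list[str]:
    """Hypothetical stand-in for expand_sys_str."""
    return [f"{root}/all_systems"]


def rglob_stub(root: str, patterns: list[str]) -> list[str]:
    """Hypothetical stand-in for rglob_sys_str."""
    return [f"{root}/{p}" for p in patterns]


def process_systems_sketch(
    systems: Union[str, list[str]],
    patterns: Optional[list[str]] = None,
) -> list[str]:
    if isinstance(systems, str):
        # Only a single root directory is expanded, so patterns apply here.
        if patterns is None:
            return expand_stub(systems)
        return rglob_stub(systems, patterns)
    # A list is taken as given; patterns has no effect on this branch.
    return list(systems)
```

Calling `process_systems_sketch(["a", "b"], patterns=["trn"])` returns `["a", "b"]`, which is the behavior the docstring note would warn about.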

@coderabbitai
Contributor

coderabbitai bot commented May 26, 2025

Walkthrough

A new function for recursive directory searching with glob patterns was added. The system processing logic was updated to optionally use this new function when patterns are provided. The main training entrypoint now retrieves and passes these patterns from dataset parameters, enabling pattern-based filtering for training and validation system selection. Argument checking functions were extended to accept these new pattern parameters. A new test class was added to verify the customized rglob pattern functionality.

Changes

| File(s) | Change Summary |
| --- | --- |
| deepmd/common.py | Added rglob_sys_str to recursively search directories by multiple glob patterns and filter by file presence. |
| deepmd/utils/data_system.py | Updated process_systems to accept optional patterns and use rglob_sys_str when patterns are provided; updated get_data accordingly. |
| deepmd/pt/entrypoints/main.py | Modified to extract rglob_patterns from dataset params and pass them to process_systems for filtering. |
| deepmd/utils/argcheck.py | Added optional rglob_patterns argument with PyTorch-specific docs to training_data_args and validation_data_args. |
| source/tests/pt/test_training.py | Added TestCustomizedRGLOB class to test training and validation with customized rglob patterns. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MainEntrypoint
    participant DataSystem
    participant Common

    User->>MainEntrypoint: Initiate training/validation
    MainEntrypoint->>MainEntrypoint: Retrieve rglob_patterns from dataset params
    MainEntrypoint->>DataSystem: process_systems(systems, patterns)
    alt patterns provided
        DataSystem->>Common: rglob_sys_str(systems, patterns)
        Common-->>DataSystem: List of matching system paths
    else patterns not provided
        DataSystem->>Common: expand_sys_str(systems)
        Common-->>DataSystem: List of system paths
    end
    DataSystem-->>MainEntrypoint: Processed system list
    MainEntrypoint-->>User: Continue with training/validation

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16fd12d and b8c5db6.

📒 Files selected for processing (1)
  • source/tests/pt/test_training.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt/test_training.py
⏰ Context from checks skipped due to timeout of 90000ms (22)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
deepmd/common.py (1)

210-231: ⚠️ Potential issue

Address type annotation inconsistency and implement deduplication.

Several issues identified in the new function:

  1. Type annotation mismatch: The docstring mentions root_dir : str, Path but the parameter type annotation only shows str.
  2. Missing deduplication: As noted in the past review comment, multiple patterns could match the same directory, leading to duplicate entries.
  3. No input validation: The function doesn't validate that root_dir exists or that patterns is not empty.

Apply this diff to fix the issues:

-def rglob_sys_str(root_dir: str, patterns: list[str]) -> list[str]:
+def rglob_sys_str(root_dir: Union[str, Path], patterns: list[str]) -> list[str]:
     """Recursively iterate over directories taking those that contain `type.raw` file.

     Parameters
     ----------
-    root_dir : str, Path
+    root_dir : Union[str, Path]
         starting directory
     patterns : list[str]
         list of glob patterns to match directories

     Returns
     -------
     list[str]
         list of string pointing to system directories
     """
+    if not patterns:
+        return []
     root_dir = Path(root_dir)
+    if not root_dir.exists():
+        raise FileNotFoundError(f"Root directory {root_dir} does not exist")
     matches = []
     for pattern in patterns:
         matches.extend(
             [str(d) for d in root_dir.rglob(pattern) if (d / "type.raw").is_file()]
         )
-    return matches
+    return list(set(matches))
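
The deduplication concern can be reproduced with a short sketch. The directory names and the `collect` helper are hypothetical; the loop follows the `rglob_sys_str` body quoted in the diff, except that it returns paths relative to the root for readability.

```python
import tempfile
from pathlib import Path


def collect(root: Path, patterns: list[str]) -> list[str]:
    """Gather matching system directories without deduplicating."""
    matches = []
    for pattern in patterns:
        matches.extend(
            d.relative_to(root).as_posix()
            for d in root.rglob(pattern)
            if (d / "type.raw").is_file()
        )
    return matches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    system = root / "dataset_a" / "trn"
    system.mkdir(parents=True)
    (system / "type.raw").touch()

    # Both patterns match the same directory, so it is collected twice.
    raw = collect(root, ["trn", "dataset_*/trn"])
    unique = list(set(raw))
```

Here `raw` holds the same system twice, and converting through a `set` collapses it to a single entry.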
🧹 Nitpick comments (2)
deepmd/pt/entrypoints/main.py (1)

117-121: Fix inconsistent parameter passing style.

There's an inconsistency in how patterns are passed to process_systems:

  • Line 118 uses keyword argument: patterns=trn_patterns
  • Line 121 uses positional argument: val_patterns

Apply this diff for consistency:

         if validation_systems is not None:
             val_patterns = validation_dataset_params.get("rglob_patterns", None)
-            validation_systems = process_systems(validation_systems, val_patterns)
+            validation_systems = process_systems(validation_systems, patterns=val_patterns)
deepmd/utils/data_system.py (1)

734-758: Well-implemented integration with new rglob functionality.

The implementation is excellent:

  1. Backward compatibility: The optional patterns parameter maintains backward compatibility
  2. Clear logic: The conditional branching between rglob_sys_str and expand_sys_str is intuitive
  3. Updated documentation: The docstring correctly reflects the new parameter

Consider adding input validation for the patterns parameter:

     if isinstance(systems, str):
         if patterns is None:
             systems = expand_sys_str(systems)
         else:
+            if not isinstance(patterns, list) or not patterns:
+                raise ValueError("patterns must be a non-empty list of strings")
             systems = rglob_sys_str(systems, patterns)
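
A standalone sketch of such a check is shown here. The helper name `check_patterns` is hypothetical, and the per-element string check goes slightly beyond the suggested diff.

```python
def check_patterns(patterns: object) -> list[str]:
    """Hypothetical validator for the patterns argument."""
    if not isinstance(patterns, list) or not patterns:
        raise ValueError("patterns must be a non-empty list of strings")
    if not all(isinstance(p, str) for p in patterns):
        raise ValueError("patterns must be a non-empty list of strings")
    return patterns
```

One reason to check the type explicitly: a bare string such as `"trn"` is truthy and iterable, so without the `isinstance` test a loop over `patterns` would treat each character as a separate glob pattern.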
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95ca4ad and b5c6be4.

📒 Files selected for processing (3)
  • deepmd/common.py (1 hunks)
  • deepmd/pt/entrypoints/main.py (1 hunks)
  • deepmd/utils/data_system.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
deepmd/utils/data_system.py (1)
deepmd/common.py (2)
  • rglob_sys_str (210-231)
  • expand_sys_str (190-207)
deepmd/common.py (1)
deepmd/utils/path.py (6)
  • rglob (94-107)
  • rglob (228-242)
  • rglob (395-409)
  • is_file (110-111)
  • is_file (244-246)
  • is_file (432-436)
🔇 Additional comments (1)
deepmd/utils/data_system.py (1)

20-20: LGTM!

The import addition is correctly placed and necessary for the new functionality.

@codecov

codecov bot commented May 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.80%. Comparing base (265d094) to head (b8c5db6).
⚠️ Report is 82 commits behind head on devel.

Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4763   +/-   ##
=======================================
  Coverage   84.79%   84.80%           
=======================================
  Files         698      698           
  Lines       67775    67786   +11     
  Branches     3544     3542    -2     
=======================================
+ Hits        57472    57484   +12     
+ Misses       9169     9167    -2     
- Partials     1134     1135    +1     


@anyangml anyangml marked this pull request as ready for review May 26, 2025 07:09
@anyangml anyangml requested a review from Copilot May 26, 2025 08:23
Contributor

Copilot AI left a comment

Pull Request Overview

Adds customizable recursive globbing for dataset directories, enabling users to specify patterns when collecting training and validation systems.

  • Introduce rglob_sys_str in common.py for pattern-based discovery
  • Update process_systems to accept optional pattern list
  • Expose rglob_patterns in argument checks and pass through entrypoints

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| deepmd/common.py | New rglob_sys_str function to match directories by glob list |
| deepmd/utils/data_system.py | Updated process_systems to take patterns and call rglob |
| deepmd/utils/argcheck.py | Added rglob_patterns Argument to training/validation schemas |
| deepmd/pt/entrypoints/main.py | Pass rglob_patterns into process_systems for both datasets |
Comments suppressed due to low confidence (5)

deepmd/common.py:210

  • The new function rglob_sys_str currently has no associated unit tests; consider adding tests to cover pattern matching and duplicate removal.
def rglob_sys_str(root_dir: str, patterns: list[str]) -> list[str]:

deepmd/utils/data_system.py:734

  • The signature uses Optional[list[str]] for patterns but Optional is not imported; please add from typing import Optional.
def process_systems(

deepmd/common.py:210

  • The parameter root_dir is annotated as str but docstring mentions Path; consider using Union[str, Path] for consistency.
def rglob_sys_str(root_dir: str, patterns: list[str]) -> list[str]:

deepmd/common.py:231

  • [nitpick] Converting to a set and back loses original ordering; consider deduplicating into a set and returning a sorted list for deterministic behavior.
    return list(set(matches))  # remove duplicates
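
The ordering nitpick can be sketched with plain lists. The paths are made-up examples; both options below remove duplicates and differ only in the order they guarantee.

```python
# Hypothetical matches in discovery order, with one duplicate.
matches = ["root/b/trn", "root/a/trn", "root/b/trn"]

# Deterministic alphabetical order, independent of discovery order.
sorted_unique = sorted(set(matches))

# Alternative: keep first-seen order; dict keys preserve insertion order.
first_seen_unique = list(dict.fromkeys(matches))
```

`list(set(matches))` alone gives no ordering guarantee, which can make the system list differ between runs.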

deepmd/pt/entrypoints/main.py:121

  • [nitpick] Use a keyword argument (patterns=val_patterns) when calling process_systems to match the earlier call and improve readability.
            validation_systems = process_systems(validation_systems, val_patterns)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/utils/data_system.py (1)

788-789: Apply minor style improvement suggested by static analysis.

The functionality is correct, but there's a minor style improvement that can be applied.

-    rglob_patterns = jdata.get("rglob_patterns", None)
+    rglob_patterns = jdata.get("rglob_patterns")

Since dict.get() returns None by default when the key doesn't exist, explicitly passing None as the default value is redundant.
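
A two-line check of that equivalence, using a hypothetical config dict in place of `jdata`:

```python
# Hypothetical config that does not define rglob_patterns.
jdata = {"systems": "root"}

explicit = jdata.get("rglob_patterns", None)
implicit = jdata.get("rglob_patterns")
```

Both calls return `None` when the key is absent, so the shorter form is a drop-in replacement.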

🧰 Tools
🪛 Ruff (0.11.9)

788-788: Use jdata.get("rglob_patterns") instead of jdata.get("rglob_patterns", None)

Replace jdata.get("rglob_patterns", None) with jdata.get("rglob_patterns")

(SIM910)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b70487 and 7c247d5.

📒 Files selected for processing (1)
  • deepmd/utils/data_system.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
deepmd/utils/data_system.py

788-788: Use jdata.get("rglob_patterns") instead of jdata.get("rglob_patterns", None)

Replace jdata.get("rglob_patterns", None) with jdata.get("rglob_patterns")

(SIM910)

🔇 Additional comments (2)
deepmd/utils/data_system.py (2)

734-761: LGTM: Well-designed function enhancement with backward compatibility.

The process_systems function enhancement correctly:

  • Maintains backward compatibility when patterns is None
  • Adds the new functionality when patterns are provided
  • Uses appropriate type hints and documentation
  • Follows a clear conditional logic flow

20-20: LGTM: Import addition for new functionality. (✅ Verification successful)

The import of rglob_sys_str is correctly added to support the new customizable rglob functionality.

However, let me verify that this function exists in the expected location:


🏁 Script executed:

#!/bin/bash
# Description: Verify that rglob_sys_str function exists in deepmd.common
# Expected: Function definition should be found

ast-grep --pattern $'def rglob_sys_str($$$) {
  $$$
}'

Length of output: 55


To ensure we’re importing a valid symbol, let’s search the repository for its definition:


🏁 Script executed:

#!/bin/bash
# Search for the definition of rglob_sys_str in the codebase
rg -n "def rglob_sys_str" .

Length of output: 120


Import Verified: rglob_sys_str Is Defined
Confirmed that rglob_sys_str is implemented at line 210 in deepmd/common.py. The import in deepmd/utils/data_system.py is correct—no further changes needed. LGTM.

@anyangml anyangml requested a review from njzjz May 27, 2025 08:05
Member

@njzjz njzjz left a comment

This PR lacks UT

@anyangml
Collaborator Author

This PR lacks UT

What do you think is the best way to handle the tests: a separate unit test (UT) for the individual function, or an end-to-end integration test in the PT training tests?
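
For the first option, a function-level unit test could look roughly like the sketch below. `rglob_systems` is a stand-in modeled on the `rglob_sys_str` body quoted earlier in this thread (with a sorted, deduplicated return as an assumption), and the class name and directory layout are hypothetical.

```python
import tempfile
import unittest
from pathlib import Path


def rglob_systems(root_dir, patterns):
    """Stand-in for rglob_sys_str: match directories that hold type.raw."""
    root = Path(root_dir)
    matches = []
    for pattern in patterns:
        matches.extend(
            str(d) for d in root.rglob(pattern) if (d / "type.raw").is_file()
        )
    return sorted(set(matches))


class TestRglobSystems(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.root = Path(self.tmp.name)
        for split in ("trn", "val"):
            system = self.root / "dataset_a" / split
            system.mkdir(parents=True)
            (system / "type.raw").touch()
        # A directory without type.raw must not be reported as a system.
        (self.root / "dataset_b" / "trn").mkdir(parents=True)

    def tearDown(self):
        self.tmp.cleanup()

    def test_only_matching_systems_are_returned(self):
        found = rglob_systems(self.root, ["dataset_*/trn"])
        self.assertEqual(found, [str(self.root / "dataset_a" / "trn")])

    def test_overlapping_patterns_do_not_duplicate(self):
        found = rglob_systems(self.root, ["trn", "dataset_*/trn"])
        self.assertEqual(len(found), 1)
```

A test at this level runs without a training loop, while an end-to-end test additionally covers the path from the input file through the entrypoint.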

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c247d5 and e621ce2.

📒 Files selected for processing (2)
  • deepmd/utils/argcheck.py (4 hunks)
  • source/tests/pt/test_training.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/utils/argcheck.py

@anyangml anyangml requested review from Copilot and njzjz June 3, 2025 06:34
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for customized recursive globbing (rglob) to enable flexible dataset construction by allowing users to specify custom glob patterns for training and validation data directories.

  • Introduced new parameters ("rglob_patterns") in data processing functions and argument configurations.
  • Added a new helper function (rglob_sys_str) in the common module.
  • Updated the main entry and tests to utilize the customizable rglob functionality.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| source/tests/pt/test_training.py | Added a test class for verifying customized rglob functionality. |
| deepmd/utils/data_system.py | Updated process_systems to accept custom glob patterns for system search. |
| deepmd/utils/argcheck.py | Added new argument definitions for rglob_patterns in training and validation. |
| deepmd/pt/entrypoints/main.py | Modified system processing calls to include custom glob patterns. |
| deepmd/common.py | Added the helper function rglob_sys_str for globbing with custom patterns. |

@njzjz njzjz added this pull request to the merge queue Jun 3, 2025
Merged via the queue into deepmodeling:devel with commit 31373bc Jun 3, 2025
60 checks passed
@anyangml anyangml deleted the feat/support-customized-rglob branch December 23, 2025 07:09
3 participants