Skip to content

feat(gds): add gds_path_sharding config for multi-path strategy#2922

Merged
deng451e merged 1 commit intoLMCache:devfrom
glimchb:gds_path_sharding
Apr 4, 2026
Merged

feat(gds): add gds_path_sharding config for multi-path strategy#2922
deng451e merged 1 commit intoLMCache:devfrom
glimchb:gds_path_sharding

Conversation

@glimchb
Copy link
Copy Markdown
Contributor

@glimchb glimchb commented Mar 31, 2026

What this PR does / why we need it:

Add a top-level gds_path_sharding config field (default: "by_gpu")
that controls how GPUs are assigned to storage paths when multiple
comma-separated paths are provided in gds_path. This replaces the
previously hardcoded by_gpu logic with an explicit, extensible setting.

Currently only "by_gpu" is supported (selects path via
device_id % num_paths); unsupported values raise AssertionError.

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
Adds a new top-level config/env var that affects GDS path selection and introduces runtime validation that will assert on unsupported values, potentially breaking misconfigured deployments.

Overview
Makes multi-path GDS selection explicitly configurable via new gds_path_sharding (env: LMCACHE_GDS_PATH_SHARDING) with default "by_gpu", replacing previously hardcoded GPU-ID modulo behavior with a validated strategy.

Updates GDS backend initialization to read/validate the sharding strategy (currently only by_gpu supported) and extends unit tests and docs to cover the default, explicit setting, and failure on unsupported values.

Written by Cursor Bugbot for commit 2a90190. This will update automatically on new commits. Configure here.

@glimchb glimchb force-pushed the gds_path_sharding branch from e68a869 to 3f09875 Compare March 31, 2026 15:47
@glimchb glimchb changed the title Gds path sharding feat(gds): add gds_path_sharding config for multi-path strategy Mar 31, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the GDS (GPU Direct Storage) configuration by renaming cufile_buffer_size to gds_buffer_size and introducing explicit fields for use_gds, gds_backend, and gds_path_sharding. These changes are reflected across the documentation, configuration schema, and storage backend implementation. Feedback was provided regarding a violation of the style guide's rule against cross-class private member access when checking if a configuration key was explicitly set by the user.

Comment on lines +250 to +251
user_set_keys: set[str] = getattr(config, "_user_set_keys", set())
use_gds_explicitly_set = "use_gds" in user_set_keys
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Accessing the _user_set_keys attribute directly violates the style guide's rule against accessing private members (_-prefixed attributes) across class boundaries. This creates a tight coupling between GdsBackend and the internal implementation of LMCacheEngineConfig, making the code less maintainable.

A better approach would be to add a public method to LMCacheEngineConfig to encapsulate this logic. This can be done by defining a helper function and adding it to the namespace_extras dictionary in create_config_class in lmcache/v1/config.py:

# In lmcache/v1/config.py

def _is_user_set(self, key: str) -> bool:
    """Check if a configuration key was explicitly set by the user."""
    return key in self._user_set_keys

# ... later, in create_config_class call:
LMCacheEngineConfig = create_config_class(
    # ...
    namespace_extras={
        "is_user_set": _is_user_set,
        # ... other methods
    },
)

Then, you could use it here like this:

Suggested change
user_set_keys: set[str] = getattr(config, "_user_set_keys", set())
use_gds_explicitly_set = "use_gds" in user_set_keys
use_gds_explicitly_set = config.is_user_set("use_gds")
References
  1. The code accesses a private attribute _user_set_keys of the LMCacheEngineConfig object from within the GdsBackend class. This violates the style guide rule on line 27: 'No private member access (_-prefixed attributes) across class boundaries'. (link)

@glimchb glimchb force-pushed the gds_path_sharding branch from 3f09875 to 3434b0a Compare April 2, 2026 13:50
@glimchb glimchb marked this pull request as ready for review April 2, 2026 13:50
Add a top-level `gds_path_sharding` config field (default: "by_gpu")
that controls how GPUs are assigned to storage paths when multiple
comma-separated paths are provided in `gds_path`. This replaces the
previously hardcoded by_gpu logic with an explicit, extensible setting.

Currently only "by_gpu" is supported (selects path via
`device_id % num_paths`); unsupported values raise AssertionError.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com>
@glimchb glimchb force-pushed the gds_path_sharding branch from 3434b0a to 2a90190 Compare April 2, 2026 13:55
@glimchb glimchb closed this Apr 2, 2026
@glimchb glimchb reopened this Apr 2, 2026
@sammshen
Copy link
Copy Markdown
Contributor

sammshen commented Apr 3, 2026

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Copy link
Copy Markdown
Collaborator

@DongDongJu DongDongJu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its basically same PR like #2418.
@glimchb Could you left some performance number here for the reference to others?
otherwise LGTM.

@glimchb
Copy link
Copy Markdown
Contributor Author

glimchb commented Apr 3, 2026

its basically same PR like #2418. @glimchb Could you left some performance number here for the reference to others? otherwise LGTM.

sure @DongDongJu. The performance numbers already committed in the docs/source/kv_cache/storage_backends/gds.rst, see:

**Why this helps:** a single PCIe Gen 4 x4 NVMe tops out at ~7 GB/s. With four
drives the aggregate bandwidth can reach ~28 GB/s, matching what multi-GPU
systems need for KV cache eviction and prefetch.

Copy link
Copy Markdown
Collaborator

@deng451e deng451e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@deng451e deng451e added the full Run comprehensive tests on this PR label Apr 3, 2026
@deng451e deng451e merged commit 8baa42d into LMCache:dev Apr 4, 2026
70 of 72 checks passed
@glimchb glimchb deleted the gds_path_sharding branch April 4, 2026 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants