Skip to content

feat: Validate PEFT target modules#1747

Merged
cuichenx merged 6 commits into
mainfrom
bugfix/lora-target-validation
Apr 28, 2026
Merged

feat: Validate PEFT target modules#1747
cuichenx merged 6 commits into
mainfrom
bugfix/lora-target-validation

Conversation

@yaoyu-33

@yaoyu-33 yaoyu-33 commented Dec 16, 2025

Copy link
Copy Markdown
Contributor

Summary

  • add alias tracking in ModuleMatcher so mis-typed target_modules raise clearer errors
  • propagate alias registration to CanonicalLoRA/DoRA and validate in PEFT base call
  • add regression test ensuring LoRA raises when a target module is missing

Summary by CodeRabbit

  • Bug Fixes

    • PEFT configurations now validate that all specified target modules exist in the model, raising an error with clear feedback when requested targets are missing.
  • Tests

    • Added validation test for missing target module handling in LoRA configurations.

@copy-pr-bot

copy-pr-bot Bot commented Dec 16, 2025

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 yaoyu-33 changed the title Validate PEFT target modules feat: Validate PEFT target modules Dec 16, 2025
@HollowMan6

Copy link
Copy Markdown
Member

Maybe this can be coordinated with #1799 so that Canonical LoRA also supports targeting at MLA linear layers.

yaoyu-33 added 2 commits April 2, 2026 16:48
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33 yaoyu-33 force-pushed the bugfix/lora-target-validation branch from 7f2fe2a to 5ec088e Compare April 2, 2026 23:49
@yaoyu-33

yaoyu-33 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 5ec088e

@coderabbitai

coderabbitai Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request introduces a validation mechanism for PEFT target module matching. The system now tracks which target modules successfully match during model traversal and validates that all requested targets are satisfied, raising an error if any fail to match.

Changes

Cohort / File(s) Summary
Module Matching Validation Infrastructure
src/megatron/bridge/peft/module_matcher.py, src/megatron/bridge/peft/base.py
Added internal tracking fields and helper methods to ModuleMatcher for recording successful target matches and validating all requested targets are matched. Updated PEFT.__call__() to reset match state before traversal and validate matches after transformation.
PEFT Subclass Initialization
src/megatron/bridge/peft/canonical_lora.py, src/megatron/bridge/peft/dora.py
Added super().__post_init__() calls to both CanonicalLoRA and DoRA to invoke parent-class initialization. Refactored CanonicalLoRA canonical mapping logic from suffix-based to substring-based component detection and introduced alias registration via register_target_alias().
Test Coverage
tests/unit_tests/peft/test_lora.py
Added test_lora_raises_when_target_missing test case to verify that LoRA raises ValueError when a target module name does not exist in the model.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning PR adds major PEFT validation changes but lacks documented test execution results, test output, or evidence that new tests pass. Include test execution results showing the new test passes and existing tests have no regressions.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: Validate PEFT target modules' directly reflects the main change: adding validation for PEFT target modules with alias tracking and error reporting.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/lora-target-validation

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/peft/base.py (1)

99-120: ⚠️ Potential issue | 🟠 Major

Preflight target validation before mutating the model.

Running _validate_target_matches() only after self._walk_model(model, self.transform) makes the error path unsafe: a config like ["linear_qkv", "typo"] will already have frozen/wrapped the valid targets before the ValueError is raised. It also breaks reapplying matcher-based PEFT configs, because CanonicalLoRA.transform() and DoRA.transform() both return early for already wrapped modules before calling match(), so the second pass records no hits and fails validation. A read-only match walk before freeze_model() / self.transform avoids both behaviors.

💡 Suggested approach
     def __call__(self, model: ModelType, training: bool = True) -> ModelType:
         if isinstance(self, ModuleMatcher):
             self._reset_target_match_state()
+
+            def _validate_only(
+                module: nn.Module, name: Optional[str] = None, prefix: Optional[str] = None
+            ) -> nn.Module:
+                self.match(module, name, prefix)
+                return module
+
+            self._walk_model(model, _validate_only)
+            self._validate_target_matches()
+            self._reset_target_match_state()
 
         self.freeze_model(model, training=training)
         self._walk_model(model, self.transform)
@@
-        if isinstance(self, ModuleMatcher):
-            self._validate_target_matches()
-
         return model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/peft/base.py` around lines 99 - 120, Run a read-only
preflight match pass for ModuleMatcher before mutating the model: if
isinstance(self, ModuleMatcher) call a non-mutating walk that invokes the
matcher (i.e., calls match() via the same walk logic) and then call
self._validate_target_matches() immediately after that preflight; only after
validation proceed to call self.freeze_model(model, training=training),
self._walk_model(model, self.transform) and the rest (including
maybe_enable_recompute_inputs_grad and setting training modes). This ensures
CanonicalLoRA.transform and DoRA.transform won’t short-circuit earlier matches
and prevents partial mutation before validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/peft/canonical_lora.py`:
- Around line 273-293: The code is doing substring checks and replace() across
the entire target string which can rewrite earlier path segments; instead split
the target into its leaf token (e.g., leaf = target.rsplit(".", 1)[-1] or use
the appropriate separator), perform the canonicalization on that leaf only to
set canonical_component and canonical_leaf (e.g., replace "linear_q" ->
"linear_qkv" on the leaf), then reconstruct canonical_target by joining the
original prefix (if any) with the canonical_leaf; finally call
register_target_alias(original_target, canonical_target) and update
self.canonical_mapping[canonical_target].add(canonical_component) so wildcard
patterns and parent path segments are preserved.

---

Outside diff comments:
In `@src/megatron/bridge/peft/base.py`:
- Around line 99-120: Run a read-only preflight match pass for ModuleMatcher
before mutating the model: if isinstance(self, ModuleMatcher) call a
non-mutating walk that invokes the matcher (i.e., calls match() via the same
walk logic) and then call self._validate_target_matches() immediately after that
preflight; only after validation proceed to call self.freeze_model(model,
training=training), self._walk_model(model, self.transform) and the rest
(including maybe_enable_recompute_inputs_grad and setting training modes). This
ensures CanonicalLoRA.transform and DoRA.transform won’t short-circuit earlier
matches and prevents partial mutation before validation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b11d6812-d386-4063-a823-e2132ce8a3f6

📥 Commits

Reviewing files that changed from the base of the PR and between 03c137d and 5ec088e.

📒 Files selected for processing (5)
  • src/megatron/bridge/peft/base.py
  • src/megatron/bridge/peft/canonical_lora.py
  • src/megatron/bridge/peft/dora.py
  • src/megatron/bridge/peft/module_matcher.py
  • tests/unit_tests/peft/test_lora.py

Comment thread src/megatron/bridge/peft/canonical_lora.py
yaoyu-33 and others added 2 commits April 16, 2026 12:25
Move target-module validation to a read-only preflight pass that runs
before freeze_model/transform. This fixes two issues:

1. Idempotent re-application: transform() returns early for
   already-wrapped modules without calling match(), so post-transform
   validation saw zero matches and raised ValueError on second pass.

2. Partial mutation on error: the previous flow froze and wrapped valid
   targets before discovering unmatched ones, leaving the model in a
   half-transformed state.

The preflight walk calls match() on every module (name-based, type-
independent) so it correctly records hits even for wrapped modules.

Also fixes test_canonical_lora_training_vs_inference_mode which used
default CanonicalLoRA targets on SimpleModel (individual layers, not
fused linear_qkv/linear_fc1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/claude review

@yaoyu-33 yaoyu-33 added the needs-more-tests Requires additional L0 and L1 test coverage before merge label Apr 16, 2026
Comment thread src/megatron/bridge/peft/canonical_lora.py Outdated
Comment thread tests/unit_tests/peft/test_lora.py
yaoyu-33 and others added 2 commits April 16, 2026 14:02
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Add regression test ensuring CanonicalLoRA raises ValueError when a
target module is not found in the model, covering the alias→pattern→match
chain end-to-end (complements the existing LoRA equivalent).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test d666375

@cuichenx cuichenx merged commit 1396c27 into main Apr 28, 2026
74 checks passed
@cuichenx cuichenx deleted the bugfix/lora-target-validation branch April 28, 2026 00:54
ko3n1g pushed a commit that referenced this pull request Apr 29, 2026
Signed-off-by: Chen Cui <chcui@nvidia.com>
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
…A-NeMo#3583)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-more-tests Requires additional L0 and L1 test coverage before merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants