feat: Validate PEFT target modules by yaoyu-33 · Pull Request #1747 · NVIDIA-NeMo/Megatron-Bridge

yaoyu-33 · 2025-12-16T19:54:28Z

Summary

add alias tracking in ModuleMatcher so mis-typed target_modules raise clearer errors
propagate alias registration to CanonicalLoRA/DoRA and validate in PEFT base call
add regression test ensuring LoRA raises when a target module is missing

Summary by CodeRabbit

Bug Fixes
- PEFT configurations now validate that all specified target modules exist in the model, raising an error with clear feedback when requested targets are missing.
Tests
- Added validation test for missing target module handling in LoRA configurations.

copy-pr-bot · 2025-12-16T19:54:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

HollowMan6 · 2025-12-23T19:41:26Z

Maybe this can be coordinated with #1799 so that Canonical LoRA also supports targeting at MLA linear layers.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-02T23:49:09Z

/ok to test 5ec088e

coderabbitai · 2026-04-02T23:59:16Z

📝 Walkthrough

Walkthrough

This pull request introduces a validation mechanism for PEFT target module matching. The system now tracks which target modules successfully match during model traversal and validates that all requested targets are satisfied, raising an error if any fail to match.

Changes

Cohort / File(s)	Summary
Module Matching Validation Infrastructure `src/megatron/bridge/peft/module_matcher.py`, `src/megatron/bridge/peft/base.py`	Added internal tracking fields and helper methods to `ModuleMatcher` for recording successful target matches and validating all requested targets are matched. Updated `PEFT.__call__()` to reset match state before traversal and validate matches after transformation.
PEFT Subclass Initialization `src/megatron/bridge/peft/canonical_lora.py`, `src/megatron/bridge/peft/dora.py`	Added `super().__post_init__()` calls to both `CanonicalLoRA` and `DoRA` to invoke parent-class initialization. Refactored `CanonicalLoRA` canonical mapping logic from suffix-based to substring-based component detection and introduced alias registration via `register_target_alias()`.
Test Coverage `tests/unit_tests/peft/test_lora.py`	Added `test_lora_raises_when_target_missing` test case to verify that `LoRA` raises `ValueError` when a target module name does not exist in the model.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Results For Major Changes	⚠️ Warning	PR adds major PEFT validation changes but lacks documented test execution results, test output, or evidence that new tests pass.	Include test execution results showing the new test passes and existing tests have no regressions.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: Validate PEFT target modules' directly reflects the main change: adding validation for PEFT target modules with alias tracking and error reporting.
Docstring Coverage	✅ Passed	Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bugfix/lora-target-validation

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/megatron/bridge/peft/base.py (1)

99-120: ⚠️ Potential issue | 🟠 Major

Preflight target validation before mutating the model.

Running _validate_target_matches() only after self._walk_model(model, self.transform) makes the error path unsafe: a config like ["linear_qkv", "typo"] will already have frozen/wrapped the valid targets before the ValueError is raised. It also breaks reapplying matcher-based PEFT configs, because CanonicalLoRA.transform() and DoRA.transform() both return early for already wrapped modules before calling match(), so the second pass records no hits and fails validation. A read-only match walk before freeze_model() / self.transform avoids both behaviors.

💡 Suggested approach

     def __call__(self, model: ModelType, training: bool = True) -> ModelType:
         if isinstance(self, ModuleMatcher):
             self._reset_target_match_state()
+
+            def _validate_only(
+                module: nn.Module, name: Optional[str] = None, prefix: Optional[str] = None
+            ) -> nn.Module:
+                self.match(module, name, prefix)
+                return module
+
+            self._walk_model(model, _validate_only)
+            self._validate_target_matches()
+            self._reset_target_match_state()
 
         self.freeze_model(model, training=training)
         self._walk_model(model, self.transform)
@@
-        if isinstance(self, ModuleMatcher):
-            self._validate_target_matches()
-
         return model

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/peft/base.py` around lines 99 - 120, Run a read-only
preflight match pass for ModuleMatcher before mutating the model: if
isinstance(self, ModuleMatcher) call a non-mutating walk that invokes the
matcher (i.e., calls match() via the same walk logic) and then call
self._validate_target_matches() immediately after that preflight; only after
validation proceed to call self.freeze_model(model, training=training),
self._walk_model(model, self.transform) and the rest (including
maybe_enable_recompute_inputs_grad and setting training modes). This ensures
CanonicalLoRA.transform and DoRA.transform won’t short-circuit earlier matches
and prevents partial mutation before validation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/megatron/bridge/peft/canonical_lora.py`:
- Around line 273-293: The code is doing substring checks and replace() across
the entire target string which can rewrite earlier path segments; instead split
the target into its leaf token (e.g., leaf = target.rsplit(".", 1)[-1] or use
the appropriate separator), perform the canonicalization on that leaf only to
set canonical_component and canonical_leaf (e.g., replace "linear_q" ->
"linear_qkv" on the leaf), then reconstruct canonical_target by joining the
original prefix (if any) with the canonical_leaf; finally call
register_target_alias(original_target, canonical_target) and update
self.canonical_mapping[canonical_target].add(canonical_component) so wildcard
patterns and parent path segments are preserved.

---

Outside diff comments:
In `@src/megatron/bridge/peft/base.py`:
- Around line 99-120: Run a read-only preflight match pass for ModuleMatcher
before mutating the model: if isinstance(self, ModuleMatcher) call a
non-mutating walk that invokes the matcher (i.e., calls match() via the same
walk logic) and then call self._validate_target_matches() immediately after that
preflight; only after validation proceed to call self.freeze_model(model,
training=training), self._walk_model(model, self.transform) and the rest
(including maybe_enable_recompute_inputs_grad and setting training modes). This
ensures CanonicalLoRA.transform and DoRA.transform won’t short-circuit earlier
matches and prevents partial mutation before validation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b11d6812-d386-4063-a823-e2132ce8a3f6

📥 Commits

Reviewing files that changed from the base of the PR and between 03c137d and 5ec088e.

📒 Files selected for processing (5)

src/megatron/bridge/peft/base.py
src/megatron/bridge/peft/canonical_lora.py
src/megatron/bridge/peft/dora.py
src/megatron/bridge/peft/module_matcher.py
tests/unit_tests/peft/test_lora.py

Move target-module validation to a read-only preflight pass that runs before freeze_model/transform. This fixes two issues: 1. Idempotent re-application: transform() returns early for already-wrapped modules without calling match(), so post-transform validation saw zero matches and raised ValueError on second pass. 2. Partial mutation on error: the previous flow froze and wrapped valid targets before discovering unmatched ones, leaving the model in a half-transformed state. The preflight walk calls match() on every module (name-based, type- independent) so it correctly records hits even for wrapped modules. Also fixes test_canonical_lora_training_vs_inference_mode which used default CanonicalLoRA targets on SimpleModel (individual layers, not fused linear_qkv/linear_fc1). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-16T20:59:31Z

/claude review

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

Add regression test ensuring CanonicalLoRA raises ValueError when a target module is not found in the model, covering the alias→pattern→match chain end-to-end (complements the existing LoRA equivalent). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-16T21:10:27Z

/ok to test d666375

Signed-off-by: Chen Cui <chcui@nvidia.com>

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

…A-NeMo#3583) Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

yaoyu-33 changed the title ~~Validate PEFT target modules~~ feat: Validate PEFT target modules Dec 16, 2025

yaoyu-33 mentioned this pull request Dec 17, 2025

[LoRA/PEFT] misspecified target_modules fails silently #1738

Closed

yaoyu-33 added 2 commits April 2, 2026 16:48

Validate PEFT target modules and add LoRA test

900a75c

lint

5ec088e

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 force-pushed the bugfix/lora-target-validation branch from 7f2fe2a to 5ec088e Compare April 2, 2026 23:49

copy-pr-bot Bot temporarily deployed to test April 2, 2026 23:49 Inactive

coderabbitai Bot reviewed Apr 2, 2026

View reviewed changes

Comment thread src/megatron/bridge/peft/canonical_lora.py

yaoyu-33 and others added 2 commits April 16, 2026 12:25

Merge branch 'main' into bugfix/lora-target-validation

5cc991e

yaoyu-33 added the needs-more-tests Requires additional L0 and L1 test coverage before merge label Apr 16, 2026

claude Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread src/megatron/bridge/peft/canonical_lora.py Outdated

claude Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread tests/unit_tests/peft/test_lora.py

yaoyu-33 and others added 2 commits April 16, 2026 14:02

Update src/megatron/bridge/peft/canonical_lora.py

4400283

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to test April 16, 2026 21:11 Inactive

cuichenx approved these changes Apr 28, 2026

View reviewed changes

cuichenx merged commit 1396c27 into main Apr 28, 2026
74 checks passed

cuichenx deleted the bugfix/lora-target-validation branch April 28, 2026 00:54

cuichenx mentioned this pull request Apr 29, 2026

[peft] revert: Validate PEFT target modules (#1747) #3583

Merged

2 tasks

ko3n1g pushed a commit that referenced this pull request Apr 29, 2026

[peft] revert: Validate PEFT target modules (#1747) (#3583)

9adb642

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx mentioned this pull request Apr 29, 2026

[peft] feat: validate PEFT target modules with deferred init #3587

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Validate PEFT target modules#1747

feat: Validate PEFT target modules#1747
cuichenx merged 6 commits into
mainfrom
bugfix/lora-target-validation

yaoyu-33 commented Dec 16, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

copy-pr-bot Bot commented Dec 16, 2025

Uh oh!

HollowMan6 commented Dec 23, 2025

Uh oh!

yaoyu-33 commented Apr 2, 2026

Uh oh!

coderabbitai Bot commented Apr 2, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

yaoyu-33 commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

yaoyu-33 commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yaoyu-33 commented Dec 16, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Dec 16, 2025

Uh oh!

HollowMan6 commented Dec 23, 2025

Uh oh!

yaoyu-33 commented Apr 2, 2026

Uh oh!

coderabbitai Bot commented Apr 2, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

yaoyu-33 commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

yaoyu-33 commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yaoyu-33 commented Dec 16, 2025 •

edited by coderabbitai Bot

Loading