Add worker and policy hooks for cross-tokenizer distillation flow by avenkateshha · Pull Request #2 · avenkateshha/RL

avenkateshha · 2026-04-12T20:16:48Z

Extend LMPolicy and DTensorPolicyWorkerV2 with teacher-forward and cross-tokenizer state update paths while preserving the current worker architecture and IPC-based distillation flow.

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

- Make sure you have read and followed the Contributor guidelines.
- Did you write any new necessary tests?
- Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
- Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Extend LMPolicy and DTensorPolicyWorkerV2 with teacher-forward and cross-tokenizer state update paths while preserving the current worker architecture and IPC-based distillation flow.

Made-with: Cursor

github-actions · 2026-04-27T04:52:16Z

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or push an update, or this will be closed in 7 days.

github-actions · 2026-05-04T05:01:05Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

@RayenTian

PR NVIDIA-NeMo#2508 review (@RayenTian):

- #2: Fold data["sample_mask"] into the gold-loss path's valid-chunk mask (chunk_mask & sample_mask.bool().unsqueeze(-1)) so that samples with loss_multiplier=0 stop contributing to KL-on-common, L1-on-uncommon, top-1 accuracy, and the returned valid-count. This mirrors the P-KL path.
- #3: Size both projection-matrix axes from the configured tokenizer vocabs (student and teacher) rather than from max(observed_idx) + 1. CrossTokenizerDistillationLossConfig declares student_vocab_size and teacher_vocab_size; xtoken_distillation.setup() injects both at runtime from len(student_tokenizer) and len(teacher_tokenizer). get_sparse_projection_matrix now takes both as keyword-only arguments and clamps V_s and V_t up against the projection's observed maxima as a defensive fallback. The keyword-only signature guards against accidentally swapping the two same-magnitude integers positionally.

Signed-off-by: Adithya Hanasoge <avenkateshha@nvidia.com>
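The two review fixes can be illustrated with a small, dependency-free sketch. This is not the PR's code, and it makes several assumptions. Plain nested lists stand in for torch tensors. The projection is assumed to have student rows and teacher columns. The names `sketch_valid_mask`, `inject_vocab_sizes`, and `sketch_sparse_projection` are hypothetical stand-ins for the gold-loss mask logic, the injection in `xtoken_distillation.setup()`, and `get_sparse_projection_matrix`, respectively; the real signatures are not shown here. In this sketch, `max(observed_idx) + 1` acts only as a lower bound on each axis, not as the size itself.

```python
from typing import Dict, List, Sequence, Tuple


def sketch_valid_mask(
    chunk_mask: List[List[bool]], sample_mask: Sequence[float]
) -> List[List[bool]]:
    # Plain-Python equivalent of `chunk_mask & sample_mask.bool().unsqueeze(-1)`:
    # chunk_mask is [batch][chunks] and sample_mask is [batch]. Each sample's
    # flag is broadcast over all of its chunks, so a sample with
    # loss_multiplier == 0 contributes no valid chunks.
    return [
        [bool(c) and bool(s) for c in row]
        for row, s in zip(chunk_mask, sample_mask)
    ]


def inject_vocab_sizes(loss_cfg: dict, student_tokenizer, teacher_tokenizer) -> dict:
    # Hypothetical stand-in for the runtime injection in setup(): both sizes
    # come from len(tokenizer), not from indices observed in the projection.
    return {
        **loss_cfg,
        "student_vocab_size": len(student_tokenizer),
        "teacher_vocab_size": len(teacher_tokenizer),
    }


def sketch_sparse_projection(
    entries: List[Tuple[int, int, float]],
    *,  # keyword-only: the two vocab sizes cannot be swapped positionally
    student_vocab_size: int,
    teacher_vocab_size: int,
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    # Assumed layout: rows index the student vocab, columns the teacher vocab.
    # entries holds (student_id, teacher_id, weight) triples in COO form.
    max_s = max((s for s, _, _ in entries), default=-1)
    max_t = max((t for _, t, _ in entries), default=-1)
    # Size from the configured vocabs, clamping *up* only if an observed
    # index would otherwise fall outside them (defensive fallback).
    v_s = max(student_vocab_size, max_s + 1)
    v_t = max(teacher_vocab_size, max_t + 1)
    return (v_s, v_t), {(s, t): w for s, t, w in entries}


if __name__ == "__main__":
    mask = sketch_valid_mask([[True, True], [True, False]], [1.0, 0.0])
    # The second sample (multiplier 0) drops out of the valid count.
    print(mask, sum(map(sum, mask)))  # prints "[[True, True], [False, False]] 2"
    cfg = inject_vocab_sizes({}, ["a", "b", "c", "d"], list(range(6)))
    shape, _ = sketch_sparse_projection(
        [(1, 2, 1.0)],
        student_vocab_size=cfg["student_vocab_size"],
        teacher_vocab_size=cfg["teacher_vocab_size"],
    )
    print(shape)  # prints "(4, 6)", not (2, 3) as max(observed_idx) + 1 would give
```

Because of the bare `*` in the signature, a positional call such as `sketch_sparse_projection(entries, 4, 6)` raises `TypeError`. Without it, the call could silently build a transposed shape.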


add worker and policy hooks for cross-tokenizer distillation flow

15806fa

github-actions Bot added the Stale label Apr 27, 2026

github-actions Bot closed this May 4, 2026

avenkateshha wants to merge 1 commit into xtoken/stack-pr2-loss from xtoken/stack-pr3-worker-policy
