Feat linear cross entropy kernel dev #2

Merged
Jianbing-D merged 3 commits into feat-torch-interface from feat-linear-cross-entropy-kernel-dev on Nov 6, 2025

Conversation

Jianbing-D (Owner) commented Nov 6, 2025

Implemented the critical kernels with cutedsl.

  • Only DP was tested.
  • For the backward pass, only the partial_dlogits kernel is available so far.
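
For context, a minimal pure-Python sketch of what a fused linear cross entropy computes for a single token, and of the gradient with respect to the logits that a dlogits kernel would produce. This is a reference under stated assumptions, not the cutedsl kernels from this PR: the function name `linear_cross_entropy` and its arguments are hypothetical, and the meaning of "partial" in `partial_dlogits` is not specified here, so the sketch computes the full dlogits.

```python
import math

def linear_cross_entropy(hidden, weight, label):
    """Hypothetical reference: loss and dlogits for one token.

    hidden: list of H floats (the token's hidden state)
    weight: list of V rows, each a list of H floats (the projection matrix)
    label:  index of the target class in [0, V)
    """
    # Linear projection: logits[j] = dot(hidden, weight[j])
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]

    # Numerically stable log-sum-exp: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)

    # Cross entropy: logsumexp(logits) - logits[label]
    loss = m + math.log(total) - logits[label]

    # Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(label)
    dlogits = [e / total - (1.0 if j == label else 0.0)
               for j, e in enumerate(exps)]
    return loss, dlogits
```

A fused kernel would typically avoid materializing the full logits in memory; the sketch keeps them explicit only for readability.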

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Jianbing-D (Owner, Author) commented

Perf and storage: [image]

@Jianbing-D Jianbing-D merged commit ee8fd21 into feat-torch-interface Nov 6, 2025
Jianbing-D added a commit that referenced this pull request Nov 14, 2025
* add forward-mainloop and bwd_partial_dlogits kernel

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>

* skip TestFusedLinearCrossEntropyOnGptModel for single GPU

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>

* added unit-test for linear_cross_entropy on dp

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>

---------

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>