
Fix: reduce memory overuse and fix poll cqe hang #60

Merged: isytwu merged 4 commits into main from fix_train_cqe_error_rebase on Sep 3, 2025

isytwu (Collaborator) commented on Sep 3, 2025:

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

isytwu requested reviews from TianDi101 and jhchouuu on September 3, 2025, 09:40.

TianDi101 (Collaborator) commented:

LGTM

isytwu merged commit e99ef9e into main on Sep 3, 2025.

jhchouuu added a commit that referenced this pull request on Sep 3, 2025:

* poll cqe hang WA

* fix internode segFault

* reduce memory overuse

* add ShmemQuietThread in dispatch

---------

Co-authored-by: jhchouuu <jiahzhou@amd.com>