Use f32 scratch for output so we only need to transfer output with desired dtype back to HBM. #8924

Merged
vanbasten23 merged 4 commits into master from xiowei/migrate_kernel_change on Apr 3, 2025

Conversation

Collaborator

vanbasten23 commented on Apr 2, 2025

Bring the kernel code in sync with JAX by migrating jax-ml/jax@b719ac0.
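For readers unfamiliar with the idea in the PR title, here is a minimal sketch of the general technique: accumulate in a float32 scratch buffer and cast to the desired output dtype only once at the end, so only the narrower result has to be written back. This is not the migrated kernel; it uses plain `jax.numpy` rather than Pallas, and the function names `blockwise_sum_f32_scratch` and `blockwise_sum_low_precision` are hypothetical, chosen only for illustration.

```python
import jax.numpy as jnp


def blockwise_sum_f32_scratch(blocks, out_dtype=jnp.bfloat16):
    """Sum along axis 0 using a float32 accumulator, casting once at the end."""
    # Hypothetical stand-in for a kernel scratch buffer: kept in float32.
    acc = jnp.zeros(blocks.shape[1:], dtype=jnp.float32)
    for i in range(blocks.shape[0]):
        acc = acc + blocks[i].astype(jnp.float32)
    # Only this final array, in the desired dtype, would be transferred back.
    return acc.astype(out_dtype)


def blockwise_sum_low_precision(blocks, out_dtype=jnp.bfloat16):
    """Same sum, but accumulating directly in the output dtype (for contrast)."""
    acc = jnp.zeros(blocks.shape[1:], dtype=out_dtype)
    for i in range(blocks.shape[0]):
        acc = acc + blocks[i].astype(out_dtype)
    return acc


if __name__ == "__main__":
    ones = jnp.ones((512, 4), dtype=jnp.bfloat16)
    print(blockwise_sum_f32_scratch(ones))
    print(blockwise_sum_low_precision(ones))
```

Both functions return an array in `out_dtype`, but the second one rounds after every addition, so with bfloat16 its running total can stall well below the true sum once the increments become smaller than the accumulator's precision.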

@vanbasten23
Collaborator Author

cc: @bythew3i

vanbasten23 marked this pull request as ready for review on April 2, 2025, 18:09
Contributor

bythew3i left a comment

Thanks Xiongfei!

@vanbasten23
Collaborator Author

The CI failure seems legit. Taking a look.

Collaborator

yaochengji left a comment

LGTM, thanks!

@vanbasten23
Collaborator Author

Thanks for the review! Let me wait for the TPU CI.

@vanbasten23
Collaborator Author

The GPU CI failure is unrelated to this PR.

vanbasten23 merged commit f0881b5 into master on Apr 3, 2025
22 of 23 checks passed