Unify the return type of w8a8 matmul between fallback and the actual impl. #9452

Merged
vanbasten23 merged 5 commits into master from xiowei/fix_nonxla_return_type on Jul 10, 2025

Conversation

vanbasten23 (Collaborator) commented Jul 8, 2025

We need to unify the return type between the fallback and the actual implementation. Specifically, in the fallback impl quantized_matmul_int8_non_xla, if we don't specify a dtype, the result defaults to torch.float32. This can cause issues in vLLM.
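For illustration only, here is a minimal sketch of the unification idea (the function name, signature, and transpose convention below are assumptions, not the actual PR diff): the fallback computes in torch.float32, and without an explicit cast that dtype leaks to callers such as vLLM, so the sketch casts the result back to the activation dtype to match the actual kernel's return type.

```python
import torch

def quantized_matmul_int8_fallback(x, w_int8, scale):
    # x: activation (e.g. torch.bfloat16), shape [batch, in_features]
    # w_int8: quantized int8 weight, shape [out_features, in_features]
    # scale: per-output-channel dequantization scale, shape [out_features]
    out = torch.matmul(x.to(torch.float32), w_int8.to(torch.float32).t())
    out = out * scale
    # The fix: return the activation dtype instead of the default
    # torch.float32, so the fallback matches the actual kernel's return type.
    return out.to(x.dtype)

x = torch.randn(4, 16, dtype=torch.bfloat16)
w = torch.randint(-128, 128, (8, 16), dtype=torch.int8)
scale = torch.rand(8)
assert quantized_matmul_int8_fallback(x, w, scale).dtype == torch.bfloat16
```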

Test plan:

  • python pytorch/xla/test/test_pallas.py -k test_quantized_matmul_int8
  • pytest pytorch/xla/test/test_quantized_matmul_pallas_kernel.py -s

vanbasten23 requested a review from yaochengji on July 8, 2025 22:10
vanbasten23 marked this pull request as ready for review on July 8, 2025 22:10
yaochengji (Collaborator) left a comment

LGTM, thanks for fixing this!

vanbasten23 force-pushed the xiowei/fix_nonxla_return_type branch from e527723 to 9c2c0b2 on July 9, 2025 20:07
vanbasten23 force-pushed the xiowei/fix_nonxla_return_type branch from 8f36349 to aac2352 on July 10, 2025 00:01
vanbasten23 (Collaborator, Author)

Thanks for the review!

vanbasten23 merged commit 52569ec into master on Jul 10, 2025
23 of 24 checks passed