
[IR] Support integer subbyte#403

Merged
xiaocenxiaocen merged 4 commits into hidet-org:main from xiaocenxiaocen:support-integer-subbyte
Jan 26, 2024

Conversation

@xiaocenxiaocen
Contributor

  • Support sub-byte integers in Hidet:

```python
a = register_tensor("int4b", [4, 4])  # a 4x4 tensor of 4-bit signed integers
b = a[0, 2]                           # read an element
a[2, 2] = int4b(-5)                   # write an element
ptr = &a[0, 0]                        # take the address of the first element
ptr = ptr + 8                         # pointer arithmetic on a sub-byte pointer
```
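The snippet above is Hidet-script pseudocode from the PR description. As a minimal illustration of the underlying storage idea (two signed 4-bit values packed per byte, with sign extension on read) — a sketch in plain Python, not Hidet's actual implementation:

```python
def pack_int4(lo: int, hi: int) -> int:
    """Pack two signed 4-bit values (each in -8..7) into one byte."""
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return (lo & 0xF) | ((hi & 0xF) << 4)

def unpack_int4(byte: int) -> tuple[int, int]:
    """Unpack one byte into two signed 4-bit values, sign-extending each nibble."""
    def sign_extend(v: int) -> int:
        return v - 16 if v >= 8 else v
    return sign_extend(byte & 0xF), sign_extend((byte >> 4) & 0xF)

# pack_int4(-5, 3) stores -5 in the low nibble and 3 in the high nibble;
# unpack_int4 recovers both values.
```

This is why element addressing for `int4b` cannot be plain byte addressing: two logical elements share one physical byte, so reads and writes go through mask-and-shift sequences like the ones above.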

@yaoyaoding
Member

Hi @xiaocenxiaocen, let me know when the PR is ready to be reviewed, thanks!

@yaoyaoding yaoyaoding changed the title [Ir] support integer subbyte [IR] Support integer subbyte Jan 9, 2024
@xiaocenxiaocen
Contributor Author

> Hi @xiaocenxiaocen, let me know when the PR is ready to be reviewed, thanks!

Sure. I will work on this over this week and the next.

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch from a2b7795 to 803b1c2 Compare January 20, 2024 15:47
@xiaocenxiaocen
Contributor Author

Hi, @yaoyaoding. This PR is ready for review. Please take a look at it. Thanks.

@xiaocenxiaocen
Contributor Author

hidet-ci launch

Member

@yaoyaoding yaoyaoding left a comment


Thanks @xiaocenxiaocen for the support of sub-integer type!

It looks good to me overall. I left some minor suggestions to make some parts more consistent with the existing implementation (like the data type).

Feel free to merge this PR by yourself after you resolve those comments.

@xiaocenxiaocen
Contributor Author

$hidet-ci launch

@hjjq
Collaborator

hjjq commented Jan 25, 2024

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch 2 times, most recently from 25e9a56 to 87cc2b7 Compare January 25, 2024 22:42
@xiaocenxiaocen
Contributor Author

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch from 87cc2b7 to 7121c88 Compare January 26, 2024 01:23
@xiaocenxiaocen
Contributor Author

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen merged commit 8befb62 into hidet-org:main Jan 26, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 19, 2024
1. Added `torch.Tensor.as_strided` and `torch.flip`
2. Added support for `rounding_mode == 'trunc'` in `torch.divide`
3. Registered `torch.new_ones`
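The `rounding_mode='trunc'` semantics mentioned in point 2 round the quotient toward zero, unlike Python's floor division, which rounds toward negative infinity. A small pure-Python sketch of the difference (an illustration of the semantics, not the Hidet or PyTorch implementation):

```python
import math

def divide_trunc(a: float, b: float) -> int:
    # rounding_mode='trunc': round the exact quotient toward zero
    return math.trunc(a / b)

def divide_floor(a: float, b: float) -> int:
    # rounding_mode='floor': round the exact quotient toward negative infinity
    return math.floor(a / b)

# The two modes agree for positive quotients but differ for negative ones:
# -7 / 2 = -3.5  ->  trunc gives -3, floor gives -4
```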




Longformer model compilation fails with:
```
RuntimeError: cudaDeviceSynchronize failed with error: cudaErrorMisalignedAddress
```
after running the `fused_matmul_f16_pk_cute_rearrange_add` kernel. Nvidia Nsight Compute also shows that the matmul kernel fails to launch. This PR contains all changes needed to reproduce this issue.

To reproduce:
1. Check out the `zhumakhan/longformer` branch
2. Run `python3 tests/benchmarks/bench_transformer.py longformer`

---------

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail,.com>
vadiklyutiy pushed a commit that referenced this pull request Dec 20, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024

3 participants