fix vulkan ggml_acc only works in 3d but not 4d#19426
fix vulkan ggml_acc only works in 3d but not 4d#194260cc4m merged 10 commits intoggml-org:masterfrom
Conversation
|
The latest commit seems to fix the potential problems pointed out by Jeff |
|
The metal failures will be resolved with #19427 after merging this PR. |
|
Added TODO for the CUDA implementation to support this |
| if (ox < p.ne10 && oy < p.ne11 && oz < p.ne12) { | ||
| data_d[get_doffset() + dst_idx(i00, i01, i02, i03)] = D_TYPE(FLOAT_TYPE(data_a[get_aoffset() + src0_idx(i00, i01, i02, i03)]) + FLOAT_TYPE(data_b[get_boffset() + ox + oy * p.ne10 + oz * p.ne10 * p.ne11])); | ||
| if (i0 < p.ne10 && i1 < p.ne11 && i2 < p.ne12 && i3 < p.ne13) { | ||
| data_d[get_doffset() + dst_idx(i00, i01, i02, i03)] = D_TYPE(FLOAT_TYPE(data_a[get_aoffset() + src0_idx(i00, i01, i02, i03)]) + FLOAT_TYPE(data_b[get_boffset() + i0 + i1 * p.nb11 + i2 * p.nb12 + i3 * p.nb13])); |
There was a problem hiding this comment.
You could use src1_idx here and it would also handle permuted tensors (by applying nb10). Would be nice to test permuted tensors, too.
| const uint i0 = rem1 % p.nb01; | ||
|
|
||
| uint i00, i01, i02, i03; | ||
| get_indices(idx, i00, i01, i02, i03); |
There was a problem hiding this comment.
Looking at the CPU reference, seems like there should only be one set of indices and then you can apply offset/sizeof(float) to the final index for src0 and dst.
| int offset = dst->op_params[3] / src0_type_size; // offset in bytes | ||
|
|
||
| ggml_vk_op_f32<vk_op_binary_push_constants>(ctx, subctx, src0, src1, nullptr, nullptr, dst, GGML_OP_ACC, { | ||
| (uint32_t)ggml_nelements(src0), |
There was a problem hiding this comment.
I think this should be src1?
There was a problem hiding this comment.
My suggestion was to change line 9810, not 9807. I think this change is needed to make it possible to remove the bound check.
There was a problem hiding this comment.
I just tried changing 9810 but not 9807 and remove boundary check, it still fails.
Changing 9810 not 9807 and keep boundary check, it also fails.
…s suggestion except to keep the boundary check
|
Latest commit should address Jeff's suggestions except for the boundary check. Without the boundary check, the code simply doesn't pass the tests. |
|
Committed a version without the check that passes my tests and measured the performance by adding my test cases to test_cases_perf However, it seems to be significantly slower than previous code: I think it is better to revert to the previous boundary check version unless Jeff knows how to speed it up. |
|
Sorry, yes, it's better to do it in a single shader and have the branch. I somehow missed that there was an else case and that it was an "inside vs outside" check rather than just an out of bounds check. |
|
Back to boundary check version but keep the test cases added in perf. |
* fix vulkan ggml_acc only works in 3d but not 4d * removed clamp in test_acc_block * use the correct stride and its test case * cuda : fix "supports op" condition * change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check * version without boundary check * revert back to boundary check version --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> (cherry picked from commit 0e21991)
* fix vulkan ggml_acc only works in 3d but not 4d * removed clamp in test_acc_block * use the correct stride and its test case * cuda : fix "supports op" condition * change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check * version without boundary check * revert back to boundary check version --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix vulkan ggml_acc only works in 3d but not 4d * removed clamp in test_acc_block * use the correct stride and its test case * cuda : fix "supports op" condition * change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check * version without boundary check * revert back to boundary check version --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix vulkan ggml_acc only works in 3d but not 4d * removed clamp in test_acc_block * use the correct stride and its test case * cuda : fix "supports op" condition * change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check * version without boundary check * revert back to boundary check version --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Make sure to read the contributing guidelines before submitting a PR
Discovered ggml_acc for vulkan only works in 3d not 4d while working on
#18792
So fixed it and also added test case to test-backend-ops