Applying the jetson fixes #847
ivansmith7795 wants to merge 1 commit into bitsandbytes-foundation:main
Merging changes from jetson branch

Did this work out for you? It seems like a straightforward fix and a good contribution if it makes the library Jetson-compatible.

Do you have opinions on this PR? Could one of you two do the review?
CC_CUDA11x += -gencode arch=compute_80,code=sm_80
CC_CUDA11x += -gencode arch=compute_86,code=sm_86
CC_CUDA11x += -gencode arch=compute_87,code=sm_87
Can we confirm that the CMake file works with the Jetson devices? It compiles, but I do not have a device to test with.
Wheels can be taken from the latest build here:
https://github.com/TimDettmers/bitsandbytes/actions/workflows/python-package.yml
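For readers unfamiliar with the flag format: each -gencode line above pairs a virtual architecture (compute_XY) with a real one (sm_XY), and sm_87 corresponds to compute capability 8.7, which is what the Jetson Orin family reports. The sketch below only illustrates that naming pattern. `gencode_flag` is a hypothetical helper and is not part of the bitsandbytes build files.

```cpp
#include <string>

// Hypothetical helper (not part of the bitsandbytes build): formats one
// nvcc -gencode flag for a compute capability, e.g. (8, 7) for sm_87.
std::string gencode_flag(int major, int minor) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

Calling `gencode_flag(8, 7)` reproduces the flag string added for Jetson in this PR.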
template <int THREADS, int ITEMS_PER_THREAD, int TILE_ROWS, int TILE_COLS, int SPARSE_DECOMP> __global__ void kDoubleRowColQuant(half *__restrict__ const A, float *__restrict__ const rowStats, float * __restrict__ const colStats, char *out_col_normed, char *out_row_normed, int *rowidx, int *colidx, half *val, int * __restrict__ nnz_block_ptr, float threshold, int rows, int cols, int tiledCols)
template <int THREADS, int ITEMS_PER_THREAD, int TILE_ROWS, int TILE_COLS, int SPARSE_DECOMP> __global__ void kDoubleRowColQuant(half *__restrict__ const A, float *__restrict__ const rowStats, float * __restrict__ const colStats, int8_t *out_col_normed, int8_t *out_row_normed, int *rowidx, int *colidx, half *val, int * __restrict__ nnz_block_ptr, float threshold, int rows, int cols, int tiledCols)
Not sure why this is needed, but as long as it compiles on all platforms (looking at you, MSVC :) ), I don't see a problem with the change either. IIRC, int8_t is exactly 8 bits, while char is at least 8 bits.
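One plausible reason for the change, stated here as an assumption rather than something the PR confirms: the signedness of plain char is implementation-defined. It is signed on typical x86-64 toolchains but unsigned by default on AArch64 Linux with GCC and Clang, whereas int8_t is signed wherever it exists. A minimal sketch of how that difference can surface when quantized values pass through char follows. `quantize_int8` and `widen_via_char` are hypothetical names for illustration, not the kernel's actual code.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// True where plain char is signed (typical x86-64), false where it is
// unsigned (the default on AArch64 Linux with GCC and Clang).
constexpr bool kPlainCharIsSigned = std::numeric_limits<char>::is_signed;

// int8_t is signed on every platform that provides it.
static_assert(std::numeric_limits<std::int8_t>::is_signed,
              "int8_t must be signed");

// Hypothetical helper: scale x by absmax into [-127, 127] and round.
// Illustration only; this is not the kernel's quantization routine.
inline std::int8_t quantize_int8(float x, float absmax) {
    return static_cast<std::int8_t>(std::lround(127.0f * x / absmax));
}

// Widening through plain char: a stored -127 comes back as 129 on
// platforms where char is unsigned, and as -127 where it is signed.
inline int widen_via_char(std::int8_t q) {
    return static_cast<int>(static_cast<char>(q));
}
```

Declaring the output buffers as int8_t sidesteps the question, because the value read back no longer depends on the platform's default for char.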
#include <immintrin.h>
#elif defined __aarch64__
#warning "--- THIS IS AARCH64"
#include <sse2neon.h>
We are going to need to support Neon one way or the other. I am pondering if this is the right approach though, or if we should implement the Neon intrinsics directly? If it saves us time in the short run, maybe a viable option?
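To make the trade-off concrete, here is a minimal sketch of the "implement the Neon intrinsics directly" option for a single operation. It is not taken from the PR: `add4` and the `SKETCH_USE_*` macros are hypothetical names. With sse2neon, the SSE branch would instead compile unchanged on AArch64, because that header maps the SSE intrinsics onto NEON equivalents.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(_M_X64)
  #include <immintrin.h>
  #define SKETCH_USE_SSE 1
#elif defined(__aarch64__)
  #include <arm_neon.h>
  #define SKETCH_USE_NEON 1
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for exactly four floats.
inline void add4(const float* a, const float* b, float* out) {
#if defined(SKETCH_USE_SSE)
    // x86-64 path: unaligned SSE load, add, store.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(SKETCH_USE_NEON)
    // AArch64 path: the same operation written with native NEON intrinsics.
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    // Portable scalar fallback for any other target.
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

The direct approach means one extra branch per intrinsic-using routine, while the translation header keeps a single code path at the cost of an additional dependency.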
@@ -1,3 +1,4 @@
#!/usr/bin/python3
If we want this, #!/usr/bin/env python3 is more portable.
Also, the file is not executable. It needs a chmod 755, committed along with this change, for the shebang to make sense.
9b72679 to 7800734
Hi all, as far as Jetson support is concerned, we're not yet distributing wheels for that, but it can be built. E.g. there is a distribution being published that has been mentioned here: #1437 (comment).