Zh/xpu support by jiqing-feng · Pull Request #9 · jiqing-feng/bitsandbytes

jiqing-feng · 2024-10-30T01:25:02Z

No description provided.

jiqing-feng · 2024-11-12T02:53:14Z

bitsandbytes/backends/cpu_xpu_common.py

+    out_dq = torch.empty(out_uint8.shape).to(quant_state.dtype).to(A.device)
    for i in range(len(quant_state.code)):
+        # quant_state.code is fp32, cast to quant_state dtype to avoid the mismatch issue
+        quant_state.code = quant_state.code.to(quant_state.dtype)


Why is this inside the for loop?

Thanks. Moved it out of the loop.
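
The cast in the diff above does not depend on the loop variable, so running it once before the loop gives the same result with less work. A minimal sketch of that idea in plain Python follows. Lists stand in for tensors, and `dequantize_lookup`, `convert`, and `to_float` are hypothetical names for illustration, not code from this PR.

```python
def dequantize_lookup(indices, code, convert):
    """Map quantized indices to code values, converting the table once."""
    # Loop-invariant work: convert the lookup table before the loop,
    # instead of repeating the same conversion on every iteration.
    table = [convert(value) for value in code]

    out = [None] * len(indices)
    for i, value in enumerate(table):
        # Fill every position whose quantized index equals i.
        for pos, idx in enumerate(indices):
            if idx == i:
                out[pos] = value
    return out


calls = []


def to_float(value):
    calls.append(value)  # record each conversion so the count is visible
    return float(value)


result = dequantize_lookup([2, 0, 1, 2], [10, 20, 30], to_float)
```

Here `to_float` is called once per table entry, regardless of how many indices are dequantized.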

jiqing-feng · 2024-11-12T02:55:43Z

bitsandbytes/backends/xpu.py

+    for t in tensors:
+        if t is None:
+            continue  # NULL pointers are fine
+        on_xpu &= t.device.type == "xpu"


Please avoid bitwise operators when the code expresses logic rather than computation.
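
One way to express the same check with logical constructs is Python's built-in `all()`. This is a sketch under assumptions, not the change made in the PR: `all_on_device` and `fake_tensor` are hypothetical names, and `SimpleNamespace` objects stand in for tensors by exposing only a `.device.type` attribute.

```python
from types import SimpleNamespace


def all_on_device(tensors, device_type="xpu"):
    """Return True if every non-None tensor is on the given device type."""
    # None entries are skipped, matching the "NULL pointers are fine" rule.
    return all(t.device.type == device_type for t in tensors if t is not None)


def fake_tensor(device_type):
    # Stand-in exposing only the .device.type attribute used above.
    return SimpleNamespace(device=SimpleNamespace(type=device_type))


xpu_only = [fake_tensor("xpu"), None, fake_tensor("xpu")]
mixed = [fake_tensor("xpu"), fake_tensor("cpu")]
```

With these inputs, `all_on_device(xpu_only)` is true and `all_on_device(mixed)` is false. `all()` also stops at the first mismatch, which the `&=` accumulation does not.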

jiqing-feng · 2024-11-12T02:56:06Z

bitsandbytes/backends/xpu.py

+        on_xpu &= t.device.type == "xpu"
+    if not on_xpu:
+        raise TypeError(
+            "All input tensors need to be on CPU, but found some tensors to not be on XPU:\n"


The error message does not match the `if` condition.

Thanks for the reminder. Corrected.
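
A check and its message are less likely to drift apart when both are derived from the same value. The sketch below illustrates that approach. `require_device` and `fake_tensor` are hypothetical names, the tensors are `SimpleNamespace` stand-ins, and the message wording is illustrative rather than the corrected text from this PR.

```python
from types import SimpleNamespace


def require_device(tensors, expected="xpu"):
    """Raise TypeError unless every non-None tensor is on `expected`."""
    offenders = [
        t.device.type for t in tensors
        if t is not None and t.device.type != expected
    ]
    if offenders:
        # The same `expected` value drives both the check and the message,
        # so the two cannot disagree.
        raise TypeError(
            f"All input tensors need to be on {expected.upper()}, "
            f"but found tensors on: {sorted(set(offenders))}"
        )


def fake_tensor(device_type):
    # Stand-in exposing only the .device.type attribute used above.
    return SimpleNamespace(device=SimpleNamespace(type=device_type))


try:
    require_device([fake_tensor("xpu"), fake_tensor("cpu")])
except TypeError as exc:
    message = str(exc)
```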

jiqing-feng · 2024-11-12T02:56:54Z

bitsandbytes/backends/xpu.py

-        raise NotImplementedError
+        """
+        Transform tensor A to to_order. It is originally designed for CUDA.
+        For CPU, it returns the original tensor if transpose=False.


This is an XPU op, so the comments need to be updated.

* enable new ipex API ipex weight is 4D so we cannot transpose fix dequant check require grad * use ipex op in backward * enable backward * Multi backend refactor (#8) * AMD: Clarify diagnostic messages; free up disk space for CI build * Add build job for rocm * Add rocm build script * Copy shared obj file into output_dir * upload build artifacts and enable wheels build * Remove cuda build temporarily * Add ROCm version to .so filename * Add rocm_version to whls build * Revert "Remove cuda build temporarily" This reverts commit 1413c5f. * Add rocm_version env var * Remove thrush header files * Print node info * print cuda node info * Revert "print cuda node info" This reverts commit cdb209a. * Revert "Print node info" This reverts commit 7e9a65c. * Add rocm arch to compile command * Rename .so files to rocm * Update default gpu arch * Skip cpu based igemmlt int tests on ROCm * Update Documentation * Update upstream repo name * Update docs * Update string format Co-authored-by: Aarni Koskela <akx@iki.fi> * Remove pre-release option for torch install * Update pytorch install path Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com> * Add messages for Heuristics error * Remove toolcache for disk space * print disk usage * Clean disk space for linux * Fix for ubuntu * Add sudo for apt clean * Update clean up disk list * remove disk usage print * Add BNB_BACKEND variable * Update diagnostic functions for ROCm * Fix tuple error * Fix library detection bug for recursive and symlink cases * fix pre-commit errors * Remove recursive path lib search * Create function for runtime lib patterns * Update logger format Co-authored-by: Aarni Koskela <akx@iki.fi> * Update error reporting Co-authored-by: Aarni Koskela <akx@iki.fi> * Remove commented code Co-authored-by: Aarni Koskela <akx@iki.fi> * Update error reporting Co-authored-by: Aarni Koskela <akx@iki.fi> * Update error reporting * Create hip diagnostics functions * Fix Typo * Fix pre-commit checks 
--------- Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com> * check grad before using ipex (bitsandbytes-foundation#1358) * Enable packaging for ROCm 6.2 (bitsandbytes-foundation#1367) * Enable 6.2 build * Update documentation for 6.2.0 pip install * Update for VS2022 17.11 compatibility with CUDA < 12.4 (bitsandbytes-foundation#1341) * Update for VS2022 17.11 compatibility with CUDA < 12.4 * Try again * Enable continuous releases for multi-backend-refactor branch * Update release workflow * Publish continuous release for multi-backend * continuous release: revert wheel renaming due to install err * Revert "continuous release: revert wheel renaming due to install err" This reverts commit 0a2b539. * add dynamic tag-based versioning + git hash for dev vers * docs: update w/ changes from `main` * get tags for dynamic versioning * fine-tune continuous release params * reduce the pkg size + build times for the preview release * refine docs for multi-backend alpha release (bitsandbytes-foundation#1380) * refine docs for multi-backend alpha release * docs: further tweaks to multi-backend alpha docs * docs: further tweaks to multi-backend alpha docs * docs: further tweaks to multi-backend alpha docs * docs: add multi-backend feedback links * docs: add request for contributions * docs: small fixes * docs: small fixes * docs: add info about `main` continuous build * docs: further tweaks to multi-backend alpha docs * docs: further tweaks to multi-backend alpha docs * docs: remove 2 obsolete lines --------- Co-authored-by: pnunna93 <104791500+pnunna93@users.noreply.github.com> Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * Revert "enable backward" This reverts commit cd7bf21. * Revert "use ipex op in backward" This reverts commit b8df1aa. 
* fix finetune * check training * fix gemv check * reformat * avoid double quant in backward if not needed * Zh/xpu support (#9) * Add xpu support * Add xpu support for int8 * Add xpu dequant kernel support * update code * remove debug comments * remove redundant comments * Add xpu integration for woqlinear * correct the comments * Update cpu_xpu_common.py --------- Co-authored-by: zhuhong61 <hong.zhu@intel.com> Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com> * avoid import triton if CPU and XPU backend * fix setup in docker without git config * xpu do not support compile for now Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update 4bit compute dtype * fix xpu int8 path Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * optimize 4bit dequant Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix xpu dequant Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add empty cache in each xpu op * add nf4 dequant ipex kernel * fix dequant 4bit op * empty cache has negative effect on 4bit gemv * fix xpu save * fix save * xpu use float16 default Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm empty cache as it cause slower perf Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix xpu save Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 8bit int8 param device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 8bit int8 param device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 8bit int8 param device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 8bit int8 param device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format * update readme for Intel CPU and XPU do not need make csrc codes * fix format * fix import --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: pnunna93 <104791500+pnunna93@users.noreply.github.com> Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus 
<9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: zhuhong61 <hong.zhu@intel.com> Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com>

Enable hip_bfloat16 for optim tests

zhuhong61 and others added 7 commits November 10, 2024 05:40

Add xpu support

883eb11

Add xpu support for int8

255434f

Add xpu dequant kernel support

421bcb0

update code

f075a8a

remove debug comments

be8babc

remove redundant comments

dc01ef9

Add xpu integration for woqlinear

c88901a

zhuhong61 force-pushed the zh/xpu_support branch from 4920332 to c88901a on November 10, 2024 13:44

jiqing-feng commented Nov 12, 2024

zhuhong61 added 2 commits November 12, 2024 15:57

correct the comments

c7af359

Update cpu_xpu_common.py

89a5ed8

jiqing-feng merged commit 1bde567 into jiqing-feng:new_ipex Nov 12, 2024

jiqing-feng pushed a commit that referenced this pull request Dec 3, 2024

Merge pull request #9 from ROCm/rocm_enabled_fix_bfloat16

b6770bf

Enable hip_bfloat16 for optim tests

jiqing-feng merged 9 commits into jiqing-feng:new_ipex from zhuhong61:zh/xpu_support
