Support w8a8 int8 quantization config #2881
Conversation
zhyncs
left a comment
LGTM with minor issues; I left some comments.
BTW, please provide the results for bench_serving at request rates 8/16/32/inf.
We should ensure that we have an advantage not only with large batches but also with small batches, whether in terms of throughput or latency.
Also, when will support for SM90 be released?
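(For reproducibility, a sweep along these lines could collect those numbers. This is a minimal sketch; the exact bench_serving flags are assumptions and may differ across sglang versions.)

```bash
# Sketch of the requested sweep; flag names are assumed from sglang's
# bench_serving module and may differ by version.
for rate in 8 16 32 inf; do
  python3 -m sglang.bench_serving --backend sglang \
    --num-prompts 1000 --request-rate "$rate"
done
```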
| "bitsandbytes", | ||
| "gguf", | ||
| "modelopt", | ||
| "w8a8_int8", |
Is w8a8_int8 for int8 and w8a8_fp8 for fp8?
Maybe fp8 for w8a8_fp8?
I use w8a8_int8 since it's clearer, and there is already an fp8 config for w8a8_fp8.
cc: @HandH1998
Finally, when this feature is stable, we should default to using our implementation instead of the compressed-tensors implementation when using w8a8 int8.
Note: per-channel symmetric int8 is sufficient for most cases, so asymmetric quantization can remain unsupported for now.
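(As a concrete reference for what per-channel symmetric int8 means here, a minimal sketch; this is illustrative only, not the PR's kernel, and a `[out_features, in_features]` weight layout is assumed.)

```python
import torch

def quant_weight_per_channel_symmetric(w: torch.Tensor):
    """Illustrative per-output-channel symmetric int8 quantization.

    Symmetric: the zero-point is 0, so only one scale per channel is stored.
    """
    # One scale per output channel, mapping max |w| to the int8 limit 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_q, scale  # dequantize as w_q.float() * scale
```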
Output throughput of bench_serving at 8/16/32/inf request rates:
Qwen2-7B-Instruct W8A8:
Looks good. How about the latency (TTFT and ITL)?
TTFT, TPOT, and ITL are all reduced with the w8a8_int8 config, and the acceleration is larger at higher QPS. The benchmark results are attached here for reference: benchmark results
Motivation
Add a quantization config for w8a8 int8, using the int8 GEMM in sgl-kernel and an int8 quantization kernel.
w8a8_int8 achieves ~10% higher output throughput with no accuracy loss compared to the original compressed-tensors config (tested on A100).
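(The runtime path can be summarized by the following reference sketch. Assumptions: per-token dynamic activation quantization and per-channel weight scales; the actual implementation is the CUDA int8 GEMM in sgl-kernel, emulated here with an int32 matmul.)

```python
import torch

def w8a8_int8_linear_ref(x: torch.Tensor, w_q: torch.Tensor,
                         w_scale: torch.Tensor) -> torch.Tensor:
    """Reference of a w8a8 int8 linear layer (a sketch, not sgl-kernel).

    x:       [tokens, in_features] activations (fp16/fp32)
    w_q:     [out_features, in_features] int8 weights
    w_scale: [out_features, 1] per-channel weight scales
    """
    # Dynamic per-token symmetric int8 quantization of activations.
    x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / x_scale), -128, 127).to(torch.int8)
    # int32 matmul emulates the int8 GEMM's int32 accumulation (CPU path).
    acc = x_q.to(torch.int32) @ w_q.to(torch.int32).t()
    # Dequantize with per-token and per-channel scales.
    return acc.to(x.dtype) * x_scale * w_scale.squeeze(-1)
```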
Meta-Llama-3-8B-Instruct W8A8:
Qwen2-7B-Instruct W8A8:
Usage: add `--quantization w8a8_int8` to the server args, as in the launch sketch below. Compatible with HF checkpoints with per-channel symmetric int8 quantization.

cc: @merrymercy @zhyncs @HandH1998
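(A launch could look like the following; the model path is a placeholder and must point to a checkpoint already quantized with per-channel symmetric int8.)

```bash
# Placeholder model path; only --quantization w8a8_int8 is the new flag.
python3 -m sglang.launch_server \
  --model-path /path/to/w8a8-int8-checkpoint \
  --quantization w8a8_int8
```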