
misc: speedup load safetensors#1319

Merged
zhyncs merged 1 commit into main from speedup
Sep 3, 2024


zhyncs commented Sep 3, 2024

Motivation

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Co-authored-by: ispobock <ISPObaoke@163.com>

zhyncs commented Sep 3, 2024

python3 benchmark/gsm8k/bench_sglang.py

Latency: 92.746
Invalid: 0.000
Accuracy: 0.935

python3 -m sglang.bench_serving --backend sglang --num-prompts 5000

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Successful requests:                     5000
Benchmark duration (s):                  346.41
Total input tokens:                      1224620
Total generated tokens:                  1061203
Total generated tokens (retokenized):    1055493
Request throughput (req/s):              14.43
Input token throughput (tok/s):          3535.22
Output token throughput (tok/s):         3063.47
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   145289.76
Median E2E Latency (ms):                 143439.03
---------------Time to First Token----------------
Mean TTFT (ms):                          60141.13
Median TTFT (ms):                        55335.66
P99 TTFT (ms):                           131151.83
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          742.79
Median TPOT (ms):                        549.99
P99 TPOT (ms):                           4744.62
---------------Inter-token Latency----------------
Mean ITL (ms):                           424.83
Median ITL (ms):                         237.24
P99 ITL (ms):                            1699.88
=================================================
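
As a sanity check on the serving numbers above, the three throughput rows can be roughly reproduced from the request and token counts and the benchmark duration. This is a minimal sketch, assuming each throughput is simply the total count divided by the wall-clock duration; the helper name `throughput` and the variable names are illustrative and not taken from the benchmark script. Because the displayed duration is already rounded to two decimals, the recomputed values may differ from the reported ones in the last digits.

```python
# Figures copied from the serving benchmark output above.
successful_requests = 5000
duration_s = 346.41            # displayed value, already rounded
total_input_tokens = 1224620
total_generated_tokens = 1061203


def throughput(count: int, seconds: float) -> float:
    """Items per second over the whole run (assumed definition)."""
    return count / seconds


request_tps = throughput(successful_requests, duration_s)
input_tps = throughput(total_input_tokens, duration_s)
output_tps = throughput(total_generated_tokens, duration_s)

print(f"Request throughput (req/s):      {request_tps:.2f}")
print(f"Input token throughput (tok/s):  {input_tps:.2f}")
print(f"Output token throughput (tok/s): {output_tps:.2f}")
```

The recomputed values agree with the reported 14.43 req/s, 3535.22 tok/s, and 3063.47 tok/s to within rounding of the duration.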

zhyncs merged commit dc67d97 into main Sep 3, 2024
zhyncs deleted the speedup branch September 3, 2024 18:29

zhyncs commented Sep 3, 2024

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: ispobock <ISPObaoke@163.com>
Hexq0210 mentioned this pull request Dec 13, 2025
6 tasks