
feat: update nightly gsm8k eval#1304

Merged
zhyncs merged 3 commits into main from night on Sep 2, 2024

Conversation

zhyncs (Collaborator) commented on Sep 2, 2024

Motivation

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

zhyncs (Collaborator, Author) commented on Sep 2, 2024

# H100 TP 2, latest v0.2.15
python3 -m sglang.launch_server --model neuralmagic/Qwen2-72B-Instruct-FP8 --quantization fp8  --trust-remote-code --tp 2 --kv-cache-dtype fp8_e5m2
python3 -m sglang.bench_serving --backend sglang
[14:47:01 TP0] Exception in ModelTpServer:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 244, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 260, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 507, in forward_prefill_batch
    sample_output, logits_output = self.model_runner.forward(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 584, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 542, in forward_extend
    input_metadata = InputMetadata.from_schedule_batch(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 215, in from_schedule_batch
    ret.init_flashinfer_handlers(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 245, in init_flashinfer_handlers
    update_flashinfer_indices(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 374, in update_flashinfer_indices
    model_runner.flashinfer_prefill_wrapper_paged.begin_forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 832, in plan
    self._wrapper.plan(
RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 599785472 and alignment 16 in AlignedAllocator

  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 896, in run_tp_server
    model_server.exposed_step(recv_reqs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 244, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 260, in forward_step
    self.forward_prefill_batch(new_batch)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 507, in forward_prefill_batch
    sample_output, logits_output = self.model_runner.forward(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 584, in forward
    return self.forward_extend(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 542, in forward_extend
    input_metadata = InputMetadata.from_schedule_batch(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 215, in from_schedule_batch
    ret.init_flashinfer_handlers(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 245, in init_flashinfer_handlers
    update_flashinfer_indices(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 374, in update_flashinfer_indices
    model_runner.flashinfer_prefill_wrapper_paged.begin_forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 832, in plan
    self._wrapper.plan(
RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 599785472 and alignment 16 in AlignedAllocator

It works well without --kv-cache-dtype fp8_e5m2. @ispobock @yzh119, could you take a look?
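Based on the observation above, a minimal workaround sketch is to launch with the same model and TP settings but omit the failing flag, so the KV cache stays in the model's default dtype instead of fp8_e5m2 (this restates the repro commands from the log; it is not a fix for the FlashInfer allocation failure itself):

```shell
# Same H100 TP 2 setup as above, but without --kv-cache-dtype fp8_e5m2,
# which is the configuration reported to work.
python3 -m sglang.launch_server \
  --model neuralmagic/Qwen2-72B-Instruct-FP8 \
  --quantization fp8 \
  --trust-remote-code \
  --tp 2

# Then benchmark against the running server as before.
python3 -m sglang.bench_serving --backend sglang
```

These commands require a multi-GPU host and a running server, so they are shown as a command fragment rather than something runnable inline.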

@zhyncs zhyncs removed the wip label Sep 2, 2024
Review comment thread on test/srt/test_nightly_gsm8k_eval.py
zhyncs (Collaborator, Author) commented on Sep 2, 2024

fix #1272

zhyncs (Collaborator, Author) commented on Sep 2, 2024

@zhyncs zhyncs self-assigned this Sep 2, 2024
@zhyncs zhyncs enabled auto-merge (squash) September 2, 2024 14:57
@zhyncs zhyncs disabled auto-merge September 2, 2024 15:18
@zhyncs zhyncs merged commit 2561ed0 into main Sep 2, 2024
@zhyncs zhyncs deleted the night branch September 2, 2024 15:18
@zhyncs zhyncs mentioned this pull request Sep 2, 2024
@ispobock ispobock mentioned this pull request Sep 7, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
