
[Bug] disk cache io error when simultaneously loading lots of sglang offline engine #2090

@LeeSureman

Description

Checklist

  • 1. I have searched related issues but could not find the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English; otherwise, the issue will be closed.

Describe the bug

When I use Slurm to launch 32 or 192 jobs for offline batch inference, each of which loads an sgl.Engine at the same time, I hit the following error even though I set disable_disk_cache=True. If I run only one job, the error does not occur.

The error is as follows:

Traceback (most recent call last):
  File "/home/xiaonan/mycode/code_data_synthesis/generate_python_docstring_slurm_task.py", line 64, in <module>
    llm = sgl.Engine(model_path=args.model_name, tp_size=args.tp_size, disable_disk_cache=True)
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/api.py", line 48, in Engine
    from sglang.srt.server import Engine
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/server.py", line 49, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/data_parallel_controller.py", line 24, in <module>
    from sglang.srt.managers.io_struct import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/io_struct.py", line 26, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/schedule_batch.py", line 40, in <module>
    from sglang.srt.constrained.grammar import Grammar
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/constrained/__init__.py", line 24, in <module>
    from outlines.caching import cache as disk_cache
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/cfg.py", line 3, in <module>
    from outlines.fsm.guide import CFGGuide
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/fsm/guide.py", line 109, in <module>
    def create_states_mapping(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 93, in decorator
    memory = get_cache()
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 55, in get_cache
    memory = Cache(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 499, in __init__
    sql(query, (key, value))
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 666, in _execute_with_retry
    return sql(statement, *args, **kwargs)
sqlite3.OperationalError: disk I/O error
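
Judging from the traceback, the outlines disk cache is created as a side effect of importing sglang.srt.constrained (from outlines.caching import cache as disk_cache), i.e. before the disable_disk_cache=True flag can take effect, so all concurrent jobs still open the same SQLite cache file. A possible workaround, sketched below and not verified, is to point each Slurm array task at its own cache directory before importing sglang; this assumes outlines' get_cache() honors the OUTLINES_CACHE_DIR environment variable, and the /tmp path is only an example:

import os

# Sketch of a workaround (untested): give each Slurm array task its own
# outlines cache directory *before* sglang (and hence outlines) is imported,
# so concurrent jobs never contend on a single shared SQLite file.
task_id = os.environ.get("SLURM_ARRAY_TASK_ID", "0")
os.environ["OUTLINES_CACHE_DIR"] = f"/tmp/outlines_cache_{task_id}"  # example path

import sglang as sgl  # must be imported after the env var is set

llm = sgl.Engine(
    model_path="Qwen/Qwen2.5-Coder-32B-Instruct",
    tp_size=2,
    disable_disk_cache=True,
)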

Reproduction

Python (slurm_task.py):

import sglang as sgl
llm = sgl.Engine(model_path='Qwen/Qwen2.5-Coder-32B-Instruct', tp_size=2, disable_disk_cache=True)

Sbatch Script:

#!/bin/bash
#SBATCH --job-name=task1  # job name
#SBATCH --output=slurm_logs/%A_%a/output.txt     # output file 
#SBATCH --error=slurm_logs/%A_%a/error.txt       # error file
#SBATCH --array=0-191%192
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=16            

python slurm_task.py
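
Note: the same per-task cache directory could alternatively be exported in the sbatch script itself, e.g. export OUTLINES_CACHE_DIR=/tmp/outlines_cache_${SLURM_ARRAY_TASK_ID} on the line before python slurm_task.py. As above, this assumes outlines reads OUTLINES_CACHE_DIR and has not been verified at 192-task scale.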

Environment

Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: False
PyTorch: 2.4.0
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.2
requests: 2.32.3
tqdm: 4.67.0
numpy: 1.23.0
aiohttp: 3.10.5
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.54.4
anthropic: 0.39.0
Hypervisor vendor: KVM
ulimit soft: 1024
