Describe the bug
When I use Slurm to launch 32 or 192 array jobs for offline batch inference, all of which load sgl.Engine simultaneously, I hit the following error even though I set disable_disk_cache=True. If I run only a single job, the error does not occur.
The error is as follows:
Traceback (most recent call last):
  File "/home/xiaonan/mycode/code_data_synthesis/generate_python_docstring_slurm_task.py", line 64, in <module>
    llm = sgl.Engine(model_path=args.model_name, tp_size=args.tp_size, disable_disk_cache=True)
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/api.py", line 48, in Engine
    from sglang.srt.server import Engine
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/server.py", line 49, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/data_parallel_controller.py", line 24, in <module>
    from sglang.srt.managers.io_struct import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/io_struct.py", line 26, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/schedule_batch.py", line 40, in <module>
    from sglang.srt.constrained.grammar import Grammar
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/constrained/__init__.py", line 24, in <module>
    from outlines.caching import cache as disk_cache
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/cfg.py", line 3, in <module>
    from outlines.fsm.guide import CFGGuide
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/fsm/guide.py", line 109, in <module>
    def create_states_mapping(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 93, in decorator
    memory = get_cache()
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 55, in get_cache
    memory = Cache(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 499, in __init__
    sql(query, (key, value))
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 666, in _execute_with_retry
    return sql(statement, *args, **kwargs)
sqlite3.OperationalError: disk I/O error
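The tail of the traceback suggests why disable_disk_cache=True does not help here: outlines creates its diskcache SQLite database as a side effect of import outlines (the caching decorator on create_states_mapping calls get_cache() at module import time), before any sglang flag can take effect. With 192 array tasks all opening the same default cache directory (typically under the home directory, often on a shared filesystem), the concurrent SQLite access appears to trigger the disk I/O error, which would explain why a single job runs fine.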
Reproduction
Python:
slurm_task.py
import sglang as sgl
llm = sgl.Engine(model_path='Qwen/Qwen2.5-Coder-32B-Instruct', tp_size=2, disable_disk_cache=True)
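A minimal workaround sketch, assuming the shared outlines disk cache is the culprit: give every array task its own cache directory through the OUTLINES_CACHE_DIR environment variable (read by outlines.caching.get_cache, if the installed outlines version honors it), set before sglang is imported. The /tmp path below is illustrative; any task-unique, node-local directory should do.

import os

# Per-task cache directory (illustrative path); SLURM_ARRAY_TASK_ID is set by
# Slurm for array jobs, so each task writes to its own SQLite database.
task_id = os.environ.get("SLURM_ARRAY_TASK_ID", "0")
os.environ["OUTLINES_CACHE_DIR"] = f"/tmp/outlines_cache_{task_id}"

# Import sglang only after the variable is set: the cache is created during
# the import chain shown in the traceback above.
import sglang as sgl

llm = sgl.Engine(model_path='Qwen/Qwen2.5-Coder-32B-Instruct', tp_size=2, disable_disk_cache=True)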
Sbatch Script:
#!/bin/bash
#SBATCH --job-name=task1 # job name
#SBATCH --output=slurm_logs/%A_%a/output.txt # output file
#SBATCH --error=slurm_logs/%A_%a/error.txt # error file
#SBATCH --array=0-191%192
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=16
python slurm_task.py
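Equivalently, the directory can be exported in the sbatch script itself, e.g. export OUTLINES_CACHE_DIR=/tmp/outlines_cache_${SLURM_ARRAY_TASK_ID} on the line before python slurm_task.py, so the Python side needs no changes.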
Environment
Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: False
PyTorch: 2.4.0
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.2
requests: 2.32.3
tqdm: 4.67.0
numpy: 1.23.0
aiohttp: 3.10.5
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.54.4
anthropic: 0.39.0
Hypervisor vendor: KVM
ulimit soft: 1024