[Bug] The Prometheus metric avg_request_queue_latency is not being collected #6357

@jorgeantonio21

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I have integrated SGLang metrics with Grafana dashboards. The metric avg_request_queue_latency is not available, while all the other metrics are being collected.

The avg_request_queue_latency metric is important for understanding how the system behaves under high load.

Reproduction

You can query Prometheus metrics at http://localhost:3000/metrics
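To confirm whether the metric is exposed at all (as opposed to being dropped somewhere between Prometheus and Grafana), the endpoint can be scraped directly and searched for the name. The following is a minimal sketch, not part of SGLang: the helper names `metric_names`, `find_metric`, and `fetch_metrics` are hypothetical, and it matches on a substring because the exported name may carry a prefix (an assumption, not verified here).

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def find_metric(exposition_text, needle):
    """Return sorted metric names that contain `needle` as a substring."""
    return sorted(n for n in metric_names(exposition_text) if needle in n)


def fetch_metrics(url="http://localhost:3000/metrics", timeout=5.0):
    """Fetch the raw metrics text from the server (URL taken from this report)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    text = fetch_metrics()
    matches = find_metric(text, "avg_request_queue_latency")
    print(matches or "avg_request_queue_latency is not exposed")
```

If the script reports that the metric is not exposed while other metrics appear in the same output, the gap is on the server side rather than in the Prometheus or Grafana configuration.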

Environment

I have deployed using docker compose:

sglang:
    container_name: sglang
    profiles: [chat_completions_sglang]
    image: lmsysorg/sglang:latest
    volumes:
      - ${HOME}/.cache/huggingface:/root/.cache/huggingface
    restart: always
    network_mode: host # required by RDMA
    privileged: true # required by RDMA
    environment:
      HF_TOKEN: ${HF_TOKEN}
    entrypoint: python3 -m sglang.launch_server
    command:
      --model-path ${SGLANG_MODEL_PATH} --tp 8 --enable-dp-attention --dp 8 --trust-remote-code --max-running-requests 128 --enable-metrics
      --host 0.0.0.0
      --port 3000
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
