[1/2] L2 CI: End to End Performance#2884

Merged
ApostaC merged 6 commits into LMCache:dev from Oasis-Git:l2ci-1
Mar 27, 2026

Conversation

@Oasis-Git
Member

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
@Oasis-Git added the "full" label (Run comprehensive tests on this PR) on Mar 26, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new performance test script, run-long-doc-qa-l2.sh, which evaluates LMCache's L2 caching capabilities by restarting the server with specific L2 configurations and verifying speedup thresholds. The main test runner, run-mp-test.sh, has been updated to include this new test step. Feedback focuses on improving script consistency by using python3, enhancing modularity by moving a large embedded Python heredoc into a standalone file, and ensuring the test suite continues executing subsequent tests even if the L2 test fails.

GPU_DEVICE="${GPU_FOR_VLLM:-0}"

CUDA_VISIBLE_DEVICES="${GPU_DEVICE}" \
python -m lmcache.v1.multiprocess.server \
Contributor

medium

For consistency and to avoid potential issues in environments where python might point to Python 2, it's better to use python3 here. The rest of the script already uses python3 (e.g., lines 129, 150, 195).

Suggested change
- python -m lmcache.v1.multiprocess.server \
+ python3 -m lmcache.v1.multiprocess.server \

Comment on lines +195 to +267
python3 << EOF
import sys

def sf(val):
    try: return float(val)
    except: return None

bqt = sf("$baseline_query_ttft")
bqrt = sf("$baseline_query_round_time")
bwrt = sf("$baseline_warmup_round_time")
lqt = sf("$l2_query_ttft")
lqrt = sf("$l2_query_round_time")
lwrt = sf("$l2_warmup_round_time")

min_spd = float("$MIN_L2_SPEEDUP")
min_ttft = float("$MIN_L2_TTFT_SPEEDUP")
max_oh = float("$MAX_WARMUP_OVERHEAD")

failed = False

print("=" * 60)
print("L2 Performance Summary")
print("=" * 60)
print(f"{'Metric':<35} {'Baseline':>12} {'L2':>12}")
print("-" * 60)
for name, bv, lv in [
    ("query_ttft_per_prompt (s)", bqt, lqt),
    ("query_round_time_per_prompt (s)", bqrt, lqrt),
    ("warmup_round_time_per_prompt (s)", bwrt, lwrt),
]:
    bs = f"{bv:.4f}" if bv else "N/A"
    ls = f"{lv:.4f}" if lv else "N/A"
    print(f"{name:<35} {bs:>12} {ls:>12}")

print()
print("=" * 60)
print("Threshold Verification")
print("=" * 60)

# 1. L2 query round-time speedup
if lqrt and bqrt and lqrt > 0:
    s = bqrt / lqrt
    ok = s >= min_spd
    print(f"[{'PASS' if ok else 'FAIL'}] L2 query speedup: {s:.2f}x (need >= {min_spd}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute L2 query speedup"); failed = True

# 2. L2 TTFT speedup
if lqt and bqt and lqt > 0:
    s = bqt / lqt
    ok = s >= min_ttft
    print(f"[{'PASS' if ok else 'FAIL'}] L2 TTFT speedup: {s:.2f}x (need >= {min_ttft}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute L2 TTFT speedup"); failed = True

# 3. Warmup overhead
if lwrt and bwrt and bwrt > 0:
    o = lwrt / bwrt
    ok = o <= max_oh
    print(f"[{'PASS' if ok else 'FAIL'}] Warmup overhead: {o:.2f}x (need <= {max_oh}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute warmup overhead"); failed = True

print()
if failed:
    print("[FAIL] L2 performance verification FAILED")
    sys.exit(1)
else:
    print("[PASS] All L2 performance thresholds passed")
EOF
Contributor

medium

This large Python script embedded using a heredoc is difficult to maintain, lint, and debug. For better code cleanliness and modularity, as per the repository's style guide, this logic should be moved into a separate Python script. The new script could then be called with the performance metrics and thresholds as command-line arguments.

References
  1. The style guide (lines 10-11, 61) emphasizes code cleanliness, modularity, and maintainability. Embedding a large script within another script goes against these principles, and the guide suggests that code that could be more modular should be fixed. (link)
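One possible shape for such a standalone script is sketched below. It is not the code that was merged: the function names (`optional_float`, `ratio`, `verify`, `main`) and the command-line flag names are hypothetical, chosen only to mirror the shell variables used in the heredoc. The three checks (query round-time speedup, TTFT speedup, warmup overhead) follow the heredoc's formulas.

```python
import argparse

METRICS = ("query_ttft", "query_round_time", "warmup_round_time")


def optional_float(text):
    """Parse a float; return None for empty or malformed input (like sf())."""
    try:
        return float(text)
    except ValueError:
        return None


def ratio(numerator, denominator):
    """Return numerator / denominator, or None if either value is unusable."""
    if not numerator or not denominator or denominator <= 0:
        return None
    return numerator / denominator


def verify(metrics, min_speedup, min_ttft_speedup, max_warmup_overhead):
    """metrics maps a metric name to (baseline, l2). Returns [(ok, message)]."""
    checks = [
        ("L2 query speedup", "query_round_time", min_speedup, True),
        ("L2 TTFT speedup", "query_ttft", min_ttft_speedup, True),
        ("Warmup overhead", "warmup_round_time", max_warmup_overhead, False),
    ]
    results = []
    for label, key, limit, higher_is_better in checks:
        baseline, l2 = metrics[key]
        # Speedup is baseline / l2; overhead is l2 / baseline.
        value = ratio(baseline, l2) if higher_is_better else ratio(l2, baseline)
        if value is None:
            results.append((False, f"Cannot compute {label}"))
            continue
        ok = value >= limit if higher_is_better else value <= limit
        op = ">=" if higher_is_better else "<="
        results.append((ok, f"{label}: {value:.2f}x (need {op} {limit}x)"))
    return results


def main(argv=None):
    """Parse arguments, print one line per check, return a process exit code."""
    parser = argparse.ArgumentParser(description="Verify L2 thresholds")
    for key in METRICS:
        flag = key.replace("_", "-")
        parser.add_argument(f"--baseline-{flag}", type=optional_float)
        parser.add_argument(f"--l2-{flag}", type=optional_float)
    parser.add_argument("--min-l2-speedup", type=float, required=True)
    parser.add_argument("--min-l2-ttft-speedup", type=float, required=True)
    parser.add_argument("--max-warmup-overhead", type=float, required=True)
    args = parser.parse_args(argv)

    metrics = {
        key: (getattr(args, f"baseline_{key}"), getattr(args, f"l2_{key}"))
        for key in METRICS
    }
    results = verify(
        metrics,
        args.min_l2_speedup,
        args.min_l2_ttft_speedup,
        args.max_warmup_overhead,
    )
    for ok, message in results:
        print(f"[{'PASS' if ok else 'FAIL'}] {message}")
    return 0 if all(ok for ok, _ in results) else 1
```

Saved as a file, the script would end with the usual `if __name__ == "__main__": sys.exit(main())` guard (plus `import sys`), and the shell script would pass each metric and threshold as a flag instead of interpolating shell variables into Python source. Keeping `verify` separate from argument parsing also makes the threshold logic unit-testable without starting a server.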

Comment on lines +102 to +106
if ! "${SCRIPT_DIR}/run-long-doc-qa-l2.sh"; then
    echo "long doc QA L2 test failed"
    TEST_RESULT=1
    exit 1
fi
Contributor

medium

The exit 1 on line 105 causes the entire test suite to terminate prematurely if this test fails. This is inconsistent with the error handling in other test steps in this script (e.g., for run-long-doc-qa.sh), which only set TEST_RESULT=1 and allow subsequent tests to run. To ensure all tests are executed and all failures are reported, the exit 1 should be removed.

Suggested change
- if ! "${SCRIPT_DIR}/run-long-doc-qa-l2.sh"; then
-     echo "long doc QA L2 test failed"
-     TEST_RESULT=1
-     exit 1
- fi
+ if ! "${SCRIPT_DIR}/run-long-doc-qa-l2.sh"; then
+     echo "long doc QA L2 test failed"
+     TEST_RESULT=1
+ fi
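
The record-and-continue pattern the reviewer describes can also be sketched in Python, purely as an illustration. The `run_suite` helper and its `runner` parameter are hypothetical and not part of the repository; `runner` defaults to `subprocess.call`, which returns the command's exit code.

```python
import subprocess


def run_suite(steps, runner=subprocess.call):
    """Run every (name, argv) step; return 1 if any step failed, else 0."""
    test_result = 0
    for name, argv in steps:
        if runner(argv) != 0:
            print(f"{name} failed")
            test_result = 1  # record the failure, but keep running later steps
    return test_result
```

The final exit code still reflects any failure, but every step runs, so all failures are reported in a single CI run.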

@sammshen
Contributor

unblocked by: #2886

Contributor

@sammshen sammshen left a comment

LGTM!

@sammshen
Contributor

[2026-03-27 10:25:38,265] LMCache ERROR: No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! (server.py:587:__main__)
[2026-03-27 10:25:38,313] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:38,315] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:38,315] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:38,315] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:38,316] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:39,301] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:39,302] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:39,302] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:39,303] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:39,303] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:47,590] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:25:47,618] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:48,422] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:25:48,438] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:48,439] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:49,277] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:25:49,292] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:49,293] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:50,144] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:25:50,159] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:50,160] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:51,026] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:25:51,040] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:51,042] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:51,042] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:51,931] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:51,932] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:25:52,811] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:26:01,132] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:26:01,148] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:26:01,994] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
�[31;20m[2026-03-27 10:26:02,878] LMCache ERROR:�[0m No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! �[3m(server.py:587:__main__)�[0m
�[31;20m[2026-03-27 10:26:02,892] LMCache ERROR:�[0m Error in blocking handler �[3m(mq.py:433:lmcache.v1.multiprocess.mq)�[0m
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:03,744] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:04,075] LMCache ERROR: No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! (server.py:587:__main__)
[2026-03-27 10:26:04,091] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:04,942] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:05,169] LMCache ERROR: No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! (server.py:587:__main__)
[2026-03-27 10:26:05,184] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:06,033] LMCache ERROR: No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! (server.py:587:__main__)
[2026-03-27 10:26:06,048] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:06,049] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:06,923] LMCache ERROR: Error in blocking handler (mq.py:433:lmcache.v1.multiprocess.mq)
Traceback (most recent call last):
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/mq.py", line 420, in _notify_response
    response = fut.result()
               ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/affinity_pool.py", line 73, in _worker
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/workspace/build/buildkite/lmcache/v1/multiprocess/server.py", line 281, in store
    assert instance_id in self.gpu_contexts, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: KV cache not registered for GPU ID 2514
[2026-03-27 10:26:12,195] LMCache ERROR: No GPU context found for model Qwen/Qwen3-14B with world size 1 during lookup! (server.py:587:__main__)

@ApostaC ApostaC left a comment

LGTM!

@ApostaC ApostaC enabled auto-merge (squash) March 27, 2026 22:38
@ApostaC ApostaC merged commit d48d488 into LMCache:dev Mar 27, 2026
34 checks passed
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Co-authored-by: Samuel Shen <slshen@tensormesh.ai>