ci: update container registry location by saturley-hall · Pull Request #3 · ai-dynamo/dynamo

saturley-hall · 2025-03-04T00:49:26Z

What does the PR do?

Updates the container registry used for caching

Splits CodeQL and Copyright-Checks into separate workflows.

Update image that Copyright checks uses to a new location in GHCR.io

github-actions · 2025-03-04T01:17:07Z

Test Results

2 files 2 suites 25s ⏱️
71 tests 71 ✅ 0 💤 0 ❌
89 runs 88 ✅ 1 💤 0 ❌

Results for commit c7118b9.

♻️ This comment has been updated with latest results.

Adds TestQwen235BugReproCase with the exact two picks from the original triage report: * Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4 * Decode: tp=1, dp=8, moe_tp=2, moe_ep=4 Each test pins down a specific symptom that the old profiler path produced: * test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs) and Bug #2 (using .tp_size property) would break the identity for this pick. Verifies all five kwargs and the MoE identity. * test_decode_pick_satisfies_aic_identity — same for the decode pick, which was the exact shape that triggered Bug #3 (missing attention_dp_size). * test_tp_size_property_differs_from_aic_tp_size — direct regression guard: documents that the property returns 1 here while AIC's tp_size is 4. Anyone who wires .tp_size back into AIC kwargs will break this. * test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both sweeps and asserts every AIC call gets complete kwargs. This is the direct reproducer of the originally-reported failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Fixes all actionable items from the second review: Bug fixes: - #1: Change returncode=4 → returncode=2 in pytest_configure exit (4 is reserved by pytest for EXIT_NOTESTSCOLLECTED) - #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original) Test quality: - #7: Add missing assertions to test_apply_hf_home_layout (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE) - #8: Use monkeypatch in tests 3 & 4 for proper env isolation (prevents pre-existing env vars from leaking on test failure) Design / correctness: - #3: Fix _models_dir_env docstring ("exactly once" → "once per worker") - #4: Add comment noting TRANSFORMERS_CACHE deprecation - #5: Update --models-dir help text and docs to reflect both supported layouts (bare HF_HUB_CACHE and HF_HOME), not just bare - #10: Restore pytest.skip() in download_lora() (test-only infra); remove now-redundant guard from minio_lora_service fixture - #11: Raise hub/ detection log to WARNING with guidance - #12: Replace shutil.rmtree(ignore_errors=True) with try/except so cleanup failures are logged rather than silently swallowed Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester test deferred — complex due to conftest dependencies, low severity) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

Bug fixes: - #1: Add monkeypatch env isolation to test_apply_sets_writable_transformers_cache - #2: Add TRANSFORMERS_CACHE assertion to test_apply_bare_cache_layout (bare-layout path was missing the writable-dir check present in HF_HOME test) Minor cleanups: - #4: Move `import pytest` from inside download_lora() to module top-level (lora_utils.py is test-only infra; pytest is always available) - #5: Replace pytestconfig.getoption("--models-dir") re-checks in predownload_models/predownload_tokenizers with os.environ.get("DYNAMO_MODELS_DIR") (_models_dir_env runs first and sets the var; single source of truth) New coverage tests: - #7: test_models_dir_nonexistent_exits_with_code_2 — subprocess test verifying pytest_configure exits with returncode=2 on bad path - #8: test_download_lora_skips_in_models_dir_mode — verifies download_lora() raises pytest.skip.Exception when DYNAMO_MODELS_DIR is set Not addressed: #3 (keep gpu_0 per project guidelines and previous review retraction), #6 (hook ordering is guaranteed), #9 (complex, low priority) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

…code-waiter cleanup Addresses two correctness issues raised by dynamo-ops review on PR 8965. 1. **Race on PrefillReady re-insert** (model_manager.rs) Both `register_prefill_router` and `activate_prefill_router` previously used a `remove → process → insert` pattern on `prefill_router_activators`. Between the `remove` and the `insert` the entry is gone from the map, so a concurrent `remove_prefill_activator` call (called by the watcher on prefill-component teardown) hits the empty map, skips its cleanup, and we then re-insert a stale `PrefillReady` for a prefill that's already gone. The next decode rebuild's lookup finds the resurrected cache and activates a `PrefillRouter` against a dead endpoint. Refactor both functions to use DashMap's `Entry` API. The shard lock is held for the duration of the OccupiedEntry, so any concurrent `remove_prefill_activator` serializes after us and observes the entry it needs to clear. The PrefillReady-consume path in `register_prefill_router` now reads-and-clones in place without removing; the DecodeWaiting/PrefillReady → PrefillReady transition in `activate_prefill_router` uses `OccupiedEntry::insert(...)` for an atomic value swap that returns the old value. 2. **Decode-waiter cleanup gated on `removed.is_some()`** (watcher.rs) `handle_delete_helper`'s decode branch only called `remove_decode_prefill_waiter` when `remove_worker_set` returned `Some(_)`. If decode registered (creating a `DecodeWaiting` activator entry via `register_prefill_router`) but `handle_add_helper` failed later (e.g., on `kv_chooser_for`, monitor setup, or `build_routed_pipeline`) before `add_worker_set`, no WorkerSet ever landed in the manager — and on later teardown `remove_worker_set` returns `None`, leaving the stale `DecodeWaiting` entry orphaned in the activator map. Drop the `removed.is_some()` guard. The helper itself is state-safe (uses `DashMap::remove_if(|_, v| matches!(v, DecodeWaiting(_)))`) so calling it on a key that's vacant or holds `PrefillReady` is a no-op. The prefill-teardown branch keeps its `removed.is_some()` guard because `remove_prefill_activator` blanket-removes both states, and calling it without a real WorkerSet teardown could orphan a live decode-side `DecodeWaiting`. Behavior is unchanged for all four reproducers (ai-dynamo#1 burst-sweep canary OFF, ai-dynamo#2 Pattern 1 multi-decode restart, ai-dynamo#3 Pattern 2 stale-DecodeWaiting, ai-dynamo#4 burst-sweep canary ON); will re-verify on a fresh build.

…rop shebang Three CI failures on PR #9360 addressed: 1. ``check-shebang-scripts-are-executable`` (pre-commit): ``tests/vllm/test_self_benchmark_gpu.py`` carried a ``#!/usr/bin/env python3`` shebang but isn't intended to be directly executed (pytest collects it). Drop the shebang to satisfy the hook without granting exec bits. 2. ``pytest-marker-report`` (pre-commit) -- 9 collection errors: ``tests/vllm/__init__.py`` makes ``tests/vllm`` a Python package that shadows the real ``vllm`` distribution when ``tests`` is on sys.path during marker discovery. Every other vLLM-suite test that does ``from vllm import LLM`` (or similar) broke with ``ImportError: cannot import name 'LLM' from 'vllm'`` pointing at our ``tests/vllm/__init__.py``. Renamed to ``tests/vllm_self_benchmark/`` to remove the name collision; all 9 shadowed imports resolve to the real ``vllm`` again. 3. ``vllm-runtime / Multi-GPU Test`` -- disagg test failed with: (EngineCore pid=...) zmq.error.ZMQError: Address already in use (addr='tcp://*:20380') ``InstrumentedScheduler`` (auto-injected by ``--benchmark-mode``) binds the FPM publisher to ``tcp://*:DYN_FORWARDPASS_METRIC_PORT + dp_rank``, defaulting to 20380. Two workers on the same host (the disagg prefill + decode pair) both try to bind 20380 and the second loses. The operator works around this by injecting ``DYN_FORWARDPASS_METRIC_PORT`` per engine (deploy/operator/internal/dynamo/component_worker.go:111); the single-GPU agg test happens to avoid the collision because only one worker runs. Apply the same fix in the test: allocate a per-worker port via ``allocate_port(20380)`` and inject it as the env var, with matching ``deallocate_port`` on teardown. (The cancellation/migration tests don't hit this because they don't pass ``--benchmark-mode`` and so don't get InstrumentedScheduler.) Verified locally: * ``import vllm`` resolves to the real site-packages vllm again. * pytest collection of the renamed test passes; marker selectors ``pre_merge and vllm and gpu_1`` (agg test) and ``pre_merge and vllm and (gpu_2 or gpu_4)`` (disagg test) still match -- so both stay wired to ``pr.yaml``'s vllm-test and vllm-multi-gpu-test GPU stages and continue to gate the PR. The single-GPU agg test passed in the prior CI run (only the disagg test failed) so issue #3 is specifically what was needed for disagg to pass. Signed-off-by: Hongkuan Zhou <hongkuanz@nvidia.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

…later Reader-first ordering. Old order buried Quick start at position 8 of 9; new order surfaces the runnable commands above all the reference sections (FSx vs EBS, label conventions, aiperf SHA rationale). New section order: 1. Title + 3-config table 2. Pre-requisites (was #3) 3. Quick start (was #8 — promoted) 4. Directory layout (was #7 — now serves as map for the rest) 5. Hardware targets (was #2 — now pure reference; invocation examples moved into Quick start) 6. Storage (was #5) 7. aiperf install (was #6) 8. Naming & ownership (was #4) 9. Notes (unchanged) Also drop the stray "We use ebs by default" sentence — it contradicted both the Storage section and the actual yaml (where the PVC block is fully commented out, no default storage class is set). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cycle Address senior-architect review of the metrics infrastructure: #1 (two parallel Prometheus paths): merge `register_prometheus` + `setup_component_metrics` into one `setup_metrics(ctx) -> MetricsBindings` method on the Rust `LLMEngine` trait. The trait surface shrinks from 7 → 6 methods; both expfmt bridging and the structured component publisher go through one place. The Python ABC keeps its two methods — the PyO3 bridge adapts (engine-author API unchanged for Python). #3 + #8 (framework lifecycle gauges gated on engine opt-in): - `cleanup_time_seconds` and `drain_time_seconds` move to Rust-side `LifecycleGauges` (prometheus crate gauges registered via the runtime's `MetricsRegistry`). Worker constructs them after `engine.start()` succeeds; observes during cleanup/drain. Emits regardless of engine opt-in. - Drop `set_cleanup_time` / `set_drain_time` from the `ComponentMetricsPublisher` trait — engine no longer responsible. - Drop the corresponding methods from `PyComponentMetricsPublisher`. - `model_load_time_seconds` STAYS on Python `LLMBackendMetrics` for parity with the legacy entry points (both legacy `main.py` and the unified bridge populate it). The unified bridge now constructs `LLMBackendMetrics` UNCONDITIONALLY so the gauge emits even when the engine returns no per-rank sources. #5 (kv_cache_hit_rate Optional semantics): clearer docstring on both Rust `ComponentSnapshot` and the Python ABC — tri-state explicitly documented (`None` = no data, `0.0` = legitimate zero-hit). Deferred with rationale (explained in the post-review architect's note): - #2 (HashMap<MetricKey, f64> ComponentSnapshot): real future-proofing but no current bug; 5 fields for 5 signals isn't worth the refactor cost yet. Revisit at ~10+ signals. - #4 (per-source cadence): explicit defer in original review. - #7 (conformance in CI): workflow change, separate from trait surface. Conformance kit collapses `check_register_prometheus` + `check_component_metrics` into `check_setup_metrics`. All 68 backend-common unit tests pass. Mocker example updated. PyO3 bridge wheel rebuilt successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When rbac.namespaceRestricted=true the chart renders a namespace-scoped Role + RoleBinding (templates/role.yaml + rolebinding.yaml) instead of the cluster-scoped pair, but the DaemonSet's container command never passes --namespace to the agent. power_agent.py:438-446 therefore takes its else branch — list_pod_for_all_namespaces — which the namespace-only token rejects with 403 Forbidden on every reconcile. Reconciles never advance and no caps are ever applied (silent failure: the agent doesn't exit; pods are just never enumerated). The --namespace CLI flag itself already exists (power_agent.py:542) and the list_namespaced_pod branch (:439-442) works correctly — the dev-pod template (templates/dev-pod.yaml:57) has been wiring this all along via --namespace=$POD_NAMESPACE. This commit extends the same downward-API POD_NAMESPACE env-var pattern to the production DaemonSet, gated on the existing power-agent.effectiveNamespaceRestricted helper so both: - production mode with rbac.namespaceRestricted=true - dev mode (forces effective=true unless overridden) produce a DaemonSet whose argv matches the RBAC scope its token holds. Verified by helm template against both modes: - default (false) → no --namespace, no POD_NAMESPACE, ClusterRole - true → --namespace=$(POD_NAMESPACE), POD_NAMESPACE env via downward API, Role + RoleBinding - helm lint passes in both modes Refs: PR #9682 review, Power Agent live-test finding #3. Signed-off-by: Kai Ma <kaim@nvidia.com>

- decode_handler._extract_mm_hashes docstring now states 16-char hex for the SGLang path (was incorrectly claiming the vLLM 64-char shape; Devin #5). - sglang launch banner drops the "Lightseek" prefix; uses "MM Exact Routing (SGLang)" to match the public name (Ryan suggestion #1). - ModelRuntimeConfig.backend_framework field gains a doc-block on its motivation — frontend uses it for backend-specific routing hints (Ryan #3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When rbac.namespaceRestricted=true the chart renders a namespace-scoped Role + RoleBinding (templates/role.yaml + rolebinding.yaml) instead of the cluster-scoped pair, but the DaemonSet's container command never passes --namespace to the agent. power_agent.py:438-446 therefore takes its else branch — list_pod_for_all_namespaces — which the namespace-only token rejects with 403 Forbidden on every reconcile. Reconciles never advance and no caps are ever applied (silent failure: the agent doesn't exit; pods are just never enumerated). The --namespace CLI flag itself already exists (power_agent.py:542) and the list_namespaced_pod branch (:439-442) works correctly — the dev-pod template (templates/dev-pod.yaml:57) has been wiring this all along via --namespace=$POD_NAMESPACE. This commit extends the same downward-API POD_NAMESPACE env-var pattern to the production DaemonSet, gated on the existing power-agent.effectiveNamespaceRestricted helper so both: - production mode with rbac.namespaceRestricted=true - dev mode (forces effective=true unless overridden) produce a DaemonSet whose argv matches the RBAC scope its token holds. Verified by helm template against both modes: - default (false) → no --namespace, no POD_NAMESPACE, ClusterRole - true → --namespace=$(POD_NAMESPACE), POD_NAMESPACE env via downward API, Role + RoleBinding - helm lint passes in both modes Refs: PR #9682 review, Power Agent live-test finding #3. Signed-off-by: Kai Ma <kaim@nvidia.com>

The cache-realloc (#1) and metric-handle (#2/#3) commits left a few lines unformatted — `longest_prefix_match`'s collapsed signature and the `merged` binding in l1.rs, and the per-worker gauge assignment + a test assert in metrics.rs. `cargo fmt --all --check` (run by the rust-tests CI job) flagged them, failing the job. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… path (review #3) On the orchestrator replay path (use_orchestrator=True) the regression models were never installed: replay/main.py guarded the AIC benchmark-FPM bootstrap behind ``adapter._sm is not None and not adapter._sm._is_easy``, and ``_sm`` is None under use_orchestrator — so the whole block (incl. the ``load_benchmark_fpms`` calls) was skipped. ``get_regression`` then returned None for the whole replay, the throughput regression stayed empty, and orchestrator-replay scaling decisions diverged from PSM — contradicting the adapter docstrings that claimed bootstrap_from_fpms→install_regressions was wired. Fix: - engine_adapter: extract ``install_regressions_from_fpms`` (synchronous, builds the regressions from benchmark FPMs via the throwaway-PSM factory and installs them on the shared store — no plugin bootstrap). ``bootstrap_from_fpms`` now = install_regressions_from_fpms + bootstrap_plugins. - replay_adapter: add path-agnostic ``install_benchmark_fpms`` — PSM path → ``PlannerStateMachine.load_benchmark_fpms``; orchestrator path → ``install_regressions_from_fpms`` (plugins were already bootstrapped at adapter construction, so this does NOT double-bootstrap). Corrected the ``_get_regression`` docstring to the real wiring. - replay/main.py: guard is now ``not adapter._is_easy_mode()`` (path- agnostic) and the two ``adapter._sm.load_benchmark_fpms(...)`` calls become ``adapter.install_benchmark_fpms(...)``. Test: install_benchmark_fpms on the orchestrator path makes get_regression("agg") non-None (was None pre-fix). 840 planner tests pass (+1). NOTE: replay/main.py's end-to-end path can't be exercised in this env (its Rust _core symbol predates the installed extension); the edit is py_compile-validated and the adapter method it calls is unit-tested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The cache-realloc (#1) and metric-handle (#2/#3) commits left a few lines unformatted — `longest_prefix_match`'s collapsed signature and the `merged` binding in l1.rs, and the per-worker gauge assignment + a test assert in metrics.rs. `cargo fmt --all --check` (run by the rust-tests CI job) flagged them, failing the job. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

During each cell's measurement window, take a few lightweight per-node snapshots (background thread, foreground sub.wait() unperturbed) capturing the event-plane's resource footprint so we can quantify the brokerless-ZMQ vs NATS cost trade-off: - established TCP count via `ss -tn state established` (ZMQ direct mesh is O(p*s); NATS is O(p+s) to the broker) -- the headline connection-cost diff. - per-process CPU% / RSS / open-fd / nproc for the key classes (mocker, event_plane_bench_sub, nats-server, frontend, zmq_broker, loadgen) via /proc/<pid>/{stat,status} + ls /proc/<pid>/fd. - nats-server broker cost: active during NATS cells vs ~idle during ZMQ cells (ZMQ has no broker -- the saving to quantify). New ressample.py is a stdlib-only host-level sampler (runs on the node HOST, not inside pyxis, so its ss/`/proc` scan sees every enroot process + all node TCP). bench.py fires it on every node via a new launcher.popen_host() (srun with no --container-* flags), collects the echoed JSON snapshots, folds them into compact summary stats, writes a per-cell *_res.json, and attaches the summary to every summary.json row. Gated behind DYN_BENCH_RESSAMPLE (default on); snapshot count via DYN_BENCH_RESSAMPLE_N (default 2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The sampler snapshots are read back from the srun-streamed stdout logs, not a follow-up `cat` srun; correct the docstring to match the implementation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#3) Three fixes found in the lego-c2 sanity run: 1. RSS read 0 for in-container processes. Viewed from the node host, enroot/ pyxis processes report VmRSS=0 in /proc/<pid>/status (and 0 resident in statm); only /proc/<pid>/smaps_rollup `Rss:` reports the true resident set. Read smaps_rollup first, fall back to VmRSS for host procs. 2. nproc inflation. The `srun` client + slurmstepd that launch nats-server / event_plane_bench_sub carry the full `-- nats-server ...` command in their argv, so the cmdline-substring match counted them too (nats nproc=3 instead of 1, with 0 RSS dragging the sum). Skip launcher/wrapper cmdlines (srun/slurmstepd/bash -c/` -- `/the sampler itself) so each class counts only the real target process. 3. Worker-node samplers crashed with FileNotFoundError: the node0-created LOGDIR doesn't exist on worker nodes, so `open(--out)` died before the snapshot was emitted -> only node0 reported, losing the cross-node connection counts. Now echo the JSON to stdout FIRST (the orchestrator reads snapshots from the srun-streamed stdout, not the --out file), then makedirs + best-effort write the node-local --out copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ci: update container registry location

fea0239

saturley-hall had a problem deploying to GITLAB March 4, 2025 00:49 — with GitHub Actions Failure

saturley-hall temporarily deployed to GITLAB March 4, 2025 01:32 — with GitHub Actions Inactive

saturley-hall had a problem deploying to GITLAB March 4, 2025 01:33 — with GitHub Actions Failure

ci: fix copyright checks container image

583efbf

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:11 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:12 — with GitHub Actions Inactive

ci: split out jobs into separate workflows

4d09196

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:21 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:22 — with GitHub Actions Inactive

Merge branch 'main' into harrison/ci-test

a8e4bea

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:23 — with GitHub Actions Inactive

ci[fix]: indentation

6484762

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:24 — with GitHub Actions Inactive

saturley-hall requested a review from nv-anants March 4, 2025 02:25

saturley-hall temporarily deployed to GITLAB March 4, 2025 02:27 — with GitHub Actions Inactive

ci: use inline --cache-to to be able to pull image

7a60b58

saturley-hall temporarily deployed to GITLAB March 4, 2025 13:46 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 13:50 — with GitHub Actions Inactive

nv-anants and others added 3 commits March 4, 2025 06:38

remove timeout for cargo deny check

11ff1ce

fix

c9a3b10

ci: follow naming conventions

6cccc48

saturley-hall temporarily deployed to GITLAB March 4, 2025 15:00 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 15:01 — with GitHub Actions Inactive

saturley-hall and others added 4 commits March 4, 2025 10:18

codeowners: update to dynemo org

7c24ae1

ci: fix copyright checks

7a482c4

rerun: check timing

d750624

ci: revert to working cache location

23e0ab9

saturley-hall temporarily deployed to GITLAB March 4, 2025 15:26 — with GitHub Actions Inactive

nnshah1 mentioned this pull request Apr 3, 2026

feat: request lifecycle logging + metrics capture [DIS-1643] #7840

Merged

4 tasks

tanmayv25 mentioned this pull request Apr 15, 2026

DEP: Backend Interface -- LLMEngine ABC and Worker #8251

Open

nnshah1 mentioned this pull request Apr 22, 2026

fix(health): skip canary health check for trtllm/vllm disagg decode workers #8215

Closed

6 tasks

keivenchang mentioned this pull request Apr 24, 2026

[INVESTIGATE] sglang + DSv4: kwarg workaround breaks live — needs diagnosis #8671

Closed

4 tasks

dmitry-tokarev-nv mentioned this pull request Apr 28, 2026

fix: Added retries to curl commands #8784

Merged

richardhuo-nv mentioned this pull request May 7, 2026

fix: do not override the the event_buffer_max_size if user provided one #9284

Merged

jooe0824 mentioned this pull request May 20, 2026

fix(profiler): propagate spec.sla.{ttft,itl} into PlannerConfig #9537

Merged

dmitry-tokarev-nv mentioned this pull request May 26, 2026

fix(trtllm): restore deps, default CMD, and reduce layer count after upstream base switch #9889

Merged

15 tasks

rmccorm4 mentioned this pull request Jun 1, 2026

fix: MM-aware KV routing for Phi-3, Qwen2-VL, Qwen2.5-VL #9441

Merged

tzulingk mentioned this pull request Jun 2, 2026

perf(embedding): build embedding base64 directly from the tensor (numpy.tobytes) #10221

Closed

jthomson04 mentioned this pull request Jun 3, 2026

perf(frontend): profile-guided hot-path optimizations (tokenizer cache, metric handles, log-trace cap) #10273

Merged

BenHamm mentioned this pull request Jun 14, 2026

docs: add Recipes & Feature Benchmarks Fern surfaces #10628

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: update container registry location#3

ci: update container registry location#3
saturley-hall merged 16 commits into
mainfrom
harrison/ci-test

saturley-hall commented Mar 4, 2025 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 4, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

saturley-hall commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does the PR do?

Uh oh!

github-actions Bot commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

saturley-hall commented Mar 4, 2025 •

edited

Loading

github-actions Bot commented Mar 4, 2025 •

edited

Loading