
hunyuan_vl / gemma3n: drop dead assignments in cache-offset extraction #1056

Merged: Blaizzy merged 3 commits into main from pc/tidy-cache-offset on Apr 24, 2026

Blaizzy (Owner) commented Apr 24, 2026

Summary

Small follow-up to #1055 — two post-merge clean-ups spotted while re-reading the changes:

  • hunyuan_vl/language.py: both branches of the _idx check assigned offset = cache.offset, but only offset_scalar is read downstream (for the xdrope_section prefill-vs-decode branch). Dropped the dead assignment, and hoisted the offset_scalar = 0 default above the if cache is not None so the no-cache else disappears.
  • gemma3n/language.py: raw_offset was computed via a generator expression before the _idx fast path ran, even though the fast path doesn't use it. Moved the lookup into the fallback branch so the fast path does zero extra work.
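The hunyuan_vl clean-up can be pictured with a minimal sketch. `StubCache` and both function names are hypothetical stand-ins invented for illustration, and the branch bodies are an assumption about the shape of the code, not a copy of `hunyuan_vl/language.py`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubCache:
    """Hypothetical stand-in for a KV cache; not the real mlx-vlm class."""

    offset: int = 0
    _idx: Optional[int] = None  # assumed to be a plain int when present


def offset_scalar_before(cache: Optional[StubCache]) -> int:
    """Assumed pre-cleanup shape: `offset` is written in both branches, never read."""
    if cache is not None:
        if getattr(cache, "_idx", None) is not None:
            offset = cache.offset  # dead assignment
            offset_scalar = cache._idx
        else:
            offset = cache.offset  # dead assignment
            offset_scalar = int(cache.offset)
    else:
        offset_scalar = 0
    return offset_scalar


def offset_scalar_after(cache: Optional[StubCache]) -> int:
    """Assumed post-cleanup shape: default hoisted, no-cache `else` removed."""
    offset_scalar = 0  # hoisted default covers the no-cache case
    if cache is not None:
        idx = getattr(cache, "_idx", None)
        offset_scalar = idx if idx is not None else int(cache.offset)
    return offset_scalar


# Both shapes agree on the no-cache, fallback, and fast-path inputs.
for cache in (None, StubCache(offset=7), StubCache(offset=7, _idx=7)):
    assert offset_scalar_before(cache) == offset_scalar_after(cache)
```

Hoisting the default means every path leaves `offset_scalar` bound, so the no-cache case needs no branch of its own.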

No functional change; same outputs on the same inputs.
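For the gemma3n change, one way to check "same outputs, less work" is a stub cache that counts reads of `offset`. Everything here is an assumption made for illustration: `CountingCache`, the two function names, and the exact form of the generator expression are hypothetical and not taken from `gemma3n/language.py`:

```python
class CountingCache:
    """Hypothetical cache stub that records how often `offset` is read."""

    def __init__(self, offset, idx=None):
        self._offset = offset
        self._idx = idx
        self.offset_reads = 0

    @property
    def offset(self):
        self.offset_reads += 1
        return self._offset


def cache_offset_eager(caches):
    """Assumed pre-cleanup shape: the scan runs before the fast path is tried."""
    raw_offset = next((c.offset for c in caches if c is not None), 0)
    idx = getattr(caches[0], "_idx", None)
    if idx is not None:
        return idx  # fast path ignores raw_offset
    return int(raw_offset)


def cache_offset_lazy(caches):
    """Assumed post-cleanup shape: the scan only runs in the fallback branch."""
    idx = getattr(caches[0], "_idx", None)
    if idx is not None:
        return idx  # fast path: `offset` is never touched
    raw_offset = next((c.offset for c in caches if c is not None), 0)
    return int(raw_offset)


eager_cache = CountingCache(offset=5, idx=5)
lazy_cache = CountingCache(offset=5, idx=5)
assert cache_offset_eager([eager_cache]) == cache_offset_lazy([lazy_cache]) == 5
assert eager_cache.offset_reads == 1  # wasted read on the fast path
assert lazy_cache.offset_reads == 0   # no read after the move
```

The result is identical on the fast path; only the number of `offset` reads differs. If `offset` were an array that has to be evaluated before it can be used as a scalar, skipping that read would matter more than it does in this stub.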

Test plan

  • from mlx_vlm.models.hunyuan_vl import language / from mlx_vlm.models.gemma3n import language — imports clean.
  • Re-run the decode smoke-test on mlx-community/gemma-3n-E2B-it-8bit + tencent/HunyuanOCR (CLI + server single + server concurrent).
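The import check in the first item could be scripted roughly as follows. `import_errors` is a hypothetical helper written for this sketch; it relies only on the standard-library `importlib.import_module`:

```python
import importlib
from typing import Dict, Iterable


def import_errors(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module; return {name: error text} for the failures."""
    failures: Dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # any import-time failure counts
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

In an environment with mlx-vlm installed, `import_errors(["mlx_vlm.models.hunyuan_vl.language", "mlx_vlm.models.gemma3n.language"])` would be expected to return an empty dict if both modules import cleanly.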

🤖 Generated with Claude Code

Blaizzy and others added 3 commits April 24, 2026 02:44
hunyuan_vl was setting ``offset = cache.offset`` in both branches even
though only ``offset_scalar`` is read downstream; ``offset`` itself is
never read again. Hoist ``offset_scalar = 0`` above the ``if cache``
branch and drop the unused ``offset`` assignment.

gemma3n was eagerly computing ``raw_offset`` via a generator expression
before the ``_idx`` fast path even ran. Move that lookup into the
fallback branch so the fast path does zero extra work.

No functional change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Blaizzy Blaizzy merged commit e41cd25 into main Apr 24, 2026
1 check passed
afanty2021 added a commit to afanty2021/mlx-vlm that referenced this pull request Apr 24, 2026
Merge changes from upstream:
- Blaizzy#1056: hunyuan_vl/gemma3n cache-offset optimization
- Blaizzy#1053: Fix DFlash speculative decoding (GPU hang, performance)
- Blaizzy#1050: Thread-local generation stream (port mlx-lm#1090)
- Blaizzy#1055: Close batch_generate/server decode gap + VLM fixes

Conflict resolution:
- requirements.txt: Mixed approach - mlx>=0.31.2 with transformers<5.4.0
  to maintain omlx compatibility while accepting mlx update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>