perf(dflash): gate top-5 logits diagnostic behind LLAMA_DFLASH_DEBUG by marksverdhei · Pull Request #78 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-05T12:23:44Z

Epoch #73 task 5 (DFlash perf scout, in lieu of titan-gated bench).

Finding

common/speculative.cpp:935 has an unconditional top-5 logits selection inside the DFlash drafter's hot path:

if (i == 1) {
    std::vector<int> top;
    top.reserve(5);
    for (int j = 0; j < n_vocab; ++j) {
        if ((int) top.size() < 5) {
            top.push_back(j);
            std::sort(top.begin(), top.end(), [&](int a, int b) { return logits[a] > logits[b]; });
        } else if (logits[j] > logits[top.back()]) {
            top.back() = j;
            std::sort(top.begin(), top.end(), [&](int a, int b) { return logits[a] > logits[b]; });
        }
    }
    // build top_dbg string and LOG_INF
}

The if (i == 1) check gates when the selection runs (once per draft call) but not whether it runs. The LOG_INF below it is verbosity-gated, so in production the log is suppressed, but the selection still runs: a full O(n_vocab) scan with a 5-element sort on every insertion.

On gemma-class vocabs (~256k tokens), that's ~1ms per draft call. At Round-10's measured ~8% accept rate, every output token costs several draft calls, so this debug computation sits in the steady-state hot path.

Fix

Extend the gate to if (i == 1 && dflash_debug). dflash_debug is the cached env-var probe already declared at line 883 (used by the features-debug block immediately above).
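A minimal sketch of the gated shape, for illustration only. The declaration of dflash_debug at line 883 is not shown in this PR, so the std::getenv probe below is an assumption about how a cached env-var flag could look. The names dflash_debug_enabled, debug_top5 and maybe_debug_top5 are hypothetical and do not come from common/speculative.cpp.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the cached probe declared at line 883.
// Assumption: the flag counts as enabled whenever the variable exists.
bool dflash_debug_enabled() {
    static const bool enabled = std::getenv("LLAMA_DFLASH_DEBUG") != nullptr;
    return enabled;
}

// Same selection as the diagnostic: indices of the 5 largest logits, best first.
std::vector<int> debug_top5(const float * logits, int n_vocab) {
    std::vector<int> top;
    top.reserve(5);
    auto by_logit = [&](int a, int b) { return logits[a] > logits[b]; };
    for (int j = 0; j < n_vocab; ++j) {
        if ((int) top.size() < 5) {
            top.push_back(j);
            std::sort(top.begin(), top.end(), by_logit);
        } else if (logits[j] > logits[top.back()]) {
            top.back() = j;
            std::sort(top.begin(), top.end(), by_logit);
        }
    }
    return top;
}

// The gate now decides both when (i == 1) and whether (debug flag) the scan runs.
// The flag is passed in so the function does not depend on the environment.
std::vector<int> maybe_debug_top5(int i, bool dflash_debug, const float * logits, int n_vocab) {
    if (i == 1 && dflash_debug) {
        return debug_top5(logits, n_vocab);
    }
    return {}; // production path: no scan over the vocabulary at all
}
```

A caller would pass dflash_debug_enabled() as the flag. With the flag off, the function returns immediately without touching logits.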

Verified

✅ cmake --build build --target llama-server succeeds
✅ Behavior change: the diagnostic now runs only when LLAMA_DFLASH_DEBUG is set (was unconditional → now gated; identical code runs when enabled)

Out of scope

Other DFlash hot-path observations from the scout that did NOT make this PR (no clear-win fix):

accumulated_ctx grows without bound across draft calls. With the default LLAMA_DFLASH_CTX_WINDOW=512 only the tail is used, but the buffer keeps growing. This has the shape of a memory leak rather than a perf problem. Worth a follow-up trim.
std::sort inside the j-loop is asymptotically fine, but it re-sorts all 5 elements on every insertion. It could be replaced by a hand-rolled 5-slot insertion step (at most 4 comparisons and shifts per insertion, instead of a full std::sort call). The speed-up is small and only matters when LLAMA_DFLASH_DEBUG is set.
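The 5-slot insertion idea from the last item could look roughly like the sketch below. It is not part of this PR; top5_insertion is a hypothetical name, and the fixed-size std::array output is an assumption made to keep the example allocation-free.

```cpp
#include <array>

// Hypothetical helper: writes the indices of the (up to) 5 largest logits
// into `top`, best first, and returns how many slots were filled.
int top5_insertion(const float * logits, int n_vocab, std::array<int, 5> & top) {
    int n = 0; // number of filled slots
    for (int j = 0; j < n_vocab; ++j) {
        // Fast reject: once full, most tokens lose to the current 5th place.
        if (n == 5 && !(logits[j] > logits[top[4]])) {
            continue;
        }
        // Start at the first free slot, or overwrite the weakest entry when full.
        int pos = (n < 5) ? n++ : 4;
        // Shift weaker entries down one slot until j's position is found.
        while (pos > 0 && logits[j] > logits[top[pos - 1]]) {
            top[pos] = top[pos - 1];
            --pos;
        }
        top[pos] = j;
    }
    return n;
}
```

Each insertion costs at most 4 comparisons and shifts. Ties keep the earlier index ahead because the comparison is strict, which may differ from the ordering std::sort happens to produce for equal logits.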

Epoch #73 task 5 (DFlash perf scout, in lieu of titan-gated bench).

The top-5 logits selection at common/speculative.cpp:935 was unconditional: `if (i == 1)` gated when it ran (once per draft call) but not whether it ran. The LOG_INF below it is verbosity-gated, so in production the log is suppressed, but the selection (an O(n_vocab) scan with a 5-element sort per insertion) still runs. On gemma-class vocabs (~256k tokens) the selection burns ~1ms per draft call. At Round-10's measured ~8% accept rate, every output token costs several draft calls, so this debug computation is in the steady-state hot path.

Fix: extend the gate to `if (i == 1 && dflash_debug)`. `dflash_debug` is the cached env-var probe already declared at line 883 (used by the features-debug block immediately above). When LLAMA_DFLASH_DEBUG is set the diagnostic still fires; production no longer pays for it.

Local CPU build verifies. The diagnostic now runs only when LLAMA_DFLASH_DEBUG is set (was unconditional → now gated; same code runs when enabled).

This was referenced Jun 5, 2026

Hivemind Maintenance Tasks Epoch 1 #73

Closed

Hivemind Maintenance Tasks Epoch 3 #81

Closed

Hivemind Maintenance Tasks Epoch 4 #86

Closed

marksverdhei merged commit 6733bc1 into ht Jun 12, 2026
3 of 7 checks passed

marksverdhei deleted the perf/dflash-gate-debug-top5 branch June 12, 2026 18:34
