cbtop: BrickScore output hardcoded — 5 bugs corrupt profiling JSON #420

@noahgift

Summary

apr cbtop --model-path <FILE> --headless --json produces completely fake BrickScore data in JSON output, despite the BrickProfiler collecting correct GPU timing internally. The stderr shows real data (LmHead avg=595µs) but JSON reports LmHead actual_us=1.9µs.

Root Cause

brick_scores_from_profiler() in gguf.rs hardcodes all scoring fields and divides by the wrong denominator (brick elements instead of decoded tokens).

Bugs (5 total)

Bug 1: gguf.rs:406-413 (brick_scores_from_profiler) hardcodes everything

  • score: 100, grade: "R", gap_factor: 1.0 are all hardcoded
  • budget_us and actual_us are both set to per_token_us (total_ns / profiler.total_tokens())
  • profiler.total_tokens() counts brick ELEMENTS (~952K), not decoded tokens (~3K)
  • Result: all bricks show score=100, gap=1.0, actual=budget — completely useless
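
The denominator problem can be checked with the rounded figures quoted in this issue. The sketch below is illustrative only: `per_unit_us` is a hypothetical helper, not code from gguf.rs, and it assumes LmHead runs once per decoded token (about 3K calls at 595µs each). Dividing that total time by about 952K brick elements gives roughly 1.9µs, which matches the bogus JSON value.

```rust
/// Hypothetical helper: average time per unit in microseconds.
/// Returns 0.0 when there are no units, to avoid dividing by zero.
fn per_unit_us(total_ns: u64, units: u64) -> f64 {
    if units == 0 {
        return 0.0;
    }
    total_ns as f64 / units as f64 / 1_000.0
}

fn main() {
    // Rounded figures taken from the issue text (approximate, not measured here).
    let lm_head_calls: u64 = 3_000; // ~3K decoded tokens, one LmHead call each
    let lm_head_avg_us: f64 = 595.0; // value reported on stderr
    let brick_elements: u64 = 952_000; // ~952K, what total_tokens() reportedly counts

    // Total LmHead time implied by the stderr average.
    let total_ns = (lm_head_calls as f64 * lm_head_avg_us * 1_000.0) as u64;

    let wrong = per_unit_us(total_ns, brick_elements);
    let right = per_unit_us(total_ns, lm_head_calls);

    println!("divided by brick elements: {wrong:.1} us");
    println!("divided by decoded tokens: {right:.1} us");
}
```

With these inputs the first division lands near the 1.9µs seen in the JSON, while the second recovers the 595µs shown on stderr.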

Bug 2: cbtop_measure_batch.rs:328-335 — weighted score truncation

  • pmat_brick_score uses 7 hardcoded weights [1.5, 6.0, 1.0, 10.0, 3.5, 1.5, 12.2]
  • Real profiler returns 11 bricks — zip silently drops 4
  • The aggregate brick_score is therefore wrong
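
The truncation comes from how `Iterator::zip` behaves: it stops as soon as the shorter side runs out, without any error. The sketch below reproduces that effect with the seven weights listed above and eleven made-up scores. `weighted_score_zip` is a hypothetical stand-in, not the actual pmat_brick_score code.

```rust
/// Hypothetical stand-in for a zip-based weighted average.
/// Any scores beyond `weights.len()` are silently ignored.
fn weighted_score_zip(scores: &[f64], weights: &[f64]) -> f64 {
    let (num, den) = scores
        .iter()
        .zip(weights.iter())
        .fold((0.0_f64, 0.0_f64), |(n, d), (s, w)| (n + s * w, d + w));
    if den == 0.0 { 0.0 } else { num / den }
}

fn main() {
    // The 7 hardcoded weights quoted in the issue.
    let weights = [1.5, 6.0, 1.0, 10.0, 3.5, 1.5, 12.2];

    // 11 illustrative scores: the last 4 are deliberately bad.
    let mut scores: Vec<f64> = vec![100.0; 7];
    scores.extend_from_slice(&[0.0; 4]);

    let pairs_used = scores.iter().zip(weights.iter()).count();
    println!("bricks: {}, pairs used by zip: {}", scores.len(), pairs_used);

    // The four failing bricks never reach the aggregate.
    println!("aggregate: {:.1}", weighted_score_zip(&scores, &weights));
}
```

Only 7 of the 11 bricks contribute, so the four zero scores have no effect on the aggregate.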

Bug 3: cbtop_measure_batch.rs:362-365 — hardcoded PMAT scores

  • rust_project_score: 173.9, tdg_score: 98.1, cuda_tdg_score: 95.2 are constants
  • Not computed from any real data

Bug 4: cbtop_measure_batch.rs:369-374 — hardcoded FalsificationSummary

  • total_points: 137, passed: 137, failed: 0, blocked: 0
  • No falsification tests are actually run

Bug 5: cbtop_measure_batch.rs:338 — hardcoded target

  • target_tok_s = 976.0 is a stale CPU spec target
  • Should be derived from hardware capability or removed

Contract Violation

Violates C-GDP-001 (gpu-decode-profiling-v1.yaml): brick data must reflect real GPU execution time.

Fix Plan

  1. Use stats.avg_us() for actual_us (per-call GPU time)
  2. Derive decoded_tokens from LmHead.count (exactly 1 per decoded token)
  3. Compute real score/grade/gap_factor using existing compute_brick_score()
  4. Use equal weights for dynamic brick count instead of hardcoded 7-element array
  5. Remove hardcoded PMAT scores and falsification summary — report 0 if not computed
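
A minimal sketch of steps 1, 2, and 4, under stated assumptions: `BrickStats` and its fields are hypothetical stand-ins for the profiler's real per-brick statistics, and the real `avg_us()` and `compute_brick_score()` signatures are not shown in this issue, so the per-brick scores are taken as plain inputs here.

```rust
/// Hypothetical stand-in for the profiler's per-brick statistics.
struct BrickStats {
    name: &'static str,
    count: u64,    // number of calls recorded for this brick
    total_ns: u64, // accumulated GPU time in nanoseconds
}

impl BrickStats {
    /// Step 1: per-call GPU time in microseconds (assumed meaning of avg_us).
    fn avg_us(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            self.total_ns as f64 / self.count as f64 / 1_000.0
        }
    }
}

/// Step 2: decoded tokens taken from the LmHead call count
/// (one LmHead call per decoded token, as stated in the fix plan).
fn decoded_tokens(stats: &[BrickStats]) -> u64 {
    stats.iter().find(|s| s.name == "LmHead").map_or(0, |s| s.count)
}

/// Step 4: equal weights, so the aggregate works for any number of bricks.
fn equal_weight_score(scores: &[f64]) -> f64 {
    if scores.is_empty() {
        0.0
    } else {
        scores.iter().sum::<f64>() / scores.len() as f64
    }
}

fn main() {
    // Illustrative values only; "Attention" is a made-up brick name.
    let stats = [
        BrickStats { name: "Attention", count: 84_000, total_ns: 840_000_000 },
        BrickStats { name: "LmHead", count: 3_000, total_ns: 1_785_000_000 },
    ];

    println!("decoded tokens: {}", decoded_tokens(&stats));
    for s in &stats {
        println!("{} actual_us = {:.1}", s.name, s.avg_us());
    }
    // Scores would come from compute_brick_score() in the real code.
    println!("aggregate = {:.1}", equal_weight_score(&[100.0, 50.0]));
}
```

Taking the token count from one brick avoids depending on what total_tokens() counts, and a plain mean never drops bricks when the profiler returns more than seven.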
