Y7: GPU Performance Benchmarks (APR decode ≥200 tok/s) #141

@noahgift

Overview

Implement GPU performance benchmarks for the APR format, per Section Y.2 of the spec.

Requirement

  • Y7: APR decode speed must be ≥200 tok/s on GPU (RTX 4090 reference)
  • Must match or exceed GGUF decode speed on same hardware

Falsification Condition

Y7 is falsified if APR decodes at < 200 tok/s on a GPU where GGUF decodes at ≥ 200 tok/s.
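
The requirement and the falsification condition can be written down as two small predicates. This is only a sketch: the function and constant names are hypothetical and not part of realizar, and the throughput values are assumed to be measured on the same GPU.

```rust
/// Threshold from requirement Y7, in tokens per second.
const Y7_MIN_TOK_PER_SEC: f64 = 200.0;

/// Falsification condition as stated in this issue:
/// APR is below the threshold while GGUF reaches it on the same GPU.
/// (Hypothetical helper, not an existing realizar API.)
fn is_y7_falsified(apr_tok_s: f64, gguf_tok_s: f64) -> bool {
    apr_tok_s < Y7_MIN_TOK_PER_SEC && gguf_tok_s >= Y7_MIN_TOK_PER_SEC
}

/// Parity requirement: APR must match or exceed GGUF decode speed.
/// (Hypothetical helper, not an existing realizar API.)
fn meets_gguf_parity(apr_tok_s: f64, gguf_tok_s: f64) -> bool {
    apr_tok_s >= gguf_tok_s
}
```

Keeping the two checks separate lets a benchmark report distinguish a run that clears 200 tok/s but still trails GGUF from one that falsifies Y7 outright.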

Implementation Tasks

  • Add CUDA feature flag to realizar
  • Implement GPU kernels for APR inference
  • Create benchmark harness for GPU performance
  • Verify parity with GGUF on RTX 4090 or equivalent
  • Add to CI with GPU runner (optional)
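
As a starting point for the benchmark harness task, a minimal timing loop might look like the following. It is a sketch under stated assumptions: `DecodeStats` and `bench_decode` are hypothetical names, and the decode step is passed in as a closure because the actual APR and GGUF GPU entry points do not exist yet. For GPU backends the closure is assumed to block until the device has finished, otherwise the wall-clock time would only cover kernel launches.

```rust
use std::time::{Duration, Instant};

/// Result of one timed decode run (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
struct DecodeStats {
    tokens: usize,
    elapsed: Duration,
}

impl DecodeStats {
    /// Throughput in tokens per second; reports 0.0 if no time elapsed,
    /// so a degenerate run can never satisfy a minimum-speed check.
    fn tokens_per_sec(&self) -> f64 {
        let secs = self.elapsed.as_secs_f64();
        if secs == 0.0 {
            0.0
        } else {
            self.tokens as f64 / secs
        }
    }
}

/// Calls `decode_step` `warmup` times untimed, then `iters` times timed.
/// Each call returns the number of tokens it produced.
/// Assumption: `decode_step` synchronizes with the GPU before returning.
fn bench_decode<F: FnMut() -> usize>(
    mut decode_step: F,
    warmup: usize,
    iters: usize,
) -> DecodeStats {
    // Warm-up iterations absorb one-time costs such as kernel compilation.
    for _ in 0..warmup {
        decode_step();
    }
    let start = Instant::now();
    let mut tokens = 0;
    for _ in 0..iters {
        tokens += decode_step();
    }
    DecodeStats {
        tokens,
        elapsed: start.elapsed(),
    }
}
```

Running the same `bench_decode` call once with an APR decode closure and once with a GGUF decode closure on the same GPU would give the two throughput figures that the parity task compares.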

Blocked By

  • Requires GPU hardware for development and testing

References

  • Spec: docs/specifications/apr-whisper-and-cookbook-support-eoy-2025.md Section Y.2
  • Related: Y6 (CPU benchmarks) - ✅ Verified at 206.4 tok/s

Priority

P2 - Deferred (no GPU hardware available currently)

Labels

enhancement (New feature or request)
