Summary
Add quantization export support for trained models to reduce size and enable edge deployment.
Background
Per trueno-aprender-stdlib-core-language-spec.md Section 13.4 (Model Persistence):
- Implement quantization (Q8_0, Q4_0) export
Requirements
Quantization Formats
- Q8_0: 8-bit quantization (roughly 4x size reduction versus 32-bit float weights)
- Q4_0: 4-bit quantization (roughly 8x size reduction versus 32-bit float weights)
- Compatible with GGUF/llama.cpp ecosystem
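To make the Q8_0 item concrete, here is a minimal sketch of block-wise 8-bit quantization in the style llama.cpp uses: 32 weights share one scale, and each weight is stored as a signed byte. The names `BlockQ8`, `quantize_q8_block`, and `dequantize_q8_block` are hypothetical and not part of aprender or the spec. The scale is kept as f32 here to avoid a dependency, whereas the GGUF layout stores it as f16.

```rust
/// Number of weights that share one scale (32 in llama.cpp's Q8_0 layout).
const BLOCK_SIZE: usize = 32;

/// Hypothetical Q8_0-style block: one scale plus 32 signed 8-bit values.
/// Assumption: f32 scale for simplicity; GGUF stores an f16 scale.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockQ8 {
    pub scale: f32,
    pub quants: [i8; BLOCK_SIZE],
}

/// Quantize one block: map the largest magnitude to +/-127 and round the rest.
pub fn quantize_q8_block(weights: &[f32; BLOCK_SIZE]) -> BlockQ8 {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = max_abs / 127.0;
    // An all-zero block has scale 0; avoid dividing by it.
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0i8; BLOCK_SIZE];
    for (q, w) in quants.iter_mut().zip(weights.iter()) {
        *q = (w * inv).round().clamp(-127.0, 127.0) as i8;
    }
    BlockQ8 { scale, quants }
}

/// Reconstruct approximate f32 weights from a quantized block.
pub fn dequantize_q8_block(block: &BlockQ8) -> [f32; BLOCK_SIZE] {
    let mut out = [0.0f32; BLOCK_SIZE];
    for (o, q) in out.iter_mut().zip(block.quants.iter()) {
        *o = f32::from(*q) * block.scale;
    }
    out
}
```

With this scheme the reconstruction error per weight is bounded by about half the block scale, which is why blocks with a single large outlier quantize less accurately than blocks with evenly spread values.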
API
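// Proposed API (signatures only, bodies to be implemented).
impl Model {
    // Quantize the model's weights in memory using the given format.
    fn quantize(&self, format: QuantFormat) -> QuantizedModel;
    // Quantize the model and write the result to `path`.
    fn save_quantized(&self, path: &str, format: QuantFormat) -> Result<(), Error>;
}

enum QuantFormat {
    Q8_0,
    Q4_0,
    Q4_1, // no size or accuracy target stated in this issue
    Q5_0, // no size or accuracy target stated in this issue
}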
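The size figures quoted for each format are nominal. As a rough sketch, the helper below computes the on-disk bytes per 32-weight block using the llama.cpp block layouts (an f16 scale per block, plus an f16 minimum for Q4_1 and 4 bytes of high bits for Q5_0). The methods `block_bytes` and `compression_vs_f32` are hypothetical additions for illustration and are not part of the proposed API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantFormat {
    Q8_0,
    Q4_0,
    Q4_1,
    Q5_0,
}

impl QuantFormat {
    /// Weights per block in the llama.cpp layouts for these formats.
    pub const BLOCK_WEIGHTS: usize = 32;

    /// Bytes used to store one block of 32 weights (hypothetical helper).
    pub fn block_bytes(self) -> usize {
        match self {
            // f16 scale + 32 x 8-bit values
            QuantFormat::Q8_0 => 2 + 32,
            // f16 scale + 32 x 4-bit values
            QuantFormat::Q4_0 => 2 + 16,
            // f16 scale + f16 minimum + 32 x 4-bit values
            QuantFormat::Q4_1 => 2 + 2 + 16,
            // f16 scale + 32 high bits + 32 x 4-bit low nibbles
            QuantFormat::Q5_0 => 2 + 4 + 16,
        }
    }

    /// Compression ratio relative to storing the same weights as f32.
    pub fn compression_vs_f32(self) -> f64 {
        (Self::BLOCK_WEIGHTS * 4) as f64 / self.block_bytes() as f64
    }
}
```

Under these layouts Q8_0 comes out slightly below 4x and Q4_0 slightly above 7x relative to f32, because the per-block scale adds overhead on top of the raw 8 or 4 bits per weight.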
Quality Preservation
- Accuracy degradation < 1% for Q8_0
- Accuracy degradation < 5% for Q4_0
- Calibration dataset support for optimal quantization
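The accuracy targets above could be enforced with a small check like the following sketch. It assumes that "degradation" means the absolute drop in accuracy (baseline minus quantized, both as fractions in 0..1); the issue does not say whether the thresholds are absolute or relative. The functions `max_degradation` and `within_budget` are hypothetical names, and formats without a stated target return `None`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantFormat {
    Q8_0,
    Q4_0,
    Q4_1,
    Q5_0,
}

/// Maximum allowed accuracy drop per format, as a fraction.
/// Only Q8_0 and Q4_0 have targets in this issue; others return None.
pub fn max_degradation(format: QuantFormat) -> Option<f64> {
    match format {
        QuantFormat::Q8_0 => Some(0.01),
        QuantFormat::Q4_0 => Some(0.05),
        QuantFormat::Q4_1 | QuantFormat::Q5_0 => None,
    }
}

/// Returns Some(true) if the quantized model stays within the budget,
/// Some(false) if it exceeds it, and None if no budget is defined.
/// Assumption: degradation is the absolute difference in accuracy.
pub fn within_budget(format: QuantFormat, baseline: f64, quantized: f64) -> Option<bool> {
    let limit = max_degradation(format)?;
    Some(baseline - quantized < limit)
}
```

For example, a model that scores 0.95 before quantization and 0.945 after Q8_0 export would pass, while one that drops to 0.93 would not.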
Acceptance Criteria
Related
- Ruchy spec: docs/specifications/trueno-aprender-stdlib-core-language-spec.md
- Integration: ruchy::stdlib::aprender_bridge