Summary
Add `apr prune` and `apr distill` subcommands that surface entrenar's pruning and distillation pipelines. Pruning plus distillation is the #5 most common operation on HuggingFace models, and NVIDIA's Minitron paper showed that an 8B model can be derived from a 15B model using up to 40x fewer training tokens than training it from scratch.
Issue #149 currently tracks Lottery Ticket Hypothesis pruning. This issue is broader: structured pruning (depth, width, attention, MLP) combined with knowledge distillation in a single pipeline, matching what NVIDIA, Intel, and HuggingFace recommend.
Motivation
- Hub ecosystem: pruned and distilled models (Nemotron-Mini-4B, Llama-3.2 derived from Llama-3.1) are the highest-impact derivatives because they produce genuinely smaller architectures, not just compressed weights.
- entrenar already has the building blocks: `entrenar::prune` provides `calibrate`, `pipeline`, `schedule`, `config`, `callback`, `data_loader`, and `trainer_integration`; `entrenar::distill` provides distillation losses plus progressive and ensemble distillation.
- Gap: no CLI exposure. `apr` has no `prune` or `distill` subcommand, so these entrenar modules are fully internal.
Proposed CLI
```bash
# Structured pruning (remove attention heads + MLP neurons)
apr prune model.apr --method structured --target-ratio 0.5 --calibration data.jsonl -o pruned.apr

# Depth pruning (remove entire layers)
apr prune model.apr --method depth --remove-layers "20-24" -o pruned.apr

# Width pruning (reduce hidden dimensions)
apr prune model.apr --method width --target-hidden 512 -o pruned.apr

# Knowledge distillation (teacher → student)
apr distill teacher.apr --student pruned.apr --data train.jsonl --epochs 3 -o distilled.apr

# Progressive distillation (gradual pruning + distillation)
apr distill teacher.apr --progressive --target-ratio 0.5 --data train.jsonl -o distilled.apr

# Combined prune+distill pipeline (Minitron-style)
apr prune model.apr --method structured --target-ratio 0.5 \
  --distill --data train.jsonl --epochs 3 -o final.apr

# Planning mode (estimate time, memory, expected quality loss)
apr prune model.apr --method structured --target-ratio 0.5 --plan --json

# Analysis mode (identify which heads/layers to prune)
apr prune model.apr --analyze --calibration data.jsonl --json
```
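The `--analyze` mode has to turn calibration activations into per-layer importance scores. The sketch below assumes a cosine-similarity block-importance score (the "Block Influence" metric from the ShortGPT paper) as a stand-in; it is not necessarily what `entrenar::prune::calibrate` computes. The intuition: a layer whose output hidden states are nearly parallel to its inputs changes little and is a depth-pruning candidate. `rank_layers`, `block_influence`, and the `hidden` layout are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either has zero norm).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "hidden states must have the same width");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Block-influence score of one layer: 1 - mean cosine similarity between the hidden
/// state entering the layer and the one leaving it, averaged over calibration tokens.
/// `inputs[t]` / `outputs[t]` are the hidden vectors for token t.
fn block_influence(inputs: &[Vec<f32>], outputs: &[Vec<f32>]) -> f32 {
    assert_eq!(inputs.len(), outputs.len(), "token counts must match");
    assert!(!inputs.is_empty(), "need at least one calibration token");
    let total: f32 = inputs.iter().zip(outputs).map(|(x, y)| cosine(x, y)).sum();
    1.0 - total / inputs.len() as f32
}

/// Hypothetical `--analyze` core: `hidden[l]` holds the per-token hidden states entering
/// layer l, and the final entry holds the last layer's output, so a model with L layers
/// supplies L + 1 entries. Returns (layer index, score) pairs sorted from least to most
/// important, i.e. depth-pruning candidates first.
pub fn rank_layers(hidden: &[Vec<Vec<f32>>]) -> Vec<(usize, f32)> {
    let mut scores: Vec<(usize, f32)> = hidden
        .windows(2)
        .enumerate()
        .map(|(layer, pair)| (layer, block_influence(&pair[0], &pair[1])))
        .collect();
    scores.sort_by(|a, b| a.1.total_cmp(&b.1));
    scores
}
```

The lowest-scoring layers at the head of the ranking are the ones a user would pass to `--remove-layers`.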
Implementation Path
- Wire `entrenar::prune::pipeline` into the `apr prune` subcommand
- Wire `entrenar::distill` into the `apr distill` subcommand
- Add an `--analyze` mode that uses `entrenar::prune::calibrate` to score layer importance
- Add a combined `--distill` flag on `apr prune` for the end-to-end Minitron workflow
- Surface `entrenar::distill::progressive` for gradual pruning schedules
- JSON output for all modes (pruning plan, progress, quality metrics)
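For the gradual pruning schedules above, one common choice in the literature is the cubic schedule of Zhu & Gupta (2017), which prunes quickly at first and tapers off as the model approaches its target. The sketch below is a stand-in, not entrenar's `schedule` API: `CubicSchedule` is hypothetical, and it assumes "sparsity" means the fraction of prunable units removed, so `--target-ratio 0.5` would map to `target: 0.5`.

```rust
/// Hypothetical gradual-pruning schedule (not entrenar's `schedule` API) following the
/// cubic form of Zhu & Gupta (2017):
///     s(t) = s_f + (s_i - s_f) * (1 - (t - t0) / (t_end - t0))^3
/// where sparsity is assumed to be the fraction of prunable units removed.
pub struct CubicSchedule {
    pub initial: f64,    // s_i: sparsity when pruning starts (usually 0.0)
    pub target: f64,     // s_f: final sparsity, e.g. 0.5 for --target-ratio 0.5
    pub start_step: u64, // t0: first training step at which pruning happens
    pub end_step: u64,   // t_end: step at which the target is reached
}

impl CubicSchedule {
    /// Sparsity the model should have at `step`; flat before t0 and after t_end.
    pub fn sparsity_at(&self, step: u64) -> f64 {
        if step <= self.start_step {
            return self.initial;
        }
        if step >= self.end_step {
            return self.target;
        }
        let progress =
            (step - self.start_step) as f64 / (self.end_step - self.start_step) as f64;
        self.target + (self.initial - self.target) * (1.0 - progress).powi(3)
    }

    /// (step, sparsity) at each pruning event, every `interval` steps starting at t0
    /// and never past t_end.
    pub fn milestones(&self, interval: u64) -> Vec<(u64, f64)> {
        assert!(interval > 0, "interval must be positive");
        (self.start_step..=self.end_step)
            .step_by(interval as usize)
            .map(|step| (step, self.sparsity_at(step)))
            .collect()
    }
}
```

Halfway through the window the cubic curve has already covered 87.5% of the way to the target, which leaves the later steps for recovery training (or distillation) at nearly constant sparsity.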
Entrenar Modules to Surface
| Module | Purpose |
|---|---|
| `entrenar::prune::pipeline` | Pruning execution pipeline |
| `entrenar::prune::calibrate` | Layer importance scoring |
| `entrenar::prune::schedule` | Gradual pruning schedules |
| `entrenar::prune::config` | Pruning configuration |
| `entrenar::prune::callback` | Training callbacks for pruning |
| `entrenar::prune::data_loader` | Calibration data loading |
| `entrenar::distill::loss` | Distillation loss functions (KL, MSE) |
| `entrenar::distill::progressive` | Progressive distillation |
| `entrenar::distill::ensemble` | Ensemble distillation |
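The table lists KL and MSE as the losses behind `entrenar::distill::loss`. As a sketch of what those terms usually mean, the following uses Hinton et al.'s soft-target formulation; entrenar's exact definitions, reductions, and weighting may differ, and the function names are hypothetical. It computes, for a single token position, the temperature-scaled KL divergence from the teacher's distribution to the student's, plus a plain MSE between logits.

```rust
/// Numerically stable log-softmax of `logits / temperature`.
fn log_softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|z| z / temperature).collect();
    // Subtract the max before exponentiating to avoid overflow.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = max + scaled.iter().map(|z| (z - max).exp()).sum::<f32>().ln();
    scaled.iter().map(|z| z - log_sum_exp).collect()
}

/// Soft-target loss for one position: T^2 * KL(p_teacher || p_student),
/// with both distributions softened by the same temperature T.
pub fn kd_kl_loss(teacher_logits: &[f32], student_logits: &[f32], temperature: f32) -> f32 {
    assert_eq!(teacher_logits.len(), student_logits.len(), "vocab sizes must match");
    assert!(temperature > 0.0, "temperature must be positive");
    let log_p_t = log_softmax(teacher_logits, temperature);
    let log_p_s = log_softmax(student_logits, temperature);
    // KL(p || q) = sum_i p_i * (log p_i - log q_i)
    let kl: f32 = log_p_t
        .iter()
        .zip(&log_p_s)
        .map(|(lt, ls)| lt.exp() * (lt - ls))
        .sum();
    temperature * temperature * kl
}

/// Mean squared error between teacher and student logits for one position.
pub fn logit_mse(teacher_logits: &[f32], student_logits: &[f32]) -> f32 {
    assert_eq!(teacher_logits.len(), student_logits.len(), "vocab sizes must match");
    let n = teacher_logits.len() as f32;
    teacher_logits
        .iter()
        .zip(student_logits)
        .map(|(t, s)| (t - s).powi(2))
        .sum::<f32>()
        / n
}
```

The T² factor is conventional because soft-target gradients scale as 1/T². Without it, changing the temperature would silently reweight the distillation term relative to any hard-label loss it is combined with.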
Acceptance Criteria
- `apr prune model.apr --method structured --target-ratio 0.5` produces a smaller model
- `apr distill teacher.apr --student pruned.apr --data train.jsonl` trains the student
- `apr prune --analyze --json` returns layer importance scores
- `apr prune --plan --json` returns time/memory/quality estimates
- The `--distill` flag works for the Minitron-style pipeline
- … `apr validate` and `apr check`
Related Issues
- #149 (Lottery Ticket Hypothesis pruning)
References
- `src/prune/`, `src/distill/`, `crates/entrenar-distill/`