
feat: apr prune + apr distill — structured pruning and knowledge distillation pipeline #247

@noahgift

Summary

Add apr prune and apr distill subcommands that expose entrenar's existing pruning and distillation pipelines. Pruning plus distillation is the #5 most common operation on HuggingFace models. NVIDIA's Minitron paper showed that an 8B model can be derived from a 15B model with up to 40x fewer training tokens than training it from scratch.

Currently #149 tracks Lottery Ticket Hypothesis pruning. This issue is broader: structured pruning (depth/width/attention/MLP) + knowledge distillation as a combined pipeline, matching what NVIDIA, Intel, and HuggingFace recommend.
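To make "structured" concrete: removing an MLP neuron deletes a whole row of the up-projection and the matching column of the down-projection, so the tensors actually shrink instead of being zero-filled. The sketch below assumes a plain two-matrix MLP (a Llama-style gated MLP would also drop the same row from the gate projection) and takes per-neuron importance scores as given; `top_k_indices` and `prune_mlp_neurons` are hypothetical names, not entrenar::prune functions.

```rust
/// Hypothetical sketch of structured MLP pruning (not entrenar's API).
/// Assumed layout, row-major nested Vecs:
///   up:   [ffn_dim][hidden_dim]  (one row per intermediate neuron)
///   down: [hidden_dim][ffn_dim]  (one column per intermediate neuron)

/// Indices of the `keep` highest-scoring neurons, returned in original order.
/// Panics if a score is NaN (partial_cmp returns None).
fn top_k_indices(scores: &[f32], keep: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // Descending importance; ties broken by index so the result is deterministic.
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap().then(a.cmp(&b)));
    idx.truncate(keep);
    idx.sort_unstable(); // surviving neurons keep their relative order
    idx
}

/// Drops the lowest-scoring neurons: rows of `up` and the matching columns of `down`.
fn prune_mlp_neurons(
    up: &[Vec<f32>],
    down: &[Vec<f32>],
    scores: &[f32],
    keep: usize,
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let kept = top_k_indices(scores, keep);
    let new_up = kept.iter().map(|&j| up[j].clone()).collect();
    let new_down = down
        .iter()
        .map(|row| kept.iter().map(|&j| row[j]).collect())
        .collect();
    (new_up, new_down)
}
```

With `--target-ratio 0.5`, `keep` would be roughly half of `ffn_dim`; how the ratio is split across heads, neurons and layers is left to the pruning config.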

Motivation

  • Hub ecosystem: Pruned+distilled models (Nemotron-Mini-4B, Llama-3.2 1B/3B derived from Llama-3.1 8B) are the highest-impact derivatives because they create genuinely smaller architectures, not just compressed weights.
  • entrenar already has the building blocks: entrenar::prune has calibrate, pipeline, schedule, config, callback, data_loader, and trainer_integration. entrenar::distill has loss, progressive, and ensemble distillation.
  • Gap: No CLI exposure. apr has no prune or distill subcommands. The entrenar modules are fully internal.

Proposed CLI

```bash
# Structured pruning (remove attention heads + MLP neurons)
apr prune model.apr --method structured --target-ratio 0.5 --calibration data.jsonl -o pruned.apr

# Depth pruning (remove entire layers)
apr prune model.apr --method depth --remove-layers "20-24" -o pruned.apr

# Width pruning (reduce hidden dimensions)
apr prune model.apr --method width --target-hidden 512 -o pruned.apr

# Knowledge distillation (teacher → student)
apr distill teacher.apr --student pruned.apr --data train.jsonl --epochs 3 -o distilled.apr

# Progressive distillation (gradual pruning + distillation)
apr distill teacher.apr --progressive --target-ratio 0.5 --data train.jsonl -o distilled.apr

# Combined prune+distill pipeline (Minitron-style)
apr prune model.apr --method structured --target-ratio 0.5 \
    --distill --data train.jsonl --epochs 3 -o final.apr

# Planning mode (estimate time, memory, expected quality loss)
apr prune model.apr --method structured --target-ratio 0.5 --plan --json

# Analysis mode (identify which heads/layers to prune)
apr prune model.apr --analyze --calibration data.jsonl --json
```
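The proposal does not pin down the `--remove-layers` syntax. One possible reading, sketched below, treats "20-24" as an inclusive, zero-based range and also accepts comma-separated indices such as "3,7,20-24". Both the syntax and `parse_layer_spec` are assumptions for discussion, not existing apr behavior.

```rust
/// Hypothetical parser for a `--remove-layers` value (syntax is an assumption):
/// comma-separated items, each either "N" or an inclusive range "A-B",
/// zero-based, and every index must be < `num_layers`.
fn parse_layer_spec(spec: &str, num_layers: usize) -> Result<Vec<usize>, String> {
    let mut layers = Vec::new();
    for part in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (start, end) = match part.split_once('-') {
            Some((a, b)) => (parse_index(a)?, parse_index(b)?),
            None => {
                let i = parse_index(part)?;
                (i, i)
            }
        };
        if start > end {
            return Err(format!("reversed range '{part}'"));
        }
        if end >= num_layers {
            return Err(format!("layer {end} out of range (model has {num_layers} layers)"));
        }
        layers.extend(start..=end);
    }
    // Overlapping items such as "7,5-8" collapse to a unique, sorted list.
    layers.sort_unstable();
    layers.dedup();
    if layers.is_empty() {
        return Err("no layers specified".to_string());
    }
    if layers.len() == num_layers {
        return Err("refusing to remove every layer".to_string());
    }
    Ok(layers)
}

fn parse_index(s: &str) -> Result<usize, String> {
    s.trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid layer index '{s}': {e}"))
}
```

Validating against the model's layer count at parse time lets `--plan` report a bad spec before any weights are loaded.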

Implementation Path

  1. Wire entrenar::prune::pipeline into apr prune subcommand
  2. Wire entrenar::distill into apr distill subcommand
  3. Add --analyze mode that uses entrenar::prune::calibrate to score layer importance
  4. Add combined --distill flag on apr prune for end-to-end Minitron workflow
  5. Wire entrenar::distill::progressive into apr distill --progressive for gradual pruning schedules
  6. JSON output for all modes (pruning plan, progress, quality metrics)
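For step 3, it is not specified here which score entrenar::prune::calibrate computes. As one illustrative option, the sketch below uses the "Block Influence" metric from ShortGPT: a layer whose output hidden states are nearly parallel to its inputs (cosine similarity close to 1) changes little and is a depth-pruning candidate. The input layout and the function names are hypothetical.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Block Influence per layer: 1 - mean cosine(input, output) over calibration tokens.
/// Assumed layout: `hidden[l][t]` is the hidden state entering layer `l` for token `t`,
/// so `hidden[l + 1][t]` is that layer's output. N+1 snapshots yield N scores.
fn block_influence(hidden: &[Vec<Vec<f32>>]) -> Vec<f32> {
    hidden
        .windows(2)
        .map(|w| {
            let (inp, out) = (&w[0], &w[1]);
            let total: f32 = inp.iter().zip(out).map(|(x, y)| cosine(x, y)).sum();
            1.0 - total / inp.len() as f32
        })
        .collect()
}

/// The `n` least influential layers, returned in ascending layer order.
fn least_important_layers(scores: &[f32], n: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[a].partial_cmp(&scores[b]).unwrap().then(a.cmp(&b)));
    idx.truncate(n);
    idx.sort_unstable();
    idx
}
```

Scores like these are what `--analyze --json` could emit per layer; head- and neuron-level importance for structured pruning would need an activation-based score instead.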

Entrenar Modules to Surface

| Module | Purpose |
|---|---|
| `entrenar::prune::pipeline` | Pruning execution pipeline |
| `entrenar::prune::calibrate` | Layer importance scoring |
| `entrenar::prune::schedule` | Gradual pruning schedules |
| `entrenar::prune::config` | Pruning configuration |
| `entrenar::prune::callback` | Training callbacks for pruning |
| `entrenar::prune::data_loader` | Calibration data loading |
| `entrenar::distill::loss` | Distillation loss functions (KL, MSE) |
| `entrenar::distill::progressive` | Progressive distillation |
| `entrenar::distill::ensemble` | Ensemble distillation |
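For reference, the standard KL distillation objective (Hinton et al., 2015) compares temperature-softened teacher and student distributions and scales the result by T² so gradient magnitudes stay comparable across temperatures. The sketch below shows that formulation for a single token's logits; it is not the signature of entrenar::distill::loss, and the MSE variant is omitted.

```rust
/// Log-softmax of `logits / temperature`, made numerically stable by subtracting the max.
fn log_softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|z| z / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = max + scaled.iter().map(|z| (z - max).exp()).sum::<f32>().ln();
    scaled.iter().map(|z| z - log_sum_exp).collect()
}

/// KL(teacher || student) on softened distributions, multiplied by T^2.
/// Hypothetical helper; both slices must have the same vocabulary size.
fn kd_kl_loss(teacher_logits: &[f32], student_logits: &[f32], temperature: f32) -> f32 {
    let log_p = log_softmax(teacher_logits, temperature); // teacher
    let log_q = log_softmax(student_logits, temperature); // student
    let kl: f32 = log_p
        .iter()
        .zip(&log_q)
        .map(|(lp, lq)| lp.exp() * (lp - lq))
        .sum();
    kl * temperature * temperature
}
```

In training this would be averaged over tokens and usually mixed with the ordinary cross-entropy on the labels; the mixing weight and temperature are natural `apr distill` options, though the proposal does not name them.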

Acceptance Criteria

  • apr prune model.apr --method structured --target-ratio 0.5 produces a smaller model
  • apr distill teacher.apr --student pruned.apr --data train.jsonl trains the student
  • apr prune --analyze --json returns layer importance scores
  • apr prune --plan --json returns time/memory/quality estimates
  • Combined --distill flag works for Minitron-style pipeline
  • Pruned model passes apr validate and apr check
  • Testable with tiny models (SmolLM-135M → ~70M should complete in <10min)
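Part of the `--plan` estimate is plain arithmetic on the model config. The sketch below counts weights for a Llama-style decoder under stated assumptions (gated MLP with three projections, grouped-query attention, no biases, tied input/output embeddings, norm weights only); `DecoderConfig` is hypothetical and does not reflect how apr stores model metadata.

```rust
/// Hypothetical config for `--plan` arithmetic (Llama-style decoder assumed).
#[derive(Clone, Copy, Debug)]
struct DecoderConfig {
    vocab: u64,
    hidden: u64,
    ffn: u64,
    layers: u64,
    heads: u64,
    kv_heads: u64,
}

impl DecoderConfig {
    /// Total weight count; assumes `hidden` is divisible by `heads`.
    fn param_count(&self) -> u64 {
        let head_dim = self.hidden / self.heads;
        let kv_dim = self.kv_heads * head_dim;
        let attn = 2 * self.hidden * self.hidden // q_proj + o_proj
            + 2 * self.hidden * kv_dim;          // k_proj + v_proj (grouped-query)
        let mlp = 3 * self.hidden * self.ffn;    // gate + up + down
        let norms = 2 * self.hidden;             // pre-attention + pre-MLP norm
        let per_layer = attn + mlp + norms;
        // Tied embedding matrix counted once, plus the final norm.
        self.layers * per_layer + self.vocab * self.hidden + self.hidden
    }

    /// Weight memory in bytes (e.g. 2 bytes per parameter for fp16/bf16).
    fn weight_bytes(&self, bytes_per_param: u64) -> u64 {
        self.param_count() * bytes_per_param
    }
}
```

Because depth pruning leaves the embedding matrix untouched, removing half the layers removes less than half the parameters, so a planner has to work from counts like these rather than assume `--target-ratio` maps linearly to layers.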

