feat: apr prune + apr distill — structured pruning and knowledge distillation pipeline #247

## Summary

Add `apr prune` and `apr distill` subcommands that surface entrenar's pruning and distillation pipelines. This is the **#5 most common operation** on HuggingFace models. NVIDIA's Minitron work showed that 8B and 4B models can be derived from a pretrained 15B model using up to 40x fewer training tokens per model than training from scratch.

Currently #149 tracks Lottery Ticket Hypothesis pruning. This issue is broader: structured pruning (depth/width/attention/MLP) + knowledge distillation as a combined pipeline, matching what NVIDIA, Intel, and HuggingFace recommend.

## Motivation

- **Hub ecosystem**: Pruned+distilled models (Nemotron-Mini-4B, Llama 3.2 1B/3B derived from Llama 3.1 8B) are the highest-impact derivatives: they create genuinely smaller architectures, not just compressed weights.
- **entrenar already has the building blocks**: `entrenar::prune` has calibrate, pipeline, schedule, config, callback, data_loader, and trainer_integration. `entrenar::distill` has loss, progressive, and ensemble distillation.
- **Gap**: No CLI exposure. `apr` has no `prune` or `distill` subcommands. The entrenar modules are fully internal.

## Proposed CLI

```bash
# Structured pruning (remove attention heads + MLP neurons)
apr prune model.apr --method structured --target-ratio 0.5 --calibration data.jsonl -o pruned.apr

# Depth pruning (remove entire layers)
apr prune model.apr --method depth --remove-layers "20-24" -o pruned.apr

# Width pruning (reduce hidden dimensions)
apr prune model.apr --method width --target-hidden 512 -o pruned.apr

# Knowledge distillation (teacher → student)
apr distill teacher.apr --student pruned.apr --data train.jsonl --epochs 3 -o distilled.apr

# Progressive distillation (gradual pruning + distillation)
apr distill teacher.apr --progressive --target-ratio 0.5 --data train.jsonl -o distilled.apr

# Combined prune+distill pipeline (Minitron-style)
apr prune model.apr --method structured --target-ratio 0.5 \
    --distill --data train.jsonl --epochs 3 -o final.apr

# Planning mode (estimate time, memory, expected quality loss)
apr prune model.apr --method structured --target-ratio 0.5 --plan --json

# Analysis mode (identify which heads/layers to prune)
apr prune model.apr --analyze --calibration data.jsonl --json
```
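The `--remove-layers "20-24"` example leaves the range semantics open. A minimal sketch of one possible parser, assuming the spec is a comma-separated list of 0-based indices and inclusive `a-b` ranges (both assumptions, as is the helper name `parse_layer_spec`):

```rust
use std::collections::BTreeSet;

/// Hypothetical parser for a `--remove-layers` spec such as "20-24" or "3,7,20-24".
/// ASSUMPTION: indices are 0-based and `a-b` ranges are inclusive on both ends.
fn parse_layer_spec(spec: &str, num_layers: usize) -> Result<BTreeSet<usize>, String> {
    let mut layers = BTreeSet::new();
    for part in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        // A part is either a single index or an inclusive range "lo-hi".
        let (lo, hi) = match part.split_once('-') {
            Some((a, b)) => (parse_idx(a)?, parse_idx(b)?),
            None => {
                let i = parse_idx(part)?;
                (i, i)
            }
        };
        if lo > hi {
            return Err(format!("descending range '{part}'"));
        }
        if hi >= num_layers {
            return Err(format!("layer {hi} out of range (model has {num_layers} layers)"));
        }
        layers.extend(lo..=hi);
    }
    if layers.is_empty() {
        return Err("empty layer spec".to_string());
    }
    if layers.len() == num_layers {
        return Err("refusing to remove every layer".to_string());
    }
    Ok(layers)
}

fn parse_idx(s: &str) -> Result<usize, String> {
    s.trim()
        .parse::<usize>()
        .map_err(|e| format!("bad layer index '{s}': {e}"))
}

fn main() {
    let removed = parse_layer_spec("20-24", 30).unwrap();
    println!("{:?}", removed); // {20, 21, 22, 23, 24}
}
```

Rejecting out-of-range, descending, or all-layer specs at parse time is one way to fail fast before any weights are loaded.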

## Implementation Path

1. Wire `entrenar::prune::pipeline` into `apr prune` subcommand
2. Wire `entrenar::distill` into `apr distill` subcommand
3. Add `--analyze` mode that uses `entrenar::prune::calibrate` to score layer importance
4. Add combined `--distill` flag on `apr prune` for end-to-end Minitron workflow
5. Wire `entrenar::distill::progressive` and `entrenar::prune::schedule` into `apr distill --progressive` for gradual pruning + distillation
6. JSON output for all modes (pruning plan, progress, quality metrics)
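For the `--analyze` step, a minimal sketch of two common importance scores, written against plain `Vec<f32>` hidden states rather than entrenar types. The function names are hypothetical and this is not `entrenar::prune::calibrate`'s API. The depth score follows ShortGPT's Block Importance (one minus the mean cosine similarity between a layer's input and output hidden states); the width score is mean absolute MLP activation over calibration tokens, a simple relative of the activation-based importance used in Minitron.

```rust
/// Hypothetical --analyze helpers; hidden states are tokens x hidden as Vec<Vec<f32>>.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Block importance of one layer: 1 - mean cosine(input_t, output_t) over tokens.
/// A low score means the layer barely changes the residual stream, so it is a
/// depth-pruning candidate.
fn block_importance(input: &[Vec<f32>], output: &[Vec<f32>]) -> f32 {
    assert_eq!(input.len(), output.len());
    let n = input.len() as f32;
    let mean_cos = input.iter().zip(output).map(|(x, y)| cosine(x, y)).sum::<f32>() / n;
    1.0 - mean_cos
}

/// Width importance: mean |activation| per MLP neuron (acts is tokens x intermediate).
fn neuron_importance(acts: &[Vec<f32>]) -> Vec<f32> {
    let mut score = vec![0.0f32; acts[0].len()];
    for row in acts {
        for (s, a) in score.iter_mut().zip(row) {
            *s += a.abs();
        }
    }
    let n = acts.len() as f32;
    score.iter_mut().for_each(|s| *s /= n);
    score
}

/// Indices of the `k` lowest-scoring units, i.e. the pruning candidates.
fn lowest_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    idx.truncate(k);
    idx
}

fn main() {
    // Layer 0 barely changes its input; layer 1 rotates it by 90 degrees.
    let x: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let y0: Vec<Vec<f32>> = vec![vec![1.0, 0.1], vec![0.1, 1.0]];
    let y1: Vec<Vec<f32>> = vec![vec![0.0, 1.0], vec![1.0, 0.0]];
    let bi = [block_importance(&x, &y0), block_importance(&x, &y1)];
    println!("block importance: {bi:?}, prune first: {:?}", lowest_k(&bi, 1));

    let acts: Vec<Vec<f32>> = vec![vec![0.5, -2.0, 0.01], vec![-0.5, 1.0, 0.03]];
    println!("neuron scores: {:?}", neuron_importance(&acts));
}
```

Emitting these scores as JSON, keyed by layer and neuron index, would cover the `--analyze --json` acceptance criterion without committing to a pruning decision.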

## Entrenar Modules to Surface

| Module | Purpose |
|--------|---------|
| `entrenar::prune::pipeline` | Pruning execution pipeline |
| `entrenar::prune::calibrate` | Layer importance scoring |
| `entrenar::prune::schedule` | Gradual pruning schedules |
| `entrenar::prune::config` | Pruning configuration |
| `entrenar::prune::callback` | Training callbacks for pruning |
| `entrenar::prune::data_loader` | Calibration data loading |
| `entrenar::distill::loss` | Distillation loss functions (KL, MSE) |
| `entrenar::distill::progressive` | Progressive distillation |
| `entrenar::distill::ensemble` | Ensemble distillation |
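The KL and MSE losses listed for `entrenar::distill::loss` can be sketched as follows. This assumes the standard temperature-scaled soft-target formulation from Hinton et al.; the function names are hypothetical, not entrenar's actual API.

```rust
/// Numerically stable softmax of `logits / temperature`.
fn softmax_t(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Subtracting the max before scaling is a constant shift, so the result is unchanged.
    let exps: Vec<f32> = logits.iter().map(|&z| ((z - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Soft-target KD loss: T^2 * KL(p_teacher || p_student), both softened by T.
/// The T^2 factor keeps gradient magnitudes comparable across temperatures.
fn kl_distill_loss(teacher: &[f32], student: &[f32], temperature: f32) -> f32 {
    let p = softmax_t(teacher, temperature);
    let q = softmax_t(student, temperature);
    let kl: f32 = p
        .iter()
        .zip(&q)
        .filter(|(pi, _)| **pi > 0.0)
        .map(|(pi, qi)| pi * (pi / qi.max(1e-12)).ln())
        .sum();
    temperature * temperature * kl
}

/// Logit-matching alternative: mean squared error between raw logits.
fn mse_logit_loss(teacher: &[f32], student: &[f32]) -> f32 {
    let n = teacher.len() as f32;
    teacher.iter().zip(student).map(|(t, s)| (t - s).powi(2)).sum::<f32>() / n
}

fn main() {
    let teacher = [2.0f32, 1.0, 0.1];
    let student = [1.5f32, 1.2, 0.3];
    println!("KD (T=2): {:.4}", kl_distill_loss(&teacher, &student, 2.0));
    println!("KD self:  {:.4}", kl_distill_loss(&teacher, &teacher, 2.0)); // identical logits: 0.0000
    println!("MSE:      {:.4}", mse_logit_loss(&teacher, &student));
}
```

The KD term is usually mixed with ordinary next-token cross-entropy on the training data via a weighting hyperparameter; this issue does not specify that weighting.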

## Acceptance Criteria

- [ ] `apr prune model.apr --method structured --target-ratio 0.5` produces a smaller model
- [ ] `apr distill teacher.apr --student pruned.apr --data train.jsonl` trains student
- [ ] `apr prune --analyze --json` returns layer importance scores
- [ ] `apr prune --plan --json` returns time/memory/quality estimates
- [ ] Combined `--distill` flag works for Minitron-style pipeline
- [ ] Pruned model passes `apr validate` and `apr check`
- [ ] Testable with tiny models (pruning SmolLM-135M to ~70M parameters should complete in under 10 minutes)
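For `--plan` estimates and the SmolLM-135M acceptance test, parameter counts follow directly from the model config. A minimal sketch for a Llama-style decoder (GQA attention, SwiGLU MLP, RMSNorm), assuming SmolLM-135M's published config values (hidden 576, 30 layers, 9 query heads, 3 KV heads, MLP 1536, vocab 49152, tied embeddings). The struct and field names are hypothetical:

```rust
/// Hypothetical Llama-style decoder config used for a --plan parameter estimate.
#[derive(Clone, Copy, Debug)]
struct DecoderConfig {
    vocab: usize,
    hidden: usize,
    layers: usize,
    heads: usize,
    kv_heads: usize,
    intermediate: usize,
    tied_embeddings: bool,
}

impl DecoderConfig {
    fn per_layer_params(&self) -> usize {
        let head_dim = self.hidden / self.heads;
        let kv_dim = self.kv_heads * head_dim;
        // q and o projections are hidden x hidden; k and v are hidden x kv_dim (GQA).
        let attn = 2 * self.hidden * self.hidden + 2 * self.hidden * kv_dim;
        // SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
        let mlp = 3 * self.hidden * self.intermediate;
        // Two RMSNorm weight vectors per layer.
        attn + mlp + 2 * self.hidden
    }

    fn total_params(&self) -> usize {
        let embed = self.vocab * self.hidden;
        let lm_head = if self.tied_embeddings { 0 } else { embed };
        // Embeddings + optional untied LM head + decoder layers + final norm.
        embed + lm_head + self.layers * self.per_layer_params() + self.hidden
    }
}

fn main() {
    // ASSUMPTION: values copied from SmolLM-135M's config.json.
    let smollm = DecoderConfig {
        vocab: 49_152, hidden: 576, layers: 30, heads: 9,
        kv_heads: 3, intermediate: 1_536, tied_embeddings: true,
    };
    let depth = DecoderConfig { layers: 15, ..smollm }; // drop half the layers
    let width = DecoderConfig { intermediate: 768, ..smollm }; // halve the MLP width
    for (name, c) in [("original", smollm), ("depth 30->15", depth), ("mlp 1536->768", width)] {
        println!("{name:>14}: {:.1}M params", c.total_params() as f64 / 1e6);
    }
}
```

Under these assumptions the original model comes to 134.5M parameters, and the tied 28.3M-parameter embedding matrix is untouched by depth or MLP-width pruning. Reaching ~70M therefore takes more than halving either axis alone.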

## Related Issues

- #149 — Lottery Ticket Hypothesis pruning (open, narrower scope)

## References

- [LLM Pruning and Distillation in Practice: The Minitron Approach](https://huggingface.co/papers/2408.11796)
- [NVIDIA Model Optimizer](https://github.com/NVIDIA/Model-Optimizer)
- [Compact Language Models via Pruning and Knowledge Distillation](https://huggingface.co/papers/2407.14679)
- [GLU-Aware Pruning](https://huggingface.co/blog/oopere/making-llms-smaller-without-breaking-them)
- entrenar source: `src/prune/`, `src/distill/`, `crates/entrenar-distill/`
