Summary
Add Mixture of Experts (MoE) architecture for specialized ensemble learning. This enables multiple expert models with a learnable gating network that routes inputs to the most appropriate expert(s).
Motivation
Use case: depyler-oracle for transpiler error classification.
The current single RandomForest treats every error type the same way. MoE would allow one specialized expert per error category:
- Scope Expert: E0425, E0412 (variable/import resolution)
- Type Expert: E0308, E0277 (casts, trait bounds)
- Method Expert: E0599 (API mapping)
Each expert specializes, improving accuracy on edge cases within categories.
Proposed API
use aprender::ensemble::{MixtureOfExperts, SoftmaxGating};

// Define experts
let experts = vec![
    RandomForest::new(100, 10), // scope expert
    RandomForest::new(100, 10), // type expert
    RandomForest::new(100, 10), // method expert
];
let n_experts = experts.len();

// Gating network (routes inputs to experts)
// n_features is the number of columns in the training data
let gating = SoftmaxGating::new(n_features, n_experts);

// MoE ensemble
let mut moe = MixtureOfExperts::builder()
    .experts(experts)
    .gating(gating)
    .top_k(2) // sparse: only top 2 experts per input
    .build();

// Train end-to-end (fit mutates the ensemble, hence `let mut`)
moe.fit(&X_train, &y_train)?;

// Predict (weighted combination of expert outputs)
let predictions = moe.predict(&X_test)?;
Core Components
1. GatingNetwork Trait (~50 LOC)
pub trait GatingNetwork: Send + Sync {
    /// Compute expert weights for input
    fn forward(&self, x: &[f32]) -> Vec<f32>;

    /// Train gating network
    fn fit(&mut self, X: &[Vec<f32>], expert_losses: &[Vec<f32>]) -> Result<()>;
}
2. SoftmaxGating (~100 LOC)
pub struct SoftmaxGating {
    weights: Matrix<f32>, // [n_features, n_experts]
    temperature: f32,
}
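To make the role of `weights` and `temperature` concrete, here is a minimal sketch of what the forward pass could look like. It is not the proposed implementation: `SoftmaxGatingSketch` is a hypothetical stand-in that uses `Vec<Vec<f32>>` in place of `Matrix<f32>`, and it assumes the gate is a plain linear layer followed by a temperature-scaled softmax.

```rust
/// Hypothetical stand-in for the proposed `SoftmaxGating`.
/// Assumption: `Vec<Vec<f32>>` replaces `Matrix<f32>` to keep the sketch self-contained.
pub struct SoftmaxGatingSketch {
    /// weights[f][e]: contribution of feature `f` to the logit of expert `e`.
    weights: Vec<Vec<f32>>,
    /// Values above 1.0 flatten the distribution, values below 1.0 sharpen it.
    temperature: f32,
}

impl SoftmaxGatingSketch {
    pub fn new(weights: Vec<Vec<f32>>, temperature: f32) -> Self {
        assert!(temperature > 0.0, "temperature must be positive");
        Self { weights, temperature }
    }

    /// Returns one weight per expert; the weights sum to 1.
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        let n_experts = self.weights.first().map_or(0, |row| row.len());

        // Linear layer: logits[e] = sum over f of x[f] * weights[f][e].
        let mut logits = vec![0.0f32; n_experts];
        for (xi, row) in x.iter().zip(&self.weights) {
            for (logit, w) in logits.iter_mut().zip(row) {
                *logit += xi * w;
            }
        }

        // Temperature scaling before the softmax.
        for logit in &mut logits {
            *logit /= self.temperature;
        }

        // Subtract the maximum logit for numerical stability.
        let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|e| e / sum).collect()
    }
}
```

With an identity weight matrix and input `[2.0, 0.0]`, the first expert receives the larger weight; raising `temperature` moves both weights toward 0.5.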
3. MixtureOfExperts (~150 LOC)
pub struct MixtureOfExperts<E: Estimator, G: GatingNetwork> {
    experts: Vec<E>,
    gating: G,
    top_k: usize,
    load_balance_weight: f32, // optional: encourage even expert usage
}
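The `top_k` field implies sparse routing: only the highest-weighted experts contribute to a prediction. One way this could work is sketched below. The helper names `top_k_weights` and `combine` are hypothetical, and the sketch assumes each expert returns a per-class probability vector and that the kept gate weights are renormalized to sum to 1.

```rust
/// Keep the `k` largest gate weights, zero the rest, and renormalize.
/// Hypothetical helper, not part of the proposed API.
pub fn top_k_weights(gate: &[f32], k: usize) -> Vec<f32> {
    // Expert indices sorted by descending gate weight.
    let mut idx: Vec<usize> = (0..gate.len()).collect();
    idx.sort_by(|&a, &b| gate[b].total_cmp(&gate[a]));
    idx.truncate(k);

    let kept_sum: f32 = idx.iter().map(|&i| gate[i]).sum();
    let mut sparse = vec![0.0; gate.len()];
    if kept_sum > 0.0 {
        for &i in &idx {
            sparse[i] = gate[i] / kept_sum;
        }
    }
    sparse
}

/// Weighted sum of per-expert class probabilities.
/// Assumption: every expert outputs a vector of the same length.
pub fn combine(sparse: &[f32], expert_probs: &[Vec<f32>]) -> Vec<f32> {
    let n_classes = expert_probs.first().map_or(0, |p| p.len());
    let mut out = vec![0.0f32; n_classes];
    for (w, probs) in sparse.iter().zip(expert_probs) {
        if *w == 0.0 {
            continue; // expert was not selected, so its output is ignored
        }
        for (o, p) in out.iter_mut().zip(probs) {
            *o += w * p;
        }
    }
    out
}
```

For gate weights `[0.5, 0.3, 0.2]` and `k = 2`, the third expert is dropped and the remaining weights become `[0.625, 0.375, 0.0]`. In a real implementation the unselected experts would not be evaluated at all, which is where the compute saving comes from.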
Training Strategy
- Option A - Joint training: Train gating + experts together (complex)
- Option B - Two-stage (recommended):
  - Stage 1: Pre-train experts on labeled subsets
  - Stage 2: Train gating to route to best expert
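Stage 2 of the two-stage option can be sketched as follows, assuming the experts are already trained and their per-sample losses are available (matching the `expert_losses` argument of `GatingNetwork::fit`). The functions `routing_targets` and `train_gate` are hypothetical, and plain SGD on a softmax cross-entropy loss is an assumed training method, not something this proposal prescribes.

```rust
/// For each sample, the index of the expert with the lowest loss.
/// Hypothetical helper: turns `expert_losses` into routing labels.
pub fn routing_targets(expert_losses: &[Vec<f32>]) -> Vec<usize> {
    expert_losses
        .iter()
        .map(|losses| {
            losses
                .iter()
                .enumerate()
                .min_by(|a, b| a.1.total_cmp(b.1))
                .map_or(0, |(i, _)| i)
        })
        .collect()
}

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Train a linear softmax gate with SGD so it predicts the best expert.
/// Returns weights laid out as [n_features][n_experts].
pub fn train_gate(
    xs: &[Vec<f32>],
    targets: &[usize],
    n_experts: usize,
    lr: f32,
    epochs: usize,
) -> Vec<Vec<f32>> {
    let n_features = xs.first().map_or(0, |x| x.len());
    let mut w = vec![vec![0.0f32; n_experts]; n_features];
    for _ in 0..epochs {
        for (x, &t) in xs.iter().zip(targets) {
            // Forward pass: logits[e] = sum over f of x[f] * w[f][e].
            let logits: Vec<f32> = (0..n_experts)
                .map(|e| x.iter().zip(&w).map(|(xi, row)| xi * row[e]).sum::<f32>())
                .collect();
            let probs = softmax(&logits);
            // Cross-entropy gradient: (probs[e] - one_hot[e]) * x[f].
            for (xi, row) in x.iter().zip(w.iter_mut()) {
                for (e, wfe) in row.iter_mut().enumerate() {
                    let target = if e == t { 1.0 } else { 0.0 };
                    *wfe -= lr * (probs[e] - target) * xi;
                }
            }
        }
    }
    w
}
```

Routing to the single lowest-loss expert is the simplest target. A softer alternative would be to train the gate against a distribution derived from all expert losses, which fits better with `top_k > 1`.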
Nice-to-Have Features
Estimated Effort
- Core implementation: ~300 LOC (50 + 100 + 150 from the component estimates above)
- Tests: ~100 LOC
- Total: 1-2 days
References