
Neural Thickets

Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan, Phillip Isola

Massachusetts Institute of Technology

Needle in a Haystack: in small or untrained models, solutions to downstream tasks are sparse and hard to find.

Neural Thicket: in large pretrained models, solutions to downstream tasks are dense and easy to discover.

Note: the two figures above were generated by Gemini.


Core finding: The neighborhood around pretrained weights already contains task-specific experts. In small models they are sparse and hard to find; in large models they are dense and easy to discover.

This motivates a simple post-training algorithm we call RandOpt: sample N weight vectors, keep the top K, and majority vote at inference time.

Solution Density & Diversity Around Pretrained Weights


Scaling Law of Solution Density
Scaling Law of Solution Diversity
The RandOpt Algorithm
RandOpt: Random Guessing in Weight Space + Ensemble
# Training: sample N perturbations and select the top-K seeds by D_train performance
seeds = [sample_seed() for _ in range(N)]
# assign noise scales so each sigma in `sigmas` covers an equal share of the N seeds
sigmas_per_seed = [sigmas[i // (N // len(sigmas))] for i in range(N)]
# evaluate all perturbed models on the training set
scores = [evaluate(theta + sigmas_per_seed[i] * eps(seeds[i]), D_train)
          for i in range(N)]
top_indices = topk(scores, K).indices

# Inference: ensemble the top-K experts' predictions on test input x
answers = [generate(theta + sigmas_per_seed[i] * eps(seeds[i]), x)
           for i in top_indices]
prediction = majority_vote(answers)
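The pseudocode above can be made concrete on a toy problem. The sketch below is a minimal, self-contained illustration, not the paper's implementation: the "pretrained" weights are a small random linear classifier, and `eps`, `evaluate`, and `generate` are hypothetical stand-ins for the seeded perturbation, training-set scoring, and per-expert generation used in RandOpt.

```python
# Minimal RandOpt sketch on a toy linear classifier (hypothetical stand-ins
# for the paper's model, evaluate, and generate functions).
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: the label is the sign of the first feature.
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
x_test = np.array([1.5, -0.3, 0.2, 0.0, 0.1])

theta = rng.normal(scale=0.1, size=5)   # stand-in for pretrained weights
N, K = 64, 5                            # perturbations sampled / experts kept
sigmas = [0.05, 0.1, 0.2, 0.4]          # noise scales, split evenly over N

def eps(seed):
    """Deterministic Gaussian perturbation for a given seed."""
    return np.random.default_rng(int(seed)).normal(size=theta.shape)

def evaluate(w):
    """Train-set accuracy of perturbed weights w."""
    return float(np.mean((X_train @ w > 0).astype(int) == y_train))

def generate(w, x):
    """Prediction of perturbed weights w on a single input."""
    return int(x @ w > 0)

# Training: sample N seeds, score each perturbed model, keep the top-K.
seeds = rng.integers(0, 2**31, size=N)
sigmas_per_seed = [sigmas[i // (N // len(sigmas))] for i in range(N)]
scores = [evaluate(theta + sigmas_per_seed[i] * eps(seeds[i]))
          for i in range(N)]
top_indices = np.argsort(scores)[-K:]

# Inference: majority vote over the top-K experts.
answers = [generate(theta + sigmas_per_seed[i] * eps(seeds[i]), x_test)
           for i in top_indices]
prediction = int(np.bincount(answers).argmax())
```

Because the perturbations are keyed by seed, only the N integers (plus their noise scales) need to be stored to reconstruct every expert, which is what makes this kind of random search in weight space cheap to checkpoint.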
Benchmark Results
