
Neural Thickets

Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan, Phillip Isola

Massachusetts Institute of Technology

Needle in a Haystack: in small or untrained models, solutions to downstream tasks are sparse and hard to find.

Neural Thicket: in large pretrained models, solutions to downstream tasks are dense and easy to discover.

Note: the two figures above were generated by Gemini.


Core finding: The neighborhood around pretrained weights already contains task-specific experts. In small models they are sparse and hard to find; in large models they are dense and easy to discover.

This motivates a simple post-training algorithm we call RandOpt: sample N weight vectors, keep the top K, and majority vote at inference time.

Solution Density & Diversity Around Pretrained Weights


Scaling Law of Solution Density
Scaling Law of Solution Diversity
The RandOpt Algorithm
RandOpt: Random Guessing in Weight Space + Ensemble
# Training: sample N perturbations and select the top-K seeds by D_train performance
seeds = [sample_seed() for _ in range(N)]
# assign noise scales so each sigma in `sigmas` covers an equal share of the N seeds
sigmas_per_seed = [sigmas[i // (N // len(sigmas))] for i in range(N)]
# evaluate all perturbed models on the training set
scores = [evaluate(theta + sigmas_per_seed[i] * eps(seeds[i]), D_train)
          for i in range(N)]
top_indices = topk(scores, K).indices

# Inference: ensemble the top-K experts' predictions on test input x
answers = [generate(theta + sigmas_per_seed[i] * eps(seeds[i]), x)
           for i in top_indices]
prediction = majority_vote(answers)
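The pseudocode above can be made concrete on a toy problem. The sketch below is a minimal, self-contained illustration, not the paper's implementation: the "pretrained" weights are a small random linear classifier, and `eps`, `evaluate`, and `generate` are hypothetical stand-ins for the seeded perturbation, training-set scoring, and per-expert generation used in RandOpt.

```python
# Minimal RandOpt sketch on a toy linear classifier (hypothetical stand-ins
# for the paper's model, evaluate, and generate functions).
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: the label is the sign of the first feature.
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
x_test = np.array([1.5, -0.3, 0.2, 0.0, 0.1])

theta = rng.normal(scale=0.1, size=5)   # stand-in for pretrained weights
N, K = 64, 5                            # perturbations sampled / experts kept
sigmas = [0.05, 0.1, 0.2, 0.4]          # noise scales, split evenly over N

def eps(seed):
    """Deterministic Gaussian perturbation for a given seed."""
    return np.random.default_rng(int(seed)).normal(size=theta.shape)

def evaluate(w):
    """Train-set accuracy of perturbed weights w."""
    return float(np.mean((X_train @ w > 0).astype(int) == y_train))

def generate(w, x):
    """Prediction of perturbed weights w on a single input."""
    return int(x @ w > 0)

# Training: sample N seeds, score each perturbed model, keep the top-K.
seeds = rng.integers(0, 2**31, size=N)
sigmas_per_seed = [sigmas[i // (N // len(sigmas))] for i in range(N)]
scores = [evaluate(theta + sigmas_per_seed[i] * eps(seeds[i]))
          for i in range(N)]
top_indices = np.argsort(scores)[-K:]

# Inference: majority vote over the top-K experts.
answers = [generate(theta + sigmas_per_seed[i] * eps(seeds[i]), x_test)
           for i in top_indices]
prediction = int(np.bincount(answers).argmax())
```

Because the perturbations are keyed by seed, only the N integers (plus their noise scales) need to be stored to reconstruct every expert, which is what makes this kind of random search in weight space cheap to checkpoint.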
Benchmark Results
