The Canary in the Carry Chain

Code, data, and paper sources for "The Canary in the Carry Chain: Transformers Know the Schedule Before They Can Execute" (NeurIPS 2026 submission).

What the paper claims

When a transformer fails on an iterative algorithmic task, it has often failed to execute, not to schedule. We factor such tasks as $y = E(x, c(x))$ for a discrete controller $c(x)$ and an executor $E$, and we measure controller-state representations directly on the long Collatz step in base 32, where the controller is the pair $(k, k')$ giving the loop counts of the multiplicative step.
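The factorization can be made concrete with a small sketch. It assumes the long Collatz step is defined as in Charton and Narayanan (2025): starting from an odd $x$, apply $(3x+1)/2$ until the value is even ($k$ iterations), then halve until it is odd again ($k'$ iterations). All function names below, and the base-32 digit helper, are illustrative and are not taken from this repository.

```python
def long_step_iterative(x):
    """Run both loops explicitly and return (y, k, k_prime)."""
    if x <= 0 or x % 2 == 0:
        raise ValueError("x must be a positive odd integer")
    n, k = x, 0
    while n % 2 == 1:          # multiplicative loop: (3n + 1) / 2 while odd
        n = (3 * n + 1) // 2
        k += 1
    k_prime = 0
    while n % 2 == 0:          # halving loop: divide by 2 while even
        n //= 2
        k_prime += 1
    return n, k, k_prime


def controller(x):
    """c(x): the discrete schedule, i.e. the pair of loop counts (k, k')."""
    _, k, k_prime = long_step_iterative(x)
    return k, k_prime


def executor(x, c):
    """E(x, c): carry out the arithmetic once the schedule is known.

    Uses the closed form ((3/2)^k (x + 1) - 1) / 2^k', which is exact
    when c is the true schedule for x.
    """
    k, k_prime = c
    return ((3 ** k) * (x + 1) // (2 ** k) - 1) // (2 ** k_prime)


def to_base(n, base=32):
    """Most-significant-first digits of n (assumed tokenization, for illustration)."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            break
    return digits[::-1]


if __name__ == "__main__":
    x = 7
    c = controller(x)
    print(x, c, executor(x, c), to_base(executor(x, c)))
```

For $x = 7$ the schedule is $(k, k') = (3, 1)$ and the output is 13. The point of the split is that `controller` is a short discrete decision, while `executor` carries all of the multi-digit arithmetic.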

A class-balanced linear probe at encoder Layer 2 decodes $k$ at $100\%$ accuracy, while shuffled, within-length, untrained, and rank-limited controls all stay at chance. In a decoder-only replication, the $k$-probe leads exact-match by 43 epochs at the $90\%$ threshold. Ablating the Layer 2 feed-forward block collapses exact-match from $99.09\%$ to $23.15\%$ while the $k$-probe stays at $99.75\%$, dissociating the controller from the executor.
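Class balancing matters because a probe scored on a skewed label distribution can look accurate by favoring the frequent classes. The sketch below shows one way to build a class-balanced linear probe. It is not the code in probe_balanced.py: the activations are synthetic Gaussian clusters standing in for hidden states, the probe is fit by ridge regression onto one-hot targets (a swapped-in stand-in for whatever solver the paper uses), and every name is hypothetical.

```python
import numpy as np


def balanced_split(labels, per_class, rng):
    """Disjoint train/test index sets with `per_class` examples of each class in both."""
    train, test = [], []
    for c in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == c))
        if len(members) < 2 * per_class:
            raise ValueError(f"class {c} has too few examples to balance")
        train.append(members[:per_class])
        test.append(members[per_class:2 * per_class])
    return np.concatenate(train), np.concatenate(test)


def fit_linear_probe(feats, labels, n_classes, ridge=1e-3):
    """Ridge regression onto one-hot targets; the last weight row is the bias."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)


def probe_accuracy(W, feats, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    return float(np.mean((X @ W).argmax(axis=1) == labels))


def synthetic_activations(rng, counts=(800, 400, 200, 100), dim=16):
    """Stand-in for hidden states: skewed class counts, one Gaussian cluster per class."""
    means = rng.normal(scale=3.0, size=(len(counts), dim))
    labels = np.repeat(np.arange(len(counts)), counts)
    feats = means[labels] + rng.normal(scale=0.5, size=(len(labels), dim))
    return feats, labels


rng = np.random.default_rng(0)
feats, labels = synthetic_activations(rng)
train_idx, test_idx = balanced_split(labels, per_class=50, rng=rng)
W = fit_linear_probe(feats[train_idx], labels[train_idx], n_classes=4)
acc = probe_accuracy(W, feats[test_idx], labels[test_idx])
print(f"balanced held-out probe accuracy: {acc:.3f}")
```

With real checkpoints, `feats` would be replaced by Layer 2 activations and `labels` by the class index of $k$. The controls named above (shuffled, untrained, and so on) would reuse the same split and scoring so that only the features or labels change.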

On a unified $N = 2000$ class-balanced grid we show a three-effect decomposition for explicit controller interfaces (ECI). Dedicated interface slots alone shift $k_{95}$ from 5 to 6 and lift $A_{k=7}$ to $91.3\%$. Consistent input-tied codes add a small further increment. Alignment between the model's predicted controller and the interface embedding shifts the next failure boundary, lifting $A_{k=8}$ to a maximum of $77.5\%$ across seven $1000$-epoch seeds, with four of seven seeds at or above $20\%$. The corresponding upper bound for predicted interfaces without alignment is $20.7\%$.
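The README does not define the metrics used here, so the following is a sketch under stated assumptions rather than the repository's evaluation code: $A_{k=j}$ is taken to be exact-match accuracy on examples whose controller value is $k = j$, $A_{\mathrm{bal}}$ the unweighted mean of those per-class accuracies, and $k_{95}$ the largest $k$ such that every class up to and including $k$ reaches at least $95\%$. The paper is the authority on the actual definitions, and all function names are hypothetical.

```python
from collections import defaultdict


def per_k_accuracy(ks, correct):
    """Exact-match accuracy grouped by controller class k."""
    hits, totals = defaultdict(int), defaultdict(int)
    for k, ok in zip(ks, correct):
        totals[k] += 1
        hits[k] += int(bool(ok))
    return {k: hits[k] / totals[k] for k in sorted(totals)}


def balanced_accuracy(acc_by_k):
    """Unweighted mean over classes (assumed meaning of A_bal)."""
    return sum(acc_by_k.values()) / len(acc_by_k)


def k_threshold(acc_by_k, threshold=0.95):
    """Largest k with every class up to k at or above `threshold` (assumed meaning of k_95).

    Returns 0 if even the smallest class falls below the threshold.
    """
    best = 0
    for k in sorted(acc_by_k):
        if acc_by_k[k] < threshold:
            break
        best = k
    return best


if __name__ == "__main__":
    # Toy per-class accuracies, not results from the paper.
    toy = {1: 1.00, 2: 0.99, 3: 0.96, 4: 0.50, 5: 0.97}
    print(balanced_accuracy(toy), k_threshold(toy))
```

Under this reading, a late class that recovers above the threshold (class 5 in the toy example) does not extend $k_{95}$ past the first failure.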

A theorem in Section 4 shows the conditions under which a deterministic interface, which adds no Shannon information beyond the input, can still move the achievable frontier of a restricted executor class. A cross-seed Layer 2 MLP comparison shows the high-$A_{k=8}$ basin contains many distinct distributed solutions rather than one shared circuit.
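The phrase "adds no Shannon information beyond the input" can be unpacked with a standard identity. This is not the Section 4 theorem, only the information-theoretic fact it starts from, written here with $X$ for the input, $Y$ for the target, and $c$ for the deterministic controller:

```latex
\begin{align}
H\bigl(c(X) \mid X\bigr) &= 0, \\
I\bigl(Y;\, c(X) \mid X\bigr) &\le H\bigl(c(X) \mid X\bigr) = 0, \\
I\bigl(Y;\, X, c(X)\bigr) &= I(Y; X) + I\bigl(Y;\, c(X) \mid X\bigr) = I(Y; X).
\end{align}
```

The first line holds because $c(X)$ is a function of $X$, and the third is the chain rule for mutual information. Any gain from the interface therefore has to come from what a restricted executor class can compute, not from extra information about $Y$.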

Repository layout

The Python and shell files at the repo root are the training, probing, evaluation, and circuit-analysis code. The most relevant entry points are eci_suite.py, eci_placebos.py, circuit_basin.py, circuit_patch.py, probe_balanced.py, decoder_only.py, and the eci_phase_*.py scripts.
results_final/ contains all raw experiment outputs, organized by experiment family. See results_final/MANIFEST.md for the family-level index and results_final/all_results.csv for a consolidated table of headline metrics by (variant, seed).

Reproducing the headline numbers

Long Collatz step in base 32, seven oracle_aligned seeds at 1000 epochs, unified $N = 2000$ grid:

Seed	$A_{\mathrm{bal}}$	$A_{\mathrm{hard}}$	$A_{k=7}$	$A_{k=8}$	$k_{95}$
main	73.39%	45.44%	95.62%	40.69%	7
100	76.13%	54.97%	87.38%	77.53%	6
890	73.91%	47.26%	90.98%	50.80%	6
789	68.83%	30.37%	91.12%	0.00%	6
234	65.48%	20.01%	37.01%	23.01%	6
456	60.93%	15.02%	43.97%	1.09%	4
567	60.58%	14.68%	43.93%	0.10%	4

Mean $A_{k=8}$ is $27.60\%$ (sample std $30.12$ percentage points). Four of seven seeds reach $A_{k=8} \ge 20\%$.
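The summary statistics can be rechecked directly from the $A_{k=8}$ column of the table. The values below are copied from the table by hand. Reading them from results_final/all_results.csv would be the more robust route, but its column names are not documented in this README, so they are not assumed here.

```python
import statistics

# A_{k=8} in percent, one entry per oracle_aligned seed (copied from the table).
A_K8 = {
    "main": 40.69,
    "100": 77.53,
    "890": 50.80,
    "789": 0.00,
    "234": 23.01,
    "456": 1.09,
    "567": 0.10,
}

values = list(A_K8.values())
mean_a8 = statistics.mean(values)
std_a8 = statistics.stdev(values)          # sample std (n - 1 denominator)
n_above_20 = sum(v >= 20.0 for v in values)

print(f"mean = {mean_a8:.2f}%, std = {std_a8:.2f}, seeds >= 20%: {n_above_20}/{len(values)}")
```

The reported 30.12 matches the sample standard deviation. The population form (`statistics.pstdev`) gives a smaller value.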

Running the code

pip install torch numpy matplotlib tqdm

# Train the headline 3x+1 base-32 model
python run.py train --base 32 --dev cuda

# Train an ECI variant (one of: strong_baseline, null_slots, iid_marginal,
# shuffled, fixed_permutation, predicted_ss, oracle_aligned, oracle_both,
# eci_baseline, aux_only)
python eci_suite.py oracle_aligned --base 32 --epochs 1000 --out output_eci/oracle_aligned

# Run the per-neuron Layer 2 MLP causal contribution sweep
python circuit_basin.py output_seeds_1k_locks/oracle_aligned_s100 8 100

# Cross-seed activation patching
python circuit_patch.py output_seeds_1k_locks/oracle_aligned_s890 \
                        output_seeds_1k_locks/oracle_aligned_s567 8 200

# Class-balanced probe selectivity sweep
python probe_balanced.py --base 32 --ckpt output/b32/best.pt

# Decoder-only replication
python decoder_only.py --base 32 --epochs 300

# Regenerate every paper figure
for f in paper/make_*.py; do python "$f"; done

A 1000-epoch oracle_aligned run takes about 6 hours on a single H100. The full ECI sweep at 1000 epochs each takes about 30 H100-hours when run with multiple variants in parallel. The cross-seed circuit comparison takes about an hour per seed.

Model weights

Local model checkpoints (*.pt files for the headline output_mps/b32, the decoder-only output_do, the controller-only output_ctrl, and one output_eci_seeds/baseline_s123 seed, totaling ~2 GB) are uploaded separately to Zenodo. The raw metrics.json and balanced_stats.json files needed to reproduce every paper number are in results_final/ and do not require the weights.

Compute

Total compute reported for the experiments in the paper is about 195 H100-equivalent hours, summarized in Appendix I (Table 9) of the paper.

References

Charton, F. and Narayanan, A. (2025). Transformers know more than they can tell: Learning the Collatz sequence. arXiv:2511.10811
Turner, A. et al. (2023). Activation addition: Steering language models without optimization. arXiv:2308.10248
Nanda, N. et al. (2023). Progress measures for grokking via mechanistic interpretability. ICLR 2023
Heimersheim, S. and Nanda, N. (2024). How to use and interpret activation patching. arXiv:2404.15255
Nye, M. et al. (2022). Show your work: Scratchpads for intermediate computation with language models. ICLR 2022
McLeish, S. et al. (2024). Transformers can do arithmetic with the right embeddings. NeurIPS 2024
