Official code for the paper Transducing Language Models (ICLR 2026).
A language model defines a distribution over strings, but downstream tasks often need a different output format — words instead of byte-pair tokens, characters instead of subwords, amino acids instead of DNA. This library takes a language model composed with a finite-state transducer (FST) and gives it the standard autoregressive interface (next-symbol distributions and prefix probabilities) over the transformed output, so it drops into any system built for ordinary language models.
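A source language model $p_X$ over strings in $\mathcal{X}^*$, composed with a transducer $f$ that maps source strings to output strings in $\mathcal{Y}^*$, induces a transduced language model $p_Y$ over $\mathcal{Y}^*$.
Sampling from $p_Y$ is straightforward: draw $\boldsymbol{x} \sim p_X$ and output $f(\boldsymbol{x})$. Scoring a target prefix $\boldsymbol{y}$ is harder, because its prefix probability sums over the precover $\mathcal{P}(\boldsymbol{y})$, the set of all source strings whose output begins with $\boldsymbol{y}$:

$$
\overrightarrow{p}_Y(\boldsymbol{y}) = \sum_{\boldsymbol{x} \in \mathcal{P}(\boldsymbol{y})} p_X(\boldsymbol{x}),
\qquad
\mathcal{P}(\boldsymbol{y}) = \{\boldsymbol{x} \in \mathcal{X}^* : f(\boldsymbol{x}) \succeq \boldsymbol{y}\}.
$$

That sum is generally infinite. The library computes it in finite time by
decomposing the precover into a quotient $\mathcal{Q}(\boldsymbol{y})$ and a remainder $\mathcal{R}(\boldsymbol{y})$, with $\mathcal{P}(\boldsymbol{y}) = \mathcal{Q}(\boldsymbol{y})\,\mathcal{X}^* \sqcup \mathcal{R}(\boldsymbol{y})$, so that

$$
\overrightarrow{p}_Y(\boldsymbol{y}) = \sum_{\boldsymbol{x} \in \mathcal{Q}(\boldsymbol{y})} \overrightarrow{p}_X(\boldsymbol{x}) + \sum_{\boldsymbol{x} \in \mathcal{R}(\boldsymbol{y})} p_X(\boldsymbol{x}),
$$

where $\overrightarrow{p}_X(\boldsymbol{x})$ is the source prefix probability, the total probability of source strings that begin with $\boldsymbol{x}$.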
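Take the transducer $f$ that lowercases every character. It has a single state that is both initial and final:

```text
──▶ (( q0 )) ⟲  for every input character c, an arc c : lowercase(c)
                e.g.  A:a  B:b  …  a:a  b:b  …
```

For the target string $\boldsymbol{y} = \texttt{ab}$, a source string lies in the precover exactly when its first character is `a` or `A` and its second is `b` or `B`. Written with the paper's
basis (cylinder) notation,

$$
\mathcal{P}(\texttt{ab}) = \{\texttt{ab}, \texttt{aB}, \texttt{Ab}, \texttt{AB}\}\,\mathcal{X}^*,
\qquad
\overrightarrow{p}_Y(\texttt{ab}) = \overrightarrow{p}_X(\texttt{ab}) + \overrightarrow{p}_X(\texttt{aB}) + \overrightarrow{p}_X(\texttt{Ab}) + \overrightarrow{p}_X(\texttt{AB}).
$$

So the transduced prefix probability is a finite sum of four source prefix
probabilities; these four strings form the quotient $\mathcal{Q}(\texttt{ab})$, and the remainder is empty.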
Computing that decomposition — and the next-symbol distribution it yields — is
exactly what TransducedLM does.
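To make the lowercasing example concrete, here is a minimal pure-Python sketch (not the library's implementation) that computes the transduced prefix probability and the next-symbol distribution by summing over the quotient. The source model is a hypothetical i.i.d. character model with made-up probabilities, and `quotient`, `transduced_prefix_prob` and `p_next` are illustrative names, not part of transduced_lm.

```python
from itertools import product
from math import prod

# Hypothetical toy source LM p_X: i.i.d. characters over {a, A, b, B},
# stopping with probability P_EOS after each character (values sum to 1).
P_CHAR = {"a": 0.3, "A": 0.1, "b": 0.25, "B": 0.15}
P_EOS = 0.2


def source_prefix_prob(x: str) -> float:
    """Prefix probability under p_X: total mass of source strings starting with x."""
    return prod(P_CHAR[c] for c in x)


def quotient(y: str) -> list[str]:
    """Quotient of the precover of a lowercase target y: every case variant of y."""
    return ["".join(chars) for chars in product(*[(c, c.upper()) for c in y])]


def transduced_prefix_prob(y: str) -> float:
    """Prefix probability under p_Y as a finite sum over the quotient (remainder is empty)."""
    return sum(source_prefix_prob(x) for x in quotient(y))


def transduced_string_prob(y: str) -> float:
    """Probability that the output is exactly y: a case variant of y followed by EOS."""
    return sum(source_prefix_prob(x) * P_EOS for x in quotient(y))


def p_next(y: str) -> dict[str, float]:
    """Next-symbol distribution p_Y(. | y) over {a, b} plus EOS, as plain probabilities."""
    z = transduced_prefix_prob(y)
    dist = {c: transduced_prefix_prob(y + c) / z for c in "ab"}
    dist["<EOS>"] = transduced_string_prob(y) / z
    return dist


print(quotient("ab"), transduced_prefix_prob("ab"), p_next("ab"))
```

Here `quotient("ab")` returns the four strings `ab`, `aB`, `Ab`, `AB`, and `p_next("ab")` gives `a` and `b` probability 0.4 each and end-of-string 0.2 (up to floating-point rounding), which sum to 1 as an autoregressive distribution should. TransducedLM reports log-probabilities instead, and works with real transducers where the quotient is not a simple product of case variants.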
TransducedLM(vfst, lm, config) takes a VectorizedFST (the transducer), a
source LM scorer, and a Config. The example below lifts GPT-2 (a token-level
model) to a byte-level model with the hf_realpha transducer, then queries
the autoregressive interface of the transduced model:
```python
import asyncio

from genlm.backend import load_model_by_name
from transduced_lm import Config, TransducedLM
from transduced_lm.benchmark.transducer import load_transducer

# Source model: GPT-2 over byte-pair tokens.
llm = load_model_by_name("gpt2", backend="hf")

# Compose it with a transducer f. "hf_realpha" maps GPT-2 tokens → bytes,
# turning the token-level model into a character/byte-level language model.
setup = load_transducer("hf_realpha", llm=llm, model_name="gpt2")
tlm = TransducedLM(setup.vfst, setup.lm, Config(prune_threshold=1e-3))


async def main():
    # Encode a target (byte) prefix as output-symbol ids.
    ctx = tuple(setup.out_sym_to_id[str(b)] for b in b"Hello")

    # Next-symbol distribution of the transduced model p_Y: log p_Y(byte | "Hello")
    dist = await tlm.logp_next(ctx)
    best = max(dist, key=dist.get)
    print(f"most likely next byte: {chr(best)!r} (logp={dist[best]:.3f})")

    # The precover decomposition behind that number (quotient + remainder beams):
    remainder, quotient = await tlm.decompose(ctx)


asyncio.run(main())
```

To use your own transformation you need a VectorizedFST. Build the
transducer either directly in
pynini (symbols are the
string forms of integer ids; epsilon is label 0), or with the bundled
FST class in
transduced_lm.benchmark.ptb.fst and
convert it to pynini with transduction_fst_to_pynini from
transduced_lm.benchmark.ptb.fst_converter.
Then wrap it in VectorizedFST, call compute_universal_states(), and pass it
to TransducedLM with any source LM exposing async logp_next_for(ctx) -> ndarray.
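As a hedged illustration of that source-LM contract, the sketch below is a hypothetical toy model (`BigramToyLM` is not part of the library) exposing `async logp_next_for(ctx) -> ndarray`. The array layout, one log-probability per source symbol id with end-of-string in the last slot, is an assumption made here rather than documented behavior; check how `setup.lm` orders its entries, and which other attributes TransducedLM reads, before wiring in your own model.

```python
import asyncio

import numpy as np


class BigramToyLM:
    """Hypothetical source LM over ids 0..vocab_size-1 plus an EOS slot.

    Assumed layout: logp_next_for returns vocab_size + 1 log-probabilities,
    one per source symbol id, with EOS last.
    """

    def __init__(self, vocab_size: int, stickiness: float = 0.5, p_eos: float = 0.1):
        self.vocab_size = vocab_size
        self.stickiness = stickiness  # extra mass on repeating the previous id
        self.p_eos = p_eos

    async def logp_next_for(self, ctx: tuple[int, ...]) -> np.ndarray:
        # Start from a uniform distribution over the non-EOS symbols.
        probs = np.full(self.vocab_size, 1.0 / self.vocab_size)
        if ctx:
            # Mix in a point mass on the previous symbol (a simple bigram effect).
            repeat = np.zeros(self.vocab_size)
            repeat[ctx[-1]] = 1.0
            probs = (1 - self.stickiness) * probs + self.stickiness * repeat
        # Scale symbol mass by (1 - p_eos) and append EOS so the vector sums to 1.
        return np.log(np.append((1 - self.p_eos) * probs, self.p_eos))


async def demo() -> np.ndarray:
    lm = BigramToyLM(vocab_size=4)
    return await lm.logp_next_for((2,))


logp = asyncio.run(demo())
print(np.exp(logp))
```

With context `(2,)` the toy model puts most of its mass on repeating id 2 and reserves 0.1 for end-of-string; any real model (an LLM wrapper, an n-gram model) would replace the body of `logp_next_for`.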
The three transducers from the paper are constructed in
src/transduced_lm/benchmark/fst_loaders.py and ptb_fst_builder.py.
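Whichever route you take to build your own transducer, its labels are integer ids, with symbols written as the string forms of those ids and label 0 reserved for epsilon. The hypothetical sketch below lays out the lowercasing transducer from the example above as plain `(src, ilabel, olabel, dst)` arcs over ASCII letters. Using a character's code point as its id is an assumption for illustration, not the convention of the bundled loaders; each tuple corresponds to one arc you would add to a pynini FST before wrapping it in VectorizedFST.

```python
import string

EPSILON = 0  # OpenFst/pynini use label 0 for epsilon


def char_id(c: str) -> int:
    """Hypothetical id scheme: a character's id is its code point (never 0 for letters)."""
    return ord(c)


def lowercase_fst_arcs() -> list[tuple[int, int, int, int]]:
    """Arcs (src, ilabel, olabel, dst) of the one-state lowercasing transducer.

    State 0 is both the start state and a final state; every arc loops back to it.
    """
    return [(0, char_id(c), char_id(c.lower()), 0) for c in string.ascii_letters]


def output_symbol_table(arcs: list[tuple[int, int, int, int]]) -> dict[str, int]:
    """Output symbols are the string forms of integer ids; epsilon keeps label 0."""
    ids = {EPSILON} | {olabel for _, _, olabel, _ in arcs}
    return {str(i): i for i in sorted(ids)}


def transduce(arcs: list[tuple[int, int, int, int]], text: str) -> str:
    """Run this deterministic one-state transducer on text by following its arcs."""
    step = {ilabel: olabel for src, ilabel, olabel, _ in arcs if src == 0}
    return "".join(chr(step[char_id(c)]) for c in text)


arcs = lowercase_fst_arcs()
out_symbols = output_symbol_table(arcs)
print(len(arcs), transduce(arcs, "HeLLo"))
```

The arc list has 52 entries (one per ASCII letter), the output table maps `"97"` to 97 for `a` plus `"0"` for epsilon, and `transduce(arcs, "HeLLo")` returns `"hello"`.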
Requires Python ≥ 3.10 (tested on 3.12). A fresh environment is recommended:

```bash
conda create -n tlm python=3.12 && conda activate tlm   # or: python -m venv
pip install -e .
```

This installs pinned dependency versions matching the paper experiments
(pynini, genlm-bytes, genlm-backend, torch, transformers, …). For
GPU-accelerated inference with vLLM (used for all paper experiments):

```bash
pip install -e ".[vllm]"
```

The experiment scripts require a CUDA GPU.
To verify the install end to end (package import, vLLM, Hugging Face model download and load, and dataset download), run the setup smoke test. It loads each transducer and model and scores a few symbols, using the same CLI as the experiment scripts below:

```bash
bash scripts/smoke_test.sh   # gpt2-large + vesteinn/gpt2-dna; no HF login needed
```

The three transformations studied in the paper:

- hf_realpha: tokens → bytes (turns a subword LM into a character-level model).
- ptb_ported: tokens → words (applies Penn Treebank tokenization as a transduction).
- hf_dna2aa: DNA → amino acids.
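For a tokens → bytes transformation like hf_realpha, the precover of a byte prefix consists of token sequences, and several tokenizations can reach the same bytes. The minimal sketch below, assuming a hypothetical toy vocabulary of strings (not GPT-2's), enumerates the minimal token sequences whose decoding starts with the target; every continuation of such a sequence stays in the precover, so they play the role of the quotient here. Exhaustive enumeration like this is only practical for tiny vocabularies, and `token_quotient` is an illustrative name, not a library function.

```python
def token_quotient(vocab: list[str], target: str) -> list[tuple[str, ...]]:
    """Minimal token sequences whose decoding starts with `target`.

    Each returned sequence covers `target`, while every proper prefix of it
    decodes to a proper prefix of `target`, so no returned sequence extends another.
    """
    if not target:
        return [()]  # the empty sequence already covers the empty prefix
    if not all(vocab):
        raise ValueError("empty tokens would make the search loop forever")
    results: list[tuple[str, ...]] = []

    def extend(tokens: tuple[str, ...], decoded: str) -> None:
        for tok in vocab:
            text = decoded + tok
            if text.startswith(target):
                results.append(tokens + (tok,))  # covers target: keep it
            elif target.startswith(text):
                extend(tokens + (tok,), text)  # still a prefix of target: recurse
            # otherwise the decoding has diverged from target: drop it

    extend((), "")
    return results


# Hypothetical toy vocabulary; several tokenizations reach "Hell".
VOCAB = ["H", "He", "ll", "l", "lo", "o", "llo"]
print(token_quotient(VOCAB, "Hell"))
```

With this vocabulary there are six such sequences, for example `("He", "ll")`, `("He", "l", "lo")` and `("He", "llo")`; the byte-level prefix probability of `Hell` is then a six-term sum of source prefix probabilities, mirroring the four-term sum in the lowercasing example.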
The pretrained DNA model is on the Hugging Face Hub at
vesteinn/gpt2-dna and downloads
automatically when you pass --model vesteinn/gpt2-dna. Llama models require
huggingface-cli login for gated access.
Single quick runs:
```bash
bash scripts/run_realpha.sh   # tokens → bytes, GPT-2 Large, one paragraph
bash scripts/run_ptb.sh       # tokens → words (Penn Treebank)
bash scripts/run_dna2aa.sh    # DNA → amino acids
```

Full pipeline (benchmarks → CSVs → LaTeX tables → figures):

```bash
bash scripts/experiments/run_all.sh --quick   # fast smoke test
bash scripts/experiments/run_all.sh           # full reproduction
```

See scripts/experiments/README.md for the
mapping from scripts to paper tables/figures and per-experiment parameters. The
scripts/experiments/paper_runs/ directory holds the exact SLURM (sbatch)
scripts used for the paper, including the Phi-4 runs. Scripts use the active
Python environment; set CONDA_ENV=<name> to have them activate a named conda
environment automatically.
```bibtex
@inproceedings{snbjarnarson2026transducing,
  title     = {Transducing Language Models},
  author    = {V\'esteinn Sn{\ae}bjarnarson and Samuel Kiegeland and Tianyu Liu and Reda Boumasmoud and Ryan Cotterell and Tim Vieira},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=qOyF214xmg}
}
```