# Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese
This is the repository for our EMNLP 2025 paper. We propose that translationese should be treated as a graded phenomenon: one translation can contain more or less translationese than another. Instead of the traditional binary classification of translated vs. original text, we directly measure the degree of translationese in a text. In the paper, we compare several scoring functions for measuring translationese in search of a better Translationese-index, one that measures translationese in a graded and generalizable manner.
We primarily experiment on two settings:
- Multi-genre synthetic translations: methods should distinguish high-translationese data from low-translationese data across multiple genres (generalizable).
- In-the-wild translations with human annotations: methods should correlate well with human annotations, both for pointwise and pairwise annotations (graded).
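For the pairwise setting, agreement with annotators can be summarized as the fraction of human-annotated pairs that a scoring function orders the same way. A minimal sketch of that metric (function name and toy data are illustrative, not the repo's actual evaluation code):

```python
def pairwise_accuracy(scores, human_pairs):
    """Fraction of annotated pairs (i, j) -- where annotators judged text i
    to contain more translationese than text j -- on which the method's
    scores agree, i.e. scores[i] > scores[j]."""
    agree = sum(scores[i] > scores[j] for i, j in human_pairs)
    return agree / len(human_pairs)

# Hypothetical scores for four texts, and three annotated pairs in which
# the first index was judged to have more translationese.
scores = [0.9, 0.2, 0.7, 0.4]
pairs = [(0, 1), (2, 3), (1, 3)]
acc = pairwise_accuracy(scores, pairs)  # agrees on 2 of the 3 pairs
```

Pointwise annotations would instead be compared with a rank-correlation statistic (e.g. Spearman or Kendall) between scores and human ratings.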
All data used in our experiments can be found in the data/ folder. Implementations of all compared methods are in the src/ folder, and scripts for training and evaluating these methods are in the scripts/ folder.
Among all methods, we find that the best-performing one so far is the likelihood ratio of two contrastively fine-tuned LLMs (T-index): one fine-tuned on high-translationese data and the other on low-translationese data. The code for batch inference with this method is in src/t_index.py.
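The core of the T-index score can be sketched in a few lines: subtract the (length-normalized) log-likelihood under the low-translationese model from that under the high-translationese model. The sketch below uses toy per-token log-probabilities in place of the two fine-tuned LLMs; names and normalization choices are illustrative, not the repo's exact implementation:

```python
def sequence_logprob(token_logprobs):
    """Length-normalized log-likelihood of a token sequence."""
    return sum(token_logprobs) / len(token_logprobs)

def t_index(logprobs_high, logprobs_low):
    """Log-likelihood ratio between the model fine-tuned on
    high-translationese data and the one fine-tuned on
    low-translationese data. Higher = more translationese."""
    return sequence_logprob(logprobs_high) - sequence_logprob(logprobs_low)

# Toy per-token log-probs standing in for the two LLMs' outputs
# on the same text: a translated-sounding text scores high under
# the high-translationese model, and vice versa.
translated = t_index([-1.2, -0.8, -1.0], [-2.0, -1.9, -2.1])
original   = t_index([-2.5, -2.2, -2.4], [-0.9, -1.1, -1.0])
assert translated > original
```

In practice the per-token log-probabilities come from the two fine-tuned LLMs, and src/t_index.py computes them in batches.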
```bash
# Script for reproducing the results in the paper.

# Train on the oliver_twist_qwen setting with three seeds.
export CUDA_VISIBLE_DEVICES=0
for seed in 10 20 30; do
    bash scripts/train/sft.sh oliver_twist_qwen ${seed} 1000 1e-6 3 16
    bash scripts/train/dpo.sh oliver_twist_qwen ${seed} 1000 16
    bash scripts/train/rm.sh oliver_twist_qwen ${seed} 1000
    bash scripts/train/xlmr.sh oliver_twist_qwen ${seed}
done

# Train on the coca_blog_llama setting.
bash scripts/train/sft.sh coca_blog_llama 10 1000 1e-6 3 16

# Train SFT models on the genre mixture at different data scales.
export CUDA_VISIBLE_DEVICES=0,1
bash scripts/train/sft.sh mixture 10 5000 2.7e-5 1 32
bash scripts/train/sft.sh mixture 10 3000 2.7e-5 1 32
export CUDA_VISIBLE_DEVICES=0
bash scripts/train/sft.sh mixture 10 1000 1e-6 3 16

# Train the RM and DPO variants on the mixture.
for n_samples in 5000 3000 1000; do
    bash scripts/train/rm.sh mixture 10 ${n_samples}
done
export CUDA_VISIBLE_DEVICES=0,1
for n_samples in 5000 3000 1000; do
    bash scripts/train/dpo.sh mixture 10 ${n_samples} 8
done

# Evaluate on the synthetic and in-the-wild settings.
bash scripts/run/synthtic.sh
bash scripts/run/wild.sh
```