This repository contains the code and data for our paper "Disentangling Language and Culture for Evaluating Multilingual Large Language Models", accepted to the ACL 2025 main conference.

We propose a Dual Evaluation Framework that separately considers the linguistic medium and the cultural context of a question, enabling a more nuanced and comprehensive assessment of LLMs across languages and cultures. Our experiments uncover a "Cultural-Linguistic Synergy" phenomenon: LLMs perform better when a question's cultural background matches its language. Further analysis suggests that the proportion of activated neurons can indicate model performance in multilingual and multicultural settings. Our findings highlight the importance of both cultural and linguistic factors in LLM evaluation.
```bash
git clone https://github.com/yingjiahao14/Dual-Eval
cd Dual-Eval
```

The environment setup for this project follows the guidelines and dependencies outlined in BLEnD. Please refer to their documentation for detailed instructions on configuring the environment and installing the required evaluation dependencies.
```bash
# Download the Llama-3 model from Hugging Face and store it locally
huggingface-cli download --resume-download meta-llama/Llama-3-8B-Instruct --local-dir model/Llama-3-8B-Instruct
```

If you have already downloaded the models, update the corresponding paths in `utils.py` so they point to the correct locations.
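The path mapping in `utils.py` might look like the following minimal sketch. The variable and function names here are hypothetical; check `utils.py` in the repository for the actual structure.

```python
# Hypothetical sketch of a model-path mapping in utils.py;
# the actual variable names and structure in the repository may differ.
MODEL_PATHS = {
    "Llama3-8b-Instruct": "model/Llama-3-8B-Instruct",  # local dir from the download step above
    "gemma-2-9b-it": "model/gemma-2-9b-it",             # assumed analogous local path
}

def get_model_path(model_key: str) -> str:
    """Resolve a model key to its local checkpoint directory."""
    return MODEL_PATHS[model_key]
```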
Model Inference: Execute the command below to run model inference:

```bash
bash model_inference.sh [OPTIONS]
```

You can customize the inference process with the following command-line arguments:
- `--cuda-devices`: Specify which GPU(s) to use (e.g., `"0"`, `"0,1"`).
  Example: `--cuda-devices "0,1"`
- `--model-keys`: Comma-separated list of model names to use for inference.
  Example: `--model-keys "Llama3-8b-Instruct,gemma-2-9b-it"`
- `--country-lang`: Specify country-language mappings. Use a comma to separate entries and a colon to separate a country from its languages (languages separated by semicolons if multiple).
  Example: `--country-lang "China:China,UK,US:US,China"`
- `--prompt-numbers`: Specify which prompt numbers to use (comma-separated).
  Example: `--prompt-numbers "inst-4"`
Note: If you do not specify these parameters, default values defined in the script will be used.
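To make the `--country-lang` format concrete, here is a minimal parser sketch assuming the convention described above (commas between entries, a colon between a country and its languages, semicolons between multiple languages). The function name and the input string are illustrative, not taken from the repository.

```python
def parse_country_lang(arg: str) -> dict:
    """Parse a --country-lang string into {country: [languages]}.

    Assumes (per the README description above): entries are separated
    by commas, country and languages by a colon, and multiple languages
    by semicolons. This is an illustrative sketch, not repo code.
    """
    mapping = {}
    for entry in arg.split(","):
        country, langs = entry.split(":")
        mapping[country] = langs.split(";")
    return mapping

# Hypothetical input following the semicolon convention:
print(parse_country_lang("China:China;UK,US:US;China"))
```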
Model Evaluation: To evaluate model outputs, navigate to the evaluation directory and execute:

```bash
cd evaluation/get_performance
bash evaluate.sh [OPTIONS]
```

You can customize the evaluation process with the following command-line arguments:
- `--cuda-devices`: Specify which GPU(s) to use (e.g., `"0"`, `"0,1"`).
  Example: `--cuda-devices "0"`
- `--model-keys`: Comma-separated list of model names to evaluate.
  Example: `--model-keys "Llama3-8b-Instruct,gemma-2-9b-it"`
- `--country-lang`: Specify country-language mappings. Use a comma to separate entries and a colon to separate a country from its languages (languages separated by commas if multiple).
  Example: `--country-lang "China:China,UK,US:US,China"`
- `--prompt-numbers`: Comma-separated list of prompt numbers to use for evaluation.
  Example: `--prompt-numbers "inst-8"`
Specialized Neurons Calculation: To calculate specialized neurons for your models, follow the steps below:

```bash
# This script extracts key neurons for Q_{i,j}
bash get_neuron.sh

# Calculate the proportion of specialized neurons for P_{i,j} using the specified threshold mode
python get_specialized_neuron.py --mode [MODE]
```

You can choose different threshold functions for neuron selection via the `--mode` argument of `get_specialized_neuron.py`. The available modes are:
- `layer-topk` (default)
- `layer-topscore`
- `global_topk`
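As a rough illustration of the layer-topk idea (an assumed selection rule, not the repository's implementation): for each layer, mark the k neurons with the highest activation scores as key neurons, then report the fraction of all neurons selected.

```python
import numpy as np

def layer_topk_key_neurons(activations: np.ndarray, k: int) -> np.ndarray:
    """Illustrative layer-topk selection (assumption, not the repo's code).

    activations: (num_layers, num_neurons) activation scores.
    Returns a boolean mask marking the top-k neurons of each layer.
    """
    mask = np.zeros_like(activations, dtype=bool)
    # Sort descending and take the k highest-scoring neurons per layer
    top_idx = np.argsort(-activations, axis=1)[:, :k]
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask

# Toy example: 2 layers x 4 neurons, k = 1
acts = np.array([[0.1, 0.9, 0.3, 0.2],
                 [0.5, 0.1, 0.7, 0.6]])
mask = layer_topk_key_neurons(acts, k=1)
proportion = mask.mean()  # fraction of neurons flagged as "key"
```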
If you find this work helpful, please consider citing:

```bibtex
@misc{ying2025disentanglinglanguagecultureevaluating,
  title={Disentangling Language and Culture for Evaluating Multilingual Large Language Models},
  author={Jiahao Ying and Wei Tang and Yiran Zhao and Yixin Cao and Yu Rong and Wenxuan Zhang},
  year={2025},
  eprint={2505.24635},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.24635},
}
```