SMT: Fine-Tuning Large Language Models with Sparse Matrices

This repository contains the implementation of the ICLR 2025 paper SMT: Fine-Tuning Large Language Models with Sparse Matrices. The paper introduces a method for selecting sparse sub-matrices that aims to minimize the performance gap between parameter-efficient fine-tuning (PEFT) and full fine-tuning (FT) while reducing both the computational and memory costs of fine-tuning. We explored both gradient-based and activation-based parameter selection methods to identify the most significant sub-matrices for downstream tasks, updating only these blocks during fine-tuning. In our experiments, SMT consistently surpasses other PEFT baselines (e.g., LoRA and DoRA) when fine-tuning popular large language models such as LLaMA across a broad spectrum of tasks, while reducing the GPU memory footprint by 67% compared to FT.
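
For intuition, here is a minimal, hedged sketch of the gradient-based selection step: accumulate gradient magnitudes over a few warm-up batches, score each fixed-size sub-matrix block of every 2-D weight, and keep the highest-scoring blocks up to the parameter budget. The block size, warm-up length, helper names, and data-loader format below are illustrative assumptions, not the repository's actual code or defaults.

```python
# Illustrative sketch only (hypothetical block size, warm-up length, and helpers);
# see the DeepSpeed / Hugging Face subfolders for the actual SMT implementation.
import torch

BLOCK = 256          # assumed sub-matrix (block) size
SPARSITY = 0.0071    # e.g. train ~0.71% of parameters

def score_blocks(model, data_loader, loss_fn, warmup_steps=8):
    """Accumulate per-block |gradient| for every 2-D weight over warm-up batches."""
    scores = {}
    for step, (inputs, labels) in enumerate(data_loader):
        if step >= warmup_steps:
            break
        model.zero_grad()
        loss_fn(model(inputs), labels).backward()
        for name, p in model.named_parameters():
            if p.grad is None or p.dim() != 2:
                continue
            g = p.grad.abs()
            rows, cols = g.shape[0] // BLOCK, g.shape[1] // BLOCK
            g = g[: rows * BLOCK, : cols * BLOCK]
            # Sum |grad| inside each BLOCK x BLOCK sub-matrix.
            block_score = g.reshape(rows, BLOCK, cols, BLOCK).sum(dim=(1, 3))
            scores[name] = scores.get(name, 0) + block_score
    return scores

def select_top_blocks(scores, sparsity=SPARSITY):
    """Keep the globally highest-scoring blocks within the parameter budget."""
    flat = [(name, r, c, sc[r, c].item())
            for name, sc in scores.items()
            for r in range(sc.shape[0])
            for c in range(sc.shape[1])]
    flat.sort(key=lambda t: t[3], reverse=True)
    n_blocks = max(1, int(sparsity * sum(sc.numel() for sc in scores.values())))
    return flat[:n_blocks]  # list of (param_name, row_block, col_block, score)
```

Activation-based selection would replace the gradient magnitudes with activation statistics gathered from forward hooks; the paper compares both variants.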

We implemented SMT in two frameworks: DeepSpeed and Hugging Face Trainer/PEFT. Instructions for setting up the environment, training, and evaluation can be found in the subfolders.
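
As a rough picture of what "updating only the selected blocks" means, the sketch below freezes all parameters and uses per-parameter gradient hooks to zero out gradients outside the chosen sub-matrices. This is only an emulation under assumed interfaces (the `selected` dict mapping parameter names to `(row_block, col_block)` pairs, and the block size, are carried over from the hypothetical selection sketch above); it is not how the DeepSpeed or Hugging Face Trainer/PEFT integrations are implemented.

```python
# Illustrative emulation only: freeze everything except the selected blocks
# by masking gradients; the real integrations live in the subfolders.
import torch

BLOCK = 256  # must match the (assumed) block size used during selection

def apply_block_masks(model, selected):
    """`selected` maps parameter name -> list of (row_block, col_block) to train."""
    masks = {}
    for name, p in model.named_parameters():
        blocks = selected.get(name)
        if not blocks:
            p.requires_grad_(False)  # parameter is entirely frozen
            continue
        mask = torch.zeros_like(p)
        for r, c in blocks:
            mask[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK] = 1.0
        masks[name] = mask
        # Zero the gradient outside the selected sub-matrices on every backward pass.
        p.register_hook(lambda grad, m=mask: grad * m)
    return masks
```

Note that this hook-based emulation still allocates dense gradients and optimizer states; the memory savings reported above require keeping gradients and optimizer state only for the selected sub-matrices, so treat this purely as a conceptual illustration.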

Latest News 🔥🔥

Latest Results on DeepSeek-R1-Distill Model

Observation 1: The DeepSeek-R1-Distill-LLaMA8B model underperforms the base LLaMA-3-8B model on the Commonsense Reasoning datasets without reasoning traces.

| Model | Method | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1-Distill-LLaMA8B | Base | 53.9 | 50.0 | 37.4 | 23.4 | 25.3 | 30.6 | 28.0 | 23.4 | 34.0 |
| DeepSeek-R1-Distill-LLaMA8B | SMT (0.86%) | 70.6 | 66.4 | 77.8 | 62.4 | 84.6 | 60.9 | 53.2 | 72.6 | 68.6 |
| DeepSeek-R1-Distill-LLaMA8B | Full FT | 71.0 | 66.2 | 76.5 | 62.2 | 85.4 | 61.8 | 52.8 | 72.6 | 68.6 |
| LLaMA3-8B | SMT (0.71%) | 75.7 | 88.4 | 81.4 | 96.2 | 88.2 | 92.7 | 83.2 | 88.6 | 86.8 |

Observation 2: The DeepSeek-R1-Distill-LLaMA8B model largely outperforms the base LLaMA-3-8B model on the Math Reasoning datasets with reasoning traces.

| Model | Method | GSM8K | SingleEq | SVAMP | MultiArith | AddSub | AQuA | AVG |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1-Distill-LLaMA8B | SMT (0.71%) | 60.8 | 92.5 | 70.6 | 95.3 | 87.3 | 31.2 | 73.0 |
| LLaMA3-8B | SMT (0.71%) | 42.8 | 88.5 | 60.4 | 93.9 | 85.8 | 25.2 | 66.1 |

Datasets

For Commonsense Reasoning downstream evaluation datasets, we use the processed dataset splits provided by the LLM-Adapters repository: https://github.com/AGI-Edgerunners/LLM-Adapters/tree/main/dataset. We follow the same dataset organization and evaluation settings as described in the LLM-Adapters paper.

Citation

If you find this repository or our paper useful in your research, please consider citing:

@inproceedings{he2025smt,
  title={SMT: Fine-Tuning Large Language Models with Sparse Matrices},
  author={He, Haoze and Li, Juncheng B and Jiang, Xuan and Miller, Heather},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
