- Features
- Benchmark
- Changelog
- Supported Models
- Supported Training Approaches
- Provided Datasets
- Requirement
- Getting Started
- Projects using LLaMA Factory
- License
- Citation
- Acknowledgement
Pre-training datasets
pip install --upgrade huggingface_hub
huggingface-cli login| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.8 | 3.11 |
| torch | 1.13.1 | 2.4.0 |
| transformers | 4.41.2 | 4.43.4 |
| datasets | 2.16.0 | 2.20.0 |
| accelerate | 0.30.1 | 0.32.0 |
| peft | 0.11.1 | 0.12.0 |
| trl | 0.8.6 | 0.9.6 |
| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.5.0 |
| flash-attn | 2.3.0 | 2.6.3 |
* estimated
| Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
|---|---|---|---|---|---|---|---|---|
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |
Important
Installation is mandatory.
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"Extra dependencies available: torch, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, galore, badam, adam-mini, qwen, modelscope, openmind, swanlab, quality
Tip
Use pip install --no-deps -e . to resolve package conflicts.
For Windows users
If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you need to install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.2, please select the appropriate release version based on your CUDA version.
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whlTo enable FlashAttention-2 on the Windows platform, you need to install the precompiled flash-attn library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from flash-attention based on your requirements.
Please refer to data/README.md for checking the details about the format of dataset files. You can either use datasets on HuggingFace / ModelScope / Modelers hub or load the dataset in local disk.
Note
Please update data/dataset_info.json to use your custom dataset.
Use the following 3 commands to run LoRA fine-tuning, inference and merging of the Llama3-8B-Instruct model, respectively.
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yamlSee examples/README.md for advanced usage (including distributed training).
Tip
Use llamafactory-cli help to show help information.
Fine-Tuning with LLaMA Board GUI (powered by Gradio)
llamafactory-cli webuiFor CUDA users:
cd docker/docker-cuda/
docker compose up -d
docker compose exec llamafactory bashFor AMD ROCm users:
cd docker/docker-rocm/
docker compose up -d
docker compose exec llamafactory bashBuild without Docker Compose
For CUDA users:
docker build -f ./docker/docker-cuda/Dockerfile \
--build-arg INSTALL_BNB=false \
--build-arg INSTALL_VLLM=false \
--build-arg INSTALL_DEEPSPEED=false \
--build-arg INSTALL_FLASHATTN=false \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
docker run -dit --gpus=all \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./data:/app/data \
-v ./output:/app/output \
-p 7860:7860 \
-p 8000:8000 \
--shm-size 16G \
--name llamafactory \
llamafactory:latest
docker exec -it llamafactory bashFor AMD ROCm users:
docker build -f ./docker/docker-rocm/Dockerfile \
--build-arg INSTALL_BNB=false \
--build-arg INSTALL_VLLM=false \
--build-arg INSTALL_DEEPSPEED=false \
--build-arg INSTALL_FLASHATTN=false \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
docker run -dit \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./data:/app/data \
-v ./output:/app/output \
-v ./saves:/app/saves \
-p 7860:7860 \
-p 8000:8000 \
--device /dev/kfd \
--device /dev/dri \
--shm-size 16G \
--name llamafactory \
llamafactory:latest
docker exec -it llamafactory bashDetails about volume
hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.ms_cache: Similar to Hugging Face cache but for ModelScope users.om_cache: Similar to Hugging Face cache but for Modelers users.data: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.output: Set export dir to this location so that the merged result can be accessed directly on the host machine.
API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yamlIf you have trouble with downloading models and datasets from Hugging Face, you can use ModelScope.
export USE_MODELSCOPE_HUB=1 # `set USE_MODELSCOPE_HUB=1` for WindowsTrain the model by specifying a model ID of the ModelScope Hub as the model_name_or_path. You can find a full list of model IDs at ModelScope Hub, e.g., LLM-Research/Meta-Llama-3-8B-Instruct.
You can also use Modelers Hub to download models and datasets.
export USE_OPENMIND_HUB=1 # `set USE_OPENMIND_HUB=1` for WindowsTrain the model by specifying a model ID of the Modelers Hub as the model_name_or_path. You can find a full list of model IDs at Modelers Hub, e.g., TeleAI/TeleChat-7B-pt.
To use Weights & Biases for logging experimental results, you need to add the following arguments to yaml files.
report_to: wandb
run_name: test_run # optionalSet WANDB_API_KEY to your key when launching training tasks to log in with your W&B account.
To use SwanLab for logging experimental results, you need to add the following arguments to yaml files.
use_swanlab: true
swanlab_run_name: test_run # optionalWhen launching training tasks, you can log in to SwanLab in three ways:
- Add
swanlab_api_key=<your_api_key>to the yaml file, and set it to your API key. - Set the environment variable
SWANLAB_API_KEYto your API key. - Use the
swanlab logincommand to complete the login.
If you have a project that should be incorporated, please contact via email or create a pull request.
Click to show
- Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [arxiv]
- Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [arxiv]
- Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [arxiv]
- Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [arxiv]
- Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [arxiv]
- Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. KDD 2024. [arxiv]
- Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. ACL 2024. [arxiv]
- Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [arxiv]
- Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [arxiv]
- Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [arxiv]
- Yang et al. LaCo: Large Language Model Pruning via Layer Collaps. 2024. [arxiv]
- Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [arxiv]
- Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [arxiv]
- Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL 2024 Findings. [arxiv]
- Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [arxiv]
- Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [arxiv]
- Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv]
- Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [arxiv]
- Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [arxiv]
- Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [arxiv]
- Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [arxiv]
- Wu et al. Large Language Models are Parallel Multilingual Learners. 2024. [arxiv]
- Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [arxiv]
- Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [arxiv]
- Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. COLING 2024. [arxiv]
- Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [arxiv]
- Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [arxiv]
- Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [arxiv]
- Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [arxiv]
- Ma et al. Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation. ICML 2024. [arxiv]
- Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [arxiv]
- Shang et al. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models. 2024. [arxiv]
- Huang et al. LLMTune: Accelerate Database Knob Tuning with Large Language Models. 2024. [arxiv]
- Deng et al. Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction. 2024. [arxiv]
- Acikgoz et al. Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. 2024. [arxiv]
- Zhang et al. Small Language Models Need Strong Verifiers to Self-Correct Reasoning. ACL 2024 Findings. [arxiv]
- Zhou et al. FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering. NAACL 2024. [arxiv]
- Xu et al. Large Language Models for Cyber Security: A Systematic Literature Review. 2024. [arxiv]
- Dammu et al. "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations. 2024. [arxiv]
- Yi et al. A safety realignment framework via subspace-oriented model fusion for large language models. 2024. [arxiv]
- Lou et al. SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling. 2024. [arxiv]
- Zhang et al. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners. 2024. [arxiv]
- Zhang et al. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. 2024. [arxiv]
- Zihong Chen. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. 2024. [paper]
- Gao et al. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. 2024. [arxiv]
- Wang and Song. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. 2024. [arxiv]
- Hu et al. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models. 2024. [arxiv]
- Ge et al. Time Sensitive Knowledge Editing through Efficient Finetuning. ACL 2024. [arxiv]
- Tan et al. Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions. 2024. [arxiv]
- Song et al. Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters. 2024. [arxiv]
- Gu et al. RWKV-CLIP: A Robust Vision-Language Representation Learner. 2024. [arxiv]
- Chen et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. 2024. [arxiv]
- Zhu et al. Are Large Language Models Good Statisticians?. 2024. [arxiv]
- Li et al. Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning. 2024. [arxiv]
- Ding et al. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce. 2024. [arxiv]
- He et al. COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities. 2024. [arxiv]
- Lin et al. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving. 2024. [arxiv]
- Treutlein et al. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. 2024. [arxiv]
- Feng et al. SS-Bench: A Benchmark for Social Story Generation and Evaluation. 2024. [arxiv]
- Feng et al. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. 2024. [arxiv]
- Liu et al. Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals. 2024. [arxiv]
- Iyer et al. Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. AmericasNLP 2024. [paper]
- Li et al. Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring. 2024. [arxiv]
- Yang et al. Financial Knowledge Large Language Model. 2024. [arxiv]
- Lin et al. DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. 2024. [arxiv]
- Bako et al. Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization. 2024. [arxiv]
- Huang et al. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. 2024. [arxiv]
- Jiang et al. LLM-Collaboration on Automatic Science Journalism for the General Audience. 2024. [arxiv]
- Inouye et al. Applied Auto-tuning on LoRA Hyperparameters. 2024. [paper]
- Qi et al. Research on Tibetan Tourism Viewpoints information generation system based on LLM. 2024. [arxiv]
- Xu et al. Course-Correction: Safety Alignment Using Synthetic Preferences. 2024. [arxiv]
- Sun et al. LAMBDA: A Large Model Based Data Agent. 2024. [arxiv]
- Zhu et al. CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare. 2024. [arxiv]
- Yu et al. Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. 2024. [arxiv]
- Xie et al. The Power of Personalized Datasets: Advancing Chinese Composition Writing for Elementary School through Targeted Model Fine-Tuning. IALP 2024. [paper]
- Liu et al. Instruct-Code-Llama: Improving Capabilities of Language Model in Competition Level Code Generation by Online Judge Feedback. ICIC 2024. [paper]
- Wang et al. Cybernetic Sentinels: Unveiling the Impact of Safety Data Selection on Model Security in Supervised Fine-Tuning. ICIC 2024. [paper]
- Xia et al. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. 2024. [arxiv]
- Zeng et al. Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions. 2024. [arxiv]
- Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [paper]
- Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [arxiv]
- Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [paper]
- StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
- DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
- Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
- CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
- MachineMindset: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.
- Luminia-13B-v3: A large language model specialized in generate metadata for stable diffusion. [demo]
- Chinese-LLaVA-Med: A multimodal large language model specialized in Chinese medical domain, based on LLaVA-1.5-7B.
- AutoRE: A document-level relation extraction system based on large language models.
- NVIDIA RTX AI Toolkit: SDKs for fine-tuning LLMs on Windows PC for NVIDIA RTX.
- LazyLLM: An easy and lazy way for building multi-agent LLMs applications and supports model fine-tuning via LLaMA Factory.
- RAG-Retrieval: A full pipeline for RAG retrieval model fine-tuning, inference, and distillation. [blog]
This repository is licensed under the Apache-2.0 License.
Please follow the model licenses to use the corresponding model weights: Baichuan 2 / BLOOM / ChatGLM3 / Command R / DeepSeek / Falcon / Gemma / GLM-4 / Granite / Index / InternLM2 / Llama / Llama 2 (LLaVA-1.5) / Llama 3 / MiniCPM / Mistral/Mixtral/Pixtral / OLMo / Phi-1.5/Phi-2 / Phi-3 / Qwen / Skywork / StarCoder 2 / TeleChat2 / XVERSE / Yi / Yi-1.5 / Yuan 2