AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length
Large Language Models (LLMs) have advanced code generation but struggle to balance performance and inference costs across diverse tasks. Dynamically selecting the optimal LLM based on task difficulty and resource constraints offers a solution, yet existing methods are resource-intensive, costly, and rely on human-annotated difficulty labels, which are often unavailable or misaligned with LLMs' perception.
We introduce AdaptiveLLM, a framework that dynamically selects optimal LLMs by automatically assessing task difficulty. It estimates difficulty using Chain-of-Thought (CoT) lengths from reasoning models, clusters tasks into three difficulty levels via k-means, and fine-tunes CodeBERT to embed difficulty-aware features. An XGBoost classifier then selects the best model for each task, optimizing performance-cost trade-offs.
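The clustering step described above can be illustrated with a small, dependency-free example. This is a minimal sketch, not the repository's implementation: the function name `kmeans_1d`, the quantile-style initialisation, and the sample CoT lengths are all assumptions made for illustration.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 3, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster scalar values with Lloyd's algorithm.

    Returns (labels, centroids), relabelled so that label 0 has the
    smallest centroid and label k-1 the largest.
    """
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread centroids over the sorted data.
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: centroid becomes the mean of its members (unchanged if empty).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Relabel clusters by ascending centroid: short CoT = easy, long CoT = hard.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels], [centroids[c] for c in order]


# Made-up CoT lengths (in tokens) for nine tasks.
cot_lengths = [120, 150, 135, 900, 950, 1020, 4000, 4300, 3900]
labels, centers = kmeans_1d(cot_lengths)
names = ["easy", "medium", "hard"]
difficulty = [names[lab] for lab in labels]
print(list(zip(cot_lengths, difficulty)))
```

Sorting the clusters by centroid gives the labels a stable meaning across runs, which matters when the labels are later used as training targets.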
This folder stores the experimental code for the baseline method ComplexityNet. In the ComplexityNet framework, the model pool consists of CodeLlama, GPT-3.5, and GPT-4o, and the selector is obtained by fine-tuning Qwen2.5-7B-Instruct.
This folder contains the box plot comparison between difficulty annotations based on CoT length and human-annotated difficulty levels. We conducted the comparison on two datasets: LeetCodeSample and CodeContests.
This folder contains the confusion matrix comparing difficulty annotations based on CoT length with human-annotated difficulty levels, aimed at exploring the differences between the two classification methods. The comparison was also performed on the LeetCodeSample and CodeContests datasets.
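A comparison of this kind can be computed in a few lines. The sketch below is illustrative only: the level names `easy`/`medium`/`hard`, the helper names, and the sample labels are assumptions, and the repository's own plotting code may differ.

```python
from collections import Counter
from typing import List, Sequence

LEVELS = ["easy", "medium", "hard"]  # assumed label vocabulary


def confusion_matrix(human: Sequence[str], cot: Sequence[str],
                     levels: Sequence[str] = LEVELS) -> List[List[int]]:
    """Rows are human-annotated levels, columns are CoT-length-based levels."""
    if len(human) != len(cot):
        raise ValueError("label sequences must have the same length")
    counts = Counter(zip(human, cot))
    return [[counts[(h, c)] for c in levels] for h in levels]


def agreement(matrix: List[List[int]]) -> float:
    """Fraction of tasks on which both annotation methods agree (the diagonal)."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total if total else 0.0


# Made-up labels for six tasks.
human_labels = ["easy", "easy", "medium", "hard", "hard", "medium"]
cot_labels = ["easy", "medium", "medium", "hard", "medium", "easy"]

matrix = confusion_matrix(human_labels, cot_labels)
for level, row in zip(LEVELS, matrix):
    print(f"{level:>6}: {row}")
print("agreement:", agreement(matrix))
```

Off-diagonal cells show where the two annotation schemes disagree, for example tasks that humans rate as hard but that need only a medium-length chain of thought.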
This folder contains the original datasets as well as the datasets annotated with difficulty labels based on CoT length.
- prompts_en_extra_is_freeform.jsonl: HumanEval dataset
- prompts_python_en_test.jsonl: CodeContests dataset
- prompts.jsonl: LeetCodeSample dataset
This folder contains the three datasets combined into one, annotated with CoT-based difficulty labels by the DeepSeek-R1-Distill-Qwen-32B model.
This folder contains the generation results and code produced by invoking the models from the model pool on the three datasets.
This folder contains the CoT lengths generated by the DeepSeek R1 distilled models with parameter sizes of 1.5B, 7B, 14B, and 32B, along with their corresponding clustering results.
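As a rough illustration of how a CoT length might be measured, the sketch below extracts the reasoning segment from a model response and counts its length. DeepSeek-R1-style models delimit their reasoning with `<think>` and `</think>`. Counting whitespace-separated words is an assumption made to keep the example dependency-free, and the function names are hypothetical. The repository may count tokenizer tokens or characters instead.

```python
def extract_cot(response: str) -> str:
    """Return the reasoning text that precedes </think>, or "" if there is none.

    Some chat templates place the opening <think> tag in the prompt, so only
    the closing tag is required to be present in the response.
    """
    head, sep, _ = response.partition("</think>")
    if not sep:
        return ""
    # Drop everything up to and including an opening tag, if one is present.
    return head.split("<think>", 1)[-1].strip()


def cot_length(response: str) -> int:
    """Length of the reasoning segment in whitespace-separated words (an assumption)."""
    return len(extract_cot(response).split())


# Made-up responses for illustration.
with_tags = "<think>First sort the list, then scan.</think>def solve(xs): return sorted(xs)"
closing_only = "Check the edge cases first.</think>def solve(): pass"
no_reasoning = "def solve(): pass"

print(cot_length(with_tags), cot_length(closing_only), cot_length(no_reasoning))
```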
This folder contains the code and results for fine-tuning CodeBERT and training the XGBoost classifier.
- CodeBert_finetune.py: Training code for fine-tuning CodeBERT.
- data_split.py: Code for splitting the dataset into training and testing sets.
- score.py: Formula for calculating the cost-performance score of models.
- Classifier.py: Code for training the XGBoost classifier.
- test_data.jsonl: Test dataset.
- train_data.jsonl: Training dataset.
- predictions_1.jsonl: Prediction results from AdaptiveLLM.
- predictions_2.jsonl: Prediction results from AdaptiveLLM (without fine-tuning).
- xgboost_model_1.pkl: Trained XGBoost classifier from the AdaptiveLLM framework.
- xgboost_model_2.pkl: Trained XGBoost classifier from AdaptiveLLM (without fine-tuning).
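To make the idea of a cost-performance score concrete, here is a hypothetical example of how a target model could be chosen for each task before training the classifier. The formula below is not taken from `score.py`. It is an assumed linear trade-off (correctness minus a price penalty weighted by a made-up parameter `alpha`), and `tradeoff_score` and `best_model` are illustrative names. The prices are the per-million-token prices listed for the model pool.

```python
from typing import Dict


def tradeoff_score(passed: bool, price: float, max_price: float, alpha: float = 0.5) -> float:
    """Hypothetical score: 1 for a correct solution, minus a price penalty in [0, alpha]."""
    return (1.0 if passed else 0.0) - alpha * price / max_price


def best_model(results: Dict[str, bool], prices: Dict[str, float], alpha: float = 0.5) -> str:
    """Pick the model with the highest hypothetical trade-off score for one task."""
    max_price = max(prices.values())
    return max(results, key=lambda m: tradeoff_score(results[m], prices[m], max_price, alpha))


# Prices in USD per million tokens for three models from the pool.
prices = {
    "Qwen2.5-Coder-1.5B-Instruct": 0.14,
    "CodeLlama-7b-Instruct-hf": 0.42,
    "Qwen2.5-Coder-32B-Instruct": 1.26,
}

# Made-up pass/fail outcomes for two tasks.
easy_task = {name: True for name in prices}
hard_task = {
    "Qwen2.5-Coder-1.5B-Instruct": False,
    "CodeLlama-7b-Instruct-hf": False,
    "Qwen2.5-Coder-32B-Instruct": True,
}

print(best_model(easy_task, prices))  # cheapest model that solves the task
print(best_model(hard_task, prices))  # only the largest model solves it
```

Under this assumed formula, a task that every model solves is routed to the cheapest one, while a task that only a large model solves is routed to that model despite its price.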
| LLM | Size | Link | Price |
|---|---|---|---|
| Yi-Coder-1.5B-Chat | 1.5B | https://huggingface.co/01-ai/Yi-Coder-1.5B-Chat | $0.14 / M tokens |
| Qwen2.5-Coder-1.5B-Instruct | 1.5B | https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct | $0.14 / M tokens |
| CodeLlama-7b-Instruct-hf | 7B | https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf | $0.42 / M tokens |
| starcoder2-15b-instruct-v0.1 | 15B | https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 | $0.72 / M tokens |
| deepseek-coder-v2-lite-instruct | 16B | https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | $0.72 / M tokens |
| Codestral-22B-v0.1 | 22B | https://huggingface.co/mistralai/Codestral-22B-v0.1 | $0.95 / M tokens |
| deepseek-coder-33b-instruct | 33B | https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct | $1.26 / M tokens |
| Qwen2.5-Coder-32B-Instruct | 32B | https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct | $1.26 / M tokens |
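Because prices are quoted per million tokens, the cost of a single request is the token count divided by 1,000,000 and multiplied by the model's price. The short example below applies this to the table above. The helper name `request_cost` and the 2,000-token request are assumptions for illustration.

```python
# Prices in USD per million tokens, copied from the table above.
PRICE_PER_M_TOKENS = {
    "Yi-Coder-1.5B-Chat": 0.14,
    "Qwen2.5-Coder-1.5B-Instruct": 0.14,
    "CodeLlama-7b-Instruct-hf": 0.42,
    "starcoder2-15b-instruct-v0.1": 0.72,
    "deepseek-coder-v2-lite-instruct": 0.72,
    "Codestral-22B-v0.1": 0.95,
    "deepseek-coder-33b-instruct": 1.26,
    "Qwen2.5-Coder-32B-Instruct": 1.26,
}


def request_cost(model: str, n_tokens: int) -> float:
    """Cost in USD of processing n_tokens with the given model."""
    return PRICE_PER_M_TOKENS[model] * n_tokens / 1_000_000


# A hypothetical 2,000-token request on the cheapest and the most expensive tier.
small = request_cost("Qwen2.5-Coder-1.5B-Instruct", 2_000)
large = request_cost("Qwen2.5-Coder-32B-Instruct", 2_000)
print(f"small: ${small:.6f}  large: ${large:.6f}  ratio: {large / small:.1f}x")
```

The roughly ninefold price gap between the smallest and largest models is what makes routing easy tasks to small models worthwhile.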
| LLM | Size | Link |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| DeepSeek-R1-Distill-Qwen-7B | 7B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| DeepSeek-R1-Distill-Qwen-14B | 14B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| DeepSeek-R1-Distill-Qwen-32B | 32B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
| DeepSeek-R1 | 671B | https://huggingface.co/deepseek-ai/DeepSeek-R1 |