Retrieval-augmented-Code-Summarization

The datasets are JCSD, originally provided by Hu et al., and PCSD. For a fair and convenient comparison, we use the filtered version of JCSD provided by DECOM and the version of PCSD provided by SG-Trans.

Data preprocessing

python process.py

For those who lack patience

# First argument: the dataset (e.g. JCSD)
# Second argument: the GPU ids, comma-separated
# Third argument: the number of exemplars
bash run.sh JCSD 0,1 4
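
To make the three positional arguments concrete, here is a minimal Python sketch of how they relate to the paths used in the steps below. The helper name `derive_run_config` is hypothetical and not part of this repository. It only mirrors the `dataset/${lang}/...` and `saved_models/${lang}${number}` patterns that appear in the commands, and it makes no claim about what `run.sh` does internally.

```python
def derive_run_config(dataset: str, gpu_ids: str, exemplars: int) -> dict:
    """Hypothetical helper: map the run.sh arguments to the paths used in steps 1-3."""
    return {
        # Same path patterns as --train_filename, --dev_filename, --test_filename
        "train_filename": f"dataset/{dataset}/train.jsonl",
        "dev_filename": f"dataset/{dataset}/valid.jsonl",
        "test_filename": f"dataset/{dataset}/test.jsonl",
        # Same pattern as --output_dir: dataset name followed by the exemplar count
        "output_dir": f"saved_models/{dataset}{exemplars}",
        # "0,1" becomes [0, 1]
        "gpu_ids": [int(g) for g in gpu_ids.split(",")],
        # Passed to run.py as --passage_number
        "passage_number": exemplars,
    }


config = derive_run_config("JCSD", "0,1", 4)
print(config["output_dir"])  # saved_models/JCSD4
```

The resulting `output_dir` is the same directory that step 3 passes to the evaluation script.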

1. Train retriever and generator jointly

# For Java (JCSD) with 4 exemplars
lang=JCSD
number=4
python run.py \
	--do_train \
	--do_eval \
	--model_name_or_path Salesforce/codet5-base \
	--train_filename dataset/${lang}/train.jsonl \
	--dev_filename dataset/${lang}/valid.jsonl \
	--output_dir saved_models/${lang}${number} \
	--max_source_length 512 \
	--max_target_length 64 \
	--code_length 256 \
	--nl_length 64 \
	--beam_size 10 \
	--train_batch_size 32 \
	--eval_batch_size 24 \
	--learning_rate 5e-5 \
	--gradient_accumulation_steps 4 \
	--num_train_epochs 10 \
	--passage_number ${number} \
	--GPU_ids 0,1

2. Generate predictions for test set

# For Java; assumes lang and number are still set as in step 1
python run.py \
	--do_test \
	--model_name_or_path Salesforce/codet5-base \
	--train_filename dataset/${lang}/train.jsonl \
	--test_filename dataset/${lang}/test.jsonl \
	--output_dir saved_models/${lang}${number} \
	--max_source_length 512 \
	--max_target_length 64 \
	--code_length 256 \
	--nl_length 64 \
	--beam_size 10 \
	--eval_batch_size 24 \
	--GPU_ids 0,1

3. Evaluate the result

# Use Python 2.7 to run the evaluation program
# Set path to the directory containing test.output and test.gold
path=saved_models/JCSD4
cd eval
python evaluate.py $path
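
Before running the evaluation, it can help to confirm that both files exist and line up. The sketch below is not part of the repository: `check_prediction_files` is a hypothetical helper, and it assumes that `test.output` and `test.gold` each hold one example per line in the same order. It uses Python 3 features, so run it separately from the Python 2.7 evaluation program.

```python
from pathlib import Path


def check_prediction_files(path: str) -> int:
    """Return the number of lines shared by test.output and test.gold.

    Assumption: both files hold one example per line, in the same order.
    """
    directory = Path(path)
    counts = {}
    for name in ("test.output", "test.gold"):
        file_path = directory / name
        if not file_path.is_file():
            raise FileNotFoundError(f"missing {file_path}")
        # Count lines without loading the whole file into memory
        with file_path.open(encoding="utf-8") as handle:
            counts[name] = sum(1 for _ in handle)
    if counts["test.output"] != counts["test.gold"]:
        raise ValueError(f"line count mismatch: {counts}")
    return counts["test.output"]
```

For example, `check_prediction_files("saved_models/JCSD4")` returns the number of test examples, or raises an error if a file is missing or the two files differ in length.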

