We describe the utility of Break decompositions for the HotpotQA "fullwiki" setting.
Besides Break, our experiments use the following components:
- HotpotQA: (train, dev-distractor, dev-fullwiki) data from the HotpotQA website.
- Wikipedia (only first paras): from the HotpotQA wiki.
- IR: tf-idf ranker built on the above Wikipedia using DrQA (see ranker/README.md).
- Bert RC: BERT base-uncased span-extraction model finetuned on Hotpot using an older version of HuggingFace's repo (see bert_rc/README.md).
- Single-hop RC: an ensemble trained on augmented SQuAD by Min et al.
- Break RC: As described in our paper.
- After navigating to the current directory, download data.tar.gz. Then extract it to get a data directory:
$ tar xvfz data.tar.gz
- Navigate to the ranker directory and prepare the tf-idf ranker by following the instructions in ranker/README.md. For example:
$ cd ranker
$ python build_tfidf.py --wiki ../data/wiki_firstpara_sents.jsonl
$ cd ..
You should end up with data/wiki_first_paras-tfidf-ngram=2-hash=16777216-tokenizer=spacy.npz.
- Optional: one can finetune BERT on Hotpot by following the instructions in bert_rc/README.md. We provide an already finetuned model in data/bert_hotpot_67.3f1.
- Preprocess the Break data provided in data/qdmr_data as follows:
$ cd process_break
$ python prepro_break.py --input_file ../data/qdmr_data/hotpotqa_dev_gold.csv --output_file processed_data_hotpotqa_gold.csv --is_gold
# qdmrs predicted using copy-net for the hotpot part of Break dev & test
$ python prepro_break.py --input_file ../data/qdmr_data/preds_copynet_hpqa_dev_test.tsv --ques_file ../data/qdmr_data/hotpotqa_dev_questions.csv --output_file processed_data_hotpotqa_copy_net.csv
# qdmrs predicted using rule-based model for the hotpot part of Break dev & test
$ python prepro_break.py --input_file ../data/qdmr_data/preds_rule_based_high_level_dev.tsv --ques_file ../data/qdmr_data/hotpotqa_dev_questions.csv --output_file processed_data_hotpotqa_rule_based.csv
$ cd ..
This will save processed_data_hotpotqa_gold.csv, processed_data_hotpotqa_copy_net.csv and processed_data_hotpotqa_rule_based.csv in data/qdmr_data. Process these output files further as follows:
# gold
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_gold.csv --out_jsonl qdmrs_hotpotqa_gold.jsonl
# copy-net
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_copy_net.csv --out_jsonl qdmrs_hotpotqa_copy_net.jsonl
# rule-based
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_rule_based.csv --out_jsonl qdmrs_hotpotqa_rule_based.jsonl
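The three prepare_break.py invocations above differ only in the variant name, so they can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper name prepare_break_command and the VARIANTS list are hypothetical, and only the flags and file names shown in the commands above are used. It prints the commands instead of running them.

```python
# Hypothetical helper (not part of this repo): builds the prepare_break.py
# commands shown above for each QDMR variant.
VARIANTS = ["gold", "copy_net", "rule_based"]


def prepare_break_command(variant, qdmr_dir="data/qdmr_data"):
    """Return the prepare_break.py argument list for one QDMR variant."""
    return [
        "python", "prepare_break.py",
        "--processed_break_csv",
        f"{qdmr_dir}/processed_data_hotpotqa_{variant}.csv",
        "--out_jsonl",
        f"qdmrs_hotpotqa_{variant}.jsonl",
    ]


if __name__ == "__main__":
    for variant in VARIANTS:
        # Print rather than execute, so the commands can be inspected first.
        print(" ".join(prepare_break_command(variant)))
```

To execute a command rather than print it, the returned list can be passed to subprocess.run from the repository root.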
- Predict using Break RC and create Hotpot-style data for Bert RC:
$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_gold.jsonl --suffix gold --predict_batch_size 256
After some time, this will save:
a) Break RC results in ./data/predictions/break_rc_results_gold.json (as --suffix was gold).
b) Hotpot-style data for Bert RC using baseline IR with the whole question, in ./data/hotpot_data/hotpot_after_ques_ir_gold.json.
c) Hotpot-style data for Bert RC using Break IR, in ./data/hotpot_data/hotpot_after_break_ir_gold.json.
The above command and its outputs correspond to the gold (human-annotated) QDMRs, as indicated by the --suffix gold arg. Note that run_break.py loads the ranker and Wikipedia into RAM, so make sure enough memory is available. One can repeat this step for the predicted QDMRs (copy_net, rule_based, etc.) by using the appropriate file paths and changing --suffix to set the output file names:
# copy-net
$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_copy_net.jsonl --suffix copy_net --predict_batch_size 256
# rule-based
$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_rule_based.jsonl --suffix rule_based --predict_batch_size 256
Each of these two commands will output 3 files as described above.
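Since each run writes three files whose names depend on --suffix, it can help to check that all of them exist before moving on. The sketch below is an illustration only: the functions expected_outputs and missing_outputs are hypothetical names, and the paths simply follow the pattern of the gold example above.

```python
from pathlib import Path


def expected_outputs(suffix, data_dir="./data"):
    """Map each run_break.py output to its path, following the gold example."""
    data = Path(data_dir)
    return {
        "break_rc_results": data / "predictions" / f"break_rc_results_{suffix}.json",
        "ques_ir": data / "hotpot_data" / f"hotpot_after_ques_ir_{suffix}.json",
        "break_ir": data / "hotpot_data" / f"hotpot_after_break_ir_{suffix}.json",
    }


def missing_outputs(suffix, data_dir="./data"):
    """Return the expected output paths that do not exist on disk."""
    return [
        str(path)
        for path in expected_outputs(suffix, data_dir).values()
        if not path.exists()
    ]


if __name__ == "__main__":
    for suffix in ("gold", "copy_net", "rule_based"):
        print(suffix, "missing:", missing_outputs(suffix))
```

An empty list for a suffix means all three files for that run are in place.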
- Navigate to bert_rc and use Bert RC (instructions in bert_rc/README.md) to make predictions for each of the Hotpot-style files output by Break RC. Make sure to provide an appropriate --preds_dir arg so that predictions for different Hotpot-style files are stored in different directories. For example, in the copy_net case, one can make predictions for the Break IR Hotpot-style file output by the previous step as follows:
$ cd bert_rc
# convert to squad format
$ python hotpot_to_squad2.py --max_n_samples -1 --data_type dev --data_file ../data/hotpot_data/hotpot_after_break_ir_copy_net.json --out_file examples_data/bert_rc_break_ir_copy_net.json
# use bert rc
$ CUDA_VISIBLE_DEVICES=0 python run_squad.py --bert_model bert-base-uncased --do_predict --do_evaluate --do_lower_case --predict_file examples_data/bert_rc_break_ir_copy_net.json --predict_batch_size 128 --max_seq_length 500 --doc_stride 128 --output_dir ../data/bert_hotpot_67.3f1 --n_best_size 5 --preds_dir preds_break_ir_copy_net
$ cd ..
This will save predictions.json in the preds_break_ir_copy_net directory.
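Because this step is repeated for every Hotpot-style file, a consistent naming scheme keeps the runs apart. The sketch below extends the names used in the copy_net / Break IR example to the other combinations. This extension is an assumption, as are the helper names bert_rc_names and pred_files_arg and the idea that the predictions directory is created relative to bert_rc.

```python
def bert_rc_names(ir_kind, suffix):
    """File names for one Bert RC run.

    ir_kind is "ques" or "break"; suffix is e.g. "gold" or "copy_net".
    The pattern mirrors the copy_net / Break IR example above (assumption
    for the other combinations).
    """
    tag = f"{ir_kind}_ir_{suffix}"
    return {
        "hotpot_file": f"../data/hotpot_data/hotpot_after_{tag}.json",
        "squad_file": f"examples_data/bert_rc_{tag}.json",
        "preds_dir": f"preds_{tag}",
    }


def pred_files_arg(suffix, ir_kinds=("ques", "break")):
    """Comma-separated predictions.json paths, relative to the repo root.

    Assumes each predictions directory was created inside bert_rc.
    """
    return ",".join(
        f"bert_rc/{bert_rc_names(kind, suffix)['preds_dir']}/predictions.json"
        for kind in ir_kinds
    )


if __name__ == "__main__":
    print(bert_rc_names("break", "copy_net"))
    print(pred_files_arg("copy_net"))
```

The string returned by pred_files_arg has the comma-separated form of a list of predictions.json paths, which is convenient when several Bert RC runs are evaluated together.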
- Evaluate the IR performance, EM and F1 of the Break RC outputs and the various Bert RC outputs:
$ python evaluate_break.py --data_dir ./data --input_results_file ./data/predictions/break_rc_results_copy_net.json --bert_rc_pred_files path1,path2
where the argument to --bert_rc_pred_files is a comma-separated list of paths to any number of predictions.json files output by Bert RC. As reported in the paper, the best performance is obtained by Break IR + Bert RC.
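For readers unfamiliar with the EM and F1 metrics, the following sketch shows the common SQuAD-style definitions: exact match after normalization, and token-level F1. It is an illustration under that assumption and is not taken from evaluate_break.py, whose normalization may differ in detail.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("eiffel tower paris", "eiffel tower"))
```

Dataset-level EM and F1 are then typically the averages of these per-question scores.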
Notes:
- Code was tested with the packages listed in requirements.txt.
- We do not include the IR baselines based on content words and noun phrases, as they do not improve over the trivial whole-question baseline. The interested reader can refer to our paper for exact details.
Contact: Ankit Gupta.