
Utility of Break for open-domain QA

We describe the utility of Break decompositions for the HotpotQA "fullwiki" setting.

Overview

In addition to Break, our experiments use the following components:

HotpotQA: train, dev-distractor, and dev-fullwiki data from the HotpotQA website.
Wikipedia (first paragraphs only): from the HotpotQA wiki.
IR: tf-idf ranker built on the above Wikipedia using DrQA (see ranker/README.md).
Bert RC: BERT base-uncased span-extraction model fine-tuned on HotpotQA using an older version of HuggingFace's repo (see bert_rc/README.md).
Single-hop RC: an ensemble trained on augmented SQuAD by Min et al.
Break RC: As described in our paper.
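For readers unfamiliar with QDMR, the following is a minimal sketch of how a decomposition can be answered one step at a time: a step may refer to the answer of an earlier step as `#k`, and that reference is replaced by the earlier answer before the step is handed to a single-hop reader. The helper names, the `answer_fn` callback, and the toy lookup table are hypothetical and only illustrate the idea; this is not the code in run_break.py.

```python
import re
from typing import Callable, List

REF = re.compile(r"#(\d+)")  # QDMR-style reference to an earlier step, e.g. "#1"


def resolve_refs(step: str, answers: List[str]) -> str:
    """Replace each '#k' in a step with the answer to step k (1-indexed)."""
    def substitute(match):
        idx = int(match.group(1)) - 1
        # Leave the reference untouched if step k has not been answered yet.
        return answers[idx] if 0 <= idx < len(answers) else match.group(0)
    return REF.sub(substitute, step)


def answer_decomposition(steps: List[str], answer_fn: Callable[[str], str]) -> List[str]:
    """Answer the steps in order, feeding earlier answers into later steps."""
    answers: List[str] = []
    for step in steps:
        question = resolve_refs(step, answers)
        answers.append(answer_fn(question))
    return answers


# Toy stand-in for a single-hop reader (hypothetical, for illustration only).
TOY_READER = {
    "return the director of Inception": "Christopher Nolan",
    "return the birthplace of Christopher Nolan": "London",
}
steps = ["return the director of Inception", "return the birthplace of #1"]
print(answer_decomposition(steps, lambda q: TOY_READER.get(q, "")))
```

In a real pipeline `answer_fn` would wrap retrieval plus a reading-comprehension model; the substitution logic stays the same.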

Usage

Navigate to this directory and download data.tar.gz, then extract it with tar xvfz data.tar.gz to obtain a data directory.
Navigate to the ranker directory and prepare the tf-idf ranker by following the instructions in ranker/README.md, for example:

$ cd ranker
$ python build_tfidf.py --wiki ../data/wiki_firstpara_sents.jsonl
$ cd ..

You should end up with data/wiki_first_paras-tfidf-ngram=2-hash=16777216-tokenizer=spacy.npz.
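The file name above encodes the ranker settings: n-grams up to length 2 and 16777216 (2^24) hash buckets. The sketch below shows the general idea of hashing unigrams and bigrams into a fixed number of buckets and counting them. It is an illustration under stated assumptions, not the ranker's code: md5 stands in for whatever hash function the ranker uses, whitespace splitting stands in for the spaCy tokenizer, and all function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16777216  # 2**24, the hash size that appears in the file name


def ngrams(tokens, n=2):
    """All n-grams of length 1..n, each joined into a single string."""
    return [" ".join(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)]


def bucket(gram, num_buckets=NUM_BUCKETS):
    """Map an n-gram to a bucket id (md5 is an assumption made for this sketch)."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hashed_counts(text):
    """Count hashed unigrams and bigrams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(bucket(g) for g in ngrams(tokens))


counts = hashed_counts("the quick brown fox")
print(len(counts), "distinct buckets used")
```

Hashing keeps the feature space at a fixed size regardless of vocabulary, at the cost of occasional collisions between unrelated n-grams.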

Optional: fine-tune BERT on HotpotQA by following the instructions in bert_rc/README.md. We provide an already fine-tuned model in data/bert_hotpot_67.3f1.
Preprocess the Break data provided in data/qdmr_data as follows:

$ cd process_break
$ python prepro_break.py --input_file ../data/qdmr_data/hotpotqa_dev_gold.csv --output_file processed_data_hotpotqa_gold.csv --is_gold

# qdmrs predicted using copy-net for the hotpot part of Break dev & test
$ python prepro_break.py --input_file ../data/qdmr_data/preds_copynet_hpqa_dev_test.tsv --ques_file ../data/qdmr_data/hotpotqa_dev_questions.csv --output_file processed_data_hotpotqa_copy_net.csv

# qdmrs predicted using rule-based model for the hotpot part of Break dev & test
$ python prepro_break.py --input_file ../data/qdmr_data/preds_rule_based_high_level_dev.tsv --ques_file ../data/qdmr_data/hotpotqa_dev_questions.csv --output_file processed_data_hotpotqa_rule_based.csv
$ cd ..

This saves processed_data_hotpotqa_gold.csv, processed_data_hotpotqa_copy_net.csv, and processed_data_hotpotqa_rule_based.csv in data/qdmr_data. Process these output files further as follows:

# gold
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_gold.csv --out_jsonl qdmrs_hotpotqa_gold.jsonl

# copy-net
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_copy_net.csv --out_jsonl qdmrs_hotpotqa_copy_net.jsonl

# rule-based
$ python prepare_break.py --processed_break_csv data/qdmr_data/processed_data_hotpotqa_rule_based.csv --out_jsonl qdmrs_hotpotqa_rule_based.jsonl
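Before moving on, it can be useful to sanity-check the generated JSON Lines files. The helper below is a small sketch that counts the records and tallies their top-level keys without assuming anything about the schema. The function name is hypothetical, and the example path is inferred from the run_break.py command in the next step.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return (number of records, Counter of top-level keys) for a JSON Lines file."""
    n_records = 0
    keys = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return n_records, keys


if __name__ == "__main__":
    # Assumed location of the file written by prepare_break.py.
    n, keys = summarize_jsonl("data/qdmr_data/qdmrs_hotpotqa_gold.jsonl")
    print(n, "records; keys:", dict(keys))
```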

Predict using Break RC and create Hotpot-style data for Bert RC:

$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_gold.jsonl --suffix gold --predict_batch_size 256

This takes some time and then saves:
a) Break RC results in ./data/predictions/break_rc_results_gold.json (because --suffix was gold).
b) Hotpot-style data for Bert RC using the baseline IR, which queries with the whole question, in ./data/hotpot_data/hotpot_after_ques_ir_gold.json.
c) Hotpot-style data for Bert RC using Break IR, in ./data/hotpot_data/hotpot_after_break_ir_gold.json.

The command and outputs above correspond to the gold (human-annotated) QDMRs, as indicated by the --suffix gold argument. Note that run_break.py loads the ranker and Wikipedia into RAM, so make sure enough memory is available. To repeat this step for the predicted QDMRs (copy_net, rule_based, etc.), use the appropriate file paths and change --suffix to set the output file names:

# copy-net
$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_copy_net.jsonl --suffix copy_net --predict_batch_size 256

# rule-based
$ CUDA_VISIBLE_DEVICES=0 python run_break.py --qdmr_jsonl ./data/qdmr_data/qdmrs_hotpotqa_rule_based.jsonl --suffix rule_based --predict_batch_size 256

Each of these two commands writes the same three files as described above, named with the corresponding suffix.
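Since the three runs differ only in the suffix, the commands and their expected output paths can be generated rather than typed by hand. The sketch below builds them from the naming pattern shown in the commands above; the helper names are hypothetical, and the script only prints the commands instead of running them.

```python
import shlex


def run_break_command(suffix, batch_size=256):
    """Build the run_break.py argument list for one QDMR source."""
    return [
        "python", "run_break.py",
        "--qdmr_jsonl", f"./data/qdmr_data/qdmrs_hotpotqa_{suffix}.jsonl",
        "--suffix", suffix,
        "--predict_batch_size", str(batch_size),
    ]


def expected_outputs(suffix):
    """The three files run_break.py is described as writing for a given suffix."""
    return [
        f"./data/predictions/break_rc_results_{suffix}.json",
        f"./data/hotpot_data/hotpot_after_ques_ir_{suffix}.json",
        f"./data/hotpot_data/hotpot_after_break_ir_{suffix}.json",
    ]


for suffix in ("gold", "copy_net", "rule_based"):
    print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(run_break_command(suffix)))
    for path in expected_outputs(suffix):
        print("  expects:", path)
```

The argument lists can also be passed to `subprocess.run` directly, with `CUDA_VISIBLE_DEVICES` set through its `env` parameter.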

Navigate to bert_rc and use Bert RC (instructions in bert_rc/README.md) to make predictions for each of the Hotpot-style files output by Break RC. Provide a distinct --preds_dir argument for each file so that the predictions for different Hotpot-style files are stored in different directories. For example, in the copy_net case, predictions for the Break IR Hotpot-style file produced by the previous step can be made as follows:

$ cd bert_rc

# convert to squad format
$ python hotpot_to_squad2.py --max_n_samples -1 --data_type dev --data_file ../data/hotpot_data/hotpot_after_break_ir_copy_net.json --out_file examples_data/bert_rc_break_ir_copy_net.json

# use bert rc
$ CUDA_VISIBLE_DEVICES=0 python run_squad.py --bert_model bert-base-uncased --do_predict --do_evaluate --do_lower_case --predict_file examples_data/bert_rc_break_ir_copy_net.json --predict_batch_size 128 --max_seq_length 500 --doc_stride 128 --output_dir ../data/bert_hotpot_67.3f1 --n_best_size 5 --preds_dir preds_break_ir_copy_net

$ cd ..

This will save predictions.json in the preds_break_ir_copy_net directory.
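After running Bert RC on several Hotpot-style files, there will be one predictions.json per prediction directory. The sketch below gathers them into a single comma-separated string, which is the form the evaluation step takes. It assumes that the prediction directories are created inside bert_rc and that their names start with preds_, as in the example above; the function name is hypothetical.

```python
from pathlib import Path


def collect_prediction_files(root="bert_rc", pattern="preds_*"):
    """Comma-separated predictions.json paths found under root/<pattern>/."""
    found = sorted(
        d / "predictions.json"
        for d in Path(root).glob(pattern)
        if (d / "predictions.json").is_file()  # skip directories without predictions
    )
    return ",".join(p.as_posix() for p in found)


if __name__ == "__main__":
    print(collect_prediction_files())
```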

Evaluate the IR performance, EM, F1 of the Break RC outputs and various Bert RC outputs:

$ python evaluate_break.py --data_dir ./data --input_results_file ./data/predictions/break_rc_results_copy_net.json --bert_rc_pred_files path1,path2

where the argument to --bert_rc_pred_files is a comma-separated list of paths to any number of predictions.json files output by Bert RC. As reported in the paper, the best performance is obtained with Break IR + Bert RC.
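For reference, the sketch below shows how EM and F1 are commonly computed for extractive QA, following the SQuAD-style convention of normalizing answers (lowercasing, removing punctuation and articles) and then comparing tokens. It is a simplified illustration and not the code in evaluate_break.py or hotpot_evaluate_v1.py; in particular, it has no special handling of yes/no answers.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower!"))
print(round(f1_score("Barack Obama", "Obama"), 3))
```

Dataset-level scores are the averages of these per-question values.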

Notes:

The code was tested with the dependencies listed in requirements.txt.
We do not include the IR baselines based on content words and noun phrases, as they do not improve over the trivial whole-question baseline. The interested reader can refer to our paper for exact details.

Contact: Ankit Gupta.
