AMI_process

AMI Dataset Preprocess

This code can be used to preprocess AMI dataset for meeting summarization from scratch.

1. Get ACL 2018 Preprocessed Dataset

Download files from ACL18-code.
Copy files dascim-acl2018_abssumm-3ee827b34472/data/meeting/ami/* to ./data/acl18-ami/*.
Copy files dascim-acl2018_abssumm-3ee827b34472/data/meeting/lists/list.ami, list.ami.train, list.ami.eval, list.ami.test to **./data/**.

Now, you will have data/acl18-ami/*, data/list.ami, data/list.ami.train, data/list.ami.eval and data/list.ami.test

2. Get AMI Dataset

Download the AMI dataset from AMI manual annotations v1.6.2 and unzip it to ./data/. Finally you can get ./data/ami_public_manual_1.6.2/*, which contains original AMI files.
Run python delete_useless_files.py

python delete_useless_files.py: Following previous setting, we only use part of the meeting files, so we first filter out useless files. This script is used to extract words, topics and summaries according to the useful file list ./data/list.ami. Output files are in data/ami/words, data/ami/summary, data/ami/topics

3. Process AMI Data

Run python ami_preprocess.py, results are shown in data/ami/train, data/ami/valid, data/ami/test

python ami_preprocess.py will first create dirs train,valid,test under data/ami/ and then extract speaker, time, utterance, topic from data/ami/words and data/ami/topics, in each line, the data format is speaker \t time \t utterance \t topic.

4. Clean Data

First copy stopwords from dascim-acl2018_abssumm-3ee827b34472/resources/stopwords/meeting/filler_words.en.txt to data/ami/stop_words.txt. Note that you should rename the file.
Run python create_clean_data.py.

create_clean_data.py will first creat output dirs cleaned, cleaned/train, cleaned/valid, cleaned/test at data/ami/, then clean utterances.

5. Get AMI Meeting Summary

Run python get_summary.py.

python get_summary.py will create a output dir data/ami/final_summary and extract the summary from xml files for each meeting.

6. Final

Now, you have meetings in data/ami/train, data/ami/valid, data/ami/test and also summaries in data/ami/final_summary.

Name		Name	Last commit message	Last commit date
parent directory ..
data		data
.gitignore		.gitignore
README.md		README.md
ami_preprocess.py		ami_preprocess.py
create_clean_data.py		create_clean_data.py
delete_useless_files.py		delete_useless_files.py
get_summary.py		get_summary.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

AMI Dataset Preprocess

1. Get ACL 2018 Preprocessed Dataset

2. Get AMI Dataset

3. Process AMI Data

4. Clean Data

5. Get AMI Meeting Summary

6. Final

FilesExpand file tree

AMI_process

Directory actions

More options

Directory actions

More options

Latest commit

History

AMI_process

Folders and files

parent directory

README.md

AMI Dataset Preprocess

1. Get ACL 2018 Preprocessed Dataset

2. Get AMI Dataset

3. Process AMI Data

4. Clean Data

5. Get AMI Meeting Summary

6. Final