Abbreviation data

This repository provides labeled data for training abbreviation expansion models, as described in:

Gorman, K., Kirov, C., Roark, B., and Sproat, R. 2021. Structured abbreviation in context. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 995-1005.

If you use this data in a publication, we would appreciate it if you cited this paper.

Annotation

Sentences were extracted from English Wikipedia articles, then filtered as described in the paper. Annotators were then asked to introduce abbreviations into the sentences.

Organization

The data, with the original 80%/10%/10% split, can be found in the data directory. The data are text-format Protocol Buffers that follow the message definitions in abbreviation.proto. To load this data into Python, install the Protocol Buffers compiler (protoc), then run:

pip install -r requirements.txt
make

Then, see textproto.py.
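
As a rough illustration of what reading text-format Protocol Buffers looks like in Python, the sketch below calls text_format.Parse from the protobuf package. It is not the code in textproto.py, and it does not use the message types from abbreviation.proto, which are not reproduced here. The built-in struct_pb2.Value message is used purely as a stand-in, and parse_textproto is a hypothetical helper name.

```python
from google.protobuf import struct_pb2
from google.protobuf import text_format


def parse_textproto(text, message):
    """Parses a text-format string into `message` and returns it.

    Hypothetical helper for illustration; see textproto.py for the
    loader that ships with this repository.
    """
    return text_format.Parse(text, message)


# Stand-in message type (assumption): with this repository's data you
# would instead pass an instance of a message class generated by protoc
# from abbreviation.proto.
example = parse_textproto('string_value: "Doctor"', struct_pb2.Value())
print(example.string_value)
```

With the real data, the same call should apply once the stand-in message is replaced by the generated class and the string by the contents of a file in the data directory.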

Authors

This data was collected by Kyle Gorman with help from the annotators and Brian Roark, Richard Sproat, Olivia Redfield, Caterina Golner, and Katherine Wang.

License

See LICENSE.

Contributing

See CONTRIBUTING.

Mandatory disclaimer

This is not an official Google product.
