@TomDonoghue TomDonoghue commented Jan 31, 2021

This PR updates words collection & processing:

  • fixes how IDs are extracted, so that IDs listed as references are not accidentally collected
  • updates the collection of year data, to cover more of the ways in which it may be encoded
  • drops the nltk dependency, by adding stopwords and tokenizing functionality to the module
  • cleanly handles the situation in which no authors are found
  • refactors and cleans up the code, including optimizing words processing
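
For readers wondering what replacing nltk with built-in tokenizing and stopword removal can look like, here is a minimal sketch. It is not the code from this PR: the names `STOPWORDS`, `tokenize`, and `process_words` are hypothetical, the stopword list is deliberately tiny, and the regular expression is just one reasonable choice of tokenizing rule.

```python
import re

# Hypothetical, deliberately short stopword list, for illustration only.
# A real list would contain many more entries.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})

# One possible token pattern: runs of letters or digits, optionally joined
# by internal hyphens or apostrophes (so "event-related" stays one token).
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def process_words(text, stopwords=STOPWORDS):
    """Tokenize the text and drop any tokens found in the stopword set."""
    return [token for token in tokenize(text) if token not in stopwords]


if __name__ == "__main__":
    print(process_words("The role of theta in the brain"))
```

Keeping the stopwords in a `frozenset` makes each membership check constant time, which matters when processing many abstracts.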

@TomDonoghue TomDonoghue changed the title [MNT] - Fix up some words collection & processing [ENH - Fix up some words collection & processing Feb 1, 2021
@TomDonoghue TomDonoghue mentioned this pull request Feb 1, 2021
@TomDonoghue TomDonoghue changed the title [ENH - Fix up some words collection & processing [ENH] - Update words collection & processing Feb 1, 2021
@lisc-tools lisc-tools deleted a comment from codecov-io Feb 1, 2021
@lisc-tools lisc-tools deleted a comment from codecov-io Feb 1, 2021
@TomDonoghue TomDonoghue merged commit a7bbe07 into main Feb 1, 2021
@TomDonoghue TomDonoghue deleted the words branch February 1, 2021 17:18