Open
Description
The code that generates layout feature data seems to be very similar across the various models, and the functions involved are usually quite long.
It would be good to refactor and simplify those methods.
Examples are FullTextParser.getBodyTextFeatured and HeaderParser.getSectionHeaderFeatured (both more than 400 lines).
The function could be split into multiple parts, e.g.:
- select tokens to be included (e.g. filter out whitespace etc)
- generate feature vector objects
- convert feature vector object to string
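To make the proposed split more concrete, here is a minimal sketch of the three steps as separate methods. All names (`FeaturePipelineSketch`, `TokenFeatures`, `selectTokens`, `toFeatures`, `format`) and the two toy features are hypothetical and only for illustration; they are not taken from the existing code, which computes far more features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FeaturePipelineSketch {

    /** Hypothetical feature vector object holding two toy features. */
    static final class TokenFeatures {
        final String text;
        final boolean capitalized;
        final boolean allDigits;

        TokenFeatures(String text, boolean capitalized, boolean allDigits) {
            this.text = text;
            this.capitalized = capitalized;
            this.allDigits = allDigits;
        }
    }

    /** Step 1: select the tokens to include (here: drop null and whitespace-only tokens). */
    static List<String> selectTokens(List<String> tokens) {
        return tokens.stream()
                .filter(t -> t != null && !t.trim().isEmpty())
                .collect(Collectors.toList());
    }

    /** Step 2: generate one feature vector object per selected token. */
    static List<TokenFeatures> toFeatures(List<String> tokens) {
        List<TokenFeatures> result = new ArrayList<>();
        for (String token : tokens) {
            boolean capitalized = Character.isUpperCase(token.charAt(0));
            boolean allDigits = token.chars().allMatch(Character::isDigit);
            result.add(new TokenFeatures(token, capitalized, allDigits));
        }
        return result;
    }

    /** Step 3: convert the feature vector objects to a string, one line per token. */
    static String format(List<TokenFeatures> features) {
        StringBuilder sb = new StringBuilder();
        for (TokenFeatures f : features) {
            sb.append(f.text)
              .append(' ').append(f.capitalized ? 1 : 0)
              .append(' ').append(f.allDigits ? 1 : 0)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("Deep", " ", "learning", "\n", "2019");
        System.out.print(format(toFeatures(selectTokens(raw))));
    }
}
```

With this shape, each parser would only need to supply its own token selection and feature set, and each step could be tested on its own.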
The feature vectors could share functionality, which would make it clearer what is actually intended to be different.
(Do header and fulltext actually need to be different?)
Perhaps the feature generation could also be separated more cleanly from the selection of features used by a particular model implementation (e.g. Wapiti).
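As a rough illustration of that separation, the sketch below keeps feature computation independent of the output format. Everything in it is an assumption made for the example: the `FeatureFormatter` interface, both formatter classes, the `describe` helper and the three features are hypothetical, and the whitespace-separated column layout only approximates the kind of input a CRF tool such as Wapiti reads.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class FormatterSketch {

    /** Hypothetical extension point: how one token's features are serialized. */
    interface FeatureFormatter {
        String formatLine(Map<String, String> features);
    }

    /** Whitespace-separated values in insertion order (column-style layout). */
    static final class ColumnFormatter implements FeatureFormatter {
        @Override
        public String formatLine(Map<String, String> features) {
            return String.join(" ", features.values());
        }
    }

    /** name=value pairs, e.g. for debugging or for a different model backend. */
    static final class KeyValueFormatter implements FeatureFormatter {
        @Override
        public String formatLine(Map<String, String> features) {
            StringJoiner joiner = new StringJoiner(" ");
            features.forEach((name, value) -> joiner.add(name + "=" + value));
            return joiner.toString();
        }
    }

    /** Model-agnostic feature computation; the feature set here is a toy one. */
    static Map<String, String> describe(String token) {
        Map<String, String> features = new LinkedHashMap<>();
        features.put("token", token);
        features.put("lower", token.toLowerCase(Locale.ROOT));
        features.put("capitalized", Character.isUpperCase(token.charAt(0)) ? "1" : "0");
        return features;
    }

    public static void main(String[] args) {
        Map<String, String> features = describe("Abstract");
        System.out.println(new ColumnFormatter().formatLine(features));
        System.out.println(new KeyValueFormatter().formatLine(features));
    }
}
```

The point of the design is that `describe` never needs to know which model consumes its output, so header and fulltext could share it and differ only in the formatter or in the features they add.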