Skip to content

Conversation

@desilinguist
Copy link
Collaborator

This PR closes #600.

The various SKLL modules learner.py, experiments.py, config.py etc. were getting really long as single files and unwieldy when it came to adding new development. This PR tries to address this issue by converting these modules into sub-packages instead with specific functions moved to different modules under these sub-package.

Here's what the SKLL code tree looks like today:

.
├── __init__.py
├── config.py
├── data
│   ├── __init__.py
│   ├── dict_vectorizer.py
│   ├── featureset.py
│   ├── readers.py
│   ├── writers.py
├── experiments.py
├── learner.py
├── logutils.py
├── metrics.py
├── utilities
│   ├── __init__.py
│   ├── compute_eval_from_predictions.py
│   ├── filter_features.py
│   ├── generate_predictions.py
│   ├── join_features.py
│   ├── plot_learning_curves.py
│   ├── print_model_weights.py
│   ├── run_experiment.py
│   ├── skll_convert.py
│   └── summarize_results.py
└── version.py

13 directories, 84 files

With this PR, it will look like:

.
├── __init__.py
├── config
│   ├── __init__.py
│   └── utils.py
├── data
│   ├── __init__.py
│   ├── dict_vectorizer.py
│   ├── featureset.py
│   ├── readers.py
│   ├── writers.py
├── experiments
│   ├── __init__.py
│   ├── input.py
│   ├── output.py
│   └── utils.py
├── learner
│   ├── __init__.py
│   └── utils.py
├── metrics.py
├── utils
│   ├── __init__.py
│   ├── commandline
│   │   ├── __init__.py
│   │   ├── compute_eval_from_predictions.py
│   │   ├── filter_features.py
│   │   ├── generate_predictions.py
│   │   ├── join_features.py
│   │   ├── plot_learning_curves.py
│   │   ├── print_model_weights.py
│   │   ├── run_experiment.py
│   │   ├── skll_convert.py
│   │   └── summarize_results.py
│   ├── constants.py
│   └── logging.py
└── version.py

12 directories, 91 files

In addition, some functions that were previously indicated to be private (with a leading underscore) are now public since they can actually be quite useful as part of the API. Some examples include experiments.input.load_featureset() and experiments.output.generate_learning_curve_plots().

Specifically, this PR:

  • Reorganizes and refactors SKLL module code into sub-packages.
  • Sets up __init__.py for these sub-packages such that most of the idiomatic SKLL imports work just like they do now.
  • Removes unnecessary imports from the top-level skll namespace. Right now only the following are importable from skll - FeatureSet, Learner, and run_configuration.
  • Moves logutils.py to utils/logging.py
  • Consolidates all the constants from across the various modules into a single place: utils/constants.py.
  • Moves all the command line utilities under utils/commandline/ and adjusts setup.py accordingly.
  • Converts the private _import_custom_learner() function to a public function called load_custom_learner() which does not modify globals() directly but just returns the loaded learner to the caller which then modifies globals() on its end.
  • Renames test_logutils.py and test_utilities.py to test_logging_utils.py and test_commandline_utils.py.
  • Updates API documentation to remove things that are not top-level importable, to include the new sub-packages and constants, and to tweak the names of sections.

- Split into `config/__init__.py` and `config/utils.py`
Split into `experiments/__init__.py`, `experiments/input.py`, `experiments/output.py`, and `experiments/utils.py`.
Split into `learner/__init__.py` and `learner/utils.py`.
Move all `utilities/*.py` into `utils/commadline/*.py`.
Move skll constants from all across codebase into `utils/constants.py`.
Move `logutils.py` into `utils/logging.py`.
- Remove unnecessary functions and classes from top-level namespace.
- Remove some unnecessary newlines.
- Add new sections for `config` and `utils` packages.
- Replace "module" with "Package" in various places.
- Remove documentation for functions that are no longer in top-level skll namespace.
- Call it `load_custom_learner()` and have it return the object instead of populating `globals()` directly.
- Populate globals() where this function is called instead,
@desilinguist desilinguist requested review from a user, Lguyogiro, aoifecahill and mulhod April 11, 2020 13:54
@pep8speaks
Copy link

pep8speaks commented Apr 11, 2020

Hello @desilinguist! Thanks for updating this PR.

Line 41:101: E501 line too long (113 > 100 characters)

Line 769:101: E501 line too long (106 > 100 characters)

Line 1115:101: E501 line too long (114 > 100 characters)
Line 1117:101: E501 line too long (114 > 100 characters)
Line 1119:101: E501 line too long (117 > 100 characters)
Line 1165:101: E501 line too long (114 > 100 characters)
Line 1167:101: E501 line too long (114 > 100 characters)
Line 1169:101: E501 line too long (117 > 100 characters)

Comment last updated at 2020-04-11 14:00:59 UTC

@codecov
Copy link

codecov bot commented Apr 11, 2020

Codecov Report

Merging #601 into master will increase coverage by 0.10%.
The diff coverage is 96.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   95.06%   95.16%   +0.10%     
==========================================
  Files          20       26       +6     
  Lines        2977     3021      +44     
==========================================
+ Hits         2830     2875      +45     
+ Misses        147      146       -1     
Impacted Files Coverage Δ
skll/metrics.py 96.87% <ø> (-0.27%) ⬇️
...utils/commandline/compute_eval_from_predictions.py 97.18% <ø> (ø)
skll/utils/commandline/filter_features.py 98.41% <ø> (ø)
skll/utils/commandline/generate_predictions.py 98.59% <ø> (ø)
skll/utils/commandline/join_features.py 98.14% <ø> (ø)
skll/utils/commandline/print_model_weights.py 94.91% <ø> (ø)
skll/utils/commandline/run_experiment.py 96.77% <ø> (ø)
skll/experiments/utils.py 93.51% <93.51%> (ø)
skll/config/utils.py 96.00% <96.00%> (ø)
skll/learner/utils.py 96.31% <96.31%> (ø)
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d64e25...996d44f. Read the comment docs.

Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I think this was really needed. It was getting difficult to develop in the long modules.

Copy link
Collaborator

@aoifecahill aoifecahill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

Co-Authored-By: Matt Mulholland <mulhodm@gmail.com>
@desilinguist desilinguist added this to the v2.5 milestone Apr 14, 2020
@desilinguist desilinguist merged commit ee5b4a1 into master Apr 14, 2020
@delete-merged-branch delete-merged-branch bot deleted the 600-make-subpackages branch April 14, 2020 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reorganize code into subpackages

5 participants