Enhancements to fast_predict #632

mulhod · 2023-07-31T19:31:25Z

Save RSMTool modeler in place of SKLL learner and make attributes like feature_info, train_predictions_mean, trim_min, etc., available in the modeler instance; modify fast_predict to be able to utilize these attributes in a backwards-compatible way so that fewer arguments can be passed to fast_predict.

~~This is a draft at the moment! No tests have been added quite yet. I would like to get some feedback on the direction here. Here are some points that I anticipate some discussion around:~~

fast_predict can now be called with just the features and the modeler instance. Parameters like df_feature_info, trim_min, and h1_mean, if left unspecified, will be retrieved from the modeler (provided it was trained with this or a future version of RSMTool).
Scaling and trimming in fast_predict can now be turned on/off via the scale/trim parameters, both of which are set to False by default. This is partially to allow for backwards-compatibility: If scaling-related parameters are not specified (currently), no scaling is done. But the attributes will always be there in the future (in the modeler object). Thus, this mechanism can be used to explicitly turn off trimming or scaling.
The change is backwards-compatible! Old RSMTool models (serialized SKLL learners) can still be loaded. Use of fast_predict, furthermore, is backwards-compatible. All of the attributes of the modeler that would be used for things like trimming and scaling can be explicitly passed in in the same way they are now in the call to fast_predict. Thus, old models and old fast_predict calls should work without requiring any change at all.
This change results in not saving SKLL models directly. Instead RSMTool modeler instances would be saved. These objects contain the SKLL model under the learner attribute, so we wouldn't really be losing much since it could always be referenced or saved by itself later.

Closes #631

mulhod · 2023-07-31T19:34:33Z

ICYI, here's what I was doing to test that this works (at least for loading RSMTool modeler objects, so I haven't tested the backwards compatibility yet):

❯ ../../../env/bin/rsmtool config_rsmtool.json -f
Output directory: /Users/mmulholland/Documents/rsmtool/rsmtool/examples/rsmtool
WARNING: /Users/mmulholland/Documents/rsmtool/rsmtool/examples/rsmtool already contains a non-empty 'output' directory. Th
e generated report might contain unexpected information from a previous experiment.
Saving configuration file.
Reading in all data from files.
Preprocessing all features.
Pre-processing training and test set features
Saving training and test set data to disk.
Training LinearRegression model.
Running analyses on training set.
Generating training and test set predictions.
Processing train set predictions.
Processing test set predictions.
Scaling the coefficients and saving them to disk
Running prediction analyses.
Starting report generation.
Merging sections
Exporting HTML
WARNING: Alternative text is missing on 8 image(s).
❯ ../../../env/bin/ipython
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:41) [Clang 15.0.7 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import rsmtool

In [2]: m = rsmtool.rsmtool.Modeler.load_from_file("output/ASAP2.model")

In [3]: rsmtool.rsmpredict.fast_predict({"DISCOURSE": 4.93806460126142, "ORGANIZATION": -0.0846667513334603, "GRAMMAR": -0.316793975540994, "MECHANICS": 4.65591397849462}, m)
Out[3]:
{'raw': 3.4671587960793446,
 'scale': 3.4876896893346814,
 'raw_trim': 3.4671587960793446,
 'raw_trim_round': 3,
 'scale_trim': 3.4876896893346814,
 'scale_trim_round': 3}

rsmtool/rsmtool.py

desilinguist · 2023-07-31T19:39:55Z

Hmm, One thing that immediately occurs to me is this – saving a modeler instead of a SKLL model – would be a completely backwards-incompatible change.

At the very least, we would need to modify how rsmpredict and rsmexplain work at the very least since they need access to the model. Also, we would need to change notebook sections that try to inspect the model object. Probably lots of stuff in the documentation too.

mulhod · 2023-07-31T19:44:13Z

Oh, yeah, rsmpredict is being modified now. I started with fast_predict right now as a POC. But the idea would be to have logic that tries to load a Learner object and, if that fails, load a Modeler object (or vice versa).

desilinguist · 2023-07-31T19:48:23Z

Ah, okay, Yeah, we can probably just modify Modeler.load_from_file() to return a Modeler instance irrespective of whether the file on disk contains a serialized SKLL model or a serialized modeler. And then the rest should just work?

mulhod · 2023-07-31T19:49:02Z

I'm going to push a change that should make that aspect a little bit clearer.

…ke feature_info, train_prediction_mean, trim_min, etc., available in the modeler instance; modify fast_predict to be able to utilize these attributes in a backwards-compatible way so that fewer arguments can be passed to fast_predict

…ol modeler instead of a SKLL learner object

…ast_predict for better backwards-compatibility

mulhod · 2023-08-02T15:17:25Z

@desilinguist I think this is ready for the part you were going to handle (converting unit test models).

codecov · 2023-08-02T15:33:19Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.06% 🎉

Comparison is base (2ed3733) 93.21% compared to head (5924248) 93.28%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #632      +/-   ##
==========================================
+ Coverage   93.21%   93.28%   +0.06%     
==========================================
  Files          33       33              
  Lines        4921     4972      +51     
==========================================
+ Hits         4587     4638      +51     
  Misses        334      334

Files Changed	Coverage Δ
rsmtool/modeler.py	`97.64% <100.00%> (+0.05%)`	⬆️
rsmtool/rsmpredict.py	`99.35% <100.00%> (+0.17%)`	⬆️
rsmtool/rsmtool.py	`99.21% <100.00%> (+0.05%)`	⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…ecified and no attribute)

desilinguist

Thanks for working on this @mulhod! I had some suggestions/comments, mostly minor.

rsmtool/modeler.py

rsmtool/rsmpredict.py

tests/test_experiment_rsmpredict.py

tests/test_modeler.py

rsmtool/rsmpredict.py

rsmtool/modeler.py

rsmtool/rsmpredict.py

… etc.

…e documentation

mulhod · 2023-08-03T15:52:55Z

Thanks, @blongwill! I believe that I have addressed your concerns. Please take another look!

desilinguist · 2023-08-03T18:19:31Z

@mulhod I just updated all the relevant model files. I just realized we haven't yet looked at the documentation. I think there may be few places where the .model file may be described (the outputs generated by rsmtool). Can you please take a look and update all the relevant places?

mulhod · 2023-08-03T18:28:00Z

@desilinguist I have done some grepping around in the documentation. I found one instance so far that needed to be changed. I'll continue digging a bit, though. My first iteration was looking specifically for pieces that touched on literally a .model file output. But, perhaps search for SKLL, etc., would be a good idea too.

doc/intermediate_files_rsmtool.rst.inc

Co-authored-by: Nitin Madnani <nmadnani@ets.org>

desilinguist

LGTM! 🎉

blongwill · 2023-08-03T20:17:30Z

Thanks, @blongwill! I believe that I have addressed your concerns. Please take another look!

Looks great Matt! Thanks for making those updates!

mulhod requested a review from desilinguist July 31, 2023 19:31

mulhod self-assigned this Jul 31, 2023

desilinguist reviewed Jul 31, 2023

View reviewed changes

rsmtool/rsmtool.py Outdated Show resolved Hide resolved

desilinguist marked this pull request as draft July 31, 2023 19:41

mulhod requested review from Frost45, damien2012eng, dblandan and tamarl08 July 31, 2023 19:59

mulhod force-pushed the feature/fast_predict_enhancement branch 2 times, most recently from c04d69c to 3bafca3 Compare August 1, 2023 04:25

mulhod added 5 commits August 2, 2023 00:44

Adjust rsmtool/notebooks/skll_model.ipynb such that it loads an RSMTo…

f592fed

…ol modeler instead of a SKLL learner object

Fix minor issue in tests/update_files.py

c0296ec

Update test models and code to update test models

187e96d

Reorganize modifications to modeler.py and rsmtool.py and refashion f…

4ae0242

…ast_predict for better backwards-compatibility

mulhod force-pushed the feature/fast_predict_enhancement branch from d61cb4a to c3a25b6 Compare August 2, 2023 14:22

mulhod changed the title ~~Draft: Enhancements to fast_predict~~ Enhancements to fast_predict Aug 2, 2023

mulhod marked this pull request as ready for review August 2, 2023 15:17

mulhod requested review from blongwill, cadygansen and tazin-afrin August 2, 2023 15:18

mulhod added 2 commits August 2, 2023 13:15

Add/expand tests fast_predict and for modeler save method

25220ab

Fix model-loading issue in explanation utils tests

5c0d168

Set trim/scale to False by default in fast_predict

d095c00

mulhod force-pushed the feature/fast_predict_enhancement branch from 0fa4fa2 to d095c00 Compare August 2, 2023 17:15

Add coverage of the default trim_tolerance in fast_predict (when unsp…

b87f4f5

…ecified and no attribute)

desilinguist reviewed Aug 2, 2023

View reviewed changes

Address MR suggestions

4c5cab6