-
Notifications
You must be signed in to change notification settings - Fork 19
Enhancements to fast_predict #632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ICYI, here's what I was doing to test that this works (at least for loading RSMTool modeler objects, so I haven't tested the backwards compatibility yet): ❯ ../../../env/bin/rsmtool config_rsmtool.json -f
Output directory: /Users/mmulholland/Documents/rsmtool/rsmtool/examples/rsmtool
WARNING: /Users/mmulholland/Documents/rsmtool/rsmtool/examples/rsmtool already contains a non-empty 'output' directory. Th
e generated report might contain unexpected information from a previous experiment.
Saving configuration file.
Reading in all data from files.
Preprocessing all features.
Pre-processing training and test set features
Saving training and test set data to disk.
Training LinearRegression model.
Running analyses on training set.
Generating training and test set predictions.
Processing train set predictions.
Processing test set predictions.
Scaling the coefficients and saving them to disk
Running prediction analyses.
Starting report generation.
Merging sections
Exporting HTML
WARNING: Alternative text is missing on 8 image(s).
❯ ../../../env/bin/ipython
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:41) [Clang 15.0.7 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import rsmtool
In [2]: m = rsmtool.rsmtool.Modeler.load_from_file("output/ASAP2.model")
In [3]: rsmtool.rsmpredict.fast_predict({"DISCOURSE": 4.93806460126142, "ORGANIZATION": -0.0846667513334603, "GRAMMAR": -0.316793975540994, "MECHANICS": 4.65591397849462}, m)
Out[3]:
{'raw': 3.4671587960793446,
'scale': 3.4876896893346814,
'raw_trim': 3.4671587960793446,
'raw_trim_round': 3,
'scale_trim': 3.4876896893346814,
'scale_trim_round': 3} |
|
Hmm, One thing that immediately occurs to me is this – saving a modeler instead of a SKLL model – would be a completely backwards-incompatible change. At the very least, we would need to modify how |
|
Oh, yeah, |
|
Ah, okay, Yeah, we can probably just modify |
|
I'm going to push a change that should make that aspect a little bit clearer. |
c04d69c to
3bafca3
Compare
…ke feature_info, train_prediction_mean, trim_min, etc., available in the modeler instance; modify fast_predict to be able to utilize these attributes in a backwards-compatible way so that fewer arguments can be passed to fast_predict
…ol modeler instead of a SKLL learner object
…ast_predict for better backwards-compatibility
d61cb4a to
c3a25b6
Compare
|
@desilinguist I think this is ready for the part you were going to handle (converting unit test models). |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #632 +/- ##
==========================================
+ Coverage 93.21% 93.28% +0.06%
==========================================
Files 33 33
Lines 4921 4972 +51
==========================================
+ Hits 4587 4638 +51
Misses 334 334
☔ View full report in Codecov by Sentry. |
0fa4fa2 to
d095c00
Compare
…ecified and no attribute)
desilinguist
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @mulhod! I had some suggestions/comments, mostly minor.
|
Thanks, @blongwill! I believe that I have addressed your concerns. Please take another look! |
|
@mulhod I just updated all the relevant model files. I just realized we haven't yet looked at the documentation. I think there may be few places where the |
|
@desilinguist I have done some grepping around in the documentation. I found one instance so far that needed to be changed. I'll continue digging a bit, though. My first iteration was looking specifically for pieces that touched on literally a |
Co-authored-by: Nitin Madnani <nmadnani@ets.org>
desilinguist
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🎉
Looks great Matt! Thanks for making those updates! |
Save RSMTool modeler in place of SKLL learner and make attributes like
feature_info,train_predictions_mean,trim_min, etc., available in the modeler instance; modifyfast_predictto be able to utilize these attributes in a backwards-compatible way so that fewer arguments can be passed tofast_predict.This is a draft at the moment! No tests have been added quite yet. I would like to get some feedback on the direction here. Here are some points that I anticipate some discussion around:fast_predictcan now be called with just the features and the modeler instance. Parameters likedf_feature_info,trim_min, andh1_mean, if left unspecified, will be retrieved from the modeler (provided it was trained with this or a future version of RSMTool).fast_predictcan now be turned on/off via thescale/trimparameters, both of which are set toFalseby default. This is partially to allow for backwards-compatibility: If scaling-related parameters are not specified (currently), no scaling is done. But the attributes will always be there in the future (in the modeler object). Thus, this mechanism can be used to explicitly turn off trimming or scaling.fast_predict, furthermore, is backwards-compatible. All of the attributes of the modeler that would be used for things like trimming and scaling can be explicitly passed in in the same way they are now in the call tofast_predict. Thus, old models and oldfast_predictcalls should work without requiring any change at all.learnerattribute, so we wouldn't really be losing much since it could always be referenced or saved by itself later.Closes #631