Skip to content

Conversation

@tamarl08
Copy link
Contributor

No description provided.

@desilinguist
Copy link
Collaborator

@tamarl08 you'll need to rebase.

@tamarl08 tamarl08 force-pushed the 180-parallel-grid-search branch from f67775e to 0f2f0ed Compare August 31, 2023 14:45
Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of wonder about exposing this in the config since it's a detail that doesn't necessarily have to do with input/experiment. I don't know where else it would be put, but it seems slightly out of place. I'm not sure there's a better way to do it, but my initial thought was that it could be exposed by an environment variable. But I do see the benefit. I wonder if there's a possibility -- if it's not too complicated -- to also have this be controlled by an environment variable, though.

@tamarl08
Copy link
Contributor Author

tamarl08 commented Sep 1, 2023

@mulhod I see your point. I also don't like the fact that it defaults to 1, but can get a value of None to enable parallelization.

I didn't find anything else that we control using an environment variable, do we do that?

Another option I thought of, is adding a general option of sending extra skll args dict in the call to run_experiment. It will also allow disabling the grid search and specifying fixed parameters, and possibly many other tuning options, without making rsmtool config look like skll config.

@desilinguist
Copy link
Collaborator

desilinguist commented Sep 1, 2023

Honestly, I am not that bothered by the existence of this parameter since it's meant for advanced users who will probably already be familiar with the same parameter in SKLL.

I am not sure I see the benefits of using an environment variable over a configuration setting since then users have to think about two places rather than one for settings.

I think the idea of being able to pass a dictionary is great for advanced users but we should make that a longer term goal rather than for this PR.

@mulhod
Copy link
Contributor

mulhod commented Sep 1, 2023

I think what I was perhaps envisioning was something like:

Get environment variable "SKLL_GRID_JOBS" if it exists; if not, use "grid_jobs" from config; if neither, use 1. This way, it could be passed in either way. But, it does make it more complicated.

@mulhod
Copy link
Contributor

mulhod commented Sep 1, 2023

@mulhod I see your point. I also don't like the fact that it defaults to 1, but can get a value of None to enable parallelization.

I didn't find anything else that we control using an environment variable, do we do that?

Another option I thought of, is adding a general option of sending extra skll args dict in the call to run_experiment. It will also allow disabling the grid search and specifying fixed parameters, and possibly many other tuning options, without making rsmtool config look like skll config.

By the way, this is how it works in skll.

https://github.com/EducationalTestingService/skll/blob/main/skll/utils/constants.py#L138
https://github.com/EducationalTestingService/skll/blob/main/skll/learner/__init__.py#L1093
https://github.com/EducationalTestingService/skll/blob/main/skll/learner/__init__.py#L1857

@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed Coverage
rsmtool/utils/constants.py ø
rsmtool/modeler.py 100.00%

📢 Thoughts on this report? Let us know!.

@desilinguist
Copy link
Collaborator

@mulhod I see your point. I also don't like the fact that it defaults to 1, but can get a value of None to enable parallelization.
I didn't find anything else that we control using an environment variable, do we do that?
Another option I thought of, is adding a general option of sending extra skll args dict in the call to run_experiment. It will also allow disabling the grid search and specifying fixed parameters, and possibly many other tuning options, without making rsmtool config look like skll config.

By the way, this is how it works in skll.

https://github.com/EducationalTestingService/skll/blob/main/skll/utils/constants.py#L138

Honestly, I've never really been a fan of that approach and I think most users of SKLL either don't know about it or use it. It also doesn't seem to be mentioned in the docs at all so it's more of a hidden setting.

@desilinguist
Copy link
Collaborator

desilinguist commented Sep 5, 2023

@tamarl08 @mulhod since this new setting only affects SKLL models, I think we should prefix it with skll_ to call it skll_grid_search_jobs in keeping with other options like skll_objective that already exist. Thoughts?

@desilinguist
Copy link
Collaborator

@tamarl08 did you already check that automatic and interactive config generation work with this new field?

@tamarl08
Copy link
Contributor Author

tamarl08 commented Sep 5, 2023

I added the field to the output and the tests pass, do we usually do manual testing as well?

@desilinguist
Copy link
Collaborator

Yes, I usually like to test the config generation by running rsmtool generate and rsmtool generate --interactive just to be sure.

@tamarl08
Copy link
Contributor Author

tamarl08 commented Sep 5, 2023

Yes, I usually like to test the config generation by running rsmtool generate and rsmtool generate --interactive just to be sure.

Done!

Copy link
Collaborator

@desilinguist desilinguist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@tamarl08 tamarl08 requested a review from mulhod September 5, 2023 19:47
@tamarl08 tamarl08 merged commit a2372a6 into main Sep 7, 2023
@delete-merged-branch delete-merged-branch bot deleted the 180-parallel-grid-search branch September 7, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants