[WB-7468] [WB-7692] Settings refactor by dmitryduev · Pull Request #3083 · wandb/wandb

dmitryduev · 2021-12-24T07:10:23Z

Fixes WB-7468
Fixes WB-7692
...and a ton of other stuff

I apologize for the size of this PR and the length of the description below :)

Description

Context: currently, settings are handled in a very complex manner, making it somewhat challenging to add new features and to maintain the existing ones.

The main purpose of this PR (which is likely only the first one in a series of PRs) is to refactor the wandb.sdk.wandb_settings.Settings class to provide:

Consistency, tight control, and encapsulation: implement general interfaces for settings.
- Enable preprocessing, validation, and runtime hooks (e.g. a setting can depend on another setting) for individual settings.
- Implement and enforce source priority logic in a consistent matter. Brake settings into policies and non-policy ones.
Compactness

Having addressed the above, I focused on making sure that nothing is broken across the client.

It's been a lot of fun working on this, and I expect even more fun from the review process :)

Implementation details

This PR adds the Property class that is utilized by the Settings.

Property: a class to represent attributes (individual settings) of the Settings object.

Encapsulates the logic of how to preprocess and validate values of settings throughout the lifetime of a class instance.
Allows for runtime modification of settings with hooks, e.g. in the case when a setting depends on another setting.
The update() method is used to update the value of a setting. Enforces source priority logic when updating values.
The is_policy attribute determines the source priority when updating the property value. E.g. if is_policy is True, the smallest Source value takes precedence.
Ability to freeze/unfreeze.

Here's a basic example (for more examples, see tests/wandb_settings_test.py)

from wandb.sdk.wandb_settings import Property, Source


def uses_https(x):
    if not x.startswith("https"):
        raise ValueError("Must use https")
    return True

base_url = Property(
    name="base_url",
    value="https://wandb.com/",
    preprocessor=lambda x: x.rstrip("/"),
    validator=[lambda x: isinstance(x, str), uses_https],
    source=Source.BASE,
)

endpoint = Property(
    name="endpoint",
    value="site",
    validator=lambda x: isinstance(x, str),
    hook=lambda x: "/".join([base_url.value, x]),
    source=Source.BASE,
)

>>> print(base_url)  # note the stripped "/"
'https://wandb.com'
>>> print(endpoint)  # note the runtime hook
'https://wandb.com/site'
>>> print(endpoint._value)  # raw value
'site'
>>> base_url.update(value="https://wandb.ai/", source=Source.INIT)
>>> print(endpoint)  # valid update with a higher priority source
'https://wandb.ai/site'
>>> base_url.update(value="http://wandb.ai/")  # invalid value - second validator will raise exception
ValueError: Must use https
>>> base_url.update(value="https://wandb.dev", source=Source.USER)
>>> print(endpoint)  # valid value from a lower priority source has no effect
'https://wandb.ai/site'

Settings: a class that uses Property objects to represent configurable settings.

The code is supposed to be self-documented -- see wandb/sdk/wandb_settings.py :)
Clearly and compactly defines all individual settings, their default values, preprocessors, validators, and runtime hooks as well as whether or not they are treated as policies.
Provides a mechanism to update settings specifying source (which abides the corresponding Property source logic) via Settings.update(). Direct attribute assignment is not allowed.
Careful Settings object copying.
Mapping interface.
Exposes attribute.value if attribute is a Property.
Ability to freeze/unfreeze.
Settings.make_static() method that we can use to replace StaticSettings.
Adapted/reworked convenience methods to apply settings originating from different source.

Other stuff in this PR

Fix WB-7468 (code saving and wandb.login()).
Fix WB-7692 (drop some of the deps back-ported to py2).
A note on using a debugger in CONTRIBUTING.md.
Configurable '{"code_saving_enabled": true}' in mock_server ctx.
Improved test suite robustness (see below).
Expose run settings via a public attribute Run.settings. We should also start saving them by default with each run as it would provide a lot of useful information that is not otherwise captured.

Testing

Numerous (>100) tests, both settings-specific and not, were added or fixed.

No more sloppiness in the way the settings are treated in the tests :)
Improved isolation for numerous tests that were leaking state through filesystem and env variables and potentially affecting other tests (in a flaky way) executed by the same pytest worker.

Some notes

When designing this, I considered external libraries attrs and pydantic as potentially useful. Neither seemed like a good option for us:

pydantic generates scary code behind the scenes.
attrs only ensures conversion and validation at init.
Neither have runtime hooks/converters/validation.
No easy way to implement tracking and updating values based on source priorities.

Status as of 1/11/2022

A single functional test is failing: disabled code saving. This happens bc the mock server sets {“code_saving_enabled”: True} in the user settings by default, which, since now code saving setting is a policy, has higher priority than the ENV variable that the test sets. In unit tests, I made that configurable and so I do live_mock_server.set_ctx({"code_saving_enabled": None}) before running the corresponding tests — now I am working on making this possible with yea. Shouldn't take too long, but I think the review can proceed in the meantime.
Several tests on Windows are failing -- need to fix them, but I also think it's not a road block for the review.

vanpelt

Overall looks great! One addition I would suggest is a quick explainer on how to add new settings. We could put this in a comment of wandb_settings or one of our doc files, but there's a lot to think about between env vars, validation, and serialization that would be good to clearly define.

vanpelt · 2022-01-20T22:24:29Z

    html.bind_to_run(mocked_run, "rad", "summary")
    wandb.Html.seq_to_json([html], mocked_run, "rad", "summary")
    assert os.path.exists(html._path)
+    wandb.finish()


I'm curious why we added all of these? The mocked_run object should be calling finish after yielding it to the test.

vanpelt · 2022-01-20T22:26:31Z

        run = wandb.init(entity="ign", project="ignored")
        sweep_projects.append(run.project)
        sweep_entities.append(run.entity)
+        run.finish()


Same here, I would think the sweep agents are automatically calling finish when the function returns...

vanpelt · 2022-01-20T22:28:39Z

        f.write("Not that big")
    art = run.log_code()
    assert sorted(art.manifest.entries.keys()) == ["test.py"]
+    run.finish()


Same here, the test_settings fixture has:

if wandb.run is not None: wandb.run.finish()

I'm sure there was a reason for this, but if we can make it "Just Work" in tests it's going to make future devs lives a lot less error prone.

Yea, I was trying to find the reason for flakiness and the working hypothesis was that maybe for some reason the singleton doesn't get recycled properly or something, which is of course not true since all fixtures are func-scoped.
I later discovered that it's (mainly) because of stuff leaking into os.environ and the filesystem through certain fixtures and tests, but decided to keep these calls in as it seemed more explicit and a settings fixture didn't seem like the first place I'd look for a teardown action like that.

Now that you bring this up, how about ditching it from test_settings and instead doing something like

@pytest.fixture(autouse=True) def cleanup(): yield if wandb.run is not None: wandb.run.finish()

What do you think? @raubitsj?

Yeah, I was also thinking there could be leaking if an exception is thrown in the yield I wonder if we need:

@pytest.fixture(autouse=True) def cleanup(): try: yield finally: if wandb.run is not None: wandb.run.finish()

vanpelt · 2022-01-20T22:31:15Z

+@enum.unique
+class Source(enum.IntEnum):
+    OVERRIDE: int = 0
+    BASE: int = 1  # fixme: audit this


Would love to know a little more about this comment ;)

https://youtu.be/HQ47glxcxr0?t=13 ;)

It's a general comment that we should use these more or ditch - e.g. ENTITY or WORKSPACE never come up in the code, ORG only come up in tests, etc.

vanpelt · 2022-01-20T22:33:53Z

+        }
+        self.username: Any = {
+            "validator": lambda x: isinstance(x, str),
+        }


All of these Any annotations make me feel sad. Any reason not to make these NamedTuples or something with strong type gurarentees?

Yeah.. All these are converted to Property class instances immediately after definition, all of which come with validators that guarantee not only types, but also anything else, specific to each setting (see e.g. https://github.com/wandb/client/pull/3083/files/cf4a6864f7e306dd5d3fc1172fb26f29fc96265f#diff-432d500a1d77db4e4a22825d98e99cb860fee78777f6540b85395c1e5ee28e94R476). Here's my reasoning for doing it this way:

Properties encapsulate all the logic we want from them: preprocessing, validation, runtime hooks, deps, source priorities (see an example in the pr description). So there are strong type guarantees, but yes, static checkers aren't of much use in this case, action happens at runtime. (So all those Any's are there just to make mypy happy.) But we should bring our test coverage to ~100% anyways :)

Validation allows programming by contract of sorts (long live PEP316 ^_^), which static checkers won't help with.

I could have gone with object.__setattr__ to define things (like we're doing now) but that breaks code completion in pycharm (and i'd expect same in vscode), and it's kind of nice to have.

We could define these as Properties straight away:

self.mofl: Property = Property( name="mofl", validator=[lambda x: isinstance(x, (int, float)), lambda x: x > 42], )

or something, it just takes up more space to spell it out since I decided to attach names to them. I'm afraid it might freak mypy out, although I haven't tried to be honest.

@raubitsj has a great idea about how to get the best of both static and runtime worlds, implementing now.
[the idea is to use typed class attributes + dynamic re-definition as properties propagating type annotation as the first validator in the resulting property object; this does add a bit of pain to the dev experience, but I'm adding a note to the contrib guide + a unit test that ensures people don't screw up in the process].

vanpelt · 2022-01-20T22:35:13Z

-                continue
-            if self._priority_failed(k, source=_source, override=_override):
+        env_prefix: str = "WANDB_"
+        special_env_var_names = {


Would be good to add a note about what makes these specials and when a dev should add something to this list.

dmitryduev · 2022-01-21T04:10:55Z

Many thanks @vanpelt!

Overall looks great! One addition I would suggest is a quick explainer on how to add new settings. We could put this in a comment of wandb_settings or one of our doc files, but there's a lot to think about between env vars, validation, and serialization that would be good to clearly define.
Good point, will add a section to CONTRIBUTING.md, there's quite a bit going on to have it only in the comments, me thinks.

raubitsj

Awesome. Lots of great stuff here!

dmitryduev added 30 commits December 6, 2021 10:20

import Kyle's tests

464a8ec

Merge branch 'master' of github.com:wandb/client into settings

6a5cdb0

require attrs

baf26b4

Merge branch 'master' of github.com:wandb/client into settings

c2f81b5

save current thoughts

f5f7e95

save current thoughts

da39aca

save current thoughts

ac30d14

save progress

c7b9ceb

save progress

d428657

save progress

983418e

Merge branch 'master' of github.com:wandb/client into settings

46d5be4

add args from old Settings.__init__

e061fcb

add args from old Settings.__init__

76b22bb

add more validators

087d5a6

save progress

465baba

save progress

fe1534a

Merge branch 'master' of github.com:wandb/client into settings

a0bd329

Merge branch 'master' of github.com:wandb/client into settings

f52e725

Merge branch 'master' of github.com:wandb/client into settings

7888788

sort settings attribs

9f9e51d

explicitely 'allocate' attribs

7098baa

Merge branch 'master' of github.com:wandb/client into settings

af831a0

port private properties

2e4e764

port progress

e85c3ef

wip: wandb_setup.py

82166a2

wip: wandb_setup.py

c3903af

wip: wandb_setup.py

dd42744

wip: wandb_init.py and wandb_run.py

ebdfc5c

wip: wandb_init.py and wandb_run.py

a7d4205

setup sets up, init inits, and run runs

48aae06

dmitryduev and others added 11 commits January 18, 2022 14:54

fix standalone_tests/grpc_client.py

55a0925

Merge branch 'master' into settings

d4b3be5

only mock hf in telemetry test

c939dcb

rm unnecessary wandb.finish() calls in test_keras.py

d6bd580

add tests to wandb_settings_test to increase coverage

8227018

fix linting

7fff44a

fix test_non_writable_root_dir for circleci

bce9ba3

fix test_non_writable_root_dir for circleci

b0b1f1d

fix merge conflict

ad27e4d

pin yea-wandb==0.7.29

e7053ec

pin yea-wandb==0.7.30

cf4a686

vanpelt approved these changes Jan 20, 2022

View reviewed changes

Merge branch 'master' into settings

c6677dd

dmitryduev and others added 11 commits January 21, 2022 22:45

run.settings returns a frozen copy

c8084e8

get the best of both static and runtime checking worlds

a62385d

py36 typing stuff

ac490af

py36 typing stuff

9b4b65b

fix linting

434b3aa

fix linting

7c59685

fix linting

28cecd3

update CONTRIBUTING.md

98b7fce

make apply_* methods private again

26ccc77

nail polish

bba6ac2

Merge branch 'master' into settings

1cdfba5

raubitsj approved these changes Jan 24, 2022

View reviewed changes

raubitsj merged commit 69c0d83 into master Jan 24, 2022

raubitsj deleted the settings branch January 24, 2022 22:00

This was referenced Jan 28, 2022

Settings: cautiously warn instead of erroring out for now #3180

Merged

[WB-7410] Set user_settings to an empty dict in offline mode in _WandbSetup__WandbSetup._load_user_settings #2924

Closed

Conversation

dmitryduev commented Dec 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Implementation details

Other stuff in this PR

Testing

Some notes

Status as of 1/11/2022

Uh oh!

vanpelt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmitryduev Jan 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vanpelt Jan 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmitryduev Jan 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmitryduev commented Jan 21, 2022

Uh oh!

raubitsj left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dmitryduev commented Dec 24, 2021 •

edited

Loading

dmitryduev Jan 21, 2022 •

edited

Loading

vanpelt Jan 21, 2022 •

edited

Loading

dmitryduev Jan 21, 2022 •

edited

Loading