[WB-7841] Fix python sweep agent for users of wandb service / pytorch-lightning#3465

Merged
raubitsj merged 1 commit into master from wb-7841-service-fix-pyagent on Apr 3, 2022

@raubitsj (Contributor) commented Apr 2, 2022

Fixes WB-7841

Description

Fix pyagent when using wandb service.

Currently, pyagent aggressively resets the setup object between training attempts. This makes some things work, but it was breaking the service session. For now, we tear down and restart the service session between attempts.
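The teardown-and-restart approach can be pictured with a small, self-contained sketch. Everything in it is hypothetical: `DemoServiceSession`, `run_attempts`, and the `DEMO_SERVICE_PORT` variable are illustrative stand-ins, not wandb classes or environment variables. It only assumes, as the PR's original title ("Remove service env on teardown") suggests, that a session is advertised through an environment variable that must be cleared when the session goes away.

```python
import os

# Hypothetical variable name for this sketch only; it is not the name wandb uses.
DEMO_SERVICE_ENV = "DEMO_SERVICE_PORT"


class DemoServiceSession:
    """Toy stand-in for a service session that advertises itself via an env var."""

    _last_port = 50000

    def __init__(self) -> None:
        DemoServiceSession._last_port += 1
        self.port = DemoServiceSession._last_port
        # Code running in this attempt finds the session through the environment.
        os.environ[DEMO_SERVICE_ENV] = str(self.port)

    def teardown(self) -> None:
        # Drop the advertisement so the next attempt cannot attach to a dead session.
        os.environ.pop(DEMO_SERVICE_ENV, None)


def run_attempts(train_fns):
    """Give every training attempt a fresh session and always tear it down afterwards."""
    seen_ports = []
    for train in train_fns:
        session = DemoServiceSession()
        try:
            # Record which session this attempt sees; it should be the one just created.
            seen_ports.append(int(os.environ[DEMO_SERVICE_ENV]))
            train()
        finally:
            session.teardown()
    return seen_ports
```

Each attempt sees a distinct session, and nothing stale is left in the environment afterwards, even if an attempt raises.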

Testing

Manually tested with sweep_check.py with service enabled. I think it is time to add a mockserver for pyagent.

Checklist

  • Name PR "[WB-NNNN][WB-MMMM] Add support for..." similar to entries in CHANGELOG.md
  • Include reference to internal ticket "Fixes WB-NNNN" (and github issue "Fixes #NNNN" if applicable)
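The naming and reference rules in the checklist can be checked mechanically. This is a hypothetical sketch: the regular expressions are inferred from the checklist wording, not taken from any wandb CI rule, and `check_pr` is an illustrative helper.

```python
import re
from typing import List, Tuple

# Inferred from the checklist: one or more "[WB-NNNN]" tags, then a space and a summary.
TITLE_RE = re.compile(r"^(?:\[WB-\d+\])+ \S")
# "Fixes WB-NNNN" for internal tickets, "Fixes #NNNN" for GitHub issues.
FIXES_RE = re.compile(r"\bFixes (WB-\d+|#\d+)")


def check_pr(title: str, body: str) -> Tuple[bool, List[str]]:
    """Return whether the title follows the convention, plus every ticket the body fixes."""
    return bool(TITLE_RE.match(title)), FIXES_RE.findall(body)


print(check_pr(
    "[WB-7841] Fix python sweep agent for users of wandb service / pytorch-lightning",
    "Fixes WB-7841",
))
```

Because `FIXES_RE` has a single capture group, `findall` returns just the ticket identifiers.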

@raubitsj raubitsj requested a review from kptkin April 2, 2022 23:02
codecov Bot commented Apr 2, 2022

Codecov Report

Merging #3465 (6be7a4e) into master (1bc04df) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3465      +/-   ##
==========================================
+ Coverage   81.62%   81.63%   +0.01%     
==========================================
  Files         236      236              
  Lines       29113    29116       +3     
==========================================
+ Hits        23763    23770       +7     
+ Misses       5350     5346       -4     
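The headline numbers in the coverage diff are internally consistent: misses are lines minus hits, and coverage is hits divided by lines. A quick check, assuming two-decimal rounding down (ordinary rounding would turn 23770 / 29116 into 81.64%, so truncation is what matches the report). `coverage_pct` is an illustrative helper, not Codecov code.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, truncated (rounded down) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


# Figures copied from the report: master (1bc04df) vs. this PR (6be7a4e).
for label, hits, lines in (("master", 23763, 29113), ("#3465", 23770, 29116)):
    misses = lines - hits
    print(label, coverage_pct(hits, lines), misses)
```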
| Flag | Coverage Δ |
|---|---|
| functest | 58.19% <100.00%> (+0.03%) ⬆️ |
| unittest | 71.76% <33.33%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| wandb/sdk/wandb_manager.py | 94.48% <100.00%> (+0.13%) ⬆️ |
| wandb/sdk/internal/artifacts.py | 76.84% <0.00%> (-3.16%) ⬇️ |
| wandb/sdk/internal/internal_api.py | 82.06% <0.00%> (-0.41%) ⬇️ |
| wandb/sdk/lib/git.py | 76.35% <0.00%> (ø) |
| wandb/sdk/wandb_run.py | 90.11% <0.00%> (+0.25%) ⬆️ |
| wandb/sdk/launch/agent/agent.py | 93.18% <0.00%> (+0.75%) ⬆️ |
| wandb/sdk/internal/meta.py | 90.74% <0.00%> (+3.08%) ⬆️ |

@kptkin (Collaborator) left a comment

Glad it was an easy fix. Could you add a test for this?

@kptkin (Collaborator) commented Apr 3, 2022

BTW, I tried this fix with a wandb.agent function that uses multiprocessing to start runs, and it breaks:
(The use case is a multiprocessing training job, for example one launched with torch.multiprocessing.spawn.)

import wandb
import multiprocessing as mp

def train_fn(rank):
    with wandb.init() as run:
        run.log({"rank": rank})

def agent_fn():
    procs = [mp.Process(target=train_fn, args=(rank,)) for rank in range(2)]
    for p in procs:
        p.start()

    for p in procs:
        p.join()

def sweep_fn():
    config = dict(
        method="random",
        parameters=dict(
            param0=dict(values=[2]),
            param1=dict(values=[0, 1, 4]),
            param2=dict(values=[0, 0.5, 1.5]),
            epochs=dict(value=4),
            )
        )
    sweep_id = wandb.sweep(config)
    wandb.agent(sweep_id, function=agent_fn, count=1)

if __name__ == "__main__":
    wandb.require("service")
    sweep_fn()

This is a use case our growth team asked for: WB-8808

(So, from what I understand, a sweep job doesn't currently allow multiprocessing.)

@raubitsj (Contributor, Author) commented Apr 3, 2022

@kptkin
Great example, and one that we should support... someday :)

Right now the thread context called by wandb.agent() gets a hardcoded run ID, so this example won't work with or without service.

We have a workaround at:
https://stackoverflow.com/questions/63469762/weightsbiases-sweep-keras-k-fold-validation

We will be able to support this eventually, but we need to figure out a better way to pass around the RUN_ID from the sweep, or make it easier for users to spawn a new context inside a sweep that is independent of the sweep.
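One way to picture "a new context that is independent of the sweep" is to hand each worker process an environment with the per-run `WANDB_*` variables removed, so a `wandb.init()` in the worker does not inherit the agent's run identity. This is a sketch under assumptions: it assumes the inherited identity travels through `WANDB_*` environment variables such as `WANDB_RUN_ID`, the keep-list is illustrative rather than an official list, and `child_env` is a hypothetical helper, not part of wandb.

```python
from typing import Dict, Iterable, Mapping

# Illustrative keep-list: credentials and project routing are usually safe to share.
DEFAULT_KEEP = frozenset({"WANDB_API_KEY", "WANDB_ENTITY", "WANDB_PROJECT"})


def child_env(parent: Mapping[str, str], keep: Iterable[str] = DEFAULT_KEEP) -> Dict[str, str]:
    """Copy `parent`, dropping WANDB_* variables except those listed in `keep`.

    Building a new dict, instead of deleting from os.environ while iterating
    over it, leaves the parent's environment untouched.
    """
    keep = frozenset(keep)
    return {
        name: value
        for name, value in parent.items()
        if not name.startswith("WANDB_") or name in keep
    }
```

A worker launched with `subprocess.run(cmd, env=child_env(os.environ))` would then start from a clean slate; whether that is enough for a given wandb version is untested here.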

@raubitsj raubitsj merged commit 5dc9a15 into master Apr 3, 2022
@raubitsj raubitsj deleted the wb-7841-service-fix-pyagent branch April 3, 2022 01:46
@kptkin kptkin added this to the sdk-2022-04.1 milestone Apr 3, 2022
@raubitsj raubitsj changed the title [WB-7841] Remove service env on teardown [WB-7841] Fix python sweep agent for users of wandb service / pytorch-lightning Apr 5, 2022

Labels

None yet

Projects

None yet

2 participants