[WB-7841] Fix python sweep agent for users of wandb service / pytorch-lightning#3465
Codecov Report
@@            Coverage Diff             @@
##           master    #3465      +/-   ##
==========================================
+ Coverage   81.62%   81.63%   +0.01%
==========================================
  Files         236      236
  Lines       29113    29116       +3
==========================================
+ Hits        23763    23770       +7
+ Misses       5350     5346       -4
Flags with carried forward coverage won't be shown.
kptkin left a comment:
Glad it was an easy fix. Could you add a test for this?
BTW, I tried this fix with wandb.agent running a function that uses multiprocessing, and it breaks. This is an issue that our growth team asked about: WB-8808 (so, from what I understand, a sweep job doesn't allow multiprocessing).
@kptkin right now the thread context called by wandb.agent() gets a hardcoded run ID, so this example won't work with or without service. We have a workaround at: We will be able to support this eventually, but we need to figure out a better way to pass the RUN_ID around from the sweep, or make it easier for users to spawn a new context inside a sweep that is independent of the sweep.
Fixes WB-7841
Description
Fix pyagent when using wandb service.
Currently, pyagent aggressively resets the setup object between training attempts. This makes some things work, but it was breaking the service session. For now, we will tear down and restart the service session.
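The teardown-and-restart pattern described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: `ServiceSession` and `run_attempts` are hypothetical names and do not correspond to wandb's actual API.

```python
from typing import Callable, List


class ServiceSession:
    """Hypothetical stand-in for a long-lived service session (not wandb's API)."""

    def __init__(self) -> None:
        self.active = False
        self.starts = 0

    def start(self) -> None:
        self.active = True
        self.starts += 1

    def teardown(self) -> None:
        self.active = False


def run_attempts(session: ServiceSession, attempts: List[Callable[[], None]]) -> int:
    """Run each attempt against a freshly started session.

    Returns the number of attempts that completed; an exception raised by an
    attempt propagates after the session has been torn down.
    """
    completed = 0
    for attempt in attempts:
        # Restart rather than reuse: global state is reset between attempts,
        # so a session left over from the previous attempt would be stale.
        session.start()
        try:
            attempt()
            completed += 1
        finally:
            # Always tear down, even if the attempt raised.
            session.teardown()
    return completed
```

Restarting the session for every attempt costs some startup time, but it avoids any attempt talking to a session whose owning setup object has already been reset.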
Testing
Manually tested with sweep_check.py with service enabled. I think it is time to add a mock server for pyagent.
Checklist