fix(sdk): resolve config type regression and threading issues in lazy loading#10682
Merged
thanos-wandb merged 6 commits intomainfrom Oct 15, 2025
Merged
fix(sdk): resolve config type regression and threading issues in lazy loading#10682thanos-wandb merged 6 commits intomainfrom
thanos-wandb merged 6 commits intomainfrom
Conversation
URGENT: Fixes two critical issues introduced in v0.22.1 lazy loading: 1. Config Type Regression (High Priority) Problem: run.config returns string instead of dict Root Cause: _load_from_attrs() bypassed due to _is_loaded flag logic Customer Impact: TypeError in artifact.logged_by() workflows Fix: - Modified _load_with_fragment to respect force parameter - Added defensive conversion in config/summary/system_metrics properties - Ensure _load_from_attrs called when loading full fragments 2. Threading Deadlock (High Priority) Problem: Jobs getting stuck in concurrent/futures._base.py:439 Root Cause: ThreadPoolExecutor in upgrade_to_full() overwhelming asyncio Customer Impact: Training jobs hanging indefinitely Fix: - Replace parallel ThreadPoolExecutor with sequential loading - Add error handling to prevent cascading failures - Preserve functionality while eliminating deadlock risk Validation: - Tested exact customer reproduction case (artifact.logged_by()) - Verified config returns dict type consistently - Confirmed sequential upgrade prevents deadlock pattern - All existing functionality preserved Fixes reported issues: - Pinterest workflows: TypeError: string indices must be integers - Pinterest training jobs: stuck in Future.result() deadlock - Affects all workflows using artifact.logged_by() + run.config access
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
nickpenaranda
approved these changes
Oct 10, 2025
jacobromero
reviewed
Oct 10, 2025
timoffex
reviewed
Oct 10, 2025
…cus on config regression - Simplify defensive type checking as suggested by jacobromero (_convert_to_dict already handles dict inputs as noop) - Revert threading changes since deadlock is fixed in PR #10683 - Update changelog wording to be more positive - Focus PR solely on the config type regression fix
jacobromero
approved these changes
Oct 14, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
run.configis a string if loaded viawandb.Api().runs(..., lazy=True)#10647Issue: config type regression
User impact: workflows failing with
TypeError: string indices must be integersReproduction:
Root cause:
_load_from_attrs()conversion logic bypassed due to_is_loadedflag issuesFix: Ensure proper string→dict conversion + defensive property checks
Testing
How was this PR tested?
Regression Testing:
artifact.logged_by()+run.configaccess returns proper dict typeTest Cases: