ASAN shards are very unbalanced #72368
Closed
Labels
enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix), module: ci (Related to continuous integration), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
Have a look at https://hud.pytorch.org/ci/pytorch/pytorch/master?name_filter=asan:
Typical run times: shard1 takes 3h30m, shard2 takes 1h30m, and shard3 takes 1h20m.
- Can it be because shard1 runs c++ tests?
- Or because sharding calculations are based on the runtimes of optimized builds (i.e. windows and asan shards would always be skewed towards linux-cuda test runtimes)?
- Is there a bug in the sharding algorithm?
(Perhaps it ignores skipped/slow test metrics when calculating test-file runtimes?)
Tests (at file granularity) are selected for each shard based on the previous nightly's runtimes, using the following function:
pytorch/tools/testing/test_selections.py, line 170 (commit b730768):

def get_shard_based_on_S3(which_shard: int, num_shards: int, tests: List[str], test_times_file: str) -> List[str]:
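For illustration, here is a minimal sketch of one common approach to runtime-based sharding (a greedy longest-processing-time-first heuristic). This is a hypothetical example, not the actual `get_shard_based_on_S3` implementation, and the `test_times` data is made up; a skew like the one reported above can appear whenever the recorded runtimes (e.g. from optimized nightly builds) diverge from the actual ASAN runtimes.

```python
from typing import Dict, List, Tuple


def shard_tests(which_shard: int, num_shards: int,
                test_times: Dict[str, float]) -> List[str]:
    """Return the test files assigned to `which_shard` (1-indexed).

    Greedy heuristic: assign the slowest files first, each to the
    currently least-loaded shard.
    """
    # (accumulated_seconds, shard_index, assigned_files) per shard
    shards: List[Tuple[float, int, List[str]]] = [
        (0.0, i, []) for i in range(num_shards)
    ]
    # Process test files from slowest to fastest
    for test, seconds in sorted(test_times.items(),
                                key=lambda kv: kv[1], reverse=True):
        shards.sort(key=lambda s: s[0])  # least-loaded shard first
        total, idx, files = shards[0]
        files.append(test)
        shards[0] = (total + seconds, idx, files)
    for _, idx, files in shards:
        if idx == which_shard - 1:
            return files
    return []


# Hypothetical per-file runtimes, in seconds
times = {"test_a.py": 200.0, "test_b.py": 120.0,
         "test_c.py": 90.0, "test_d.py": 80.0}
print(shard_tests(1, 2, times))  # shard 1 gets test_a.py and test_d.py
```

If the recorded `test_times` omit tests that are skipped on some platforms (or were measured on a much faster build), the heuristic still balances the *recorded* totals while the *actual* shard durations diverge, which is consistent with the imbalance described in this issue.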
Versions
N/A; this relates to CI.
cc @seemethere @malfet @pytorch/pytorch-dev-infra