ASAN shards are very unbalanced #72368

@malfet

Description

🐛 Describe the bug

Have a look at https://hud.pytorch.org/ci/pytorch/pytorch/master?name_filter=asan:
Typical run time is 3h30m for shard1, 1h30m for shard2, and 1h20m for shard3.
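
To put the imbalance in numbers, here is a quick calculation using only the three durations quoted above. The helper name `to_minutes` is made up for this illustration, and the "ideal" figure assumes the work could be split evenly with no fixed per-shard overhead, which may not hold in practice.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'XhYm' duration to whole minutes."""
    return hours * 60 + minutes


# Typical shard run times as reported in this issue.
shard_minutes = {
    "shard1": to_minutes(3, 30),
    "shard2": to_minutes(1, 30),
    "shard3": to_minutes(1, 20),
}

total = sum(shard_minutes.values())    # total work across all shards
ideal = total / len(shard_minutes)     # per-shard time if perfectly balanced
slowest = max(shard_minutes.values())  # the slowest shard sets the wall-clock time

print(f"total={total}m ideal={ideal:.1f}m slowest={slowest}m")
print(f"slowest shard takes {slowest / ideal:.2f}x the ideal time")
```

Under that assumption, a balanced split would finish in a little over two hours instead of three and a half.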

  • Can it be because shard1 runs the C++ tests?
  • Or because sharding calculations are based on runtimes of optimized builds (i.e. Windows and ASAN shards would always be skewed towards linux-cuda test runtimes)?
  • Is there a bug in the sharding algorithm?
    (Perhaps it ignores skipped/slow test metrics when calculating test file runtimes?)

Tests (at file granularity) are selected for each shard based on previous nightly runtimes, using the following function:

def get_shard_based_on_S3(which_shard: int, num_shards: int, tests: List[str], test_times_file: str) -> List[str]:
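
The body of `get_shard_based_on_S3` is not shown here, so the following is not its implementation. It is a minimal sketch of one common way to shard by recorded runtime (greedy, longest file first, always into the lightest shard), used to illustrate the second hypothesis above: a plan that looks balanced on recorded times can be badly skewed if the build that actually runs scales some files differently. The function name `greedy_shards`, the test names, and all timings are invented for the example.

```python
from typing import Dict, List, Tuple


def greedy_shards(
    test_times: Dict[str, float], num_shards: int
) -> List[Tuple[float, List[str]]]:
    """Assign test files to shards, longest first, into the lightest shard so far."""
    totals = [0.0] * num_shards
    files: List[List[str]] = [[] for _ in range(num_shards)]
    # Sort by descending time; break ties by name so the result is deterministic.
    for name, seconds in sorted(test_times.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = totals.index(min(totals))
        totals[lightest] += seconds
        files[lightest].append(name)
    return list(zip(totals, files))


# Hypothetical times recorded on an optimized build (seconds).
recorded = {"test_a": 100, "test_b": 60, "test_c": 50, "test_d": 30, "test_e": 20}
# Hypothetical times for the same files under ASAN; test_a slows down far more.
asan_actual = {"test_a": 400, "test_b": 70, "test_c": 60, "test_d": 40, "test_e": 25}

plan = greedy_shards(recorded, num_shards=2)
actual_totals = [sum(asan_actual[name] for name in names) for _, names in plan]

for (planned, names), actual in zip(plan, actual_totals):
    print(f"{names}: planned {planned:.0f}s, actual {actual}s")
```

With these made-up numbers, both shards are planned at 130s, but the shard holding `test_a` would actually take 440s against 155s for the other. Comparing planned and actual per-shard totals like this on real ASAN data would be one way to tell a stale-timing problem apart from a bug in the algorithm.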

Versions

N/A, this relates to CI

cc @seemethere @malfet @pytorch/pytorch-dev-infra

Metadata

Labels

  • enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix)
  • module: ci (Related to continuous integration)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

Status

Done
