Skip to content

[BE] Skip binary smoke tests in PR#87340

Closed
huydhn wants to merge 2 commits intopytorch:masterfrom
huydhn:skip-binary-smoke-test-in-pr
Closed

[BE] Skip binary smoke tests in PR#87340
huydhn wants to merge 2 commits intopytorch:masterfrom
huydhn:skip-binary-smoke-test-in-pr

Conversation

@huydhn
Copy link
Copy Markdown
Contributor

@huydhn huydhn commented Oct 20, 2022

My thought behind this:

  • We now have land check ciflow/trunk for every PR and the smoke test linux-binary-manywheel is the job with the longest duration at the moment (2.6h at p90 on https://hud.pytorch.org/metrics). This is a long pole TTS
  • Looking at the Rockset data, I count only 28 cases in which the linux-binary-manywheel failed while trunk succeeded. These binary build failures were flaky or cancelled or were old (CUDA 10.2). It means that trunk signals can safely cover binary smoke test signals, making the latter redundant in PR
  • We are already running exactly the same step in trunk. AFAIK, it's very rare to have a PR reverted solely because of breaking binary builds. Normally, other signals come first like breaking tests or land race

So, it makes sense to skip this relatively slow step and run it only in trunk. Also here is the Rockset query for reference. It query all the occurrences (ciflow/trunk) where linux-binary-manywheel failed while trunk succeeded, thus linux-binary-manywheel could block merge.

WITH
binary_builds AS (
    SELECT
        workflow_run.head_branch,
        workflow_run.conclusion,
        workflow_run.head_sha,
        workflow_run.html_url
    FROM
        commons.workflow_run
    WHERE
        workflow_run.head_branch LIKE 'ciflow/trunk/%'
        AND workflow_run.name = 'linux-binary-manywheel'
        AND workflow_run.conclusion != 'null'
        AND workflow_run.conclusion != 'cancelled'
      GROUP BY
          workflow_run.conclusion,
          workflow_run.head_branch,
          workflow_run.head_sha,
          workflow_run.html_url
),
trunk_builds AS (
    SELECT
        workflow_run.head_branch,
        workflow_run.conclusion,
        workflow_run.head_sha,
        workflow_run.html_url
    FROM
        commons.workflow_run
    WHERE
        workflow_run.head_branch LIKE 'ciflow/trunk/%'
        AND workflow_run.name = 'trunk'
        AND workflow_run.conclusion != 'null'
        AND workflow_run.conclusion != 'cancelled'
    GROUP BY
        workflow_run.conclusion,
        workflow_run.head_branch,
        workflow_run.head_sha,
        workflow_run.html_url
)
SELECT
    binary_builds.head_branch,
    binary_builds.head_sha,
    binary_builds.conclusion AS binary_builds_conclusion,
    trunk_builds.conclusion AS trunk_builds_conclusion,
    binary_builds.html_url AS binary_builds_html_url,
    trunk_builds.html_url AS trunk_builds_html_url
FROM
    binary_builds
    LEFT JOIN trunk_builds ON binary_builds.head_sha = trunk_builds.head_sha AND binary_builds.head_branch = trunk_builds.head_branch
WHERE
    binary_builds.conclusion != 'success'
    AND trunk_builds.conclusion = 'success'
ORDER BY
    binary_builds.head_branch DESC

@huydhn huydhn requested a review from atalman October 20, 2022 00:27
@huydhn huydhn self-assigned this Oct 20, 2022
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 20, 2022
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Oct 20, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87340

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit 4a81047:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@huydhn huydhn added ciflow/trunk Trigger trunk jobs on your pull request test-config/default labels Oct 20, 2022
@huydhn huydhn marked this pull request as ready for review October 20, 2022 00:50
@huydhn huydhn requested a review from a team as a code owner October 20, 2022 00:50
@huydhn huydhn requested review from a team and removed request for a team October 20, 2022 00:50
@huydhn huydhn marked this pull request as draft October 20, 2022 16:16
@huydhn huydhn marked this pull request as ready for review October 20, 2022 18:17
@huydhn
Copy link
Copy Markdown
Contributor Author

huydhn commented Oct 20, 2022

From the discussion with @atalman, here are some concerns in doing this:

  • Increase the chance to break nightly
  • Add pressure on the diff train oncall and on the people who now must bisect nightlies

This is indeed a trade-off between devx (running binary smoke tests twice in PR and master while asking people to wait for it) and coverage.

@huydhn
Copy link
Copy Markdown
Contributor Author

huydhn commented Oct 20, 2022

@pytorchbot rebase -b master

@pytorch pytorch deleted a comment from pytorch-bot bot Oct 20, 2022
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 20, 2022
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 20, 2022
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 20, 2022
@pytorchmergebot
Copy link
Copy Markdown
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Successfully rebased skip-binary-smoke-test-in-pr onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via git checkout skip-binary-smoke-test-in-pr && git pull --rebase)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request test-config/default topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants