Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

chore: re-enable e2e tests for web-sveltekit#63910

Merged
bahrmichael merged 11 commits into
mainfrom
bahrmichae/e2e-svelte-fixes
Jul 18, 2024
Merged

chore: re-enable e2e tests for web-sveltekit#63910
bahrmichael merged 11 commits into
mainfrom
bahrmichae/e2e-svelte-fixes

Conversation

@bahrmichael

@bahrmichael bahrmichael commented Jul 18, 2024

Copy link
Copy Markdown
Contributor

The goal of this PR is to increase the stability of web-sveltekit e2e-tests so that we don't have to rely on manual runs anymore. They have previously been disabled due to a high number of failures: https://github.com/sourcegraph/sourcegraph/pull/63874


To improve the stability of web-sveltekit e2e-tests, I used sg ci bazel test //client/web-sveltekit:e2e_test --runs_per_test=N with N=5,10,15 to see which tests break under different levels of pressure on the machine. The logs looked like it was mostly timeouts, that got worse when increasing N. That means we can check where tests will break due to timeouts, but we don't really need to raise timeouts so far that it would work with N=20.

With N=5, 10 we get a good understanding if our timeouts are high enough.

You can see two CI runs here after applying higher timeouts and skipping a consistently failing test:


From logs of some other run that I don't have the link to anymore, we can see that some tests take up to 50s so a timeout of 60s (instead of the default 30s) for CI should be a good new ceiling.

Slow test file: [chromium] › src/routes/[...repo=reporev]/(validrev)/(code)/-/blob/[...path]/page.spec.ts (48.9s)
--
  | Slow test file: [chromium] › src/routes/[...repo=reporev]/(validrev)/(code)/page.spec.ts (45.0s)
  | Slow test file: [chromium] › src/routes/search/page.spec.ts (40.6s)
  | Slow test file: [chromium] › src/routes/[...repo=reporev]/(validrev)/(code)/-/tree/[...path]/page.spec.ts (31.7s)
  | Slow test file: [chromium] › src/routes/layout.spec.ts (31.5s)

Test plan

CI

Changelog

@cla-bot cla-bot Bot added the cla-signed label Jul 18, 2024
Comment on lines +18 to +21
timeout: process.env.BAZEL ? 60_000 : 30_000,
expect: {
timeout: process.env.BAZEL ? 20_000 : 5_000,
},

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes, the timeout per test goes from 5s to 30s on local machines, the timeout per assertion stays at 5s; and on Bazel we allow each test to run for 60s with a 20s timeout per assertion. Things may run a bit slower there than on our dev machines.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if it is possible to collect profiles for the tests to see what is causing such a big difference between CI vs locally? Based on what I know about the recent Intel and Apple chips, a 2x or 3x speedup for single-core performance seems like quite a lot, so it seems weird that we need such large timeouts in CI compared to local dev.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Playwright we can get traces, where we can compare how long each step takes on local vs. CI. I'm not sure about more low-level profiling, without doing more dev-infra stuff on the CI agents.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sourcegraph/developer-infrastructure

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the ping. bazel-do jobs didn't previously upload the recorded profile, which is now fixed in #64148 .

I went ahead and fired up a job for you with your test target, you can explore the profile here: https://buildkite.com/sourcegraph/sourcegraph/builds/284928#019103c3-5f71-4aac-89ce-065a5b331d09

You'll need to to feed that to the chrome trace explorer or better, use https://ui.perfetto.dev/ if you want a fancier UI.

It's going to be pretty hairy, so if after a first look, you don't spot much, reach us out on #discuss-dev-infra to schedule a call to explore it together.

Comment on lines -107 to -110
async function openSidebar(page: Page): Promise<void> {
return page.getByLabel('Open sidebar').click()
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inlined this function so that the test output lets us find the line where it failed more easily.

})

test('error handling non-existing directory -> root', async ({ page, sg }) => {
test.skip('error handling non-existing directory -> root', async ({ page, sg }) => {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skipped this test because it failed reliably when closing the sidebar. There's something broken that can't be fixed with increased timeouts.

@bahrmichael bahrmichael marked this pull request as ready for review July 18, 2024 13:13
@bahrmichael bahrmichael changed the title chore: reduce flakiness of web-sveltekit e2e-tests chore: reduce flakiness of web-sveltekit e2e-tests by increasing timeouts Jul 18, 2024
@bahrmichael bahrmichael requested a review from a team July 18, 2024 13:13
@bahrmichael bahrmichael changed the title chore: reduce flakiness of web-sveltekit e2e-tests by increasing timeouts chore: re-enable e2e tests for web-sveltekit Jul 18, 2024

@camdencheek camdencheek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +18 to +21
timeout: process.env.BAZEL ? 60_000 : 30_000,
expect: {
timeout: process.env.BAZEL ? 20_000 : 5_000,
},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sourcegraph/developer-infrastructure

@bahrmichael bahrmichael merged commit d5292ca into main Jul 18, 2024
@bahrmichael bahrmichael deleted the bahrmichae/e2e-svelte-fixes branch July 18, 2024 17:31
bahrmichael added a commit that referenced this pull request Jul 22, 2024
jhchabran referenced this pull request Jul 30, 2024
Follow-up to [this
comment](https://github.com/sourcegraph/sourcegraph/pull/63910#discussion_r1683190713)
where the need was raised for having profiles for further inspection of
problematic targets when run in isolation.

Basically, every bazel-do will now collect the profile, and it'll be
uploaded as a job artifact.

## Test plan

See https://buildkite.com/sourcegraph/sourcegraph/builds/284913 for a
test run.

<!-- REQUIRED; info at
https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles
-->

## Changelog

<!-- OPTIONAL; info at
https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c
-->
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants