Skip to content

Fix flaky tests (publish-panel.spec.js focus assertion before panel close completed)#77893

Merged
jsnajdr merged 2 commits into
WordPress:trunkfrom
danluu:try/test-flakes-pr
May 14, 2026
Merged

Fix flaky tests (publish-panel.spec.js focus assertion before panel close completed)#77893
jsnajdr merged 2 commits into
WordPress:trunkfrom
danluu:try/test-flakes-pr

Conversation

@danluu

@danluu danluu commented May 2, 2026

Copy link
Copy Markdown
Contributor

This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).

What?

Gutenberg CI tests seem to be somewhat flaky. Of the PRs I've submitted recently, it feels like most PRs that have no "real" CI failures fail on initial submission and need to be re-run once or twice to pass CI. I haven't actually run the numbers; maybe it's 25% or 30% or something. In any case, a decent number of CI runs fail due to flakes.

This PR attempts to fix #68892, #77385, and #77721. It probably make sense to clean up more than just these, but since I'm not familiar with this codebase, as with the RTC fuzzing bugs, I wanted to start with a small PR to get people's opinions on automated AI flake fixing before submitting more or larger PRs.

AI TEXT

The revised plan is a PR boundary, not a complete flake program. Its purpose is to select fixes where the cause is local, the reviewer can understand the change from one spec file, and the original behavior remains covered by the same end-to-end assertion. That is why the best first PR is limited to homepage-settings.spec.js: the failure mechanism is a fixture and locator problem, and the fix can isolate the page fixtures and scope row selection without deleting coverage or changing the user-visible behavior under test.

The plan intentionally rejects fixes that depend on later cleanup, monitoring, or policy enforcement. It does not use skips, quarantines, broad retries, weakened assertions, or test deletion because those reduce the chance that the PR actually improves correctness. A fix is acceptable only when it replaces a race with a deterministic condition tied to the operation being tested, such as an exact locator, an already-observable UI state, or a clearly completed editor action.

The excluded areas are not dismissed as unimportant. RTC synchronization, router readiness, Openverse/media behavior, upload lifecycle semantics, REST retry policy, and reporter hygiene can all be real sources of flakiness. They are excluded from the first PR because each has a wider review surface and a higher risk of either changing product behavior or adding infrastructure that needs ongoing ownership. Those should be handled as separate focused changes only when there is concrete causal evidence and the fix preserves or strengthens the existing coverage.

Revised Fix Plan

  1. Keep the PR small and test-focused. Do not include a flake ledger, reporter rewrite, CI workflow redesign, broad requestUtils retry layer, dashboard, bot rule, skip/quarantine mechanism, or any other change that requires ongoing follow-up.
  2. Preserve the asserted behavior exactly. For each touched test, state the behavior protected by the current assertion and keep that assertion or an equivalent assertion in the same PR. Do not reduce expectations, loosen assertions, or turn an end-to-end behavior into only lower-level coverage.
  3. Do not delete tests in this PR. Deletion or material narrowing should be treated as out of scope unless the certainty is extraordinarily high and reviewers can see identical or stronger replacement coverage. The default is to rewrite the flaky interaction, not remove it.
  4. Fix only flakes with a concrete, local mechanism and a small patch. Acceptable mechanisms are fixture isolation, exact locator scoping, waiting for an already-observable UI state, and explicitly selecting an existing block before using its toolbar. Do not include entries whose latest evidence is only socket hang up.
  5. Avoid shared abstractions. Do not add cross-suite readiness APIs or package-level helpers. A small helper inside a touched spec is acceptable only if it keeps the diff simpler and has no behavior outside that file.
  6. Exclude higher-risk areas from this PR: Openverse mocking, global REST retry/backoff, flake issue deduplication, DataViews helper design, RTC synchronization, upload lifecycle semantics, and broad router synchronization. Those may deserve separate focused PRs, but they add risk here.
  7. Validate only the changed surface. Run the directly affected spec(s), preferably with --repeat-each for the modified tests, and verify the original behavior is still asserted. Do not make post-merge monitoring or future cleanup part of the PR's correctness.

Low-Risk PR Shape

  1. Best first PR: fix homepage-settings.spec.js only. It is a clear test-fixture/locator bug: isolate page fixtures and scope rows exactly while preserving the same homepage/posts-page action assertions.
  2. If a second fix is included, choose publish-panel.spec.js only if the change is a narrow wait for panel closure or aria-expanded=false before the existing focus assertion. Keep it test-only.
  3. Keep post-content-focus-mode.spec.js and classic.spec.js as follow-up PRs unless the first PR remains trivially small. They are still local, but they touch different editor flows and are easier to review separately.
  4. Keep router, DataViews, RTC, upload-save-lock, Openverse/media, REST retry/backoff, and reporter work out of this low-risk PR.

Concrete Small-Scope Fix Selection

Checked again on 2026-05-02, the only fixes that fit the low-risk idea are the two test-only fixes below. They are direct responses to the latest retained failure examples, they preserve the existing assertions, and they do not require shared helpers, product changes, retries, skips, quarantines, or follow-up ownership.

Include Flake issue(s) File / test Concrete fix How the fix hits the flake Coverage effect
Yes, first #77385, #68892 test/e2e/specs/site-editor/homepage-settings.spec.js / should show correct homepage actions based on current homepage or posts page Own the page fixtures before seeding and select the intended row with exact title scoping. The safest shape is to reset page state before creating Homepage, Sample page, and Draft page, then derive row locators from exact page titles rather than broad getByLabel() matches. The latest failure is Playwright strict mode resolving Sample Page and Sample page as two matching rows. Fixture reset removes leaked page rows; exact row selection prevents case/substring title collisions from selecting the wrong DataViews row. The same homepage and posts-page menu assertions remain in place. No reduction. The test still verifies that the current homepage cannot be set as homepage/posts page and that the posts page action disappears after setting a posts page.
Yes, second only if the PR remains tiny #77721 test/e2e/specs/editor/various/publish-panel.spec.js / should move focus back to the Publish panel toggle button when canceling After clicking Cancel, wait for the publish panel to actually close, preferably by waiting for the top-bar Publish toggle to have aria-expanded="false", then keep the existing toBeFocused() assertion. The latest failure shows the Publish toggle still has aria-expanded="true" while the test is already asserting focus. Waiting for the close state ties the assertion to the operation under test instead of sampling focus while the panel is still open. No reduction. The focus assertion remains the coverage; the added wait only establishes that the cancel/close transition completed before checking focus return.

Actual PR Branch Fix Coverage

The created code branch is try/test-flakes-pr at commit 0aba5a3831f (Stabilize small-scope flaky e2e tests). It changes only test/e2e/specs/site-editor/homepage-settings.spec.js and test/e2e/specs/editor/various/publish-panel.spec.js.

That branch fixes these flakes:

Fixed by branch Flake issue(s) Why this branch fixes it
Yes #77385, #68892 The branch resets homepage/posts-page settings and deletes all pages before creating the three pages used by homepage-settings.spec.js, then replaces broad getByLabel() row filtering with exact, case-sensitive row locators. This directly targets the observed strict-mode failure where both Sample Page and Sample page matched the same intended row lookup.
Yes #77721 The branch waits for the Publish panel toggle to report aria-expanded="false" after clicking Cancel, then keeps the original toBeFocused() assertion. This directly targets the observed failure where the test asserted focus while the panel was still open (aria-expanded="true").

That branch does not fix #77720 (post-content-focus-mode.spec.js) or #77705 / #36123 (classic.spec.js). Those remain intentionally excluded from the small PR because they need focused review of broader editor flows.

Do not include post-content-focus-mode.spec.js in this PR even though its latest failure is a local-looking wait for Initial content. That flake lives inside the "Show template" rendering mode and template-part embedding path; a correct fix probably needs a Post Content readiness helper and should be reviewed as a focused editor-flow PR.

Do not include classic.spec.js in this PR even though explicitly selecting the Classic block after undo might address the latest toolbar timeout. The test covers a long TinyMCE/media-modal/upload/convert/undo flow and has a long historical flake issue; keep it separate so reviewers can evaluate whether the fix actually preserves the media conversion and undo coverage rather than merely making the toolbar button clickable.

END AI TEXT

@github-actions

github-actions Bot commented May 2, 2026

Copy link
Copy Markdown

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: danluu <danluu@git.wordpress.org>
Co-authored-by: jsnajdr <jsnajdr@git.wordpress.org>
Co-authored-by: Mamaduka <mamaduka@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@danluu danluu force-pushed the try/test-flakes-pr branch from 2231852 to a27bc3f Compare May 2, 2026 20:49
@danluu danluu force-pushed the try/test-flakes-pr branch from a27bc3f to 4c5a34b Compare May 2, 2026 21:00
@t-hamano t-hamano added the [Type] Automated Testing Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests. label May 4, 2026
'aria-expanded',
'false'
);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unlikely fix. The .toBeFocused() check on the next should wait until the button is focused, i.e., should be fine with finding an unfocused button several times before giving up. The failure report looks like this:

2) [chromium] › specs/editor/various/publish-panel.spec.js:11:2 › Post publish panel › should move focus back to the Publish panel toggle button when canceling 
    Error: expect(locator).toBeFocused() failed

    Locator:  locator('role=region[name="Editor top bar"i]').locator('role=button[name="Publish"i]')
    Expected: focused
    Received: inactive
    Timeout:  5000ms

    Call log:
      - Expect "toBeFocused" with timeout 5000ms
      - waiting for locator('role=region[name="Editor top bar"i]').locator('role=button[name="Publish"i]')
        9 × locator resolved to <button type="button" aria-expanded="true" aria-disabled="false" class="components-button editor-post-publish-panel__toggle editor-post-publish-button__button is-primary is-compact">Publish</button>
          - unexpected value "inactive"


      31 |
      32 | 		// Test focus is moved back to the Publish panel toggle button.
    > 33 | 		await expect( publishPanelToggleButton ).toBeFocused();
         | 		                                         ^
      34 | 	} );

Note that when the button failed to be focused, it still has aria-expanded=true attribute. Waiting for aria-expanded=false should fail the same way as waiting for focus. The 9 × locator bit means that Playwright tried 9 times before giving up.

It looks almost like the test fails to click the "Cancel" button at all. It can happen when it is disabled, i.e., the isSavingNonPostEntityChanges selector value is true. This happens when some other entity is saving in the background. Strange.

@Mamaduka Mamaduka May 8, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is an odd, flaky one. Trace logs say that the button is enabled, but the click still fails.

Update: Our a11y way of disabling buttons uses accessibleWhenDisabled, which renders aria-disabled="true" and intercepts click/mousedown with stopPropagation()+preventDefault(), so onClick never fires. This is why the logs were misleading.

@jsnajdr, suspicion was right. I'm going to create PR for this. See #78082.

Trace logs

waiting for locator('role=region[name="Editor publish"i]').locator('role=button[name="Cancel"i]')
locator resolved to <button type="button" class="components-button is-secondary is-compact">Cancel</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
done scrolling
performing click action
click action done
waiting for scheduled navigations to finish
navigations have finished

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks almost like the test fails to click the "Cancel" button at all.

Looking at the Playwright trace stored as an action artifact, I see that the test has trouble clicking "Cancel" because the editor-post-publish-panel__slide-in-animation animation is running. The PostPublishPanel is sliding in and the "Cancel" button is moving. That's why the click sometimes fails.

It's very strange because this animation is gated with @media not (prefers-reduced-motion) and Playwright is configured with reducedMotion: 'reduce'. The animation shouldn't run. And yet, as anyone can check themselves by running the test with npm run test:e2e:debug in UI mode, the animation is running. And if you add a background-color: red style to make is visually obvious whether prefers-reduced-motion is respected, then the panel is red! It's also interesting that adding a check in code:

console.log( 'prefers-reduced-motion:', window.matchMedia( 'prefers-reduced-motion' ).matches );

it will log true. And yet the visual output based on CSS will behave like false.

I think this is a Playwright bug, I can reproduce it even with a minimal independent example. I'll look into this further today.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aria-expanded fix proposed here is actually a very promising one, it just took me some time to undestand why 🙂

The publishPanelToggleButton is rendered the entire time while the test runs, and it's sometimes just obscured by the publish panel. After clicking "Cancel", we are probably checking a button that is rendered, but is just obscured by the still-hiding publish panel.

Let's merge this and see what happens. According to logs in #77721, this test is one of the flakiest we have, fails many times per hour. We should learn quickly how good the fix was.

@jsnajdr jsnajdr enabled auto-merge (squash) May 14, 2026 10:28
@jsnajdr jsnajdr changed the title Fix flaky tests (homepage-settings.spec.js duplicate Sample Page / Sample page row match, publish-panel.spec.js focus assertion before panel close completed) Fix flaky tests (publish-panel.spec.js focus assertion before panel close completed) May 14, 2026
@jsnajdr jsnajdr merged commit 5a8392e into WordPress:trunk May 14, 2026
41 of 42 checks passed
@github-actions github-actions Bot added this to the Gutenberg 23.3 milestone May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Type] Automated Testing Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests.

Projects

None yet

4 participants