Fix flaky tests (publish-panel.spec.js focus assertion before panel close completed)#77893
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
| 'aria-expanded', | ||
| 'false' | ||
| ); | ||
|
|
There was a problem hiding this comment.
This is an unlikely fix. The .toBeFocused() check on the next should wait until the button is focused, i.e., should be fine with finding an unfocused button several times before giving up. The failure report looks like this:
2) [chromium] › specs/editor/various/publish-panel.spec.js:11:2 › Post publish panel › should move focus back to the Publish panel toggle button when canceling
Error: expect(locator).toBeFocused() failed
Locator: locator('role=region[name="Editor top bar"i]').locator('role=button[name="Publish"i]')
Expected: focused
Received: inactive
Timeout: 5000ms
Call log:
- Expect "toBeFocused" with timeout 5000ms
- waiting for locator('role=region[name="Editor top bar"i]').locator('role=button[name="Publish"i]')
9 × locator resolved to <button type="button" aria-expanded="true" aria-disabled="false" class="components-button editor-post-publish-panel__toggle editor-post-publish-button__button is-primary is-compact">Publish</button>
- unexpected value "inactive"
31 |
32 | // Test focus is moved back to the Publish panel toggle button.
> 33 | await expect( publishPanelToggleButton ).toBeFocused();
| ^
34 | } );
Note that when the button failed to be focused, it still has aria-expanded=true attribute. Waiting for aria-expanded=false should fail the same way as waiting for focus. The 9 × locator bit means that Playwright tried 9 times before giving up.
It looks almost like the test fails to click the "Cancel" button at all. It can happen when it is disabled, i.e., the isSavingNonPostEntityChanges selector value is true. This happens when some other entity is saving in the background. Strange.
There was a problem hiding this comment.
Yeah, this is an odd, flaky one. Trace logs say that the button is enabled, but the click still fails.
Update: Our a11y way of disabling buttons uses accessibleWhenDisabled, which renders aria-disabled="true" and intercepts click/mousedown with stopPropagation()+preventDefault(), so onClick never fires. This is why the logs were misleading.
@jsnajdr, suspicion was right. I'm going to create PR for this. See #78082.
Trace logs
waiting for locator('role=region[name="Editor publish"i]').locator('role=button[name="Cancel"i]')
locator resolved to <button type="button" class="components-button is-secondary is-compact">Cancel</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
done scrolling
performing click action
click action done
waiting for scheduled navigations to finish
navigations have finished
There was a problem hiding this comment.
It looks almost like the test fails to click the "Cancel" button at all.
Looking at the Playwright trace stored as an action artifact, I see that the test has trouble clicking "Cancel" because the editor-post-publish-panel__slide-in-animation animation is running. The PostPublishPanel is sliding in and the "Cancel" button is moving. That's why the click sometimes fails.
It's very strange because this animation is gated with @media not (prefers-reduced-motion) and Playwright is configured with reducedMotion: 'reduce'. The animation shouldn't run. And yet, as anyone can check themselves by running the test with npm run test:e2e:debug in UI mode, the animation is running. And if you add a background-color: red style to make is visually obvious whether prefers-reduced-motion is respected, then the panel is red! It's also interesting that adding a check in code:
console.log( 'prefers-reduced-motion:', window.matchMedia( 'prefers-reduced-motion' ).matches );it will log true. And yet the visual output based on CSS will behave like false.
I think this is a Playwright bug, I can reproduce it even with a minimal independent example. I'll look into this further today.
There was a problem hiding this comment.
The aria-expanded fix proposed here is actually a very promising one, it just took me some time to undestand why 🙂
The publishPanelToggleButton is rendered the entire time while the test runs, and it's sometimes just obscured by the publish panel. After clicking "Cancel", we are probably checking a button that is rendered, but is just obscured by the still-hiding publish panel.
Let's merge this and see what happens. According to logs in #77721, this test is one of the flakiest we have, fails many times per hour. We should learn quickly how good the fix was.
This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).
What?
Gutenberg CI tests seem to be somewhat flaky. Of the PRs I've submitted recently, it feels like most PRs that have no "real" CI failures fail on initial submission and need to be re-run once or twice to pass CI. I haven't actually run the numbers; maybe it's 25% or 30% or something. In any case, a decent number of CI runs fail due to flakes.
This PR attempts to fix #68892, #77385, and #77721. It probably make sense to clean up more than just these, but since I'm not familiar with this codebase, as with the RTC fuzzing bugs, I wanted to start with a small PR to get people's opinions on automated AI flake fixing before submitting more or larger PRs.
AI TEXT
The revised plan is a PR boundary, not a complete flake program. Its purpose is to select fixes where the cause is local, the reviewer can understand the change from one spec file, and the original behavior remains covered by the same end-to-end assertion. That is why the best first PR is limited to
homepage-settings.spec.js: the failure mechanism is a fixture and locator problem, and the fix can isolate the page fixtures and scope row selection without deleting coverage or changing the user-visible behavior under test.The plan intentionally rejects fixes that depend on later cleanup, monitoring, or policy enforcement. It does not use skips, quarantines, broad retries, weakened assertions, or test deletion because those reduce the chance that the PR actually improves correctness. A fix is acceptable only when it replaces a race with a deterministic condition tied to the operation being tested, such as an exact locator, an already-observable UI state, or a clearly completed editor action.
The excluded areas are not dismissed as unimportant. RTC synchronization, router readiness, Openverse/media behavior, upload lifecycle semantics, REST retry policy, and reporter hygiene can all be real sources of flakiness. They are excluded from the first PR because each has a wider review surface and a higher risk of either changing product behavior or adding infrastructure that needs ongoing ownership. Those should be handled as separate focused changes only when there is concrete causal evidence and the fix preserves or strengthens the existing coverage.
Revised Fix Plan
requestUtilsretry layer, dashboard, bot rule, skip/quarantine mechanism, or any other change that requires ongoing follow-up.socket hang up.--repeat-eachfor the modified tests, and verify the original behavior is still asserted. Do not make post-merge monitoring or future cleanup part of the PR's correctness.Low-Risk PR Shape
homepage-settings.spec.jsonly. It is a clear test-fixture/locator bug: isolate page fixtures and scope rows exactly while preserving the same homepage/posts-page action assertions.publish-panel.spec.jsonly if the change is a narrow wait for panel closure oraria-expanded=falsebefore the existing focus assertion. Keep it test-only.post-content-focus-mode.spec.jsandclassic.spec.jsas follow-up PRs unless the first PR remains trivially small. They are still local, but they touch different editor flows and are easier to review separately.Concrete Small-Scope Fix Selection
Checked again on 2026-05-02, the only fixes that fit the low-risk idea are the two test-only fixes below. They are direct responses to the latest retained failure examples, they preserve the existing assertions, and they do not require shared helpers, product changes, retries, skips, quarantines, or follow-up ownership.
test/e2e/specs/site-editor/homepage-settings.spec.js/should show correct homepage actions based on current homepage or posts pageHomepage,Sample page, andDraft page, then derive row locators from exact page titles rather than broadgetByLabel()matches.Sample PageandSample pageas two matching rows. Fixture reset removes leaked page rows; exact row selection prevents case/substring title collisions from selecting the wrong DataViews row. The same homepage and posts-page menu assertions remain in place.test/e2e/specs/editor/various/publish-panel.spec.js/should move focus back to the Publish panel toggle button when cancelingCancel, wait for the publish panel to actually close, preferably by waiting for the top-bar Publish toggle to havearia-expanded="false", then keep the existingtoBeFocused()assertion.aria-expanded="true"while the test is already asserting focus. Waiting for the close state ties the assertion to the operation under test instead of sampling focus while the panel is still open.Actual PR Branch Fix Coverage
The created code branch is
try/test-flakes-prat commit0aba5a3831f(Stabilize small-scope flaky e2e tests). It changes onlytest/e2e/specs/site-editor/homepage-settings.spec.jsandtest/e2e/specs/editor/various/publish-panel.spec.js.That branch fixes these flakes:
homepage-settings.spec.js, then replaces broadgetByLabel()row filtering with exact, case-sensitive row locators. This directly targets the observed strict-mode failure where bothSample PageandSample pagematched the same intended row lookup.aria-expanded="false"after clickingCancel, then keeps the originaltoBeFocused()assertion. This directly targets the observed failure where the test asserted focus while the panel was still open (aria-expanded="true").That branch does not fix #77720 (
post-content-focus-mode.spec.js) or #77705 / #36123 (classic.spec.js). Those remain intentionally excluded from the small PR because they need focused review of broader editor flows.Do not include
post-content-focus-mode.spec.jsin this PR even though its latest failure is a local-looking wait forInitial content. That flake lives inside the "Show template" rendering mode and template-part embedding path; a correct fix probably needs a Post Content readiness helper and should be reviewed as a focused editor-flow PR.Do not include
classic.spec.jsin this PR even though explicitly selecting the Classic block after undo might address the latest toolbar timeout. The test covers a long TinyMCE/media-modal/upload/convert/undo flow and has a long historical flake issue; keep it separate so reviewers can evaluate whether the fix actually preserves the media conversion and undo coverage rather than merely making the toolbar button clickable.END AI TEXT