[Task] chore(e2e): strengthen W1 regression coverage #598

@Astro-Han

Description

Goal

Strengthen W1 regression coverage so that CI catches more session-view rewrite regressions before human retest, and the remaining manual pass is reserved for true visual judgment.

When this task is complete, the missing W1 regression coverage is tracked in one place across behavior assertions, computed-style assertions, checklist segmentation, and correct tool-failure test guidance.

Scope

In scope:

  • Add the remaining W1 behavior assertions that are still being checked manually.
  • Add low-cost computed-style assertions where screenshot diff is unnecessary.
  • Split the local W1 checklist into CI-covered vs human-visual sections.
  • Document correct stderr-producing commands for tool-failure tests.

Detailed scope:

A. Behavior assertions

Add E2E specs for scenarios that are currently verified only by manual handtesting:

  • scroll: reading_history persistence across content_resize / dock_resize
  • scroll: weak trackpad gesture (multiple low-delta wheels) still demotes to reading_history
  • scroll: nested raw tool output gesture isolation from parent timeline
  • chevron: 12px size + collapsed-right / expanded-down orientation
  • thinking indicator: only renders when working && assistantVisible === 0; disappears once any reasoning / prose / tool appears
  • bubble selectability: user-select: text on user-message-text / bubble-text / agent-prose / agent-reasoning subtrees
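The thinking-indicator and chevron rules above are small enough to pin down as pure predicates before wiring them into an E2E spec. The following is a minimal sketch, assuming a hypothetical `SessionRenderState` shape; the field names mirror the wording of this issue and are not taken from the actual store.

```typescript
// Hypothetical state shape: names follow the issue text, not the real code.
interface SessionRenderState {
  working: boolean;
  // Count of visible assistant items (reasoning, prose, or tool rows).
  assistantVisible: number;
}

// The indicator renders only while working and nothing from the assistant is visible yet.
function shouldShowThinkingIndicator(state: SessionRenderState): boolean {
  return state.working && state.assistantVisible === 0;
}

type ChevronState = "collapsed" | "expanded";

// Collapsed chevrons point right, expanded chevrons point down.
function expectedChevronDirection(state: ChevronState): "right" | "down" {
  return state === "collapsed" ? "right" : "down";
}

// Cases an E2E spec would want to cover for the indicator.
const indicatorCases: Array<[SessionRenderState, boolean]> = [
  [{ working: true, assistantVisible: 0 }, true],
  [{ working: true, assistantVisible: 1 }, false],
  [{ working: false, assistantVisible: 0 }, false],
];

for (const [state, expected] of indicatorCases) {
  const actual = shouldShowThinkingIndicator(state);
  console.log(JSON.stringify(state), "expected", expected, "actual", actual);
}
```

In a real spec the same case table could drive the DOM assertions, so the expected behavior is stated once and checked against the rendered session view.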

B. Computed style assertions

Add computed-style assertions for:

  • user bubble hairline: inset box-shadow using --border-weak in dark theme and --border-weaker in light theme
  • trow-result-body inner descendants: mono-small font + fg-weak color, with no sans/base/large leakage
  • collapsible chevron icon size: 12px (DESIGN.md L412)
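One low-cost way to express the "no leakage" check is to collect computed styles for every descendant and compare them against a single expected record. The sketch below assumes the samples were already gathered (for example via `getComputedStyle` inside the browser context); the `ComputedStyleSample` type, the `findStyleLeaks` helper, and all concrete values are hypothetical placeholders, not the project's real tokens.

```typescript
// Hypothetical record of the computed values read from one descendant node.
interface ComputedStyleSample {
  selector: string;
  fontFamily: string;
  fontSize: string;
  color: string;
}

type ExpectedStyle = Omit<ComputedStyleSample, "selector">;

// Returns one message per property that deviates from the expected style.
function findStyleLeaks(samples: ComputedStyleSample[], expected: ExpectedStyle): string[] {
  const leaks: string[] = [];
  for (const sample of samples) {
    for (const key of ["fontFamily", "fontSize", "color"] as const) {
      if (sample[key] !== expected[key]) {
        leaks.push(`${sample.selector}: ${key} is "${sample[key]}", expected "${expected[key]}"`);
      }
    }
  }
  return leaks;
}

// Parses a computed pixel length such as "12px" and compares it to a target size.
function isPixelSize(value: string, px: number): boolean {
  return value.endsWith("px") && parseFloat(value) === px;
}

// Placeholder values for illustration only.
const expectedBody: ExpectedStyle = {
  fontFamily: "monospace",
  fontSize: "12px",
  color: "rgb(120, 120, 120)",
};

const samples: ComputedStyleSample[] = [
  { selector: "pre", ...expectedBody },
  { selector: "pre > span", ...expectedBody, fontFamily: "sans-serif" },
];

console.log(findStyleLeaks(samples, expectedBody));
console.log(isPixelSize("12px", 12));
```

Reporting every deviation, rather than failing on the first, keeps the failure message useful when several descendants leak at once.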

C. Handtest checklist segmentation

Update docs/design/session-view-w1-manual-checklist.md to split into two sections:

  • CI must pass first — items already covered by smoke gate
  • 5-minute visual pass — items that still require human design judgment

D. Tool failure test command guidance

Document that the false command is not a valid stderr-display test command: it exits non-zero but writes nothing to stdout or stderr by design. Recommend instead:

  • ls /definitely-not-exist
  • node -e 'console.error("boom"); process.exit(1)'
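The difference between these commands can be checked directly by capturing stderr. This is a minimal sketch using Node's `child_process.spawnSync`; the `runAndCapture` helper is a hypothetical name, and the snippet assumes a POSIX-like environment where `false` and `ls` are on the PATH.

```typescript
import { spawnSync } from "node:child_process";

interface CaptureResult {
  status: number | null;
  stderr: string;
}

// Runs a command without a shell and returns its exit status and captured stderr.
function runAndCapture(command: string, args: string[] = []): CaptureResult {
  const result = spawnSync(command, args, { encoding: "utf8" });
  // stderr can be missing if the command could not be spawned at all.
  return { status: result.status, stderr: result.stderr ?? "" };
}

const candidates: Array<[string, string[]]> = [
  // Exits non-zero but prints nothing, so there is no stderr to display.
  ["false", []],
  // Fails with an error message on stderr.
  ["ls", ["/definitely-not-exist"]],
  // Writes a known string to stderr and exits with status 1.
  [process.execPath, ["-e", 'console.error("boom"); process.exit(1)']],
];

for (const [command, args] of candidates) {
  const { status, stderr } = runAndCapture(command, args);
  console.log(command, "status:", status, "stderr bytes:", stderr.length);
}
```

The `node -e` variant is the most predictable of the three, since both the stderr text and the exit status are fixed by the command itself rather than by the platform's `ls` implementation.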

Out of scope:

Relevant files or context

Background:

Likely files:

  • packages/app/e2e/session/session-w1-*.spec.ts
  • packages/app/e2e/session/session-renderer-diagnostics.spec.ts
  • local checklist: docs/design/session-view-w1-manual-checklist.md

Verification

  • New behavior and computed-style specs pass within the current smoke runtime budget.
  • Checklist is split into CI-covered vs human-visual sections.
  • Failure-command guidance is updated in the checklist.
  • New specs pass on a clean main without regressing existing smoke coverage.

Execution mode

Agent should investigate and propose a plan first

Metadata

Assignees

No one assigned

    Labels

      • P2: Medium priority
      • app: Application behavior and product flows
      • task: Narrow execution, audit, spike, migration, tracking, or upstream follow-up work
      • tech-debt: Supplemental cleanup, maintainability, architecture, test, or quality debt context
      • ui: Design system and user interface
