
refactor(core): dedupe QWEN_CODE_API_TIMEOUT_MS env override logic #3

Closed
B-A-M-N wants to merge 1 commit into main from refactor/dedupe-timeout-env-override


Conversation

@B-A-M-N B-A-M-N commented Apr 27, 2026

Owner

Summary

What changed

  • Refactored: packages/core/src/models/modelConfigResolver.ts — extracted applyTimeoutEnvOverride() helper, replaced inline duplicates in both resolver paths

Why

PR QwenLM#3629 ("feat: make API timeout configurable via QWEN_CODE_API_TIMEOUT_MS env var") merged the same timeout override logic into both resolveModelConfig() and resolveQwenOAuthConfig(). This refactor deduplicates that code into a single shared helper.

Test plan

  • npx vitest run packages/core/src/models/modelConfigResolver.test.ts — all 37 tests pass (including existing OAuth and non-OAuth timeout regression tests)

Related

Follow-up to QwenLM#3629 (merged)
Related: QwenLM#3247

🤖 Generated with Claude Code


Summary by cubic

Refactors the QWEN_CODE_API_TIMEOUT_MS env override by extracting a shared applyTimeoutEnvOverride() helper in packages/core/src/models/modelConfigResolver.ts, used by both resolveModelConfig() and resolveQwenOAuthConfig(). Behavior is unchanged; precedence stays modelProvider > env > settings > default, reducing duplication and keeping OAuth and non-OAuth paths consistent.

Written for commit 15195d0. Summary will update on new commits.
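
The precedence described above (modelProvider > env > settings > default) can be illustrated with a minimal sketch. This is not the code from modelConfigResolver.ts: the signature of applyTimeoutEnvOverride(), the TimeoutSources shape, the parseTimeoutMs() helper, and the DEFAULT_TIMEOUT_MS value are all assumptions made for illustration. Only the helper name and the QWEN_CODE_API_TIMEOUT_MS variable come from this PR.

```typescript
// Hypothetical sketch of a shared timeout resolver. Everything except the
// names applyTimeoutEnvOverride and QWEN_CODE_API_TIMEOUT_MS is assumed.

interface TimeoutSources {
  modelProviderTimeout?: number; // assumed: timeout set on the model provider entry
  settingsTimeout?: number; // assumed: timeout read from user settings
}

// Assumed default; the real default is not stated in this PR.
const DEFAULT_TIMEOUT_MS = 120_000;

// Parse the env value, ignoring missing, empty, non-numeric, or non-positive input.
function parseTimeoutMs(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

// Resolve the timeout with precedence: modelProvider > env > settings > default.
function applyTimeoutEnvOverride(
  sources: TimeoutSources,
  env: Record<string, string | undefined>,
): number {
  return (
    sources.modelProviderTimeout ??
    parseTimeoutMs(env['QWEN_CODE_API_TIMEOUT_MS']) ??
    sources.settingsTimeout ??
    DEFAULT_TIMEOUT_MS
  );
}
```

Under these assumptions, both resolver paths would call the same function, so an OAuth and a non-OAuth configuration given the same inputs resolve to the same timeout. Passing the environment as a parameter rather than reading it inside the helper is a choice made here to keep the sketch easy to test.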

…Override helper

Deduplicates the timeout env override logic that was duplicated in
resolveModelConfig() and resolveQwenOAuthConfig() after PR QwenLM#3629 merged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file

@github-actions


Code Coverage Summary

Package | Lines  | Statements | Functions | Branches
--------|--------|------------|-----------|---------
CLI     | 53.76% | 53.76%     | 70.2%     | 79.57%
Core    | 75.13% | 75.13%     | 77.66%    | 81.47%

CLI Package - Full Text Report
-------------------|---------|----------|---------|---------|-------------------
File               | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
-------------------|---------|----------|---------|---------|-------------------
All files          |   53.76 |    79.57 |    70.2 |   53.76 |                   
 src               |   61.85 |    62.43 |    62.5 |   61.85 |                   
  gemini.tsx       |   58.41 |    56.79 |      60 |   58.41 | ...22,730-733,741 
  ...ractiveCli.ts |   54.83 |    57.35 |   44.44 |   54.83 | ...07-715,719-720 
  ...liCommands.ts |   73.92 |     72.5 |     100 |   73.92 | ...40-264,289,389 
  ...ActiveAuth.ts |     100 |     87.5 |     100 |     100 | 66-80             
 ...cp-integration |    46.3 |    63.01 |   55.88 |    46.3 |                   
  acpAgent.ts      |   48.12 |    63.38 |   62.06 |   48.12 | ...91-793,807-815 
  authMethods.ts   |   12.19 |      100 |       0 |   12.19 | 11-31,34-38,41-50 
  errorCodes.ts    |       0 |        0 |       0 |       0 | 1-22              
  ...DirContext.ts |     100 |      100 |     100 |     100 |                   
 ...ration/service |   68.65 |    83.33 |   66.66 |   68.65 |                   
  filesystem.ts    |   68.65 |    83.33 |   66.66 |   68.65 | ...32,77-94,97-98 
 ...ration/session |   64.39 |    66.96 |   73.21 |   64.39 |                   
  ...ryReplayer.ts |   64.83 |    72.97 |   81.81 |   64.83 | ...68-269,277-278 
  Session.ts       |      59 |    62.91 |   64.28 |      59 | ...2049,2055-2058 
  ...entTracker.ts |   90.85 |    84.84 |      90 |   90.85 | ...35,199,251-260 
  index.ts         |       0 |        0 |       0 |       0 | 1-40              
  ...ssionUtils.ts |   84.21 |    77.77 |     100 |   84.21 | ...37-153,209-211 
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...ssion/emitters |   91.53 |    89.47 |   88.46 |   91.53 |                   
  BaseEmitter.ts   |   76.92 |    66.66 |      80 |   76.92 | 23-24,39-40,55-56 
  ...ageEmitter.ts |   82.22 |    83.33 |   83.33 |   82.22 | 29-44             
  PlanEmitter.ts   |     100 |      100 |     100 |     100 |                   
  ...allEmitter.ts |   97.96 |     91.8 |     100 |   97.96 | 226-227,316,324   
  index.ts         |       0 |        0 |       0 |       0 | 1-10              
 ...ession/rewrite |   89.69 |    85.89 |   94.11 |   89.69 |                   
  LlmRewriter.ts   |   80.53 |    79.31 |     100 |   80.53 | ...17-119,170-174 
  ...Middleware.ts |   95.83 |    85.71 |     100 |   95.83 | 119,127-129       
  TurnBuffer.ts    |     100 |      100 |     100 |     100 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 src/commands      |   66.22 |      100 |    12.5 |   66.22 |                   
  auth.ts          |   46.55 |      100 |       0 |   46.55 | ...58,67-72,75-76 
  channel.ts       |   56.66 |      100 |       0 |   56.66 | 15-19,27-34       
  extensions.tsx   |   96.55 |      100 |      50 |   96.55 | 37                
  hooks.tsx        |   66.66 |      100 |       0 |   66.66 | 20-24             
  mcp.ts           |   94.73 |      100 |      50 |   94.73 | 28                
 src/commands/auth |   42.33 |    95.83 |      60 |   42.33 |                   
  handler.ts       |   27.44 |    94.44 |   14.28 |   27.44 | 55-404            
  ...veSelector.ts |     100 |    96.66 |     100 |     100 | 58                
 ...mmands/channel |    26.5 |    93.75 |   26.47 |    26.5 |                   
  ...l-registry.ts |    8.57 |      100 |       0 |    8.57 | 6-21,24-42        
  config-utils.ts  |   91.89 |      100 |   66.66 |   91.89 | 20-25             
  configure.ts     |    14.7 |      100 |       0 |    14.7 | 18-21,23-84       
  pairing.ts       |   26.31 |      100 |       0 |   26.31 | ...30,40-50,52-65 
  pidfile.ts       |   96.34 |    86.95 |     100 |   96.34 | 49,59,91          
  start.ts         |     5.1 |      100 |       0 |     5.1 | ...69-472,474-482 
  status.ts        |   17.54 |      100 |       0 |   17.54 | 15-26,32-77       
  stop.ts          |      20 |      100 |       0 |      20 | 14-48             
 ...nds/extensions |   84.53 |    88.95 |   81.81 |   84.53 |                   
  consent.ts       |   71.65 |    89.28 |   42.85 |   71.65 | ...85-141,156-162 
  disable.ts       |     100 |      100 |     100 |     100 |                   
  enable.ts        |     100 |      100 |     100 |     100 |                   
  install.ts       |    75.6 |    66.66 |   66.66 |    75.6 | ...39-142,145-153 
  link.ts          |     100 |      100 |     100 |     100 |                   
  list.ts          |     100 |      100 |     100 |     100 |                   
  new.ts           |     100 |      100 |     100 |     100 |                   
  settings.ts      |   99.15 |      100 |   83.33 |   99.15 | 151               
  uninstall.ts     |    37.5 |      100 |   33.33 |    37.5 | 23-45,57-64,67-70 
  update.ts        |   96.32 |      100 |     100 |   96.32 | 101-105           
  utils.ts         |   60.24 |    28.57 |     100 |   60.24 | ...81,83-87,89-93 
 ...les/mcp-server |       0 |        0 |       0 |       0 |                   
  example.ts       |       0 |        0 |       0 |       0 | 1-60              
 src/commands/mcp  |   92.29 |    86.08 |   88.88 |   92.29 |                   
  add.ts           |     100 |    98.03 |     100 |     100 | 293               
  list.ts          |   91.22 |    80.76 |      80 |   91.22 | ...19-121,146-147 
  reconnect.ts     |   76.72 |    71.42 |   85.71 |   76.72 | 35-48,153-175     
  remove.ts        |     100 |       80 |     100 |     100 | 21-25             
 src/config        |   92.21 |    82.59 |   84.28 |   92.21 |                   
  auth.ts          |   87.87 |    81.35 |     100 |   87.87 | ...20-221,237-238 
  config.ts        |      87 |    82.63 |      70 |      87 | ...1244,1266-1267 
  keyBindings.ts   |   95.95 |       50 |     100 |   95.95 | 160-163           
  ...idersScope.ts |      92 |       90 |     100 |      92 | 11-12             
  sandboxConfig.ts |    58.9 |    61.53 |   66.66 |    58.9 | ...54-68,73,77-89 
  settings.ts      |   83.13 |    82.55 |   85.71 |   83.13 | ...35-936,941-944 
  ...ingsSchema.ts |     100 |      100 |     100 |     100 |                   
  ...tedFolders.ts |   96.29 |       94 |     100 |   96.29 | ...88-190,205-206 
 ...nfig/migration |   94.56 |    78.94 |   83.33 |   94.56 |                   
  index.ts         |   93.93 |    88.88 |     100 |   93.93 | 85-86             
  scheduler.ts     |   96.55 |    77.77 |     100 |   96.55 | 19-20             
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...ation/versions |   93.63 |     94.5 |     100 |   93.63 |                   
  ...-v2-shared.ts |     100 |      100 |     100 |     100 |                   
  v1-to-v2.ts      |   81.75 |    90.19 |     100 |   81.75 | ...28-229,231-247 
  v2-to-v3.ts      |     100 |      100 |     100 |     100 |                   
 src/constants     |    3.52 |        0 |       0 |    3.52 |                   
  ...dardApiKey.ts |     100 |      100 |     100 |     100 |                   
  codingPlan.ts    |       0 |        0 |       0 |       0 | 1-347             
 src/core          |     100 |      100 |     100 |     100 |                   
  auth.ts          |     100 |      100 |     100 |     100 |                   
  initializer.ts   |     100 |      100 |     100 |     100 |                   
  theme.ts         |     100 |      100 |     100 |     100 |                   
 src/dualOutput    |   63.09 |    64.51 |   55.55 |   63.09 |                   
  ...tputBridge.ts |   62.94 |    65.51 |   56.25 |   62.94 | ...22-323,331-334 
  ...utContext.tsx |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-8               
 src/export        |       0 |        0 |       0 |       0 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-7               
 src/generated     |     100 |      100 |     100 |     100 |                   
  git-commit.ts    |     100 |      100 |     100 |     100 |                   
 src/i18n          |   48.26 |    76.19 |   38.88 |   48.26 |                   
  index.ts         |   26.92 |    76.92 |   26.66 |   26.92 | ...38-239,249-260 
  languages.ts     |    98.7 |       75 |     100 |    98.7 | 110               
 src/i18n/locales  |       0 |        0 |       0 |       0 |                   
  ca.js            |       0 |        0 |       0 |       0 | 1-2143            
  de.js            |       0 |        0 |       0 |       0 | 1-2064            
  en.js            |       0 |        0 |       0 |       0 | 1-2114            
  fr.js            |       0 |        0 |       0 |       0 | 1-2097            
  ja.js            |       0 |        0 |       0 |       0 | 1-1554            
  pt.js            |       0 |        0 |       0 |       0 | 1-2055            
  ru.js            |       0 |        0 |       0 |       0 | 1-2060            
  zh-TW.js         |       0 |        0 |       0 |       0 | 1-1676            
  zh.js            |       0 |        0 |       0 |       0 | 1-1915            
 ...nonInteractive |   68.34 |    71.68 |   68.88 |   68.34 |                   
  session.ts       |    73.1 |    69.52 |   81.81 |    73.1 | ...03-604,612-622 
  types.ts         |    42.5 |      100 |   33.33 |    42.5 | ...80-581,584-585 
 ...active/control |   77.55 |    88.23 |      80 |   77.55 |                   
  ...rolContext.ts |    7.69 |        0 |       0 |    7.69 | 47-79             
  ...Dispatcher.ts |   91.66 |    91.83 |   88.88 |   91.66 | ...54-372,388,391 
  ...rolService.ts |       8 |        0 |       0 |       8 | 46-179            
 ...ol/controllers |    7.04 |       80 |   13.33 |    7.04 |                   
  ...Controller.ts |   19.32 |      100 |      60 |   19.32 | 81-118,127-210    
  ...Controller.ts |       0 |        0 |       0 |       0 | 1-56              
  ...Controller.ts |    3.96 |      100 |   11.11 |    3.96 | ...61-379,389-494 
  ...Controller.ts |   14.06 |      100 |       0 |   14.06 | ...82-117,130-133 
  ...Controller.ts |    5.21 |      100 |       0 |    5.21 | ...21-433,442-471 
 .../control/types |       0 |        0 |       0 |       0 |                   
  serviceAPIs.ts   |       0 |        0 |       0 |       0 | 1                 
 ...Interactive/io |   97.59 |    93.06 |   95.18 |   97.59 |                   
  ...putAdapter.ts |   97.33 |    91.89 |   98.07 |   97.33 | ...1343,1368-1369 
  ...putAdapter.ts |      96 |    91.66 |   85.71 |      96 | 51-52             
  ...nputReader.ts |     100 |    94.73 |     100 |     100 | 67                
  ...putAdapter.ts |   98.28 |      100 |      90 |   98.28 | 81-82,122-123     
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/patches       |       0 |        0 |       0 |       0 |                   
  is-in-ci.ts      |       0 |        0 |       0 |       0 | 1-17              
 src/remoteInput   |   86.98 |       75 |   85.71 |   86.98 |                   
  ...utContext.tsx |     100 |      100 |     100 |     100 |                   
  ...putWatcher.ts |   88.12 |    76.08 |   91.66 |   88.12 | ...21-222,233-236 
  index.ts         |       0 |        0 |       0 |       0 | 1-8               
 src/services      |   89.94 |    89.25 |   94.11 |   89.94 |                   
  ...mandLoader.ts |     100 |     92.3 |     100 |     100 | 87                
  ...killLoader.ts |     100 |    96.29 |     100 |     100 | 44                
  ...andService.ts |    93.5 |      100 |      80 |    93.5 | 107,150-153       
  ...mandLoader.ts |   86.77 |     83.6 |     100 |   86.77 | ...30-335,340-345 
  ...omptLoader.ts |   75.32 |    80.64 |   83.33 |   75.32 | ...05-206,272-273 
  ...mandLoader.ts |     100 |      100 |     100 |     100 |                   
  ...nd-factory.ts |    90.9 |     90.9 |     100 |    90.9 | 122,130-137       
  ...ation-tool.ts |     100 |    95.45 |     100 |     100 | 125               
  commandUtils.ts  |      96 |       90 |     100 |      96 | 48                
  ...and-parser.ts |   90.47 |    85.71 |     100 |   90.47 | 62-65             
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...ght/generators |   85.95 |    86.42 |   90.47 |   85.95 |                   
  DataProcessor.ts |   85.68 |    86.46 |   92.85 |   85.68 | ...1110,1114-1121 
  ...tGenerator.ts |   98.21 |    85.71 |     100 |   98.21 | 46                
  ...teRenderer.ts |   45.45 |      100 |       0 |   45.45 | 13-51             
 .../insight/types |       0 |       50 |      50 |       0 |                   
  ...sightTypes.ts |       0 |        0 |       0 |       0 |                   
  ...sightTypes.ts |       0 |        0 |       0 |       0 | 1                 
 ...mpt-processors |   97.27 |    94.04 |     100 |   97.27 |                   
  ...tProcessor.ts |     100 |      100 |     100 |     100 |                   
  ...eProcessor.ts |   94.52 |    84.21 |     100 |   94.52 | 46-47,93-94       
  ...tionParser.ts |     100 |      100 |     100 |     100 |                   
  ...lProcessor.ts |   97.41 |    95.65 |     100 |   97.41 | 95-98             
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/services/tips |   92.38 |    84.12 |     100 |   92.38 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  tipHistory.ts    |    78.3 |    71.42 |     100 |    78.3 | ...33-148,151,160 
  tipRegistry.ts   |     100 |    95.23 |     100 |     100 | 33                
  tipScheduler.ts  |     100 |    91.66 |     100 |     100 | 55                
 src/test-utils    |   93.75 |    83.33 |      80 |   93.75 |                   
  ...omMatchers.ts |   69.69 |       50 |      50 |   69.69 | 32-35,37-39,45-47 
  ...andContext.ts |     100 |      100 |     100 |     100 |                   
  render.tsx       |     100 |      100 |     100 |     100 |                   
 src/ui            |   62.35 |     65.6 |   51.28 |   62.35 |                   
  App.tsx          |     100 |      100 |     100 |     100 |                   
  AppContainer.tsx |   64.92 |    59.51 |   66.66 |   64.92 | ...2207,2211-2215 
  ...tionNudge.tsx |    9.58 |      100 |       0 |    9.58 | 24-94             
  ...ackDialog.tsx |   29.23 |      100 |       0 |   29.23 | 25-75             
  ...tionNudge.tsx |    7.69 |      100 |       0 |    7.69 | 25-103            
  colors.ts        |   52.72 |      100 |   23.52 |   52.72 | ...52,54-55,60-61 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  keyMatchers.ts   |   91.83 |    88.46 |     100 |   91.83 | 25-26,54-55       
  ...tic-colors.ts |     100 |      100 |     100 |     100 |                   
  textConstants.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/ui/auth       |   29.48 |       50 |   26.08 |   29.48 |                   
  AuthDialog.tsx   |   51.74 |    51.16 |   28.57 |   51.74 | ...73,692,694,696 
  ...nProgress.tsx |       0 |        0 |       0 |       0 | 1-64              
  useAuth.ts       |    2.25 |      100 |       0 |    2.25 | 46-610            
 src/ui/commands   |   58.58 |    78.16 |   59.06 |   58.58 |                   
  aboutCommand.ts  |     100 |    85.71 |     100 |     100 | 36                
  agentsCommand.ts |   72.97 |      100 |      20 |   72.97 | ...32,37-38,42-44 
  ...odeCommand.ts |     100 |      100 |     100 |     100 |                   
  arenaCommand.ts  |   33.13 |    67.64 |    37.5 |   33.13 | ...60-565,644-649 
  authCommand.ts   |     100 |      100 |     100 |     100 |                   
  btwCommand.ts    |   95.59 |    71.42 |     100 |   95.59 | 72,154-159        
  bugCommand.ts    |   76.92 |    66.66 |      50 |   76.92 | 21-22,59-68       
  clearCommand.ts  |   91.04 |    66.66 |      50 |   91.04 | 22-23,49-50,68-69 
  ...essCommand.ts |   63.39 |       48 |      50 |   63.39 | ...48-149,163-166 
  ...extCommand.ts |    6.17 |      100 |      10 |    6.17 | ...21-522,527-528 
  copyCommand.ts   |     100 |      100 |     100 |     100 |                   
  deleteCommand.ts |     100 |      100 |     100 |     100 |                   
  ...ryCommand.tsx |   59.56 |    74.07 |      50 |   59.56 | ...24-225,234-242 
  docsCommand.ts   |   96.07 |     87.5 |      50 |   96.07 | 20-21             
  doctorCommand.ts |     100 |    93.33 |     100 |     100 | 21                
  dreamCommand.ts  |   32.43 |      100 |      50 |   32.43 | 23-50             
  editorCommand.ts |     100 |      100 |     100 |     100 |                   
  exportCommand.ts |   56.93 |    91.66 |   33.33 |   56.93 | ...52-353,361-362 
  ...onsCommand.ts |   45.08 |    85.71 |   27.27 |   45.08 | ...37-238,247-248 
  forgetCommand.ts |   26.82 |      100 |      50 |   26.82 | 18-51             
  helpCommand.ts   |     100 |      100 |     100 |     100 |                   
  hooksCommand.ts  |   19.04 |       25 |      20 |   19.04 | ...86-187,204-205 
  ideCommand.ts    |   57.33 |    57.69 |   35.29 |   57.33 | ...05-306,310-324 
  initCommand.ts   |   84.33 |    72.72 |     100 |   84.33 | 68,82-87,89-94    
  ...ghtCommand.ts |    72.8 |    66.66 |   83.33 |    72.8 | ...31-245,250-273 
  ...ageCommand.ts |   89.39 |    82.35 |   76.92 |   89.39 | ...22-325,348-349 
  mcpCommand.ts    |   86.66 |      100 |      50 |   86.66 | 14-15             
  memoryCommand.ts |   86.66 |      100 |      50 |   86.66 | 14-15             
  modelCommand.ts  |      56 |    70.58 |   66.66 |      56 | ...,67-93,118-136 
  ...onsCommand.ts |     100 |      100 |     100 |     100 |                   
  planCommand.ts   |   78.82 |    76.92 |     100 |   78.82 | 30-35,51-56,68-73 
  quitCommand.ts   |   93.93 |      100 |      50 |   93.93 | 15-16             
  recapCommand.ts  |   21.81 |      100 |      50 |   21.81 | 24-73             
  ...berCommand.ts |   32.43 |      100 |      50 |   32.43 | 23-57             
  renameCommand.ts |   85.61 |    78.18 |     100 |   85.61 | ...15-322,329-334 
  ...oreCommand.ts |    92.3 |     87.5 |     100 |    92.3 | ...,83-88,129-130 
  resumeCommand.ts |     100 |      100 |     100 |     100 |                   
  rewindCommand.ts |      80 |      100 |      50 |      80 | 19-21             
  ...ngsCommand.ts |     100 |      100 |     100 |     100 |                   
  ...hubCommand.ts |   81.43 |    65.21 |      80 |   81.43 | ...70-173,176-179 
  skillsCommand.ts |   15.04 |      100 |      25 |   15.04 | ...90-106,109-136 
  statsCommand.ts  |   79.54 |    66.66 |      50 |   79.54 | ...20-121,131-134 
  ...ineCommand.ts |     100 |      100 |     100 |     100 |                   
  ...aryCommand.ts |    6.51 |      100 |      50 |    6.51 | 28-323            
  ...tupCommand.ts |     100 |      100 |     100 |     100 |                   
  themeCommand.ts  |     100 |      100 |     100 |     100 |                   
  toolsCommand.ts  |   95.23 |      100 |      50 |   95.23 | 18-19             
  trustCommand.ts  |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
  vimCommand.ts    |   54.54 |      100 |      50 |   54.54 | 19-29             
 src/ui/components |   61.24 |    72.97 |   60.41 |   61.24 |                   
  AboutBox.tsx     |     100 |      100 |     100 |     100 |                   
  AnsiOutput.tsx   |   65.57 |      100 |      50 |   65.57 | 69-90             
  ApiKeyInput.tsx  |   18.91 |      100 |       0 |   18.91 | 30-95             
  AppHeader.tsx    |   86.79 |    42.85 |     100 |   86.79 | 32-38,40          
  ...odeDialog.tsx |     9.7 |      100 |       0 |     9.7 | 35-47,50-182      
  AsciiArt.ts      |     100 |      100 |     100 |     100 |                   
  ...Indicator.tsx |   14.63 |      100 |       0 |   14.63 | 18-56             
  ...TextInput.tsx |   66.08 |    69.76 |      50 |   66.08 | ...30-232,250,259 
  Composer.tsx     |   79.31 |    57.14 |     100 |   79.31 | ...-77,95,133,146 
  ...entPrompt.tsx |     100 |      100 |     100 |     100 |                   
  ...ryDisplay.tsx |   75.89 |    62.06 |     100 |   75.89 | ...,88,93-108,113 
  ...geDisplay.tsx |   68.42 |    57.14 |     100 |   68.42 | 16-17,31-32,42-50 
  ...ification.tsx |   28.57 |      100 |       0 |   28.57 | 16-36             
  ...gProfiler.tsx |       0 |        0 |       0 |       0 | 1-36              
  ...ogManager.tsx |   12.46 |      100 |       0 |   12.46 | 57-413            
  ...ngsDialog.tsx |    8.44 |      100 |       0 |    8.44 | 37-195            
  ExitWarning.tsx  |     100 |      100 |     100 |     100 |                   
  ...ustDialog.tsx |     100 |      100 |     100 |     100 |                   
  Footer.tsx       |   78.87 |       60 |     100 |   78.87 | ...30-134,136-140 
  ...ngSpinner.tsx |   54.28 |       50 |      50 |   54.28 | 31-48,61          
  Header.tsx       |   98.14 |    85.71 |     100 |   98.14 | 97,99             
  Help.tsx         |   98.74 |    68.75 |     100 |   98.74 | 74,129            
  ...emDisplay.tsx |   61.13 |    35.55 |     100 |   61.13 | ...75-284,287,290 
  ...ngeDialog.tsx |     100 |      100 |     100 |     100 |                   
  InputPrompt.tsx  |   82.59 |    76.66 |      80 |   82.59 | ...1214,1279,1329 
  ...Shortcuts.tsx |   20.87 |      100 |       0 |   20.87 | ...6,49-51,67-125 
  ...Indicator.tsx |     100 |    91.42 |     100 |     100 | 65,74             
  ...firmation.tsx |   91.42 |      100 |      50 |   91.42 | 26-31             
  MainContent.tsx  |   15.23 |      100 |       0 |   15.23 | 28-133            
  MemoryDialog.tsx |   53.35 |    51.21 |   57.14 |   53.35 | ...55,367,380-382 
  ...geDisplay.tsx |       0 |        0 |       0 |       0 | 1-41              
  ModelDialog.tsx  |   76.59 |    54.54 |     100 |   76.59 | ...60-476,533-537 
  ...tsDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...fications.tsx |   18.18 |      100 |       0 |   18.18 | 15-58             
  ...onsDialog.tsx |    2.13 |      100 |       0 |    2.13 | 62-133,148-1004   
  ...ryDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...icePrompt.tsx |   88.14 |    83.87 |     100 |   88.14 | ...01-105,133-138 
  PrepareLabel.tsx |   91.66 |    76.19 |     100 |   91.66 | 73-75,77-79,110   
  ...geDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...ngDisplay.tsx |   21.42 |      100 |       0 |   21.42 | 13-39             
  ...hProgress.tsx |   85.25 |    88.46 |     100 |   85.25 | 121-147           
  ...dSelector.tsx |    4.45 |      100 |       0 |    4.45 | 28-92,100-328     
  ...ionPicker.tsx |   94.76 |    87.17 |     100 |   94.76 | 99,132,253-261    
  ...onPreview.tsx |   93.38 |    78.26 |     100 |   93.38 | ...,66-67,126-128 
  ...ryDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...putPrompt.tsx |   72.56 |       80 |      40 |   72.56 | ...06-109,114-117 
  ...ngsDialog.tsx |   66.88 |    73.52 |     100 |   66.88 | ...11-819,825-826 
  ...ionDialog.tsx |    87.8 |      100 |   33.33 |    87.8 | 36-39,44-51       
  ...putPrompt.tsx |    15.9 |      100 |       0 |    15.9 | 20-63             
  ...Indicator.tsx |   57.14 |      100 |       0 |   57.14 | 12-15             
  ...MoreLines.tsx |      28 |      100 |       0 |      28 | 18-40             
  ...ionPicker.tsx |   17.59 |      100 |       0 |   17.59 | 55-172            
  StatsDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...yTodoList.tsx |   96.82 |    77.77 |     100 |   96.82 | 39-40             
  ...nsDisplay.tsx |   84.09 |    57.14 |     100 |   84.09 | ...16-118,125-127 
  ThemeDialog.tsx  |   89.95 |    46.15 |      75 |   89.95 | ...71-173,243-245 
  Tips.tsx         |   21.87 |      100 |       0 |   21.87 | 22-40,43-53       
  TodoDisplay.tsx  |     100 |      100 |     100 |     100 |                   
  ...tsDisplay.tsx |     100 |     87.5 |     100 |     100 | 31-32             
  TrustDialog.tsx  |     100 |    81.81 |     100 |     100 | 71-86             
  ...ification.tsx |   36.36 |      100 |       0 |   36.36 | 15-22             
  ...ackDialog.tsx |    7.84 |      100 |       0 |    7.84 | 24-134            
 ...nts/agent-view |   25.56 |       90 |    12.5 |   25.56 |                   
  ...tChatView.tsx |    8.28 |      100 |       0 |    8.28 | 53-275            
  ...tComposer.tsx |    9.95 |      100 |       0 |    9.95 | 57-308            
  AgentFooter.tsx  |   17.07 |      100 |       0 |   17.07 | 28-66             
  AgentHeader.tsx  |   15.38 |      100 |       0 |   15.38 | 27-64             
  AgentTabBar.tsx  |    8.25 |      100 |       0 |    8.25 | 35-55,60-167      
  ...oryAdapter.ts |     100 |    91.83 |     100 |     100 | 103,109-110,138   
  index.ts         |       0 |        0 |       0 |       0 | 1-12              
 ...mponents/arena |   45.72 |    70.53 |   60.86 |   45.72 |                   
  ArenaCards.tsx   |   73.06 |    71.79 |   85.71 |   73.06 | ...83-185,321-326 
  ...ectDialog.tsx |   83.48 |    69.86 |   88.88 |   83.48 | ...88-392,409-410 
  ...artDialog.tsx |   10.15 |      100 |       0 |   10.15 | 27-161            
  ...tusDialog.tsx |    5.63 |      100 |       0 |    5.63 | 33-75,80-288      
  ...topDialog.tsx |    6.17 |      100 |       0 |    6.17 | 33-213            
 ...nts/extensions |   45.28 |    33.33 |      60 |   45.28 |                   
  ...gerDialog.tsx |   44.31 |    34.14 |      75 |   44.31 | ...71-480,483-488 
  index.ts         |       0 |        0 |       0 |       0 | 1-9               
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...tensions/steps |   54.77 |    94.23 |   66.66 |   54.77 |                   
  ...ctionStep.tsx |   95.12 |    92.85 |   85.71 |   95.12 | 84-86,89          
  ...etailStep.tsx |    6.18 |      100 |       0 |    6.18 | 17-128            
  ...nListStep.tsx |   88.35 |    94.73 |      80 |   88.35 | 51-52,58-71,105   
  ...electStep.tsx |   13.46 |      100 |       0 |   13.46 | 20-70             
  ...nfirmStep.tsx |   19.56 |      100 |       0 |   19.56 | 23-65             
  index.ts         |     100 |      100 |     100 |     100 |                   
 ...mponents/hooks |   72.24 |    70.52 |      80 |   72.24 |                   
  ...etailStep.tsx |   96.52 |       75 |     100 |   96.52 | 33,37,50,59       
  ...etailStep.tsx |   93.27 |    73.68 |     100 |   93.27 | 41-42,99-104,110  
  ...abledStep.tsx |     100 |      100 |     100 |     100 |                   
  ...sListStep.tsx |     100 |      100 |     100 |     100 |                   
  ...entDialog.tsx |   36.09 |    47.05 |      50 |   36.09 | ...49,453-466,470 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-13              
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...components/mcp |   18.82 |    84.37 |   77.77 |   18.82 |                   
  ...entDialog.tsx |    3.64 |      100 |       0 |    3.64 | 41-717            
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-30              
  types.ts         |     100 |      100 |     100 |     100 |                   
  utils.ts         |   96.42 |    87.09 |     100 |   96.42 | 21,96-97          
 ...ents/mcp/steps |    6.65 |      100 |       0 |    6.65 |                   
  ...icateStep.tsx |     5.1 |      100 |       0 |     5.1 | 34-95,98-334      
  ...electStep.tsx |   10.95 |      100 |       0 |   10.95 | 16-88             
  ...etailStep.tsx |    5.26 |      100 |       0 |    5.26 | 31-247            
  ...rListStep.tsx |    5.88 |      100 |       0 |    5.88 | 20-176            
  ...etailStep.tsx |   10.41 |      100 |       0 |   10.41 | ...1,67-79,82-139 
  ToolListStep.tsx |    7.14 |      100 |       0 |    7.14 | 16-146            
 ...nents/messages |   78.25 |    76.19 |   68.33 |   78.25 |                   
  ...ionDialog.tsx |   77.35 |    74.54 |    62.5 |   77.35 | ...90,508,526-528 
  BtwMessage.tsx   |     100 |      100 |     100 |     100 |                   
  ...upDisplay.tsx |   82.17 |     61.9 |     100 |   82.17 | ...95,113-116,123 
  ...onMessage.tsx |   91.93 |    82.35 |     100 |   91.93 | 57-59,61,63       
  ...nMessages.tsx |   77.35 |      100 |      70 |   77.35 | ...31-244,248-260 
  DiffRenderer.tsx |   93.19 |    86.17 |     100 |   93.19 | ...09,237-238,304 
  ...ssMessage.tsx |    12.5 |      100 |       0 |    12.5 | 18-59             
  ...edMessage.tsx |   16.66 |      100 |       0 |   16.66 | 22-38             
  ...sMessages.tsx |   55.67 |       40 |   28.57 |   55.67 | ...20-125,133-145 
  ...ryMessage.tsx |   12.82 |      100 |       0 |   12.82 | 22-59             
  ...onMessage.tsx |   73.55 |    55.81 |   33.33 |   73.55 | ...41-443,450-452 
  ...upMessage.tsx |   72.52 |    65.45 |     100 |   72.52 | ...32-247,261-262 
  ToolMessage.tsx  |   90.16 |     83.8 |   91.66 |   90.16 | ...59-564,591-593 
 ...ponents/shared |   81.46 |    77.23 |   92.64 |   81.46 |                   
  ...ctionList.tsx |   99.03 |    95.65 |     100 |   99.03 | 85                
  ...tonSelect.tsx |     100 |      100 |     100 |     100 |                   
  EnumSelector.tsx |     100 |    96.42 |     100 |     100 | 58                
  MaxSizedBox.tsx  |   82.54 |    85.15 |   88.88 |   82.54 | ...12-513,618-619 
  MultiSelect.tsx  |    5.59 |      100 |       0 |    5.59 | 34-41,44-193      
  ...tonSelect.tsx |     100 |      100 |     100 |     100 |                   
  ...eSelector.tsx |     100 |       60 |     100 |     100 | 40-45             
  TextInput.tsx    |   70.44 |    53.57 |      75 |   70.44 | ...90-194,206-212 
  ...apsedTime.tsx |     100 |      100 |     100 |     100 |                   
  ...Indicator.tsx |     100 |      100 |     100 |     100 |                   
  text-buffer.ts   |   82.82 |    75.48 |   97.61 |   82.82 | ...2272,2300,2368 
  ...er-actions.ts |   86.71 |    67.79 |     100 |   86.71 | ...07-608,809-811 
 ...ents/subagents |    32.1 |      100 |       0 |    32.1 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  reducers.tsx     |    12.1 |      100 |       0 |    12.1 | 33-190            
  types.ts         |     100 |      100 |     100 |     100 |                   
  utils.ts         |   10.95 |      100 |       0 |   10.95 | ...1,56-57,60-102 
 ...bagents/create |    9.13 |      100 |       0 |    9.13 |                   
  ...ionWizard.tsx |    7.28 |      100 |       0 |    7.28 | 34-299            
  ...rSelector.tsx |   14.75 |      100 |       0 |   14.75 | 26-85             
  ...onSummary.tsx |    4.26 |      100 |       0 |    4.26 | 27-331            
  ...tionInput.tsx |    8.63 |      100 |       0 |    8.63 | 23-177            
  ...dSelector.tsx |   33.33 |      100 |       0 |   33.33 | 20-21,26-27,36-63 
  ...nSelector.tsx |    37.5 |      100 |       0 |    37.5 | 20-21,26-27,36-58 
  ...EntryStep.tsx |   12.76 |      100 |       0 |   12.76 | 34-78             
  ToolSelector.tsx |    4.16 |      100 |       0 |    4.16 | 31-253            
 ...bagents/manage |    8.39 |      100 |       0 |    8.39 |                   
  ...ctionStep.tsx |   10.25 |      100 |       0 |   10.25 | 21-103            
  ...eleteStep.tsx |   20.93 |      100 |       0 |   20.93 | 23-62             
  ...tEditStep.tsx |   25.53 |      100 |       0 |   25.53 | ...2,37-38,51-124 
  ...ctionStep.tsx |    2.29 |      100 |       0 |    2.29 | 28-449            
  ...iewerStep.tsx |   13.72 |      100 |       0 |   13.72 | 18-73             
  ...gerDialog.tsx |    6.74 |      100 |       0 |    6.74 | 35-341            
 ...agents/runtime |    7.51 |      100 |       0 |    7.51 |                   
  ...onDisplay.tsx |    7.51 |      100 |       0 |    7.51 | ...96-526,535-573 
 ...mponents/views |   42.16 |    69.23 |   21.42 |   42.16 |                   
  ContextUsage.tsx |     4.7 |      100 |       0 |     4.7 | ...52-167,170-456 
  DoctorReport.tsx |     9.8 |      100 |       0 |     9.8 | 25-54,57-131      
  ...sionsList.tsx |   87.69 |    73.68 |     100 |   87.69 | 65-72             
  McpStatus.tsx    |   89.53 |    60.52 |     100 |   89.53 | ...72,175-177,262 
  SkillsList.tsx   |   27.27 |      100 |       0 |   27.27 | 18-35             
  ToolsList.tsx    |     100 |      100 |     100 |     100 |                   
 src/ui/contexts   |   74.71 |    79.47 |   86.66 |   74.71 |                   
  ...ewContext.tsx |   65.77 |      100 |      75 |   65.77 | ...22-225,231-241 
  AppContext.tsx   |      40 |      100 |       0 |      40 | 17-22             
  ...deContext.tsx |     100 |      100 |     100 |     100 |                   
  ...igContext.tsx |   81.81 |       50 |     100 |   81.81 | 15-16             
  ...ssContext.tsx |   81.88 |    82.26 |     100 |   81.88 | ...1153,1159-1161 
  ...owContext.tsx |   89.28 |       80 |   66.66 |   89.28 | 34,47-48,60-62    
  ...onContext.tsx |   43.06 |     62.5 |    62.5 |   43.06 | ...57-260,264-267 
  ...gsContext.tsx |   83.33 |       50 |     100 |   83.33 | 17-18             
  ...usContext.tsx |     100 |      100 |     100 |     100 |                   
  ...ngContext.tsx |   71.42 |       50 |     100 |   71.42 | 17-20             
  ...nsContext.tsx |   88.88 |       50 |     100 |   88.88 | 124-125           
  ...teContext.tsx |   85.71 |       50 |     100 |   85.71 | 173-174           
  ...deContext.tsx |   76.08 |    72.72 |     100 |   76.08 | 47-48,52-59,77-78 
 src/ui/editors    |   93.33 |    85.71 |   66.66 |   93.33 |                   
  ...ngsManager.ts |   93.33 |    85.71 |   66.66 |   93.33 | 49,63-64          
 src/ui/hooks      |   80.04 |    80.55 |   84.57 |   80.04 |                   
  ...dProcessor.ts |   83.12 |    82.56 |     100 |   83.12 | ...88-389,408-435 
  keyToAnsi.ts     |    3.92 |      100 |       0 |    3.92 | 19-77             
  ...dProcessor.ts |    94.8 |    70.58 |     100 |    94.8 | ...76-277,282-283 
  ...dProcessor.ts |   72.94 |    57.26 |   61.53 |   72.94 | ...75,799,818-822 
  ...amingState.ts |   12.22 |      100 |       0 |   12.22 | 54-158            
  ...agerDialog.ts |   88.23 |      100 |     100 |   88.23 | 20,24             
  ...ationFrame.ts |      32 |       60 |     100 |      32 | 42-44,51-90       
  ...odeCommand.ts |   58.82 |      100 |     100 |   58.82 | 28,33-48          
  ...enaCommand.ts |      85 |      100 |     100 |      85 | 23-24,29          
  ...aInProcess.ts |   19.81 |    66.66 |      25 |   19.81 | 57-175            
  ...Completion.ts |   92.77 |    89.09 |     100 |   92.77 | ...86-187,220-223 
  ...ifications.ts |   88.05 |    94.73 |     100 |   88.05 | 84-93             
  ...tIndicator.ts |     100 |    93.75 |     100 |     100 | 63                
  ...waySummary.ts |   96.22 |    69.69 |     100 |   96.22 | 125-127,169       
  ...ketedPaste.ts |    23.8 |      100 |       0 |    23.8 | 19-37             
  ...lanUpdates.ts |     100 |       92 |     100 |     100 | 59,158            
  ...ompletion.tsx |   92.62 |    76.92 |     100 |   92.62 | ...14-215,251-256 
  ...dMigration.ts |   90.62 |       75 |     100 |   90.62 | 38-40             
  useCompletion.ts |    92.4 |     87.5 |     100 |    92.4 | 68-69,93-94,98-99 
  ...nitMessage.ts |     100 |      100 |     100 |     100 |                   
  ...extualTips.ts |   76.92 |       50 |     100 |   76.92 | 55,68,71-75,88-96 
  ...eteCommand.ts |   33.33 |       50 |     100 |   33.33 | 30,34,41-90       
  ...ialogClose.ts |      20 |      100 |     100 |      20 | 71-118            
  ...oublePress.ts |   53.12 |       75 |     100 |   53.12 | 33-35,41-54       
  ...orSettings.ts |     100 |      100 |     100 |     100 |                   
  ...ionUpdates.ts |   93.45 |     92.3 |     100 |   93.45 | ...83-287,300-306 
  ...agerDialog.ts |   88.88 |      100 |     100 |   88.88 | 21,25             
  ...backDialog.ts |   50.37 |    77.77 |   33.33 |   50.37 | ...58-174,195-196 
  useFocus.ts      |     100 |      100 |     100 |     100 |                   
  ...olderTrust.ts |     100 |      100 |     100 |     100 |                   
  ...ggestions.tsx |   89.15 |     62.5 |      50 |   89.15 | ...22-124,149-150 
  ...miniStream.ts |   74.76 |    72.01 |   88.88 |   74.76 | ...2086,2099-2107 
  ...BranchName.ts |    90.9 |     92.3 |     100 |    90.9 | 19-20,55-58       
  ...oryManager.ts |   93.15 |    93.75 |     100 |   93.15 | 44,107-110        
  ...ooksDialog.ts |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...stListener.ts |     100 |      100 |     100 |     100 |                   
  ...nAuthError.ts |   76.19 |       50 |     100 |   76.19 | 39-40,43-45       
  ...putHistory.ts |   92.59 |    85.71 |     100 |   92.59 | 63-64,72,94-96    
  ...storyStore.ts |     100 |    94.11 |     100 |     100 | 69                
  useKeypress.ts   |     100 |      100 |     100 |     100 |                   
  ...rdProtocol.ts |   36.36 |      100 |       0 |   36.36 | 24-31             
  ...unchEditor.ts |    9.67 |      100 |       0 |    9.67 | 11-32,39-90       
  ...gIndicator.ts |     100 |      100 |     100 |     100 |                   
  useLogger.ts     |   21.05 |      100 |       0 |   21.05 | 15-37             
  useMcpDialog.ts  |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...moryDialog.ts |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...oryMonitor.ts |     100 |      100 |     100 |     100 |                   
  ...ssageQueue.ts |     100 |      100 |     100 |     100 |                   
  ...delCommand.ts |     100 |       75 |     100 |     100 | 22                
  ...raseCycler.ts |   84.74 |    76.47 |     100 |   84.74 | ...49,52-53,69-71 
  useQwenAuth.ts   |     100 |      100 |     100 |     100 |                   
  ...lScheduler.ts |   84.52 |    93.33 |     100 |   84.52 | ...27-232,328-338 
  ...oryCommand.ts |       0 |        0 |       0 |       0 | 1-7               
  ...umeCommand.ts |   96.25 |    72.72 |     100 |   96.25 | 81-82,108         
  ...ompletion.tsx |   90.59 |    83.33 |     100 |   90.59 | ...01,104,137-140 
  ...ectionList.ts |   96.96 |    95.69 |     100 |   96.96 | ...82-183,237-240 
  ...sionPicker.ts |   91.16 |    71.69 |     100 |   91.16 | ...78-279,283-284 
  ...ngsCommand.ts |   18.75 |      100 |       0 |   18.75 | 10-25             
  ...ellHistory.ts |   91.74 |    79.41 |     100 |   91.74 | ...74,122-123,133 
  ...oryCommand.ts |       0 |        0 |       0 |       0 | 1-73              
  ...Completion.ts |   78.99 |    81.48 |   94.11 |   78.99 | ...77-579,587-624 
  ...tateAndRef.ts |     100 |      100 |     100 |     100 |                   
  useStatusLine.ts |     100 |    98.79 |     100 |     100 | 257               
  ...eateDialog.ts |   88.23 |      100 |     100 |   88.23 | 14,18             
  ...alProgress.ts |   54.23 |    45.45 |      75 |   54.23 | ...65,73-80,91-97 
  ...rminalSize.ts |   76.19 |      100 |      50 |   76.19 | 21-25             
  ...emeCommand.ts |   67.01 |    29.41 |     100 |   67.01 | ...10-111,115-116 
  useTimer.ts      |   88.09 |    85.71 |     100 |   88.09 | 44-45,51-53       
  ...lMigration.ts |       0 |        0 |       0 |       0 |                   
  ...rustModify.ts |     100 |      100 |     100 |     100 |                   
  ...elcomeBack.ts |   87.36 |     90.9 |     100 |   87.36 | ...,94-96,114-115 
  vim.ts           |   83.77 |    80.31 |     100 |   83.77 | ...55,759-767,776 
 src/ui/layouts    |   88.72 |    86.95 |     100 |   88.72 |                   
  ...AppLayout.tsx |   88.88 |    86.66 |     100 |   88.88 | 46-48,87-92       
  ...AppLayout.tsx |   88.46 |     87.5 |     100 |   88.46 | 53-58             
 src/ui/models     |   80.24 |    79.16 |   71.42 |   80.24 |                   
  ...ableModels.ts |   80.24 |    79.16 |   71.42 |   80.24 | ...,61-71,123-125 
 ...noninteractive |     100 |      100 |    7.14 |     100 |                   
  ...eractiveUi.ts |     100 |      100 |    7.14 |     100 |                   
 src/ui/state      |   94.91 |    81.81 |     100 |   94.91 |                   
  extensions.ts    |   94.91 |    81.81 |     100 |   94.91 | 68-69,88          
 src/ui/themes     |   98.53 |    70.58 |     100 |   98.53 |                   
  ansi-light.ts    |     100 |      100 |     100 |     100 |                   
  ansi.ts          |     100 |      100 |     100 |     100 |                   
  atom-one-dark.ts |     100 |      100 |     100 |     100 |                   
  ayu-light.ts     |     100 |      100 |     100 |     100 |                   
  ayu.ts           |     100 |      100 |     100 |     100 |                   
  color-utils.ts   |     100 |      100 |     100 |     100 |                   
  default-light.ts |     100 |      100 |     100 |     100 |                   
  default.ts       |     100 |      100 |     100 |     100 |                   
  ...inal-theme.ts |   88.59 |    85.96 |     100 |   88.59 | ...57-261,266-270 
  dracula.ts       |     100 |      100 |     100 |     100 |                   
  github-dark.ts   |     100 |      100 |     100 |     100 |                   
  github-light.ts  |     100 |      100 |     100 |     100 |                   
  googlecode.ts    |     100 |      100 |     100 |     100 |                   
  no-color.ts      |     100 |      100 |     100 |     100 |                   
  qwen-dark.ts     |     100 |      100 |     100 |     100 |                   
  qwen-light.ts    |     100 |      100 |     100 |     100 |                   
  ...tic-tokens.ts |     100 |      100 |     100 |     100 |                   
  ...-of-purple.ts |     100 |      100 |     100 |     100 |                   
  theme-manager.ts |   87.98 |    82.89 |     100 |   87.98 | ...48-357,362-363 
  theme.ts         |     100 |    38.02 |     100 |     100 | ...34-449,457-461 
  xcode.ts         |     100 |      100 |     100 |     100 |                   
 src/ui/utils      |   75.98 |    85.73 |   84.44 |   75.98 |                   
  ...Colorizer.tsx |   82.78 |    88.23 |     100 |   82.78 | ...10-111,197-223 
  ...nRenderer.tsx |   52.41 |    38.23 |      50 |   52.41 | ...49-151,171-180 
  ...wnDisplay.tsx |   86.79 |    88.88 |     100 |   86.79 | ...06-315,348-373 
  ...eRenderer.tsx |   94.45 |    81.25 |   94.11 |   94.45 | ...65,477,480-483 
  ...boardUtils.ts |   59.61 |    58.82 |     100 |   59.61 | ...,86-88,107-149 
  commandUtils.ts  |   83.95 |    89.09 |    87.5 |   83.95 | ...50-151,247-266 
  computeStats.ts  |     100 |      100 |     100 |     100 |                   
  displayUtils.ts  |   88.37 |    72.22 |     100 |   88.37 | 23,25,29,31,33    
  formatters.ts    |   95.23 |    98.24 |     100 |   95.23 | 117-120           
  gradientUtils.ts |     100 |      100 |     100 |     100 |                   
  highlight.ts     |   98.63 |       95 |     100 |   98.63 | 93                
  ...oryMapping.ts |     100 |    94.28 |     100 |     100 | 33,55             
  isNarrowWidth.ts |     100 |      100 |     100 |     100 |                   
  ...olDetector.ts |    8.23 |      100 |       0 |    8.23 | ...31-132,135-136 
  layoutUtils.ts   |     100 |      100 |     100 |     100 |                   
  ...nUtilities.ts |   69.84 |    85.71 |     100 |   69.84 | 75-91,100-101     
  ...ToolGroups.ts |   98.05 |    95.12 |     100 |   98.05 | 44-45             
  ...lsBySource.ts |     100 |    95.23 |     100 |     100 | 84                
  ...mConstants.ts |     100 |      100 |     100 |     100 |                   
  ...storyUtils.ts |   57.81 |    67.14 |      90 |   57.81 | ...64,412,417-439 
  ...ickerUtils.ts |     100 |      100 |     100 |     100 |                   
  ...izedOutput.ts |   94.94 |      100 |   88.88 |   94.94 | 112-117           
  ...wOptimizer.ts |     100 |    96.77 |     100 |     100 | 69                
  terminalSetup.ts |    4.37 |      100 |       0 |    4.37 | 44-393            
  textUtils.ts     |   96.36 |    93.93 |   88.88 |   96.36 | ...49-150,285-286 
  todoSnapshot.ts  |   81.92 |    88.46 |     100 |   81.92 | 39-51,59-60,106   
  updateCheck.ts   |     100 |    80.95 |     100 |     100 | 30-42             
 ...i/utils/export |    2.36 |        0 |       0 |    2.36 |                   
  collect.ts       |    0.87 |        0 |       0 |    0.87 | 40-394,401-697    
  index.ts         |     100 |      100 |     100 |     100 |                   
  normalize.ts     |     1.2 |      100 |       0 |     1.2 | 17-346            
  types.ts         |       0 |        0 |       0 |       0 | 1                 
  utils.ts         |      40 |      100 |       0 |      40 | 11-13             
 ...ort/formatters |    3.38 |      100 |       0 |    3.38 |                   
  html.ts          |    9.61 |      100 |       0 |    9.61 | ...28,34-76,82-84 
  json.ts          |      50 |      100 |       0 |      50 | 14-15             
  jsonl.ts         |     3.5 |      100 |       0 |     3.5 | 14-76             
  markdown.ts      |    0.94 |      100 |       0 |    0.94 | 13-295            
 src/utils         |   72.88 |    88.69 |   94.73 |   72.88 |                   
  acpModelUtils.ts |     100 |      100 |     100 |     100 |                   
  apiPreconnect.ts |   96.52 |    96.87 |     100 |   96.52 | 166-169           
  ...tification.ts |   92.59 |    71.42 |     100 |   92.59 | 36-37             
  checks.ts        |   33.33 |      100 |       0 |   33.33 | 23-28             
  cleanup.ts       |   84.12 |    93.33 |      80 |   84.12 | 75,106-115        
  commands.ts      |     100 |      100 |     100 |     100 |                   
  commentJson.ts   |   85.29 |    89.47 |     100 |   85.29 | 48-57             
  deepMerge.ts     |     100 |       90 |     100 |     100 | 41-43,49          
  ...ScopeUtils.ts |   97.56 |    88.88 |     100 |   97.56 | 67                
  doctorChecks.ts  |   68.59 |    64.28 |     100 |   68.59 | ...63-269,293-309 
  ...putCapture.ts |   90.65 |    86.02 |     100 |   90.65 | ...72,370,372-373 
  ...arResolver.ts |   94.28 |    88.46 |     100 |   94.28 | 28-29,125-126     
  errors.ts        |   98.43 |    95.55 |     100 |   98.43 | 45-46             
  events.ts        |     100 |      100 |     100 |     100 |                   
  gitUtils.ts      |   91.91 |    84.61 |     100 |   91.91 | 78-81,124-127     
  ...AutoUpdate.ts |   90.76 |    93.33 |   88.88 |   90.76 | 103-114           
  ...lationInfo.ts |     100 |      100 |     100 |     100 |                   
  languageUtils.ts |   97.89 |    96.42 |     100 |   97.89 | 132-133           
  math.ts          |       0 |        0 |       0 |       0 | 1-15              
  ...onfigUtils.ts |     100 |      100 |     100 |     100 |                   
  ...iveHelpers.ts |   96.79 |    93.28 |     100 |   96.79 | ...76-477,575,588 
  package.ts       |   88.88 |       80 |     100 |   88.88 | 33-34             
  processUtils.ts  |     100 |      100 |     100 |     100 |                   
  readStdin.ts     |   79.62 |       90 |      80 |   79.62 | 33-40,52-54       
  relaunch.ts      |   98.07 |    76.92 |     100 |   98.07 | 70                
  resolvePath.ts   |   66.66 |       25 |     100 |   66.66 | 12-13,16,18-19    
  sandbox.ts       |       0 |        0 |       0 |       0 | 1-980             
  settingsUtils.ts |   86.32 |    90.59 |   94.44 |   86.32 | ...38,569,632-644 
  spawnWrapper.ts  |     100 |      100 |     100 |     100 |                   
  ...upProfiler.ts |     100 |       96 |     100 |     100 | 110               
  ...upWarnings.ts |     100 |      100 |     100 |     100 |                   
  stdioHelpers.ts  |     100 |       60 |     100 |     100 | 23,32             
  systemInfo.ts    |   92.52 |     90.9 |   83.33 |   92.52 | 63-69,184         
  ...InfoFields.ts |   86.91 |    65.78 |     100 |   86.91 | ...16-117,138-139 
  ...entEmitter.ts |     100 |      100 |     100 |     100 |                   
  ...upWarnings.ts |   91.17 |    82.35 |     100 |   91.17 | 67-68,73-74,77-78 
  version.ts       |     100 |       50 |     100 |     100 | 11                
  windowTitle.ts   |     100 |      100 |     100 |     100 |                   
  ...WithBackup.ts |    62.1 |    77.77 |     100 |    62.1 | 93,107,118-157    
-------------------|---------|----------|---------|---------|-------------------
Core Package - Full Text Report
-------------------|---------|----------|---------|---------|-------------------
File               | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
-------------------|---------|----------|---------|---------|-------------------
All files          |   75.13 |    81.47 |   77.66 |   75.13 |                   
 src               |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/__mocks__/fs  |       0 |        0 |       0 |       0 |                   
  promises.ts      |       0 |        0 |       0 |       0 | 1-48              
 src/agents        |    85.8 |    83.72 |    92.3 |    85.8 |                   
  ...ound-tasks.ts |   85.33 |    83.72 |    92.3 |   85.33 | ...25-232,245-246 
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/agents/arena  |    76.9 |    66.66 |   78.94 |    76.9 |                   
  ...gentClient.ts |   79.47 |    88.88 |   81.81 |   79.47 | ...68-183,189-204 
  ArenaManager.ts  |   75.84 |     62.9 |   78.57 |   75.84 | ...1889,1895-1896 
  arena-events.ts  |   64.44 |      100 |      50 |   64.44 | ...71-175,178-183 
  diff-summary.ts  |    87.5 |    73.46 |     100 |    87.5 | ...32-133,137-138 
  index.ts         |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...gents/backends |    76.4 |    86.07 |   72.41 |    76.4 |                   
  ITermBackend.ts  |   97.97 |    93.93 |     100 |   97.97 | ...78-180,255,307 
  ...essBackend.ts |   92.17 |    90.32 |   82.35 |   92.17 | ...24-244,303,403 
  TmuxBackend.ts   |    90.7 |    76.55 |   97.36 |    90.7 | ...87,697,743-747 
  detect.ts        |   31.25 |      100 |       0 |   31.25 | 34-88             
  index.ts         |     100 |      100 |     100 |     100 |                   
  iterm-it2.ts     |     100 |     92.1 |     100 |     100 | 37-38,106         
  tmux-commands.ts |    6.64 |      100 |    3.03 |    6.64 | ...93-363,386-503 
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...agents/runtime |   80.94 |    76.01 |      70 |   80.94 |                   
  agent-core.ts    |      75 |    69.44 |   58.33 |      75 | ...1085,1112-1158 
  agent-events.ts  |   86.48 |      100 |      75 |   86.48 | 225-229           
  ...t-headless.ts |   79.52 |       75 |      55 |   79.52 | ...54-355,358-359 
  ...nteractive.ts |   81.71 |    78.12 |      75 |   81.71 | ...25,527,529,532 
  ...statistics.ts |   98.19 |    82.35 |     100 |   98.19 | 127,151,192,225   
  agent-types.ts   |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/config        |   73.62 |    75.92 |      61 |   73.62 |                   
  config.ts        |      71 |    72.98 |   54.94 |      71 | ...2693,2697-2709 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  models.ts        |     100 |      100 |     100 |     100 |                   
  storage.ts       |   95.72 |    92.85 |   91.66 |   95.72 | ...06-207,241-242 
 ...nfirmation-bus |   98.29 |    97.14 |     100 |   98.29 |                   
  message-bus.ts   |   98.14 |    97.05 |     100 |   98.14 | 42-43             
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/constants     |    4.95 |      100 |       0 |    4.95 |                   
  codingPlan.ts    |    4.95 |      100 |       0 |    4.95 | ...79-291,299-309 
 src/core          |   80.42 |    80.36 |   85.54 |   80.42 |                   
  baseLlmClient.ts |   96.77 |    96.42 |      80 |   96.77 | 123-126           
  client.ts        |   70.75 |    73.59 |   73.07 |   70.75 | ...1115,1119-1135 
  ...tGenerator.ts |    72.1 |    61.11 |     100 |    72.1 | ...45,347,354-357 
  ...lScheduler.ts |   73.63 |    76.97 |   91.17 |   73.63 | ...1888,1945-1949 
  geminiChat.ts    |    89.1 |     84.5 |   85.29 |    89.1 | ...1075,1142-1143 
  geminiRequest.ts |     100 |      100 |     100 |     100 |                   
  ...htProtocol.ts |    9.09 |      100 |       0 |    9.09 | 34-42,45-49,52-87 
  logger.ts        |   82.25 |    81.81 |     100 |   82.25 | ...57-361,407-421 
  ...tyDefaults.ts |     100 |      100 |     100 |     100 |                   
  ...olExecutor.ts |   92.59 |       75 |      50 |   92.59 | 41-42             
  ...on-helpers.ts |   76.53 |    60.71 |     100 |   76.53 | ...81-182,196-205 
  prompts.ts       |    88.8 |    88.05 |      75 |    88.8 | ...-898,1101-1102 
  tokenLimits.ts   |     100 |    89.47 |     100 |     100 | 50-51             
  ...okTriggers.ts |   99.31 |     90.9 |     100 |   99.31 | 124,135           
  turn.ts          |   96.29 |    88.46 |     100 |   96.29 | ...87,400-401,449 
 ...ntentGenerator |   93.72 |    73.43 |    90.9 |   93.72 |                   
  ...tGenerator.ts |   95.99 |    72.17 |   86.66 |   95.99 | ...03-304,438,494 
  converter.ts     |   93.47 |       75 |     100 |   93.47 | ...87-488,498,558 
  index.ts         |       0 |        0 |       0 |       0 | 1-21              
 ...ntentGenerator |   91.53 |    71.21 |   93.33 |   91.53 |                   
  ...tGenerator.ts |      90 |    70.49 |   92.85 |      90 | ...77-283,301-302 
  index.ts         |     100 |       80 |     100 |     100 | 50                
 ...ntentGenerator |   91.08 |    76.14 |   85.71 |   91.08 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...tGenerator.ts |   91.04 |    76.14 |   85.71 |   91.04 | ...23,533-534,562 
 ...ntentGenerator |   76.51 |    83.51 |   89.55 |   76.51 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  converter.ts     |   73.07 |       78 |   86.36 |   73.07 | ...1311,1332-1338 
  errorHandler.ts  |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-95              
  ...tGenerator.ts |   48.78 |    91.66 |   77.77 |   48.78 | ...10-163,166-167 
  pipeline.ts      |   94.15 |    89.58 |     100 |   94.15 | ...84,454-455,463 
  ...CallParser.ts |   90.66 |     88.4 |     100 |   90.66 | ...15-319,349-350 
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...rator/provider |   95.98 |    85.59 |   93.75 |   95.98 |                   
  dashscope.ts     |   97.22 |    87.69 |   93.33 |   97.22 | ...10-211,287-288 
  deepseek.ts      |    91.3 |    73.91 |     100 |    91.3 | 49-50,54-55,68-69 
  default.ts       |   94.62 |    86.36 |   85.71 |   94.62 | 85-86,156-158     
  index.ts         |     100 |      100 |     100 |     100 |                   
  modelscope.ts    |     100 |      100 |     100 |     100 |                   
  openrouter.ts    |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 |                   
 src/extension     |   60.71 |    79.59 |   79.03 |   60.71 |                   
  ...-converter.ts |   62.35 |    47.82 |      90 |   62.35 | ...90-791,800-832 
  ...ionManager.ts |   46.96 |    82.97 |   67.44 |   46.96 | ...1343,1364-1383 
  ...onSettings.ts |   93.46 |    93.05 |     100 |   93.46 | ...17-221,228-232 
  ...-converter.ts |   54.88 |    94.44 |      60 |   54.88 | ...35-146,158-192 
  github.ts        |   44.94 |    88.52 |      60 |   44.94 | ...53-359,398-451 
  index.ts         |     100 |      100 |     100 |     100 |                   
  marketplace.ts   |   97.29 |    93.75 |     100 |   97.29 | ...64,184-185,274 
  npm.ts           |   48.66 |    76.08 |      75 |   48.66 | ...18-420,427-431 
  override.ts      |   94.11 |    88.88 |     100 |   94.11 | 63-64,81-82       
  settings.ts      |   66.26 |      100 |      50 |   66.26 | 81-108,143-149    
  storage.ts       |   94.73 |       90 |     100 |   94.73 | 41-42             
  ...ableSchema.ts |     100 |      100 |     100 |     100 |                   
  variables.ts     |   88.75 |    83.33 |     100 |   88.75 | ...28-231,234-237 
 src/followup      |   46.18 |     92.3 |   71.87 |   46.18 |                   
  followupState.ts |      96 |    89.74 |     100 |      96 | 159-161,218-219   
  index.ts         |     100 |      100 |     100 |     100 |                   
  overlayFs.ts     |   95.06 |       84 |     100 |   95.06 | 78,108,122,133    
  speculation.ts   |   13.22 |      100 |   16.66 |   13.22 | 88-458,518-568    
  ...onToolGate.ts |     100 |    96.29 |     100 |     100 | 92                
  ...nGenerator.ts |   36.67 |    95.12 |   33.33 |   36.67 | ...24-326,361-391 
 src/generated     |       0 |        0 |       0 |       0 |                   
  git-commit.ts    |       0 |        0 |       0 |       0 | 1-10              
 src/hooks         |    80.6 |    84.37 |   84.16 |    80.6 |                   
  ...okRegistry.ts |   86.48 |    77.08 |     100 |   86.48 | ...41-344,362-369 
  ...bortSignal.ts |     100 |      100 |     100 |     100 |                   
  ...terpolator.ts |   96.66 |    93.33 |     100 |   96.66 | 66-67             
  ...HookRunner.ts |   96.68 |    87.23 |     100 |   96.68 | 110-112,231-233   
  ...Aggregator.ts |   96.37 |    90.54 |     100 |   96.37 | ...89,291-292,365 
  ...entHandler.ts |   95.58 |    84.37 |   92.59 |   95.58 | ...29,682-683,693 
  hookPlanner.ts   |   84.13 |    76.59 |      90 |   84.13 | ...38,144,162-173 
  hookRegistry.ts  |   88.83 |    86.36 |     100 |   88.83 | ...21,326,330,334 
  hookRunner.ts    |   53.63 |    72.22 |   61.11 |   53.63 | ...23-724,733-734 
  hookSystem.ts    |   75.47 |      100 |   56.41 |   75.47 | ...75-576,582-583 
  ...HookRunner.ts |   75.51 |     61.9 |      80 |   75.51 | ...05-406,424-425 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...SkillHooks.ts |   78.75 |       75 |   66.66 |   78.75 | 62-66,137-152     
  ...oksManager.ts |    96.5 |     91.8 |     100 |    96.5 | ...90,209-210,223 
  ssrfGuard.ts     |   77.22 |    85.36 |     100 |   77.22 | ...57,261-267,273 
  trustedHooks.ts  |       0 |        0 |       0 |       0 | 1-124             
  types.ts         |   90.15 |    91.02 |   85.18 |   90.15 | ...91-392,452-456 
  urlValidator.ts  |     100 |      100 |     100 |     100 |                   
 src/ide           |   74.28 |    83.39 |   78.33 |   74.28 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  detect-ide.ts    |     100 |      100 |     100 |     100 |                   
  ide-client.ts    |    64.2 |    81.48 |   66.66 |    64.2 | ...9-970,999-1007 
  ide-installer.ts |   89.06 |    79.31 |     100 |   89.06 | ...36,143-147,160 
  ideContext.ts    |     100 |      100 |     100 |     100 |                   
  process-utils.ts |   84.84 |    71.79 |     100 |   84.84 | ...37,151,193-194 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/lsp           |   33.39 |    43.56 |   44.91 |   33.39 |                   
  ...nfigLoader.ts |   70.27 |    35.89 |   94.73 |   70.27 | ...20-422,426-432 
  ...ionFactory.ts |    4.29 |        0 |       0 |    4.29 | ...20-371,377-394 
  ...Normalizer.ts |   23.09 |    13.72 |   30.43 |   23.09 | ...04-905,909-924 
  ...verManager.ts |   10.47 |       75 |      25 |   10.47 | ...56-675,681-711 
  ...eLspClient.ts |   17.89 |      100 |       0 |   17.89 | ...37-244,254-258 
  ...LspService.ts |   45.87 |    62.13 |   66.66 |   45.87 | ...1282,1299-1309 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/mcp           |   78.69 |    75.34 |   75.92 |   78.69 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...h-provider.ts |   86.95 |      100 |   33.33 |   86.95 | ...,93,97,101-102 
  ...h-provider.ts |   73.82 |    53.92 |     100 |   73.82 | ...88-895,902-904 
  ...en-storage.ts |   98.62 |    97.72 |     100 |   98.62 | 87-88             
  oauth-utils.ts   |   70.58 |    85.29 |    90.9 |   70.58 | ...70-290,315-344 
  ...n-provider.ts |   89.83 |    95.83 |   45.45 |   89.83 | ...43,147,151-152 
 .../token-storage |   79.48 |    86.66 |   86.36 |   79.48 |                   
  ...en-storage.ts |     100 |      100 |     100 |     100 |                   
  ...en-storage.ts |   82.75 |    82.35 |   92.85 |   82.75 | ...62-172,180-181 
  ...en-storage.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...en-storage.ts |   68.14 |    82.35 |   64.28 |   68.14 | ...81-295,298-314 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/memory        |   61.95 |    74.52 |    65.3 |   61.95 |                   
  const.ts         |     100 |      100 |     100 |     100 |                   
  dream.ts         |   88.07 |    66.66 |      80 |   88.07 | ...23,131,141-147 
  ...entPlanner.ts |    55.2 |       75 |   28.57 |    55.2 | ...30,135-142,147 
  entries.ts       |   59.84 |       70 |      50 |   59.84 | ...72-180,183-189 
  extract.ts       |    95.2 |    79.16 |     100 |    95.2 | 81-86,125         
  ...entPlanner.ts |   63.08 |    65.71 |   41.17 |   63.08 | ...17,222-223,332 
  ...ionPlanner.ts |       0 |        0 |       0 |       0 | 1                 
  forget.ts        |    8.04 |      100 |       0 |    8.04 | 67-342            
  governance.ts    |       0 |        0 |       0 |       0 | 1-352             
  indexer.ts       |   83.87 |    45.45 |     100 |   83.87 | ...50,56-57,69-70 
  manager.ts       |   74.16 |    76.23 |   70.27 |   74.16 | ...77-878,891-893 
  memoryAge.ts     |   80.95 |     87.5 |      75 |   80.95 | 48-51             
  paths.ts         |   55.47 |    88.88 |   85.71 |   55.47 | ...,88-89,105-113 
  prompt.ts        |   93.36 |    71.42 |     100 |   93.36 | ...58,161,228-229 
  recall.ts        |   82.24 |    78.04 |   88.88 |   82.24 | ...71-188,246-257 
  ...ceSelector.ts |   91.56 |    73.68 |     100 |   91.56 | ...01,103-104,112 
  scan.ts          |   87.91 |    68.42 |     100 |   87.91 | ...47-48,58,82-87 
  status.ts        |   10.52 |      100 |       0 |   10.52 | 41-98             
  store.ts         |   94.44 |    83.33 |     100 |   94.44 | 56-57,92-93       
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/mocks         |       0 |        0 |       0 |       0 |                   
  msw.ts           |       0 |        0 |       0 |       0 | 1-9               
 src/models        |   89.51 |     85.5 |   87.14 |   89.51 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...tor-config.ts |   88.67 |     90.9 |     100 |   88.67 | 112,118,121-130   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...nfigErrors.ts |   74.22 |       44 |   84.61 |   74.22 | ...,67-74,106-117 
  ...igResolver.ts |   98.65 |     92.3 |     100 |   98.65 | 135,297,303       
  modelRegistry.ts |     100 |    98.21 |     100 |     100 | 182               
  modelsConfig.ts  |   85.37 |    83.54 |   81.57 |   85.37 | ...1210,1239-1240 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/output        |     100 |      100 |     100 |     100 |                   
  ...-formatter.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/permissions   |    70.5 |    87.96 |    48.2 |    70.5 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...on-manager.ts |   79.18 |    82.65 |   79.16 |   79.18 | ...85-786,793-802 
  rule-parser.ts   |   95.88 |    93.56 |     100 |   95.88 | ...40-841,990-992 
  ...-semantics.ts |   58.28 |    85.27 |    30.2 |   58.28 | ...1604-1614,1643 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/prompts       |   83.63 |      100 |    87.5 |   83.63 |                   
  mcp-prompts.ts   |   18.18 |      100 |       0 |   18.18 | 11-19             
  ...t-registry.ts |     100 |      100 |     100 |     100 |                   
 src/qwen          |   86.03 |    79.48 |   97.18 |   86.03 |                   
  ...tGenerator.ts |   98.64 |    98.18 |     100 |   98.64 | 105-106           
  qwenOAuth2.ts    |   85.01 |    74.81 |   93.33 |   85.01 | ...,986-1002,1032 
  ...kenManager.ts |   83.79 |    76.22 |     100 |   83.79 | ...63-768,789-794 
 src/services      |   82.46 |    82.11 |   84.32 |   82.46 |                   
  ...ionService.ts |   97.95 |    94.04 |     100 |   97.95 | 255,257-261       
  ...ingService.ts |   72.04 |    78.88 |   73.07 |   72.04 | ...01-902,919-920 
  cronScheduler.ts |   97.56 |    92.98 |     100 |   97.56 | 62-63,77,155      
  ...eryService.ts |   80.43 |    95.45 |      75 |   80.43 | ...19-134,140-141 
  ...temService.ts |   89.76 |     85.1 |   88.88 |   89.76 | ...89,191,266-273 
  gitInit.ts       |     100 |      100 |     100 |     100 |                   
  gitService.ts    |   68.75 |     92.3 |   55.55 |   68.75 | ...12-122,125-129 
  ...reeService.ts |   71.83 |    68.47 |    91.3 |   71.83 | ...89-790,806,822 
  ...ionService.ts |   98.13 |     97.8 |   95.45 |   98.13 | ...32-333,380-381 
  sessionRecap.ts  |   10.71 |      100 |       0 |   10.71 | 48-161            
  ...ionService.ts |   83.91 |    71.72 |      92 |   83.91 | ...-989,1021-1022 
  sessionTitle.ts  |   93.95 |    70.37 |     100 |   93.95 | ...36-239,270-271 
  ...ionService.ts |   84.72 |    82.38 |   83.78 |   84.72 | ...8-989,995-1000 
 ...icrocompaction |   98.62 |    86.44 |     100 |   98.62 |                   
  microcompact.ts  |   98.62 |    86.44 |     100 |   98.62 | 138,142           
 src/skills        |   83.14 |    78.86 |   90.32 |   83.14 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  skill-load.ts    |    90.9 |    77.77 |     100 |    90.9 | ...32,152,164-166 
  skill-manager.ts |   80.51 |    77.55 |   88.46 |   80.51 | ...83-891,898-902 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/subagents     |   82.44 |    80.76 |   91.11 |   82.44 |                   
  ...tin-agents.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...-selection.ts |     100 |      100 |     100 |     100 |                   
  ...nt-manager.ts |   76.05 |    72.81 |   87.09 |   76.05 | ...1112,1134-1135 
  types.ts         |     100 |      100 |     100 |     100 |                   
  validation.ts    |   92.46 |    95.18 |     100 |   92.46 | 51-56,69-74,78-83 
 src/telemetry     |   68.06 |       82 |   73.68 |   68.06 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...-exporters.ts |   46.37 |      100 |   44.44 |   46.37 | ...85,88-89,92-93 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...t.circular.ts |       0 |        0 |       0 |       0 | 1-111             
  ...t.circular.ts |       0 |        0 |       0 |       0 | 1-128             
  loggers.ts       |   52.09 |    61.64 |   57.77 |   52.09 | ...1218,1235-1255 
  metrics.ts       |    74.9 |    82.95 |   74.54 |    74.9 | ...58-978,981-992 
  sanitize.ts      |      80 |    83.33 |     100 |      80 | 35-36,41-42       
  sdk.ts           |   85.13 |    56.25 |     100 |   85.13 | ...78,184-185,191 
  ...etry-utils.ts |     100 |      100 |     100 |     100 |                   
  ...l-decision.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |   79.13 |    85.83 |   83.33 |   79.13 | ...1136,1139-1168 
  uiTelemetry.ts   |   93.04 |    96.55 |   81.25 |   93.04 | ...95-196,202-209 
 ...ry/qwen-logger |   68.05 |    80.21 |   64.91 |   68.05 |                   
  event-types.ts   |       0 |        0 |       0 |       0 |                   
  qwen-logger.ts   |   68.05 |       80 |   64.28 |   68.05 | ...1043,1081-1082 
 src/test-utils    |   93.07 |    95.65 |   73.52 |   93.07 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  ...st-helpers.ts |   94.11 |       90 |     100 |   94.11 | 69-70             
  index.ts         |     100 |      100 |     100 |     100 |                   
  mock-tool.ts     |   91.02 |    96.87 |   68.96 |   91.02 | ...32,196-197,210 
  ...aceContext.ts |     100 |      100 |     100 |     100 |                   
 src/tools         |   73.63 |    79.16 |   79.26 |   73.63 |                   
  ...erQuestion.ts |   87.89 |     73.8 |    90.9 |   87.89 | ...44-345,349-350 
  cron-create.ts   |   97.61 |    88.88 |   83.33 |   97.61 | 30-31             
  cron-delete.ts   |   96.55 |      100 |   83.33 |   96.55 | 26-27             
  cron-list.ts     |   96.36 |      100 |   83.33 |   96.36 | 25-26             
  diffOptions.ts   |     100 |      100 |     100 |     100 |                   
  edit.ts          |   80.72 |    85.05 |   73.33 |   80.72 | ...09-510,593-643 
  exitPlanMode.ts  |   84.61 |    85.71 |     100 |   84.61 | ...60-163,177-189 
  glob.ts          |   90.56 |    88.33 |   84.61 |   90.56 | ...24,167,297,300 
  grep.ts          |   71.24 |    87.34 |   72.22 |   71.24 | ...88,528,536-543 
  ls.ts            |   96.74 |    90.27 |     100 |   96.74 | 171-176,207,211   
  lsp.ts           |   72.58 |    60.29 |   90.32 |   72.58 | ...1202,1204-1205 
  ...nt-manager.ts |   47.47 |       60 |   44.44 |   47.47 | ...73-491,494-531 
  mcp-client.ts    |   29.65 |    71.05 |   46.87 |   29.65 | ...1434,1438-1441 
  mcp-tool.ts      |   90.92 |    88.88 |   96.42 |   90.92 | ...89-590,640-641 
  memory-config.ts |       0 |        0 |       0 |       0 | 1-48              
  ...iable-tool.ts |     100 |    84.61 |     100 |     100 | 102,109           
  read-file.ts     |   91.94 |    86.79 |   88.88 |   91.94 | ...,94-95,166-175 
  ripGrep.ts       |   94.42 |    89.33 |   91.66 |   94.42 | ...34,337,415-416 
  ...-transport.ts |    6.34 |        0 |       0 |    6.34 | 47-145            
  shell.ts         |   82.69 |    78.12 |    92.3 |   82.69 | ...84-488,694-695 
  skill-utils.ts   |     100 |      100 |     100 |     100 |                   
  skill.ts         |   86.97 |    87.71 |   83.33 |   86.97 | ...11,315,338-360 
  todoWrite.ts     |   85.42 |    84.09 |   84.61 |   85.42 | ...05-410,432-433 
  tool-error.ts    |     100 |      100 |     100 |     100 |                   
  tool-names.ts    |     100 |      100 |     100 |     100 |                   
  tool-registry.ts |   67.49 |    68.91 |   65.71 |   67.49 | ...59-660,668-669 
  tools.ts         |   84.18 |    89.58 |   82.35 |   84.18 | ...25-426,442-448 
  web-fetch.ts     |   88.44 |    76.92 |    92.3 |   88.44 | ...05-306,308-309 
  write-file.ts    |   83.04 |    77.19 |   83.33 |   83.04 | ...11-414,426-461 
 src/tools/agent   |   78.23 |    82.31 |   84.37 |   78.23 |                   
  agent.ts         |   81.34 |    81.94 |   88.88 |   81.34 | ...1112,1138-1142 
  fork-subagent.ts |   42.02 |      100 |      60 |   42.02 | 54-72,91-128      
 src/utils         |   86.95 |    87.13 |   90.63 |   86.95 |                   
  LruCache.ts      |       0 |        0 |       0 |       0 | 1-41              
  ...ssageQueue.ts |     100 |      100 |     100 |     100 |                   
  ...cFileWrite.ts |   76.08 |    44.44 |     100 |   76.08 | 61-70,72          
  bareMode.ts      |   27.27 |      100 |       0 |   27.27 | 9-15,18-19        
  browser.ts       |    7.69 |      100 |       0 |    7.69 | 17-56             
  ...igResolver.ts |     100 |      100 |     100 |     100 |                   
  cronDisplay.ts   |   42.85 |    23.07 |     100 |   42.85 | 26-31,33-45,47-54 
  cronParser.ts    |   89.74 |    85.71 |     100 |   89.74 | ...,63-64,183-186 
  debugLogger.ts   |   96.12 |    93.75 |   93.75 |   96.12 | 164-168           
  editHelper.ts    |   92.67 |    82.14 |     100 |   92.67 | ...52-454,463-464 
  editor.ts        |   97.61 |    95.71 |     100 |   97.61 | ...70-271,273-274 
  ...arResolver.ts |   94.28 |    88.88 |     100 |   94.28 | 28-29,125-126     
  ...entContext.ts |     100 |       95 |     100 |     100 | 83                
  errorParsing.ts  |   97.05 |       95 |     100 |   97.05 | 39-40             
  ...rReporting.ts |   88.46 |       90 |     100 |   88.46 | 69-74             
  errors.ts        |   70.92 |    80.39 |   53.33 |   70.92 | ...03-219,223-229 
  fetch.ts         |   70.18 |    71.42 |   71.42 |   70.18 | ...42,148,161,186 
  fileUtils.ts     |   89.08 |    85.06 |   94.73 |   89.08 | ...68-875,879-885 
  forkedAgent.ts   |   62.98 |    54.54 |      75 |   62.98 | ...23-432,434-447 
  formatters.ts    |   54.54 |       50 |     100 |   54.54 | 12-16             
  ...eUtilities.ts |   89.21 |    86.66 |     100 |   89.21 | 16-17,49-55,65-66 
  ...rStructure.ts |   94.36 |    94.28 |     100 |   94.36 | ...17-120,330-335 
  getPty.ts        |    12.5 |      100 |       0 |    12.5 | 21-34             
  ...noreParser.ts |    92.3 |    89.36 |     100 |    92.3 | ...15-116,186-187 
  gitUtils.ts      |   36.66 |    76.92 |      50 |   36.66 | ...4,88-89,97-148 
  iconvHelper.ts   |     100 |      100 |     100 |     100 |                   
  ...rePatterns.ts |     100 |      100 |     100 |     100 |                   
  ...ionManager.ts |     100 |     90.9 |     100 |     100 | 26                
  ...lPromptIds.ts |     100 |      100 |     100 |     100 |                   
  jsonl-utils.ts   |   10.07 |      100 |       0 |   10.07 | ...67-200,206-212 
  ...-detection.ts |     100 |      100 |     100 |     100 |                   
  ...yDiscovery.ts |   83.85 |    79.36 |     100 |   83.85 | ...15,318,410-413 
  ...tProcessor.ts |   93.63 |       90 |     100 |   93.63 | ...96-302,384-385 
  ...Inspectors.ts |   61.53 |      100 |      50 |   61.53 | 18-23             
  ...kerChecker.ts |   82.55 |    78.57 |     100 |   82.55 | 68-69,79-84,92-98 
  notebook.ts      |   94.35 |    84.78 |     100 |   94.35 | ...10,122,174-176 
  openaiLogger.ts  |   86.27 |    82.14 |     100 |   86.27 | ...05-107,130-135 
  partUtils.ts     |     100 |      100 |     100 |     100 |                   
  pathReader.ts    |     100 |      100 |     100 |     100 |                   
  paths.ts         |   93.43 |     92.1 |     100 |   93.43 | ...50-351,353-355 
  pdf.ts           |   93.68 |    87.05 |     100 |   93.68 | ...96-297,321-325 
  ...ectSummary.ts |   89.39 |    72.41 |     100 |   89.39 | ...37-142,193-196 
  ...tIdContext.ts |     100 |      100 |     100 |     100 |                   
  proxyUtils.ts    |     100 |      100 |     100 |     100 |                   
  ...rDetection.ts |   58.57 |       76 |     100 |   58.57 | ...4,88-89,95-100 
  ...noreParser.ts |   85.45 |    85.18 |     100 |   85.45 | ...59,65-66,72-73 
  rateLimit.ts     |   91.48 |    94.11 |     100 |   91.48 | 80,93-95          
  readManyFiles.ts |   87.96 |    86.95 |     100 |   87.96 | ...05-207,223-234 
  retry.ts         |   89.81 |    88.05 |     100 |   89.81 | ...29,350,357-358 
  ripgrepUtils.ts  |   46.53 |    83.33 |   66.66 |   46.53 | ...32-233,245-322 
  ...sDiscovery.ts |   97.47 |    93.15 |     100 |   97.47 | ...03,181-182,201 
  ...tchOptions.ts |   63.85 |    64.28 |   83.33 |   63.85 | ...29-130,187-188 
  safeJsonParse.ts |   74.07 |    83.33 |     100 |   74.07 | 40-46             
  ...nStringify.ts |     100 |      100 |     100 |     100 |                   
  ...aConverter.ts |   90.78 |    87.87 |     100 |   90.78 | ...41-42,93,95-96 
  ...aValidator.ts |   93.43 |    77.41 |     100 |   93.43 | ...46,155-158,212 
  ...r-launcher.ts |   76.92 |     91.3 |   66.66 |   76.92 | ...34,136,157-195 
  ...orageUtils.ts |   92.41 |    82.82 |     100 |   92.41 | ...39,423-430,441 
  shell-utils.ts   |    83.6 |    90.63 |     100 |    83.6 | ...1040,1047-1051 
  ...lAstParser.ts |   95.58 |    85.79 |     100 |   95.58 | ...1059-1061,1071 
  ...nlyChecker.ts |   95.75 |    92.47 |     100 |   95.75 | ...00-301,313-314 
  sideQuery.ts     |     100 |    92.85 |     100 |     100 | 43                
  ...tGenerator.ts |     100 |      100 |     100 |     100 |                   
  ...ameContext.ts |     100 |      100 |     100 |     100 |                   
  symlink.ts       |   77.77 |       50 |     100 |   77.77 | 44,54-59          
  ...emEncoding.ts |   96.36 |    91.17 |     100 |   96.36 | 59-60,124-125     
  terminalSafe.ts  |     100 |      100 |     100 |     100 |                   
  ...Serializer.ts |   98.72 |       90 |     100 |   98.72 | 42-43,134,201-203 
  testUtils.ts     |   53.33 |      100 |   33.33 |   53.33 | ...53,59-64,70-72 
  textUtils.ts     |      60 |      100 |   66.66 |      60 | 36-55             
  thoughtUtils.ts  |     100 |    92.85 |     100 |     100 | 71                
  ...-converter.ts |   94.59 |    85.71 |     100 |   94.59 | 35-36             
  tool-utils.ts    |    93.6 |     91.3 |     100 |    93.6 | ...58-159,162-163 
  truncation.ts    |     100 |       92 |     100 |     100 | 52,71             
  windowsPath.ts   |   89.47 |    79.31 |     100 |   89.47 | ...57-58,62,90-91 
  ...aceContext.ts |   93.71 |    88.88 |   93.33 |   93.71 | ...24-225,249-251 
  yaml-parser.ts   |      92 |    84.31 |     100 |      92 | 49-53,65-69       
 ...ils/filesearch |   96.34 |    91.66 |     100 |   96.34 |                   
  crawlCache.ts    |     100 |      100 |     100 |     100 |                   
  crawler.ts       |   96.87 |    94.44 |     100 |   96.87 | 83-84             
  fileSearch.ts    |   93.29 |    86.76 |     100 |   93.29 | ...40-241,243-244 
  ignore.ts        |     100 |      100 |     100 |     100 |                   
  result-cache.ts  |     100 |     92.3 |     100 |     100 | 46                
 ...uest-tokenizer |   56.63 |    74.52 |   74.19 |   56.63 |                   
  ...eTokenizer.ts |   41.86 |    76.47 |   69.23 |   41.86 | ...70-443,453-507 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...tTokenizer.ts |   68.39 |    69.49 |    90.9 |   68.39 | ...24-325,327-328 
  ...ageFormats.ts |      76 |      100 |   33.33 |      76 | 45-48,55-56       
  textTokenizer.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 | 1                 
-------------------|---------|----------|---------|---------|-------------------

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.

@B-A-M-N B-A-M-N closed this Apr 27, 2026
@B-A-M-N B-A-M-N deleted the refactor/dedupe-timeout-env-override branch April 27, 2026 14:46
B-A-M-N pushed a commit that referenced this pull request May 6, 2026
…QwenLM#3774)

* feat(core): enforce prior read before Edit / WriteFile mutates a file

Introduces a session-scoped invariant: the model cannot mutate an
existing file without having actually Read it (or its post-write
state) earlier in this conversation. Builds on the FileReadCache
landed in QwenLM#3717.

Two new ToolErrorType codes:
- EDIT_REQUIRES_PRIOR_READ — file has no entry in the session cache.
  The model is told to use read_file first.
- FILE_CHANGED_SINCE_READ — file has an entry but its mtime or size
  drifted since the recorded fingerprint. The model is told to
  re-read before retrying.

EditTool blocks the existing-file path on cache.check; new-file
creation (old_string === '' on a non-existent target) is exempt.
WriteFileTool blocks the overwrite path; new-file creation
(fileExists === false) is exempt.

Both tools route through the existing fileReadCacheDisabled escape
hatch on Config — flipping it bypasses enforcement byte-for-byte,
matching pre-cache behaviour. Operators can use this as a kill switch
if a session falls into a state where the cache cannot be trusted.
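
A minimal sketch of the invariant described above, under stated assumptions: only the two error codes and the fresh / stale / unknown cache states come from this commit message; `SessionReadCache`, `Fingerprint`, and `enforcePriorRead` are hypothetical names, not the real FileReadCache API.

```typescript
// Hypothetical sketch: class, type, and function names are illustrative.
type CacheState = 'fresh' | 'stale' | 'unknown';

interface Fingerprint {
  mtimeMs: number;
  size: number;
}

class SessionReadCache {
  private entries = new Map<string, Fingerprint>();

  // Remember what the file looked like when it was read.
  record(path: string, fp: Fingerprint): void {
    this.entries.set(path, { mtimeMs: fp.mtimeMs, size: fp.size });
  }

  // Compare the recorded fingerprint with the file's current one.
  check(path: string, current: Fingerprint): CacheState {
    const seen = this.entries.get(path);
    if (seen === undefined) return 'unknown';
    const same = seen.mtimeMs === current.mtimeMs && seen.size === current.size;
    return same ? 'fresh' : 'stale';
  }
}

type Decision =
  | { ok: true }
  | { ok: false; errorType: 'EDIT_REQUIRES_PRIOR_READ' | 'FILE_CHANGED_SINCE_READ' };

function enforcePriorRead(
  cache: SessionReadCache,
  path: string,
  current: Fingerprint | undefined, // undefined models a file that does not exist yet
  cacheDisabled: boolean,
): Decision {
  if (cacheDisabled) return { ok: true }; // escape hatch: no enforcement at all
  if (current === undefined) return { ok: true }; // new-file creation is exempt
  const state = cache.check(path, current);
  if (state === 'unknown') return { ok: false, errorType: 'EDIT_REQUIRES_PRIOR_READ' };
  if (state === 'stale') return { ok: false, errorType: 'FILE_CHANGED_SINCE_READ' };
  return { ok: true };
}
```

The later commits in this series tighten both the accept condition and the new-file gate, so this only illustrates the first commit's behaviour.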

ReadFile fix on the auto-memory path: PR QwenLM#3717 had auto-memory reads
skip the cache entirely (both lookup and record), but with the new
enforcement that means a model that just Read AGENTS.md cannot then
Edit it. Decoupled the two: auto-memory reads still skip the
file_unchanged fast-path (the per-read freshness <system-reminder>
must always reach the model) but DO record into the cache so the
follow-up Edit sees `fresh`. New regression test asserts this.

Test plan
- vitest run (all of @qwen-code/qwen-code-core): 6308 passed, 2 skipped
- 9 new enforcement tests across edit.test.ts and write-file.test.ts:
  unknown rejects, stale rejects, new-file exempt, edit chain stays
  authorised, escape hatch bypasses, plus the auto-memory record
  regression in read-file.test.ts.
- tsc --noEmit clean. eslint clean. core build succeeds.

* test(core): clear shared fileReadCache between write-file.test.ts cases

CI surfaced one Linux-only failure: the prior-read enforcement test
'rejects a write that would overwrite an unread existing file'
returned FILE_CHANGED_SINCE_READ instead of EDIT_REQUIRES_PRIOR_READ.

Root cause: the FileReadCache instance is declared at module scope
(line 41) and shared across every test in write-file.test.ts. State
from earlier tests — most recently the 'records a write' integration
test that records the same path — leaks forward. On Linux the test
ordering puts a record-bearing test before the enforcement test, so
the cache reports `stale` (mtime drifted) instead of `unknown`.
macOS / Windows happen to order them differently and never hit it.

Adding a fileReadCache.clear() to beforeEach gives every test a
known-empty cache, matching how edit.test.ts already isolates its
per-test cache by re-instantiating it.
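
The leak can be illustrated without a test runner. This is a hedged sketch only: `TinyReadCache` is a hypothetical stand-in for the module-scoped FileReadCache, and the two functions stand in for the 'records a write' and enforcement test cases.

```typescript
// Hypothetical stand-in for the shared cache; not the real FileReadCache.
class TinyReadCache {
  private paths = new Set<string>();

  record(path: string): void {
    this.paths.add(path);
  }

  has(path: string): boolean {
    return this.paths.has(path);
  }

  clear(): void {
    this.paths.clear();
  }
}

// Declared once at module scope, so every test case sees the same instance.
const sharedCache = new TinyReadCache();

// Stands in for the earlier test that records the target path.
function recordsAWrite(): void {
  sharedCache.record('/tmp/target.txt');
}

// Stands in for the enforcement test: does it find a pre-existing entry?
function enforcementSeesEntry(): boolean {
  return sharedCache.has('/tmp/target.txt');
}

// Without a reset, state from the earlier case leaks into the later one.
recordsAWrite();
const leaked = enforcementSeesEntry();

// A beforeEach-style reset gives the later case a known-empty cache.
sharedCache.clear();
const isolated = enforcementSeesEntry();
```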

* fix(core): close prior-read enforcement gaps flagged in 3rd review

Three concrete loopholes / regressions that the original PR-B
introduction left open. All three are addressed in the same commit
because the underlying refactor (move enforcement earlier and tighten
the fresh predicate) is shared across them.

1. fresh != "model has seen the bytes". Pre-fix, requirePriorRead()
   accepted any cache.check === 'fresh'. ReadFileTool records every
   successful read into the cache, including ranged reads
   (offset/limit), truncated full reads, and non-cacheable
   binary/image/audio/video/PDF/notebook reads (lastReadCacheable
   = false). This let the model peek at a slice or a structured
   payload of a file and then mutate the whole thing. Tightened the
   accept predicate to fresh && lastReadAt && lastReadWasFull &&
   lastReadCacheable.

2. Read-less content oracle through calculateEdit error codes. Pre-fix,
   execute() ran calculateEdit (which reads file bytes and counts
   matches) before the enforcement check. A model could probe an
   unread file by attempting Edits with candidate old_strings and
   observing NO_OCCURRENCE_FOUND vs EXPECTED_OCCURRENCE_MISMATCH vs
   EDIT_NO_CHANGE — reverse-engineering content without ever calling
   read_file. Moved enforcement to the top of calculateEdit, before
   any content read; only a stat is performed up to the rejection
   point.

3. Confirmation flow regression. Pre-fix, getConfirmationDetails()
   read the existing file to render a diff for the user, then
   approval flowed to execute() which would freshly check the cache
   and reject. The user could approve a diff computed from current
   bytes the model never saw, and the call would still fail. Moved
   enforcement before the confirmation read in both EditTool (via the
   shared calculateEdit path) and WriteFileTool (explicit check at
   the top of getConfirmationDetails). The user now never sees a
   confirmation diff for an unread file — the call rejects up front.
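
Items (1) and (2) can be sketched together. The field names and error codes below are taken from this commit message; `isAcceptableRead` and `calculateEditSketch` are hypothetical, and mapping every non-stale rejection to EDIT_REQUIRES_PRIOR_READ is an assumption.

```typescript
type CacheState = 'fresh' | 'stale' | 'unknown';

interface ReadEntry {
  lastReadAt?: number;
  lastReadWasFull?: boolean;
  lastReadCacheable?: boolean;
}

// Tightened predicate: 'fresh' alone is no longer enough.
function isAcceptableRead(state: CacheState, entry?: ReadEntry): boolean {
  return (
    state === 'fresh' &&
    entry !== undefined &&
    entry.lastReadAt !== undefined &&
    entry.lastReadWasFull === true &&
    entry.lastReadCacheable === true
  );
}

interface EditOutcome {
  errorType?: string;
}

// Hypothetical shape: enforcement runs before any content read, so the
// content-derived error codes cannot be used to probe an unread file.
function calculateEditSketch(
  state: CacheState,
  entry: ReadEntry | undefined,
  readContent: () => string,
  oldString: string,
): EditOutcome {
  if (!isAcceptableRead(state, entry)) {
    // Assumption: anything that is not stale maps to the prior-read code.
    return {
      errorType: state === 'stale' ? 'FILE_CHANGED_SINCE_READ' : 'EDIT_REQUIRES_PRIOR_READ',
    };
  }
  const content = readContent(); // only reached after enforcement passes
  if (!content.includes(oldString)) return { errorType: 'EDIT_NO_OCCURRENCE_FOUND' };
  return {};
}
```

Passing the content reader as a callback makes the ordering observable: a rejected call never invokes it.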

Public API surface change: requirePriorRead() -> checkPriorRead() that
returns a structured decision, so the same predicate can route into
a CalculatedEdit.error (calculateEdit), a thrown error
(getConfirmationDetails), or a ToolResult (execute) without
duplicating the boolean / message / type plumbing in three shapes.
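
A hedged sketch of that routing: the `PriorReadDecision` shape and the three adapter functions are assumptions made for illustration; the real CalculatedEdit and ToolResult types are not shown in this message.

```typescript
// Hypothetical decision shape returned by a checkPriorRead-style helper.
type PriorReadDecision =
  | { ok: true }
  | { ok: false; errorType: string; message: string };

// Shape 1 (calculateEdit): embed the rejection in the returned value.
interface CalculatedEditSketch {
  error?: { type: string; raw: string };
}

function intoCalculatedEdit(d: PriorReadDecision): CalculatedEditSketch {
  return d.ok === true ? {} : { error: { type: d.errorType, raw: d.message } };
}

// Shape 2 (getConfirmationDetails): surface the rejection as a thrown error.
function intoThrow(d: PriorReadDecision): void {
  if (d.ok === false) {
    throw new Error(`${d.errorType}: ${d.message}`);
  }
}

// Shape 3 (execute): surface the rejection as a tool-result-like object.
interface ToolResultSketch {
  llmContent: string;
  error: { type: string; message: string };
}

function intoToolResult(d: PriorReadDecision): ToolResultSketch | undefined {
  if (d.ok === true) return undefined; // nothing to report, caller continues
  return { llmContent: d.message, error: { type: d.errorType, message: d.message } };
}
```

The decision is computed once; each caller only adapts it to its own return convention.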

Reported by pomelo-nwu (3 inline comments on PR QwenLM#3774).

* refactor(core): close 4 prior-read enforcement gaps from 4th review

1. recordWrite now seeds read metadata on brand-new entries
   (lastReadAt / lastReadWasFull / lastReadCacheable). The strict
   accept predicate added in the previous round (#3 review) requires
   all three, but recordWrite only set lastWriteAt — so a model
   creating a file with Edit (old_string="") or WriteFile and then
   editing it again was rejected on the second edit. The model
   authored the bytes it just wrote; for the purposes of prior-read
   enforcement that counts as having seen them. New regression test
   in edit.test.ts: "allows a create-then-edit-then-edit chain
   without an intervening read".

2. Extracted checkPriorRead into src/tools/priorReadEnforcement.ts.
   The two copies in edit.ts and write-file.ts had already drifted
   (one used ${ReadFileTool.Name}, the other hardcoded 'read_file');
   the boolean guard is security-sensitive and a one-sided fix
   would silently weaken the boundary. The shared utility takes a
   verb ('editing' | 'overwriting') so the user-facing prose can
   differ between callers without duplicating the decision logic.

3. WriteFileTool.execute now runs prior-read enforcement BEFORE
   readTextFile. Pre-fix, an unread overwrite still slurped the
   entire file into memory (encoding / BOM / line-ending detection)
   and only then rejected it: wasted I/O, and momentary in-memory
   custody of bytes the model never legitimately read. Now matches
   the order in getConfirmationDetails().

4. The "rejects a write that would overwrite an unread existing
   file" test now spies on FileSystemService.readTextFile and
   asserts not.toHaveBeenCalled() — without that, the test gave
   false confidence: it passed both pre-fix (read happened, then
   reject) and post-fix (reject before read), so the ordering
   regression in (3) was invisible to the assertion.
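
Item (1) might look roughly like the following. The three read-metadata fields and `lastWriteAt` are named in this message; `WriteAwareCache`, its method signatures, and the handling of already-existing entries are assumptions. Fingerprint freshness is left out to keep the sketch short.

```typescript
interface Entry {
  lastWriteAt?: number;
  lastReadAt?: number;
  lastReadWasFull?: boolean;
  lastReadCacheable?: boolean;
}

// Hypothetical cache that only models the metadata relevant here.
class WriteAwareCache {
  private entries = new Map<string, Entry>();

  recordWrite(path: string, now: number): void {
    const existing = this.entries.get(path);
    if (existing !== undefined) {
      // Assumption: an existing entry keeps whatever read metadata it has.
      existing.lastWriteAt = now;
      return;
    }
    // Brand-new entry: the model authored these bytes, so seed the read
    // metadata as if it had just fully read them.
    this.entries.set(path, {
      lastWriteAt: now,
      lastReadAt: now,
      lastReadWasFull: true,
      lastReadCacheable: true,
    });
  }

  // The strict accept predicate, minus the fingerprint comparison.
  satisfiesStrictPredicate(path: string): boolean {
    const e = this.entries.get(path);
    return (
      e !== undefined &&
      e.lastReadAt !== undefined &&
      e.lastReadWasFull === true &&
      e.lastReadCacheable === true
    );
  }
}
```

With the seeding in place, a create-then-edit-then-edit chain passes the predicate on every step without an intervening read.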

Reported by glm-5.1 via /review on PR QwenLM#3774.

* refactor(core): close 4 prior-read enforcement gaps from 4th review (Copilot)

Five concrete gaps that the previous round of enforcement work left
open. Reported by Copilot via /review on PR QwenLM#3774.

1. Confirmation-time rejections lost their ToolErrorType code.
   getConfirmationDetails() in both EditTool and WriteFileTool threw
   a plain Error on prior-read failure, which coreToolScheduler
   collapsed into UNHANDLED_EXCEPTION — silently breaking the
   EDIT_REQUIRES_PRIOR_READ / FILE_CHANGED_SINCE_READ contract for
   any approval-required flow.

   Fix: introduce PriorReadEnforcementError that carries
   `errorType: ToolErrorType`. Both confirmation paths now throw it,
   and coreToolScheduler reads `error.errorType` (falling back to
   UNHANDLED_EXCEPTION when absent). New regression tests assert
   the thrown error's `errorType` field for both tools.

2. checkPriorRead's "re-read with read_file" advice was wrong for
   binary / image / audio / video / PDF / notebook files. Their
   ReadFile result always sets lastReadCacheable=false, so the
   message would loop the agent forever on the same rejection.

   Fix: detect the fresh-but-non-cacheable case explicitly and
   return a dedicated message that explains the dead end ("Edit /
   WriteFile cannot mutate that payload safely") instead of asking
   for another read. Updated the existing non-cacheable regression
   test to assert the new message and the absence of "use the
   read_file tool first".

3. checkPriorRead swallowed every stat() failure and returned
   ok:true. EACCES, EBUSY, NFS hiccups, etc. would silently
   re-open the blind-write path the helper exists to block.

   Fix: only ENOENT continues to return ok:true (disappearance
   race). Any other code is fail-closed: returns
   EDIT_REQUIRES_PRIOR_READ with a message that names the errno.
   New regression test in write-file.test.ts spies on
   fs.promises.stat to inject EACCES and asserts the rejection.

4. The auto-memory record regression test only asserted
   `state === 'fresh'`. A future change that recorded auto-memory
   reads as partial / non-cacheable would still satisfy that
   assertion but would actually fail enforcement on every
   follow-up Edit.

   Fix: also assert lastReadAt is defined, lastReadWasFull is true,
   and lastReadCacheable is true. The full "what enforcement
   requires" predicate is now explicit in the test.

(The 5th item, the WriteFile mirror of (1), is covered by the same
PriorReadEnforcementError change.)
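
The fix in (1) can be sketched as follows. The `errorType` field and the UNHANDLED_EXCEPTION fallback come from this message; `TypedToolErrorSketch`, `resolveErrorType`, and `runConfirmation` are hypothetical and do not reflect the real coreToolScheduler code.

```typescript
// Hypothetical error class that carries a tool error code.
class TypedToolErrorSketch extends Error {
  errorType: string;

  constructor(message: string, errorType: string) {
    super(message);
    this.name = 'TypedToolErrorSketch';
    this.errorType = errorType;
  }
}

// Scheduler-side lookup: use error.errorType when present, otherwise
// fall back to UNHANDLED_EXCEPTION.
function resolveErrorType(error: unknown): string {
  if (typeof error === 'object' && error !== null && 'errorType' in error) {
    const candidate = (error as { errorType: unknown }).errorType;
    if (typeof candidate === 'string') return candidate;
  }
  return 'UNHANDLED_EXCEPTION';
}

// Runs a confirmation step and reports the surfaced error code, if any.
function runConfirmation(getConfirmationDetails: () => void): string | undefined {
  try {
    getConfirmationDetails();
    return undefined;
  } catch (error) {
    return resolveErrorType(error);
  }
}
```

Reading the field structurally rather than via `instanceof` is a choice made for this sketch; it keeps the lookup independent of which class threw.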

* refactor(core): tighten StructuredToolError naming + add scheduler test

Four follow-ups raised by deepseek-v4-pro on PR QwenLM#3774. None of them
change the enforcement boundary; they are all about making the
contract clearer and harder to break in future changes.

1. PriorReadEnforcementError -> StructuredToolError. The class now
   wraps any content-derived ToolErrorType from calculateEdit
   (EDIT_NO_OCCURRENCE_FOUND, EDIT_EXPECTED_OCCURRENCE_MISMATCH,
   EDIT_NO_CHANGE, ATTEMPT_TO_CREATE_EXISTING_FILE) on top of the
   prior-read codes. The old name suggested the class was
   prior-read-specific, which would mislead any on-call engineer
   seeing it paired with one of the calculateEdit error codes.

2. EDIT_REQUIRES_PRIOR_READ kept its name (the prefix mentions
   "edit" but the enum is shared with WriteFileTool) — chose
   documentation over rename to avoid the churn of a value rename
   across logs/dashboards already keyed on it. JSDoc now spells
   out the cross-tool usage explicitly.

3. Stat failures other than ENOENT now map to a new
   PRIOR_READ_VERIFICATION_FAILED code instead of being conflated
   with EDIT_REQUIRES_PRIOR_READ. The failure mode is "we cannot
   verify" rather than "definitely not read" — operators routing
   on error codes can distinguish the two populations.

4. Added a coreToolScheduler test (`surfaces error.errorType from
   a confirmation throw instead of UNHANDLED_EXCEPTION`) that
   constructs a stub tool whose getConfirmationDetails throws
   StructuredToolError and asserts the surfaced ToolCall response
   carries the correct ToolErrorType. Without this test the
   scheduler's explicitErrorType branch would have no coverage at
   all.

Tool tests updated for the new StructuredToolError class name and
the PRIOR_READ_VERIFICATION_FAILED code on the EACCES path.
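
A minimal sketch of the stat-failure handling from item (3), assuming the stat call is injected as a function so the example stays self-contained. Only ENOENT and the PRIOR_READ_VERIFICATION_FAILED code come from this message; the function names and message wording are hypothetical.

```typescript
type Decision =
  | { ok: true }
  | { ok: false; errorType: string; message: string };

// Extract an errno-style code from an unknown thrown value.
function errnoCode(err: unknown): string {
  if (typeof err === 'object' && err !== null && 'code' in err) {
    return String((err as { code: unknown }).code);
  }
  return 'UNKNOWN';
}

// Fail closed: only a disappeared file (ENOENT) is allowed through.
function classifyStatFailure(err: unknown): Decision {
  const code = errnoCode(err);
  if (code === 'ENOENT') return { ok: true };
  return {
    ok: false,
    errorType: 'PRIOR_READ_VERIFICATION_FAILED',
    message: `Could not verify the prior read: stat failed with ${code}.`,
  };
}

// The stat call is injected; the cache comparison that would follow a
// successful stat is elided here.
function checkWithStat(stat: () => unknown): Decision {
  try {
    stat();
  } catch (err) {
    return classifyStatFailure(err);
  }
  return { ok: true };
}
```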

* fix(core): close TOCTOU + grammar + directory regressions in PR-B

Six concrete issues that the previous round of enforcement work
left open. Reported by Copilot via /review on PR QwenLM#3774.

1. TOCTOU window between pre-read checkPriorRead and readTextFile.
   The pre-read stat could pass enforcement, then an external writer
   could land between that stat and the actual read, leaving
   currentContent reflecting bytes the model never saw — exactly the
   stale-write path the PR is supposed to block. Closed by re-running
   checkPriorRead immediately after every readTextFile that fed
   currentContent / originalContent: EditTool.calculateEdit and the
   two WriteFileTool paths (execute + getConfirmationDetails). A
   `stale` outcome now fails the operation with
   FILE_CHANGED_SINCE_READ at the correct moment.

2. Directory targets sent the model into an enforcement loop.
   `fileExists` is a plain access check, so directories also entered
   the enforcement branch — the model would be told to call
   `read_file`, but `read_file` rejects directories with
   TARGET_IS_DIRECTORY, so the loop never terminated. Fixed in
   checkPriorRead: if `fs.stat` reports the path is not a regular
   file, return `ok: true` so the downstream readTextFile / write
   path can surface its own EISDIR / similar error.

3. Confirmation-time error messages used the short `display` form
   instead of the full `raw` form. Approval-required Edit calls
   therefore lost the remediation detail (file path, stale-vs-unread
   distinction, "without offset / limit / pages" hint) that the
   execute path already surfaced and that the WriteFile confirmation
   path already preserved. EditTool.getConfirmationDetails now
   throws StructuredToolError with `editData.error.raw`.

4. Non-text payload displayMessage was grammatically broken: built
   from the gerund `editing` / `overwriting`, it rendered as
   "cannot editing via this tool" / "cannot overwriting via this
   tool". Fixed by deriving a bare-verb form (`edit` / `overwrite`)
   alongside the gerund and using it in displayMessage.

(Items 1, 5 and 6 from Copilot's batch are the same TOCTOU class —
EditTool calculateEdit + WriteFile execute + WriteFile confirmation —
addressed together in (1) above.)

The "bypasses enforcement entirely" test now uses mockReturnValue
instead of mockReturnValueOnce because calculateEdit calls
getFileReadCacheDisabled twice — once for the pre-read check and
once for the post-read TOCTOU re-check. Both must see disabled=true
to actually bypass.
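
A minimal sketch of the post-read re-check from item 1, assuming a synchronous, injected `stat` and `readTextFile` so the race can be simulated. `readForEdit`, `compare`, and `Fingerprint` are hypothetical names; the real `checkPriorRead` is async and is not reproduced here.

```typescript
// Hypothetical shapes used only for this illustration.
interface Fingerprint { mtimeMs: number; sizeBytes: number }
type CacheVerdict = "fresh" | "stale" | "unknown";

function compare(recorded: Fingerprint | undefined, onDisk: Fingerprint): CacheVerdict {
  if (!recorded) return "unknown";
  const same = recorded.mtimeMs === onDisk.mtimeMs && recorded.sizeBytes === onDisk.sizeBytes;
  return same ? "fresh" : "stale";
}

function readForEdit(
  recorded: Fingerprint | undefined,
  stat: () => Fingerprint,
  readTextFile: () => string,
): { content: string } | { errorType: string } {
  // Pre-read check: has the model seen the bytes currently on disk?
  const pre = compare(recorded, stat());
  if (pre === "unknown") return { errorType: "EDIT_REQUIRES_PRIOR_READ" };
  if (pre === "stale") return { errorType: "FILE_CHANGED_SINCE_READ" };

  const content = readTextFile();

  // Post-read re-check: an external writer may have landed between the
  // stat above and the read, so verify again before trusting `content`.
  if (compare(recorded, stat()) !== "fresh") {
    return { errorType: "FILE_CHANGED_SINCE_READ" };
  }
  return { content };
}
```

The second `compare` call is the whole point: without it, `content` can hold bytes the model never saw even though the first check passed.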

* fix(core): close fileExists TOCTOU on WriteFile prior-read enforcement

WriteFile gated prior-read enforcement on `fileExists` from
`isFilefileExists()`, but a file that sprang into existence between
that check and the write would still be overwritten without
enforcement — `fileExists === false` skipped the check entirely.

Made the gate unconditional on `fileExists`. checkPriorRead's own
`fs.stat` decides what to do:
- ENOENT → ok:true, fall through to the new-file path as before
- file exists right now (whether or not isFilefileExists saw it) →
  unknown / stale check runs, the race-created file is rejected.

Applied to both getConfirmationDetails and execute. The path that
actually creates new files is unchanged because checkPriorRead's
ENOENT branch is the disappearance-race exit, which is the correct
exit for "the file truly does not exist yet".

Reported by glm-5.1 via /review on PR QwenLM#3774.

* fix(core): close 4 enforcement gaps + 1 critical bug from 5th Copilot review

Five issues raised by deepseek-v4-pro / glm-5.1 / qwen3.6-plus on
PR QwenLM#3774. Listed by reviewer-assigned severity.

[Critical] (qwen3.6-plus) recordWrite previously only seeded the
read metadata for brand-new entries. The reproduction was real:
ReadFile(limit=10) → WriteFile(full content) → Edit. The partial
read's lastReadWasFull=false would persist through the write, and
the Edit would be rejected with EDIT_REQUIRES_PRIOR_READ even
though the model just authored every byte. recordWrite now
unconditionally refreshes lastReadAt, lastReadWasFull=true, and
lastReadCacheable=true. The fileReadCache.test.ts case that
previously asserted "preserves lastReadAt" is rewritten to assert
the new "refreshes lastReadAt to match the write" contract, and a
new "upgrades lastReadWasFull / lastReadCacheable after a full
write" regression test pins the reproduction reviewer described.

[Suggestion] (deepseek-v4-pro) Narrowed the non-regular-file
bypass in priorReadEnforcement from `!stats.isFile()` to
`stats.isDirectory()`. The earlier broad form covered FIFOs,
sockets, and devices that the model has no legitimate "read first"
recourse for and that can block readTextFile (FIFO) or
over-allocate (/dev/urandom). Those now flow through to
cache.check() and reject with the unread-file path before any I/O.

[Suggestion] (glm-5.1) Removed the `fileExists && ...` gate from
EditTool.calculateEdit, mirroring the f4ef756 fix on WriteFile.
A file that springs into existence between isFilefileExists() and
the enforcement check is now caught here as well; ENOENT inside
checkPriorRead remains the disappearance-race exit and new-file
creation flow is unchanged.

[Suggestion] (deepseek-v4-pro) Added debugLogger.warn() at every
post-read TOCTOU rejection site (Edit calculateEdit, WriteFile
getConfirmationDetails, WriteFile execute). These rejections are
rare and self-healing — without a debug record, an operator
investigating "why did this Edit fail once?" had nothing to grep.
debugLogger uses dedicated 'EDIT_PRIOR_READ' / 'WRITE_FILE' tags.

[Suggestion] (qwen3.6-plus) Added a final pre-write checkPriorRead
in EditTool.execute() and WriteFileTool.execute(). The earlier
post-read check ran inside calculateEdit (Edit) or before mkdirSync
(WriteFile), but the actual writeTextFile call could be arbitrarily
later — user approval, modify-and-confirm, etc. The window from
"post-read check → writeTextFile" is now bounded to "pre-write
stat → writeTextFile" (two adjacent syscalls).
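
The [Critical] `recordWrite` change can be illustrated with a toy cache. `MiniReadCache` and `mayEdit` are hypothetical names, and fingerprint tracking is left out to keep the sketch short; only the three `lastRead*` field names come from the commit message.

```typescript
// Toy stand-in for the read cache; not the real FileReadCache.
interface Entry {
  lastReadAt: number;
  lastReadWasFull: boolean;
  lastReadCacheable: boolean;
}

class MiniReadCache {
  private entries = new Map<string, Entry>();

  recordRead(path: string, opts: { full: boolean; cacheable: boolean }, now: number): void {
    this.entries.set(path, {
      lastReadAt: now,
      lastReadWasFull: opts.full,
      lastReadCacheable: opts.cacheable,
    });
  }

  // After a full write the model authored every byte, so the entry is
  // refreshed unconditionally as if a full, cacheable read just happened.
  recordWrite(path: string, now: number): void {
    this.entries.set(path, {
      lastReadAt: now,
      lastReadWasFull: true,
      lastReadCacheable: true,
    });
  }

  mayEdit(path: string): boolean {
    const e = this.entries.get(path);
    return e !== undefined && e.lastReadWasFull && e.lastReadCacheable;
  }
}
```

With this shape, the reported `ReadFile(limit=10) → WriteFile(full content) → Edit` sequence ends with `mayEdit` returning true, because the write upgrades the partial-read entry.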

* fix(core): close new-file race + special-file enforcement loop

Three issues from the latest Copilot review on PR QwenLM#3774.

1. New-file race in pre-write enforcement (write-file.ts:348,
   edit.ts:487). The earlier pre-write checkPriorRead was gated on
   `fileExists` (WriteFile) and `!editData.isNewFile` (Edit). If the
   path was absent at planning time and another process created it
   while approval was pending, the gated form would skip enforcement
   and silently overwrite a pre-existing file the model never read.
   Run unconditionally in both tools — checkPriorRead's own ENOENT
   branch is the disappearance-race exit, so genuine new-file
   creation is unaffected, but a race-created file now hits the
   `unknown` branch and is rejected as unread.

2. FIFO / socket / device sent the model into an enforcement loop
   (priorReadEnforcement.ts:220). After narrowing the
   non-regular-file bypass to directories only, FIFOs etc. fell
   through to cache.check, returned `unknown`, and produced a
   "use read_file first" message — but read_file rejects those same
   targets as "not a regular file", so the model would loop on
   read_file forever. Added a dedicated `!stats.isFile()` branch
   (after the directory exemption) that returns a "special file;
   cannot edit/overwrite via this tool — use shell instead" message,
   matching the shape of the existing non-text-payload guidance.

(Tool-error.ts and the non-cacheable policy notes are addressed in
the PR description update — not in code.)

* fix(core): close 4 enforcement gaps from 6th Copilot review

(Plus a doc-only update for the 5th — the mtime+size limitation
warning in the Risk section now mentions the silent-overwrite
escalation that this PR's mutation paths bring along.)

1. ENOENT after the model has already read the file is no longer
   silently treated as `ok: true`. Added an `expectExisting` option
   to `checkPriorRead`; post-read and pre-write callers pass `true`.
   ENOENT under that flag now rejects with `FILE_CHANGED_SINCE_READ`
   ("file disappeared after the model read it") rather than falling
   through to the new-file path with stale bytes. Pre-read callers
   keep the old default (ENOENT → ok:true → fall through to genuine
   new-file creation). EditTool's pre-write check derives the flag
   from `editData.isNewFile`; WriteFile's pre-write check derives it
   from the post-read `fileExists` value.

2. Directory targets now reject with `TARGET_IS_DIRECTORY` and a
   structured message instead of returning `ok: true`. The previous
   form fell through to readTextFile(), which on the WriteFile
   confirmation path threw a plain Error and was surfaced by the
   scheduler as `UNHANDLED_EXCEPTION`. Both Edit and WriteFile now
   emit a structured rejection at enforcement time. (WriteFile's
   build-time validateToolParamValues already rejects directories,
   so the change matters most for EditTool.)

3. Non-cacheable rejection's `rawMessage` no longer hard-codes
   "overwrite" — it now uses the same `verbBare` derivation as the
   `displayMessage`, so EditTool's path correctly says "if you need
   to edit it" and WriteFile's path stays "if you need to overwrite
   it". The previous form was confusing for in-place edits.

4. WriteFile.getConfirmationDetails now mirrors execute()'s
   ENOENT-to-new-file fallback: a file that disappears between
   isFilefileExists() and the readTextFile-for-diff call no longer
   throws a plain Error (which would surface as
   UNHANDLED_EXCEPTION) — it falls back to the brand-new-file diff
   so the user sees a clean confirmation rather than an unstructured
   crash.

Tests:
- New: `rejects an edit on a directory with TARGET_IS_DIRECTORY`
- New: `confirmation falls back to a new-file diff when the file
  disappears mid-flight` (WriteFile)
- Updated: non-cacheable rejection asserts `verbBare` is "edit" on
  the EditTool path and "overwrite" on the WriteFile path.

Reported by Copilot via /review on PR QwenLM#3774.
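
Taken together with the earlier rounds, the enforcement ladder now reads roughly as follows. This is a simplified sketch: `checkPriorReadSketch`, `StatKind`, and the pre-computed `cacheVerdict` argument are assumptions made for illustration, and the real function stats the path itself. Mapping special files to EDIT_REQUIRES_PRIOR_READ follows the enum scope described elsewhere in this log.

```typescript
// Hypothetical, simplified inputs: the real helper derives these from fs.stat.
type StatKind = "missing" | "directory" | "special" | "file"; // special: FIFO, socket, device
type Verdict = { ok: true } | { ok: false; errorType: string };

function checkPriorReadSketch(
  stat: StatKind,
  cacheVerdict: "fresh" | "stale" | "unknown",
  opts: { expectExisting?: boolean } = {},
): Verdict {
  if (stat === "missing") {
    // Pre-read callers: genuine new-file creation.
    // Post-read / pre-write callers: the file vanished after the model read it.
    return opts.expectExisting
      ? { ok: false, errorType: "FILE_CHANGED_SINCE_READ" }
      : { ok: true };
  }
  if (stat === "directory") {
    return { ok: false, errorType: "TARGET_IS_DIRECTORY" };
  }
  if (stat === "special") {
    // Structural dead end: read_file cannot help, so do not suggest it.
    return { ok: false, errorType: "EDIT_REQUIRES_PRIOR_READ" };
  }
  if (cacheVerdict === "fresh") return { ok: true };
  return {
    ok: false,
    errorType: cacheVerdict === "stale" ? "FILE_CHANGED_SINCE_READ" : "EDIT_REQUIRES_PRIOR_READ",
  };
}
```

The order matters: the directory and special-file branches run before the cache lookup so that neither can produce a "use read_file first" message that the model cannot act on.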

* docs(core): clarify stat→write race + EDIT_REQUIRES_PRIOR_READ scope

Three doc-only follow-ups from Copilot's latest review pass on PR
QwenLM#3774. None change behaviour; the pre-fix code state was already
the actual contract — the docs just lagged it.

1. EDIT_REQUIRES_PRIOR_READ enum comment now lists the three cases
   the code actually returns it for (never-read, partial / ranged /
   non-cacheable read, structural dead end — non-text payload or
   special file). The previous one-liner described only the first
   case and would mislead future maintainers.

2. The Final pre-write freshness check blocks in EditTool.execute
   and WriteFileTool.execute now spell out that they DO NOT
   eliminate the stat → writeTextFile race. The window narrows
   from the previously-unbounded post-read-to-write gap down to
   two adjacent syscalls, but a concurrent writer landing in
   that pair can still be clobbered. Closing the residual would
   require an atomic write (write-to-temp + rename) or a
   content-hash post-write recheck — both deferred. Operators who
   need strict protection set `fileReadCacheDisabled: true` and
   rely on application-level locking.

3. PR description Risk section gains a "Known unmitigated: stat →
   write race window" subsection (English + Chinese mirror)
   matching the code comments.

* chore(core): minor follow-ups from review #4229917446

Three of the five MINOR items raised in the independent code review
on 2026-05-05 — the cheap, isolated ones. The other two (race-
simulating integration test, moving StructuredToolError out of
priorReadEnforcement.ts) are deferred as the reviewer suggested.

1. EditTool now has a symmetric `PRIOR_READ_VERIFICATION_FAILED`
   regression test (mocks fs.promises.stat to reject with EACCES,
   asserts the EditTool path produces the same fail-closed result
   that the existing WriteFile EACCES test pins). Five-line fix to
   close the asymmetry that, while harmless today (the helper is
   shared), would let a future Edit-side change to checkPriorRead
   slip through without test coverage.

2. ensureParentDirectoriesExist / mkdirSync now run AFTER the
   pre-write checkPriorRead in both EditTool.execute() and
   WriteFileTool.execute(). Doing it before would leak intermediate
   directories on the rejection path — a real (if minor) FS litter
   the previous order created on every rejected new-file write.

3. EDIT_REQUIRES_PRIOR_READ enum docstring gains a one-line note
   for operators routing alerts on this code: a single
   `edit_requires_prior_read` signal can mean any of the three
   cases (no read / partial read / structural dead-end), and if
   per-cause monitoring becomes important the enum can be split
   in a follow-up. The originating tool name and the message text
   already disambiguate at runtime.

* fix(core): close 2 correctness gaps from maintainer review #4232751470

Both tracked back to the cache's "track most recent read shape"
model diverging from prior-read enforcement's "model has seen
these bytes" model.

1. SVG (and similar string-content fallbacks) recorded as
   non-cacheable, blocking subsequent Edit / WriteFile.

   `read-file.ts` derives `cacheable` from
   `originalLineCount !== undefined && !isTruncated`. The SVG
   branch in `fileUtils.ts` returned content without
   `originalLineCount`, so `cacheable` collapsed to false and a
   follow-up Edit hit the dead-end "non-text payload — use shell"
   rejection — telling the model to use shell to mutate a file it
   had just successfully read as text. This was a real regression
   vs pre-PR behaviour where SVG-as-text editing worked.

   Fix: SVG-as-text branch now sets `originalLineCount` (split
   on '\n') and `isTruncated: false`, so ReadFile records it as
   a full cacheable read. The binary-fallback string and
   over-1MB SVG branches are deliberately left non-cacheable —
   they return placeholder strings ("Cannot display content of
   ...") rather than file content, so blocking edits there is
   correct. New regression test in `read-file.test.ts`:
   `records SVG-as-text reads with cacheable=true so a follow-up
   Edit passes enforcement`.

2. recordRead unconditionally overwriting lastReadWasFull /
   lastReadCacheable, revoking prior write-author or full-read
   rights.

   The `WriteFile(create) → ReadFile(offset/limit) → Edit`
   sequence rejected the Edit because the partial read clobbered
   the `lastReadWasFull = true` that `recordWrite` had stamped
   at create time. Same shape applies to a full text read
   followed by a partial one of the same inode.

   Fix: `recordRead` is now sticky-on-true for the read flags —
   `if (opts.full) entry.lastReadWasFull = true;` and the
   matching guard for `cacheable`. Prior `true` survives a later
   partial / non-cacheable read. The fast-path `file_unchanged`
   check still gates on the incoming request's own `isFullRead`
   in `read-file.ts`, so a partial read still does not get a
   placeholder it shouldn't. Updated the existing
   "overwrites earlier lastReadWasFull" test to assert the new
   sticky semantics, and added a `lastReadCacheable` symmetric
   test plus a `Write → partial-Read → Edit` end-to-end test in
   `edit.test.ts`.

Reported by tanzhenxin via independent maintainer review on
2026-05-06.
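
For item 1, the `cacheable` derivation and the SVG fix can be sketched like this. `ReadResult`, `isCacheable`, and `readSvgAsText` are hypothetical names; the boolean expression itself is the one quoted in the commit message.

```typescript
// Hypothetical result shape; only the two optional fields matter here.
interface ReadResult {
  content: string;
  originalLineCount?: number;
  isTruncated?: boolean;
}

// Same expression the commit message quotes from read-file.ts.
function isCacheable(r: ReadResult): boolean {
  return r.originalLineCount !== undefined && !r.isTruncated;
}

// Post-fix SVG-as-text branch: report a line count and no truncation,
// so the read is recorded as a full, cacheable text read.
function readSvgAsText(svg: string): ReadResult {
  return {
    content: svg,
    originalLineCount: svg.split("\n").length,
    isTruncated: false,
  };
}
```

A result that omits `originalLineCount`, as the pre-fix SVG branch did, makes `isCacheable` return false, which is what pushed the follow-up Edit into the "use shell" dead end.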

* fix(core): close 3 correctness gaps from re-review #4233904930

All three are tightenings of the prior `de8ddf530` round.

1. **Sticky-on-true narrowed to "no fingerprint drift"**.
   `fileReadCache.recordRead` previously kept `lastReadWasFull` /
   `lastReadCacheable` true across drifted recordings, which
   re-opened a `Read full @x → external write @y → Read partial
   @y → Edit` hole: the partial recordRead silently advanced the
   entry's mtime+size to Y while preserving the sticky `full=true`
   from X, so a follow-up Edit ran against bytes the model only
   saw the first 10 lines of. Now the sticky branch only fires
   when `(mtimeMs, sizeBytes)` matches the existing entry; on
   drift, both flags reset to exactly what this read produced.
   New regression test in `fileReadCache.test.ts` reproduces the
   reviewer's reported sequence.

2. **Subagent FileReadCache isolation now covers the
   inherits-model + same-approval-mode common case**. The
   own-property machinery from QwenLM#3717 only triggers when an
   `Object.create(parent)` actually fires; both
   `agent.ts:990-993` (same-approval-mode) and
   `subagent-manager.ts:699-701` (inherits-model) had paths that
   returned the parent Config directly, so the subagent's
   `getFileReadCache()` resolved to the parent's instance — a
   parent Read could satisfy the subagent's Edit on a path the
   subagent's transcript never contained. Both sites now build
   a thin `Object.create(base)` override unconditionally; no
   method changes for the inherits / same-mode cases, but a
   distinct instance triggers the lazy-init in
   `Config.getFileReadCache()` so the subagent gets an isolated
   cache.

3. **Cache records the read pipeline's internal stat instead of
   a post-read re-stat**. `processSingleFileContent` now
   surfaces its internal stat via `result.stats`, and read-file
   uses that for `recordRead` instead of running its own stat
   after the read returns. Pre-fix, an external write between
   the pipeline call and the post-read stat let the cache record
   fingerprint Y for content the model received at X — a
   subsequent Edit would pass enforcement against bytes the
   model never legitimately saw. The internal-stat-to-read
   window is still a few microseconds wide; that residue is the
   same content-hash territory acknowledged in the Risk section.

Reported by tanzhenxin via re-review on PR QwenLM#3774.
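
Item 1's "sticky only without fingerprint drift" rule, as a small pure function. `recordReadSketch` and `ReadEntry` are hypothetical; the field names mirror those used in the commit messages.

```typescript
// Hypothetical entry shape for illustration.
interface ReadEntry {
  mtimeMs: number;
  sizeBytes: number;
  lastReadWasFull: boolean;
  lastReadCacheable: boolean;
}

function recordReadSketch(
  prev: ReadEntry | undefined,
  stats: { mtimeMs: number; sizeBytes: number },
  opts: { full: boolean; cacheable: boolean },
): ReadEntry {
  // Earlier flags may only carry over when the fingerprint is unchanged,
  // i.e. the earlier read covered the same bytes that are on disk now.
  const carry =
    prev !== undefined &&
    prev.mtimeMs === stats.mtimeMs &&
    prev.sizeBytes === stats.sizeBytes
      ? prev
      : undefined;

  return {
    mtimeMs: stats.mtimeMs,
    sizeBytes: stats.sizeBytes,
    lastReadWasFull: opts.full || (carry?.lastReadWasFull ?? false),
    lastReadCacheable: opts.cacheable || (carry?.lastReadCacheable ?? false),
  };
}
```

On drift, `carry` is undefined and both flags fall back to exactly what the current read produced, which closes the `Read full @x → external write @y → Read partial @y → Edit` hole.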

* docs(core): clarify partial subagent isolation per review #4234090906

tanzhenxin's third review correctly observed that the
`Object.create(parent)` wrappers in `agent.ts:createApprovalModeOverride`
and `subagent-manager.ts:maybeOverrideContentGenerator` only isolate
the FileReadCache for code that consults `Config.getFileReadCache()`
directly. Bound `EditTool` / `WriteFileTool` instances were registered
against the parent's tool registry at initialise time, so tool
invocations still resolve `this.config` to the parent and reach the
parent's cache. `InProcessBackend.createPerAgentConfig` already does
the right thing (`override.createToolRegistry()` +
`copyDiscoveredToolsFrom(base.getToolRegistry())`); bringing that to
these two spawn sites is the real fix.

Reviewer's verdict was COMMENT, not REQUEST_CHANGES — the gap
pre-dates this PR (it's a property of QwenLM#3717's per-Config own-property
machinery) and pre-PR there was no enforcement on subagent mutations
at all, so the PR is strictly an improvement on every spawn path.
Documented the partial guarantee explicitly:

- Inline comments on both spawn sites note the bound-tool caveat
  and point at `InProcessBackend.createPerAgentConfig` as the model
  for the follow-up.
- PR description's subagent paragraph (English + Chinese mirror) now
  splits into "fully isolated" (`InProcessBackend.createPerAgentConfig`)
  and "partial isolation" (the two sites in this PR) so readers don't
  walk away with the wrong contract.

Filing the registry-rebuild work as a follow-up; not in this PR.
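
The partial guarantee can be reproduced in a few lines. Everything below is a hypothetical stand-in (`ConfigSketch`, `FileReadCacheSketch`, `EditToolSketch`); it only assumes, as the commit messages describe, that the cache is lazily created per own property and that tools hold a reference to the Config they were registered with.

```typescript
class FileReadCacheSketch {
  readonly seen = new Set<string>();
}

class ConfigSketch {
  private fileReadCache?: FileReadCacheSketch;

  getFileReadCache(): FileReadCacheSketch {
    // Own-property check: an Object.create(parent) wrapper must not
    // resolve to the cache stored on its prototype (the parent).
    const hasOwn = Object.prototype.hasOwnProperty.call(this, "fileReadCache");
    if (!hasOwn || !this.fileReadCache) {
      this.fileReadCache = new FileReadCacheSketch();
    }
    return this.fileReadCache;
  }
}

class EditToolSketch {
  constructor(private readonly config: ConfigSketch) {}
  cache(): FileReadCacheSketch {
    return this.config.getFileReadCache();
  }
}

const parentConfig = new ConfigSketch();
// Tool registered against the parent at initialise time.
const boundTool = new EditToolSketch(parentConfig);
// Thin override, as built at the two spawn sites.
const subagentConfig: ConfigSketch = Object.create(parentConfig);

const isolated = subagentConfig.getFileReadCache() !== parentConfig.getFileReadCache();
const toolStillUsesParent = boundTool.cache() === parentConfig.getFileReadCache();
```

In this sketch `isolated` and `toolStillUsesParent` are both true: code that asks the override directly gets its own cache, while the pre-bound tool keeps reaching the parent's, which is the caveat documented in this commit.
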
B-A-M-N pushed a commit that referenced this pull request May 8, 2026
…wenLM#3831 PR-1 of 3) (QwenLM#3842)

* feat(core): add signal.reason convention for ShellExecutionService.execute()

Foundation for QwenLM#3831 Phase D (b) — Ctrl+B promote of a running foreground
shell to background. Defines a discriminated `ShellAbortReason` union that
the AbortSignal carries; default behavior (no reason / `{ kind: 'cancel' }`)
keeps the existing tree-kill on abort. `{ kind: 'background' }` is a takeover
signal — execute() skips the kill, drops the child from its active set (so
cleanup() won't kill it later), flushes a snapshot of captured output, and
resolves the result Promise immediately with `promoted: true` so the
awaiting caller unblocks.

Pure plumbing: no caller sets the reason yet, so this is a zero-behavior
change for existing call sites. The `promoted?: boolean` field is optional
on ShellExecutionResult so existing consumers compile against the new shape
without source changes.

Tests pin both branches in both childProcessFallback and executeWithPty:
default abort still SIGTERM-tree-kills; `{ kind: 'cancel' }` is identical to
default (pin against accidental routing through the background branch);
`{ kind: 'background' }` skips the kill, snapshot output is preserved,
mockProcessKill / mockPtyProcess.kill are NOT called.

Part of QwenLM#3831 (Phase D part b — Ctrl+B promote running shell to background).
PR-1 of 3.
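
A minimal sketch of the convention, assuming the simplest possible read of `signal.reason` (later commits in this log harden that read). The union matches the two kinds named above; `onAbort` and its `kill` / `promote` callbacks are hypothetical.

```typescript
// The two kinds come from the commit message; the exact type may differ in the real code.
type ShellAbortReason = { kind: "cancel" } | { kind: "background" };

function onAbort(signal: AbortSignal, kill: () => void, promote: () => void): void {
  signal.addEventListener(
    "abort",
    () => {
      // abort() with no argument sets reason to an AbortError, which has no `kind`.
      const reason = signal.reason as Partial<ShellAbortReason> | undefined;
      if (reason?.kind === "background") {
        promote(); // takeover: skip the kill, hand the process to the caller
      } else {
        kill(); // default and { kind: 'cancel' }: existing tree-kill behavior
      }
    },
    { once: true },
  );
}
```

A caller opts in with `controller.abort({ kind: "background" })`; any call site that keeps using a bare `controller.abort()` sees no change.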

* fix(core): detach service listeners on background-promote (resolve review)

Addresses 4 Critical + 2 Suggestion findings on PR-1 of QwenLM#3831:

- **childProcess listener detach** (review line 555 + 573): Anonymous arrow
  listeners on stdout/stderr/error/exit could not be off()'d. After
  background-promote, post-promote bytes would re-enter handleOutput, which
  then calls decoder.decode() on a now-finalized text decoder (cleanup()
  already called .decode() without stream:true) → TypeError crash. Even
  without the crash, old onOutputEvent would fire for new data → ownership
  contract violation + duplication. Fix: extract named handler refs
  (stdoutHandler / stderrHandler / errorHandler / exitHandler) and call
  off() on all four in the background-promote branch via a
  detachServiceListeners() helper.

- **PTY listener detach** (review line 967 + 990): node-pty's onData / onExit
  return IDisposable handles; the abort handler now captures
  dataDisposable / exitDisposable and calls .dispose() in the
  background-promote branch. ptyProcess.on('error') is EventEmitter-style
  (not IDisposable) — extract a named ptyErrorHandler ref and off() it.
  Without these, post-promote PTY error throws → Node.js crash; post-promote
  data continues writing to headlessTerminal and calling old onOutputEvent
  → ownership violation.

- **PTY in-flight chain item ownership** (related to review line 990):
  processingChain may have already-enqueued callbacks past the early
  listenersDetached check. Refactored from "early-return short-circuit" to
  "guard each onOutputEvent emit individually" so in-flight writes still
  LAND in headlessTerminal (snapshot reflects them) but no events leak to
  the foreground onOutputEvent. Also clear renderTimeout in the abort
  handler so a pending throttled render doesn't fire post-promote.

- **PTY snapshot freshness** (review line 972, suggestion): The original
  abort handler called serializeTerminalToText immediately. Now we
  await Promise.race([processingChain drain, SIGKILL_TIMEOUT_MS]) first
  (mirrors the onExit finalize pattern at ~line 970) so in-flight
  headlessTerminal.write callbacks land before serialization. Skipped
  render(true) intentionally because it would emit final onOutputEvent
  data (renderFn calls onOutputEvent), violating the "no emit post-promote"
  invariant — added a comment explaining why direct serialize is correct.

- **Handoff-boundary tests** (review line 1257, suggestion): Added 4 new
  tests pinning the ownership contract — 2 for child_process (post-promote
  stdout/stderr does NOT route to onOutputEvent; child exit does NOT
  re-resolve result), 2 for PTY (data/exit disposables ARE called; result
  shape stays promoted: true even if post-promote events fire).

Also: test setup now stubs mockPtyProcess.onData / .onExit to return
{ dispose: vi.fn() } so the background-promote path's dispose() calls
don't crash on undefined (the stub's mock.results[0].value is then
inspected by the new handoff tests).

58 / 58 tests pass (50 baseline + 4 first-pass + 4 handoff). Total +235 / -35
on top of the prior commit.

* fix(core): defensive hardening for ShellExecutionService background-promote (resolve 2nd review pass)

Addresses 6 follow-up [Suggestion] threads on PR-1 of QwenLM#3831 — all
substantive code-quality issues raised by the second-pass review of
the dispose-based detach commit (8e8e18c):

- **Exhaustive switch on `ShellAbortReason.kind`** (both abort handlers).
  Earlier `if (reason?.kind === 'background')` form silently fell
  through to kill for any unrecognized variant — a future
  `{ kind: 'suspend' }` would have killed the process with zero
  compile-time signal. Switched to `switch (kind)` with a `never`-typed
  default that runs `debugLogger.warn` and falls back to the safest
  behavior (cancel/kill). Each branch is now extracted into a named
  helper (`performBackgroundPromote` / `performCancelKill`) so the
  switch body stays a single screenful.

- **Each `dispose()` wrapped in its own try/catch** (PTY). node-pty's
  `IDisposable` contract doesn't guarantee no-throw. Without per-dispose
  try/catch a single throwing dispose() would skip subsequent cleanup
  (the other dispose, off('error'), activePtys.delete, drain, resolve)
  and the caller would hang forever on `await result`. Each call now
  logs via debugLogger.warn on failure but continues.

- **`.catch(() => undefined)` on the processingChain side of the drain
  race** (PTY). `Promise.race([processingChain.then(drain).then(drain),
  timeout])` would propagate a chain rejection out of the race; since
  `addEventListener` doesn't await our handler, the rejection became
  unhandled and `resolve()` was never called → caller hung. Now the
  rejection is swallowed; the timeout side still terminates the race
  on time.

- **Drain-timeout truncation now emits a diagnostic warning** (PTY).
  Previously the 200ms drain timeout could fire, the snapshot would be
  taken with the buffer in mid-write state, and the result.output
  would be silently truncated. Race result is now observed via a
  symbol sentinel; when the timeout side wins, debugLogger.warn fires
  pointing the user at rawOutput as the un-truncated fallback.

- **Snapshot serialize failure logs instead of swallowing silently**
  (PTY). Empty `catch {}` made result.output indistinguishable from
  "command produced no output" if serializeTerminalToText threw. Now
  `debugLogger.warn` with the error message leaves a trail for support
  bundles.

- **Dedicated `PROMOTE_DRAIN_TIMEOUT_MS` constant** separated from
  `SIGKILL_TIMEOUT_MS`. Both are 200ms today, but they have unrelated
  reasons-to-change (kill escalation timing vs. promote drain
  ceiling) — sharing the constant means tuning one would silently
  change the other.

Also adds a module-level `debugLogger = createDebugLogger('SHELL_EXECUTION')`
since the service had no logging surface before this commit.

58 / 58 tests pass; tsc clean; ESLint clean. No new tests added: the new
behaviors (timeout sentinel firing, dispose throw, exhaustive switch
default) are defensive log-only paths; existing handoff tests already
cover the happy path. Adding mock-throw tests is reasonable
follow-up but not blocking.
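
Two of the patterns above, sketched in isolation. `handleAbort` and `disposeAll` are hypothetical names, and `warn` stands in for `debugLogger.warn`.

```typescript
type AbortKind = "cancel" | "background";

function handleAbort(
  kind: AbortKind,
  actions: { kill: () => void; promote: () => void },
): void {
  switch (kind) {
    case "background":
      actions.promote();
      return;
    case "cancel":
      actions.kill();
      return;
    default: {
      // Compile-time guard: adding a variant to AbortKind without a case
      // makes this assignment a type error.
      const unreachable: never = kind;
      void unreachable;
      actions.kill(); // safest runtime fallback
    }
  }
}

// Each dispose() gets its own try/catch so one throwing handle
// cannot skip the cleanup steps that follow it.
function disposeAll(
  disposables: Array<{ dispose(): void }>,
  warn: (message: string) => void,
): void {
  for (const d of disposables) {
    try {
      d.dispose();
    } catch (err) {
      warn(`dispose failed: ${String(err)}`);
    }
  }
}
```

The `never` assignment costs nothing at runtime; its value is that a new union member stops compiling here instead of silently falling through to the kill path.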

* fix(core): real bug — ptyProcess.off → removeListener; defensive abort-reason read

Resolves the third review pass on PR-1 of QwenLM#3831 — 1 real bug + 2
defensive hardenings:

- **Real bug: `ptyProcess.off('error', ...)` throws TypeError at
  runtime** (line ~1074). `@lydell/node-pty`'s `IPty` interface
  exposes the legacy Node EventEmitter `removeListener`, not the
  modern `off` alias. Previous form threw, the surrounding try/catch
  swallowed it (post-prior-pass dispose hardening), but the old
  `ptyErrorHandler` stayed registered — so a post-promote PTY error
  would still hit our foreground handler and `throw err`, breaking
  the handoff contract that PR-1's whole listener-detach work is
  supposed to enforce. Switched to `removeListener`. The catch +
  warn stays as defense-in-depth; the message wording is updated.

- **Prototype-pollution-safe `kind` read** (extracted to module-level
  helper `getShellAbortReasonKind`). The previous `reason?.kind`
  walked the prototype chain — a polluted
  `Object.prototype.kind = 'background'` would silently route
  `abortController.abort({})` (any plain object reason) into the
  promote branch and skip the kill. Lifecycle/safety branch deserves
  the extra check. Helper now: rejects non-object reasons; reads
  `kind` only as an OWN property (`hasOwnProperty`); whitelists
  against `'background' | 'cancel'`; defaults to `'cancel'` (the
  safe historical behavior) for everything else. Both abort handlers
  (childProcess + PTY) now share this helper.

- **`streamStdout: true` + background-promote = silent empty
  snapshot** (childProcess `performBackgroundPromote`). The promote
  snapshot reads from the `stdout` / `stderr` string accumulators;
  but in `streamStdout` mode `handleOutput` forwards bytes through
  `onOutputEvent` and skips the accumulators entirely. Today PR-1's
  only call site (foreground shell.ts) uses `streamStdout: false`,
  so the combination is unreachable — but if a future caller pairs
  the two, `result.output` would be empty with no diagnostic. Added a
  `debugLogger.warn` when the combination occurs, pointing the caller
  at `rawOutput` as the fallback. Cheaper than building a parallel
  accumulator just for this latent case.

58 / 58 tests pass; tsc clean; ESLint clean.

* fix(core): liveness check + throw-safe abort-reason read + encoding-aware PTY snapshot (resolve 4th review pass)

Resolves 6 threads on PR-1 of QwenLM#3831 — 1 Critical + 1 real bug + 2
quality + 2 test-coverage:

- **[Critical] `getShellAbortReasonKind` throw-safe property read.**
  Previous form read `reason.kind` after only checking that `kind` is
  an own property. An own accessor that throws (or a Proxy with a
  trapping getter) would throw before the helper reached either the
  cancel kill path or the background promote path. Abort handlers are
  dispatched async and not awaited by AbortSignal, so a leaked throw
  here would have left the shell process alive instead of being killed
  on cancel — quietly. Wrapped the property read in try/catch with a
  fall-back to the safe 'cancel' kill behavior.

- **Real bug: child_process post-exit race in background-promote**
  (`performBackgroundPromote`). The child may have already exited but
  the 'exit' event hasn't reached our handler yet (Node delivers
  events on the next microtask). Promoting in that window would
  detach our exit listener and report `promoted: true` for a process
  that's already dead — the caller would hold an inert pid expecting
  to take over. Now we read `child.exitCode` / `child.signalCode`
  before detaching: if either is non-null, fall through and let the
  pending exit handler resolve normally with the real exit info.
  Mirrored mock setup so `exitCode` / `signalCode` default to `null`
  (matching real ChildProcess) instead of `undefined`.

- **PTY snapshot: re-decode + replay (mirror exit-path encoding).**
  The promoted snapshot was serializing `headlessTerminal` directly,
  which was fed by a streaming decoder initialized from the
  first-chunk encoding heuristic. When early output is ASCII-only but
  later output is in a different encoding (GBK / Shift-JIS / etc.),
  this produces mojibake — and the normal exit path doesn't, because
  it re-decodes `finalBuffer` with `getCachedEncodingForBuffer` and
  replays through a fresh terminal. Now mirrors that logic so
  `result.output` shape matches across the two paths. Direct-serialize
  remains as a last-ditch fallback if replay throws.

- **Switch `default` no longer emits a runtime warn.** Reviewer noted
  the helper's whitelist made the `default: { _exhaustive: never }`
  branch unreachable at runtime — the `debugLogger.warn` in it could
  never fire. Kept the `_: never = kind` type assertion (so a future
  ShellAbortReason variant forces a TS error here, directing the
  developer to extend BOTH the helper's whitelist AND add a `case`),
  removed the unreachable warn. Added a comment that the assertion is
  the static-only safety net the union expansion would trigger.

- **Direct unit tests for `getShellAbortReasonKind`** (8 cases). The
  helper's prototype-pollution defense is the main reason it exists;
  if `hasOwnProperty` is accidentally removed the regression would
  silently send `abortController.abort({})` (any plain reason) into
  the promote path. Exported the helper and added direct tests for:
  null / undefined, non-object, empty object (no own kind), prototype-
  only kind (pollution), unknown kind value, throwing accessor, Proxy
  trap, and the two happy paths.

- **`removeListener` regression guard.** The fix to call
  `ptyProcess.removeListener('error', ...)` instead of `.off(...)`
  matters because `@lydell/node-pty`'s IPty interface only exposes
  `removeListener` — `.off()` throws TypeError on a real PTY but the
  EventEmitter mock tolerates both. Added a test that spies on both
  methods and asserts the production code uses `removeListener` for
  the 'error' event, so a future swap back to `.off()` regresses
  loudly under the mock instead of silently.

68 / 68 tests pass (58 baseline + 9 helper boundary + 1 removeListener
guard + 1 post-exit race); tsc clean; ESLint clean.
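
A minimal sketch of the defensive read that the helper tests above exercise. The variant names (`'cancel'`, `'background'`) are assumptions for illustration; only the own-property check, the value whitelist, and the tolerance for throwing accessors and Proxy traps come from the description above.

```typescript
// Hypothetical variant names; the real ShellAbortReason union may differ.
type ShellAbortReasonKind = 'cancel' | 'background';

// Returns a whitelisted kind and degrades anything unexpected to 'cancel'.
function getShellAbortReasonKind(reason: unknown): ShellAbortReasonKind {
  if (typeof reason !== 'object' || reason === null) return 'cancel';
  try {
    // Own-property check: a `kind` inherited from a polluted prototype is ignored.
    if (!Object.prototype.hasOwnProperty.call(reason, 'kind')) return 'cancel';
    const kind: unknown = (reason as { kind?: unknown }).kind;
    // Value whitelist: unknown strings, numbers, etc. fall through to 'cancel'.
    if (kind === 'cancel' || kind === 'background') return kind;
    return 'cancel';
  } catch {
    // A throwing accessor or Proxy trap must not escape the abort handler.
    return 'cancel';
  }
}
```

With this shape, `abortController.abort({})` can only ever reach the cancel path, because a plain object has no own `kind`.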

* fix(core): PTY background-promote post-exit race guard (resolve 5th review pass)

Mirrors the child_process post-exit race fix from 4cc558b into the
PTY path — addresses 1 [Critical] thread on PR-1 of QwenLM#3831:

The PTY may have already exited but our `exitDisposable` (onExit
callback) hasn't run yet — node-pty delivers the exit event
asynchronously after the PTY's native SIGCHLD, so there's a window
between "PTY actually dead" and "service onExit fires". Promoting in
that window detaches our exit listener and reports `promoted: true`
for a dead PTY, losing the real exit status; the caller would hold an
inert pid expecting to take over.

The IPty interface doesn't expose an `exitCode` field we can read
directly (unlike `child.exitCode` / `child.signalCode` for
child_process), so use `process.kill(pid, 0)` as a best-effort
liveness check via the existing `ShellExecutionService.isPtyActive`
helper. If kill(pid, 0) throws ESRCH, the pid is gone — log at debug
level and fall through, letting the pending onExit callback resolve
normally with the real exit info.

Also adds a unit test mirroring the child_process race test: mocks
`process.kill(pid, 0)` to throw ESRCH on the liveness probe, asserts
the result has no `promoted: true` and reports the real exitCode.

69 / 69 tests pass; tsc clean; ESLint clean.
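
The liveness probe described above can be sketched roughly as follows. The `kill` function is injected here so the sketch is testable without a real process; `tryPromote` and `PromoteOutcome` are hypothetical names, and treating `EPERM` as "alive" is an assumption of this sketch, not something the commit states.

```typescript
// Same shape as Node's `process.kill`; injected so the probe can be exercised in tests.
type KillFn = (pid: number, signal: number) => unknown;

// Signal 0 delivers nothing; it only reports whether the pid could be signalled.
function isPidAlive(pid: number, kill: KillFn): boolean {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    // ESRCH means the pid is gone. EPERM means it exists but belongs to someone
    // else; treating that as alive is an assumption of this sketch.
    return code === 'EPERM';
  }
}

type PromoteOutcome = { promoted: true } | { promoted: false; reason: 'already-exited' };

// Hypothetical guard: only promote while the pid still answers the probe.
function tryPromote(pid: number, kill: KillFn): PromoteOutcome {
  if (!isPidAlive(pid, kill)) {
    // Fall through: the pending onExit callback reports the real exit status.
    return { promoted: false, reason: 'already-exited' };
  }
  return { promoted: true };
}
```

In Node, `(pid, signal) => process.kill(pid, signal)` can be passed as `kill`.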

* docs(core): correct getShellAbortReasonKind boundary-test count in JSDoc

Doc said 'all six edge cases' but the test suite has 8 cases (added
Proxy-trap and undefined later). Off-by-2 cosmetic only — no behavior
change. Caught during a multi-round self-audit of PR-1 of QwenLM#3831.

Audit summary: 7 rounds (correctness / reverse / consistency / coverage
/ build / exception paths / style) found one false-positive (a
sync-abort registration-order race I initially thought existed). Verified
that Node's WHATWG AbortSignal does NOT auto-fire 'abort' listeners
on already-aborted signals, so the race window cannot open. No code
change needed for that scenario; this commit is just the JSDoc fix.

69 / 69 tests still pass; tsc + ESLint clean.
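
The AbortSignal behavior verified above can be reproduced with a few lines. This is an illustrative sketch; `attachLateListener` is a hypothetical name, and it uses only the standard `AbortController` API.

```typescript
// Attaches an 'abort' listener AFTER the signal has already been aborted.
function attachLateListener(): { listenerRan: boolean; aborted: boolean } {
  const controller = new AbortController();
  controller.abort();

  let listenerRan = false;
  controller.signal.addEventListener('abort', () => {
    listenerRan = true;
  });

  // The event was dispatched synchronously inside abort(); it is not replayed
  // for listeners added later, so late subscribers must check `aborted` themselves.
  return { listenerRan, aborted: controller.signal.aborted };
}
```

The late listener never runs, while `signal.aborted` is already `true`, which is why a caller that might subscribe late has to inspect the flag rather than rely on the event.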

* docs(core): document the helper / union / switch sync invariant explicitly

Multi-round self-audit found that `getShellAbortReasonKind`'s value
whitelist has no compile-time tie to the `ShellAbortReason` union: when
the union grows, TypeScript's `_exhaustive: never` in each switch
forces #3 (the case arm) to be added, but the helper's whitelist
(#2) silently keeps degrading the new variant to 'cancel', and the
new case arm is never reached at runtime.

Reviewer #4 raised this on the second pass; the original commit chose
to accept it (option B in that thread) but didn't leave a strong
in-code signal for future contributors. Added an INVARIANT block
inside the helper enumerating the three sites that must be kept in
sync, so the next person extending `ShellAbortReason` sees the
coupling at the place where they're most likely to forget it.

No behavior change — comment-only. 69 / 69 tests still pass; tsc +
ESLint clean.
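
The three sites named in the INVARIANT block can be pictured with the sketch below. The variant names and the helper names `toAbortKind` / `handleAbort` are hypothetical; the point is only which of the three sites the compiler actually guards.

```typescript
// Site 1: the union. Variant names here are hypothetical.
type ShellAbortReason = { kind: 'cancel' } | { kind: 'background' };
type AbortKind = ShellAbortReason['kind'];

// Site 2: the runtime whitelist. It is a plain value check with no compile-time
// link to the union, so a newly added variant degrades to 'cancel' until listed.
function toAbortKind(value: unknown): AbortKind {
  return value === 'cancel' || value === 'background' ? value : 'cancel';
}

// Site 3: the switch. A new union member makes the `never` assignment fail to
// type-check, which is the only one of the three sites the compiler enforces.
function handleAbort(kind: AbortKind): string {
  switch (kind) {
    case 'cancel':
      return 'kill';
    case 'background':
      return 'promote';
    default: {
      const exhaustive: never = kind;
      return exhaustive;
    }
  }
}
```

If a third variant were added to the union and to the switch but not to `toAbortKind`, the code would compile and the new `case` arm would still never run, which is the silent degradation the comment block warns about.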

Audit summary (this round + prior round): 18 angles total over two
sweeps and one reverse-attack pass. Found:
  - 0 real bugs
  - 1 false-positive race (sync-abort registration order — Node WHATWG
    AbortSignal does NOT auto-fire on already-aborted signals;
    investigated, reverted)
  - 1 cosmetic doc fix (boundary-test count off-by-2)
  - 1 cosmetic INVARIANT block (this commit)

Areas reviewed without finding new issues: caller-side
ShellExecutionResult shape compatibility (optional `promoted?` field,
existing callers spread-untouched); `exited` flag lifecycle
(monotonic, cleanup() idempotent); processingChain in-flight
ownership (listenersDetached guards every onOutputEvent emit
including the renderFn-rendered case via the same flag); race
between exit event and abort handler (both microtasks, FIFO ordering
gives correct outcome either way); Node version dependence
(`AbortSignal.reason` is Node 17.2+, engines: >=20 covers it);
test isolation (mockImplementationOnce + module-level mockProcessKill
clears each beforeEach); `process.kill(pid, 0)` Windows liveness
reliability (best-effort, acceptable for PR-1 plumbing); PID reuse
race on the PTY liveness check (theoretically possible, microsecond
window, unavoidable at the OS level — rejected in spec discussion);
PR-2/PR-3 contract surface (caller MUST attach listeners before
abort — documented; any future caller violating this is its own bug).

* test(core): align mockChildProcess.exitCode/signalCode in second beforeEach

The 'execution method selection' describe block has its own
beforeEach (separate from 'child_process fallback') that builds
mockChildProcess but does not set `exitCode` / `signalCode = null`.
Real Node `ChildProcess.exitCode` / `signalCode` are `null` while the
process is alive — and production now reads these in the
background-promote race guard. The current tests in this block don't
exercise the promote path, so they pass regardless, but any future
promote-related test landing here would silently trip the guard
(`undefined !== null` is true) and fall through to the normal-exit
branch instead of promoting.

Mirror the `child_process fallback` block's mock setup so the two
beforeEach hooks produce equivalent ChildProcess shapes, eliminating
a quiet foot-gun for future contributors.

Comment-only / test-fixture change. 69 / 69 tests still pass; tsc clean.
Found during a deeper third-round self-audit of PR-1 of QwenLM#3831.
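
A small sketch of why the fixture shape matters. `ChildLike` and `hasAlreadyExited` are hypothetical stand-ins for the production race guard, which is assumed here to treat anything other than `null` as "already exited".

```typescript
// Only the two fields the race guard reads; a real ChildProcess has many more.
interface ChildLike {
  exitCode?: number | null;
  signalCode?: string | null;
}

// Assumed guard shape: anything other than `null` counts as "already exited".
function hasAlreadyExited(child: ChildLike): boolean {
  return child.exitCode !== null || child.signalCode !== null;
}

// Matches a live Node ChildProcess: both fields are explicitly null.
const faithfulMock: ChildLike = { exitCode: null, signalCode: null };

// Fields left unset, as in the second beforeEach before this change.
const sparseMock: ChildLike = {};
```

`hasAlreadyExited(sparseMock)` is `true` because `undefined !== null`, so a promote test built on the sparse fixture would take the normal-exit branch without any assertion pointing at the fixture.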
B-A-M-N pushed a commit that referenced this pull request May 8, 2026
…Change emit (QwenLM#3919)

* fix(cli,core): isPending gate on subagent scrollback summary + post-delete statusChange emit

Two follow-ups from PR QwenLM#3909 review.

1. **Re-introduce `isPending` gate on `SubagentExecutionRenderer`'s
   scrollback summary** (Copilot finding on PRRT_kwDOPB-92c6AUQHn).
   The verbose inline frame retirement collapsed
   `SubagentExecutionRenderer` to "render the summary whenever a
   subagent reaches a terminal status" — but with `isPending`
   removed in QwenLM#3909, that fired in BOTH live (pendingHistoryItems)
   AND committed (Static) phases. Live-phase rendering duplicated
   the row LiveAgentPanel already paints below the composer until
   the parent turn committed.

   Add `isPending` back to `ToolMessageProps` purely as a gate for
   this one render path: the summary fires only when `!isPending`
   (committed). `ToolGroupMessage` forwards the flag (it kept the
   prop on its own interface for upstream compat the whole time).
   Test gap closed by the new `live (isPending) terminal subagent
   → no scrollback summary (panel owns the row)` case.

2. **Emit `statusChange` AFTER delete in `unregisterForeground`**
   (Copilot finding on PRRT_kwDOPB-92c6AUQGc + the panel-only
   reconciliation it spawned). The shared snapshot in
   `useBackgroundTaskView` only refreshes on `statusChange`, and
   `unregisterForeground` previously fired exactly once — BEFORE
   delete — so the snapshot froze with the agent as "running"
   while `registry.get()` returned undefined. Result:
   `BackgroundTasksDialog` list mode showed a ghost "running" row
   with cancel hints whose `x` was a no-op, contradicting what the
   panel already showed (synthesized neutral terminal).

   Fire `statusChange` a second time AFTER `agents.delete()` so
   snapshot consumers see the registry-less state and stop
   surfacing the agent. The first emit still mirrors
   complete/fail/cancel/finalize ordering (callbacks that re-read
   `registry.get` see the entry); the second emit is the new
   contract for snapshot-based views. React batches the two
   resulting setState calls into one re-render so consumers
   re-render exactly once.

   Updated the existing "emits status change before removing the
   entry" test to capture both emits and explicitly assert that
   the second observes the registry-less state. Added a sibling
   test covering the post-delete `getAll()` count.

Coverage: 190 passing tests across core + cli (background-view +
ToolMessage + ToolGroupMessage + useBackgroundTaskView).

* fix(cli,core): compact-mode terminal subagent expansion + statusChange context flag

Five review findings on PR QwenLM#3919:

1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via
   /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns
   `CompactToolGroupDisplay` before the ToolMessage path when
   `compactMode === true`, so the new `isPending` gate on
   `SubagentExecutionRenderer` only protected the expanded path —
   committed terminal subagents in compact mode never reached
   `SubagentScrollbackSummary` and the LiveAgentPanel →
   committed-summary handoff broke for users who turned compact mode on.

   Force-expand the group when `!isPending` AND any tool call has a
   terminal `task_execution` resultDisplay. Stay compact while the
   parent turn is still live (`isPending`) — the panel below the
   composer owns that surface and an inline summary would
   duplicate it. Coverage: 4 new ToolGroupMessage cases (compact +
   completed-committed expands; compact + running-live stays compact;
   compact + completed-live stays compact; compact + failed-committed
   expands).

2. **Snapshot-coupled comment in `packages/core`** (Copilot,
   background-tasks.ts:292). The comment named CLI/UI consumers
   (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted
   React batching guarantees from a core file. Reword to
   "snapshot-style consumers that re-pull `getAll()` from inside
   the callback" and drop the framework-specific batching claim.

3. **Two-phase emit needed an explicit signal** (Copilot,
   background-tasks.ts:283). Emitting `statusChange` twice without
   distinguishing the phases forced consumers to either do
   duplicate work or risk persisting a stale `entry` from the
   second callback. Add an optional second arg
   `context?: { removed?: boolean }` to
   `BackgroundStatusChangeCallback`; the post-delete emit passes
   `{ removed: true }` so consumers can disambiguate without
   re-querying the registry. Backwards compatible — existing
   callbacks ignore the new arg. Tests updated to assert that
   `mock.calls[0][1]` is `undefined` and that `mock.calls[1][1]`
   deep-equals `{ removed: true }`.

4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507).
   Made the default semantics explicit: omitted/undefined is
   treated as committed (not pending); live-area renderers MUST
   pass `true` explicitly to suppress the scrollback summary.

5. (4 of the threads were duplicate Copilot fires of #2 + #3.)

Coverage: 219 test files / 3369 passing across cli/ui + core/agents.
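
The two-phase emit with the context flag can be sketched as below. `TaskRegistrySketch`, `AgentEntry`, and `setStatusChangeCallback` are hypothetical names; only `unregisterForeground`, `getAll()`, the emit ordering, and the `{ removed: true }` argument are taken from the description above.

```typescript
interface AgentEntry {
  id: string;
  status: string;
}

type BackgroundStatusChangeCallback = (
  entry: AgentEntry,
  context?: { removed?: boolean },
) => void;

class TaskRegistrySketch {
  private agents = new Map<string, AgentEntry>();
  private onStatusChange?: BackgroundStatusChangeCallback;

  setStatusChangeCallback(cb: BackgroundStatusChangeCallback): void {
    this.onStatusChange = cb;
  }

  register(entry: AgentEntry): void {
    this.agents.set(entry.id, entry);
    this.onStatusChange?.(entry);
  }

  getAll(): AgentEntry[] {
    return Array.from(this.agents.values());
  }

  unregisterForeground(id: string): void {
    const entry = this.agents.get(id);
    if (!entry) return;
    // Phase 1: the entry is still registered, as with complete / fail / cancel.
    this.onStatusChange?.(entry);
    this.agents.delete(id);
    // Phase 2: the entry is gone; the flag tells consumers not to persist `entry`.
    this.onStatusChange?.(entry, { removed: true });
  }
}
```

A snapshot-style consumer that re-pulls `getAll()` inside the callback sees the entry on the first emit and an empty registry on the second, and can use `context?.removed` to skip duplicate work.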

* docs(cli): update ToolGroupMessageProps.isPending JSDoc

The previous prop comment claimed `isPending` was "not consumed by the
group body" — true at the time, but the body now reads it for two real
purposes (compact-mode gating + forwarding to ToolMessage). Update
the doc so future callers / tests don't treat it as legacy.

Addresses Copilot finding on PRRT_kwDOPB-92c6AYE0V.

* fix(cli): hide live-phase subagent tool entries — LiveAgentPanel owns the row

User report: with compact mode OFF, a running subagent shows up
twice — once as the parent tool group's `task` row (status icon +
name + description), once as the LiveAgentPanel row beneath the
composer. Same agent, two surfaces, redundant.

Filter `task_execution` tool entries out of the expanded
`ToolGroupMessage` while `isPending=true` so the panel is the
single source of truth for in-flight subagents. The entry returns
once the parent turn commits (`isPending=false`), letting
`SubagentScrollbackSummary` land inside the parent's tool group
as a persistent audit trail.

Exception: subagents with a pending approval still render, because
the focus-routed banner / queued marker is the only inline surface
that lets users answer the prompt without opening the dialog.

If a group is purely panel-owned (e.g. a single Task call with no
sibling tools), the entire `ToolGroupMessage` returns `null` so
an empty bordered container doesn't float above the panel.

Coverage: +4 ToolGroupMessage cases — running entry hidden in
live phase / mixed group keeps siblings / pending-approval entry
still renders / committed entry comes back for the audit trail.
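
A rough sketch of the live-phase filter described in this commit. The `ToolCall` shape, the `hasPendingApproval` field, and both function names are hypothetical; rendering is reduced to returning the surviving call ids.

```typescript
interface ToolCall {
  callId: string;
  // 'task_execution' marks a subagent entry in this sketch.
  resultDisplay?: { type: string };
  // Hypothetical field name for "this subagent is waiting on an approval".
  hasPendingApproval?: boolean;
}

// While the parent turn is live, drop subagent entries unless they need an answer.
function selectInlineToolCalls(toolCalls: ToolCall[], isPending: boolean): ToolCall[] {
  if (!isPending) return toolCalls; // committed: every entry renders again
  return toolCalls.filter(
    (tool) =>
      tool.resultDisplay?.type !== 'task_execution' || tool.hasPendingApproval === true,
  );
}

// Returns null for a purely panel-owned live group so no empty container is drawn.
function renderGroup(toolCalls: ToolCall[], isPending: boolean): string[] | null {
  const inline = selectInlineToolCalls(toolCalls, isPending);
  if (isPending && inline.length === 0) return null;
  return inline.map((tool) => tool.callId);
}
```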

* refactor(cli): tighten subagent-tool helper naming + ANSI-safe scrollback summary

Self-audit + independent review found 5 cleanup items on the live-phase
hide path; all addressed in one commit since none are behavioral
changes:

1. **Move `allEntriesPanelOwned` short-circuit BEFORE `showCompact`**
   so a pure-subagent group in compact mode is also hidden during the
   live phase (previously CompactToolGroupDisplay rendered a single
   summary line above the panel — a mild duplicate on top of what the
   non-compact path already fixed).
2. **Rename `isLiveSubagentTool` → `isSubagentToolEntry`.** The helper
   identifies a tool's resultDisplay shape; it doesn't check live-state.
   The previous name conflated "predicate" with "use case" and read as
   if it returned true only during the live phase.
3. **DRY up `hasCommittedTerminalSubagent`** to use `isSubagentToolEntry`
   instead of inlining its own type-narrowing.
4. **ANSI-escape `subagentName` / `taskDescription` / `terminateReason`**
   in `SubagentScrollbackSummary`. Same threat model as the panel rows
   and HistoryItemDisplay — these strings come from subagent config
   (user-authored) and LLM output and could carry terminal control
   sequences. The stats fields (tool count / duration / tokens) flow
   through trusted formatters and don't need escaping.
5. **Doc comments updated** to reflect the four real responsibilities
   of `isPending` on `ToolGroupMessageProps` (hide pure groups,
   force-expand committed compact, per-tool filter, forward to
   ToolMessage), to clarify that the keyboard-focused subagent id can
   point at a hidden tool harmlessly (the iterator returns `null`
   before the focus prop is computed), and to drop the redundant
   "EXCEPT" clause on the per-tool filter in favor of a single
   sentence.

Coverage unchanged: 251 passing tests across messages /
background-view / core/agents; broader 3374-test sweep clean; TS
clean on both cli and core packages.
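
The escaping in item 4 amounts to making control characters visible before the string reaches the terminal. The project's actual helper is not shown in this message; `escapeControlChars` below is a hypothetical stand-in, and escaping the whole C0/C1 range (including tabs and newlines) is an assumption of this sketch.

```typescript
// Replaces C0 controls, DEL, and C1 controls with a visible \uXXXX form so the
// terminal prints them instead of interpreting them.
function escapeControlChars(text: string): string {
  return text.replace(/[\u0000-\u001f\u007f-\u009f]/g, (ch) => {
    const hex = ch.charCodeAt(0).toString(16);
    return '\\u' + ('0000' + hex).slice(-4);
  });
}
```

A subagent name such as `"\u001b[31mred"` is then rendered as the literal text `\u001b[31mred` rather than switching the terminal's foreground color.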

* fix(cli,core): address 3 critical review findings + ANSI/doc cleanups

Three real bugs flagged by gpt-5.5 via /qreview, plus 4 doc /
sanitization nits from Copilot. All 7 threads close together since
they share the same surfaces.

## Critical fixes

1. **Foreground subagents disappeared mid-parent-turn**
   (PRRT_kwDOPB-92c6AYvL9). Post-QwenLM#3921 swap-order, `unregisterForeground`
   drops the entry from the panel snapshot the moment the subagent
   finishes. The previous round's `!isPending` gate on
   `SubagentScrollbackSummary` then suppressed the inline summary
   too, leaving the user with nothing on screen for the run until
   the parent committed.

   - Drop the `!isPending` gate — `unregisterForeground` already
     removes the row from the panel, so the inline summary can fire
     in BOTH live and committed phases without duplicating it.
   - Tighten the `ToolGroupMessage` live-phase hide so it only
     filters `running` / `paused` / `background` task entries
     (`isPanelOwnedSubagentTool`), not terminal ones. Terminal
     entries pass through immediately so the summary lands.
   - The "panel-owned" predicate is now distinct from the broader
     "subagent tool entry" predicate (`isSubagentToolEntry`) and the
     "terminal subagent" predicate (`isTerminalSubagentTool`); each
     usage site picks the one it actually means.

2. **Compact mode dropped the scrollback summary**
   (PRRT_kwDOPB-92c6AYvLw). Force-expanding the group made the
   container go through the expanded path, but `ToolMessage`'s own
   compact-mode gate (`!compactMode || forceShowResult ? renderer
   : 'none'`) still suppressed the result block, so
   `SubagentScrollbackSummary` never rendered for compact-mode
   users. Pass `forceShowResult={true}` for terminal subagent tool
   entries so the result block is always rendered.

3. **`mergeCompactToolGroups.isForceExpandGroup` didn't know about
   terminal subagents** (PRRT_kwDOPB-92c6AYvMC). The
   committed-history preprocessor merged adjacent tool_groups before render,
   so a terminal `task_execution` group could be absorbed into a
   compact batch (its `tool_use_summary` label dropped), and the
   render-time force-expand check never got a chance to override.
   Mirror the `hasCommittedTerminalSubagent` predicate inside
   `isForceExpandGroup` so preprocessing and rendering agree.

## Doc / sanitization nits

- `BackgroundStatusChangeCallback` doc now lists every emitter
  (register / complete / fail / cancel / finalizeCancelled /
  finalizeCancellationIfPending / abandon / unregisterForeground /
  reset) and groups them by ordering camp (keeps-the-entry vs
  removes-the-entry — `reset` joins `unregisterForeground` in the
  delete-then-emit camp).
- ANSI-escape `data.subagentName` in the focus-holder banner and
  the queued marker (`SubagentExecutionRenderer`) — same threat
  model as the panel rows and `SubagentScrollbackSummary`.

## Coverage delta

- New ToolMessage case: live-phase terminal subagent now renders
  inline (replaces the prior "no scrollback summary" assertion that
  was the symptom of the AYvL9 bug).
- New ToolGroupMessage cases: terminal subagent in live phase
  renders inline; `forceShowResult=true` propagates for terminal
  subagent tools (mock now exposes the prop).
- New mergeCompactToolGroups parametrized cases: terminal subagent
  in any of completed / failed / cancelled stays its own batch.

280 tests pass across cli messages + utils + background-view +
core/agents. TS clean.

* fix(cli): drop `'paused'` arm from isPanelOwnedSubagentTool — not in AgentResultDisplay union

CI Lint failed with TS2367: the previous round's
`isPanelOwnedSubagentTool` checked for `status === 'paused'` but
`AgentResultDisplay.status` (the tool-result-side type) only carries
`'running' | 'completed' | 'failed' | 'cancelled' | 'background'`.
The `'paused'` status lives on the registry-side
`BackgroundTaskStatus` union and is only ever surfaced through
`LiveAgentPanel` directly, never through a `task_execution` payload.

Drop the dead arm and add a comment so a future "let's also check
paused here" doesn't get re-introduced.
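
The three predicates can be sketched against the status union quoted above. The `ToolEntry` shape is a simplified assumption; the predicate names and the status values come from the commit messages.

```typescript
// Status union as quoted above for the tool-result side.
type AgentStatus = 'running' | 'completed' | 'failed' | 'cancelled' | 'background';

interface AgentResultDisplay {
  type: 'task_execution';
  status: AgentStatus;
}

// Simplified: a tool result is either plain text or a subagent payload.
interface ToolEntry {
  name: string;
  resultDisplay?: AgentResultDisplay | string;
}

function isSubagentToolEntry(
  tool: ToolEntry,
): tool is ToolEntry & { resultDisplay: AgentResultDisplay } {
  const display = tool.resultDisplay;
  return typeof display === 'object' && display.type === 'task_execution';
}

// No 'paused' arm: that status is not part of AgentStatus, so comparing
// against it would be rejected by the compiler (TS2367).
function isPanelOwnedSubagentTool(tool: ToolEntry): boolean {
  if (!isSubagentToolEntry(tool)) return false;
  const status = tool.resultDisplay.status;
  return status === 'running' || status === 'background';
}

function isTerminalSubagentTool(tool: ToolEntry): boolean {
  if (!isSubagentToolEntry(tool)) return false;
  const status = tool.resultDisplay.status;
  return status === 'completed' || status === 'failed' || status === 'cancelled';
}
```

Keeping the broad shape check separate from the two status checks lets each call site pick the predicate it means: the live-phase hide uses the panel-owned one, the summary and force-expand paths use the terminal one.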

* fix(cli): apply panel-ownership filter once before compact-mode decision

Mixed live groups (running subagent + sibling tool) leaked the
panel-owned subagent into `CompactToolGroupDisplay`'s count and
`getActiveTool` selection, because `showCompact` returned BEFORE the
inline `.map()` filter ran. Compact-mode users would see e.g.
`task × 2 Delegate task to subagent` even though LiveAgentPanel
already owned the subagent row below the composer.

Derive `inlineToolCalls` once via `useMemo` immediately after the
existing hook block and use it consistently for the compact summary,
sizing math, and the render map. The early-return for
"all-entries-panel-owned" collapses into `inlineToolCalls.length === 0`
(gated on `isPending` so the legacy empty-input committed-phase
snapshot is preserved). Remove the inner `.map()` filter — the
upstream derivation already excluded the same entries.

JSDoc updates:
- `ToolGroupMessageProps.isPending` now describes the real flow
  (build inlineToolCalls / force-expand / forward to ToolMessage for
  parity).
- `ToolMessageProps.isPending` is documented as forwarded-but-inert
  (`SubagentExecutionRenderer` doesn't gate on it; the live-phase
  filter and the unconditional terminal summary do the actual work).

Regression test: live mixed group in compact mode → sibling wins
active-tool, count collapses to 1, no `× 2` suffix, no subagent
description in the header.

Addresses Copilot review comments 3205262972 / 3205263020 (doc/code
mismatch) and gpt-5.5 critical 3205288299 (compact-mode leak).

* fix(cli): force-expand compact groups on terminal subagent in live phase too

Resolved comment 3203286936 codified the design intent that
`SubagentScrollbackSummary` "fires in BOTH live and committed phases"
to bridge `unregisterForeground`'s post-delete panel-snapshot drop
and the parent turn committing. Non-compact mode honored that
contract (terminal subagents render the summary inline whenever they
appear in `inlineToolCalls`), but compact mode still gated
`hasCommittedTerminalSubagent` on `!isPending`, so a foreground
subagent finishing mid-turn under compact mode produced NOTHING
inline until the parent committed — exactly the gap the bridge was
meant to close.

Drop the `!isPending` arm and rename `hasCommittedTerminalSubagent`
→ `hasTerminalSubagent`. The force-expand now applies to terminal
subagents in either phase; compact-mode users see the same outcome
line non-compact users already get. Mirrors
`SubagentExecutionRenderer`'s ungated terminal-summary path and
`mergeCompactToolGroups.isForceExpandGroup`'s no-isPending-gate
preprocessing rule.

Tests:
- Flip "compact mode: live group with completed subagent stays
  compact" → "force-expands so the summary bridges the panel-snapshot
  drop". Update rationale to reflect post-QwenLM#3921 reality (panel evicts
  terminal foreground rows immediately).
- Add "compact mode: live mixed group with terminal subagent +
  sibling force-expands and renders both" — covers the bridge in
  mixed groups.
- Update two stale `hasCommittedTerminalSubagent` cross-references
  in `mergeCompactToolGroups.{ts,test.ts}` comments.
B-A-M-N pushed a commit that referenced this pull request May 8, 2026
…nLM#3892)

* fix(core): close bound-tool gap on runForkedAgent's YOLO wrapper

Follow-up to QwenLM#3873 review (#3 of the three flagged adjacent
Config-wrapper sites). `runForkedAgent`'s AgentHeadless path used to
build its YOLO override via a local `Object.create(parent) +
getApprovalMode = YOLO` helper that did NOT rebuild the tool
registry, so:

1. The YOLO approval mode was silently ignored on the bound-tool
   path — parent's already-bound `EditTool` / `WriteFileTool` /
   `ReadFileTool` resolved `this.config.getApprovalMode()` back to
   the parent.
2. The fork's reads / mutations went through the parent's
   `FileReadCache` instead of a per-fork cache.
3. Memory-extraction and dream-agent paths stack the YOLO wrapper
   over a `getPermissionManager`-overriding scoped wrapper. Since
   the bound tools resolved to the parent, BOTH overrides — the
   YOLO approval mode AND the scoped permission manager — were
   bypassed.

The fix routes through the existing `createApprovalModeOverride`
helper, which:
- rebuilds the tool registry on the wrapper (so bound tools resolve
  `this.config` to the wrapper),
- copies discovered tools from the upstream registry,
- sets the `TOOL_REGISTRY_REBUILT` Symbol marker so any further
  downstream wrapper layer recognises the rebuild and skips
  redundant work.

The memory-extraction / dream-agent composition now resolves
correctly via prototype walk — the YOLO wrapper sits above the
scoped wrapper, so bound tools observe `getApprovalMode() = YOLO`
on the wrapper itself and `getPermissionManager() = scopedPm` one
prototype level up.

Adds a try/finally around the AgentHeadless run so the per-fork
ToolRegistry is stopped after execution, the same shape as the spawn
`finally` blocks in `agent.ts` and `background-agent-resume.ts`. Without
this, every AgentTool / SkillTool the fork's model later
instantiates leaks its change-listener on shared SubagentManager /
SkillManager.

Adds `forkedAgent.agent.test.ts` covering: marker + YOLO + distinct
registry on the wrapper passed to AgentHeadless.create; bound
EditTool resolves to the wrapper; memory-scoped composition
preserves both YOLO and scopedPm; `stop()` fires after the
AgentHeadless body finishes. Uses `vi.spyOn(AgentHeadless, 'create')`
rather than module mocking so the real `ContextState` /
`AgentEventEmitter` keep working.

`npx vitest run packages/core/src` — 269 files / 6992 passed.
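
The bound-tool gap can be reproduced in miniature. The classes below are simplified, hypothetical stand-ins for the real `Config` and `EditTool`, and `wrapYoloWithoutRebuild` only mimics the old `Object.create(parent)` helper shape; it is not the project's `createApprovalModeOverride`.

```typescript
type ApprovalMode = 'default' | 'yolo';

class Config {
  getApprovalMode(): ApprovalMode {
    return 'default';
  }
}

// A tool keeps a reference to the config it was constructed with.
class EditTool {
  private readonly config: Config;

  constructor(config: Config) {
    this.config = config;
  }

  needsConfirmation(): boolean {
    return this.config.getApprovalMode() !== 'yolo';
  }
}

// Old helper shape: a prototype wrapper with one override and no registry rebuild.
function wrapYoloWithoutRebuild(parentConfig: Config): Config {
  const wrapper: Config = Object.create(parentConfig);
  wrapper.getApprovalMode = () => 'yolo';
  return wrapper;
}

const parentConfig = new Config();
// Bound before the fork existed, so `this.config` is the parent.
const parentBoundTool = new EditTool(parentConfig);
const yoloWrapper = wrapYoloWithoutRebuild(parentConfig);
// What rebuilding the registry on the wrapper achieves: tools bound to the wrapper.
const reboundTool = new EditTool(yoloWrapper);
```

`yoloWrapper.getApprovalMode()` reports `'yolo'`, yet `parentBoundTool` still asks the parent and keeps requiring confirmation; only a tool constructed against the wrapper observes the override.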

* test(core): cover stop() lifecycle on AgentHeadless.create + execute failure paths

Self-review feedback on QwenLM#3892: the stop lifecycle test only covered
the success path. A future refactor could move the stop() out of
the `finally` block and onto the success branch, reintroducing
listener leaks when create or execute rejects, while every existing
test still passes.

Two new tests pin the cleanup to the `finally`:

1. `stops the per-fork ToolRegistry even when AgentHeadless.create rejects`
   — make `AgentHeadless.create` return a rejected promise; assert
   the rejection propagates and the stop spy still fires once.
2. `stops the per-fork ToolRegistry even when headless.execute rejects`
   — return a headless object whose `execute` rejects; same shape.

Together with the success-path test these three cases cover every
exit edge of the AgentHeadless body.

`npx vitest run packages/core/src` — 269 files / 6994 passed.
B-A-M-N pushed a commit that referenced this pull request May 8, 2026
…Change emit (QwenLM#3919)

* fix(cli,core): isPending gate on subagent scrollback summary + post-delete statusChange emit

Two follow-ups from PR QwenLM#3909 review.

1. **Re-introduce `isPending` gate on `SubagentExecutionRenderer`'s
   scrollback summary** (Copilot finding on PRRT_kwDOPB-92c6AUQHn).
   The verbose inline frame retirement collapsed
   `SubagentExecutionRenderer` to "render the summary whenever a
   subagent reaches a terminal status" — but with `isPending`
   removed in QwenLM#3909, that fired in BOTH live (pendingHistoryItems)
   AND committed (Static) phases. Live-phase rendering duplicated
   the row LiveAgentPanel already paints below the composer until
   the parent turn committed.

   Add `isPending` back to `ToolMessageProps` purely as a gate for
   this one render path: the summary fires only when `!isPending`
   (committed). `ToolGroupMessage` forwards the flag (it kept the
   prop on its own interface for upstream compat the whole time).
   Test gap closed by the new `live (isPending) terminal subagent
   → no scrollback summary (panel owns the row)` case.

2. **Emit `statusChange` AFTER delete in `unregisterForeground`**
   (Copilot finding on PRRT_kwDOPB-92c6AUQGc + the panel-only
   reconciliation it spawned). The shared snapshot in
   `useBackgroundTaskView` only refreshes on `statusChange`, and
   `unregisterForeground` previously fired exactly once — BEFORE
   delete — so the snapshot froze with the agent as "running"
   while `registry.get()` returned undefined. Result:
   `BackgroundTasksDialog` list mode showed a ghost "running" row
   with cancel hints whose `x` was a no-op, contradicting what the
   panel already showed (synthesized neutral terminal).

   Fire `statusChange` a second time AFTER `agents.delete()` so
   snapshot consumers see the registry-less state and stop
   surfacing the agent. The first emit still mirrors
   complete/fail/cancel/finalize ordering (callbacks that re-read
   `registry.get` see the entry); the second emit is the new
   contract for snapshot-based views. React batches the two
   resulting setState calls into one re-render so consumers
   re-render exactly once.

   Updated the existing "emits status change before removing the
   entry" test to capture both emits and explicitly assert that
   the second observes the registry-less state. Added a sibling
   test covering the post-delete `getAll()` count.

Coverage: 190 passing tests across core + cli (background-view +
ToolMessage + ToolGroupMessage + useBackgroundTaskView).

* fix(cli,core): compact-mode terminal subagent expansion + statusChange context flag

Five review findings on PR QwenLM#3919:

1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via
   /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns
   `CompactToolGroupDisplay` before the ToolMessage path when
   `compactMode === true`, so the new `isPending` gate on
   `SubagentExecutionRenderer` only protected the expanded path —
   committed terminal subagents in compact mode never reached
   `SubagentScrollbackSummary` and the LiveAgentPanel →
   committed-summary handoff broke for users who turned compact mode on.

   Force-expand the group when `!isPending` AND any tool call has a
   terminal `task_execution` resultDisplay. Stay compact while the
   parent turn is still live (`isPending`) — the panel below the
   composer owns that surface and an inline summary would
   duplicate it. Coverage: 4 new ToolGroupMessage cases (compact +
   completed-committed expands; compact + running-live stays compact;
   compact + completed-live stays compact; compact + failed-committed
   expands).

2. **Snapshot-coupled comment in `packages/core`** (Copilot,
   background-tasks.ts:292). The comment named CLI/UI consumers
   (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted
   React batching guarantees from a core file. Reword to
   "snapshot-style consumers that re-pull `getAll()` from inside
   the callback" and drop the framework-specific batching claim.

3. **Two-phase emit needed an explicit signal** (Copilot,
   background-tasks.ts:283). Emitting `statusChange` twice without
   distinguishing the phases forced consumers to either do
   duplicate work or risk persisting a stale `entry` from the
   second callback. Add an optional second arg
   `context?: { removed?: boolean }` to
   `BackgroundStatusChangeCallback`; the post-delete emit passes
   `{ removed: true }` so consumers can disambiguate without
   re-querying the registry. Backwards compatible — existing
   callbacks ignore the new arg. Tests updated to assert that
   `mock.calls[0][1]` is `undefined` and that `mock.calls[1][1]`
   deep-equals `{ removed: true }`.

4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507).
   Made the default semantics explicit: omitted/undefined is
   treated as committed (not pending); live-area renderers MUST
   pass `true` explicitly to suppress the scrollback summary.

5. (4 of the threads were duplicate Copilot fires of #2 + #3.)

Coverage: 219 test files / 3369 passing across cli/ui + core/agents.

* docs(cli): update ToolGroupMessageProps.isPending JSDoc

The previous prop comment claimed `isPending` was "not consumed by the
group body" — true at the time, but the body now reads it for two real
purposes (compact-mode gating + forwarding to ToolMessage). Update
the doc so future callers / tests don't treat it as legacy.

Addresses Copilot finding on PRRT_kwDOPB-92c6AYE0V.

* fix(cli): hide live-phase subagent tool entries — LiveAgentPanel owns the row

User report: with compact mode OFF, a running subagent shows up
twice — once as the parent tool group's `task` row (status icon +
name + description), once as the LiveAgentPanel row beneath the
composer. Same agent, two surfaces, redundant.

Filter `task_execution` tool entries out of the expanded
`ToolGroupMessage` while `isPending=true` so the panel is the
single source of truth for in-flight subagents. The entry returns
once the parent turn commits (`isPending=false`), letting
`SubagentScrollbackSummary` land inside the parent's tool group
as a persistent audit trail.

Exception: subagents with a pending approval still render, because
the focus-routed banner / queued marker is the only inline surface
that lets users answer the prompt without opening the dialog.

If a group is purely panel-owned (e.g. a single Task call with no
sibling tools), the entire `ToolGroupMessage` returns `null` so
an empty bordered container doesn't float above the panel.

Coverage: +4 ToolGroupMessage cases — running entry hidden in
live phase / mixed group keeps siblings / pending-approval entry
still renders / committed entry comes back for the audit trail.

* refactor(cli): tighten subagent-tool helper naming + ANSI-safe scrollback summary

Self-audit + independent review found 5 cleanup items on the live-phase
hide path; all addressed in one commit since none are behavioral
changes:

1. **Move `allEntriesPanelOwned` short-circuit BEFORE `showCompact`**
   so a pure-subagent group in compact mode is also hidden during the
   live phase (previously CompactToolGroupDisplay rendered a single
   summary line above the panel — a mild duplicate on top of what the
   non-compact path already fixed).
2. **Rename `isLiveSubagentTool` → `isSubagentToolEntry`.** The helper
   identifies a tool's resultDisplay shape; it doesn't check live-state.
   The previous name conflated "predicate" with "use case" and read as
   if it returned true only during the live phase.
3. **DRY up `hasCommittedTerminalSubagent`** to use `isSubagentToolEntry`
   instead of inlining its own type-narrowing.
4. **ANSI-escape `subagentName` / `taskDescription` / `terminateReason`**
   in `SubagentScrollbackSummary`. Same threat model as the panel rows
   and HistoryItemDisplay — these strings come from subagent config
   (user-authored) and LLM output and could carry terminal control
   sequences. The stats fields (tool count / duration / tokens) flow
   through trusted formatters and don't need escaping.
5. **Doc comments updated** to reflect the four real responsibilities
   of `isPending` on `ToolGroupMessageProps` (hide pure groups,
   force-expand committed compact, per-tool filter, forward to
   ToolMessage), to clarify that the keyboard-focused subagent id can
   point at a hidden tool harmlessly (the iterator returns `null`
   before the focus prop is computed), and to drop the redundant
   "EXCEPT" clause on the per-tool filter in favor of a single
   sentence.

Coverage unchanged: 251 passing tests across messages /
background-view / core/agents; broader 3374-test sweep clean; TS
clean on both cli and core packages.

* fix(cli,core): address 3 critical review findings + ANSI/doc cleanups

Three real bugs flagged by gpt-5.5 via /qreview, plus 4 doc /
sanitization nits from Copilot. All 7 threads close together since
they share the same surfaces.

## Critical fixes

1. **Foreground subagents disappeared mid-parent-turn**
   (PRRT_kwDOPB-92c6AYvL9). Post-QwenLM#3921 swap-order, `unregisterForeground`
   drops the entry from the panel snapshot the moment the subagent
   finishes. The previous round's `!isPending` gate on
   `SubagentScrollbackSummary` then suppressed the inline summary
   too, leaving the user with nothing on screen for the run until
   the parent committed.

   - Drop the `!isPending` gate — `unregisterForeground` already
     removes the row from the panel, so the inline summary can fire
     in BOTH live and committed phases without duplicating it.
   - Tighten the `ToolGroupMessage` live-phase hide so it only
     filters `running` / `paused` / `background` task entries
     (`isPanelOwnedSubagentTool`), not terminal ones. Terminal
     entries pass through immediately so the summary lands.
   - The "panel-owned" predicate is now distinct from the broader
     "subagent tool entry" predicate (`isSubagentToolEntry`) and the
     "terminal subagent" predicate (`isTerminalSubagentTool`); each
     usage site picks the one it actually means.

2. **Compact mode dropped the scrollback summary**
   (PRRT_kwDOPB-92c6AYvLw). Force-expanding the group made the
   container go through the expanded path, but `ToolMessage`'s own
   compact-mode gate (`!compactMode || forceShowResult ? renderer
   : 'none'`) still suppressed the result block, so
   `SubagentScrollbackSummary` never rendered for compact-mode
   users. Pass `forceShowResult={true}` for terminal subagent tool
   entries so the result block is always rendered.

3. **`mergeCompactToolGroups.isForceExpandGroup` didn't know about
   terminal subagents** (PRRT_kwDOPB-92c6AYvMC). The committed-
   history preprocessor merged adjacent tool_groups before render,
   so a terminal `task_execution` group could be absorbed into a
   compact batch (its `tool_use_summary` label dropped), and the
   render-time force-expand check never got a chance to override.
   Mirror the `hasCommittedTerminalSubagent` predicate inside
   `isForceExpandGroup` so preprocessing and rendering agree.

## Doc / sanitization nits

- `BackgroundStatusChangeCallback` doc now lists every emitter
  (register / complete / fail / cancel / finalizeCancelled /
  finalizeCancellationIfPending / abandon / unregisterForeground /
  reset) and groups them by ordering camp (keeps-the-entry vs
  removes-the-entry — `reset` joins `unregisterForeground` in the
  delete-then-emit camp).
- ANSI-escape `data.subagentName` in the focus-holder banner and
  the queued marker (`SubagentExecutionRenderer`) — same threat
  model as the panel rows and `SubagentScrollbackSummary`.

## Coverage delta

- New ToolMessage case: live-phase terminal subagent now renders
  inline (replaces the prior "no scrollback summary" assertion that
  was the symptom of the AYvL9 bug).
- New ToolGroupMessage cases: terminal subagent in live phase
  renders inline; `forceShowResult=true` propagates for terminal
  subagent tools (mock now exposes the prop).
- New mergeCompactToolGroups parametrized cases: terminal subagent
  in any of completed / failed / cancelled stays its own batch.

280 tests pass across cli messages + utils + background-view +
core/agents. TS clean.

* fix(cli): drop `'paused'` arm from isPanelOwnedSubagentTool — not in AgentResultDisplay union

CI Lint failed with TS2367: the previous round's
`isPanelOwnedSubagentTool` checked for `status === 'paused'` but
`AgentResultDisplay.status` (the tool-result-side type) only carries
`'running' | 'completed' | 'failed' | 'cancelled' | 'background'`.
The `'paused'` status lives on the registry-side
`BackgroundTaskStatus` union and is only ever surfaced through
`LiveAgentPanel` directly, never through a `task_execution` payload.

Drop the dead arm and add a comment so a future "let's also check
paused here" doesn't get re-introduced.
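
A minimal sketch of how the three predicates mentioned across these commits might relate once the `'paused'` arm is gone. The `ToolEntry` shape and the `type: 'task_execution'` discriminant are simplified assumptions for illustration, not the real definitions; only the status union is taken from the message above.

```typescript
// Status union as quoted in the commit message (tool-result side only).
type AgentResultStatus =
  | 'running'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'background';

// Hypothetical, simplified shapes for this sketch.
interface AgentResultDisplay {
  type: 'task_execution';
  status: AgentResultStatus;
}

interface ToolEntry {
  name: string;
  resultDisplay?: AgentResultDisplay | string;
}

// Broad predicate: identifies the resultDisplay shape, not live state.
function isSubagentToolEntry(
  tool: ToolEntry,
): tool is ToolEntry & { resultDisplay: AgentResultDisplay } {
  const display = tool.resultDisplay;
  return typeof display === 'object' && display.type === 'task_execution';
}

// Panel-owned: the live panel already shows these rows.
function isPanelOwnedSubagentTool(tool: ToolEntry): boolean {
  if (!isSubagentToolEntry(tool)) return false;
  const { status } = tool.resultDisplay;
  // No 'paused' arm: that status is not part of AgentResultStatus, so the
  // comparison would not type-check (TS2367).
  return status === 'running' || status === 'background';
}

// Terminal: these entries render the inline summary.
function isTerminalSubagentTool(tool: ToolEntry): boolean {
  if (!isSubagentToolEntry(tool)) return false;
  const { status } = tool.resultDisplay;
  return status === 'completed' || status === 'failed' || status === 'cancelled';
}
```

Keeping the predicates separate lets each call site name the condition it actually depends on.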

* fix(cli): apply panel-ownership filter once before compact-mode decision

Mixed live groups (running subagent + sibling tool) leaked the
panel-owned subagent into `CompactToolGroupDisplay`'s count and
`getActiveTool` selection, because `showCompact` returned BEFORE the
inline `.map()` filter ran. Compact-mode users would see e.g.
`task × 2 Delegate task to subagent` even though LiveAgentPanel
already owned the subagent row below the composer.

Derive `inlineToolCalls` once via `useMemo` immediately after the
existing hook block and use it consistently for the compact summary,
sizing math, and the render map. The early-return for
"all-entries-panel-owned" collapses into `inlineToolCalls.length === 0`
(gated on `isPending` so the legacy empty-input committed-phase
snapshot is preserved). Remove the inner `.map()` filter — the
upstream derivation already excluded the same entries.

JSDoc updates:
- `ToolGroupMessageProps.isPending` now describes the real flow
  (build inlineToolCalls / force-expand / forward to ToolMessage for
  parity).
- `ToolMessageProps.isPending` is documented as forwarded-but-inert
  (`SubagentExecutionRenderer` doesn't gate on it; the live-phase
  filter and the unconditional terminal summary do the actual work).

Regression test: live mixed group in compact mode → sibling wins
active-tool, count collapses to 1, no `× 2` suffix, no subagent
description in the header.

Addresses Copilot review comments 3205262972 / 3205263020 (doc/code
mismatch) and gpt-5.5 critical 3205288299 (compact-mode leak).

* fix(cli): force-expand compact groups on terminal subagent in live phase too

Resolved comment 3203286936 codified the design intent that
`SubagentScrollbackSummary` "fires in BOTH live and committed phases"
to bridge `unregisterForeground`'s post-delete panel-snapshot drop
and the parent turn committing. Non-compact mode honored that
contract (terminal subagents render the summary inline whenever they
appear in `inlineToolCalls`), but compact mode still gated
`hasCommittedTerminalSubagent` on `!isPending`, so a foreground
subagent finishing mid-turn under compact mode produced NOTHING
inline until the parent committed — exactly the gap the bridge was
meant to close.

Drop the `!isPending` arm and rename `hasCommittedTerminalSubagent`
→ `hasTerminalSubagent`. The force-expand now applies to terminal
subagents in either phase; compact-mode users see the same outcome
line non-compact users already get. Mirrors
`SubagentExecutionRenderer`'s ungated terminal-summary path and
`mergeCompactToolGroups.isForceExpandGroup`'s no-isPending-gate
preprocessing rule.

Tests:
- Flip "compact mode: live group with completed subagent stays
  compact" → "force-expands so the summary bridges the panel-snapshot
  drop". Update rationale to reflect post-QwenLM#3921 reality (panel evicts
  terminal foreground rows immediately).
- Add "compact mode: live mixed group with terminal subagent +
  sibling force-expands and renders both" — covers the bridge in
  mixed groups.
- Update two stale `hasCommittedTerminalSubagent` cross-references
  in `mergeCompactToolGroups.{ts,test.ts}` comments.
B-A-M-N pushed a commit that referenced this pull request May 8, 2026
…nLM#3892)

* fix(core): close bound-tool gap on runForkedAgent's YOLO wrapper

Follow-up to QwenLM#3873 review (#3 of the three flagged adjacent
Config-wrapper sites). `runForkedAgent`'s AgentHeadless path used to
build its YOLO override via a local `Object.create(parent) +
getApprovalMode = YOLO` helper that did NOT rebuild the tool
registry, so:

1. The YOLO approval mode was silently ignored on the bound-tool
   path — parent's already-bound `EditTool` / `WriteFileTool` /
   `ReadFileTool` resolved `this.config.getApprovalMode()` back to
   the parent.
2. The fork's reads / mutations went through the parent's
   `FileReadCache` instead of a per-fork cache.
3. Memory-extraction and dream-agent paths stack the YOLO wrapper
   over a `getPermissionManager`-overriding scoped wrapper. Since
   the bound tools resolved to the parent, BOTH overrides — the
   YOLO approval mode AND the scoped permission manager — were
   bypassed.

The fix routes through the existing `createApprovalModeOverride`
helper, which:
- rebuilds the tool registry on the wrapper (so bound tools resolve
  `this.config` to the wrapper),
- copies discovered tools from the upstream registry,
- sets the `TOOL_REGISTRY_REBUILT` Symbol marker so any further
  downstream wrapper layer recognises the rebuild and skips
  redundant work.

The memory-extraction / dream-agent composition now resolves
correctly via prototype walk — the YOLO wrapper sits above the
scoped wrapper, so bound tools observe `getApprovalMode() = YOLO`
on the wrapper itself and `getPermissionManager() = scopedPm` one
prototype level up.
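
The bound-tool gap and the prototype walk can be illustrated with a small sketch. All names here (`Config`, `EditTool`, `wrap`) are stand-ins invented for the example; the real `createApprovalModeOverride` also rebuilds the registry and sets a marker, which is not modelled.

```typescript
type ApprovalMode = 'default' | 'yolo';

// Hypothetical stand-in for the real Config class.
class Config {
  getApprovalMode(): ApprovalMode {
    return 'default';
  }
  getPermissionManager(): string {
    return 'parentPm';
  }
}

// A tool captures the config it was constructed with ("bound").
class EditTool {
  constructor(private readonly config: Config) {}
  effectiveMode(): ApprovalMode {
    return this.config.getApprovalMode();
  }
  effectivePermissionManager(): string {
    return this.config.getPermissionManager();
  }
}

// Prototype-based wrapper: own-property overrides, everything else inherited.
function wrap(parent: Config, overrides: Partial<Config>): Config {
  const wrapper: Config = Object.create(parent);
  Object.assign(wrapper, overrides);
  return wrapper;
}

const parent = new Config();
const staleTool = new EditTool(parent); // bound before any wrapper existed

const scoped = wrap(parent, { getPermissionManager: () => 'scopedPm' });
const yolo = wrap(scoped, { getApprovalMode: () => 'yolo' });

// Rebuilding the tool against the wrapper closes the gap.
const reboundTool = new EditTool(yolo);

const staleMode = staleTool.effectiveMode(); // still the parent's mode
const reboundMode = reboundTool.effectiveMode(); // resolved on the YOLO wrapper
const reboundPm = reboundTool.effectivePermissionManager(); // one level up
```

The stale tool never sees either override because it holds a reference to the parent, which is why wrapping alone is not enough.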

Adds a try/finally around the AgentHeadless run so the per-fork
ToolRegistry is stopped after execution — same shape as the spawn
finallys in `agent.ts` and `background-agent-resume.ts`. Without
this, every AgentTool / SkillTool the fork's model later
instantiates leaks its change-listener on shared SubagentManager /
SkillManager.

Adds `forkedAgent.agent.test.ts` covering: marker + YOLO + distinct
registry on the wrapper passed to AgentHeadless.create; bound
EditTool resolves to the wrapper; memory-scoped composition
preserves both YOLO and scopedPm; `stop()` fires after the
AgentHeadless body finishes. Uses `vi.spyOn(AgentHeadless, 'create')`
rather than module mocking so the real `ContextState` /
`AgentEventEmitter` keep working.

`npx vitest run packages/core/src` — 269 files / 6992 passed.

* test(core): cover stop() lifecycle on AgentHeadless.create + execute failure paths

Self-review feedback on QwenLM#3892: the stop lifecycle test only covered
the success path. A future refactor could move the stop() out of
the `finally` block and onto the success branch, reintroducing
listener leaks when create or execute rejects, while every existing
test still passes.

Two new tests pin the cleanup to the `finally`:

1. `stops the per-fork ToolRegistry even when AgentHeadless.create rejects`
   — make `AgentHeadless.create` return a rejected promise; assert
   the rejection propagates and the stop spy still fires once.
2. `stops the per-fork ToolRegistry even when headless.execute rejects`
   — return a headless object whose `execute` rejects; same shape.

Together with the success-path test these three cases cover every
exit edge of the AgentHeadless body.
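
As a rough sketch of the shape these tests pin down, assuming hypothetical `Registry` and `Headless` interfaces rather than the real classes: both the create step and the execute step sit inside the `try`, so `stop()` runs on every exit edge.

```typescript
// Hypothetical minimal interfaces for illustration only.
interface Registry {
  stop(): Promise<void>;
}

interface Headless {
  execute(): Promise<void>;
}

async function runWithRegistry(
  registry: Registry,
  create: () => Promise<Headless>,
): Promise<void> {
  try {
    // A rejection from create() or execute() still reaches the finally.
    const headless = await create();
    await headless.execute();
  } finally {
    await registry.stop();
  }
}
```

Moving `stop()` onto the success branch would keep the success-path test green while leaking on both rejection paths, which is the regression the two new tests guard against.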

`npx vitest run packages/core/src` — 269 files / 6994 passed.
B-A-M-N pushed a commit that referenced this pull request May 14, 2026
…e + Agent isolation (QwenLM#4073)

* feat(tools): add generic worktree support (Phase A + B of QwenLM#4056)

Adds first-class git worktree as a general-purpose capability:

Phase A — User-facing tools
- enter_worktree: creates `<projectRoot>/.qwen/worktrees/<slug>` on a
  `worktree-<slug>` branch and returns the absolute path. Slug auto-generated
  when omitted; validated against path traversal and disallowed characters.
- exit_worktree: keeps or removes the worktree (and its branch). Refuses to
  remove a worktree with uncommitted tracked changes or untracked files
  unless `discard_changes: true` is set.

Phase B — Agent isolation
- Agent tool gains an `isolation: 'worktree'` parameter that provisions a
  temporary `agent-<7hex>` worktree, prepends a worktree notice to the task
  prompt, and on completion either removes the worktree (no changes) or
  preserves it and reports its path/branch in the result. Background and
  foreground execution paths both wired up; rejected for fork agents.
- worktreeCleanup.cleanupStaleAgentWorktrees: fail-closed sweep for
  ephemeral `agent-<7hex>` worktrees older than 30 days with no tracked
  changes and no unpushed commits. User-named worktrees are never swept.
- buildWorktreeNotice helper for fork subagents (parity with claude-code).

Arena compatibility
- The existing Arena worktree implementation (GitWorktreeService.setupWorktrees,
  ArenaManager, agents.arena.worktreeBaseDir) is untouched. Arena uses its
  own batch APIs and `~/.qwen/arena` base dir; the new general-purpose APIs
  live alongside under `<projectRoot>/.qwen/worktrees/`.

Subagent safety
- enter_worktree / exit_worktree are added to EXCLUDED_TOOLS_FOR_SUBAGENTS
  so a subagent cannot mutate the parent session's worktree state.

Refs QwenLM#4056

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(worktree): use path.join in expected paths so the test passes on Windows

The Windows CI run reported `enter-worktree.test.ts` failing because the
expected string was hardcoded with `/` while `getUserWorktreesDir()` uses
`path.join`, which returns `\\` on Windows. Build the expected path via
`path.join` so the platform-correct separator is compared.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(enter-worktree): treat empty name as auto-generate

Some models pass `{ "name": "" }` when calling EnterWorktree, because the
schema marks `name` as optional and they emit an empty placeholder. The
previous validation rejected the empty string with "Worktree name must be
a non-empty string", which surprised users running the auto-slug path.

Now both `validateToolParams` and `execute` treat `name: ""` as equivalent
to `name: undefined` and fall back to the auto-generated `{adj}-{noun}-{4hex}`
slug. Explicit invalid slugs (`'../etc'`, `'a/b'`, etc.) are still rejected
as before.
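
A minimal sketch of the normalisation described above. The allowlist regex and the `resolveWorktreeName` helper are assumptions made for the example; the actual validation rules live in the tool and may differ.

```typescript
// Hypothetical allowlist: no path separators, no leading dot or dash,
// at most 64 characters.
const SLUG_PATTERN = /^[A-Za-z0-9_][A-Za-z0-9._-]{0,63}$/;

function resolveWorktreeName(
  name: string | undefined,
  autoSlug: () => string,
): string {
  // An empty string is treated the same as an omitted name.
  if (name === undefined || name === '') {
    return autoSlug();
  }
  // Explicit names are still validated and rejected when invalid.
  if (!SLUG_PATTERN.test(name) || name.includes('..')) {
    throw new Error(`Invalid worktree name: ${name}`);
  }
  return name;
}
```

Only the empty placeholder falls back to the auto-generated slug; any other invalid value still fails loudly.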

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review findings 1-6 from PR QwenLM#4073

Six issues raised on the initial review; each addressed with a verifiable
guarantee.

1. Real isolation for `agent isolation: 'worktree'`
   Before: subagent's Config still resolved `getTargetDir()` to the parent
   project root, so Edit/Write/Read workspace checks and Shell's default cwd
   silently operated on the parent tree. The cleanup helper then saw a
   "clean" worktree and removed it — destroying the evidence.
   After: the worktree is provisioned BEFORE `createApprovalModeOverride`,
   and the resulting agent Config has `getTargetDir`/`getCwd`/`getWorkingDir`
   rebound to the worktree path. Relative paths, unqualified shell
   commands, and glob/grep roots all confine to the worktree.

2. `exit_worktree action='remove'` now prompts in default/auto-edit modes
   Added `getDefaultPermission()` on the invocation: `'ask'` when action is
   `remove`, `'allow'` when `keep`. Brings it in line with edit, write_file,
   and run_shell_command.

3. Force-delete no longer silently destroys unpushed commits
   `removeUserWorktree` now uses `git branch -d` (refuses unmerged) by
   default and surfaces `branchPreserved: true` when git refuses. Added
   `hasUnmergedWorktreeCommits` (checks if branch tip is reachable from any
   other local branch or remote ref). Both the agent isolation cleanup and
   `exit_worktree action='remove'` use this check: if the branch has work
   not covered elsewhere, the worktree+branch are preserved even when
   `discard_changes: true` is set (there is no `discard_commits` flag —
   committed work is rarely what `remove` means to discard).

4. Both new tools are now deferred behind ToolSearch
   `shouldDefer: true` + `searchHint` on both. Verified via openai-logging:
   `enter_worktree` and `exit_worktree` no longer appear in the function-
   declaration list sent on every API request.

5. Stale-worktree cleanup is wired in
   `Config.initialize()` fires `cleanupStaleAgentWorktrees(targetDir)` as a
   non-awaited startup sweep (skipped in bare mode). Picks up orphaned
   `agent-<7hex>` worktrees left by crashed runs.

6. Foreground isolation no longer leaks on uncaught throw
   The foreground try block tracks whether the cleanup helper ran on the
   success path; the finally block invokes it as a fallback when the try
   bailed early. Mirrors the background path's pattern.
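
Items 2 and 3 above can be sketched as two small pure functions. The type names, the `decideRemoval` helper, and the order of the checks are assumptions for illustration; only the `'ask'` / `'allow'` mapping and the "unmerged commits survive `discard_changes`" rule come from the description.

```typescript
type ExitAction = 'keep' | 'remove';
type Permission = 'ask' | 'allow';

// Removing a worktree is destructive, so it prompts; keeping does not.
function getDefaultPermission(action: ExitAction): Permission {
  return action === 'remove' ? 'ask' : 'allow';
}

// Hypothetical inputs gathered before a removal is attempted.
interface RemovalInputs {
  hasUncommittedChanges: boolean;
  hasUnmergedCommits: boolean;
  discardChanges: boolean;
}

type RemovalDecision = 'remove' | 'refuse-dirty' | 'preserve-unmerged';

function decideRemoval(inputs: RemovalInputs): RemovalDecision {
  // Committed work not reachable elsewhere is preserved even when
  // discard_changes is set; there is no discard_commits flag.
  if (inputs.hasUnmergedCommits) return 'preserve-unmerged';
  // Uncommitted changes block removal unless explicitly discarded.
  if (inputs.hasUncommittedChanges && !inputs.discardChanges) {
    return 'refuse-dirty';
  }
  return 'remove';
}
```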

Verification:
- Unit tests: 83 passed (16 worktree + 64 existing agent + 3 cleanup) — no
  regressions.
- E2E #1: agent told to write `hello.txt` via RELATIVE path — file landed
  at `.qwen/worktrees/agent-XXXXXXX/hello.txt`, NOT at the parent root.
- E2E #3: created worktree, committed work inside it, called exit_worktree
  with `discard_changes=true` — refused with clear message; worktree and
  branch both preserved.
- E2E #4: openai-logging confirms worktree tools absent from API tool list
  (7 tools sent instead of 9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 2 findings (1 from tanzhenxin, 7+8 from wenshao)

The first round closed the data-loss-class issues. This round addresses
follow-ups from a deeper audit:

1. Stale-worktree sweep was inert on common-case repos
   `cleanupStaleAgentWorktrees` previously ran `git log --branches --not
   --remotes --oneline` from each worktree's directory — that lists
   unpushed commits across EVERY local branch, not just the worktree's
   own branch. On any repo with no remote configured (or with stray
   unpushed branches), the sweep refused to remove every candidate.
   Replaced with `service.hasUnmergedWorktreeCommits(slug)` which scopes
   the check to the worktree branch via `for-each-ref --contains <tip>`.
   Also added the `branchPreserved` warn log requested in M7 and an
   `fs.access` shortcut for the empty-worktrees-dir case (M8).

2. `cleanupWorktreeIsolation` and `worktreeIsolation` were inside the
   inner try (~660 lines from the outer catch). Hoisted both to the top
   of `execute()` so the outer catch can reap or preserve the worktree
   when anything between provisioning and the inner try throws (e.g.
   `createApprovalModeOverride`, agent creation). Closure carries the
   resolved `repoRoot` so cleanup never has to re-resolve.

3. Background error path discarded the cleanup result. Now captures
   `formatWorktreeSuffix(...)` and appends it to the registry's
   failure/cancel message, so users see the preserved path/branch even
   when the agent crashed before reporting.

4. `cleanupWorktreeIsolation` now treats `result.success === false` as
   "worktree still on disk" and surfaces it as preserved instead of
   silently dropping it from the result.

5. Override was incomplete. Several Config methods read `this.targetDir`
   directly (`getProjectRoot`, `getFileService`, etc.) — own-property
   getter overrides did not redirect them. Now also shadows `targetDir`
   and `cwd` as own properties on the agent's Config override, swaps in
   a `FileDiscoveryService` rooted at the worktree, and rebuilds
   `WorkspaceContext` to point at the worktree only. Verified
   end-to-end: shell `pwd > pwd-record.txt` (no directory arg) lands at
   `.qwen/worktrees/agent-<7hex>/pwd-record.txt`, not the parent root.

6. Monorepo subdir issue. Both `enter_worktree` and the agent isolation
   path now resolve `git rev-parse --show-toplevel` first and anchor
   `.qwen/worktrees/<slug>` at the repo root. Worktrees created from
   any subdirectory now end up where the startup sweep can find them.

7. Replaced `git worktree add -B` (silent force-reset of pre-existing
   branches) with `git worktree add -b` plus an explicit existence
   check via `git for-each-ref` (NOT `show-ref --quiet`, which
   simple-git swallows). Pre-existing `worktree-<slug>` branches now
   trigger a clear error instead of clobbering committed work.

8. First worktree creation in a repo writes `<projectRoot>/.qwen/.gitignore`
   with `worktrees/` so worktree contents stay out of the parent's
   `git status`, glob/grep results, and bundle tools. Idempotent: never
   overwrites an existing file.

9. Added logging across the failure paths (`enter_worktree` errors,
   `agent.ts:failWorktreeProvisioning`, `cleanupWorktreeIsolation`,
   `hasUnmergedWorktreeCommits` swallowed errors,
   `cleanupStaleAgentWorktrees`'s `branchPreserved` race).

10. `exit_worktree` no longer suggests `discard_changes: true` when the
    git status check itself fails — that would be advising the user to
    bypass a safety check whose precondition is unknown. Now points at
    the underlying repo problem.

11. `generateAutoSlug` switched from `Math.random()` (4 hex, weak RNG,
    one-in-65k collision) to `randomBytes` (6 hex, ~16M combinations).
    Two RNG sources in this file collapsed to one.
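
Item 5 in the list above (the incomplete override) comes down to how prototype-based wrappers resolve fields. A minimal sketch, using a stand-in `Config` class with only the two methods needed to show the effect; the real class and override helper are more involved.

```typescript
// Hypothetical stand-in for the real Config.
class Config {
  constructor(protected readonly targetDir: string) {}

  getTargetDir(): string {
    return this.targetDir;
  }

  // Reads the field directly, so a getTargetDir override does not affect it.
  getProjectRoot(): string {
    return this.targetDir;
  }
}

// Overriding only the getter leaves direct field reads pointing at the parent.
function overrideGetterOnly(parent: Config, worktreePath: string): Config {
  const override: Config = Object.create(parent);
  override.getTargetDir = () => worktreePath;
  return override;
}

// Shadowing the field as an own property redirects every reader.
function overrideWithShadowedField(parent: Config, worktreePath: string): Config {
  const override = overrideGetterOnly(parent, worktreePath);
  Object.defineProperty(override, 'targetDir', {
    value: worktreePath,
    enumerable: true,
    writable: false,
    configurable: true,
  });
  return override;
}
```

With the getter-only wrapper, `getProjectRoot()` walks the prototype chain and finds the parent's `targetDir`; the own-property shadow stops that walk at the wrapper.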

Pushed back: the TOCTOU swap in `removeUserWorktree` (S6 round 1) is
left as-is — `git branch -d` is the real safety, and reordering does
not eliminate the window. Windows reserved-name validation (M5 round 2)
deferred to a follow-up; the current allowlist already rejects path
separators, `..`, leading dot/dash, and the >64-char case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): use randomInt to silence CodeQL biased-modulo finding

CodeQL's `js/biased-cryptographic-random` flagged
`randomBytes(4)[i] % ARRAY.length` in `generateAutoSlug`. The math is
actually exact for the current word-list lengths (256 % 8 == 0), but
the lint rule does not know that — and a future contributor changing
the list to a non-power-of-two length would silently introduce bias.

Switched the index lookups to `crypto.randomInt(0, length)`, which uses
rejection sampling and is uniform by construction. Suffix still uses
`randomBytes(3).toString('hex')` since hex encoding is unbiased.
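
A minimal sketch of the resulting slug generator. The word lists are placeholders invented for the example; `randomInt` and `randomBytes` are the Node.js `crypto` functions named above.

```typescript
import { randomBytes, randomInt } from 'node:crypto';

// Placeholder word lists; the real lists are not shown in this message.
const ADJECTIVES = ['calm', 'brisk', 'quiet', 'amber', 'lucky'] as const;
const NOUNS = ['otter', 'maple', 'river', 'falcon', 'ember'] as const;

function generateAutoSlug(): string {
  // randomInt(min, max) is uniform over [min, max) for any list length,
  // unlike `byte % length`, which is biased unless length divides 256.
  const adjective = ADJECTIVES[randomInt(0, ADJECTIVES.length)];
  const noun = NOUNS[randomInt(0, NOUNS.length)];
  // Hex encoding of raw bytes is unbiased: 3 bytes give 6 hex characters.
  const suffix = randomBytes(3).toString('hex');
  return `${adjective}-${noun}-${suffix}`;
}
```

The placeholder lists deliberately have five entries each, a length where the modulo approach would be biased and `randomInt` is not.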

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 3 findings 1-6 from PR QwenLM#4073

The previous round added `getRepoTopLevel` for `enter_worktree`'s
provisioning, but missed three sibling call sites that still used the
raw cwd. The double-cleanup race in the foreground path also leaked
stale `[worktree preserved]` suffixes on rejected promises. All six
findings from the deeper audit are addressed:

1. exit_worktree now resolves through `getRepoTopLevel()` before
   building its `GitWorktreeService`, mirroring `enter_worktree`. Without
   this, launching `qwen` from a monorepo subdirectory created the
   worktree under the repo root but exit_worktree looked under the
   subdir's `.qwen/worktrees/` and always returned "Worktree not found".
   Verified end-to-end: enter + exit from `packages/core/` works.

2. agent.ts cleanup helper now nulls `worktreeIsolation` immediately
   after capturing the closure value. The previous structure could
   reach the helper twice — once in the foreground try's success path
   and once in the foreground finally fallback (or once in the inner
   try and once in the outer catch on a thrown rejection). The second
   call would `hasWorktreeChanges()` against a directory the first
   call already removed, fail-closed, and emit a bogus
   `[worktree preserved: <missing path>]` suffix.

3. Config.initialize's startup sweep now resolves `getRepoTopLevel()`
   before invoking `cleanupStaleAgentWorktrees`. Without this, every
   subdir launch scanned a non-existent `<subdir>/.qwen/worktrees/`
   and the 30-day expiry sweep was permanently a no-op.

4. agent.ts's `buildWorktreeNotice` now passes
   `worktreeIsolation.repoRoot` as `parentCwd` instead of
   `this.config.getTargetDir()`. The notice's path-translation
   guidance (≈ "translate paths from <parent> to <worktree>") would
   otherwise misdirect the subagent in a monorepo subdir launch.

5. Removed dead method `GitWorktreeService.listUserWorktrees`. It had
   no callers anywhere in the codebase and used `execSync` in a loop
   (would have blocked the event loop if anyone wired it up).

6. `localBranchExists` no longer swallows git failures silently. The
   defensive `false` default is preserved (so `git worktree add -b`
   itself surfaces the conflict if the check missed an existing
   branch), but the catch now logs via `debugLogger.warn` so disk-full
   / permission / ref-store-corruption cases are visible in debug
   output instead of being invisible.
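
The double-cleanup guard from item 2 can be sketched as a closure that nulls its captured state before doing any work. `Isolation`, `makeCleanup`, and the `remove` callback are hypothetical names for this example, not the helper in `agent.ts`.

```typescript
// Hypothetical description of a provisioned worktree.
interface Isolation {
  path: string;
  branch: string;
}

function makeCleanup(
  initial: Isolation,
  remove: (isolation: Isolation) => Promise<void>,
): () => Promise<string | null> {
  let isolation: Isolation | null = initial;

  return async function cleanup(): Promise<string | null> {
    // Capture and null synchronously, before the first await, so a second
    // caller (finally fallback, outer catch) sees null and does nothing.
    const captured = isolation;
    isolation = null;
    if (captured === null) return null;

    await remove(captured);
    return captured.path;
  };
}
```

Because the reset happens before any `await`, overlapping calls cannot both observe a live `isolation`, so the removal runs at most once.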

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 4 findings (data-loss + visibility)

Seven actionable findings from a deeper audit, all closed:

1. User worktree slugs could collide with ephemeral-agent shape
   `validateUserWorktreeSlug` did not reject names starting with
   `agent-`, so a user-named `agent-1234567` matched the cleanup regex
   `/^agent-[0-9a-f]{7}$/` and would be silently swept after 30 days
   along with whatever work was in it. Now reserved — clear error
   message points users at the cause.

2. Slug producer and consumer were string-coupled across files
   `agent.ts` hardcoded `agent-${hex(7)}` and `worktreeCleanup.ts`
   independently hardcoded `/^agent-[0-9a-f]{7}$/`. Future change to
   hex length on one side would silently break the other. Lifted
   `AGENT_WORKTREE_PREFIX`, `AGENT_WORKTREE_HEX_LENGTH`,
   `AGENT_WORKTREE_SLUG_PATTERN`, and `generateAgentWorktreeSlug()` to
   `gitWorktreeService.ts`; both call sites import them.

3. Startup sweep was invisible at default log level
   Fire-and-forget sweep used `debug` for errors and discarded the
   success count. A leak-chasing operator had no log breadcrumb.
   Errors promoted to `warn`; successful removals (count > 0) logged
   at `info`.

4. `getRepoTopLevel()` silent catch
   Returned `null` on any git failure with no log. Combined with
   `?? cwd` fallback in callers, a flaky git would have made worktree
   creators and the startup sweep disagree silently about which dir to
   use. Now logs the underlying error.

5. `hasTrackedChanges()` silent catch
   Cleanup's fail-closed `return true` had no log. Couldn't tell
   "has real changes — leave alone" from "git index unreadable — repo
   may be corrupt". Now logs.

6. `cleanupWorktreeIsolation` claimed `preservedPath` for a removed dir
   When `removeUserWorktree` returns `{ success: true, branchPreserved:
   true }` it has already deleted the directory and failed only on
   `git branch -d`. The helper still reported the (now non-existent)
   path as preserved. Now returns only `preservedBranch` for that
   case; `formatWorktreeSuffix` emits a distinct message instructing
   recovery via `git worktree add <new-path> <branch>`.

7. `removeUserWorktree` swallowed branch-delete failures
   Both `-d` and `-D` catch blocks were empty. Locked refs, perms,
   disk full all looked identical to "unmerged commits". Both now
   `debugLogger.warn` with the underlying error.
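
Items 1 and 2 above can be sketched together: one module owns the prefix, hex length, pattern, and generator, and the user-slug validator reserves the same prefix. The constant and function names follow the message; their bodies and the `isReservedUserSlug` helper are assumptions for illustration.

```typescript
import { randomBytes } from 'node:crypto';

const AGENT_WORKTREE_PREFIX = 'agent-';
const AGENT_WORKTREE_HEX_LENGTH = 7;

// The pattern is derived from the same constants the generator uses, so
// producer and consumer cannot drift apart.
const AGENT_WORKTREE_SLUG_PATTERN = new RegExp(
  `^${AGENT_WORKTREE_PREFIX}[0-9a-f]{${AGENT_WORKTREE_HEX_LENGTH}}$`,
);

function generateAgentWorktreeSlug(): string {
  const byteCount = Math.ceil(AGENT_WORKTREE_HEX_LENGTH / 2);
  const hex = randomBytes(byteCount)
    .toString('hex')
    .slice(0, AGENT_WORKTREE_HEX_LENGTH);
  return `${AGENT_WORKTREE_PREFIX}${hex}`;
}

// Hypothetical helper: user-chosen names may not use the reserved prefix,
// so the stale-worktree sweep can never match a user-named worktree.
function isReservedUserSlug(name: string): boolean {
  return name.startsWith(AGENT_WORKTREE_PREFIX);
}
```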

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): self-review pass — reuse, parallelism, dead code

Self-review caught a handful of issues across three categories:

Reuse:
- `pathExists` in the new code now uses the existing `fileExists` from
  `utils/fileUtils.ts` instead of duplicating an `fs.access` wrapper.
- `worktree-` branch prefix was string-literalled in five places. Added
  `WORKTREE_BRANCH_PREFIX` and `worktreeBranchForSlug(slug)` exports in
  `gitWorktreeService.ts`; updated `gitWorktreeService.ts`,
  `worktreeCleanup.ts`, and `exit-worktree.ts` to use them. Future
  prefix changes are a single edit.

Efficiency:
- `Config.initialize` used two `await import(...)` calls inside the
  startup-sweep IIFE, paying that cost on every CLI start. Switched to
  static imports at the top of `config.ts` — the modules are tiny and
  the dynamic indirection bought nothing.
- `cleanupWorktreeIsolation` in `agent.ts` ran `hasWorktreeChanges` and
  `hasUnmergedWorktreeCommits` sequentially. They have no data
  dependency on each other and each spawns its own `git` invocation;
  `Promise.all` halves the cleanup wall-clock on the common path.
  Same fix in `worktreeCleanup.ts`'s per-entry loop.
- `ensureWorktreesGitignored` used `fs.access` then `fs.writeFile`, a
  TOCTOU race when two agent invocations created worktrees concurrently
  (both could pass the `access` check and the second would clobber the
  first's `.gitignore`). Now writes with `flag: 'wx'` and treats
  `EEXIST` as the no-op case — atomic in one syscall.
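
The exclusive-create approach in the last bullet can be sketched as follows. This uses the synchronous `fs` API for brevity, which is a simplification and not necessarily what the real helper does; the function body and the demo directory are illustrative only.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Returns true when this call created the file, false when it already existed.
function ensureWorktreesGitignored(qwenDir: string): boolean {
  fs.mkdirSync(qwenDir, { recursive: true });
  try {
    // 'wx' fails with EEXIST if the file exists, so there is no gap between
    // "check" and "write" for a concurrent writer to slip into.
    fs.writeFileSync(path.join(qwenDir, '.gitignore'), 'worktrees/\n', {
      flag: 'wx',
    });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EEXIST') {
      return false; // someone else got there first: leave their file alone
    }
    throw err;
  }
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'qwen-demo-'));
const createdFirst = ensureWorktreesGitignored(demoDir);
const createdSecond = ensureWorktreesGitignored(demoDir);
```

Any error other than `EEXIST` is rethrown, so permission or disk problems are not mistaken for the no-op case.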

Quality:
- Dropped the `worktreeCleanupRan` boolean in the foreground execution
  path. `cleanupWorktreeIsolation` already nulls its closure variable
  at the top of every call (see the comment at its definition), so
  re-entries are no-ops. The boolean and its tracking were dead weight
  that obscured the real guard.
- Trimmed the Phase-2 override comment block to drop the WHAT-stating
  enumerations (items 3 and 4 just narrated the lines below) and
  removed a navigation comment about hoisted helpers — the helpers are
  visible at the top of the same method.

84 unit tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 5 — design-doc commitments + correctness

Five critical findings + four suggestions, all closed.

Critical:
1. Wrong base branch for agent isolation. `createUserWorktree(slug)` with
   no `baseBranch` arg fell back to `getCurrentBranch()` on the **main**
   working tree, returning `main` regardless of which branch the user
   was actually on. A subagent invoked from `feature-x` would silently
   start from `main` and produce diffs against the wrong baseline.
   `enter_worktree` had the same bug. Both now resolve the parent's
   current branch first and pass it explicitly. Verified end-to-end:
   `git checkout feature-x` → `enter_worktree` → worktree HEAD includes
   the feature-x commit.

2. `countWorktreeChanges` (used by `exit_worktree`'s dirty-state guard)
   missed `status.conflicted[]`. In simple-git that array is mutually
   exclusive with the staged/modified/etc. arrays, so a worktree
   mid-merge with only conflicts looked `{tracked: 0, untracked: 0}`
   to the guard and `action='remove'` would proceed without
   `discard_changes: true`. Added `+ status.conflicted.length`.

3. `exit_worktree` had no session-ownership check, contradicting the
   design doc's "only operates on worktrees created by THIS session".
   In yolo mode a prompt injection could enumerate `.qwen/worktrees/`
   and pass any name to drop another session's work. Now:
   `enter_worktree` and agent isolation write a `.qwen-session`
   marker into the worktree at provisioning time; `exit_worktree
   action='remove'` reads it and refuses if it does not match the
   current `Config.getSessionId()`. Worktrees from before this guard
   (no marker file) are treated as "owner unknown" — allowed with a
   warn log so the change is observable.

4. `enter_worktree` did not refuse nested invocations from inside an
   existing worktree, contradicting the design doc. Now rejects any
   cwd containing `.qwen/worktrees/` as a path component, with a
   clear "Already inside a git worktree…" message. Verified: enter
   from inside a worktree returns is_error with that text.

6. `hasTrackedChanges` (cleanup sweep) had the same `conflicted[]`
   gap. Rewrote to use raw `git status --porcelain --untracked-files=no`
   which lists every tracked change including `UU` conflict markers
   in a single git call and explicitly skips the untracked walk
   (the prior comment claimed to skip it, but `status()` always
   does the scan).
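
The `conflicted[]` gap from item 2 can be sketched roughly as below. This is a minimal illustration, not the project's code: `StatusLike` and `countChanges` are hypothetical names, and which arrays count as "tracked" is an assumption made for the example.

```typescript
// Hypothetical, trimmed-down stand-in for a git status result.
// Field names are assumptions chosen for illustration.
interface StatusLike {
  staged: string[];
  modified: string[];
  deleted: string[];
  not_added: string[]; // untracked files
  conflicted: string[]; // files with unresolved merge conflicts
}

interface ChangeCount {
  tracked: number;
  untracked: number;
}

// Adds conflicted files to the tracked total so a worktree that is
// mid-merge with only conflicts never looks clean to a dirty-state guard.
function countChanges(s: StatusLike): ChangeCount {
  const tracked =
    s.staged.length + s.modified.length + s.deleted.length + s.conflicted.length;
  return { tracked, untracked: s.not_added.length };
}
```

With this shape, a status whose only non-empty array is `conflicted` reports a non-zero `tracked` count, so a guard that requires `discard_changes: true` for dirty worktrees would fire.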

Suggestions:
7. `buildWorktreeNotice` now receives the parent agent's actual
   `getTargetDir()` again (was switched to `repoRoot` in round 3 on
   a different reviewer's suggestion; round-5 caught that the model's
   inherited paths reference the parent's cwd, not necessarily the
   repo root, so the prior behaviour was correct).

8. Startup sweep now does `fs.access(<targetDir>/.qwen/worktrees)`
   *before* importing GitWorktreeService and spawning `git
   rev-parse --show-toplevel`. The git probe is reserved for users
   who actually have a worktrees directory locally — 99% of users
   pay only one syscall on startup.

9. Tests:
   - New `exit-worktree.test.ts` covers metadata, validation,
     `getDefaultPermission` (ask vs allow), and getDescription.
   - `agent.test.ts` adds three `validateToolParams` cases for the
     `isolation` parameter (accepted with subagent_type, rejected
     without, rejected for non-"worktree" values).
   - `enter-worktree.test.ts` adds round-trip tests for
     `writeWorktreeSessionMarker` / `readWorktreeSessionMarker` plus
     a `worktreeBranchForSlug` sanity check.
   - Total: 101 tests pass (was 86 → +15).
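
The nested-invocation guard from critical item 4 could look something like the sketch below. The actual check is not shown in this message, so the regex and the name `isInsideManagedWorktree` are assumptions; the sketch only demonstrates matching `.qwen/worktrees/<name>` as whole path components.

```typescript
// Matches ".qwen", "worktrees", and at least one further component as
// consecutive path components, accepting "/" or "\" as the separator.
// Assumption: this pattern is illustrative, not the project's own.
const INSIDE_WORKTREE = /(^|[\\\/])\.qwen[\\\/]worktrees[\\\/][^\\\/]+/;

function isInsideManagedWorktree(cwd: string): boolean {
  return INSIDE_WORKTREE.test(cwd);
}
```

Matching on components rather than on a raw substring keeps a directory such as `my.qwen/worktrees/x` from tripping the guard.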

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): drop unused @ts-expect-error in exit-worktree.test.ts

Empty string `''` is a valid `string` type, so the @ts-expect-error
directive on `validateToolParams({ name: '', action: 'keep' })` did
nothing — TypeScript correctly accepted the line, and `tsc --build`
in CI reported TS2578 ("Unused '@ts-expect-error' directive"). The
runtime assertion already covers the case; the directive was leftover
from an earlier draft.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): use importOriginal in ArenaManager mock to preserve new exports

The Arena test mocks `gitWorktreeService.js` with a factory that
returns only `{ GitWorktreeService }`. PR QwenLM#4073 added several other
exports to that module (`AGENT_WORKTREE_SLUG_PATTERN`,
`WORKTREE_BRANCH_PREFIX`, `worktreeBranchForSlug`,
`generateAgentWorktreeSlug`, `writeWorktreeSessionMarker`,
`readWorktreeSessionMarker`, `WORKTREE_SESSION_FILE`).

Other modules in the dep graph reach the mocked surface — most
notably `worktreeCleanup.ts` imports `AGENT_WORKTREE_SLUG_PATTERN`
and `worktreeBranchForSlug`, and now reaches the mock via the static
`config.ts` → `worktreeCleanup.ts` import chain added in the
self-review pass. The Arena test failed at module-load with:

  Caused by: Error: [vitest] No "AGENT_WORKTREE_SLUG_PATTERN" export
  is defined on the "../../services/gitWorktreeService.js" mock. Did
  you forget to return it from "vi.mock"?

Use `importOriginal` to capture every real export, spread it into
the return object, and only replace `GitWorktreeService` (the class
the test actually needs to mock). The class-level mock keeps its
existing static-method shims.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 6 (5 critical + 6 suggestions)

The biggest item — #1 — is a self-inflicted regression from round 5:
the new agent- prefix reservation in `validateUserWorktreeSlug`
rejected EVERY slug that `generateAgentWorktreeSlug` produces, since
that helper emits exactly `agent-<7hex>`. Net effect: every
`AgentTool isolation: 'worktree'` invocation failed at validation.
The reservation now allows the canonical pattern through (everything
the helper can produce) and only rejects user-chosen `agent-*` names
that don't match it. Added a round-trip regression guard: 50
`generateAgentWorktreeSlug()` outputs are fed back through
`validateUserWorktreeSlug` and must all pass.
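
A minimal sketch of the reservation rule and its round-trip guard, assuming the canonical shape is exactly `agent-` plus seven lowercase hex characters as stated above. `generateAgentSlug` and `validateSlug` are hypothetical stand-ins for the real helpers, and the random source is a placeholder.

```typescript
// Assumed canonical shape of generated agent slugs.
const AGENT_SLUG_PATTERN = /^agent-[0-9a-f]{7}$/;

// Hypothetical generator: "agent-" followed by 7 random hex characters.
function generateAgentSlug(): string {
  let hex = "";
  for (let i = 0; i < 7; i++) {
    hex += Math.floor(Math.random() * 16).toString(16);
  }
  return `agent-${hex}`;
}

// Returns an error message, or null when the slug is acceptable.
// Only agent-* names that do NOT match the canonical pattern are rejected.
function validateSlug(slug: string): string | null {
  if (/^agent-/.test(slug) && !AGENT_SLUG_PATTERN.test(slug)) {
    return `"${slug}" uses the reserved agent- prefix`;
  }
  return null;
}
```

Feeding generator output back through the validator, as the regression guard does, catches any future drift between the two.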

Other critical fixes:

2. `hasWorktreeChanges` (used by agent isolation cleanup) was the
   one remaining caller relying solely on `status.isClean()`.
   Defensive `|| status.conflicted.length > 0` so a future simple-git
   bookkeeping change can't let a mid-merge worktree appear clean and
   get auto-deleted.

3. `readWorktreeSessionMarker` swallowed every I/O error as "marker
   missing", which let a disk error / EACCES silently bypass the
   session-ownership guard. ENOENT is still treated as missing
   (legitimate); every other code now logs.

4. `exit_worktree` `fs.stat` catch was the same shape — every error
   collapsed to "Worktree not found". ENOENT → not found; everything
   else logs and returns a distinct "cannot access" error.

5. `cleanupStaleAgentWorktrees` `fs.stat` catch was again the same.
   ENOENT → silently skip (entry vanished between readdir and stat);
   everything else logs.
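
Fixes 3 to 5 share one shape: separate "the path is gone" from "the path could not be read". A rough sketch follows, assuming Node-style errors that carry a string `code`; `classifyFsError` and its result type are hypothetical names.

```typescript
// Minimal shape of a Node.js-style filesystem error (assumption).
interface FsErrorLike {
  code?: string;
  message?: string;
}

type FsOutcome =
  | { kind: "missing" }
  | { kind: "inaccessible"; detail: string };

// ENOENT means the entry does not exist; anything else is a real
// access problem that callers should log rather than swallow.
function classifyFsError(err: unknown): FsOutcome {
  const e = (typeof err === "object" && err !== null ? err : {}) as FsErrorLike;
  if (e.code === "ENOENT") return { kind: "missing" };
  return { kind: "inaccessible", detail: e.code ?? e.message ?? "unknown error" };
}
```

A caller would treat `missing` as the legitimate "no marker" or "not found" case and route `inaccessible` to a log line plus a distinct error.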

Suggestions:

6. Startup sweep fast-bail was running BEFORE resolving the repo
   top-level. For monorepo subdir launches, `targetDir/.qwen/worktrees`
   never exists and the sweep early-returned — permanently a no-op.
   Now resolves the root first, then fast-bails against the resolved
   `<root>/.qwen/worktrees`. Also logs the skip case so operators can
   tell "skipped" from "ran, found nothing".

7. `.qwen-session` marker was visible to `git add -A` inside the
   worktree. Now writes a `.git/info/exclude` rule (resolved via
   `git rev-parse --git-dir`, since worktree `.git` is a file
   pointing at the parent repo's `.git/worktrees/<name>/`).
   Best-effort: failure to write the rule does not abort
   provisioning.

8. Agent isolation now refuses to provision when the parent's cwd is
   already inside a worktree — same regex guard as `enter_worktree`.

9. `exit_worktree`'s wrapper around `hasUnmergedWorktreeCommits` now
   logs at the call site so the chain (caller → reason it asked →
   underlying git error) is complete in operator logs.

10. Sweep now logs unconditionally at `info`. Three distinct messages:
    "skipped (no worktrees dir)", "ran, nothing to remove", "removed N".

Tests:

11. New `execute()` coverage:
    • exit-worktree: session-ownership refusal, keep happy path,
      legacy/no-marker fallthrough with warn log, missing-worktree
      error, unmerged-commits guard with `discard_changes: true`,
      `writeWorktreeSessionMarker` round-trip.
    • enter-worktree: nested-guard rejection, non-git-repo error.
    These spin up real temp git repos (no filesystem mocking) and
    drive the actual tool invocation pipeline.

   Total: 135 tests pass (was 101 → +34).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): demote noise startup-sweep logs to debug

Self-review pass applying the round-6 review-triage framework
(filter #5: "If a log only fires on the happy path, it's noise.")
to my own round-6 changes:

- "Stale worktree sweep skipped: <dir> does not exist" — fires on
  every CLI start for ~99% of users who never use worktrees.
- "Stale worktree sweep ran under <root>: nothing to remove" —
  fires on every CLI start for users who have any worktrees but
  no stale ones at the moment.

Both are happy-path noise at `info`. Demoted to `debug` so an
operator can opt in via `--debug` when they want to confirm the
sweep is wired up, but normal output stays clean.

Only the actually-actionable case ("removed N worktrees") stays at
`info` — that's the signal someone chasing a worktree leak would
grep for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): close AUTO_EDIT bypass + parent-dirty stale-code hazard

Round-7 review caught two correctness gaps:

1. exit_worktree action='remove' was still auto-approved in AUTO_EDIT.
   `getDefaultPermission` returning 'ask' is necessary but not
   sufficient. `permissionFlow.isAutoEditApproved` auto-approves any
   tool whose `confirmationDetails.type` is 'edit' OR 'info', and
   `BaseToolInvocation` returns 'info' by default. So a session in
   AUTO_EDIT could silently destroy a worktree (with branch deletion)
   without a confirmation prompt — the data-loss path the round-1
   `'ask'` switch was meant to close. Now overrides
   `getConfirmationDetails` to return `type: 'exec'` for action=remove,
   which keeps the prompt in AUTO_EDIT. The `keep` action still falls
   through to the base info-type since it is non-destructive.

   Regression-guard test asserts the type is 'exec' (not 'info') for
   remove and that the command field describes both the worktree-remove
   and branch-delete operations.

2. Agent isolation worktrees ran against parent's HEAD, not its
   working tree.
   `git worktree add -b <branch> <path> <base>` only checks out the
   base ref's tip — uncommitted edits in the parent's working tree do
   NOT propagate. The "edit code → ask review/test agent before
   committing" workflow silently ran the subagent against the
   pre-edit HEAD and returned results that looked authoritative but
   reflected stale code.

   Reviewer offered two options: overlay parent's dirty state à la
   Arena (~50 LOC, edge cases), or refuse isolation when parent is
   dirty (~10 LOC, clear UX). Chose the latter for Phase B scope —
   simpler, decisive, and matches the design-doc's explicit
   commitment that dirty-state overlay is Arena-specific. Users can
   commit/stash before re-invoking agent isolation; overlay can be a
   follow-up if users complain about the friction.

   Fail-closed on the dirty-check itself (assume dirty rather than
   silently launch on a possibly-stale tree).

   Test exercises both "dirty parent → guard fires" and
   "clean parent → guard passes" against real temp git repos.

139 unit tests pass (was 135, +4 regression guards).
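
The AUTO_EDIT gap from item 1 can be illustrated with a small model. This is a sketch under stated assumptions: `isAutoApproved` and `confirmationTypeFor` are hypothetical functions that mirror the rule described above, not the real `permissionFlow` code.

```typescript
type ConfirmationType = "edit" | "info" | "exec";
type ApprovalMode = "default" | "auto_edit";

// Assumed rule: AUTO_EDIT skips the prompt for "edit" and "info"
// confirmations only.
function isAutoApproved(mode: ApprovalMode, type: ConfirmationType): boolean {
  return mode === "auto_edit" && (type === "edit" || type === "info");
}

// Destructive removal reports "exec"; the non-destructive keep action
// falls through to the base "info" type.
function confirmationTypeFor(action: "keep" | "remove"): ConfirmationType {
  return action === "remove" ? "exec" : "info";
}
```

Under this model `remove` always prompts in AUTO_EDIT, while `keep` is still auto-approved.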

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
B-A-M-N pushed a commit that referenced this pull request May 19, 2026
* feat(serve): mutation gating helper and --require-auth

Implements issue QwenLM#4175 Wave 4 PR 15. Adds the centralized
state-changing-route gate that Wave 4 follow-ups (memory CRUD, file
edit, MCP restart, device-flow auth) will reuse, plus the
`--require-auth` deployment knob that hardens the loopback developer
default for shared dev hosts / CI runners.

- `createMutationGate({ tokenConfigured, requireAuth })` factory in
  serve/auth.ts — per-route middleware with a 4-cell behavior matrix:
  pass-through under `requireAuth` or any token configured;
  `401 token_required` for `strict: true` routes on no-token loopback
  defaults; baseline pass-through otherwise.
- Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load,
  resume,prompt,cancel,model}, /permission/:requestId) opt into the
  default non-strict factory call as the centralization marker. Wave 4
  routes will pass `{ strict: true }` to require a token even on
  loopback.
- `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses
  without a token; closes the `/health` exemption when on so loopback
  `/health` also requires bearer auth; stderr breadcrumb so the
  hardened mode is visible in journald/docker logs.
- Conditional `require_auth` capability tag advertised only when the
  flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so
  future per-deployment toggles follow the same shape.
- 5 new unit tests in auth.test.ts covering the gate matrix; 5 added
  in server.test.ts for capability advertisement, conditional tag,
  /health 401 under --require-auth, and runQwenServe boot
  refusal + happy path. 245/245 serve tests pass; typecheck + eslint
  clean.
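
The gate's behavior matrix can be sketched without any HTTP framework. The real helper returns per-route middleware; the version below is an assumption-laden simplification that returns a plain decision object, and `createGateSketch` is a hypothetical name.

```typescript
interface GateDeps {
  tokenConfigured: boolean;
  requireAuth: boolean;
}

type GateDecision = { status: 200 } | { status: 401; code: "token_required" };

function createGateSketch(deps: GateDeps) {
  return function decide(opts: { strict?: boolean } = {}): GateDecision {
    // With a token or --require-auth, bearer auth already guards the route.
    if (deps.requireAuth || deps.tokenConfigured) return { status: 200 };
    // No-token loopback default: only strict routes are refused.
    if (opts.strict) return { status: 401, code: "token_required" };
    return { status: 200 };
  };
}
```

Existing routes would call `decide()` with no arguments, while later routes that must never run unauthenticated would pass `{ strict: true }`.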

Refs: QwenLM#4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR QwenLM#4236 review feedback

Three small follow-ups from the automated reviewers on PR QwenLM#4236:

1. **Drop misleading `--require-auth` from `token_required` error
   message** (Copilot inline auth.ts:262). The strict-mode 401
   listed three remediations but `--require-auth` is paired-required
   with a token at boot — naming it standalone would loop the operator
   into a different boot error. Keep the two valid standalone fixes
   (env var, --token); add inline note explaining the omission.
   `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)`
   to anchor the new wording.

2. **Mention `/health` gating in `--require-auth` CLI description**
   (auto-reviewer Medium #2). Operators flipping the flag without
   reading the protocol doc would get paged when k8s/Compose probes
   start 401-ing. One sentence in the yargs description prevents that.

3. **Drift insurance comment between registry and
   `CONDITIONAL_SERVE_FEATURES`** (auto-reviewer Low #3). Document
   the four-step procedure for adding a new conditional tag so a
   future contributor doesn't update only the registry and silently
   advertise the tag unconditionally. Notes the Map<predicate>
   refactor as the right move when a second tag lands.

Deferred (not in this fix-up):
- Module-level PASSTHROUGH singleton (High #1) — micro-optimization,
  unmeasurable.
- Map<feature, predicate> for conditional features (High #2) —
  premature abstraction with one tag.
- Per-route `// non-strict marker` comments (Medium #1) — noise.
- `@see` cross-ref in types.ts (Low #2) — sugar.
- JSDoc bullet-list vs table (Low #1) — current format is fine.

Refs: QwenLM#4175 QwenLM#4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR QwenLM#4236 round-2 review feedback

Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review)
on PR QwenLM#4236:

1. **Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES`**
   (review threads #3254467192 + #3254485912). Two reviewers asked
   for the same shape on the grounds that the `Set` + per-feature
   `if`-branch needed FOUR coordinated changes per new conditional
   tag and silently fail-CLOSED when the branch was missed. The Map
   collapses the predicate-decision and the set-membership into one
   entry per feature — adding a new conditional tag is now two
   coordinated changes (registry + Map entry) and a missing predicate
   is a TypeScript error rather than a silent omission. JSDoc
   updated.

2. **Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES`**
   (review thread #3254467192 option 1, layered on top of #1).
   `server.test.ts` now walks every Map entry and asserts the
   predicate accepts/rejects as expected; future entries that don't
   add an assertion branch fail the test loudly so a missing
   predicate cannot ship silently. Adoption-of-record for the Map
   shape rather than relying on a hand-maintained invariant.

3. **Cache `strictDenier` for allocation symmetry** (review thread
   #3254467193). Wave 4 PRs will mount strict mode on multiple
   routes; without the cache each `mutate({strict:true})` call would
   allocate a fresh 401 closure. Now both the passthrough and the
   strict denier are pre-built singletons. Identity assertion in
   `auth.test.ts` anchors the cache so a future change that loses it
   surfaces in CI.

4. **Doc cosmetic — extra blank line in qwen-serve.md** (review
   thread #3254467198). Single blank line between the `>` quoted
   example and the following non-quoted bash block now.

5. **Doc correctness — `require_auth` is post-auth confirmation**
   (review thread #3254485910 from DeepSeek). When `--require-auth`
   is on, the global `bearerAuth` middleware gates every route
   including `/capabilities`, so an unauthenticated client cannot
   pre-flight `caps.features` to discover that auth is required —
   the discovery surface is the 401 response body itself. Both
   `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to
   describe the tag as a post-authentication confirmation, matching
   the auth.ts JSDoc which already stated this correctly.
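
The Map shape from item 1 can be sketched as below. The names `ServeOptionsSketch`, `ALL_FEATURES`, and `advertisedFeatures` are hypothetical, and `health` is a placeholder tag; only `require_auth` comes from this PR. The sketch does not reproduce the compile-time check mentioned above.

```typescript
interface ServeOptionsSketch {
  requireAuth: boolean;
}

// One entry per conditional tag: the key is the tag, the value decides
// whether this deployment advertises it.
const CONDITIONAL_FEATURES = new Map<string, (o: ServeOptionsSketch) => boolean>([
  ["require_auth", (o) => o.requireAuth],
]);

// Hypothetical registry of every known tag.
const ALL_FEATURES = ["health", "require_auth"];

// Unconditional tags have no Map entry and are always advertised.
function advertisedFeatures(o: ServeOptionsSketch): string[] {
  return ALL_FEATURES.filter((tag) => {
    const predicate = CONDITIONAL_FEATURES.get(tag);
    return predicate === undefined || predicate(o);
  });
}
```

A drift test in the spirit of item 2 would walk every Map entry and check that its key also appears in the registry.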

Trade-offs documented (no code change):

- **Body-parser ordering** (review thread #3254485915 from DeepSeek)
  noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER
  `express.json()` because the gate is per-route middleware. On
  loopback no-token defaults a strict route therefore parses the
  request body before refusing it — bounded by
  `express.json({limit: '10mb'})` × `--max-connections` (256
  default). Strict routes Wave 4 actually adds carry small bodies in
  legitimate use, so this isn't a production hot path. Future routes
  accepting large bodies should lift the gate to app-level (maintain
  a strict-path Set in `createServeApp`); flagged as a Wave 4
  follow-up rather than re-architecting the helper.

- **`bearerAuth` body-shape inconsistency** (review thread
  #3254467197 from @wenshao) flagged as a Wave 4 cross-PR
  follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while
  the strict gate returns `{code: 'token_required', error: '...'}`;
  SDK clients have to branch on both shapes. Standardizing
  `bearerAuth` to also carry a `code` field is orthogonal to this
  PR's scope.

Validation: 260/260 cli serve tests pass (was 258 — added the drift
insurance test + strict denier identity test); typecheck + eslint
clean.

Refs: QwenLM#4175 QwenLM#4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
B-A-M-N pushed a commit that referenced this pull request May 20, 2026
…#4247)

* feat(serve): MCP client guardrails (QwenLM#4175 Wave 3 PR 14)

Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`.

Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked.

* fixup(serve): address PR 14 review (QwenLM#4247) findings 1-7

Addresses Codex + Copilot review feedback on QwenLM#4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision).

1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode.
2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell.
3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children.
4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag.
5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot.
6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract.
7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code.
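
The ordering problem in item 5 is easy to reproduce in a toy model. The sketch below is illustrative only: `reconcile` is a hypothetical function, and the `removeFirst` flag exists purely to contrast the two orderings.

```typescript
interface ReconcileResult {
  reserved: string[];
  refused: string[];
}

// Moves from the `current` set of reserved servers to the `next` config
// under a fixed budget.
function reconcile(
  current: string[],
  next: string[],
  budget: number,
  removeFirst: boolean,
): ReconcileResult {
  let reserved = current.slice();
  const refused: string[] = [];
  const dropRemoved = (): void => {
    reserved = reserved.filter((n) => next.indexOf(n) !== -1);
  };

  if (removeFirst) dropRemoved(); // free slots before reserving new ones
  for (const n of next) {
    if (reserved.indexOf(n) !== -1) continue; // slot already held
    if (reserved.length < budget) reserved.push(n);
    else refused.push(n);
  }
  if (!removeFirst) dropRemoved(); // too late: refusals already happened
  return { reserved, refused };
}
```

For the {a,b} to {a,c} case with a budget of 2, removing first lets `c` take the slot `b` freed; reserving first refuses `c` and leaves one slot unused.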

Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve).

* docs(serve): clarify v1 snapshot-based budget warning detection (QwenLM#4247)

Address github-actions review-summary finding (I) on PR QwenLM#4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts.

* fixup(serve): address PR 14 review round 2 (QwenLM#4247 wenshao)

Addresses @wenshao review on PR QwenLM#4247. Three critical safety fixes + four suggestion-level improvements.

Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime):
- C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot.
- C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched).
- (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011.)

Suggestions:
- S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement.
- S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`.
- S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected).
- S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs.

(S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.)

+ 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean.

* fixup(serve): address PR 14 review round 3 (QwenLM#4247 wenshao second pass)

Addresses @wenshao's second review pass on PR QwenLM#4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications.

Code:
- R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic.
- R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row.
- R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file.
- R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3fe S4 — reply-only on the new thread).

Docs:
- R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap.
- R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit.
- R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window.

+ 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 4 (QwenLM#4247 wenshao second + third pass)

Addresses @wenshao's second + third review passes on PR QwenLM#4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads.

Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs):
- Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect.
- Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside.
- Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel + ⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint).

Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929:
- Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity.
- Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode.
- Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect).
- Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation).
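
The `weReservedSlot` pattern can be sketched as below. `SlotTracker` is a hypothetical class, the connect callback is synchronous only to keep the example short, and the `off`-mode behavior follows the description above (returns `'reserved'` without touching the set).

```typescript
type Reservation = "reserved" | "already_held" | "refused";
type SketchMode = "enforce" | "off";

class SlotTracker {
  readonly reservedSlots = new Set<string>();

  constructor(
    private readonly mode: SketchMode,
    private readonly budget: number,
  ) {}

  tryReserveSlot(server: string): Reservation {
    if (this.mode === "off") return "reserved"; // no-op: nothing is tracked
    if (this.reservedSlots.has(server)) return "already_held";
    if (this.reservedSlots.size >= this.budget) return "refused";
    this.reservedSlots.add(server);
    return "reserved";
  }

  connect(server: string, doConnect: () => void): void {
    const reservation = this.tryReserveSlot(server);
    if (reservation === "refused") throw new Error(`budget exhausted: ${server}`);
    // True only for a slot this call actually created.
    const weReservedSlot =
      reservation === "reserved" && this.reservedSlots.has(server);
    try {
      doConnect();
    } catch (err) {
      // Release a fresh reservation; a reconnect keeps its slot.
      if (weReservedSlot) this.reservedSlots.delete(server);
      throw err;
    }
  }
}
```

The set-membership check is what keeps `off` mode from entering the cleanup branch, since no slot was ever recorded there.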

+ 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 5 (QwenLM#4247 wenshao fourth pass)

Addresses @wenshao's fourth review pass on PR QwenLM#4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads).

Critical fixes:
- R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test.
- R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test.

Reviewer findings deferred / already done (replied + resolved):
- R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3fe added the try/catch with the weReservedSlot cleanup pattern.
- R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3fe renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic.
- R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount: `liveCount >= 0.75 * budget` and `liveCount >= Math.ceil(0.75 * budget)` give identical results when liveCount is an integer (for budget 1, `liveCount >= 0.75` and `liveCount >= 1` agree). The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug.
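
The no-op claim in the last bullet rests on a simple fact: for an integer n and any real x, n >= x holds exactly when n >= ceil(x). A small sketch, with hypothetical function names and the 0.75 fraction taken from the constant mentioned earlier:

```typescript
const WARN_FRACTION = 0.75;

// Threshold check without rounding.
function warnsPlain(liveCount: number, budget: number): boolean {
  return liveCount >= WARN_FRACTION * budget;
}

// Threshold check with the proposed Math.ceil.
function warnsCeil(liveCount: number, budget: number): boolean {
  return liveCount >= Math.ceil(WARN_FRACTION * budget);
}
```

With a budget of 1, both variants first report the threshold at one live client, which is also the point where the budget is full.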

46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean.

* fixup(serve): address PR 14 review round 6 (QwenLM#4247 wenshao fifth pass)

Addresses @wenshao's fifth review pass on PR QwenLM#4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak).

Critical fixes:
- R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside.

- R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface:
  - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`.
  - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state.
  - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub).
  - `runQwenServe` closes over per-handle overrides; never touches `process.env`.
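
A rough sketch of the per-handle env merge described above. The exact ordering inside `defaultSpawnChannelFactory` is not shown here, so this version only reproduces the stated outcomes: an `undefined` override removes the variable, the scrub list always wins, and nothing global is mutated. `buildChildEnv` and the key names in the usage note are hypothetical.

```typescript
type EnvMap = Record<string, string | undefined>;

// Builds a child env from a base snapshot without mutating the base.
function buildChildEnv(
  base: EnvMap,
  overrides: EnvMap,
  scrubbedKeys: string[],
): Record<string, string> {
  const merged: EnvMap = {};
  for (const k of Object.keys(base)) merged[k] = base[k];
  // An undefined override value means "remove this var from the child".
  for (const k of Object.keys(overrides)) merged[k] = overrides[k];
  // Scrubbed keys are removed last so overrides cannot reintroduce them.
  for (const k of scrubbedKeys) merged[k] = undefined;

  const out: Record<string, string> = {};
  for (const k of Object.keys(merged)) {
    const v = merged[k];
    if (v !== undefined) out[k] = v;
  }
  return out;
}
```

Two handles built from the same base with different overrides get different child envs, and the base object stays as it was.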

+ 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 7 (QwenLM#4247 wenshao sixth pass)

Addresses @wenshao's sixth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor.

Critical:
- R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log.

Real bugs:
- R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method.
- R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup.
- R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now.
- R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`.

Polish:
- R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log.
- R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals.

+ 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean.
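
The R7 #3 transport-leak fix follows a common cleanup shape, sketched here under assumptions: `McpClientLike` and `ClientTable` are hypothetical stand-ins for the real client and manager, and the slot bookkeeping is simplified to a `Set`.

```typescript
// Hypothetical minimal client surface; the real McpClient has more methods.
interface McpClientLike {
  connect(): Promise<void>;
  discover(): Promise<void>;
  disconnect(): Promise<void>;
}

class ClientTable {
  readonly clients = new Map<string, McpClientLike>();
  readonly reservedSlots = new Set<string>();

  async connectAndDiscover(name: string, client: McpClientLike): Promise<void> {
    // Only release the slot on failure if this call freshly reserved it.
    const weReservedSlot = !this.reservedSlots.has(name);
    this.reservedSlots.add(name);
    this.clients.set(name, client);
    try {
      await client.connect();
      await client.discover();
    } catch (err) {
      // connect() may have established a transport before discover() threw,
      // so disconnect (best effort) BEFORE dropping the only reference to it.
      try {
        await client.disconnect();
      } catch {
        // Ignored: cleanup must not mask the original failure.
      }
      this.clients.delete(name);
      if (weReservedSlot) this.reservedSlots.delete(name);
      throw err;
    }
  }
}
```

The ordering matters: once the map entry is deleted, nothing else can reach the client to close its transport.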

* fixup(serve): address PR 14 review round 8 (QwenLM#4247 wenshao seventh pass)

Addresses @wenshao's seventh review pass on PR QwenLM#4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred.

Critical:
- R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports.

Soundness:
- R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case).

- R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth.

- R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught.

Deferred:
- R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup.

+ 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean.
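
The fresh versus `'already_held'` distinction from R8 #4 can be illustrated with a small bookkeeping sketch. `SlotBook` and its method names are hypothetical; only the `freshReservations` idea and the two return values come from the description above.

```typescript
class SlotBook {
  readonly reservedSlots = new Set<string>();
  private readonly freshReservations = new Set<string>();

  // Reserve a slot; remember whether THIS call created the reservation.
  reserve(name: string): 'reserved' | 'already_held' {
    if (this.reservedSlots.has(name)) return 'already_held';
    this.reservedSlots.add(name);
    this.freshReservations.add(name);
    return 'reserved';
  }

  // Called on success / catch / finally of the in-flight call.
  settle(name: string): void {
    this.freshReservations.delete(name);
  }

  // Timeout handler: release only slots freshly reserved by the in-flight call.
  onDiscoveryTimeout(name: string): void {
    if (this.freshReservations.delete(name)) {
      this.reservedSlots.delete(name);
    }
  }
}
```

A reconnect of a previously healthy server returns `'already_held'`, so a timeout during that reconnect leaves its slot in place for the health-monitor retry.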

* fixup(serve): address PR 14 review round 9 (QwenLM#4247 wenshao eighth pass)

Addresses @wenshao's eighth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination).

Critical / real bugs:
- R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket.
- R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior.
- R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern.

Operator-visibility:
- R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `QWEN_SERVE_MCP_CLIENT_BUDGET=N` would silently boot with enforcement off and the `mcp_guardrails` capability advertised; the operator's only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern.

Doc fixes:
- R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one.
- R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both.

Not adopted:
- R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding.
- R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813) had already closed. Reviewer was on the pre-R8 commit view.

+ 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean.
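
The `Promise.race` timeout from R9 #6 has roughly the following shape. This is a sketch under assumptions: `connectWithTimeout` and the `onTimeout` cleanup hook are hypothetical names, and the real code derives the timeout from `discoveryTimeoutFor(serverConfig)`.

```typescript
async function connectWithTimeout(
  connect: () => Promise<void>,
  timeoutMs: number,
  onTimeout: () => void, // hypothetical hook, e.g. release the budget slot
): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`connect timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins; a hung connect() can no longer block forever.
    await Promise.race([connect(), timeout]);
  } finally {
    // Always clear the timer so a fast connect does not fire the handler later.
    clearTimeout(timer);
  }
}
```

Note that losing the race does not cancel the underlying `connect()`; the R10 note about the `disconnect()` cancel-pending-connect contract covers that part.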

* fixup(serve): address PR 14 review round 10 (QwenLM#4247 wenshao ninth pass)

Two non-blocking 🟢 nits — both adopted for symmetry / explicitness.

- R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths.
- R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since QwenLM#3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change.

57/57 core tests pass. Typecheck + lint clean.
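
Taken together, the R7 #6, R8 #2, and R9 #7 changes amount to a validate-then-downgrade step that logs instead of failing silently. A minimal sketch follows; `readBudgetSketch` is a hypothetical name, the message wording is invented, and treating only positive integers as valid budgets is an assumption.

```typescript
type BudgetMode = 'off' | 'warn' | 'enforce';

interface McpBudgetConfig {
  mode: BudgetMode;
  clientBudget?: number;
}

function readBudgetSketch(
  env: Readonly<Record<string, string | undefined>>,
  writeStderrLine: (line: string) => void,
): McpBudgetConfig {
  const rawBudget = env['QWEN_SERVE_MCP_CLIENT_BUDGET'];
  const rawMode = env['QWEN_SERVE_MCP_BUDGET_MODE'];

  let clientBudget: number | undefined;
  if (rawBudget !== undefined) {
    const parsed = Number(rawBudget);
    // Assumption: a valid budget is a positive integer.
    if (Number.isInteger(parsed) && parsed > 0) clientBudget = parsed;
    else writeStderrLine(`[serve] ignoring invalid QWEN_SERVE_MCP_CLIENT_BUDGET=${rawBudget}`);
  }

  let mode: BudgetMode = 'off';
  if (rawMode === 'off' || rawMode === 'warn' || rawMode === 'enforce') mode = rawMode;
  else if (rawMode !== undefined) {
    writeStderrLine(`[serve] ignoring invalid QWEN_SERVE_MCP_BUDGET_MODE=${rawMode}`);
  }

  // Both `enforce` and `warn` need a budget; without one, downgrade loudly.
  if (mode !== 'off' && clientBudget === undefined) {
    writeStderrLine(`[serve] budget mode "${mode}" has no budget configured; downgrading to "off"`);
    mode = 'off';
  }
  return clientBudget === undefined ? { mode } : { mode, clientBudget };
}
```

After this step the invariant `mode !== 'off'` implies `clientBudget` is defined holds by construction.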
B-A-M-N pushed a commit that referenced this pull request May 20, 2026
…M#4255)

* feat(serve): auth device-flow route

Implements issue #4175 Wave 4 PR 21. Brokers OAuth 2.0 Device
Authorization Grant (RFC 8628) through the `qwen serve` daemon so a
remote SDK client can trigger a Qwen-account login whose tokens land
on the **daemon** filesystem, not on the client. The daemon polls the
IdP itself; the client's only job is to display the verification URL +
user code.

Runtime locality (#4175 §11): the daemon NEVER spawns a browser or
calls `open(url)` — even when running locally. Static-source grep
test fails the build on `node:child_process` / `open` / `xdg-open` /
`shell.openExternal` / `execa` / `shelljs` / `process.spawn` and
their dynamic-import / require variants.

- `POST /workspace/auth/device-flow` — strict mutation gate; returns
  201 fresh / 200 idempotent take-over with `attached: true`.
  Per-`providerId` singleton: a second POST while pending takes over
  rather than allocating a new `device_code`.
- `GET /workspace/auth/device-flow/:id` — public state read. Pending
  entries echo `userCode/verificationUri/expiresAt/intervalMs`;
  terminal entries (5-min grace) drop them and surface
  `status/errorKind/hint`.
- `DELETE /workspace/auth/device-flow/:id` — strict; idempotent
  (terminal → 204 no-op; unknown → 404).
- `GET /workspace/auth/status` — pending flows + supported providers
  snapshot. v1 stub for `providers: []` (populated in fold-in 1).

`DeviceFlowRegistry` (`packages/cli/src/serve/auth/deviceFlow.ts`)
is the in-memory state holder:
- per-`providerId` singleton with idempotent take-over
- workspace-wide cap of 4 active flows (abuse defense)
- 5-min terminal grace so SDK reconnects can still observe results
- TTL sweeper evicts grace-expired entries every 30s
- in-flight `Promise` map coalesces concurrent `start()` calls so two
  parallel POSTs don't double-allocate IdP `device_code`
- `transitionTerminal` returns `boolean` so caller-side emit/audit
  guard prevents sweeper × poll-tick double-fire
- `dispose()` wired into `runQwenServe.close()`'s shutdown drain;
  cancels `provider.poll()` mid-flight via `cancelController`,
  records `lost_success` audit when an IdP-minted token is dropped
  by transition
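
The in-flight `Promise` map mentioned in the list above can be sketched as follows. `StartCoalescer` and `FlowEntry` are hypothetical names, and the sketch covers only coalescing of concurrent `start()` calls, not the take-over of an already pending entry.

```typescript
interface FlowEntry {
  deviceFlowId: string;
  providerId: string;
}

class StartCoalescer {
  private readonly inFlightStarts = new Map<string, Promise<FlowEntry>>();
  private readonly startFlow: (providerId: string) => Promise<FlowEntry>;

  constructor(startFlow: (providerId: string) => Promise<FlowEntry>) {
    this.startFlow = startFlow;
  }

  // Deliberately not `async`: concurrent callers must get the SAME promise.
  start(providerId: string): Promise<FlowEntry> {
    const existing = this.inFlightStarts.get(providerId);
    if (existing) return existing;
    const pending = this.startFlow(providerId).finally(() => {
      // Free the slot whether the start succeeded or failed.
      this.inFlightStarts.delete(providerId);
    });
    this.inFlightStarts.set(providerId, pending);
    return pending;
  }
}
```

Because the second caller receives the first caller's promise, the provider is asked for a `device_code` only once per burst of parallel POSTs.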

`DeviceFlowProvider` interface accepts `start({signal})` +
`poll(state, {signal})`. `QwenOAuthDeviceFlowProvider` wraps the
existing `QwenOAuth2Client.requestDeviceAuthorization` /
`pollDeviceToken` primitives directly (NOT
`authWithQwenDeviceFlow`, which calls `open(url)`). PKCE is
provider-required by Qwen but optional in the interface for future
non-PKCE providers. `success.persist()` writes to disk FIRST, then
updates the in-process client — a failed disk write no longer
leaves the daemon with a zombie in-memory token. Maps RFC 8628
errors via an anchored regex (`^Device token poll failed:
(expired_token|access_denied|invalid_grant)`) so an
`error_description` containing one of those literals can't
mis-classify an unrelated upstream error.
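
A minimal sketch of the anchored classification described above. The regex is the one quoted in the text; `classifyPollError` is a hypothetical name, and falling back to `upstream_error` for everything else is an assumption.

```typescript
type Rfc8628ErrorKind = 'expired_token' | 'access_denied' | 'invalid_grant';
type PollErrorKind = Rfc8628ErrorKind | 'upstream_error';

// Anchored at the start, so a literal inside error_description cannot match.
const POLL_ERROR_PATTERN =
  /^Device token poll failed: (expired_token|access_denied|invalid_grant)/;

function classifyPollError(message: string): PollErrorKind {
  const match = POLL_ERROR_PATTERN.exec(message);
  // Assumption: anything that does not match is treated as an upstream error.
  return match ? (match[1] as Rfc8628ErrorKind) : 'upstream_error';
}
```

An unanchored `includes('expired_token')` check would misfire on a message whose description merely mentions that literal; the `^` anchor prevents that.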

`BrandedSecret<T extends string>` holds the `device_code` and PKCE
verifier. Earlier draft used `new String()` wrapper which leaked
through `+` / template literals (`Symbol.toPrimitive` →
`valueOf` returned the primitive). Final shape: frozen plain object
+ `WeakMap` indirection + 4-way redaction
(`toString` / `toJSON` / `Symbol.toPrimitive` / numeric coercion →
`'[redacted]'` or `NaN`) + `unique symbol` brand. 6 leak-path
tests: `JSON.stringify` / `String()` / concat / template / `+x` /
reveal-roundtrip.
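
The final `BrandedSecret` shape can be approximated as below. This is a sketch reconstructed from the description, not the PR's code: `brandSecret`, `secretVault`, and the exact error text are hypothetical.

```typescript
declare const secretBrand: unique symbol;

interface BrandedSecret<T extends string> {
  readonly [secretBrand]: T;
}

// The raw value lives only here, keyed by the opaque handle.
const secretVault = new WeakMap<object, string>();

function brandSecret<T extends string>(_label: T, value: string): BrandedSecret<T> {
  const handle = Object.freeze({
    toString: () => '[redacted]',
    toJSON: () => '[redacted]',
    // Numeric coercion yields NaN; string/default coercion yields the marker.
    [Symbol.toPrimitive]: (hint: string) => (hint === 'number' ? NaN : '[redacted]'),
  });
  secretVault.set(handle, value);
  return handle as unknown as BrandedSecret<T>;
}

function revealSecret<T extends string>(secret: BrandedSecret<T>): string {
  const value = secretVault.get(secret);
  if (value === undefined) {
    throw new Error('not a live secret handle (forged shape or serialize+reparse)');
  }
  return value;
}
```

Because the handle is a plain frozen object rather than a `String` wrapper, there is no primitive for `+` or a template literal to fall back to.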

5 new daemon events (workspace-scoped, fanned out to every active
session bus via `bridge.broadcastWorkspaceEvent`):

- `auth_device_flow_started` — `{deviceFlowId, providerId, expiresAt}`
  (no userCode/verificationUri — see PR 21 design §3)
- `auth_device_flow_throttled` — `{deviceFlowId, intervalMs}`,
  emitted only on upstream `slow_down` interval bumps
- `auth_device_flow_authorized` — `{deviceFlowId, providerId,
  expiresAt?, accountAlias?}`; `accountAlias` is best-effort
  non-PII (never email/phone)
- `auth_device_flow_failed` — `{deviceFlowId, errorKind, hint?}`
  with `errorKind ∈ {expired_token, access_denied, invalid_grant,
  upstream_error, persist_failed}`
- `auth_device_flow_cancelled` — `{deviceFlowId}` (DELETE on pending)

Workspace-scoped reducer `reduceDaemonAuthEvent` produces
`DaemonAuthState { flows: Partial<Record<ProviderId, ...>> }` —
parallel to `reduceDaemonSessionEvent`. Session reducer no-ops on
auth events (workspace-scoped state belongs in its own reducer).

`bridge.broadcastWorkspaceEvent` is intentionally distinct from PR
16's `publishWorkspaceEvent` to avoid merge conflict; collapses to
the shared helper as a fold-in once #4249 lands (~25 LoC).

`@qwen-code/sdk` (`packages/sdk-typescript/`):

- 4 new `DaemonClient` methods: `startDeviceFlow`, `getDeviceFlow`,
  `cancelDeviceFlow`, `getAuthStatus` — typed against the wire
  shapes, errors mapped through the existing `DaemonHttpError`.
- High-level `client.auth` getter (lazy `DaemonAuthFlow` singleton)
  exposes a `start(...).awaitCompletion()` shape mirroring `gh auth
  login`'s UX: print code first, let the SDK consumer decide where
  to open the browser. `awaitCompletion` polls GET on the
  daemon-supplied `intervalMs`, honors `slow_down` bumps, and
  fall-back-recovers from 404 (entry evicted post-grace).

POST + DELETE flow through PR 15's `mutate({strict: true})` —
401 `token_required` on token-less loopback defaults. GET routes
use only the global `bearerAuth`. Every state transition
(`started/authorized/failed/cancelled/expired/lost_success`)
records a structured stderr breadcrumb (`[serve] auth.device-flow:
provider=... deviceFlowId=abc12... clientId=... status=...`)
because `mutate()` doesn't carry an audit hook. Events alone aren't
enough because an SDK client can silently drop them; stderr →
journald/docker logs is the unfalsifiable record.

`auth_device_flow` advertised unconditionally on
`/capabilities.features`. Supported providers list lives on
`/workspace/auth/status` to keep the registry descriptor uniform.

- `packages/core/src/qwen/qwenOAuth2.ts`:
  - exports `cacheQwenCredentials` (was a private function; needed
    by the daemon's device-flow registry)
  - `cacheQwenCredentials` now calls `SharedTokenManager.clearCache()`
    after writing, folding what was previously a paired call site at
    L820+L829. Idempotent change.
  - file mode `0o600` on `oauth_creds.json` (was default 0o666 +
    umask). Mirrors opencode's `auth/index.ts`.
- `packages/cli/src/serve/runQwenServe.ts`: device-flow registry
  `dispose()` wired into the shutdown drain (BEFORE
  `bridge.shutdown()`).

- `auth/deviceFlow.test.ts` — 21 tests: BrandedSecret leak paths,
  state machine (slow_down / success / error), terminal grace,
  concurrent-start coalescing, dispose, cancel idempotency, static-
  source grep against browser-spawn primitives.
- `server.test.ts` — 10 device-flow integration tests:
  POST 201/200 take-over, strict 401, 400 `unsupported_provider`,
  GET / DELETE / `/workspace/auth/status`, 502 `upstream_error`
  mapping, sweeper-driven auto-expiry with controlled clock,
  capability advertisement.
- `daemonEvents.test.ts` — 5 SDK reducer tests: type guards, per-
  provider state projection, `failed` always → `status: 'error'`
  (errorKind carries the kind, including new `persist_failed`),
  session reducer no-ops on auth events.

369/369 serve + SDK tests pass; typecheck + `eslint
--max-warnings 0` clean across 14 PR 21 files.

- [x] Independently mergeable (depends only on merged PR 4 / PR 7 /
      PR 12 / PR 15)
- [x] Backward compatible (4 new routes + 1 capability tag + 5 typed
      events + 4 SDK helpers; existing routes/events untouched)
- [x] Default off (capability advertised but no client is forced to
      use it; CLI `qwen` OAuth flow unchanged)
- [x] `qwen serve` Stage 1 routes / SDK behavior preserved
- [x] Gradual migration (v1 only `qwen-oauth`; future providers
      register through the `DeviceFlowProvider` interface)
- [x] Reversible (revert removes 4 routes + 1 tag + 5 events with no
      schema migration)
- [x] Tests-first (28 new tests across 3 layers)

- Inline `bridge.broadcastWorkspaceEvent` → fold-in to PR 16 (#4249)
  `publishWorkspaceEvent` once that lands
- `/workspace/auth/status` vs PR 12 `/workspace/providers` boundary
  — separate route in v1; merge alternative discussed
- Wave 4 PRs 17/19/20 should adopt the same mutate-strict +
  workspace event-fan-out pattern

5 items from pre-PR specialist passes parked for a focused
follow-up: `DeviceFlowEntry` discriminated union, single-source SDK
status / ProviderId unions, `awaitCompletion` memoization,
broadcast-100%-fail stderr elevation, SDK 404 →
`not_found_or_evicted` errorKind.

Refs: #4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 round-1 review feedback

Eleven items from copilot-pull-request-reviewer's round-1 pass on
#4255 — 4 inline threads + 7 from the PR-level review summary.

## Adopted (11 items, code/doc changes)

- **`lastSeenAt` → `lastSeenEventId`** (`events.ts`,
  `DaemonDeviceFlowReducerState`). The field was set from
  `rawEvent.id` (SSE event id) but documented as "epoch ms" — a real
  semantic mismatch that would mislead consumers into time-based
  logic against a monotonic counter. Rename + tighten the JSDoc to
  describe it as an event-id counter; reducer cases updated.
- **`DEVICE_FLOW_EXPIRY_GRACE_MS = 30_000` extracted** in
  `DaemonAuthFlow.ts` (was a magic number on `start.expiresAt +
  30_000`). `AwaitCompletionOptions.timeoutMs` doc now describes the
  actual grace-past-expiry behavior + the rationale (clock skew +
  daemon sweeper interval + network latency) instead of the wrong
  "defaults to expiresAt - Date.now()" claim.
- **Explicit `chmod 0o600`** in `cacheQwenCredentials` after every
  write. `fs.writeFile`'s `mode` only applies on file creation; a
  pre-existing `oauth_creds.json` written under a broader umask kept
  its old permissions across upgrades. The chmod now tightens it on
  every write; chmod failure (Windows / hardened FS) surfaces via
  `debugLogger.warn` instead of silently dropping the invariant.
- **`SharedTokenManager.clearCache()` failure now logs**
  `debugLogger.warn` (was a silent `try { } catch { }`). In
  production a swallowed clearCache means in-process callers serve
  stale credentials until the SharedTokenManager mtime watcher
  catches up — a recoverable degradation worth a log line.
- **Protocol doc** lists `persist_failed` in the
  `auth_device_flow_failed.errorKind` union (was added to the type
  but missed in the doc).
- **`pollDeviceToken({signal})`** plumbed through
  `IQwenOAuth2Client` interface + `QwenOAuth2Client` impl + the Qwen
  device-flow provider. Cancel / dispose during a slow IdP response
  now aborts the in-flight HTTP socket immediately instead of
  waiting for the upstream timeout. Two new registry tests assert
  `cancel()` / `dispose()` propagate abort to the signal observed by
  `provider.poll`.
- **`revealSecret` error message** clarified: was "secret has been
  GC-evicted" (impossible — WeakMap doesn't evict reachable keys).
  Now points at the actual reachable failure modes (forged shape /
  serialize+reparse losing the WeakMap binding).
- **`transitionTerminal` JSDoc** clarifies that the PRIMARY guard
  against late timer secret leaks is the `entry.status !== 'pending'`
  check at the top of `runPollTick`; secret-clearing here is
  defense-in-depth.
- **`DeviceFlowErrorKind` JSDoc'd per variant** so consumers can tell
  when each fires (RFC 8628 distinctions + `persist_failed` vs
  `upstream_error` boundary).
- **Stale "PR 16 / PR 21 §3" temporal references** in
  `DaemonAuthFlow.ts:124` rephrased to be timeless ("workspace-scoped
  events fan out through whatever session buses happen to be live"
  — no PR number references that rot when those PRs merge).
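
The explicit `chmod 0o600` item above rests on the fact that the `mode` option of Node's write APIs only applies when the file is created. A minimal sketch, using the synchronous `node:fs` calls for brevity (the PR describes the async `fs.writeFile`); `writeCredentialsFile` and the demo file are hypothetical.

```typescript
import { chmodSync, mkdtempSync, statSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

function writeCredentialsFile(
  filePath: string,
  contents: string,
  warn: (message: string) => void,
): void {
  // `mode` is honored only on creation; an existing file keeps its old bits.
  writeFileSync(filePath, contents, { mode: 0o600 });
  try {
    // Tighten on every write so pre-existing files are fixed too.
    chmodSync(filePath, 0o600);
  } catch (err) {
    // Surface the failure instead of silently dropping the invariant.
    warn(`could not tighten credentials file permissions: ${String(err)}`);
  }
}

// Demo: a pre-existing file with broader permissions gets tightened.
const demoDir = mkdtempSync(join(tmpdir(), 'creds-demo-'));
const demoFile = join(demoDir, 'oauth_creds.json');
writeFileSync(demoFile, '{}');
chmodSync(demoFile, 0o644); // simulate a file left behind with broader bits
writeCredentialsFile(demoFile, '{"token":"example"}', console.warn);
const demoMode = statSync(demoFile).mode & 0o777; // 0o600 on POSIX systems
```

On Windows the permission bits behave differently, which is why the failure path logs rather than throws.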

## Not adopted (4 items, replied to in-thread)

- **`authWithQwenDeviceFlow` browser-launch separation** — correct
  architectural advice but out of #4255 scope (would refactor a CLI
  auth UX module that PR 21 only touched additively). Tracked as a
  Wave 5 follow-up.
- **Copyright header year range** — repo-wide convention "2025"; not
  introduced by this PR.
- **Spread `...(x ? {x} : {})` → `x: x ?? undefined`** — the two are
  not semantically equivalent. The current form omits the key
  entirely on falsy `x`; the suggested form always includes the key.
  Tests assert object shape and would break under the change.
- **Eager `client.auth` getter** — public API boundary. Lazy
  construction matches `DaemonSessionClient` precedent + saves the
  module load for SDK consumers that never touch auth.
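
The non-equivalence behind the rejected spread suggestion is easy to demonstrate. The field name `hint` below is only illustrative; the log does not say which fields use the pattern.

```typescript
function withConditionalSpread(hint?: string): Record<string, unknown> {
  // Key is omitted entirely when `hint` is falsy (undefined or '').
  return { status: 'error', ...(hint ? { hint } : {}) };
}

function withExplicitUndefined(hint?: string): Record<string, unknown> {
  // Key is always present, even when its value is undefined.
  return { status: 'error', hint: hint ?? undefined };
}
```

`JSON.stringify` hides the difference because it drops `undefined` values, but shape assertions such as `Object.keys` or deep-equality checks in tests do see the extra key.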

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-1 review feedback

15 items from @wenshao's review batches on #4255. Catches a handful
of real bugs that the earlier round (commit 3d9f082f5) didn't
surface.

## Critical fixes

- **C1 — `pollUntilTerminal` providerId pass-through**
  (`DaemonAuthFlow.ts:185`). The synthetic 404 fallback hardcoded
  `providerId: 'qwen-oauth'`; the parent `awaitCompletion` already
  receives the real providerId via `start.providerId` but
  `pollUntilTerminal`'s parameter type stripped it. Add the field to
  the param type, propagate.
- **C2 — open `errorKind` allowlist** (`events.ts`). The closed
  5-value union in the type guard silently dropped any `failed`
  event whose errorKind the daemon added without mirroring SDK-side
  (e.g. a future `rate_limited`). The flow's reducer state would
  never transition to terminal, leaving SDK consumers stuck on
  `pending` forever. Open the union with `(string & {})` and accept
  any non-empty string in the runtime guard. Updated test asserts
  forward-compat behavior + still rejects the truly-malformed
  empty-string case.
- **C3 — `persist()` timeout + signal**
  (`deviceFlow.ts`). A wedged disk I/O (NFS stall, encrypted-volume
  contention) without bounds would pin the entry in `pending` until
  the upstream `expires_in` elapsed (potentially minutes). The
  registry now passes its `cancelController.signal` AND arms a hard
  `DEVICE_FLOW_PERSIST_TIMEOUT_MS = 30_000` timer; persist failure
  surfaces as `persist_failed` immediately. The
  `DeviceFlowPollResult` `success` variant signature changed to
  `persist({signal})`.
- **C4 — cancel × success race rollback**
  (`deviceFlow.ts` + Qwen provider). Today, if `cancel()`
  transitions while `persist()` is in flight, the credentials get
  written but the flow's status is `cancelled`. User sees cancelled,
  daemon disk has a valid token. `DeviceFlowPollResult.success`
  gains an optional `unpersist()` callback the registry calls when
  `transitionTerminal(authorized)` fails — the Qwen provider wires
  it to `clearQwenCredentials()`. Rollback failure is audited but
  not propagated (re-running auth would overwrite anyway).
- **C5 — don't `unref()` the `awaitCompletion` sleep timer**
  (`DaemonAuthFlow.ts`). On a standalone Node CLI/script doing just
  `client.auth.start().awaitCompletion()`, the unref'd between-poll
  timer was the only event-loop handle, so Node could exit before
  the user finished authorization. The poll wait is foreground work
  the caller explicitly awaits — keep it ref'd.
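
The open `errorKind` union from C2 might look like this in the SDK. It is a sketch: `isFailedEventData` and the payload interface are hypothetical names, with the field set taken from the `auth_device_flow_failed` event description.

```typescript
type KnownErrorKind =
  | 'expired_token'
  | 'access_denied'
  | 'invalid_grant'
  | 'upstream_error'
  | 'persist_failed';

// `(string & {})` keeps editor autocomplete for known kinds while accepting
// kinds the daemon may add later.
type DeviceFlowErrorKind = KnownErrorKind | (string & {});

interface FailedEventData {
  deviceFlowId: string;
  errorKind: DeviceFlowErrorKind;
  hint?: string;
}

function isFailedEventData(value: unknown): value is FailedEventData {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  const deviceFlowId = record['deviceFlowId'];
  const errorKind = record['errorKind'];
  const hint = record['hint'];
  return (
    typeof deviceFlowId === 'string' &&
    typeof errorKind === 'string' &&
    errorKind.length > 0 && // empty string is still malformed
    (hint === undefined || typeof hint === 'string')
  );
}
```

With a closed allowlist in the guard, an unknown kind would fail validation and the reducer would never see the terminal event.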

## Information-leak fixes

- **S1 — sanitize `persist_failed` hint**. `err.message` from
  `cacheQwenCredentials` embeds the full `~/.qwen/oauth_creds.json`
  path. Broadcast via SSE, that path leaks the daemon's home layout
  to every connected session subscriber. Replace user-facing hint
  with `"credentials could not be written to the daemon filesystem
  — check disk space and permissions"`; full err goes to stderr
  audit only.
- **S2 — sanitize upstream `pollDeviceToken` hint**. The class
  embedded the entire raw IdP response body (which can be an HTML
  error page from a reverse proxy) into the thrown message. Same
  broadcast leak path. Replace upstream-error hint with
  `"unexpected response from identity provider"`; RFC 8628 errors
  use `"Qwen IdP returned ${kind}"`.

## Cleanup / forward-compat

- **D1 — drop duplicate `clearCache()`** at `qwenOAuth2.ts:840`. The
  paired call became redundant once `cacheQwenCredentials` folded
  the clearCache in (PR #4255 fold-in 1). The fold-in 1 message
  said this would be done; the duplicate slipped through.
- **S3 — drop unused `DeviceFlowNotFoundError`** (`deviceFlow.ts`).
  Exported but never imported; route handlers do inline 404 JSON.
- **S4 — single-source SDK status / errorKind unions**
  (`types.ts`). `DaemonAuthDeviceFlowSdkStatus` /
  `DaemonAuthDeviceFlowSdkErrorKind` were parallel literal copies
  of the canonical events.ts definitions — drift waiting to happen.
  Now imported + aliased as type-only re-exports.
- **S5 — broadcast 100% fail elevates to stderr**
  (`httpAcpBridge.ts`). Per-session bus failures stay debug-only,
  but a broadcast where EVERY session bus refused is operationally
  interesting (clients won't see the event). Track success / fail
  counts; `writeStderrLine` when `successCount === 0`.
- **S6 — `this.disposed` check after `await provider.start()`**
  (`deviceFlow.ts`). `dispose()` mid-start would orphan the freshly-
  inserted entry (`schedulePoll` guards on `disposed` so no poll
  fires; the entry never transitions). Throw post-await if disposed.
- **W1 — thread `signal` into `requestDeviceAuthorization`**
  (`qwenOAuth2.ts` + Qwen provider). `start()` had the same
  cancellation gap that `pollDeviceToken` had — a slow
  device-authorization request couldn't be aborted during shutdown.
  Now plumbed end-to-end.
- **W2 — split `invalid_request` from `unsupported_provider`**
  (`server.ts`). Conflating them surfaced misleading remediation
  hints to SDK consumers branching on `code` ("this provider isn't
  supported here" when the real cause was a serializer dropping the
  field). Bad-shape now returns `code: 'invalid_request'`;
  unknown-but-well-formed stays `unsupported_provider`.
- **W3 — drop never-populated `accountAlias`**
  (Qwen provider). The field was wired through types / events /
  reducer / audit but the Qwen IdP's token response doesn't carry
  one (no `name` / `email` / `sub`). Returning only `{expiresAt}`
  makes the field type-honestly absent rather than always-undefined.
  Future provider with an alias-bearing response can populate it.
- **W4 — `DaemonAuthFlow` JSDoc accuracy**. Doc claimed "first
  attempts to consume an SSE event stream … falls back to GET-based
  polling"; the actual behavior is GET-only polling, with SSE as a
  real-time hint for clients already subscribed to a session stream.
- **W5 — clearer unit arithmetic** in interval normalization. The
  `(_INTERVAL_MS / 1000) * 1000` cancellation hid the s↔ms boundary;
  expanded form makes both branches unit-explicit.
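
The S5 counting logic can be sketched as follows. `SessionBus`, the logger shape, and the message text are hypothetical; skipping the stderr line when there are no session buses at all is an assumption this sketch adds, since the log only states the `successCount === 0` condition.

```typescript
interface SessionBus {
  publish(event: unknown): void;
}

interface BroadcastLog {
  debug(line: string): void;
  stderr(line: string): void;
}

function broadcastWorkspaceEvent(
  buses: readonly SessionBus[],
  event: unknown,
  log: BroadcastLog,
): { successCount: number; failCount: number } {
  let successCount = 0;
  let failCount = 0;
  for (const bus of buses) {
    try {
      bus.publish(event);
      successCount += 1;
    } catch (err) {
      failCount += 1;
      // A single refusing bus is routine; keep it at debug level.
      log.debug(`session bus refused workspace event: ${String(err)}`);
    }
  }
  // Assumption: only elevate when there was at least one bus to fail.
  if (successCount === 0 && failCount > 0) {
    log.stderr(`workspace event reached no session (${failCount} buses refused)`);
  }
  return { successCount, failCount };
}
```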

## Test changes

- `daemonEvents.test.ts` updated to match the now-OPEN errorKind
  union (forward-compat assertion + empty-string still rejected).
- `deviceFlow.test.ts` `FakeProvider.poll` aligned with the new
  `persist({signal})` signature + optional `unpersist`.

## Validation

- `npm run typecheck --workspace packages/cli --workspace
  packages/sdk-typescript --workspace packages/core` — clean
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 368/368
- `npx eslint --max-warnings 0` over the 11 PR 21 surface files —
  clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-2 review feedback

10 new threads from @wenshao's second deep-review pass on #4255.
Verified status: 5 real issues, 1 improvement, 3 stale (already
fixed; comments lagged), 1 false alarm (typecheck demonstrably
clean).

## Critical fixes

- **fold-in 2 C4 REVERSED**: when `provider.poll()` returns success
  AND `cancel()` / `dispose()` transitioned the entry mid-`persist()`,
  the registry now FORCES the entry to `authorized` and keeps the
  on-disk credentials. The earlier rollback (`unpersist()`) wasted
  the user's IdP approval because the RFC 8628 `device_code` is
  single-use — re-running the flow would force them through the
  whole browser-prompt + paste-code dance again for a click whose
  intent was likely "stop the wait" rather than "undo my already-
  completed approval". Aligns with gh CLI / Auth0 SDK / git-
  credential-manager. Audit captures the race via `hint:
  'lost_success_kept ...'`. `DeviceFlowPollResult.success.unpersist`
  field + Qwen provider's `clearQwenCredentials` rollback removed.
- **#1 GET /workspace/auth/device-flow/:id strict gate**: this GET
  surfaces `userCode` / `verificationUri` for pending entries, which
  on the loopback no-token default were readable by any local
  process. POST + DELETE were already strict; aligning GET closes
  the information-disclosure asymmetry. `/workspace/auth/status`
  stays bearer-only (its `pendingDeviceFlows` entries intentionally
  omit `userCode`).
- **#2 `inFlightStarts` hard timeout**: a hung `provider.start()`
  (network partition, unresponsive IdP) used to leave the per-
  `providerId` slot in `inFlightStarts` occupied forever, blocking
  every subsequent POST until daemon restart. New
  `DEVICE_FLOW_START_TIMEOUT_MS = 30_000` arms a timer that
  `cancelController.abort()`s the start; the rejected promise
  unwinds through the `try/finally` clearing the slot.
- **#10 chain-completing the C3 persist-timeout**: the earlier C3
  fix armed a 30s timer that fired `cancelController.abort()` then
  `await result.persist({signal})`, but the chain ended at the
  registry boundary — `cacheQwenCredentials` didn't take a signal,
  so `fs.writeFile` couldn't be aborted. Now `cacheQwenCredentials`
  accepts an optional `{signal}` and threads it into
  `fs.writeFile(..., {signal})` (Node native). The Qwen provider's
  `persist({signal})` forwards the entry's
  `cancelController.signal` end-to-end.

## Improvement (#4): 404 fallback errorKind

`pollUntilTerminal`'s 404 catch used to synthesize
`{status: 'expired'}` for ALL evicted entries — conflating "your
flow expired during your disconnect", "the daemon was restarted",
and "your deviceFlowId was wrong". Now returns
`status: 'error'` + `errorKind: 'not_found_or_evicted'` + a `hint`
so SDK consumers branching on errorKind can distinguish.

## Information leak (#9): start() path raw IdP message

S2 (fold-in 2) sanitized `poll()`'s upstream-error hint, but
`start()` still embedded the raw `err.message` (full IdP response,
potentially HTML from a reverse proxy / WAF) into the
`UpstreamDeviceFlowError` that flowed to SDK clients via the 502.
Now uses static messages for the SDK-visible errors; raw detail
goes through `writeStderrLine` for operator audit only. Mirrors
S2's approach.

## Stale comments cleaned (#5, #7)

`qwenDeviceFlowProvider.ts:177` claimed
`cacheQwenCredentials` "doesn't currently take a signal — that's
a follow-up". After #10 above, that's no longer true; the comment
is replaced with the actual end-to-end signal-threading note.

## Not adopted (1 false alarm)

- Thread on `types.ts:330` claimed type-only-import-after-
  declarations breaks `tsc` and fails `daemonEvents.test.ts:670`
  with TS2345. Demonstrably false: `npx tsc -p
  packages/sdk-typescript/tsconfig.json --noEmit` exits 0;
  `daemonEvents.test.ts` is the post-fold-in-2 file with the
  open-allowlist assertion (test 28/28 passes). The reviewer may
  have been looking at a transient state during their analysis.

## Validation

- `npm run typecheck --workspace packages/cli --workspace
  packages/sdk-typescript --workspace packages/core` — clean
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398
  pass
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-3 review feedback

5 new threads from the third deep-review pass on #4255. 3 real
issues fixed; 1 stale (already done in fold-in 3); 1 deferred as
non-blocking design suggestion.

- **A — `expiresIn` / `interval` non-finite guard**
  (`deviceFlow.ts`). The provider contract types both as `number`,
  but a misbehaving / future provider could hand `undefined` /
  `NaN` / `Infinity`. `Math.max(0, NaN) * 1000` is `NaN`, then
  `now() + NaN` is `NaN`, then `now >= NaN` is always `false` —
  the sweeper would NEVER evict the entry, pinning an upstream
  `device_code` slot until daemon restart. Same hazard on
  `interval * 1000` (NaN → `setTimeout(NaN)` fires immediately,
  Infinity → scheduler clamps to TIMEOUT_MAX). Now both fields go
  through `Number.isFinite(x) && x > 0`; missing/bad values fall
  back to RFC 8628's recommended ceilings (10 min for expiry, 5s
  for interval).

- **D — typed `app.locals` accessor**
  (`deviceFlow.ts` + writer/reader call sites). The
  `app.locals['deviceFlowRegistry']` string key was shared between
  `createServeApp` (writer) and `runQwenServe` (reader); a typo on
  either side would compile cleanly and the shutdown dispose call
  would silently no-op, leaving polling timers running until the
  `unref()` rescue. New `setDeviceFlowRegistry(app, registry)` /
  `getDeviceFlowRegistry(app)` pair gives both call sites
  type-checked access; the string literal is encapsulated in one
  module.

- **E — `UnsupportedDeviceFlowProviderError` docstring**
  (`deviceFlow.ts`). After fold-in 2's W2 fix split
  `invalid_request` from `unsupported_provider`, the route layer
  screens unknown ids against `DEVICE_FLOW_SUPPORTED_PROVIDERS`
  before reaching the registry — so this error is now reachable
  ONLY on a daemon-internal invariant violation (id is declared
  supported but not registered in the runtime provider map).
  Docstring + thrown message updated to reflect that this branch
  signals a programmer error, not user input.

- **B** claimed `cacheQwenCredentials(credentials)` doesn't forward
  signal to `fs.writeFile`. Verified: fold-in 3 (#10) at
  `qwenDeviceFlowProvider.ts:204` calls
  `cacheQwenCredentials(credentials, { signal: persistOpts.signal })`
  and the core helper threads it into `fs.writeFile(..., {mode,
  signal})`. The reviewer was looking at the comment block above
  (lines 174-181) without scrolling to the actual call site.

- **C — SDK `cancelDeviceFlow` lossy 204/404 collapse**.
  Suggested returning `{existed: boolean; alreadyTerminal: boolean}`
  instead of resolving void on both 204 and 404. A real loss of signal,
  but tagged "[non-blocking]" by the reviewer; changing requires a
  daemon route shape change (200 + body instead of 204) which is
  better as a focused follow-up PR. Acknowledged in-thread;
  deferred to a fold-in PR after #4255 lands.

- `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}`
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-4 review feedback

4 threads from the fourth review pass on #4255. 3 adopted + 1
deferred (out-of-scope rename of PR 15's `mutate` helper).

## Adopted

### #1 — `persistInFlight` flag suppresses cancel × persist event-stream UX trap

When `provider.poll()` returns success and we await `persist()`, a
concurrent `cancel()` would synchronously transition the entry to
`cancelled` and emit `auth_device_flow_cancelled` — then `persist()`
resolves and (per fold-in 3 C4) force-overrides to `authorized` +
emits `auth_device_flow_authorized`. The reducer state correctly
resolves last-write-wins to `authorized`, but DIRECT event-stream
consumers (close-dialog handlers, telemetry, UI cleanup) act on the
first event, so the second one lands on an already-unmounted UI.

Now: while persist is in-flight, `cancel()` and the sweeper SKIP the
state transition + event emit. They register intent (set
`cancelRequestedDuringPersist=true` for cancel; sweeper just no-ops)
and let the persist resolution decide:

- persist succeeds → `authorized` (IdP wins per fold-in 3 C4)
- persist fails AND cancel was requested → `cancelled`
- persist fails AND `now >= expiresAt` → `expired` / `expired_token`
- persist fails otherwise → `error` / `persist_failed`

Result: at most one terminal event per flow. Imperative SSE
consumers no longer see oscillating terminal states. Audit captures
the race (`hint: 'lost_success_kept ...'`) for incident-response
correlation.
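
The resolution table above can be read as a single pure function. The following is a minimal sketch under assumptions: `resolveAfterPersist`, `PersistOutcome`, and `TerminalResult` are hypothetical names for illustration, not the registry's actual symbols, and only the four outcomes listed above are modeled.

```typescript
// Hypothetical shapes for illustration; the real registry types may differ.
type TerminalStatus = 'authorized' | 'cancelled' | 'expired' | 'error';

interface PersistOutcome {
  persistSucceeded: boolean;
  cancelRequestedDuringPersist: boolean;
  now: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
}

interface TerminalResult {
  status: TerminalStatus;
  errorKind?: 'expired_token' | 'persist_failed';
}

// Decide the single terminal state once persist() has settled.
// The order of the checks mirrors the bullet list above.
function resolveAfterPersist(o: PersistOutcome): TerminalResult {
  if (o.persistSucceeded) {
    // The IdP approval wins even if cancel() was requested mid-persist.
    return { status: 'authorized' };
  }
  if (o.cancelRequestedDuringPersist) {
    return { status: 'cancelled' };
  }
  if (o.now >= o.expiresAt) {
    return { status: 'expired', errorKind: 'expired_token' };
  }
  return { status: 'error', errorKind: 'persist_failed' };
}
```

Because the decision is taken once, after persist settles, exactly one terminal result comes out per flow, which is the property the event-stream consumers rely on.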

### #2 — `revealSecret` → `unsafeRevealSecret` rename

The earlier JSDoc claimed "the `unsafeReveal_` naming is intentional:
greppable in code review, easy to allowlist in lint rules, hard to
invoke by accident" — but the actual function was named
`revealSecret`. The promised safety properties didn't exist; a code
reviewer wouldn't single out `revealSecret` as suspicious, and a
`no-restricted-syntax` ESLint rule wouldn't flag it.

Renamed to `unsafeRevealSecret` so the JSDoc-promised "greppable" /
"lintable" property is now actually true. Two call sites in the
Qwen provider + 4 test references updated. Internal symbol; not
exposed through the SDK package.

### #4 — `QwenOAuthPollError` typed class replaces substring regex

The earlier RFC 8628 error mapper used an anchored regex against the
thrown error message text — an implicit cross-file string contract
between `qwenOAuth2.ts` (throws) and `qwenDeviceFlowProvider.ts`
(matches). If `qwenOAuth2.ts` ever changed its message format, ALL
RFC 8628 errors (`expired_token` / `access_denied` / `invalid_grant`)
would silently fall through to `upstream_error` — wrong errorKind
flowing through telemetry with no test or type-system check to catch
the drift.

Now `QwenOAuth2Client.pollDeviceToken` throws a structured
`QwenOAuthPollError extends Error` with `oauthError` / `description`
/ `status` fields. The provider branches on `instanceof
QwenOAuthPollError` and reads `.oauthError` directly via a
dedicated `mapRfc8628OAuthCode(code)` switch. The drift hazard is
gone: a future code change that touches the typed class will
fail tsc until both sides are updated. Message format preserved
for any pre-existing log-parsing / substring matchers.
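
A rough sketch of the typed-error approach, for illustration only. The field names `oauthError` / `description` / `status` and the helper name `mapRfc8628OAuthCode` come from the description above; the constructor signature, the message format, the `ErrorKindSketch` union, and `classifyPollFailure` are assumptions.

```typescript
// Assumed subset of error kinds; the real union likely has more members.
type ErrorKindSketch =
  | 'expired_token'
  | 'access_denied'
  | 'invalid_grant'
  | 'upstream_error';

class QwenOAuthPollError extends Error {
  readonly oauthError: string;
  readonly description: string | undefined;
  readonly status: number;

  // Constructor shape is hypothetical.
  constructor(oauthError: string, description: string | undefined, status: number) {
    super(`Device token poll failed: ${oauthError}`);
    this.name = 'QwenOAuthPollError';
    this.oauthError = oauthError;
    this.description = description;
    this.status = status;
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, QwenOAuthPollError.prototype);
  }
}

// Map the structured OAuth code instead of pattern-matching message text.
function mapRfc8628OAuthCode(code: string): ErrorKindSketch {
  switch (code) {
    case 'expired_token':
      return 'expired_token';
    case 'access_denied':
      return 'access_denied';
    case 'invalid_grant':
      return 'invalid_grant';
    default:
      return 'upstream_error';
  }
}

// Hypothetical call-site helper: branch on the class, not on the message.
function classifyPollFailure(err: unknown): ErrorKindSketch {
  return err instanceof QwenOAuthPollError
    ? mapRfc8628OAuthCode(err.oauthError)
    : 'upstream_error';
}
```

With this shape, rewording the error message cannot change which `errorKind` is reported, since the classifier never reads the message.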

## Not adopted

### #3 — `mutate({strict:true})` semantic awkwardness on GET

Reviewer correctly noted that `mutate` is named for state-changing
routes, but `GET /workspace/auth/device-flow/:id` uses it for an
information-disclosure defense (the only reachable code path reads
state). Suggested rename: `mutate` → `strictHttpGate`.

Deferred: the rename touches PR 15's helper which has many call
sites in `server.ts` and is shared infrastructure for Wave 4 PRs
17/19/20. PR 21 is the first / only consumer of the strict-on-GET
form so far; deferring the rename to a Wave 4 follow-up keeps the
fold-in scope tight. Replied in-thread.

## Validation

- `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}`
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 544/544
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-5 review feedback

Five small adopted items from the round-5 review pass; one stale thread
already addressed in b5b77ee90 (fold-in 5).

#2 — `as const` + derived type for DEVICE_FLOW_SUPPORTED_PROVIDERS so
adding/removing a provider id requires touching exactly ONE site.
Mirrors `SERVE_ERROR_KINDS` / `ServeErrorKind` in `status.ts`.
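
For readers unfamiliar with the `as const` + derived-type pattern, a small sketch follows. The provider id `'qwen-oauth'`, the type name `DeviceFlowProviderId`, and the guard `isSupportedProvider` are assumptions for illustration; only the constant's name comes from this commit.

```typescript
// Single source of truth: the tuple is the only place ids are listed.
// 'qwen-oauth' is a placeholder id, not necessarily the real one.
const DEVICE_FLOW_SUPPORTED_PROVIDERS = ['qwen-oauth'] as const;

// The union type is derived from the tuple, so it cannot drift from it.
type DeviceFlowProviderId = (typeof DEVICE_FLOW_SUPPORTED_PROVIDERS)[number];

// Hypothetical runtime guard that narrows unknown input to the derived type.
function isSupportedProvider(id: unknown): id is DeviceFlowProviderId {
  return (
    typeof id === 'string' &&
    DEVICE_FLOW_SUPPORTED_PROVIDERS.some((known) => known === id)
  );
}
```

Adding a provider then means appending one string to the tuple; the type and the guard follow automatically.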

#3 — Clarify `DEVICE_FLOW_EXPIRY_GRACE_MS` JSDoc to distinguish the
daemon's 30s SWEEP cadence (what the grace tracks) from the 5-min
TERMINAL_GRACE_MS reconnect window (which awaitCompletion does NOT
need to wait through).

#4 — Add `@remarks` block on `DeviceFlowProvider.poll()` warning
future provider authors that thrown `err.message` flows verbatim
into the SSE-broadcast `auth_device_flow_failed` hint, and must be
sanitized. Two equally-correct paths documented (typed `error`
result vs sanitized thrown message).

#5 — Truncate raw IdP detail in `qwenDeviceFlowProvider.ts` stderr
audit lines to 2 KiB. WAFs / reverse proxies can return MB-sized
HTML error pages, and container log aggregators (Loki, Fluent Bit,
Stackdriver) typically truncate or drop lines past 4-32 KiB —
losing the useful prefix downstream. 2 KiB retains structured JSON
envelopes while staying well below every aggregator's per-line cap.

#6 — Track latest `originatorClientId` on per-provider singleton
take-over via new `entry.lastOriginatorClientId` field +
`recordTakeover()` helper. When a second SDK client posts
`POST /workspace/auth/device-flow` for an already-pending provider
(or one being created in `inFlightStarts`) with a different
`initiatorClientId`, an audit breadcrumb records the take-over so
incident response can correlate "client A started, client B took
over at 12:34". Event-routing intentionally still uses the original
`initiatorClientId` (events are workspace-broadcast and changing
the originator field mid-flow would break SDK reducers that key on
it). Two new tests cover the differing-id audit + same-id no-op.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-6 review feedback

Six "Critical" findings from a gpt-5.5 /review pass — all real
liveness/correctness defects in the daemon's auth device-flow path
and the SDK's awaitCompletion polling loop.

#1 — Make `provider.start()` timeout authoritative via `Promise.race`
in `DeviceFlowRegistry.doStart`. The earlier shape only ABORTED the
signal on timeout; a provider that ignores `signal` (non-abortable
I/O, future implementer who forgets to thread it to `fetch`) would
leave the `await` hanging until daemon restart, pinning the
`inFlightStarts` slot for that providerId. Race against a rejecting
timer makes the timeout authoritative regardless of provider
cooperation; abort still fires for cooperative cleanup.
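
The difference between "abort only" and "race against a rejecting timer" can be sketched as below. This is an illustration under assumptions: `startWithTimeout` is a hypothetical helper, not the body of `DeviceFlowRegistry.doStart`, and a plain `Error` stands in for whatever error type the registry throws.

```typescript
// Race a provider call against a rejecting timer. The abort is still
// fired for cooperative providers, but the caller no longer depends on it.
async function startWithTimeout<T>(
  start: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cooperative cleanup, if the provider listens
      reject(new Error(`provider.start() timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });

  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([start(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving the timer armed after a fast start
  }
}
```

A provider that ignores the signal still leaves its own promise pending in the background, but the awaiting caller is released, which is what frees the per-provider slot.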

#2 — Same shape for `result.persist()` in the success branch of
`runPollTick`. A future provider whose persist performs
non-abortable steps (mkdir/chmod/mv outside the abortable
fs.writeFile path) would otherwise hang the poll tick until process
restart. Race against rejecting timer; rejection maps to
`persist_failed`.

#3 — Clamp `expiresIn` and `interval` upper bounds. Previous
`Number.isFinite + > 0` guards stopped NaN/Infinity but a finite
extreme like `1e12` was still accepted — pinning the per-provider
singleton for ~30,000 years (`expires_in`) or scheduling a
TIMEOUT_MAX-clamped poll that never fires within `expiresAt`
(`interval`). Two new constants (`DEVICE_FLOW_MAX_EXPIRES_IN_SEC =
3600`, `DEVICE_FLOW_MAX_INTERVAL_MS = 60_000`) cap the worst case.
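
A minimal sketch of the guard-then-clamp shape, assuming the fallbacks described in the round-3 fix (10 minutes for expiry, 5 seconds for interval). The two `MAX` constants are the ones named above; `clampPositive`, `computeSchedule`, and the fallback constant names are hypothetical.

```typescript
const DEVICE_FLOW_MAX_EXPIRES_IN_SEC = 3600;
const DEVICE_FLOW_MAX_INTERVAL_MS = 60_000;
const FALLBACK_EXPIRES_IN_SEC = 600; // assumed name; 10 min fallback
const FALLBACK_INTERVAL_SEC = 5; // assumed name; 5 s fallback

// Reject non-numbers, NaN, Infinity, and non-positive values, then cap.
function clampPositive(value: unknown, fallback: number, max: number): number {
  if (typeof value !== 'number' || !Number.isFinite(value) || value <= 0) {
    return Math.min(fallback, max);
  }
  return Math.min(value, max);
}

// Derive the two scheduling values from provider-supplied seconds.
function computeSchedule(
  expiresInSec: unknown,
  intervalSec: unknown,
  now: number,
): { expiresAt: number; intervalMs: number } {
  const expiresAt =
    now +
    clampPositive(expiresInSec, FALLBACK_EXPIRES_IN_SEC, DEVICE_FLOW_MAX_EXPIRES_IN_SEC) * 1000;
  const intervalMs =
    clampPositive(intervalSec, FALLBACK_INTERVAL_SEC, DEVICE_FLOW_MAX_INTERVAL_MS / 1000) * 1000;
  return { expiresAt, intervalMs };
}
```

For example, `computeSchedule(1e12, 1e12, 0)` is capped to an `expiresAt` of one hour and an `intervalMs` of one minute instead of propagating the extreme inputs.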

#4 — Extract `getDeviceFlowOrSynthetic404(...)` helper in
`DaemonAuthFlow.ts` and route BOTH the loop body and the
timeout-ceiling final read through it. Previously the ceiling read
went directly through `client.getDeviceFlow` and a 404 at the
boundary (entry evicted just as the timeout fired) would reject with
`DaemonHttpError(404)` instead of returning the structured `{ status:
'error', errorKind: 'not_found_or_evicted' }` that the rest of
`awaitCompletion` promises.

#5 — Validate `AwaitCompletionOptions.timeoutMs` and `pollOverrideMs`
with `Number.isFinite + > 0`. NaN slipped past the previous `??
default` form (`??` only replaces `null` / `undefined`, and NaN is
neither) and produced a `ceiling` of `NaN` (the loop runs forever
because `now >= NaN` is always false) or a `setTimeout(NaN)` (Node
clamps to 1ms, giving a tight polling loop). Sanitize to `undefined`
so the documented defaults take effect.

#6 — Thread `signal` into `DaemonClient.getDeviceFlow` and forward
to `fetchWithTimeout` (which already composes caller + timeout
signals). awaitCompletion now passes `opts.signal` from both GET
sites. Without this, an `awaitCompletion` caller that aborts mid-
poll could not cancel an in-flight stalled GET; it would have to
wait for the daemon-side `fetchTimeoutMs` (30s default) to fire.

Four new tests in `deviceFlow.test.ts` pin the new behaviors:
hanging-start timeout (#1), hanging-persist → persist_failed (#2),
extreme-expiresIn clamp (#3), extreme-interval clamp (#3).
FakeProvider gained a `startHangs` flag for the non-cooperative
provider scenario.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-7 review feedback

Two findings from a DeepSeek /review pass; both small but legitimate
defense-in-depth gaps.

#1 — `runPollTick`'s catch block forwarded `err.message` verbatim
into the SSE-broadcast `hint`. The provider's `@remarks` contract
(fold-in 6 #4) requires throwers to sanitize, but if violated the
unbounded raw payload would reach every SSE subscriber. Added
`DEVICE_FLOW_POLL_HINT_MAX_LEN = 256` + `truncatePollHint()`,
applied to the catch's `result.hint`. Full raw `err.message` is
still routed to the audit trail (`audit?.record({hint: 'provider.poll()
threw (raw): ...'})`) so operator visibility for incident response
is preserved. Belt-and-suspenders: the contract is now structurally
enforced rather than relying on every future provider author to
read the JSDoc.

#2 — `updateMatchingFlow` (and the `started`/`authorized` handlers
in `reduceDaemonAuthEvent`) unconditionally overwrote state without
comparing `rawEvent.id` against the existing flow's
`lastSeenEventId`. The field's JSDoc documented it as a monotonic
counter to prevent stale frames from overwriting newer state, but
the code didn't enforce that contract. SSE reconnect with
`Last-Event-ID < terminal-frame-id` would replay older frames; if
any of them were for the same `deviceFlowId` (e.g. a delayed
`failed` arriving after `authorized`) the stale frame would
overwrite the terminal state. Daemon-side `transitionTerminal` makes
this scenario hard to reach in practice, but the code should
enforce the documented contract.

Threaded `rawEventId` into `updateMatchingFlow` and added the gate
there + in the `started` and `authorized` handlers (the two cases
that don't go through `updateMatchingFlow`). Synthetic frames
without an envelope `id` (`rawEventId === undefined`) bypass the
gate — they originate inside SDK reducer machinery and aren't
subject to replay ordering.
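
The gate described above might look roughly like the sketch below. `FlowStateSketch`, `shouldApplyFrame`, and `applyFrame` are hypothetical names, the state shape is reduced to the fields the gate needs, and `lastSeenEventId` is modeled as possibly `undefined` as a simplification.

```typescript
// Reduced state shape for illustration only.
interface FlowStateSketch {
  deviceFlowId: string;
  status: string;
  lastSeenEventId: number | undefined;
}

// Replayed frames carry an envelope id; synthetic frames do not.
function shouldApplyFrame(
  flow: FlowStateSketch,
  rawEventId: number | undefined,
): boolean {
  if (rawEventId === undefined) return true; // synthetic frame bypasses the gate
  if (flow.lastSeenEventId === undefined) return true; // nothing to compare against
  return rawEventId > flow.lastSeenEventId; // stale or duplicate ids are rejected
}

// Apply a status change only when the frame passes the monotonic gate.
function applyFrame(
  flow: FlowStateSketch,
  status: string,
  rawEventId?: number,
): FlowStateSketch {
  if (!shouldApplyFrame(flow, rawEventId)) return flow;
  return {
    ...flow,
    status,
    lastSeenEventId: rawEventId !== undefined ? rawEventId : flow.lastSeenEventId,
  };
}
```

Returning the same object for a rejected frame keeps the reducer referentially stable, so subscribers that compare by identity see no change on a replayed frame.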

Three new tests pin the contracts:
- `runPollTick catch truncates the SSE hint and preserves raw on
  the audit (fold-in 8 #1)` — `pollThrowsWith` flag on FakeProvider
  models a non-conforming provider; SSE hint < 400 chars + contains
  'truncated'; audit hint contains the full 4_000-char raw.
- `reduceDaemonAuthEvent rejects out-of-order frames (fold-in 8 #2
  monotonicity)` — stale `failed`(id=7) does NOT overwrite
  `authorized`(id=10); stale `started`(id=4) for a different flow
  also rejected.
- `reduceDaemonAuthEvent passes synthetic frames (no envelope id)
  through the gate` — SDK-internal frames without `id` are honored.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-8 review feedback

Twelve correctness + structural fixes from a wenshao + DeepSeek + gpt-5.5
review pass. Tests deferred to fold-in 10 (separate, larger commit).

CRITICAL CORRECTNESS

#7 — `provider.persist()` Promise.race could publish `persist_failed`
to SSE while a non-cooperative provider was still committing
credentials to disk. Added an independent tracker on the original
persist promise: if the race timed out (`persistTimedOut === true`)
AND the underlying persist later resolved successfully, audit a
`lost_success_after_timeout` breadcrumb so operators see the
inconsistency. Tightened the persist `@remarks` contract to require
signal honoring end-to-end. Qwen provider already complies (fold-in
3 #10); this is forward-defense for future providers.

#11 — auth surface (`DaemonAuthFlow`, `reduceDaemonAuthEvent`,
`createDaemonAuthState`, `DEVICE_FLOW_EXPIRY_GRACE_MS`, all event /
data / state types) was re-exported from `src/daemon/index.ts` but
NEVER from the published SDK entry `src/index.ts`. SDK consumers got
`undefined` for everything except `client.auth.start()` (which
traveled through the already-exported `DaemonClient`). Added the
missing exports and pinned via `daemon-public-surface.test.ts`.

#12 — `core/src/qwen/qwenOAuth2.ts:373`'s
`debugLogger.debug('Device authorization result:', result)` writes
the raw `device_code` (RFC 8628 bearer-equivalent credential) to
stderr / journald, bypassing the `BrandedSecret` redaction layer.
Pre-existing on main but PR 21 expanded the exposure surface.
Sanitized to log only `{ ok, expires_in }` on success / `{ ok,
error }` on error.

#13 — `runPollTick` success-branch persist-failure × past-`expiresAt`
classified as `expired_token` instead of `persist_failed`, routing
operators toward "tell user to retry" (RFC 8628 expiry) when the
actual root cause was disk I/O. Reclassified to `persist_failed`
with a `persist_also_failed_past_expiry` audit hint to preserve the
timing detail for incident response.

SMALL CORRECTNESS

#1 — `runPollTick` catch hint replaced with a STATIC bounded message
("provider.poll() failed; see daemon audit log for details"). The
fold-in 8 truncated-prefix approach could still leak the first 256
chars of provider-templated raw text including secret material. Full
raw still routed to audit channel for operator visibility.

#5 — `cancellerClientId` field added to `DeviceFlowEntry`; deferred-
cancel branch in `cancel()` now stamps it on the entry, and the
persist-resolution `cancelled` event publish uses
`entry.cancellerClientId ?? entry.initiatorClientId`. SSE consumers
that suppress self-emitted events can now attribute the cancel
correctly.

#6 — `AwaitCompletionOptions.timeoutMs === 0` (the documented
"settle immediately, return current daemon view" contract) was
treated as falsy by the `?` ternary, falling back to the default.
`sanitizePositiveMs` now takes an `allowZero` opt-in; the ceiling
computation uses `!== undefined` instead of truthy check.
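
To make the zero-versus-default distinction concrete, here is a hedged sketch. The source names `sanitizePositiveMs` and its `allowZero` opt-in; the exact signature, `computeCeiling`, and `DEFAULT_TIMEOUT_MS` below are assumptions.

```typescript
const DEFAULT_TIMEOUT_MS = 600_000; // hypothetical default for illustration

// Returns a usable duration, or undefined so the caller applies its default.
function sanitizePositiveMs(
  value: number | undefined,
  opts: { allowZero?: boolean } = {},
): number | undefined {
  if (value === undefined || !Number.isFinite(value)) return undefined;
  if (value > 0) return value;
  // Zero is only meaningful where the caller opted in.
  return value === 0 && opts.allowZero === true ? 0 : undefined;
}

// Compare against undefined, not truthiness, so an explicit 0 survives.
function computeCeiling(now: number, timeoutMs: number | undefined): number {
  const sanitized = sanitizePositiveMs(timeoutMs, { allowZero: true });
  return now + (sanitized !== undefined ? sanitized : DEFAULT_TIMEOUT_MS);
}
```

With a truthiness check, `computeCeiling(now, 0)` would have silently become `now + DEFAULT_TIMEOUT_MS`; with the `!== undefined` check the ceiling is `now`, matching the "settle immediately" contract.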

#8 — `EventBus.publish()` returns `undefined` for closed buses (it
does NOT throw). `broadcastWorkspaceEvent` previously counted that
path as success, hiding the all-buses-dropped operator alarm.
Folded the closed-bus-as-failure check into the canonical
`publishWorkspaceEvent` (see #X below).

#9 — start-timeout Promise.race rejected with a plain `Error`,
falling through `sendBridgeError` to a generic 500. Switched to
`UpstreamDeviceFlowError` so a hung IdP correctly surfaces as 502
(matching the envelope every other IdP start failure uses).

STRUCTURAL

#3 — Three identical `transitionTerminal + publish + audit`
expired_token blocks in `runPollTick`/`sweep`/(removed by #13)
deduplicated into a private `expireEntry()` helper. Future event-
shape changes are now a one-edit operation.

#X — PR 16 (#4249) merged on 2026-05-18 06:27Z. Per the inline
comment at httpAcpBridge.ts:501, PR 21's `broadcastWorkspaceEvent`
was kept distinct only to avoid the merge conflict; once PR 16
landed, it became a fold-in candidate. Folded the closed-bus +
all-failed-stderr-escalation operator-visibility features (PR 21's
S5 + fold-in 9 #8) INTO `publishWorkspaceEvent`; dropped
`broadcastWorkspaceEvent` from the bridge interface + impl + test
mocks. PR 21's deviceFlowEventSink now calls
`bridge.publishWorkspaceEvent` — single canonical workspace fan-out.

DOC

#16 — Added a "Cross-client take-over" paragraph to
`docs/users/qwen-serve.md` explaining that two clients on the same
daemon for the same provider get the per-provider singleton with
`attached: true`/`false` distinguishing them; no separate event
fires (both eventually observe the same `auth_device_flow_authorized`).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-9 review feedback

Two small non-blocking items from the round-9 pass; defensive shape +
docs only. The 4 deferred test-coverage threads (#1-4 of round-8) are
still tracked for fold-in 10.

#6 — `lastSeenEventId` typed `number` with `?? 0` defaults in the
`auth_device_flow_started` reducer case. The daemon-side `EventBus`
assigns ids ≥ 1 so the `0` sentinel has no real-traffic meaning, but
the monotonic gate (`rawEventId <= flow.lastSeenEventId`) would
reject any future SDK-internal synthetic frame using `id: 0`.
Changed the field type to `number | undefined` and dropped the
`?? 0` from the started case. The `updateMatchingFlow` /
`auth_device_flow_authorized` guards already short-circuit on
`existing.lastSeenEventId !== undefined`, so undefined is safe
end-to-end. Existing 34 reducer tests still pass unchanged.

#7 — Added `@remarks` block to `DeviceFlowErrorKind.persist_failed`'s
JSDoc explaining the lost-success retry UX. When fold-in 9 #7's
`lost_success_after_timeout` audit fires (non-conforming provider
violates signal contract; disk write succeeds after registry
published `persist_failed`), a naive SDK retry hits the IdP a
second time with a fresh `device_code` and prompts the user
twice — but the first credential set is already valid. JSDoc now
documents the mitigation: SDK consumers writing retry logic on
`persist_failed` should call `client.auth.getStatus()` BEFORE
re-prompting; operators can grep stderr/audit for
`lost_success_after_timeout` to detect occurrences.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(serve): fold-in 10 — auth device-flow test bundle (#4255)

Lands the four deferred test-coverage items the round-8 review
flagged (and round-9 re-surfaced) as a hard merge prerequisite.
Net +41 tests across registry / SDK helper / client HTTP /
HTTP route layers.

#1 — `deviceFlow.test.ts` `persist failure paths` describe (3
tests, +3). The success arm's three terminal mappings — pure
`persist_failed`, `cancelled` (cancel during persist), and
`persist_failed` past `expiresAt` (the fold-in 9 #13
reclassification with `persist_also_failed_past_expiry` audit
hint) — were 0-covered. Now pinned. Test #2 also asserts the
fold-in 9 #5 cancellerClientId routing on the deferred
`cancelled` event.

#2 — new `DaemonAuthFlow.test.ts` (+14 tests). Mock DaemonClient
with sequenced `getDeviceFlow` replies. Covers happy-path
polling → `authorized`; `slow_down`-driven `intervalMs` bump
firing `onThrottled`; `signal.abort()` rejection; `signal`
propagation through `client.getDeviceFlow` (fold-in 7 #6);
`timeoutMs` ceiling final-read; `timeoutMs:0` immediate-return
(fold-in 9 #6); NaN/Infinity → `sanitizePositiveMs` fallback to
default ceiling (fold-in 7 #5); 404 → synthetic
`error`/`not_found_or_evicted` (fold-in 3 #4) at BOTH the loop
body AND the timeoutMs ceiling read (fold-in 7 #4); non-404
DaemonHttpError rethrown; `cancel()` and top-level
`status()`/`cancel()` wrappers forward correctly.

#3 — `DaemonClient.test.ts` `device-flow methods` describe
(+11 tests). POSTs `/workspace/auth/device-flow` happy path +
clientId header + body shape; 200/201 acceptance; non-2xx →
`DaemonHttpError`. GETs URL-encode the deviceFlowId; forward
`opts.signal` to `fetchWithTimeout`'s composed signal (fold-in
7 #6 — verified by aborting caller signal and observing the
fetch's signal flip to `aborted`); 404 throws. DELETEs
swallow 204 + 404 (idempotent, mirrors `closeSession`); non-
204/404 throws. `getAuthStatus` plain GET. `client.auth`
lazy-instantiated singleton.

#4 — `server.test.ts` 5 supplementary contract tests (+5).
The existing 8 `it()`s cover happy paths + take-over + 401
POST + DELETE pending/terminal/unknown + 502 upstream + sweeper.
This commit plugs gaps: 400 `invalid_request` for missing /
non-string providerId (fold-in W2 split this from
`unsupported_provider`); 409 `too_many_active_flows` (via
injected fake registry); 401 `token_required` on DELETE
without bearer; the asymmetric GET posture
(`/workspace/auth/device-flow/:id` IS strict-gated to prevent
peer-process userCode shoulder-surf; `/workspace/auth/status`
stays read-only because its `pendingDeviceFlows` entries
intentionally redact `userCode`).

Validation: cli serve 631/631 (+8 from #1, #4); sdk 384/384
(+25 from #2, #3, +/- some pre-existing churn). Typecheck +
lint clean.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(qwen): atomic temp+chmod+rename in cacheQwenCredentials (PR #4255 round-11 #2)

gpt-5.5 /review flagged a real correctness/security gap: the
post-write `chmod` ordering left a window where freshly-written
credentials could land in a broadly-readable existing
`oauth_creds.json` before the chmod tightened it. On POSIX, a
chmod failure additionally degraded to a debug warning while the
broadly-readable tokens stayed on disk.

New shape mirrors the standard atomic-write idiom:

  1. Write `${filePath}.tmp.${pid}.${randomUUID()}` with `mode: 0o600`.
     The temp path doesn't exist beforehand, so the `mode` flag
     actually applies on creation (it doesn't on existing files,
     which was the root of the original race).
  2. Defensive `chmod` on the temp file. POSIX failure is now a
     HARD ERROR (refuses to publish broad-perm credentials to the
     canonical filename). Windows logs a debug breadcrumb and
     proceeds, since chmod is a no-op on most NTFS volumes (perms
     go through ACLs).
  3. Atomic `fs.rename` over `filePath`. The canonical path is
     ALWAYS at `0o600` from the moment it contains the new tokens;
     readers see either the old creds or the new creds, never a
     partially-written or broadly-readable state.
  4. Best-effort `fs.unlink` of the temp file on any failure path
     so failed writes don't leave `.tmp.<pid>.<uuid>` litter on
     disk.
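
The four steps above can be sketched with Node's `fs/promises` API as follows. This is an illustration, not the actual `cacheQwenCredentials` body: `writeCredentialsAtomically` and `demo` are hypothetical names, and the Windows branch simply swallows the chmod error where the real code logs a debug breadcrumb.

```typescript
import { randomUUID } from 'node:crypto';
import { promises as fs } from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

async function writeCredentialsAtomically(
  filePath: string,
  contents: string,
  signal?: AbortSignal,
): Promise<void> {
  // Step 1: fresh temp path, so `mode` applies at creation time.
  const tmpPath = `${filePath}.tmp.${process.pid}.${randomUUID()}`;
  try {
    await fs.writeFile(tmpPath, contents, { mode: 0o600, signal });
    // Step 2: defensive chmod; a failure is fatal everywhere except Windows.
    try {
      await fs.chmod(tmpPath, 0o600);
    } catch (err) {
      if (process.platform !== 'win32') throw err;
    }
    // Step 3: atomic publish over the canonical path.
    await fs.rename(tmpPath, filePath);
  } catch (err) {
    // Step 4: best-effort cleanup; ignore "already gone" style failures.
    await fs.unlink(tmpPath).catch(() => undefined);
    throw err;
  }
}

// Usage sketch: write into a scratch directory and read the result back.
async function demo(): Promise<{ text: string; files: string[] }> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'creds-sketch-'));
  const target = path.join(dir, 'oauth_creds.json');
  await writeCredentialsAtomically(target, JSON.stringify({ token: 'example' }));
  return { text: await fs.readFile(target, 'utf8'), files: await fs.readdir(dir) };
}
```

The rename only happens after the temp file already has its final permissions, so the canonical path never holds the new tokens under looser permissions.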

Test mock in `qwenOAuth2.test.ts` extended with `chmod` + `rename`
no-op stubs so the existing 158 core/qwen tests still pass; no test
behavior change beyond the mock surface.

Validation: typecheck clean (cli + core + sdk-typescript); core
qwen 158/158; cli serve 643/643; sdk 384/384.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao + gpt-5.5 round-12 review feedback

Eight findings from a wenshao + gpt-5.5 /review pass: 1 critical
correctness, 2 real defensive defects, 4 edge cases / minor
hardening, 1 test gap. All adopted.

CRITICAL CORRECTNESS

#1 CzSpN — `dispose()` race: after `await provider.poll(...)` the
post-await guard checked only `entry.status !== 'pending'`, NOT
`this.disposed`. `dispose()` clears the registry maps and aborts
the entry's signal but doesn't mutate `entry.status`, so a
provider whose poll already resolved (or doesn't honor abort) could
enter the success branch and call `result.persist({...})` —
committing credentials on a shutting-down daemon. Added the
`if (this.disposed) return;` guard symmetric with the top-of-method
check.

REAL DEFENSIVE DEFECTS

#2 Cy_ZG — sync-throw escape: the `result.persist({signal})` call
happens BEFORE the `new Promise` constructor that captures it
(`persistTracker` is closed-over inside the constructor). A
non-conforming provider whose persist throws synchronously (e.g.
top-of-function validation) would escape past the outer
`try/catch (await new Promise(...))` and become an
`unhandledRejection` since `runPollTick` is fire-and-forget via
`void`. Wrapped the persist invocation in a try/catch that routes
the sync-throw into the same `persistError` branch.

#3 CzSpe — runtime provider map: provider validation hardcoded
`DEVICE_FLOW_SUPPORTED_PROVIDERS` even though `deps.deviceFlowProviders`
is the documented extension hook for tests/future providers.
Switched both POST validation and `/workspace/auth/status`
`supportedDeviceFlowProviders` to derive from
`deviceFlowProviderMap.keys()` — single source of truth matches
the registry's `resolveProvider`.

EDGE CASES / MINOR HARDENING

#4 Cy_Y9 — `slow_down` re-clamp: `intervalMs += SLOW_DOWN_BUMP_MS`
can push past `DEVICE_FLOW_MAX_INTERVAL_MS` (the bound that keeps
`setTimeout` from clamping to TIMEOUT_MAX). Wrapped in
`Math.min(MAX_INTERVAL_MS, ...)` symmetric with the doStart clamp.

#5 Cy_ZF — `expiresInSec` lower bound: `0.5` was finite-positive
and produced `expiresAt = now() + 500 ms` — first poll (clamped at
≥1 s) fires AFTER expiresAt → flow expires before any user could
authorize. Added `DEVICE_FLOW_MIN_EXPIRES_IN_SEC = 30` (RFC 8628
§3.2 calls 5–30 minutes "reasonable"; sub-30s is non-compliant).

#6 CzHOK — take-over response privacy: `initiatorClientId` was
echoed to ANY take-over POST caller, including those with no
`X-Qwen-Client-Id` header. Bearer-gated already, but the
asymmetry "anonymous caller learns who started it" violated the
no-header-as-privacy-signal contract. Now only echoed when the
caller's id matches the entry's initiator.

#7 CzSpd — production audit visibility: production audit sink
dropped `line.hint`, but the registry uses hints for operator-only
breadcrumbs (`provider.poll() threw (raw)...`,
`lost_success_after_timeout`, `persist_also_failed_past_expiry`,
take-over correlation, `deferred (persist in flight; ...)`). The
documented troubleshooting trail was invisible in production
stderr. Now included with a 1 KiB bound + JSON-quoted so multi-
word hints stay parseable.
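One way to bound and quote a hint is sketched here. `formatHint` and `MAX_HINT_CHARS` are hypothetical names, and the sketch bounds by UTF-16 code units rather than bytes, which only approximates the 1 KiB limit for non-ASCII hints:

```typescript
// Assumed bound; counts UTF-16 code units, not encoded bytes.
const MAX_HINT_CHARS = 1024;

// Truncates the hint, then JSON-quotes it so spaces, quotes, and newlines
// inside the hint cannot break a key=value style audit line.
function formatHint(hint: string): string {
  return `hint=${JSON.stringify(hint.slice(0, MAX_HINT_CHARS))}`;
}
```

For example, `formatHint('deferred (persist in flight)')` yields a single `hint="..."` token that a line-oriented parser can split on without special-casing spaces.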

TEST GAP

#8 Cy_ZH — `lost_success_after_timeout` audit: the
fold-in 9 #7 split-brain detector for non-cooperative providers
had no test pinning it. Added a controllable `latePersist` Promise
+ test that drives poll → success → enters persist race → fires
PERSIST_TIMEOUT (registry publishes persist_failed) → resolves
persist late → asserts the lost_success audit fires.

Validation: typecheck + lint clean; cli serve 644/644 (+1 from
the new test); sdk-typescript 384/384.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): close concurrent multi-provider cap bypass (PR #4255 round-13 #1)

gpt-5.5 /review caught a real workspace-wide cap bypass:
`countActive()` only counted entries already installed in
`byProvider`, but the cap check at the top of `start()` runs
before any provider's `inFlightStarts` slot completes
`provider.start()`. A burst of fresh starts for
`DEVICE_FLOW_MAX_CONCURRENT + 1` distinct providers all run
synchronously up to the cap check (each `start()` is async but
runs synchronously to its first await, which comes AFTER the cap
check), all observe `count === 0` (no `byProvider` entries
installed yet), and all pass, eventually installing more
than the documented four pending flows.

Fix: include `inFlightStarts.size` in `countActive()`. The
two maps are disjoint by construction (the existing-pending
fast-path catches any provider with both), so simple
addition cannot double-count. The second concurrent caller
sees count=1, the third count=2, …, and the (MAX+1)th caller
is rejected with `TooManyActiveDeviceFlowsError`.
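The fix can be illustrated with a simplified registry. This is a sketch under stated assumptions, not the actual implementation: the class name, the `PendingFlow` shape, and the use of a `Set` for in-flight slots are all hypothetical, and the existing-pending fast path is omitted.

```typescript
const DEVICE_FLOW_MAX_CONCURRENT = 4;

class TooManyActiveDeviceFlowsError extends Error {}

// Hypothetical minimal shape of an installed pending flow.
interface PendingFlow {
  deviceCode: string;
}

class DeviceFlowRegistrySketch {
  private readonly byProvider = new Map<string, PendingFlow>();
  private readonly inFlightStarts = new Set<string>();

  // Counts installed flows plus starts that have reserved a slot but not
  // yet finished, so a burst of starts cannot all observe zero.
  countActive(): number {
    return this.byProvider.size + this.inFlightStarts.size;
  }

  async start(
    providerId: string,
    providerStart: () => Promise<PendingFlow>,
  ): Promise<PendingFlow> {
    // Everything before the first await runs synchronously, so the cap
    // check and the slot reservation are atomic relative to other callers.
    if (this.countActive() >= DEVICE_FLOW_MAX_CONCURRENT) {
      throw new TooManyActiveDeviceFlowsError('too many active device flows');
    }
    this.inFlightStarts.add(providerId);
    try {
      const flow = await providerStart();
      this.byProvider.set(providerId, flow);
      return flow;
    } finally {
      // Release the slot whether the start succeeded or failed.
      this.inFlightStarts.delete(providerId);
    }
  }
}
```

Because the slot is reserved before the first await, the fifth concurrent caller already sees a count of four and is rejected, regardless of how long the earlier `providerStart()` calls take.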

Test: `caps at DEVICE_FLOW_MAX_CONCURRENT under CONCURRENT
distinct-provider starts`. Fires `MAX+1` concurrent starts
via `Promise.allSettled`, asserts exactly `MAX` fulfilled +
exactly 1 rejected with the typed error. Pre-fix this test
fails (all `MAX+1` succeed); post-fix it passes.

Validation: typecheck clean across all 4 workspaces;
deviceFlow.test.ts 35/35 (was 34); cli serve 645/645.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)