fix(line): stop silent push failure on lowercased recipients (#81628)#81704
Conversation
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. for source-level reproduction: current main lowercases LINE peer IDs in session keys, and cron-tool can copy that lowercased fragment into delivery.to when currentDeliveryContext is unavailable. The linked issue also provides sanitized failed-queue payloads, retry logs, and direct LINE API evidence. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land this focused PR after the usual maintainer merge checks, then close the linked bug and supersede the alternative open fix at #81670 if this branch is chosen. Do we have a high-confidence way to reproduce the issue? Yes for source-level reproduction: current main lowercases LINE peer IDs in session keys, and cron-tool can copy that lowercased fragment into delivery.to when currentDeliveryContext is unavailable. The linked issue also provides sanitized failed-queue payloads, retry logs, and direct LINE API evidence. Is this the best way to solve the issue? Yes; refusing unrecoverable LINE session-key fallback and adding a LINE-owned permanent-error guard is the narrowest maintainable fix. Stricter full LINE id validation can remain separate because it would broaden fixture and compatibility churn. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 65dd71d42dee. |
LINE chat ids are case-sensitive (push requires capital C/U/R) but the session key holds the peer id lowercased for canonical routing. When cron-tool runs without currentDeliveryContext (delivery-recovery, queue replay after reply-token expiry), inferDeliveryFromSessionKey was lifting the lowercased fragment straight into delivery.to, producing a value LINE rejects with HTTP 400 — the job retried five times silently and the dashboard reported "delivered" while the LINE group received nothing. Refuse the session-key fallback for channel === "line" so the missing target surfaces explicitly instead of scheduling an undeliverable job.
…claw#81628) Defense-in-depth safety net for openclaw#81628: even with the cron-tool fix in place, any other code path that ever produces a 33-char LINE-shaped recipient missing its leading capital (C/U/R) would otherwise hit the LINE API and return HTTP 400 with no permanent-error signal, causing delivery-recovery to retry five times before moving the entry to failed/. normalizeTarget now throws "Recipient is not a valid LINE id ..." when the post-strip value looks like a LINE id but the case was lost. The message matches the existing /recipient is not a valid/i pattern in delivery-queue-recovery's PERMANENT_ERROR_PATTERNS, so recovery moves the entry to failed/ on the first attempt instead of silently retrying. Short fixtures (length < 33) are left alone so existing tests using "U123", "line:user:1", etc. keep working.
b1e79cf to
dbd2899
Compare
|
Landed, thanks @edenfunf. Proof:
Merged as 8338412. Follow-up refactor target: stop reconstructing delivery recipients from lossy canonical session keys and carry/persist explicit delivery context through recovery paths instead. |
Summary
Fixes #81628. LINE push silently fails for long-running tasks because the cron-tool's session-key fallback produces a lowercased recipient that LINE rejects with HTTP 400, and the LINE plugin has no early-rejection guard, so the failure burns five retries inside
delivery-recoverybefore the entry is moved tofailed/. Operators see "delivered" on the dashboard while the LINE group receives nothing.inferDeliveryFromSessionKeyinsrc/agents/tools/cron-tool.tsrebuildsdelivery.tofrom the session-key fragment whencurrentDeliveryContextis unavailable (e.g. queue replay, post-reply-token push).buildAgentPeerSessionKeylowercases the peer id for canonical routing, so the rebuilttois lowercased (cabcdef...instead ofCabcdef...). LINE chat ids are case-sensitive — push returns HTTP 400"The property, 'to', in the request body is invalid".channel === "line", so the missing target surfaces explicitly. (2) LINE plugin'snormalizeTargetrejects 33+ char recipients that do not begin with a capital C/U/R; the error message matchesdelivery-queue-recovery'sPERMANENT_ERROR_PATTERNS, so a recovery replay that ever encounters a lowercased value moves the entry tofailed/on the first attempt instead of retrying 5x."U123","line:user:1"continue to pass the LINE plugin (length-based gate). Session-key canonical lowercasing is intentional and untouched.Root cause
Two layers each contribute to the silent failure:
inferDeliveryFromSessionKeyinsrc/agents/tools/cron-tool.tstreats the lowercased session-key peer fragment as authoritative whencurrentDeliveryContextand explicitdeliveryare both missing. The Matrix-equivalent of this bug is mitigated for live runs by always supplyingcurrentDeliveryContext(see the existingprefers current delivery context over lowercased session-key targetscase incron-tool.test.ts), but cron-fire and delivery-recovery paths can run without acurrentDeliveryContext. For LINE, even a lowercased session-key fragment cannot recover the original case, so the only safe action is to refuse the fallback.normalizeTargetinextensions/line/src/send.tsstrips theline:group:/line:user:/line:room:prefixes but does not validate the resulting id's case. A lowercased value flows through toclient.pushMessage, LINE returns HTTP 400, anddelivery-recoverydoes not classify that error as permanent — so the entry retries five times before moving tofailed/. The error message added in this PR matches the existing/recipient is not a valid/ipattern inPERMANENT_ERROR_PATTERNS, so a single attempt is enough to flag the failure to operators.Changes
src/agents/tools/cron-tool.ts: returnnullfrominferDeliveryFromSessionKeywhenchannel === "line".src/agents/tools/cron-tool.test.ts: two new repros — group-shape and per-account-channel-peer DM-shape. Both use the realbuildAgentPeerSessionKeyhelper so the canonical-lowercasing step that triggers the bug runs in the same process as the fix.extensions/line/src/send.ts:normalizeTargetthrowsRecipient is not a valid LINE id ...when the post-strip value is at least 33 chars and does not begin with a capital C/U/R.extensions/line/src/send.test.ts: two new cases — reject lowercased 33-char recipient; accept case-exact recipient.Real behavior proof
Behavior or issue addressed: silent LINE push failure for long-running group tasks ([Bug]: [core] LINE delivery-recovery sends wrong recipient (lowercased / session-key prefix kept), causing silent push failures #81628). Queued entry in
~/.openclaw/delivery-queue/failed/*.jsonshowstolowercased; LINE returns HTTP 400The property, 'to', in the request body is invalid; dashboard reports delivered; LINE group receives nothing.Real environment tested: Windows 11 (Node 22.16.0). The
@openclaw/lineplugin was loaded from source vianode_modules/.bin/tsx(no separate build step) and exercised directly through its public runtime-api barrel. The script lives at.artifacts/verify-81628/verify.ts(gitignored). A placeholder string was used for the channel access token; the lowercased recipient is rejected at the outbound guard before any HTTP call to LINE, and the case-exact recipient is verified to pass the guard and only fail later at the network layer with HTTP 401 (a real bot token would proceed past that point).Exact steps or command run after this patch:
The script source (excerpt):
Observed result after fix: the actual
@openclaw/lineplugin code rejects the lowercased recipient at the outbound boundary with the new permanent-error message, before any HTTP request to LINE is issued. The classifierisPermanentDeliveryErrorfromsrc/infra/outbound/delivery-queue-recovery.tsreturnstruefor that message, so a queued delivery carrying this recipient would be moved to~/.openclaw/delivery-queue/failed/on the first attempt rather than retrying five times silently. The case-exact recipient is not affected — it passes the new guard and only fails downstream at the live HTTP layer (HTTP 401 here because the placeholder token is rejected by LINE; a real bot token would proceed to the real send).What was not tested: end-to-end traffic against the live LINE platform during a >60s task in a real group. The verification above exercises the exact case-loss boundary inside the real runtime plugin code; the cron-tool half of the fix prevents the upstream producer from emitting a lowercased recipient in the first place, and the plugin-side guard above demonstrates the safety net for any other code path that ever surfaces such a value.
Supplemental verification
In addition to the live-runtime capture in the Real behavior proof section, the following supplemental coverage was added under
src/agents/tools/cron-tool.test.tsandextensions/line/src/send.test.ts:agent:main:line:group:cabcdef...) and per-account-channel-peer DM-shape session-key recipient (agent:main:line:primary:direct:uabcdef...). Both callbuildAgentPeerSessionKeydirectly so the canonical-lowercasing step that triggers the bug runs in the same process as the assertion. Without the fix both fail withAssertionError: expected 'cabcdef0123456789abcdef0123456789' to be undefined.pushMessagesLine; accept case-exact recipient (the case-exact case proves the length-based gate does not false-positive on real LINE ids).Static checks on the four touched files (
tsgo -p tsconfig.core.json,tsgo -p tsconfig.extensions.json,oxlint) all report zero diagnostics.User-visible behavior change
Before this PR, a cron job scheduled by a LINE-bound agent without
currentDeliveryContext(delivery-recovery path) silently failed for long-task replies (HTTP 400, 5 retries,failed/). After this PR, the same path now either:session.lastTo— which is case-preserved fromctxPayload.OriginatingToin the LINE inbound builder (resolveLastToRawinsrc/auto-reply/reply/session-delivery.tsreturnsoriginatingToRaw || toRaw || persistedLastTo); orRecipient is not a valid LINE id ..., so the entry moves tofailed/immediately and the operator sees the failure in the log instead of being told "delivered".No change for non-LINE channels.
Security impact
Compatibility / migration
tobefore this fix: forward-only fix. Such jobs either (a) succeed at fire-time via thesession.lastTofallback path, or (b) fail loudly via the new plugin guard. No data migration required.Regression coverage
src/agents/tools/cron-tool.test.ts(cron-tool side),extensions/line/src/send.test.ts(plugin side)delivery-queue-recovery'sPERMANENT_ERROR_PATTERNSbuildAgentPeerSessionKeyhelper so the canonical lowercasing happens in the same process; the LINE plugin case exercises the realpushMessagesLineentry point that every outbound LINE push goes through.Followups (not in this PR)
dmScope: "per-peer"produces a session key without a channel marker (agent:main:direct:uxxx...). The cron-tool guard does not fire for that shape because the channel cannot be inferred from the session-key alone; the LINE plugin safety net still catches it at the outbound boundary as a permanent error, but the upstream-source fix would require a different signal than the channel-name check.length >= 33) so short fixtures ("U123","line:user:1") still flow through. Strict shape validation (^[CUR][0-9a-f]{32}$) would require updating 20+ existing fixtures and is left as a separate cleanup.