fix(whatsapp): drain eligible pending deliveries on reconnect#63916
Conversation
🔒 Aisle Security AnalysisWe found 4 potential security issue(s) in this PR:
1. 🟠 Untrusted plugin can drain and resend arbitrary queued deliveries via exported drainPendingDeliveries
Description
If third-party plugins are not fully trusted (or are intended to be capability-scoped), this expands the attack surface:
Vulnerable exports/usage: // Public SDK re-export
export { drainPendingDeliveries };const matchingEntries = (await loadPendingDeliveries(opts.stateDir))
.map((entry) => ({ entry, decision: opts.selectEntry(entry, now) }))
.filter((item) => item.decision.match);
if (!decision.bypassBackoff) {
const retryEligibility = isEntryEligibleForRecoveryRetry(currentEntry, Date.now());
...
}RecommendationDo not expose a generic queue-draining primitive to plugins unless plugins are fully trusted. If plugins are intended to be capability-scoped, add guardrails:
Example of a safer, scoped API surface: export async function drainChannelAccountDeliveries(opts: {
channel: "whatsapp";
accountId: string;
...
}) {
return drainPendingDeliveries({
drainKey: `plugin:${opts.channel}:${opts.accountId}`,
logLabel: `${opts.channel} drain`,
...,
selectEntry: (entry) => ({
match: entry.channel === opts.channel && entry.accountId === opts.accountId,
// never allow bypass from plugins
bypassBackoff: false,
}),
});
}2. 🟡 Stale backoff-bypass decision in drainPendingDeliveries can cause rapid retry storms
Description
However, the function continues to use the stale
This can create unintended aggressive retry loops during reconnect/startup, leading to self-induced denial of service (CPU/network churn) and possible upstream rate-limit bans. RecommendationRecompute the drain decision after reloading For example: const currentEntry = await loadPendingDelivery(entry.id, opts.stateDir);
if (!currentEntry) return;
const now2 = Date.now();
const decision2 = opts.selectEntry(currentEntry, now2);
if (!decision2.match) return; // entry no longer matches criteria
if (!decision2.bypassBackoff) {
const retryEligibility = isEntryEligibleForRecoveryRetry(currentEntry, now2);
if (!retryEligibility.eligible) return;
}Additionally, consider making 3. 🟡 Path traversal via unsanitized delivery queue entry id in loadPendingDelivery/ackDelivery/failDelivery/moveToFailed
DescriptionThe delivery queue storage builds file paths by directly concatenating the caller-provided
Vulnerable code: function resolveQueueEntryPaths(id: string, stateDir?: string) {
const queueDir = resolveQueueDir(stateDir);
return {
jsonPath: path.join(queueDir, `${id}.json`),
deliveredPath: path.join(queueDir, `${id}.delivered`),
};
}
export async function loadPendingDelivery(id: string, stateDir?: string) {
const { jsonPath } = resolveQueueEntryPaths(id, stateDir);
const stat = await fs.promises.stat(jsonPath);
// ... readQueueEntry(jsonPath) ...
// ... writeQueueEntry(jsonPath, entry) on migration ...
}While current internal callers appear to use IDs generated by RecommendationConstrain
const SAFE_ID_RE = /^[a-f0-9-]{36}$/i;
if (!SAFE_ID_RE.test(id)) throw new Error("Invalid queue entry id");
const queueDir = resolveQueueDir(stateDir);
const jsonPath = path.resolve(queueDir, `${id}.json`);
if (!jsonPath.startsWith(path.resolve(queueDir) + path.sep)) {
throw new Error("Path traversal");
}
Apply the same validation/containment in all functions that accept an 4. 🔵 Log injection via untrusted drainKey/logLabel interpolation in drainPendingDeliveries()
Description
Vulnerable code: opts.log.info(`${opts.logLabel}: already in progress for ${opts.drainKey}, skipping`);
...
opts.log.info(`${opts.logLabel}: ${matchingEntries.length} pending message(s) matched ${opts.drainKey}`);RecommendationNeutralize untrusted values before placing them into log message text, or log them as structured fields so the logger/runtime can escape them. Option A: sanitize to a safe printable subset (strip control chars): function sanitizeForLog(value: string): string {
return value.replace(/[\r\n\t\0\f\v]/g, " ");
}
const drainKeySafe = sanitizeForLog(opts.drainKey);
const labelSafe = sanitizeForLog(opts.logLabel);
opts.log.info(`${labelSafe}: already in progress for ${drainKeySafe}, skipping`);Option B: structured logging (preferred when supported): (opts.log as any).info?.({ drainKey: opts.drainKey }, `${opts.logLabel}: already in progress, skipping`);Also consider constraining Analyzed PR: #63916 at commit Last updated on: 2026-04-10T02:26:19Z |
Greptile SummaryThis PR widens the WhatsApp reconnect drain to cover all same-account pending entries that are already eligible to retry — not just those explicitly marked with the "no listener" error — while preserving the immediate-replay behaviour for the no-listener case. The policy is now owned by the WhatsApp monitor via a Confidence Score: 5/5Safe to merge; the only finding is a minor SDK type-export gap that doesn't affect runtime behavior or existing callers. All remaining findings are P2. The logic is correct: concurrency guards carry over, the per-entry re-read after claiming prevents stale double-send, backoff bypass is properly scoped to the no-listener case, and the new test cases lock in every scenario from the regression plan. The prior P1 concern (unreachable drainReconnectQueue) has been resolved by moving it to infra-runtime.ts as a documented compatibility shim. src/plugin-sdk/infra-runtime.ts — the newly public drainPendingDeliveries export lacks companion type re-exports. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/plugin-sdk/infra-runtime.ts
Line: 79
Comment:
**Missing companion type exports for `drainPendingDeliveries`**
`drainPendingDeliveries` is exported as a public SDK symbol, but the types a caller needs to annotate their `selectEntry` callback — `PendingDeliveryDrainDecision`, `QueuedDelivery`, `RecoveryLogger`, and `DeliverFn` — are not re-exported from this file. TypeScript inference covers the common case (inline callback with no explicit annotation), but any plugin that wants to define the callback separately or assign it to a typed variable will have no way to import the required types from the public SDK surface.
Consider adding the missing re-exports alongside the function:
```typescript
export { drainPendingDeliveries };
export type {
DeliverFn,
PendingDeliveryDrainDecision,
QueuedDelivery,
RecoveryLogger,
} from "../infra/outbound/delivery-queue.js";
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (2): Last reviewed commit: "fix(plugin-sdk): restore drainReconnectQ..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94d93b31b5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13c95751d1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Update after review pass: What is now done in this PR:
What I explicitly decided not to fix in this PR:
Scoped verification run after the fixes above:
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea244d1f5b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ea244d1 to
125e073
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 125e0739fb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| match: | ||
| entry.channel === "whatsapp" && | ||
| normalizeReconnectAccountId(entry.accountId) === normalizedAccountId, |
There was a problem hiding this comment.
Skip fresh queue entries during reconnect replay
This selector now matches every same-account WhatsApp queue row, including first-attempt entries that have never failed. Fresh evidence in this commit is src/infra/outbound/delivery-queue.reconnect-drain.test.ts adding coverage that explicitly drains “fresh pending WhatsApp entries,” which confirms retryCount=0 rows are replay candidates. Since deliverOutboundPayloads writes the queue row before send and only acks after send completion (src/infra/outbound/deliver.ts), a reconnect that happens while that first send is still in flight can trigger drainPendingDeliveries to send the same payload a second time, causing duplicate outbound messages.
Useful? React with 👍 / 👎.
* fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
…aw#63916) * fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
…aw#63916) * fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
…aw#63916) * fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
…aw#63916) * fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
…aw#63916) * fix(whatsapp): drain eligible pending deliveries on reconnect * docs(changelog): note whatsapp reconnect pending drain
Summary
No active WhatsApp Web listener, so fresh pending WhatsApp entries stayed stuck until the next startup.delivery-queue/even after WhatsApp came back.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
lastErroralready contained the no-listener failure, so fresh pending entries and ordinary retry-eligible entries were invisible to reconnect recovery.Regression Test Plan (if applicable)
src/infra/outbound/delivery-queue.reconnect-drain.test.tsUser-visible / Behavior Changes
Queued WhatsApp messages for the reconnecting account now replay after reconnect if they are already eligible to retry, even when they were never attempted before. The previous immediate-retry behavior for
No active WhatsApp Web listenerfailures remains intact.Diagram (if applicable)
Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, explain risk + mitigation:Repro + Verification
Environment
Steps
delivery-queue/without another gateway restart.Expected
Actual
Evidence
Human Verification (required)
Review Conversations
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoRisks and Mitigations