fix(wecom): fix context leak in Start() and data race in processedMsgs (#918)
Cancel the constructor-created context before overwriting in Start() to prevent the original cancel function from becoming unreachable. Move len(processedMsgs) check inside the write lock to eliminate a data race, and re-insert the current msgID after map reset to prevent duplicate processing of the in-flight message. Applies to both WeComBotChannel and WeComAppChannel.
nikolasdehor left a comment
Review: fix(wecom): fix context leak in Start() and data race in processedMsgs
Excellent fix addressing two real concurrency bugs. Both changes apply to WeComBotChannel and WeComAppChannel symmetrically.
Bug 1: Context Leak in Start()
The constructor creates a context (context.WithCancel) that gets overwritten in Start() without canceling the original. This leaks the original context's goroutine and resources.
Fix is correct: Cancel the constructor-created context before overwriting:
```go
if c.cancel != nil {
	c.cancel()
}
c.ctx, c.cancel = context.WithCancel(ctx)
```

The nil check is important for safety, though in practice the constructor always sets cancel.
Bug 2: Data Race in processedMsgs
The original code had a subtle race:
```go
c.processedMsgs[msgID] = true
c.msgMu.Unlock() // unlocks here
if len(c.processedMsgs) > 1000 { // reads map WITHOUT lock
	c.msgMu.Lock()
	c.processedMsgs = make(map[string]bool) // could race with another goroutine's read
	c.msgMu.Unlock()
}
```

Fix is correct: moving the len() check and map reset inside the existing write lock eliminates the TOCTOU race. Re-inserting the current msgID after the map reset is a nice touch; without it, the in-flight message could be processed again by a concurrent goroutine between the reset and the next lock acquisition.
Quality
- Both fixes are applied identically to `app.go` and `bot.go`, maintaining symmetry.
- The changes are minimal and surgical, with no unnecessary refactoring.
- The logic is easy to verify by inspection.
Minor Suggestion
Consider using sync.Map for processedMsgs if the access pattern is mostly reads (checking for duplicates) with occasional writes. But that is an optimization, not a correctness issue -- the current fix is correct.
LGTM. Well-analyzed concurrency fixes.
Changes from upstream:
- fix(channel): config cleanup and regex precompile (sipeed#911, sipeed#916)
- fix(github_copilot): improve error handling (sipeed#919)
- fix(wecom): context leaks and data race fixes (sipeed#914, sipeed#918)
- fix(tools): HTTP client caching and response body cleanup (sipeed#940)
- feat(tui): Add configurable Launcher and Gateway process management (sipeed#909)
- feat(migrate): Update migration system with openclaw support (sipeed#910)
- fix(skills): Use registry-backed search for skills discovery (sipeed#929)
- build: Add armv6 support to goreleaser (sipeed#905)
- docs: Sync READMEs and channel documentation
fix(wecom): fix context leak in Start() and data race in processedMsgs
📝 Description
Cancel the constructor-created context before overwriting in Start() to prevent the original cancel function from becoming unreachable.
Move len(processedMsgs) check inside the write lock to eliminate a data race, and re-insert the current msgID after map reset to prevent duplicate processing of the in-flight message.
Applies to both WeComBotChannel and WeComAppChannel.