Skip to content

XTLS Vision: Defer Splice handoff until write completes#5737

Merged
RPRX merged 5 commits intoXTLS:mainfrom
HeXis-YS:fix-vision
Mar 22, 2026
Merged

XTLS Vision: Defer Splice handoff until write completes#5737
RPRX merged 5 commits intoXTLS:mainfrom
HeXis-YS:fix-vision

Conversation

@HeXis-YS
Copy link
Copy Markdown
Contributor

Summary

This PR addresses an intermittent Vision handoff race that can lead to TLS stream breakage (ERR_SSL_PROTOCOL_ERROR), as reported in #4878.

Root Cause

In VisionWriter.WriteMultiBuffer, inbound.CanSpliceCopy could become 1 before the current direct-write buffer had fully flushed.
That allowed splice mode to start while Vision was still writing to the same socket, creating a race at the handoff boundary.

Changes

  • Updated VisionWriter.WriteMultiBuffer in proxy/proxy.go:
    • keep splice readiness in a local variable during direct-write switch;
    • perform w.Writer.WriteMultiBuffer(mb) first;
    • only then set inbound.CanSpliceCopy = 1 (if still eligible).

This keeps existing Vision behavior intact while making splice handoff ordering deterministic.

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 2, 2026

@yuhan6665

@yuhan6665
Copy link
Copy Markdown
Member

@HeXis-YS thanks for your work.
I understand you want to make sure the buffer is written first and cleared, and then send the signal by setting the CanSpliceCopy flag. Although I believe there are more than one thread (reader) running and more than one buffer involved. So your proposal maybe better in some cases, but maybe not in others.
My test is this - see if we can get rid of the time.Sleep() in CopyRawConnIfExist() completely, and only rely on code signal to switch to splice copy - I feel like we may need another signal. If we can do that we probably really solve the root cause.

Copilot AI review requested due to automatic review settings March 3, 2026 07:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate an intermittent XTLS Vision splice handoff race (reported in #4878) by making splice enablement happen deterministically only after the relevant direct write completes, and by refactoring splice-copy state transitions behind session helper methods.

Changes:

  • Adds a session-level splice readiness signal (spliceCopySignal) and helper methods (ArmSpliceCopy, EnableSpliceCopy, DisableSpliceCopy, SpliceCopyReady).
  • Updates many inbounds/outbounds to use the new splice-copy helper methods instead of mutating CanSpliceCopy directly.
  • Adjusts Vision splice handoff logic in proxy/proxy.go to defer enabling splice until after WriteMultiBuffer returns, and updates splice gating checks to use SpliceCopyReady/State.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
common/session/splice.go Introduces splice readiness signaling + helper APIs for Inbound/Outbound splice-copy state.
common/session/session.go Adds internal spliceCopy field to Inbound/Outbound session metadata.
proxy/proxy.go Defers Vision splice enable until after write completion; updates splice checks to use new helpers.
proxy/socks/server.go Uses new splice-copy helpers for arming/enabling/disabling splice.
proxy/socks/client.go Uses new splice-copy helpers for arming/enabling splice during request/response handling.
proxy/http/server.go Uses new splice-copy helpers for arming/enabling/disabling splice around CONNECT handling.
proxy/http/client.go Uses new splice-copy helpers for arming/enabling splice in response path.
proxy/vless/inbound/inbound.go Replaces direct CanSpliceCopy manipulation with helper methods in flow handling.
proxy/vless/outbound/outbound.go Replaces direct CanSpliceCopy manipulation with helper methods in flow handling.
proxy/vmess/inbound/inbound.go Disables splice via helper method.
proxy/vmess/outbound/outbound.go Disables splice via helper method.
proxy/trojan/server.go Disables splice via helper method.
proxy/trojan/client.go Disables splice via helper method.
proxy/shadowsocks/server.go Disables splice via helper method.
proxy/shadowsocks/client.go Disables splice via helper method.
proxy/shadowsocks_2022/inbound.go Disables splice via helper method.
proxy/shadowsocks_2022/inbound_multi.go Disables splice via helper method.
proxy/shadowsocks_2022/inbound_relay.go Disables splice via helper method.
proxy/shadowsocks_2022/outbound.go Disables splice via helper method.
proxy/hysteria/server.go Disables splice via helper method.
proxy/hysteria/client.go Disables splice via helper method.
proxy/wireguard/server.go Uses helper method to disable splice; minor formatting alignment in routingInfo.
proxy/wireguard/client.go Uses helper method to disable splice.
proxy/freedom/freedom.go Uses helper method to enable splice.
proxy/dokodemo/dokodemo.go Uses helper method to enable splice.
common/mux/server.go Disables splice for mux connections via helper method.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +21 to +30
func (i *Inbound) EnableSpliceCopy() {
if i.CanSpliceCopy == 3 {
return
}
if i.spliceCopy != nil {
i.spliceCopy.once.Do(func() {
close(i.spliceCopy.ready)
})
}
i.CanSpliceCopy = 1
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnableSpliceCopy / SpliceCopyReady access i.spliceCopy multiple times without any synchronization. Since these flags are checked/updated across goroutines (splice loop vs protocol handlers), DisableSpliceCopy() can nil spliceCopy between the nil-check and subsequent field access, leading to a potential nil-pointer panic (and still a data race). Consider (1) capturing sc := i.spliceCopy once and only using sc (including inside the once.Do closure), and (2) adding proper synchronization (e.g., atomic.Pointer + atomic.Int32, or a mutex) so CanSpliceCopy/spliceCopy updates are race-free.

Copilot uses AI. Check for mistakes.
Comment on lines +42 to +49
func (i *Inbound) SpliceCopyReady() bool {
if i.spliceCopy != nil {
select {
case <-i.spliceCopy.ready:
return true
default:
return false
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpliceCopyReady checks i.spliceCopy != nil and then reads i.spliceCopy.ready again in the select. If another goroutine calls DisableSpliceCopy() concurrently, this can panic due to i.spliceCopy becoming nil between those two reads. At minimum, read the pointer once into a local variable before the select; ideally also guard reads/writes to spliceCopy/CanSpliceCopy with atomics or a mutex.

Copilot uses AI. Check for mistakes.
@HeXis-YS
Copy link
Copy Markdown
Contributor Author

HeXis-YS commented Mar 3, 2026

@yuhan6665
I have tested it on my server and it works fine. However, my test scenarios are very limited.
Please feel free to test. Any ideas or suggestions are welcome.

@yuhan6665
Copy link
Copy Markdown
Member

yuhan6665 commented Mar 3, 2026

Can you try remove time.Sleep() in CopyRawConnIfExist() and test it again? nevermind I just see you did it

@yuhan6665
Copy link
Copy Markdown
Member

@HeXis-YS sorry but I can't understand what you added for the ready channel. I don't think it does anything more since there is no other new code signal than the existing place to call EnableSpliceCopy()
If you agree, can you remove the three new commits? And keep the removal of time.sleep() in CopyRawConnIfExist()

Maybe we can do a beta release and see if the issue is fixed.

@Fangliding
Copy link
Copy Markdown
Member

话说vless入站不是早就改成dispatchlink不走pipe了吗 就一个gouroutine在搬东西 还有竞争?

@HeXis-YS
Copy link
Copy Markdown
Contributor Author

HeXis-YS commented Mar 7, 2026

Done.
These 3 commits were actually intended to fix the problem raised by Copilot, but it seems Copilot wasn't that smart.

@yuhan6665
Copy link
Copy Markdown
Member

话说vless入站不是早就改成dispatchlink不走pipe了吗 就一个gouroutine在搬东西 还有竞争?

Seems you are right, let's give it a try

@RPRX RPRX force-pushed the main branch 2 times, most recently from 9529a82 to 2320416 Compare March 22, 2026 13:36
@RPRX RPRX changed the title proxy: defer Vision splice handoff until write completes XTLS Vision: Defer Splice handoff until write completes Mar 22, 2026
@RPRX RPRX merged commit f926ee4 into XTLS:main Mar 22, 2026
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants