Skip to content

quiche: support connection ID longer than 8 bytes with multiple worker threads#16320

Merged
ggreenway merged 3 commits intoenvoyproxy:mainfrom
danzh2010:verifybpf
May 6, 2021
Merged

quiche: support connection ID longer than 8 bytes with multiple worker threads#16320
ggreenway merged 3 commits intoenvoyproxy:mainfrom
danzh2010:verifybpf

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 commented May 4, 2021

Signed-off-by: Dan Zhang danzh@google.com

Commit Message: fix an issue where CID longer than 8 bytes is replaced by a hashed 8-byte CID which isn't compatible with existing BPF filter which requires stable first 4-byte in CID.

Testing: added integration tests for 9-byte CID
Fixes #15988

Signed-off-by: Dan Zhang <danzh@google.com>
@ggreenway
Copy link
Copy Markdown
Member

How is a longer-than-8-byte CID generated in the first place? Aren't the CIDs generated by quiche?

@ggreenway ggreenway self-assigned this May 4, 2021
@alyssawilk
Copy link
Copy Markdown
Contributor

Awesome! So glad this is tracked down :-)

@moderation can you confirm this fixes your set up for concurrency > 1?

alyssawilk
alyssawilk previously approved these changes May 5, 2021
@danzh2010
Copy link
Copy Markdown
Contributor Author

How is a longer-than-8-byte CID generated in the first place? Aren't the CIDs generated by quiche?

Google QUICHE client uses 8-byte CID, which is why we didn't catch this issue earlier. But other implementations, i.e. Cloudflare QUICHE, use longer CID to start with.

@ggreenway
Copy link
Copy Markdown
Member

How is a longer-than-8-byte CID generated in the first place? Aren't the CIDs generated by quiche?

Google QUICHE client uses 8-byte CID, which is why we didn't catch this issue earlier. But other implementations, i.e. Cloudflare QUICHE, use longer CID to start with.

Oh, so this is the peer's ID? How is it valid to truncate it (or hash it)? I read through https://quicwg.org/base-drafts/rfc9000.html#name-connection-id and didn't see anything about modifying the peer's CID.

Is it possible that by truncating, there could be duplicates, eg the peer uses the same first 8 bytes of CID?

@ggreenway
Copy link
Copy Markdown
Member

Nevermind, this is all coming back to me now. Described here: https://quicwg.org/base-drafts/rfc9000.html#name-negotiating-connection-ids

Given that envoy only looks at the first 4 bytes in the BPF to steer, would it be safer to hash all bytes beyond the first 4 into a 4-byte hash for the last 4 bytes of the CID?

@moderation
Copy link
Copy Markdown
Contributor

With @alyssawilk help I can confirm that concurrency > 1 is now working in my environment! Woohoo

@alyssawilk
Copy link
Copy Markdown
Contributor

Awesome! Thanks so much for checking!

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

Nevermind, this is all coming back to me now. Described here: https://quicwg.org/base-drafts/rfc9000.html#name-negotiating-connection-ids

Given that envoy only looks at the first 4 bytes in the BPF to steer, would it be safer to hash all bytes beyond the first 4 into a 4-byte hash for the last 4 bytes of the CID?

Done

@ggreenway
Copy link
Copy Markdown
Member

Thanks for fixing this! CI is failing with ERROR: ./source/common/quic/envoy_quic_dispatcher.cc:96: Don't call memcpy() directly; use safeMemcpy, safeMemcpyUnsafeSrc, safeMemcpyUnsafeDst or MemBlockBuilder instead..

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

Ping?

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@ggreenway ggreenway merged commit 479b02b into envoyproxy:main May 6, 2021
@moderation
Copy link
Copy Markdown
Contributor

I've been trying to look for ways to verify that BPF is being used. I've installed the bcc tools and can run bps https://github.com/iovisor/bcc#bpf-introspection which is described as ps for BPF programs. With Envoy running either as sudo or with sudo setcap cap_bpf+ep envoy I don't see anything in the bps or bpflist output. Is this expected? Is there another way to verify that BPF is being used? /cc @ggreenway @alyssawilk @danzh2010

@danzh2010
Copy link
Copy Markdown
Contributor Author

I've been trying to look for ways to verify that BPF is being used. I've installed the bcc tools and can run bps https://github.com/iovisor/bcc#bpf-introspection which is described as ps for BPF programs. With Envoy running either as sudo or with sudo setcap cap_bpf+ep envoy I don't see anything in the bps or bpflist output. Is this expected?

Is Envoy started with multiple workers?

Is there another way to verify that BPF is being used? /cc @ggreenway @alyssawilk @danzh2010

I don't know much about bps or any other BPF monitoring tools. Somehow never needed to do so.
If your platform support BPF and Envoy doesn't crash upon startup, I would say it is installed correctly.
If the platform doesn't support it, you should see a warning "Efficient routing of QUIC packets to the correct worker is not supported or not implemented by Envoy on this platform. QUIC performance may be degraded."

@moderation
Copy link
Copy Markdown
Contributor

Yes Envoy starts with multiple workers. I'm testing on 6 core 12 thread Linux machine and on startup I get 12:

[2021-05-07 10:20:11.016][403778][info][quic] [bazel-out/k8-opt/bin/external/com_googlesource_quiche/quiche/quic/core/batch_writer/quic_gso_batch_writer.cc:36] Release time is not enabled.

And on shutdown 12:

[2021-05-07 10:20:14.979][403772][info][quic] [source/common/quic/active_quic_listener.cc:107] Quic listener listener_udp shutdown.

So I think it is working OK. However I thought in order for BPF filters to work either setcap or sudo would be required. As I don't see any log change either with or without these settings on the binary I thinking that they are not required and Envoy is leveraging an existing BPF filter?

I do get the warning when running on MacOS as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QUIC issues with concurrency > 1

5 participants