
Migrate packet to Rust #3492

Merged
robgjansen merged 8 commits into shadow:main from robgjansen:packet-rs2
Feb 17, 2025

Conversation

@robgjansen Member

Eliminates usage of the C packet and payload structs. This allows us to track application payloads using Bytes objects in Rust, which should make Rust code such as the Rust TCP stack and UDP payload handling more efficient.

@robgjansen robgjansen self-assigned this Jan 13, 2025
@github-actions github-actions bot added Component: Testing Unit and integration tests and frameworks Component: Main Composing the core Shadow executable Component: Documentation In-repository documentation, under docs/ labels Jan 13, 2025
@robgjansen Member Author

Apologies for the large change, especially in the first commit, but I didn't see a great way to incrementally remove the C packet.

@robgjansen Member Author

I was worried that adding an Arc<Packet> to every PacketRc was going to introduce a big performance hit compared to the unsafe inner C packet pointer that we used to use, so I ran the benchmarks:

We do see a runtime slowdown effect due to the additional atomic operations as expected.

(run_time benchmark plot)

I'm not sure how bad this performance hit is. If we think it's significant, we could consider using RootedRc, but that makes the code a bit less flexible and I don't remember if we wanted to phase out the use of that module.
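For context, a minimal sketch of the concern: wrapping the packet in an `Arc` means every clone and drop of the handle is an atomic read-modify-write on a shared refcount, where the old C code just passed a raw pointer around. The `Packet`/`PacketRc` types below are hypothetical stand-ins for illustration, not Shadow's actual definitions.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for Shadow's Packet and PacketRc types,
// just to illustrate the refcounting cost being discussed.
struct Packet {
    payload: Vec<u8>,
}

#[derive(Clone)]
struct PacketRc {
    inner: Arc<Packet>,
}

fn main() {
    let p = PacketRc {
        inner: Arc::new(Packet { payload: vec![0u8; 1500] }),
    };
    // Each clone is an atomic fetch_add on the shared strong count...
    let q = p.clone();
    assert_eq!(Arc::strong_count(&p.inner), 2);
    // ...and each drop is an atomic fetch_sub (plus a fence on the
    // final drop). A raw C pointer handoff has neither cost.
    drop(q);
    assert_eq!(Arc::strong_count(&p.inner), 1);
}
```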

@stevenengler Contributor

stevenengler commented Jan 14, 2025

> I was worried that adding an Arc<Packet> to every PacketRc was going to introduce a big performance hit compared to the unsafe inner C packet pointer that we used to use, so I ran the benchmarks:
>
> We do see a runtime slowdown effect due to the additional atomic operations as expected.

I'm not sure I'm convinced that the slowdown is caused by Arcs (or atomics in general). I haven't looked at this PR closely yet, but I ran a benchmark using the C packet with atomic reference counting, and didn't see any significant performance change.

https://github.com/shadow/benchmark-results/tree/master/tor/2025-01-14-T18-06-33

(run_time benchmark plot)

If it is the Arc, maybe we're cloning it much more than we were increasing the refcount of C packets. Or maybe we're doing more memory allocations, or something else.
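One way the clone count can balloon is passing the handle by value where a borrow would do. This is a hedged sketch with hypothetical function names, not Shadow's API, showing the difference:

```rust
use std::sync::Arc;

// Hypothetical types for illustration only; not Shadow's actual API.
struct Packet {
    payload: Vec<u8>,
}

// Taking ownership forces the caller to clone the Arc: one atomic
// RMW on entry and another when the clone drops.
fn deliver_owned(packet: Arc<Packet>) -> usize {
    packet.payload.len()
}

// Borrowing never touches the refcount, so hot paths that only need
// to read the packet can skip the atomics entirely.
fn deliver_borrowed(packet: &Packet) -> usize {
    packet.payload.len()
}

fn main() {
    let p = Arc::new(Packet { payload: vec![0u8; 64] });
    assert_eq!(deliver_owned(Arc::clone(&p)), 64);
    assert_eq!(deliver_borrowed(&p), 64);
    // The borrow left the strong count untouched.
    assert_eq!(Arc::strong_count(&p), 1);
}
```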

@robgjansen Member Author

OK, thanks for checking into that! There's probably something else we could be doing better in the code then, which is good news :)

@stevenengler Contributor left a comment


The changes look good to me! I'm not sure what's causing the performance difference though. We are doing more allocations and more packet refs/unrefs, so maybe it is one of those.

I'll leave it up to you whether you want to merge with the performance as-is. If we do, I think this should also be mentioned in the changelog. I'll also take another look at the performance, but I'm not sure how best to debug other than reducing the number of allocations and packet refs/unrefs and seeing if that fixes it.

Shadow does seem to be behaving the same, though, so it looks like only a change in performance and not behaviour.

@robgjansen Member Author

To get a handle on the performance issue, I set up a simple experiment where a tgen client downloaded 100 MiB from a tgen server. Then I profiled performance using perf by running the following before and after this PR:

perf record --call-graph fp ~/dev/shadow/build/src/main/shadow shadow.yaml

Comparing the before/after diff using perf diff shows the following:

(screenshot: perf diff output)

This highlights possible problems with ObjectCounter. Sure enough, it turns out I added a new ObjectCounter to the PacketRc wrapper, but PacketRcs were not being counted previously when it wrapped the C packet. Checking the object counts in the log shows that indeed this could be significant overhead:

Global allocated object counts: {PacketRc:4132234, Packet:148978,

That's 4.1 million new objects being counted just in this simple, small experiment. Removing the PacketRc object counter seems to get rid of the hotspot, as perf diff shows:

(screenshot: perf diff output after removing the PacketRc object counter)

Next I'll push that change and then run a Tor benchmark to double check.
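To make the overhead concrete, here is a minimal sketch of what a global per-type object counter can look like. Shadow's real ObjectCounter differs in its details; this is only an illustration of the cost shape: every construction and destruction takes a lock and does a hash-map lookup, which adds up across millions of packets.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical sketch of a global per-type allocation counter;
// not Shadow's actual ObjectCounter implementation.
static COUNTS: OnceLock<Mutex<HashMap<&'static str, u64>>> = OnceLock::new();

fn count_alloc(type_name: &'static str) {
    // Lock acquisition plus a hash lookup on every construction.
    let mut map = COUNTS.get_or_init(Default::default).lock().unwrap();
    *map.entry(type_name).or_insert(0) += 1;
}

fn allocated(type_name: &'static str) -> u64 {
    let map = COUNTS.get_or_init(Default::default).lock().unwrap();
    map.get(type_name).copied().unwrap_or(0)
}

fn main() {
    // At ~4.1 million PacketRc constructions, millions of these
    // lock + hash operations land on the packet hot path.
    for _ in 0..1_000 {
        count_alloc("PacketRc");
    }
    assert_eq!(allocated("PacketRc"), 1_000);
}
```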

@robgjansen Member Author

Removing the object counter from PacketRc has mitigated most of the performance degradation.

Full Tor benchmark results:
https://github.com/shadow/benchmark-results/tree/master/tor/2025-02-13-T22-25-31

@stevenengler Contributor

> This highlights possible problems with ObjectCounter.

Nice find! I'm surprised that the object counter was the problem. I think it's just an extra hash table lookup each time a packet is created/destroyed? I guess there's a lot of packets, but the hash table accesses seem small compared to everything else we do with packets (allocating memory, copying payload data, etc).

@robgjansen robgjansen merged commit b84b2b1 into shadow:main Feb 17, 2025
25 checks passed
@robgjansen robgjansen deleted the packet-rs2 branch February 17, 2025 22:53
