Port DescriptorTable to Rust by sporksmith · Pull Request #1367 · shadow/shadow

sporksmith · 2021-05-19T05:27:09Z

Wanted to start nailing down some more of how the object graph will work out in Rust; started to port Process to Rust, but looks like DescriptorTable needed to be ported first. (Though in hindsight I suppose the Rust Process could have just stored a Box<DescriptorTable>)

codecov · 2021-05-19T05:37:46Z

Codecov Report

Merging #1367 (4412027) into dev (8d643bb) will increase coverage by 0.02%.
The diff coverage is 59.52%.

❗ Current head 4412027 differs from pull request most recent head da9022f. Consider uploading reports for the commit da9022f to get more accurate results

@@            Coverage Diff             @@
##              dev    #1367      +/-   ##
==========================================
+ Coverage   54.13%   54.15%   +0.02%     
==========================================
  Files         138      137       -1     
  Lines       20636    20564      -72     
  Branches     5199     5183      -16     
==========================================
- Hits        11171    11137      -34     
+ Misses       6553     6517      -36     
+ Partials     2912     2910       -2

Flag	Coverage Δ
tests	`54.15% <59.52%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/test/file/test_file.c	`49.79% <28.57%> (-2.05%)`	⬇️
src/main/host/descriptor/descriptor.c	`73.52% <75.00%> (+0.09%)`	⬆️
src/main/host/descriptor/epoll.c	`78.59% <100.00%> (ø)`
src/main/host/process.c	`68.53% <100.00%> (ø)`
src/main/host/syscall/epoll.c	`61.70% <100.00%> (+0.41%)`	⬆️
src/main/host/syscall/poll.c	`81.25% <100.00%> (+0.34%)`	⬆️
src/support/logger/rust_bindings/src/lib.rs	`36.46% <0.00%> (-0.05%)`	⬇️
src/main/routing/topology.c	`45.92% <0.00%> (+1.87%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e451d04...da9022f. Read the comment docs.

sporksmith · 2021-05-19T13:50:11Z

Sorry for spamming reviewer (un)assignment - I'm being wishy washy about whether to change to a Vec, which I suspect would be faster and use less memory in the common case. Going to sketch it out a bit more.

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

robgjansen · 2021-05-19T14:41:38Z

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

I believe the Linux behavior is to always return the lowest free descriptor. I'm not sure that it's important to mirror this exact behavior, though I suppose some app could build logic that's dependent on this behavior.

stevenengler · 2021-05-19T15:37:28Z

Sorry for spamming reviewer (un)assignment - I'm being wishy washy about whether to change to a Vec, which I suspect would be faster and use less memory in the common case. Going to sketch it out a bit more.

No problem. I think a vector is a good idea, it's probably faster in all cases, but I'm not sure at what point the memory usage of a spares vector becomes greater than a hash table. In a large tor experiment, what would the upper bound be on the number of file descriptors per process? Maybe ~2,000 sockets for a guard/exit?

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

The order of the descriptors is useful for i/o redirection. It's not as important now that there are dup2 and dup3 syscalls (which Shadow doesn't support), but some software (especially older software) relies on the ordering. For example, if you want to write stdout to a file, you would use:

close(0);
open("filename");

And you could rely on the kernel giving the new file a fd of 0.

I think we should keep this behaviour as it is the expected behaviour and is part of posix, unless there's a strong reason not to. It also makes debugging error cases easier in tests when the program uses the same fd numbers both inside and outside shadow.

I don't think we need to keep a sorted list though, we could just iterate over them until we find an unused one.

stevenengler · 2021-05-19T16:53:58Z

src/main/host/descriptor/descriptor_table.rs

+    /// Remove the descriptor at the given index and return it.  Panics if there
+    /// is no descriptor at `idx`.
+    pub fn remove(&mut self, idx: u32) -> CompatDescriptor {
+        let mut descriptor = self.descriptors.remove(&idx).unwrap();
+        self.available_indices.push(idx);
+        descriptor.set_handle(0);
+        descriptor
+    }


descriptortable_remove() is called from process_deregisterCompatDescriptor(), and I don't think we want shadow to panic if a plugin calls close() on a fd that doesn't exist. Maybe return an Option instead?

stevenengler · 2021-05-19T16:56:34Z

src/main/host/descriptor/descriptor_table.rs

+    pub fn set(&mut self, index: u32, descriptor: CompatDescriptor) -> Option<CompatDescriptor> {
+        let mut descriptor = descriptor;


Shortened as just pub fn set(... , mut descriptor: CompatDescriptor)?

I thought it'd be a little weird to have mut in the interface, since the caller shouldn't care. I checked cargo doc though, and it looks like the generated docs do ignore it, so it's mostly moot :). Done.

stevenengler · 2021-05-19T17:00:45Z

src/main/host/descriptor/descriptor_table.rs

+        super::export::compatdescriptor_setHandle(
+            &mut descriptor as *mut CompatDescriptor,
+            index.try_into().unwrap(),
+        );


compatdescriptor_setHandle() is only called from descriptor_table.c, so this compatdescriptor_setHandle() function could be removed entirely from shadow and replaced with just an inline:

if let CompatDescriptor::Legacy(d) = &mut descriptor { unsafe { c::descriptor_setHandle(d.ptr(), index) } }

I'd already created CompatDescriptor::set_handle (i.e. a Rust native version), but had missed a couple call sites. Fixed.

stevenengler · 2021-05-19T17:14:43Z

src/main/host/descriptor/descriptor_table.rs

+        table: *mut DescriptorTable,
+        descriptor: *mut CompatDescriptor,
+    ) -> c_int {
+        let table = unsafe { table.as_mut().unwrap() };


table.as_mut() -> notnull_mut_debug(table).as_mut()? (And same with the other descriptortable_ functions below).

as_mut and as_ref are safe to call on null pointers: https://doc.rust-lang.org/std/primitive.pointer.html#method.as_mut

stevenengler · 2021-05-19T17:21:34Z

src/main/host/descriptor/descriptor_table.rs

+    /// Store a descriptor object for later reference at the next available index
+    /// in the table. The chosen table index is stored in the descriptor object and
+    /// returned. The descriptor is guaranteed to be stored successfully.
+    ///
+    /// NOTE: that this consumes a reference to the descriptor, so if you are storing
+    /// it outside of the descriptor table you will need to ref it after calling
+    /// this function.


These comments from descriptor_table.h are out of date. The descriptor object doesn't generally store the table index anymore (only the legacy descriptors do).

Also, the descriptor table consumes the descriptor, so it should not be stored or used after this function call.

stevenengler · 2021-05-19T17:29:34Z

src/main/host/descriptor/descriptor_table.rs

+    /// Stop storing the descriptor so that it can no longer be referenced. The table
+    /// index that was used to store the descriptor is cleared from the descriptor
+    /// and may be assigned to new descriptors that are later added to the table.
+    /// Returns an owned pointer to the CompatDescriptor if the descriptor was found
+    /// in the table and removed, and NULL otherwise.
+    ///
+    /// NOTE: this will not unref the descriptor and you should do so manually.
+    #[no_mangle]
+    pub unsafe extern "C" fn descriptortable_remove(


The returned type CompatDescriptor doesn't have a reference count. Maybe the comment should say it should be freed with compatdescriptor_free()?

Also this function doesn't ever return NULL, but probably should following the earlier comment about process_deregisterCompatDescriptor().

stevenengler · 2021-05-19T17:43:49Z

src/main/host/descriptor/descriptor_table.rs

+pub struct DescriptorTable {
+    descriptors: std::collections::HashMap<u32, CompatDescriptor>,
+    available_indices: Vec<u32>,
+    index_counter: u32,
+}


If we want to keep the available indices sorted so that we can always take the smallest, a BTreeSet might work well here.

sporksmith · 2021-05-19T20:28:54Z

is part of posix

Ok, yeah that clinches it for me :). I see open(2) specifically says it'll return the lowest available as well. Added back in this behavior and added a test for it.

but I'm not sure at what point the memory usage of a spares vector becomes greater than a hash table

Yeah, that's the main drawback. While I still think a Vec is probably going to better in most cases, it won't be by that much, and could have unexpected poor behavior if it is sparse. Leaving it as a HashMap for this PR.

stevenengler

Looks good!

stevenengler · 2021-05-19T20:43:21Z

src/main/host/descriptor/descriptor_table.rs

+        let idx = if let Some(idx) = self.available_indices.iter().next() {
+            // Un-borrow from `available_indices`.
+            let idx = *idx;
+            // Take from `available_indices`
+            trace!("Reusing available index {}", idx);
+            self.available_indices.remove(&idx);
+            idx


Ah, it's too bad BTreeSet::pop_first() isn't stable yet.

Yeah :/. I suppose we could backport it like I did for stream_len, but doesn't seem worth it for just this.

sporksmith requested a review from stevenengler May 19, 2021 05:27

sporksmith removed the request for review from stevenengler May 19, 2021 09:17

sporksmith marked this pull request as draft May 19, 2021 09:17

sporksmith requested a review from stevenengler May 19, 2021 13:42

sporksmith self-assigned this May 19, 2021

sporksmith marked this pull request as ready for review May 19, 2021 13:43

sporksmith removed the request for review from stevenengler May 19, 2021 13:45

stevenengler requested changes May 19, 2021

View reviewed changes

sporksmith requested a review from stevenengler May 19, 2021 20:29

stevenengler approved these changes May 19, 2021

View reviewed changes

sporksmith force-pushed the port-desc-table branch 2 times, most recently from 7faa630 to 4412027 Compare May 19, 2021 22:20

DescriptorTable: Rewrite in Rust

da9022f

sporksmith force-pushed the port-desc-table branch from 4412027 to da9022f Compare May 19, 2021 22:26

sporksmith enabled auto-merge May 19, 2021 22:26

robgjansen added the Component: Main Composing the core Shadow executable label May 19, 2021

robgjansen added this to the Convert existing modules to Rust milestone May 19, 2021

robgjansen added the Type: Maintenance Refactoring, cleanup, documenation, or process improvements label May 19, 2021

sporksmith merged commit 394eb32 into shadow:dev May 19, 2021

		pub fn set(&mut self, index: u32, descriptor: CompatDescriptor) -> Option<CompatDescriptor> {
		let mut descriptor = descriptor;

Conversation

sporksmith commented May 19, 2021

Uh oh!

codecov bot commented May 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sporksmith commented May 19, 2021

Uh oh!

robgjansen commented May 19, 2021

Uh oh!

stevenengler commented May 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sporksmith commented May 19, 2021

Uh oh!

stevenengler left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented May 19, 2021 •

edited

Loading

stevenengler commented May 19, 2021 •

edited

Loading