Skip to content

Port DescriptorTable to Rust#1367

Merged
sporksmith merged 1 commit intoshadow:devfrom
sporksmith:port-desc-table
May 19, 2021
Merged

Port DescriptorTable to Rust#1367
sporksmith merged 1 commit intoshadow:devfrom
sporksmith:port-desc-table

Conversation

@sporksmith
Copy link
Copy Markdown
Contributor

Wanted to start nailing down some more of how the object graph will work out in Rust; started to port Process to Rust, but looks like DescriptorTable needed to be ported first. (Though in hindsight I suppose the Rust Process could have just stored a Box<DescriptorTable>)

@sporksmith sporksmith requested a review from stevenengler May 19, 2021 05:27
@codecov
Copy link
Copy Markdown

codecov bot commented May 19, 2021

Codecov Report

Merging #1367 (4412027) into dev (8d643bb) will increase coverage by 0.02%.
The diff coverage is 59.52%.

❗ Current head 4412027 differs from pull request most recent head da9022f. Consider uploading reports for the commit da9022f to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1367      +/-   ##
==========================================
+ Coverage   54.13%   54.15%   +0.02%     
==========================================
  Files         138      137       -1     
  Lines       20636    20564      -72     
  Branches     5199     5183      -16     
==========================================
- Hits        11171    11137      -34     
+ Misses       6553     6517      -36     
+ Partials     2912     2910       -2     
Flag Coverage Δ
tests 54.15% <59.52%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/test/file/test_file.c 49.79% <28.57%> (-2.05%) ⬇️
src/main/host/descriptor/descriptor.c 73.52% <75.00%> (+0.09%) ⬆️
src/main/host/descriptor/epoll.c 78.59% <100.00%> (ø)
src/main/host/process.c 68.53% <100.00%> (ø)
src/main/host/syscall/epoll.c 61.70% <100.00%> (+0.41%) ⬆️
src/main/host/syscall/poll.c 81.25% <100.00%> (+0.34%) ⬆️
src/support/logger/rust_bindings/src/lib.rs 36.46% <0.00%> (-0.05%) ⬇️
src/main/routing/topology.c 45.92% <0.00%> (+1.87%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e451d04...da9022f. Read the comment docs.

@sporksmith sporksmith removed the request for review from stevenengler May 19, 2021 09:17
@sporksmith sporksmith marked this pull request as draft May 19, 2021 09:17
@sporksmith sporksmith requested a review from stevenengler May 19, 2021 13:42
@sporksmith sporksmith self-assigned this May 19, 2021
@sporksmith sporksmith marked this pull request as ready for review May 19, 2021 13:43
@sporksmith sporksmith removed the request for review from stevenengler May 19, 2021 13:45
@sporksmith
Copy link
Copy Markdown
Contributor Author

Sorry for spamming reviewer (un)assignment - I'm being wishy washy about whether to change to a Vec, which I suspect would be faster and use less memory in the common case. Going to sketch it out a bit more.

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

@robgjansen
Copy link
Copy Markdown
Member

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

I believe the Linux behavior is to always return the lowest free descriptor. I'm not sure that it's important to mirror this exact behavior, though I suppose some app could build logic that's dependent on this behavior.

@stevenengler
Copy link
Copy Markdown
Contributor

stevenengler commented May 19, 2021

Sorry for spamming reviewer (un)assignment - I'm being wishy washy about whether to change to a Vec, which I suspect would be faster and use less memory in the common case. Going to sketch it out a bit more.

No problem. I think a vector is a good idea, it's probably faster in all cases, but I'm not sure at what point the memory usage of a spares vector becomes greater than a hash table. In a large tor experiment, what would the upper bound be on the number of file descriptors per process? Maybe ~2,000 sockets for a guard/exit?

By the way - do you happen to know whether it's important to assign the lowest free descriptor first? Or is arbitrary order ok? e.g. if we create fd's 0, 1, 2, 3, 4, 5, then destroy 2, then destroy 3, is it ok if the next one created is 3? The C version sorts the free list to ensure the lowest free # is always returned, but the tests pass with arbitrary order.

The order of the descriptors is useful for i/o redirection. It's not as important now that there are dup2 and dup3 syscalls (which Shadow doesn't support), but some software (especially older software) relies on the ordering. For example, if you want to write stdout to a file, you would use:

close(0);
open("filename");

And you could rely on the kernel giving the new file a fd of 0.

I think we should keep this behaviour as it is the expected behaviour and is part of posix, unless there's a strong reason not to. It also makes debugging error cases easier in tests when the program uses the same fd numbers both inside and outside shadow.

I don't think we need to keep a sorted list though, we could just iterate over them until we find an unused one.

Comment on lines +57 to +87
/// Remove the descriptor at the given index and return it. Panics if there
/// is no descriptor at `idx`.
pub fn remove(&mut self, idx: u32) -> CompatDescriptor {
let mut descriptor = self.descriptors.remove(&idx).unwrap();
self.available_indices.push(idx);
descriptor.set_handle(0);
descriptor
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptortable_remove() is called from process_deregisterCompatDescriptor(), and I don't think we want shadow to panic if a plugin calls close() on a fd that doesn't exist. Maybe return an Option instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +73 to +74
pub fn set(&mut self, index: u32, descriptor: CompatDescriptor) -> Option<CompatDescriptor> {
let mut descriptor = descriptor;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shortened as just pub fn set(... , mut descriptor: CompatDescriptor)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it'd be a little weird to have mut in the interface, since the caller shouldn't care. I checked cargo doc though, and it looks like the generated docs do ignore it, so it's mostly moot :). Done.

Comment on lines +75 to +78
super::export::compatdescriptor_setHandle(
&mut descriptor as *mut CompatDescriptor,
index.try_into().unwrap(),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compatdescriptor_setHandle() is only called from descriptor_table.c, so this compatdescriptor_setHandle() function could be removed entirely from shadow and replaced with just an inline:

if let CompatDescriptor::Legacy(d) = &mut descriptor {
    unsafe { c::descriptor_setHandle(d.ptr(), index) }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd already created CompatDescriptor::set_handle (i.e. a Rust native version), but had missed a couple call sites. Fixed.

table: *mut DescriptorTable,
descriptor: *mut CompatDescriptor,
) -> c_int {
let table = unsafe { table.as_mut().unwrap() };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table.as_mut() -> notnull_mut_debug(table).as_mut()? (And same with the other descriptortable_ functions below).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_mut and as_ref are safe to call on null pointers: https://doc.rust-lang.org/std/primitive.pointer.html#method.as_mut

Comment on lines +124 to +130
/// Store a descriptor object for later reference at the next available index
/// in the table. The chosen table index is stored in the descriptor object and
/// returned. The descriptor is guaranteed to be stored successfully.
///
/// NOTE: that this consumes a reference to the descriptor, so if you are storing
/// it outside of the descriptor table you will need to ref it after calling
/// this function.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments from descriptor_table.h are out of date. The descriptor object doesn't generally store the table index anymore (only the legacy descriptors do).

Also, the descriptor table consumes the descriptor, so it should not be stored or used after this function call.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +141 to +176
/// Stop storing the descriptor so that it can no longer be referenced. The table
/// index that was used to store the descriptor is cleared from the descriptor
/// and may be assigned to new descriptors that are later added to the table.
/// Returns an owned pointer to the CompatDescriptor if the descriptor was found
/// in the table and removed, and NULL otherwise.
///
/// NOTE: this will not unref the descriptor and you should do so manually.
#[no_mangle]
pub unsafe extern "C" fn descriptortable_remove(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned type CompatDescriptor doesn't have a reference count. Maybe the comment should say it should be freed with compatdescriptor_free()?

Also this function doesn't ever return NULL, but probably should following the earlier comment about process_deregisterCompatDescriptor().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +8 to +18
pub struct DescriptorTable {
descriptors: std::collections::HashMap<u32, CompatDescriptor>,
available_indices: Vec<u32>,
index_counter: u32,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep the available indices sorted so that we can always take the smallest, a BTreeSet might work well here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sporksmith
Copy link
Copy Markdown
Contributor Author

is part of posix

Ok, yeah that clinches it for me :). I see open(2) specifically says it'll return the lowest available as well. Added back in this behavior and added a test for it.

but I'm not sure at what point the memory usage of a spares vector becomes greater than a hash table

Yeah, that's the main drawback. While I still think a Vec is probably going to better in most cases, it won't be by that much, and could have unexpected poor behavior if it is sparse. Leaving it as a HashMap for this PR.

@sporksmith sporksmith requested a review from stevenengler May 19, 2021 20:29
Copy link
Copy Markdown
Contributor

@stevenengler stevenengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines +33 to +37
let idx = if let Some(idx) = self.available_indices.iter().next() {
// Un-borrow from `available_indices`.
let idx = *idx;
// Take from `available_indices`
trace!("Reusing available index {}", idx);
self.available_indices.remove(&idx);
idx
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's too bad BTreeSet::pop_first() isn't stable yet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :/. I suppose we could backport it like I did for stream_len, but doesn't seem worth it for just this.

@sporksmith sporksmith force-pushed the port-desc-table branch 2 times, most recently from 7faa630 to 4412027 Compare May 19, 2021 22:20
@sporksmith sporksmith enabled auto-merge May 19, 2021 22:26
@robgjansen robgjansen added the Component: Main Composing the core Shadow executable label May 19, 2021
@robgjansen robgjansen added the Type: Maintenance Refactoring, cleanup, documenation, or process improvements label May 19, 2021
@sporksmith sporksmith merged commit 394eb32 into shadow:dev May 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Main Composing the core Shadow executable Type: Maintenance Refactoring, cleanup, documenation, or process improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants