ByteQueue: Avoid unnecessary initialization, and take Read objs #1280
sporksmith merged 5 commits into shadow:dev from
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #1280 +/- ##
==========================================
+ Coverage 55.65% 55.68% +0.02%
==========================================
Files 141 141
Lines 20619 20617 -2
Branches 5068 5068
==========================================
+ Hits 11476 11480 +4
+ Misses 6020 6015 -5
+ Partials 3123 3122 -1
770636a to aed8b71
@stevenengler After sleeping on it I realized a couple of things I want to check/change. Removing you as a reviewer for the moment, but feel free to continue if you've already started. I think it'll be small, and I won't rebase...
Feel free to rebase if it's easier. I've skimmed through it but haven't written comments yet.
Done. Narrowed the scope of the Reader and Writer implementations to what we're actually using now. I was also double-checking whether it was really ok to remove the [...]. I think we needed this before because when getReadablePtr and getWritablePtr needed to copy, we could end up copying much more than we actually use. Now that the memory reads/writes are deferred until the descriptor actually knows how much data to access, I don't think we need it.
stevenengler
left a comment
Cool, this reader/writer design looks nice!
    // the write would block if we could not write any bytes, but were asked to
-   if num_written == 0 && !bytes.is_empty() {
+   if usize::from(num_written) == 0 && bytes.bytes().next().is_some() {
        Err(nix::errno::EWOULDBLOCK.into())
Won't bytes.bytes().next() consume from the reader, so a byte from the reader will be lost? I think a more general issue is that the read(fd, count) syscall reads a specific number of bytes, but the length of a std::io::Read is unknown (and potentially unbounded). Maybe the read() and write() functions should take another argument for the max number of bytes (or maybe a std::io::Take) instead?
I think a more general issue is that the read(fd, count) syscall reads a specific number of bytes, but the length of a std::io::Read is unknown (and potentially unbounded).
Yup!
Maybe the read() and write() functions should take another argument for the max number of bytes (or maybe a std::io::Take) instead?
A Take object also implements Read. If the caller wants to put a cap on how much data is read into the ByteQueue, they can already use Take to do so. We do this a few lines up from here.
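To make the point above concrete, here is a minimal sketch (not Shadow's actual code) of the pattern: a push-from-reader function that accepts any `Read`, where the caller caps the transfer with `Read::take`. The `push_from_reader` stand-in and the `Vec`-backed queue are hypothetical simplifications of ByteQueue.

```rust
use std::io::Read;

// Hypothetical stand-in for ByteQueue::push_from_reader: the real ByteQueue
// is chunked, but a Vec is enough to show the interface.
fn push_from_reader<R: Read>(queue: &mut Vec<u8>, mut src: R) -> std::io::Result<usize> {
    // read_to_end appends everything the reader yields to the queue.
    src.read_to_end(queue)
}

fn main() -> std::io::Result<()> {
    let mut queue = Vec::new();
    let data = b"hello world";
    // The caller caps how much is pushed by wrapping the reader in `take`;
    // `Take<R>` itself implements `Read`, so the callee needs no extra argument.
    let n = push_from_reader(&mut queue, (&data[..]).take(5))?;
    assert_eq!(n, 5);
    assert_eq!(queue, b"hello".to_vec());
    Ok(())
}
```

This is why the cap does not need to be a separate parameter: the limit travels inside the reader itself.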
Unfortunately I don't think there is such an adapter in the standard library for Write. I don't think we'll need it in practice though, and if we do, I think it'd be cleaner to find or create such an adapter (Give?).
Won't bytes.bytes().next() consume from the reader, so a byte from the reader will be lost?
Yeah; I was thinking this was ok since at this point nothing else is going to read from this reader object, but thinking about it again I agree this could end up being surprising behavior later.
In an earlier iteration ProcessMemoryReader and ProcessMemoryWriter also implemented the Seek trait, and I required the Seek trait for R here. Then here we can check whether seek(SeekFrom::Start(1)).is_ok(), and then rewind it before returning via seek(SeekFrom::Start(0)). I think I'll probably put that back in, but am open to other ideas.
A Take object also implements Read. If the caller wants to put a cap on how much data is read into the ByteQueue, they can already use Take to do so. We do this a few lines up from here.
The caller can use take() to limit how much data, but if the PipeFile::write() accepts only a Read, then it loses that information (it can't know the limit without reaching that limit).
Unfortunately I don't think there is such an adapter in the standard library for Write. I don't think we'll need it in practice though, and if we do, I think it'd be cleaner to find or create such an adapter (Give?).
Looking at this again, I think I left a bug in PipeFile::read(), where it's returning 0 instead of EWOULDBLOCK if the pipe is empty :( So I think we will also need to know if the Write is non-empty or not.
In an earlier iteration ProcessMemoryReader and ProcessMemoryWriter also implemented the Seek trait, and I required the Seek trait for R here. Then here we can check whether seek(SeekFrom::Start(1)).is_ok(), and then rewind it before returning via seek(SeekFrom::Start(0)). I think I'll probably put that back in, but am open to other ideas.
I've been trying to think of an alternative to Seek, but I can't think of any.
Done. This cascaded a bit - it seemed silly to implement Seek twice, so I extracted that and some other pointer arithmetic into TypedPluginPtr.
I ended up backporting std::io::Seek::stream_len. Not sure this is the best way to do it. Do you know if we switched to nightly, would we only get nightly-only features that we explicitly opt into? If that's the case, maybe it'd be worth thinking about.
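For reference, a stable-Rust backport of the unstable `std::io::Seek::stream_len` can be sketched like std's own implementation: seek to the end to learn the length, then restore the original position. This is only a sketch of the idea; the PR's actual `stream_len_bp` may differ in detail.

```rust
use std::io::{Cursor, Seek, SeekFrom};

// Backport sketch of the unstable std::io::Seek::stream_len: learn the length
// by seeking to the end, then restore the caller's position.
fn stream_len_bp<S: Seek + ?Sized>(stream: &mut S) -> std::io::Result<u64> {
    let old_pos = stream.stream_position()?;
    let len = stream.seek(SeekFrom::End(0))?;
    // Skip the restoring seek when we were already at the end.
    if old_pos != len {
        stream.seek(SeekFrom::Start(old_pos))?;
    }
    Ok(len)
}

fn main() -> std::io::Result<()> {
    let mut cur = Cursor::new(vec![1u8, 2, 3, 4]);
    cur.seek(SeekFrom::Start(1))?;
    assert_eq!(stream_len_bp(&mut cur)?, 4);
    // The position is restored, so callers can keep reading where they left off.
    assert_eq!(cur.stream_position()?, 1);
    Ok(())
}
```

Note this costs up to two extra seeks per call, which is why it is worth caching the result when the length is needed repeatedly.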
src/main/host/descriptor/pipe.rs
Outdated
if written > 0 {
    self.adjust_status(FileStatus::READABLE, !self.is_empty(), event_queue);
    self.adjust_status(
        FileStatus::WRITABLE,
        self.space_available() > 0,
        event_queue,
    );
}
I think it's better to leave out the if written > 0 condition. If something called write() but no bytes were written, then we should probably check if we should remove the FileStatus::WRITABLE flag to prevent a possible infinite loop. (A similar bug appeared in the C code a while back and I remember Rob spent a lot of time debugging it, so I think it's useful to be safe.)
Oops! Yeah, I think I'd misunderstood what was going on here.
ProcessMemoryReader::<T> {
    process: self,
    ptr: src,
    len,
    _phantom: std::marker::PhantomData,
}
Maybe rename len to count? When I think of len in the context of a PluginPtr, I think of the number of bytes since a PluginPtr doesn't have a distinct type.
Good point! Done
src/main/host/process.rs
Outdated
impl<'a> std::io::Read for ProcessMemoryReader<'a, u8> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
How expensive are functions like process_readPtr()? One concern is that if the code calls something like read_to_end() for this reader, it will call read() iteratively, making many process_readPtr() calls. But if the overhead of each process_readPtr() call is small, then this shouldn't be an issue.
The individual calls are relatively expensive if we don't go through the MemoryManager path for whatever reason, especially now that we've disabled buffering by default. This is generally the expectation with Read objects though, so generally things that take a Read object should try to minimize individual read calls.
e.g. from https://doc.rust-lang.org/std/io/trait.Read.html:
Please note that each call to read() may involve a system call, and therefore, using something that implements BufRead, such as BufReader, will be more efficient.
(Though I'd argue they should replace "will" with "may be", since adding a buffering layer is less efficient if you are accessing in large chunks.)
If/when a consumer needs to do small sequential accesses, they can wrap a Read object in a BufReader.
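The BufReader point can be demonstrated with a small counting wrapper (hypothetical, standing in for an expensive cross-process read): many tiny consumer reads collapse into a single read on the inner reader.

```rust
use std::io::{BufReader, Read};

// Hypothetical wrapper that counts how often the inner reader's `read` is
// called, standing in for an expensive process_readPtr-style access.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() -> std::io::Result<()> {
    let expensive = CountingReader { inner: &b"0123456789"[..], calls: 0 };
    let mut buffered = BufReader::with_capacity(4096, expensive);

    // Ten 1-byte reads from the consumer...
    let mut byte = [0u8; 1];
    for _ in 0..10 {
        buffered.read_exact(&mut byte)?;
    }

    // ...but the expensive inner reader was only hit once: BufReader fetched
    // all 10 bytes on the first call and served the rest from its buffer.
    assert_eq!(buffered.get_ref().calls, 1);
    Ok(())
}
```

Conversely, for one large read the buffer adds a copy for nothing, which is the "may be" caveat above.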
src/main/host/process.rs
Outdated
/// Unlike `ProcessReader::as_ref`, the result's lifetime is bound to that of
/// *this* object, so that we can ensure writes are flushed.
I think this name needs to be updated (there is another reference to this ProcessReader as well).
src/main/utility/byte_queue.rs
Outdated
pub fn push(&mut self, src: &[u8]) {
    // A slice reader should be infallible.
    self.push_from_reader(src).unwrap();
}
Since a &[u8] implements Read and Write, should push_from_reader() just be renamed to push() since it can also accept slices? (And the same for pop().)
Done, though I could go either way on this one. I think the primary disadvantage of removing the slice-based APIs is that a caller with a slice has to reason out that push/pop are actually infallible in that case to safely call unwrap. I added comments as a hint, though, and in practice I don't think we'll ever be calling it with a slice.
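A toy version of the unified API (hypothetical `MiniQueue`, not Shadow's chunked ByteQueue) shows why the slice methods became redundant: `&[u8]` implements `Read` and `&mut Vec<u8>` implements `Write`, so slice callers go through the same generic entry points and can safely `unwrap()`.

```rust
use std::io::{Read, Write};

// Hypothetical mini ByteQueue: push takes any Read, pop writes into any Write.
struct MiniQueue(Vec<u8>);

impl MiniQueue {
    fn push<R: Read>(&mut self, mut src: R) -> std::io::Result<usize> {
        src.read_to_end(&mut self.0)
    }

    fn pop<W: Write>(&mut self, mut dst: W, count: usize) -> std::io::Result<usize> {
        let n = count.min(self.0.len());
        dst.write_all(&self.0[..n])?;
        self.0.drain(..n);
        Ok(n)
    }
}

fn main() {
    let mut q = MiniQueue(Vec::new());
    // A &[u8] slice implements Read; reads from a slice are infallible,
    // so the unwrap() here can never panic.
    q.push(&b"hello"[..]).unwrap();

    // A &mut Vec<u8> implements Write, so popping into a plain buffer works too.
    let mut out = Vec::new();
    q.pop(&mut out, 4).unwrap();
    assert_eq!(out, b"hell".to_vec());
    assert_eq!(q.0, b"o".to_vec());
}
```

The infallibility reasoning lives at the call site, which is the trade-off discussed above.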
stevenengler
left a comment
I still think we should avoid getting the extra byte with bytes.bytes().next() since it would be unexpected, but everything else looks good to me.
stevenengler
left a comment
Looks good! It's nice to see how the API looks. Maybe in the future we could implement Read and Write on the PosixFiles themselves, then we could have a cool sendfile() implementation using std::io::copy() to move data from one descriptor type to another.
src/main/host/descriptor/pipe.rs
Outdated
// the write would block if we could not write any bytes, but were asked to
if usize::from(num_read) == 0 && bytes.stream_len_bp()? != 0 {
0e84a2d to 28b38af
u8 slices implement `Read` and `Write`, so those still work. When the caller has a `Read` or `Write` object that *isn't* a buffer though, this allows the ByteQueue to read/write it directly without an intermediate buffer. Also avoid unnecessary initialization of internal buffers, using `Read::read_to_end` to read directly into uninitialized memory.
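The "read directly into uninitialized memory" part of the commit message relies on a standard-library property: `Read::read_to_end` appends into a `Vec`'s spare capacity rather than requiring a pre-zeroed buffer. A small illustration:

```rust
use std::io::Read;

fn main() -> std::io::Result<()> {
    // read_to_end appends to the Vec, filling spare capacity directly rather
    // than requiring the caller to zero-initialize a buffer first; reserving
    // up front avoids both the memset and reallocations.
    let mut buf: Vec<u8> = Vec::with_capacity(64);
    let n = (&b"abc"[..]).read_to_end(&mut buf)?;
    assert_eq!(n, 3);
    assert_eq!(buf, b"abc".to_vec());
    assert!(buf.capacity() >= 64);
    Ok(())
}
```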
* Adds ProcessMemoryReader and ProcessMemoryWriter. These implement the `Read` and `Write` traits respectively, as well as `Seek`. * Removes the previous memory access APIs from Process. * Updates Rust syscall handlers to use the new APIs, removing an extra copy to an intermediate buffer in pipe reads and writes.
28b38af to f0408e6
Maybe! Do we ever need to copy directly from one descriptor to another? Note that std::io::copy adds an extra copy through an intermediate buffer; I don't think there's a way to avoid that given only a generic Read.
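The sendfile()-style idea floated in the review can be sketched in miniature: if descriptors implemented `Read` and `Write`, `std::io::copy` would shuttle bytes between them, at the cost of staging data through `copy`'s own internal buffer (the extra copy noted in the conversation). The slice source and `Vec` sink below are stand-ins for two descriptors.

```rust
use std::io::copy;

fn main() -> std::io::Result<()> {
    // Stand-ins for a source and destination descriptor.
    let mut src = &b"pipe payload"[..];
    let mut dst: Vec<u8> = Vec::new();

    // copy() drains any Read into any Write, buffering internally; that
    // internal buffer is the unavoidable extra copy for a generic Read.
    let n = copy(&mut src, &mut dst)?;
    assert_eq!(n, 12);
    assert_eq!(dst, b"pipe payload".to_vec());
    Ok(())
}
```

Avoiding that intermediate buffer would need a specialized path (like the kernel's sendfile) rather than the generic trait interface.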
* Adds ProcessMemoryReader and ProcessMemoryWriter, which implement std::io::Read and std::io::Write. ProcessMemoryWriter ensures writes are flushed: the Drop trait on this object flushes writes, and the object must be dropped to release a mutable reference to its Process.
* Adds methods to ByteQueue that take Read and Write objects.
* Makes ByteQueue use reserved but unfilled buffer capacity to avoid unnecessary initialization.
* Updates syscall handlers to use Read and Write objects instead of buffers.