-
Notifications
You must be signed in to change notification settings - Fork 697
Description
I think it may be unsound for Syscall to send data from the host to guest via to_guest: &mut [u32], when the pointer given by the guest is to an uninitialized allocation. I think use of sys_read or sys_read_words with uninitialized buffers could trigger UB. I think it should instead be &mut [MaybeUninit<u32>].
My motivation for this comes from wanting to implement Read::read_buf for the RISC Zero version of std::io::Stdin (tracked in rust-lang/rust#136756).
If this sounds good, I could submit a PR here similar to what I did for the Hermit unikernel in hermit-os/kernel#1606 (Use uninitialized buffers for read and recvfrom). However, since each file descriptor is associated with a dyn Read and Read::read_buf is currently unstable, this might not be possible.
On the other hand, Rust tentatively does not have full recursive validity for references, so as long as uninitialized data is not read, you might be fine. @RalfJung?
Steps to reproduce
This patch to std, which implements Read::read_buf for RISC Zero stdin, would encounter this issue.
Since I know little about RISC Zero, here's my understanding of the relevant bits, so you can correct me if I'm wrong:
Trace from the C FFI to the underlying Read::read calls
-
Guest code invokes a function in the C FFI, passing a buffer via a
*mut u8andusizelength:risc0/risc0/zkvm/platform/src/syscall.rs
Line 482 in 711b163
pub unsafe extern "C" fn sys_read(fd: u32, recv_ptr: *mut u8, nread: usize) -> usize { -
It invokes a syscall via
ecall, passing the pointer and length bya0anda1registers and :risc0/risc0/zkvm/platform/src/syscall.rs
Lines 584 to 590 in 711b163
syscall_2( nr::SYS_READ, recv_ptr, min(nwords_remain, MAX_BUF_WORDS), fd, chunk_len, ) risc0/risc0/zkvm/platform/src/syscall.rs
Lines 236 to 251 in 711b163
::core::arch::asm!( "ecall", in("t0") $crate::syscall::ecall::SOFTWARE, inlateout("a0") from_host => a0, inlateout("a1") from_host_words => a1, in("a2") syscall.as_ptr() $(,in("a3") $a0 $(,in("a4") $a1 $(,in("a5") $a2 $(,in("a6") $a3 $(,in("a7") $a4 )? )? )? )? )?); -
It is somehow transmitted to the host. I'm fuzzy here. I assume the guest and host are on the same system?
-
The host resolves the syscall
risc0/risc0/zkvm/src/host/server/exec/syscall/mod.rs
Lines 197 to 199 in 711b163
pub(crate) fn get_syscall(&self, name: &str) -> Option<&Rc<RefCell<(dyn Syscall + 'a)>>> { self.inner.get(name) } from the table:
.with_syscall(SYS_READ, SysRead) -
And calls the appropriate
impl Syscall:risc0/risc0/zkvm/src/host/server/exec/syscall/posix_io.rs
Lines 26 to 32 in 711b163
impl Syscall for SysRead { fn syscall( &mut self, _syscall: &str, ctx: &mut dyn SyscallContext, to_guest: &mut [u32], ) -> Result<(u32, u32)> { which casts
to_guestto a&mut [u8], adjusting the length to be in bytes:let to_guest_u8 = bytemuck::cast_slice_mut(to_guest); -
It then resolves
std::io::Readinstance for the given fd, which is either astd::io::Cursor<Vec<u8>>for stdinnew.with_read_fd(fileno::STDIN, Cursor::new(vec![])) or some other
dyn Read:risc0/risc0/zkvm/src/host/client/posix_io.rs
Lines 62 to 64 in 711b163
pub fn with_read_fd(&mut self, fd: u32, reader: impl Read + 'a) -> &mut Self { self.with_shared_read_fd(fd, Rc::new(RefCell::new(reader))) } -
It then calls
std::io::Read::readuntil enough of the buffer is filled:risc0/risc0/zkvm/src/host/server/exec/syscall/posix_io.rs
Lines 50 to 61 in 711b163
let read_all = |mut buf: &mut [u8]| -> Result<usize> { let mut tot_nread = 0; while !buf.is_empty() { let nread = reader.borrow_mut().read(buf)?; if nread == 0 { break; } tot_nread += nread; (_, buf) = buf.split_at_mut(nread); } Ok(tot_nread) }; This API expects the buffer to be fully initialized.