Today, @Nieuwejaar came to me with a very perplexing kernel panic.
debugging crash dump vmcore.1 (64-bit) from rust
operating system: 5.11 xde-0-g7b420ea31b (i86pc)
build version: heads/xde-0-g7b420ea31b-dirty
image uuid: 7389e819-d240-cd85-b776-fc1253d5a82b
panic message: copyin: kaddr argument below kernelbase
dump content: kernel pages only
fffffe007eb237e8 vpanic()
fffffe007eb23810 copyin+0x30()
fffffe007eb23840 ddi_copyin+0x21(1, 1, 0, 202003)
fffffe007eb23920 list_ports_hdlr+0x61()
fffffe007eb23c00 xde_dld_ioc_opte_cmd+0x56()
fffffe007eb23cb0 drv_ioctl+0x172(3500000000, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08)
fffffe007eb23cf0 cdev_ioctl+0x3f(3500000000, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08)
fffffe007eb23d40 spec_ioctl+0x55(fffffe59d6191d00, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08, 0)
fffffe007eb23dd0 fop_ioctl+0x40(fffffe59d6191d00, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08, 0)
fffffe007eb23ef0 ioctl+0x144(a, de001e61, fffffc7fffdfb8d0)
fffffe007eb23f00 sys_syscall+0x283()
Notice the first two arguments to ddi_copyin() -- those are supposed to be a userspace pointer and kernel pointer. Last I checked 0x1 does not a pointer make. So what the hell is happening?
All OPTE commands have a request and a response. When an ioctl/cmd is performed, the xde code tries to copyin the request bytes from userspace and deserialize them to a request struct.
let mut bytes = Vec::with_capacity(self.ioctl.req_len);
let ret = unsafe {
ddi::ddi_copyin(
self.ioctl.req_bytes as *const c_void,
bytes.as_mut_ptr() as *mut c_void,
self.ioctl.req_len,
self.mode,
)
};
However, some of these requests are ZSTs.
#[derive(Debug, Deserialize, Serialize)]
pub struct ListPortsReq {
pub unused: (),
}
In those cases we never do anything with the copied request.
#[no_mangle]
fn list_ports_hdlr(
env: &mut IoctlEnvelope,
) -> Result<ListPortsResp, OpteError> {
let _req: ListPortsReq = env.copy_in_req()?;
But many people have been using OPTE without any problem. So what gives. Well, there's a difference in behavior between ddi_copyin() in non-DEBUG vs DEBUG kernels.
.copyin_panic_msg:
.string "copyin: kaddr argument below kernelbase"
<... SNIP ...>
ENTRY(copyin)
pushq %rbp
movq %rsp, %rbp
subq $24, %rsp
/*
* save args in case we trap and need to rerun as a copyop
*/
movq %rdi, (%rsp)
movq %rsi, 0x8(%rsp)
movq %rdx, 0x10(%rsp)
movq kernelbase(%rip), %rax
#ifdef DEBUG
cmpq %rax, %rsi /* %rsi = kaddr */
jnb 1f
leaq .copyin_panic_msg(%rip), %rdi
xorl %eax, %eax
call panic
1:
#endif
If I look at an opteadm list-ports request on a non-DEBUG kernel I see the following:
rpz@kalm:~/opte-dtrace$ pfexec ./opte-trace opte-ioctl.d
opte_cmd_ioctl_t {
uint64_t api_version = 0xc
int cmd = 0x1
uint64_t flags = 0
uint64_t reserved = 0
char *req_bytes = 0x1
size_t req_len = 0
char *resp_bytes = 0x1378010
size_t resp_len = 0x4000
size_t resp_len_actual = 0
}
ddi_copyin(1, 1, 0, 0x202003) => 0
So it seems ddi_copyin() doesn't care about the bogus kernel address on non-DEBUG. I'm not sure if that should be the case. I wonder if it should panic in non-DEBUG or return a -1 (and bump a kstat)? Though, it is a zero-byte copy, so does it really matter what value was passed for the addresses?
So anyways, this doesn't answer the question of where the address value of 0x1 is coming from. Well that has to do with the fact that this request is a ZST. Which means we end up doing the following:
let mut bytes = Vec::with_capacity(0);
let ret = unsafe {
ddi::ddi_copyin(
self.ioctl.req_bytes as *const c_void,
bytes.as_mut_ptr() as *mut c_void,
self.ioctl.req_len,
self.mode,
)
};
If you look in the RawVec code in the alloc lib you'll see this creates a new struct with a dangling pointer. And that dangling pointer ends up aligned to 1 for a ZST.
I believe the fix here is to not lean on ZSTs like this and instead be more explicit with commands that have no request body by not calling ddi_copyin at all in those cases.
Today, @Nieuwejaar came to me with a very perplexing kernel panic.
Notice the first two arguments to
ddi_copyin()-- those are supposed to be a userspace pointer and kernel pointer. Last I checked0x1does not a pointer make. So what the hell is happening?All OPTE commands have a request and a response. When an ioctl/cmd is performed, the
xdecode tries to copyin the request bytes from userspace and deserialize them to a request struct.However, some of these requests are ZSTs.
In those cases we never do anything with the copied request.
But many people have been using OPTE without any problem. So what gives. Well, there's a difference in behavior between
ddi_copyin()in non-DEBUG vs DEBUG kernels.If I look at an
opteadm list-portsrequest on a non-DEBUG kernel I see the following:So it seems
ddi_copyin()doesn't care about the bogus kernel address on non-DEBUG. I'm not sure if that should be the case. I wonder if it should panic in non-DEBUG or return a-1(and bump a kstat)? Though, it is a zero-byte copy, so does it really matter what value was passed for the addresses?So anyways, this doesn't answer the question of where the address value of
0x1is coming from. Well that has to do with the fact that this request is a ZST. Which means we end up doing the following:If you look in the
RawVeccode in the alloc lib you'll see this creates a new struct with a dangling pointer. And that dangling pointer ends up aligned to1for a ZST.I believe the fix here is to not lean on ZSTs like this and instead be more explicit with commands that have no request body by not calling
ddi_copyinat all in those cases.