xsk: do not discard packet when NETDEV_TX_BUSY#72
Closed
kernel-patches-bot wants to merge 1 commit intobpffrom
Closed
xsk: do not discard packet when NETDEV_TX_BUSY#72kernel-patches-bot wants to merge 1 commit intobpffrom
kernel-patches-bot wants to merge 1 commit intobpffrom
Conversation
Author
|
Master branch: 642e450 Pull request is NOT updated. Failed to apply https://patchwork.ozlabs.org/project/netdev/patch/1600257625-2353-1-git-send-email-magnus.karlsson@gmail.com/, error message was: |
Author
|
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=202227 irrelevant now. Closing PR. |
kernel-patches-bot
pushed a commit
that referenced
this pull request
Nov 2, 2021
The test case btrfs/238 reports the warning below: WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs] CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_show_devname+0x108/0x1b4 [btrfs] show_mountinfo+0x234/0x2c4 m_show+0x28/0x34 seq_read_iter+0x12c/0x3c4 vfs_read+0x29c/0x2c8 ksys_read+0x80/0xec __arm64_sys_read+0x28/0x34 invoke_syscall+0x50/0xf8 do_el0_svc+0x88/0x138 el0_svc+0x2c/0x8c el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x198/0x19c Reason: While btrfs_prepare_sprout() moves the fs_devices::devices into fs_devices::seed_list, the btrfs_show_devname() searches for the devices and found none, leading to the warning as in above. Fix: latest_dev is updated according to the changes to the device list. That means we could use the latest_dev->name to show the device name in /proc/self/mounts, the pointer will be always valid as it's assigned before the device is deleted from the list in remove or replace. The RCU protection is sufficient as the device structure is freed after synchronization. Reported-by: Su Yue <l@damenly.su> Tested-by: Su Yue <l@damenly.su> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 16, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 16, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 16, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 17, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 17, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 17, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 17, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 17, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 18, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 19, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 21, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 23, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 24, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 29, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 31, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
kernel-patches-bot
pushed a commit
that referenced
this pull request
Mar 31, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number. Therefore, this patch tries to
convert the offsets.
The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.
FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.
Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.
For example, for the following bpftrace command:
bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
Without this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: mov x25, sp
1c: mov x26, #0x0 // #0
20: bti j
24: sub sp, sp, #0x90
28: add x19, x0, #0x0
2c: mov x0, #0x0 // #0
30: mov x10, #0xffffffffffffff78 // #-136
34: str x0, [x25, x10]
38: mov x10, #0xffffffffffffff80 // #-128
3c: str x0, [x25, x10]
40: mov x10, #0xffffffffffffff88 // #-120
44: str x0, [x25, x10]
48: mov x10, #0xffffffffffffff90 // #-112
4c: str x0, [x25, x10]
50: mov x10, #0xffffffffffffff98 // #-104
54: str x0, [x25, x10]
58: mov x10, #0xffffffffffffffa0 // #-96
5c: str x0, [x25, x10]
60: mov x10, #0xffffffffffffffa8 // #-88
64: str x0, [x25, x10]
68: mov x10, #0xffffffffffffffb0 // #-80
6c: str x0, [x25, x10]
70: mov x10, #0xffffffffffffffb8 // #-72
74: str x0, [x25, x10]
78: mov x10, #0xffffffffffffffc0 // #-64
7c: str x0, [x25, x10]
80: mov x10, #0xffffffffffffffc8 // #-56
84: str x0, [x25, x10]
88: mov x10, #0xffffffffffffffd0 // #-48
8c: str x0, [x25, x10]
90: mov x10, #0xffffffffffffffd8 // #-40
94: str x0, [x25, x10]
98: mov x10, #0xffffffffffffffe0 // #-32
9c: str x0, [x25, x10]
a0: mov x10, #0xffffffffffffffe8 // #-24
a4: str x0, [x25, x10]
a8: mov x10, #0xfffffffffffffff0 // #-16
ac: str x0, [x25, x10]
b0: mov x10, #0xfffffffffffffff8 // #-8
b4: str x0, [x25, x10]
b8: mov x10, #0x8 // #8
bc: ldr x2, [x19, x10]
[...]
With this patch, jited code(fragment):
0: bti c
4: stp x29, x30, [sp, #-16]!
8: mov x29, sp
c: stp x19, x20, [sp, #-16]!
10: stp x21, x22, [sp, #-16]!
14: stp x25, x26, [sp, #-16]!
18: stp x27, x28, [sp, #-16]!
1c: mov x25, sp
20: sub x27, x25, #0x88
24: mov x26, #0x0 // #0
28: bti j
2c: sub sp, sp, #0x90
30: add x19, x0, #0x0
34: mov x0, #0x0 // #0
38: str x0, [x27]
3c: str x0, [x27, #8]
40: str x0, [x27, #16]
44: str x0, [x27, #24]
48: str x0, [x27, #32]
4c: str x0, [x27, #40]
50: str x0, [x27, #48]
54: str x0, [x27, #56]
58: str x0, [x27, #64]
5c: str x0, [x27, #72]
60: str x0, [x27, #80]
64: str x0, [x27, #88]
68: str x0, [x27, #96]
6c: str x0, [x27, #104]
70: str x0, [x27, #112]
74: str x0, [x27, #120]
78: str x0, [x27, #128]
7c: ldr x2, [x19, #8]
[...]
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Aug 30, 2023
More than 8 whitespaces of the code indent are replaced with "tab +
whitespaces" to fix up such errors reported by scripts/checkpatch.pl:
ERROR: code indent should use tabs where possible
#64: FILE: tools/include/nolibc/arch-mips.h:64:
+^I \$
ERROR: code indent should use tabs where possible
#72: FILE: tools/include/nolibc/arch-mips.h:72:
+^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$
This command is used:
$ sed -i -e '/^\t* /{s/ /\t/g}' tools/include/nolibc/arch-*.h
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 1, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
musamaanjum
pushed a commit
to musamaanjum/bpf
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> --- Changes since v2: - Replace test_dev_cgroup with serial_test_dev_cgroup as there is probability that the test is racing against another cgroup test - Minor changes to the commit message above I've tested the patch with vmtest.sh on bpf-next/for-next and linux next. It is passing on both. Not sure why it was failed on BPFCI. Test run with vmtest.h: sudo LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh ./test_progs -t dev_cgroup ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000403432 s, 81.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#69 dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Changes since v1: - Rename file from test_dev_cgroup.c to dev_cgroup.c - Use ASSERT_* in-place of CHECK
musamaanjum
pushed a commit
to musamaanjum/bpf
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> --- Changes since v2: - Replace test_dev_cgroup with serial_test_dev_cgroup as there is probability that the test is racing against another cgroup test - Minor changes to the commit message above I've tested the patch with vmtest.sh on bpf-next/for-next and linux next. It is passing on both. Not sure why it was failed on BPFCI. Test run with vmtest.h: sudo LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh ./test_progs -t dev_cgroup ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000403432 s, 81.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#69 dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Changes since v1: - Rename file from test_dev_cgroup.c to dev_cgroup.c - Use ASSERT_* in-place of CHECK
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Apr 2, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted #72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
musamaanjum
pushed a commit
to musamaanjum/bpf
that referenced
this pull request
May 3, 2024
Move test_dev_cgroup.c to prog_tests/dev_cgroup.c to be able to run it with test_progs. Replace dev_cgroup.bpf.o with skel header file, dev_cgroup.skel.h and load program from it accourdingly. ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000856684 s, 38.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#72 test_dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> --- Changes since v2: - Replace test_dev_cgroup with serial_test_dev_cgroup as there is probability that the test is racing against another cgroup test - Minor changes to the commit message above I've tested the patch with vmtest.sh on bpf-next/for-next and linux next. It is passing on both. Not sure why it was failed on BPFCI. Test run with vmtest.h: sudo LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh ./test_progs -t dev_cgroup ./test_progs -t dev_cgroup mknod: /tmp/test_dev_cgroup_null: Operation not permitted 64+0 records in 64+0 records out 32768 bytes (33 kB, 32 KiB) copied, 0.000403432 s, 81.2 MB/s dd: failed to open '/dev/full': Operation not permitted dd: failed to open '/dev/random': Operation not permitted kernel-patches#69 dev_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Changes since v1: - Rename file from test_dev_cgroup.c to dev_cgroup.c - Use ASSERT_* in-place of CHECK
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Aug 23, 2024
This patch fixes a tailcall infinite loop issue that freplace prog tail calls its target prog and attaches to the target prog's subprog, since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"). The commit allows freplace prog tail calls its target prog. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. It will cause this panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, tail_call_cnt_ptr propagates to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr propagates to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. Furthermore, trampoline is required to propagate tail_call_cnt/tail_call_cnt_ptr always, no matter whether there is tailcall at run time. So, it reuses trampoline flag "BIT(7)" to tell trampoline to propagate it, as BPF_TRAMP_F_TAIL_CALL_CTX is not used by any other arch BPF JIT. However, the bad effect is that it requires initializing tail_call_cnt at prologue always no matter whether it's tail_call_reachable, because it is unable to confirm its subprog[s] whether to be attached by freplace prog. And, when call subprog, tail_call_cnt_ptr is required to be propagated always. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Aug 23, 2024
This patch fixes a tailcall infinite loop issue that freplace prog tail calls its target prog and attaches to the target prog's subprog, since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"). The commit allows freplace prog tail calls its target prog. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. It will cause this panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, tail_call_cnt_ptr propagates to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr propagates to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. Furthermore, trampoline is required to propagate tail_call_cnt/tail_call_cnt_ptr always, no matter whether there is tailcall at run time. So, it reuses trampoline flag "BIT(7)" to tell trampoline to propagate it, as BPF_TRAMP_F_TAIL_CALL_CTX is not used by any other arch BPF JIT. However, the bad effect is that it requires initializing tail_call_cnt at prologue always no matter whether it's tail_call_reachable, because it is unable to confirm its subprog[s] whether to be attached by freplace prog. And, when call subprog, tail_call_cnt_ptr is required to be propagated always. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Aug 24, 2024
This patch fixes a tailcall infinite loop issue that freplace prog tail calls its target prog and attaches to the target prog's subprog, since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"). The commit allows freplace prog tail calls its target prog. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. It will cause this panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, tail_call_cnt_ptr propagates to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr propagates to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. Furthermore, trampoline is required to propagate tail_call_cnt/tail_call_cnt_ptr always, no matter whether there is tailcall at run time. So, it reuses trampoline flag "BIT(7)" to tell trampoline to propagate it, as BPF_TRAMP_F_TAIL_CALL_CTX is not used by any other arch BPF JIT. However, the bad effect is that it requires initializing tail_call_cnt at prologue always no matter whether it's tail_call_reachable, because it is unable to confirm its subprog[s] whether to be attached by freplace prog. And, when call subprog, tail_call_cnt_ptr is required to be propagated always. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Aug 25, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), Freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. Furthermore, trampoline is required to propagate tail_call_cnt/tail_call_cnt_ptr always, no matter whether there is tailcall at run time. So, it reuses trampoline flag "BIT(7)" to tell trampoline to propagate the tail_call_cnt/tail_call_cnt_ptr, as BPF_TRAMP_F_TAIL_CALL_CTX is not used by any other arch BPF JIT. However, the bad effect is that it requires initializing tail_call_cnt at prologue always no matter whether the prog is tail_call_reachable, because it is unable to confirm itself or its subprog[s] whether to be attached by freplace prog. And, when call subprog, tail_call_cnt_ptr is required to be propagated to subprog always. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Aug 25, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. Furthermore, trampoline is required to propagate tail_call_cnt/tail_call_cnt_ptr always, no matter whether there is tailcall at run time. So, it reuses trampoline flag "BIT(7)" to tell trampoline to propagate the tail_call_cnt/tail_call_cnt_ptr, as BPF_TRAMP_F_TAIL_CALL_CTX is not used by any other arch BPF JIT. However, the bad effect is that it requires initializing tail_call_cnt at prologue always no matter whether the prog is tail_call_reachable, because it is unable to confirm itself or its subprog[s] whether to be attached by freplace prog. And, when call subprog, tail_call_cnt_ptr is required to be propagated to subprog always. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Sep 1, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Sep 1, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Asphaltt
added a commit
to Asphaltt/bpf
that referenced
this pull request
Sep 1, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty kernel-patches#72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 1, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 2, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 2, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 4, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 4, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 4, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
kernel-patches-daemon-bpf bot
pushed a commit
that referenced
this pull request
Sep 4, 2024
This patch fixes a tailcall infinite loop issue caused by freplace. Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog is allowed to tail call its target prog. Then, when a freplace prog attaches to its target prog's subprog and tail calls its target prog, kernel will panic. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_bpf2bpf_hierarchy_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; subprog_tail(skb); subprog_tail(skb); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in subprog_tail is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace -> subprog_tail --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] </#DF> [ 15.310490] <TASK> [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] </TASK> [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch fixes the issue by initializing tail_call_cnt at the prologue of entry_tc. Next, when call subprog_tc, the tail_call_cnt_ptr is propagated to subprog_tc by rax. Next, when jump to entry_freplace, the tail_call_cnt_ptr will be reused to count tailcall in freplace prog. Next, when call subprog_tail, the tail_call_cnt_ptr is propagated to subprog_tail by rax. Next, while tail calling to entry_tc, the tail_call_cnt on the stack of entry_tc increments via the tail_call_cnt_ptr. The whole procedure shows as the following JITed prog dumping. bpftool p d j n entry_tc: int entry_tc(struct __sk_buff * skb): bpf_prog_1c515f389a9059b4_entry_tc: ; return subprog_tc(skb); 0: endbr64 4: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: movq -16(%rbp), %rax ;; rax = tcc_ptr 29: callq 0x70 ;; call subprog_tc() ; return subprog_tc(skb); 2e: leave 2f: retq int subprog_tc(struct __sk_buff * skb): bpf_prog_1e8f76e2374a0607_subprog_tc: ; return skb->len * 2; 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: jmp 0x108 ;; jump to entry_freplace() c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax 15: pushq %rax 16: movl 112(%rdi), %eax ; return skb->len * 2; 19: shll %eax ; return skb->len * 2; 1b: leave 1c: retq bpftool p d j n entry_freplace: int entry_freplace(struct __sk_buff * skb): bpf_prog_85781a698094722f_entry_freplace: ; int entry_freplace(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: pushq %r13 ;; callee saved 25: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 28: movabsq $-123406219759616, %r13 32: movl (%r13), %edi 36: addl $1, %edi 39: movl %edi, (%r13) ; subprog_tail(skb); 3d: movq %rbx, %rdi ;; rdi = skb 40: movq -16(%rbp), %rax ;; rax = tcc_ptr 47: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 4c: movq %rbx, %rdi ;; rdi = skb 4f: movq -16(%rbp), %rax ;; rax = tcc_ptr 56: callq 0x80 ;; call subprog_tail() ; return count; 5b: movl (%r13), %eax ; return count; 5f: popq %r13 61: popq %rbx 62: leave 63: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax) ;; do not touch tail_call_cnt 7: nopl (%rax,%rax) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-128494642337280, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: nopl (%rax,%rax) ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: jmp 0xfffffffffffffe18 ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq As a result, the tail_call_cnt is stored on the stack of entry_tc. And the tail_call_cnt_ptr is propagated between subprog_tc, entry_freplace, subprog_tail and entry_tc. But wait, what if entry_freplace has tailcall and entry_tc has no tailcall? It's to disallow attaching this entry_freplace to this entry_tc in verifier. And, what if entry_freplace has tailcall and entry_tc has tailcall and entry_freplace attaches to entry_tc? In this patch, the tailcall info of entry_freplace inherits from its target. Therefore, it swaps the positions of nop5 and xor/nop3 in order to initialize tail_call_cnt at the prologue of entry_tc and then propagate the tail_call_cnt to entry_freplace. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull request for series with
subject: xsk: do not discard packet when NETDEV_TX_BUSY
version: 5
url: https://patchwork.ozlabs.org/project/netdev/list/?series=202227