
Introduce {attach|detach}_kfunc API#2726

Merged
yonghong-song merged 8 commits into iovisor:master from olsajiri:kfunc
Feb 27, 2020

Conversation

@olsajiri
Contributor

The kernel added a new probe type called trampoline, allowing probing of
almost any kernel function when BTF info is available in the system.

Adding the interface under the 'kfunc' name, which probes function entry,
and the 'kretfunc' name, which traces function return.

The main benefit is the very low overhead of trampolines (please see the
commit for klockstat.py with a perf comparison).

Another benefit is the ability of the kretfunc probe to access function
arguments, so some tools may need only one program instead of separate
entry/exit ones (please see the commit for the opensnoop.py changes).
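As a hedged sketch (not code from this PR), a tool using the new probes might look like the following. It assumes a BTF-enabled kernel, root privileges, and the KRETFUNC_PROBE macro and kretfunc__ autoload prefix that this PR introduces; the BPF text and the run() helper are illustrative only:

```python
# Sketch of using the new kretfunc probe from a bcc tool. Only the
# KRETFUNC_PROBE macro and the kretfunc__ autoload prefix come from this PR;
# everything else here is an illustrative stand-in.
bpf_text = r"""
KRETFUNC_PROBE(do_sys_open, int dfd, const char *filename, int flags,
               int mode, int ret)
{
    /* unlike a kretprobe, the entry arguments are visible here too */
    bpf_trace_printk("open ret=%d\n", ret);
    return 0;
}
"""

def run():
    # deferred import: loading needs bcc installed, root, and a BTF kernel
    from bcc import BPF
    b = BPF(text=bpf_text)  # kretfunc__do_sys_open is attached automatically
    b.trace_print()
```

Call run() as root on a BTF-enabled kernel; on older kernels the load fails, which is what the buildbot discussion below is about.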

@yonghong-song
Collaborator

[buildbot, ok to test]

@yonghong-song
Collaborator

[buildbot, test this please]

@yonghong-song
Collaborator

All tests failed. The buildbot machines are pretty old; the latest one in the buildbot fleet is fc28, which I do not think supports BTF. For example, the following is the error message:

29: ....../bin/sh: line 1: 11657 Killed                  timeout -s KILL -k 5s 5s ../../tools/killsnoop.py > /dev/null
29: .s........WARNING: 15 stack traces lost and could not be displayed.
29: ..libbpf: failed to find valid kernel BTF
29: libbpf: vmlinux BTF is not found
29: /virtual/main.c:27:1: error: redefinition of 'infotmp_table_t'
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: /virtual/main.c:24:1: note: previous definition is here
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: /virtual/main.c:27:10: error: redefinition of 'infotmp'
29: BPF_HASH(infotmp, u64, struct val_t);
29:          ^
29: /virtual/main.c:24:10: note: previous definition is here
29: BPF_HASH(infotmp, u64, struct val_t);
29:          ^
29: /virtual/main.c:27:1: error: redefinition of '____btf_map_infotmp'
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: /virtual/main.c:24:1: note: previous definition is here
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: /virtual/main.c:27:1: error: redefinition of '____btf_map_infotmp'
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: /virtual/main.c:24:1: note: previous definition is here
29: BPF_HASH(infotmp, u64, struct val_t);
29: ^
29: /virtual/include/bcc/helpers.h:176:37: note: expanded from macro 'BPF_HASH'
29:   BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
29:                                     ^
29: 4 errors generated.

We will need to guard the new test with the kernel version and the availability of kernel BTF. Not all distros have kernel BTF, I guess.
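One possible guard (a sketch, not necessarily the check the tests ended up with): kernels built with CONFIG_DEBUG_INFO_BTF=y expose their BTF at /sys/kernel/btf/vmlinux, so the kfunc tests could be skipped when that file is absent:

```python
import os

def kernel_btf_available():
    """Return True if the running kernel exposes vmlinux BTF.

    Kernels built with CONFIG_DEBUG_INFO_BTF=y export their type info
    at /sys/kernel/btf/vmlinux; without it, kfunc probes cannot work.
    """
    return os.path.exists("/sys/kernel/btf/vmlinux")
```

A test harness could then do `if not kernel_btf_available(): skip(...)` before running the kfunc tests.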

@olsajiri
Contributor Author

ok, I expected some failures, but not that bad.. will check on some BTF detection for tests

"""

bpf_text_kprobe = """
BPF_HASH(infotmp, u64, struct val_t);
Collaborator


This BPF_HASH is redundant and will cause a compilation error in the kprobe case.

Contributor Author


ugh, I must have confused my versions and pushed the wrong one;
I recall fixing this, sorry.. will fix in the next version

@yonghong-song
Collaborator

@olsajiri any update on this one?

@brendangregg
Member

Having the kretfunc retval in args[N] might get pretty confusing. Could it have a separate variable called retval for this?

(I was going to suggest putting retval in args[0] and bumping the others along by one, but that would make it inconsistent with kfunc.)

@yonghong-song
Collaborator

Having the kretfunc retval in args[N] might get pretty confusing. Could it have a separate variable called retval for this?
I think we could design an interface with the return value as a separate variable; the rewriter can kick in to generate the proper code. We also need the function signature, and we already have BTF for that, so we could parse it and get the number of arguments for the particular function.

@olsajiri
Contributor Author

olsajiri commented Feb 6, 2020

Having the kretfunc retval in args[N] might get pretty confusing. Could it have a separate variable called retval for this?
I think we could design an interface with the return value as a separate variable; the rewriter can kick in to generate the proper code. We also need the function signature, and we already have BTF for that, so we could parse it and get the number of arguments for the particular function.

sorry, I'm out this week, hence my lag.. I think you're right, we can add a better interface
for the arguments.. perhaps even with the argument names, because we have them in the BTF.
I'll check on this

jirka

@yonghong-song
Collaborator

Oh we could adopt libbpf convention as well.

__u64 test4_result = 0;
SEC("fexit/bpf_fentry_test4")
int BPF_PROG(test4, void *a, char b, int c, __u64 d, int ret)

Some macros would do, so we do not need to add BTF analysis code, although that is nice to have for double checking.

In the long term, I intend to change bcc to be more libbpf friendly, so we can get a lot of libbpf functionality for free.

@olsajiri
Contributor Author

Oh we could adopt libbpf convention as well.

__u64 test4_result = 0;
SEC("fexit/bpf_fentry_test4")
int BPF_PROG(test4, void *a, char b, int c, __u64 d, int ret)

Some macros would do, so we do not need to add BTF analysis code, although that is nice to have for double checking.

In the long term, I intend to change bcc to be more libbpf friendly, so we can get a lot of libbpf functionality for free.

ok, that might be actually easier.. will check

thanks,
jirka

@olsajiri
Contributor Author

@yonghong-song actually, while working on bpftrace support,
I realized we don't need to pass expected_attach_type through
the bcc_func_load chain; we can get it from the function name
prefix (kfunc__, kretfunc__).

If this kind of interface is ok with you (getting the attr info from
the function name), I'll make the changes. It will ease bpftrace
integration, because we could continue to use bcc_prog_load.
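The prefix dispatch described above can be sketched as follows. The helper name is hypothetical, and the attach-type strings are illustrative stand-ins for the kernel's BPF_TRACE_FENTRY/BPF_TRACE_FEXIT expected_attach_type values:

```python
# Sketch of deriving the attach type from the program name prefix, as
# described above. "fentry"/"fexit" stand in for the kernel's
# BPF_TRACE_FENTRY/BPF_TRACE_FEXIT values; the helper is hypothetical.
PREFIXES = {
    "kfunc__": "fentry",    # probe on function entry
    "kretfunc__": "fexit",  # probe on function return
}

def attach_info_from_name(fn_name):
    """Map e.g. 'kretfunc__do_sys_open' to ('fexit', 'do_sys_open')."""
    for prefix, attach_type in PREFIXES.items():
        if fn_name.startswith(prefix):
            return attach_type, fn_name[len(prefix):]
    return None, fn_name  # not a kfunc program; load as usual
```

This is why bcc_prog_load can stay unchanged: the caller only needs the program name to recover the attach information.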

We need to use the LIBBPF_LIBRARIES variable instead of just 'bpf'
for libbpf linking, so that a user-specified library gets linked in.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Currently we include headers from the local libbpf subpackage,
which does not work if the user specifies LIBBPF_INCLUDE_DIR.

Adding a HAVE_EXTERNAL_LIBBPF macro that gets defined when the
user specifies a libbpf header path. This macro is checked
in bcc_libbpf_inc.h and the proper files are included.

Using bcc_libbpf_inc.h in place of libbpf includes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The TRACING/EXT programs use the attach_btf_id and attach_prog_fd
fields from struct bpf_load_program_attr.

The attach_prog_fd field shares space with kern_version,
so by setting kern_version unconditionally we also set
attach_prog_fd to a bogus value, and the kernel fails the load
because it tries to look that fd up.

Setting kern_version only for programs other than the TRACING/EXT
types.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The kernel added a new probe type called trampoline, allowing probing
of almost any kernel function when BTF info is available in
the system.

Adding an interface to define a trampoline program for a given
kernel function via the KFUNC_PROBE and KRETFUNC_PROBE macros, like:

  KFUNC_PROBE(do_sys_open, int dfd, const char *filename, int flags, int mode)
  {
    ...
  }

which defines a trampoline program named 'kfunc__do_sys_open'
that instruments the do_sys_open kernel function before it
executes.

or:

  KRETFUNC_PROBE(do_sys_open, int dfd, const char *filename, int flags, int mode, int ret)
  {
    ...
  }

which defines a trampoline program named 'kretfunc__do_sys_open'
that instruments the do_sys_open kernel function after it
executes.

The main benefit is the much lower overhead of trampolines (please
see the following commit for klockstat.py with a perf comparison).

Another benefit is the ability of the kretfunc probe to access
function arguments, so some tools may need only one program
instead of separate entry/exit ones (please see the following
commit for the opensnoop.py changes).

Currently the interface only allows defining functions named:
  kfunc__<function_name> or kretfunc__<function_name>

which is sufficient for now and can easily be changed in the
future if needed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding a support_kfunc function to the BPF object to report
whether kfunc trampolines are supported.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding kfunc/kretfunc trampoline probes, if available, instead of
kprobe/kretprobe probes.

The return trampoline has access to the function entry arguments,
so a single eBPF program suffices.

The kfunc trampolines are also faster and less intrusive.

Below are stats for compiling the Linux kernel while running
opensnoop.py in the background, for kprobes and kfuncs.

Without opensnoop.py:

 Performance counter stats for 'system wide':

   849,741,782,765      cycles:k

     162.235646336 seconds time elapsed

With opensnoop.py using kprobes:

 Performance counter stats for 'system wide':

   941,615,199,769      cycles:k

     164.355032879 seconds time elapsed

With opensnoop.py using trampolines:

 Performance counter stats for 'system wide':

   913,437,005,488      cycles:k

     163.742914795 seconds time elapsed

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding kfunc/kretfunc trampoline probes, if available, instead of
kprobe/kretprobe probes.

The kfunc trampolines are faster and less intrusive. The benchmark
without the klockstat.py script running in the background:

	$ perf bench sched messaging -l 50000
	# Running 'sched/messaging' benchmark:
	# 20 sender and receiver processes per group
	# 10 groups == 400 processes run

	     Total time: 18.571 [sec]

With kprobe tracing:
	$ perf bench sched messaging -l 50000
	# Running 'sched/messaging' benchmark:
	# 20 sender and receiver processes per group
	# 10 groups == 400 processes run

	     Total time: 183.395 [sec]

With kfunc tracing:
	$ perf bench sched messaging -l 50000
	# Running 'sched/messaging' benchmark:
	# 20 sender and receiver processes per group
	# 10 groups == 400 processes run

	     Total time: 39.773 [sec]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding kfunc/kretfunc trampoline probes, if available, instead of
kprobe/kretprobe probes.

The kfunc trampolines are faster and less intrusive.

Below are stats for running the perf bench sched pipe benchmark
while running vfsstat.py in the background, for kprobes and kfuncs.

With kprobes:
         Performance counter stats for './perf bench sched pipe -l 5000000' (3 runs):

           112,520,853,574      cycles:k

                    48.674 +- 0.672 seconds time elapsed  ( +-  1.38% )

With kfuncs:
         Performance counter stats for './perf bench sched pipe -l 5000000' (3 runs):

           106,304,165,424      cycles:k

                    46.820 +- 0.197 seconds time elapsed  ( +-  0.42% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@yonghong-song
Collaborator

@yonghong-song actually, while working on bpftrace support,
I realized we don't need to pass expected_attach_type through
the bcc_func_load chain; we can get it from the function name
prefix (kfunc__, kretfunc__).

If this kind of interface is ok with you (getting the attr info from
the function name), I'll make the changes. It will ease bpftrace
integration, because we could continue to use bcc_prog_load.

This should work. In bcc, we already have some other usage with special prefixes.

@olsajiri
Contributor Author

@yonghong-song actually, while working on bpftrace support,
I realized we don't need to pass expected_attach_type through
the bcc_func_load chain; we can get it from the function name
prefix (kfunc__, kretfunc__).
If this kind of interface is ok with you (getting the attr info from
the function name), I'll make the changes. It will ease bpftrace
integration, because we could continue to use bcc_prog_load.

This should work. In bcc, we already have some other usage with special prefixes.

cool, changes are already in the branch

@yonghong-song
Collaborator

LGTM. Thanks!

@yonghong-song yonghong-song merged commit 2fa54c0 into iovisor:master Feb 27, 2020
@alban
Contributor

alban commented Mar 18, 2020

I get the following error in opensnoop with Linux 5.5.6. Is it a problem in bcc or in the kernel?

bpf: Failed to load program: Permission denied
0: (bf) r6 = r1
1: (79) r1 = *(u64 *)(r6 +32)
2: (7b) *(u64 *)(r10 -304) = r1
3: (79) r7 = *(u64 *)(r6 +8)
func 'do_sys_open' arg1 type INT is not a struct
invalid bpf_context access off=8 size=8
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Traceback (most recent call last):
  File "/usr/share/bcc/tools/opensnoop", line 235, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 364, in __init__
    self._trace_autoload()
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 1198, in _trace_autoload
    self.attach_kretfunc(fn_name=func_name)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 933, in attach_kretfunc
    fn = self.load_func(fn_name, BPF.TRACING)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 404, in load_func
    (func_name, errstr))
Exception: Failed to load BPF program kretfunc__do_sys_open: Permission denied

@yonghong-song
Collaborator

Maybe this is a kernel issue? I mean, the kernel may not be recent enough to have this patch:
https://www.spinics.net/lists/netdev/msg627180.html

@alban
Contributor

alban commented Mar 19, 2020

Thanks for the pointer. Indeed my kernel does not have this patch. I wonder if BPF.support_kfunc() should return false on kernels without that patch, so that opensnoop can fall back to the previous kprobe code, or if BCC is fine asking users to update their kernels to the latest version.

I just noticed that the new kfunc code does not have the --cgroupmap filter.
Edit: I filed #2826 for that.
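The fallback suggested above could be sketched like this; the two program texts are placeholders, not opensnoop's real BPF code, and the helper name is hypothetical:

```python
# Sketch of falling back from trampolines to kprobes: use the
# trampoline-based programs when BPF.support_kfunc() reports support,
# otherwise the kprobe/kretprobe versions. Placeholder program texts.
KFUNC_TEXT = "KRETFUNC_PROBE(do_sys_open, ...) { /* ... */ }"
KPROBE_TEXT = "int kprobe__do_sys_open(...) { /* ... */ }"

def choose_probe_text(support_kfunc):
    """Pick the BPF program text based on trampoline support."""
    return KFUNC_TEXT if support_kfunc else KPROBE_TEXT
```

A tool would call this with the result of BPF.support_kfunc() before constructing its BPF object, so old kernels keep working with kprobes.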
