xdp: add a new helper for dev map multicast support#13
xdp: add a new helper for dev map multicast support#13kernel-patches-bot wants to merge 6 commits intobpf-nextfrom
Conversation
|
Master branch: f9bec5d patch https://patchwork.ozlabs.org/project/netdev/patch/20200907082724.1721685-2-liuhangbin@gmail.com/ applied successfully |
used when we want to allow NULL pointer for map parameter. The bpf helper need to take care and check if the map is NULL when use this type. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- v11: no update v10: remove useless CONST_PTR_TO_MAP_OR_NULL and Copy-paste comment. v9: merge the patch from [1] in to this series. v1-v8: no this patch [1] https://lore.kernel.org/bpf/20200715070001.2048207-1-liuhangbin@gmail.com/ --- include/linux/bpf.h | 1 + kernel/bpf/verifier.c | 14 +++++++++----- 2 files changed, 10 insertions(+), 5 deletions(-)
before[0], The goal is to be able to implement an OVS-like data plane in XDP, i.e., a software switch that can forward XDP frames to multiple ports. To achieve this, an application needs to specify a group of interfaces to forward a packet to. It is also common to want to exclude one or more physical interfaces from the forwarding operation - e.g., to forward a packet to all interfaces in the multicast group except the interface it arrived on. While this could be done simply by adding more groups, this quickly leads to a combinatorial explosion in the number of groups an application has to maintain. To avoid the combinatorial explosion, we propose to include the ability to specify an "exclude group" as part of the forwarding operation. This needs to be a group (instead of just a single port index), because a physical interface can be part of a logical grouping, such as a bond device. Thus, the logical forwarding operation becomes a "set difference" operation, i.e. "forward to all ports in group A that are not also in group B". This series implements such an operation using device maps to represent the groups. This means that the XDP program specifies two device maps, one containing the list of netdevs to redirect to, and the other containing the exclude list. To achieve this, I re-implement a new helper bpf_redirect_map_multi() to accept two maps, the forwarding map and exclude map. The forwarding map could be DEVMAP or DEVMAP_HASH, but the exclude map *must* be DEVMAP_HASH to get better performace. If user don't want to use exclude map and just want simply stop redirecting back to ingress device, they can use flag BPF_F_EXCLUDE_INGRESS. As both bpf_xdp_redirect_map() and this new helpers are using struct bpf_redirect_info, I add a new ex_map and set tgt_value to NULL in the new helper to make a difference with bpf_xdp_redirect_map(). Also I keep the the general data path in net/core/filter.c, the native data path in kernel/bpf/devmap.c so we can use direct calls to get better performace. [0] https://xdp-project.net/#Handling-multicast Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- v11: Fix bpf_redirect_map_multi() helper description typo. Add loop limit for devmap_get_next_obj() and dev_map_redirect_multi(). v10: Update helper bpf_xdp_redirect_map_multi() - No need to check map pointer as we will do the check in verifier. v9: Update helper bpf_xdp_redirect_map_multi() - Use ARG_CONST_MAP_PTR_OR_NULL for helper arg2 v8: Update function dev_in_exclude_map(): - remove duplicate ex_map map_type check in - lookup the element in dev map by obj dev index directly instead of looping all the map v7: a) Fix helper flag check b) Limit the *ex_map* to use DEVMAP_HASH only and update function dev_in_exclude_map() to get better performance. v6: converted helper return types from int to long v5: a) Check devmap_get_next_key() return value. b) Pass through flags to __bpf_tx_xdp_map() instead of bool value. c) In function dev_map_enqueue_multi(), consume xdpf for the last obj instead of the first on. d) Update helper description and code comments to explain that we use NULL target value to distinguish multicast and unicast forwarding. e) Update memory model, memory id and frame_sz in xdpf_clone(). v4: Fix bpf_xdp_redirect_map_multi_proto arg2_type typo v3: Based on Toke's suggestion, do the following update a) Update bpf_redirect_map_multi() description in bpf.h. b) Fix exclude_ifindex checking order in dev_in_exclude_map(). c) Fix one more xdpf clone in dev_map_enqueue_multi(). d) Go find next one in dev_map_enqueue_multi() if the interface is not able to forward instead of abort the whole loop. e) Remove READ_ONCE/WRITE_ONCE for ex_map. v2: Add new syscall bpf_xdp_redirect_map_multi() which could accept include/exclude maps directly. --- include/linux/bpf.h | 20 +++++ include/linux/filter.h | 1 + include/net/xdp.h | 1 + include/uapi/linux/bpf.h | 27 +++++++ kernel/bpf/devmap.c | 132 +++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 6 ++ net/core/filter.c | 118 +++++++++++++++++++++++++++-- net/core/xdp.c | 29 ++++++++ tools/include/uapi/linux/bpf.h | 27 +++++++ 9 files changed, 356 insertions(+), 5 deletions(-)
packets between given interfaces.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v10-v11: no update
v9: use NULL directly for arg2 and redefine the maps with btf format
v5: add a null_map as we have strict the arg2 to ARG_CONST_MAP_PTR.
Move the testing part to bpf selftest in next patch.
v4: no update.
v3: add rxcnt map to show the packet transmit speed.
v2: no update.
---
samples/bpf/Makefile | 3 +
samples/bpf/xdp_redirect_map_multi_kern.c | 43 ++++++
samples/bpf/xdp_redirect_map_multi_user.c | 166 ++++++++++++++++++++++
3 files changed, 212 insertions(+)
create mode 100644 samples/bpf/xdp_redirect_map_multi_kern.c
create mode 100644 samples/bpf/xdp_redirect_map_multi_user.c
test we have 3 forward groups and 1 exclude group. The test will redirect each interface's packets to all the interfaces in the forward group, and exclude the interface in exclude map. We will also test both DEVMAP and DEVMAP_HASH with xdp generic and drv. For more test details, you can find it in the test script. Here is the test result. ]# ./test_xdp_redirect_multi.sh Pass: xdpgeneric arp ns1-2 Pass: xdpgeneric arp ns1-3 Pass: xdpgeneric arp ns1-4 Pass: xdpgeneric ping ns1-2 Pass: xdpgeneric ping ns1-3 Pass: xdpgeneric ping ns1-4 Pass: xdpgeneric ping6 ns2-1 Pass: xdpgeneric ping6 ns2-3 Pass: xdpgeneric ping6 ns2-4 Pass: xdpdrv arp ns1-2 Pass: xdpdrv arp ns1-3 Pass: xdpdrv arp ns1-4 Pass: xdpdrv ping ns1-2 Pass: xdpdrv ping ns1-3 Pass: xdpdrv ping ns1-4 Pass: xdpdrv ping6 ns2-1 Pass: xdpdrv ping6 ns2-3 Pass: xdpdrv ping6 ns2-4 Summary: PASS 18, FAIL 0 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- v10-v11: no update v9: use NULL directly for arg2 and redefine the maps with btf format v2-v8: no update --- tools/testing/selftests/bpf/Makefile | 4 +- .../bpf/progs/xdp_redirect_multi_kern.c | 77 ++++++++ .../selftests/bpf/test_xdp_redirect_multi.sh | 164 +++++++++++++++++ .../selftests/bpf/xdp_redirect_multi.c | 173 ++++++++++++++++++ 4 files changed, 417 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c create mode 100755 tools/testing/selftests/bpf/test_xdp_redirect_multi.sh create mode 100644 tools/testing/selftests/bpf/xdp_redirect_multi.c
arg ARG_CONST_MAP_PTR and ARG_CONST_MAP_PTR_OR_NULL. Make sure the
map arg could be verified correctly when it is NULL or valid map
pointer.
Add devmap and devmap_hash in struct bpf_test due to bpf_redirect_{map,
map_multi} limit.
Test result:
]# ./test_verifier 702 705
#702/p ARG_CONST_MAP_PTR: null pointer OK
#703/p ARG_CONST_MAP_PTR: valid map pointer OK
#704/p ARG_CONST_MAP_PTR_OR_NULL: null pointer for ex_map OK
#705/p ARG_CONST_MAP_PTR_OR_NULL: valid map pointer for ex_map OK
Summary: 4 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v2-v11: no update
---
tools/testing/selftests/bpf/test_verifier.c | 22 +++++-
.../testing/selftests/bpf/verifier/map_ptr.c | 70 +++++++++++++++++++
2 files changed, 91 insertions(+), 1 deletion(-)
|
Master branch: bc0b5a0 patch https://patchwork.ozlabs.org/project/netdev/patch/20200907082724.1721685-2-liuhangbin@gmail.com/ applied successfully |
|
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=199872 expired. Closing PR. |
After commit 92cc68e ("drm/vblank: Use spin_(un)lock_irq() in drm_crtc_vblank_on()") omapdrm locking is broken: WARNING: inconsistent lock state 5.8.0-rc2-00483-g92cc68e35863 #13 Tainted: G W -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: ea98222c (&dev->event_lock#2){?.+.}-{2:2}, at: drm_handle_vblank+0x4c/0x520 [drm] {HARDIRQ-ON-W} state was registered at: trace_hardirqs_on+0x9c/0x1ec _raw_spin_unlock_irq+0x20/0x58 omap_crtc_atomic_enable+0x54/0xa0 [omapdrm] drm_atomic_helper_commit_modeset_enables+0x218/0x270 [drm_kms_helper] omap_atomic_commit_tail+0x48/0xc4 [omapdrm] commit_tail+0x9c/0x190 [drm_kms_helper] drm_atomic_helper_commit+0x154/0x188 [drm_kms_helper] drm_client_modeset_commit_atomic+0x228/0x268 [drm] drm_client_modeset_commit_locked+0x60/0x1d0 [drm] drm_client_modeset_commit+0x24/0x40 [drm] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xa8 [drm_kms_helper] drm_fb_helper_set_par+0x2c/0x5c [drm_kms_helper] drm_fb_helper_hotplug_event.part.0+0xa0/0xbc [drm_kms_helper] drm_kms_helper_hotplug_event+0x24/0x30 [drm_kms_helper] output_poll_execute+0x1a8/0x1c0 [drm_kms_helper] process_one_work+0x268/0x800 worker_thread+0x30/0x4e0 kthread+0x164/0x190 ret_from_fork+0x14/0x20 The reason for this is that omapdrm calls drm_crtc_vblank_on() while holding event_lock taken with spin_lock_irq(). It is not clear why drm_crtc_vblank_on() and drm_crtc_vblank_get() are called while holding event_lock. I don't see any problem with moving those calls outside the lock, which is what this patch does. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819103021.440288-1-tomi.valkeinen@ti.com Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082 #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159 #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189 #12 test_ipc () at tests/parse-metric.c:208 ..... ..... omitted many more lines This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump. It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes which was missing. The marco map_for_each_event() loops over that array tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue. Output after: [root@t35lp46 perf]# ./perf test 67 67: Parse and process metrics : Ok [root@t35lp46 perf]# Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag. ================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The aliases were never released causing the following leaks:
Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
#0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
#1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
#2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
#3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
#4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
#5 0x56332c76a09b in run_test tests/builtin-test.c:410
#6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
#7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
#8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
#9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The evsel->unit borrows a pointer of pmu event or alias instead of
owns a string. But tool event (duration_time) passes a result of
strdup() caused a leak.
It was found by ASAN during metric test:
Direct leak of 210 byte(s) in 70 object(s) allocated from:
#0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
#10 0x559fbbc0109b in run_test tests/builtin-test.c:410
#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test_generic_metric() missed to release entries in the pctx. Asan
reported following leak (and more):
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
#1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
#2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
#3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
#4 0x55f7e7341667 in expr__add_ref util/expr.c:120
#5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
#6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
#7 0x55f7e712390b in compute_single tests/parse-metric.c:128
#8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
#9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
#10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
#11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
#12 0x55f7e70be09b in run_test tests/builtin-test.c:410
#13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
#14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
#15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
#16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The metricgroup__add_metric() can find multiple match for a metric group
and it's possible to fail. Also it can fail in the middle like in
resolve_metric() even for single metric.
In those cases, the intermediate list and ids will be leaked like:
Direct leak of 3 byte(s) in 1 object(s) allocated from:
#0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
#1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
#2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
#3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
#4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
#5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
#6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
#7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
#8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
#9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
#10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
#11 0x55f7e70be09b in run_test tests/builtin-test.c:410
#12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
#13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
#14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
#15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following leaks were detected by ASAN:
Indirect leak of 360 byte(s) in 9 object(s) allocated from:
#0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
#4 0x560578e07045 in test__pmu tests/pmu.c:155
#5 0x560578de109b in run_test tests/builtin-test.c:410
#6 0x560578de109b in test_and_print tests/builtin-test.c:440
#7 0x560578de401a in __cmd_test tests/builtin-test.c:661
#8 0x560578de401a in cmd_test tests/builtin-test.c:807
#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ido Schimmel says: ==================== mlxsw: Refactor headroom management Petr says: On Spectrum, port buffers, also called port headroom, is where packets are stored while they are parsed and the forwarding decision is being made. For lossless traffic flows, in case shared buffer admission is not allowed, headroom is also where to put the extra traffic received before the sent PAUSE takes effect. Another aspect of the port headroom is the so called internal buffer, which is used for egress mirroring. Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer for configuration, and dcbnl_getbuffer for inspection. In order to make it possible to implement these interfaces, it is first necessary to clean up headroom handling, which is currently strewn in several places in the driver. The end goal is an architecture whereby it is possible to take a copy of the current configuration, adjust parameters, and then hand the proposed configuration over to the system to implement it. When everything works, the proposed configuration is accepted and saved. First, this centralizes the reconfiguration handling to one function, which takes care of coordinating buffer size changes and priority map changes to avoid introducing drops. Second, the fact that the configuration is all in one place makes it easy to keep a backup and handle error path rollbacks, which were previously hard to understand. Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom configuration. Patch #2 unifies handling of delay provision between PFC and PAUSE. From now on, delay is to be measured in bytes of extra space, and will not include MTU. PFC handler sets the delay directly from the parameter it gets through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to have the same meaning. In patches #3-#5, MTU, lossiness and priorities are gradually moved over to struct mlxsw_sp_hdroom. In patches #6-#11, handling of buffer resizing and priority maps is moved from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface through which the various clients express how the headroom should be configured. Patch #12 is a small cleanup that the previous transformation made possible. In patch #13, the port init code becomes a boring client of the headroom code, instead of rolling its own thing. Patches #14 and #15 move handling of internal mirroring buffer to the new headroom code as well. Previously, this code was in the SPAN module. This patchset converts the SPAN module to another boring client of the headroom code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
…ertion-and-removal' Ido Schimmel says: ==================== mlxsw: spectrum: Prepare for XM implementation - prefix insertion and removal Jiri says: This is a preparation patchset for follow-up support of boards with extended mezzanine (XM), which is going to allow extended (scale-wise) router offload. XM requires a separate PRM register named XMDR to be used instead of RALUE to insert/update/remove FIB entries. Therefore, this patchset extends the previously introduces low-level ops to be able to have XM-specific FIB entry config implementation. Currently the existing original RALUE implementation is moved to "basic" low-level ops. Unlike legacy router, insertion/update/removal of FIB entries into XM could be done in bulks up to 4 items in a single PRM register write. That is why this patchset implements "an op context", that allows the future XM ops implementation to squash multiple FIB events to single register write. For that, the way in which the FIB events are processed by the work queue has to be changed. The conversion from 1:1 FIB event - work callback call to event queue is implemented in patch #3. Patch #4 introduces "an op context" that will allow in future to squash multiple FIB events into one XMDR register write. Patch #12 converts it from stack to be allocated per instance. Existing RALUE manipulations are pushed to ops in patch #10. Patch #13 is introducing a possibility for low-level implementation to have per FIB entry private memory. The rest of the patches are either cosmetics or smaller preparations. ==================== Link: https://lore.kernel.org/r/20201110094900.1920158-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fix is for a failure that occurred in the DWARF unwind perf test.
Stack unwinders may probe memory when looking for frames.
Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.
This can lead to false memory sanitizer failures for the use of an
uninitialized value.
Avoid this problem by removing the poison on the copied stack.
The full msan failure with track origins looks like:
==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
#1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
#2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#24 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
#1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
#2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
#3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#25 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
#1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
#2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
#3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#17 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
#0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445
SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by casting 1 operand to u64. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by casting 1 operand to u64. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN. [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN: [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210806150419.109658-1-th.yasumatsu@gmail.com
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate kvmalloc() size could overflow, resulting in out-of-bounds write as reported by KASAN: [...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...] In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow. Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN. It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket. If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time. Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory. Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210806150419.109658-1-th.yasumatsu@gmail.com
It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <mpetlan@redhat.com> CC: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace: #0 context_switch (kernel/sched/core.c:4940:2) #1 __schedule (kernel/sched/core.c:6287:8) #2 schedule (kernel/sched/core.c:6366:3) #3 io_schedule (kernel/sched/core.c:8389:2) #4 wait_on_page_bit_common (mm/filemap.c:1356:4) #5 __lock_page (mm/filemap.c:1648:2) #6 lock_page (./include/linux/pagemap.h:625:3) #7 pagecache_get_page (mm/filemap.c:1910:4) #8 find_or_create_page (./include/linux/pagemap.h:420:9) #9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9) #10 defrag_one_range (fs/btrfs/ioctl.c:1326:14) #11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9) #12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9) #13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9) #14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10) #15 vfs_ioctl (fs/ioctl.c:51:10) #16 __do_sys_ioctl (fs/ioctl.c:874:11) #17 __se_sys_ioctl (fs/ioctl.c:860:1) #18 __x64_sys_ioctl (fs/ioctl.c:860:1) #19 do_syscall_x64 (arch/x86/entry/common.c:50:14) #20 do_syscall_64 (arch/x86/entry/common.c:80:7) #21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113) A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages". Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself. Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBUSY. This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y: $ cat create_thp_file.c #include <fcntl.h> #include <stdbool.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> static const char zeroes[1024 * 1024]; static const size_t FILE_SIZE = 2 * 1024 * 1024; int main(int argc, char **argv) { if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } int fd = creat(argv[1], 0777); if (fd == -1) { perror("creat"); return EXIT_FAILURE; } size_t written = 0; while (written < FILE_SIZE) { ssize_t ret = write(fd, zeroes, sizeof(zeroes) < FILE_SIZE - written ? sizeof(zeroes) : FILE_SIZE - written); if (ret < 0) { perror("write"); return EXIT_FAILURE; } written += ret; } close(fd); fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } /* * Reserve some address space so that we can align the file mapping to * the huge page size. */ void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (placeholder_map == MAP_FAILED) { perror("mmap (placeholder)"); return EXIT_FAILURE; } void *aligned_address = (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1)); void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED | MAP_FIXED, fd, 0); if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; } if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) { perror("madvise"); return EXIT_FAILURE; } char *line = NULL; size_t line_capacity = 0; FILE *smaps_file = fopen("/proc/self/smaps", "r"); if (!smaps_file) { perror("fopen"); return EXIT_FAILURE; } for (;;) { for (size_t off = 0; off < FILE_SIZE; off += 4096) ((volatile char *)map)[off]; ssize_t ret; bool this_mapping = false; while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) { unsigned long start, end, huge; if (sscanf(line, "%lx-%lx", &start, &end) == 2) { this_mapping = (start <= (uintptr_t)map && (uintptr_t)map < end); } else if (this_mapping && sscanf(line, "FilePmdMapped: %ld", &huge) == 1 && huge > 0) { return EXIT_SUCCESS; } } sleep(6); rewind(smaps_file); fflush(smaps_file); } } $ ./create_thp_file huge $ btrfs fi defrag -czstd ./huge Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
When I do fuzz test for bonding device interface, I got the following use-after-free Calltrace: ================================================================== BUG: KASAN: use-after-free in bond_enslave+0x1521/0x24f0 Read of size 8 at addr ffff88825bc11c00 by task ifenslave/7365 CPU: 5 PID: 7365 Comm: ifenslave Tainted: G E 5.15.0-rc1+ #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 Call Trace: dump_stack_lvl+0x6c/0x8b print_address_description.constprop.0+0x48/0x70 kasan_report.cold+0x82/0xdb __asan_load8+0x69/0x90 bond_enslave+0x1521/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f19159cf577 Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 78 RSP: 002b:00007ffeb3083c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffeb3084bca RCX: 00007f19159cf577 RDX: 00007ffeb3083ce0 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 00007ffeb3084bc4 R08: 0000000000000040 R09: 0000000000000000 R10: 00007ffeb3084bc0 R11: 0000000000000246 R12: 00007ffeb3083ce0 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb3083cb0 Allocated by task 7365: kasan_save_stack+0x23/0x50 __kasan_kmalloc+0x83/0xa0 kmem_cache_alloc_trace+0x22e/0x470 bond_enslave+0x2e1/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 7365: kasan_save_stack+0x23/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x24/0x40 __kasan_slab_free+0xf2/0x130 kfree+0xd1/0x5c0 slave_kobj_release+0x61/0x90 kobject_put+0x102/0x180 bond_sysfs_slave_add+0x7a/0xa0 bond_enslave+0x11b6/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 delayed_work_timer_fn+0x3e/0x50 call_timer_fn+0x148/0x470 run_timer_softirq+0x8a8/0xc50 __do_softirq+0x107/0x55f Second to last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 __queue_delayed_work+0x130/0x180 queue_delayed_work_on+0xa7/0xb0 bond_enslave+0xe25/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88825bc11c00 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 0 bytes inside of 1024-byte region [ffff88825bc11c00, ffff88825bc12000) The buggy address belongs to the page: page:ffffea00096f0400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25bc10 head:ffffea00096f0400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab|head|node=1|zone=2|lastcpupid=0x7ff) raw: 057ff00000010200 ffffea0009a71c08 ffff888240001968 ffff88810004dbc0 raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88825bc11b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88825bc11b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88825bc11c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88825bc11c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88825bc11d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Put new_slave in bond_sysfs_slave_add() will cause use-after-free problems when new_slave is accessed in the subsequent error handling process. Since new_slave will be put in the subsequent error handling process, remove the unnecessary put to fix it. In addition, when sysfs_create_file() fails, if some files have been crea- ted successfully, we need to call sysfs_remove_file() to remove them. Since there are sysfs_create_files() & sysfs_remove_files() can be used, use these two functions instead. Fixes: 7afcaec (bonding: use kobject_put instead of _del after kobject_add) Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The exit function fixes a memory leak with the src field as detected by
leak sanitizer. An example of which is:
Indirect leak of 25133184 byte(s) in 207 object(s) allocated from:
#0 0x7f199ecfe987 in __interceptor_calloc libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x55defe638224 in annotated_source__alloc_histograms util/annotate.c:803
#2 0x55defe6397e4 in symbol__hists util/annotate.c:952
#3 0x55defe639908 in symbol__inc_addr_samples util/annotate.c:968
#4 0x55defe63aa29 in hist_entry__inc_addr_samples util/annotate.c:1119
#5 0x55defe499a79 in hist_iter__report_callback tools/perf/builtin-report.c:182
#6 0x55defe7a859d in hist_entry_iter__add util/hist.c:1236
#7 0x55defe49aa63 in process_sample_event tools/perf/builtin-report.c:315
#8 0x55defe731bc8 in evlist__deliver_sample util/session.c:1473
#9 0x55defe731e38 in machines__deliver_event util/session.c:1510
#10 0x55defe732a23 in perf_session__deliver_event util/session.c:1590
#11 0x55defe72951e in ordered_events__deliver_event util/session.c:183
#12 0x55defe740082 in do_flush util/ordered-events.c:244
#13 0x55defe7407cb in __ordered_events__flush util/ordered-events.c:323
#14 0x55defe740a61 in ordered_events__flush util/ordered-events.c:341
#15 0x55defe73837f in __perf_session__process_events util/session.c:2390
#16 0x55defe7385ff in perf_session__process_events util/session.c:2420
...
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Similar to commit b8d8436 ("drm/i915/gt: Hold RPM wakelock during PXP suspend") but to fix the same warning for unbind during shutdown: ------------[ cut here ]------------ RPM wakelock ref not held during HW access WARNING: CPU: 0 PID: 4139 at drivers/gpu/drm/i915/intel_runtime_pm.h:115 gen12_fwtable_write32+0x1b7/0 Modules linked in: 8021q ccm rfcomm cmac algif_hash algif_skcipher af_alg uinput snd_hda_codec_hdmi vf industrialio iwl7000_mac80211 cros_ec_sensorhub lzo_rle lzo_compress zram iwlwifi cfg80211 joydev CPU: 0 PID: 4139 Comm: halt Tainted: G U W 5.10.84 #13 344e11e079c4a03940d949e537eab645f6 RIP: 0010:gen12_fwtable_write32+0x1b7/0x200 Code: 48 c7 c7 fc b3 b5 89 31 c0 e8 2c f3 ad ff 0f 0b e9 04 ff ff ff c6 05 71 e9 1d 01 01 48 c7 c7 d67 RSP: 0018:ffffa09ec0bb3bb0 EFLAGS: 00010246 RAX: 12dde97bbd260300 RBX: 00000000000320f0 RCX: ffffffff89e60ea0 RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: ffffffff89e60e70 RBP: ffffa09ec0bb3bd8 R08: 0000000000000000 R09: ffffa09ec0bb3950 R10: 00000000ffffdfff R11: ffffffff89e91160 R12: 0000000000000000 R13: 0000000028121969 R14: ffff9515c32f0990 R15: 0000000040000000 FS: 0000790dcf225740(0000) GS:ffff951737800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000058b25efae147 CR3: 0000000133ea6001 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: intel_pxp_fini_hw+0x2f/0x39 i915_pxp_tee_component_unbind+0x1c/0x42 component_unbind+0x32/0x48 component_unbind_all+0x80/0x9d take_down_master+0x24/0x36 component_master_del+0x56/0x70 mei_pxp_remove+0x2c/0x68 mei_cl_device_remove+0x35/0x68 device_release_driver_internal+0x100/0x1a1 mei_cl_bus_remove_device+0x21/0x79 mei_cl_bus_remove_devices+0x3b/0x51 mei_stop+0x3b/0xae mei_me_shutdown+0x23/0x58 device_shutdown+0x144/0x1d3 kernel_power_off+0x13/0x4c __se_sys_reboot+0x1d4/0x1e9 do_syscall_64+0x43/0x55 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x790dcf316273 Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad8 RSP: 002b:00007ffca0df9198 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9 RAX: ffffffffffffffda RBX: 000000004321fedc RCX: 0000790dcf316273 RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007ffca0df9200 R08: 0000000000000007 R09: 0000563ce8cd8970 R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffca0df9308 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003 ---[ end trace 2f501b01b348f114 ]--- ACPI: Preparing to enter system sleep state S5 reboot: Power down Changes since v1: - Rebase to latest drm-tip Fixes: 0cfab4c ("drm/i915/pxp: Enable PXP power management") Suggested-by: Lee Shawn C <shawn.c.lee@intel.com> Signed-off-by: Juston Li <juston.li@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220106200236.489656-2-juston.li@intel.com (cherry picked from commit 57ded5f) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.
[ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called
[ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called
...
[ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
crash> bt
...
PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd"
...
#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
[exception RIP: dma_pool_alloc+0x1ab]
RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090
RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00
R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0
R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
crash> net_device.state ffff89443b0c0000
state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
To prevent this scenario, we also make sure that the netdevice is present.
Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
HW counters for soft devices
Petr says:
Offloading switch device drivers may be able to collect statistics of the
traffic taking place in the HW datapath that pertains to a certain soft
netdevice, such as a VLAN. In this patch set, add the necessary
infrastructure to allow exposing these statistics to the offloaded
netdevice in question, and add mlxsw offload.
Across HW platforms, the counter itself very likely constitutes a limited
resource, and the act of counting may have a performance impact. Therefore
this patch set makes the HW statistics collection opt-in and togglable from
userspace on a per-netdevice basis.
Additionally, HW devices may have various limiting conditions under which
they can realize the counter. Therefore it is also possible to query
whether the requested counter is realized by any driver. In TC parlance,
which is to a degree reused in this patch set, two values are recognized:
"request" tracks whether the user enabled collecting HW statistics, and
"used" tracks whether any HW statistics are actually collected.
In the past, this author has expressed the opinion that `a typical user
doing "ip -s l sh", including various scripts, wants to see the full
picture and not worry what's going on where'. While that would be nice,
unfortunately it cannot work:
- Packets that trap from the HW datapath to the SW datapath would be
double counted.
For a given netdevice, some traffic can be purely a SW artifact, and some
may flow through the HW object corresponding to the netdevice. But some
traffic can also get trapped to the SW datapath after bumping the HW
counter. It is not clear how to make sure double-counting does not occur
in the SW datapath in that case, while still making sure that possibly
divergent SW forwarding path gets bumped as appropriate.
So simply adding HW and SW stats may work roughly, most of the time, but
there are scenarios where the result is nonsensical.
- HW devices will have limitations as to what type of traffic they can
count.
In case of mlxsw, which is part of this patch set, there is no reasonable
way to count all traffic going through a certain netdevice, such as a
VLAN netdevice enslaved to a bridge. It is however very simple to count
traffic flowing through an L3 object, such as a VLAN netdevice with an IP
address.
Similarly for physical netdevices, the L3 object at which the counter is
installed is the subport carrying untagged traffic.
These are not "just counters". It is important that the user understands
what is being counted. It would be incorrect to conflate these statistics
with another existing statistics suite.
To that end, this patch set introduces a statistics suite called "L3
stats". This label should make it easy to understand what is being counted,
and to decide whether a given device can or cannot implement this suite for
some type of netdevice. At the same time, the code is written to make
future extensions easy, should a device pop up that can implement a
different flavor of statistics suite (say L2, or an address-family-specific
suite).
For example, using a work-in-progress iproute2[1], to turn on and then list
the counters on a VLAN netdevice:
# ip stats set dev swp1.200 l3_stats on
# ip stats show dev swp1.200 group offload subgroup l3_stats
56: swp1.200: group offload subgroup l3_stats on used on
RX: bytes packets errors dropped missed mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
The patchset progresses as follows:
- Patch #1 is a cleanup.
- In patch #2, remove the assumption that all LINK_OFFLOAD_XSTATS are
dev-backed.
The only attribute defined under the nest is currently
IFLA_OFFLOAD_XSTATS_CPU_HIT. L3_STATS differs from CPU_HIT in that the
driver that supplies the statistics is not the same as the driver that
implements the netdevice. Make the code compatible with this in patch #2.
- In patch #3, add the possibility to filter inside nests.
The filter_mask field of RTM_GETSTATS header determines which
top-level attributes should be included in the netlink response. This
saves processing time by only including the bits that the user cares
about instead of always dumping everything. This is doubly important
for HW-backed statistics that would typically require a trip to the
device to fetch the stats. In this patch, the UAPI is extended to
allow filtering inside IFLA_STATS_LINK_OFFLOAD_XSTATS in particular,
but the scheme is easily extensible to other nests as well.
- In patch #4, propagate extack where we need it.
In patch #5, make it possible to propagate errors from drivers to the
user.
- In patch #6, add the in-kernel APIs for keeping track of the new stats
suite, and the notifiers that the core uses to communicate with the
drivers.
- In patch #7, add UAPI for obtaining the new stats suite.
- In patch #8, add a new UAPI message, RTM_SETSTATS, which will carry
the message to toggle the newly-added stats suite.
In patch #9, add the toggle itself.
At this point the core is ready for drivers to add support for the new
stats suite.
- In patches #10, #11 and #12, apply small tweaks to mlxsw code.
- In patch #13, add support for L3 stats, which are realized as RIF
counters.
- Finally in patch #14, a selftest is added to the net/forwarding
directory. Technically this is a HW-specific test, in that without a HW
implementing the counters, it just will not pass. But devices that
support L3 statistics at all are likely to be able to reuse this
selftest, so it seems appropriate to put it in the general forwarding
directory.
We also have a netdevsim implementation, and a corresponding selftest that
verifies specifically some of the core code. We intend to contribute these
later. Interested parties can take a look at the raw code at [2].
[1] https://github.com/pmachata/iproute2/commits/soft_counters
[2] https://github.com/pmachata/linux_mlxsw/commits/petrm_soft_counters_2
v2:
- Patch #3:
- Do not declare strict_start_type at the new policies, since they are
used with nla_parse_nested() (sans _deprecated).
- Use NLA_POLICY_NESTED to declare what the nest contents should be
- Use NLA_POLICY_MASK instead of BITFIELD32 for the filtering
attribute.
- Patch #6:
- s/monotonous/monotonic/ in commit message
- Use a newly-added struct rtnl_hw_stats64 for stats transfer
- Patch #7:
- Use a newly-added struct rtnl_hw_stats64 for stats transfer
- Patch #8:
- Do not declare strict_start_type at the new policies, since they are
used with nla_parse_nested() (sans _deprecated).
- Patch #13:
- Use a newly-added struct rtnl_hw_stats64 for stats transfer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only one "goto done;" in set_device_flags() and this happens *before* hci_dev_lock() is called, move the done label to after the hci_dev_unlock() to fix the following unlock balance: [ 31.493567] ===================================== [ 31.493571] WARNING: bad unlock balance detected! [ 31.493576] 5.17.0-rc2+ #13 Tainted: G C E [ 31.493581] ------------------------------------- [ 31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at: [ 31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth] [ 31.493684] but there are no more locks to release! Note this bug has been around for a couple of years, but before commit fe92ee6 ("Bluetooth: hci_core: Rework hci_conn_params flags") supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so the check for unsupported flags which does the "goto done;" never triggered. Fixes: fe92ee6 ("Bluetooth: hci_core: Rework hci_conn_params flags") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In remove_phb_dynamic() we use &phb->io_resource, after we've called device_unregister(&host_bridge->dev). But the unregister may have freed phb, because pcibios_free_controller_deferred() is the release function for the host_bridge. If there are no outstanding references when we call device_unregister() then phb will be freed out from under us. This has gone mainly unnoticed, but with slub_debug and page_poison enabled it can lead to a crash: PID: 7574 TASK: c0000000d492cb80 CPU: 13 COMMAND: "drmgr" #0 [c0000000e4f075a0] crash_kexec at c00000000027d7dc #1 [c0000000e4f075d0] oops_end at c000000000029608 #2 [c0000000e4f07650] __bad_page_fault at c0000000000904b4 #3 [c0000000e4f076c0] do_bad_slb_fault at c00000000009a5a8 #4 [c0000000e4f076f0] data_access_slb_common_virt at c000000000008b30 Data SLB Access [380] exception frame: R0: c000000000167250 R1: c0000000e4f07a00 R2: c000000002a46100 R3: c000000002b39ce8 R4: 00000000000000c0 R5: 00000000000000a9 R6: 3894674d000000c0 R7: 0000000000000000 R8: 00000000000000ff R9: 0000000000000100 R10: 6b6b6b6b6b6b6b6b R11: 0000000000008000 R12: c00000000023da80 R13: c0000009ffd38b00 R14: 0000000000000000 R15: 000000011c87f0f0 R16: 0000000000000006 R17: 0000000000000003 R18: 0000000000000002 R19: 0000000000000004 R20: 0000000000000005 R21: 000000011c87ede8 R22: 000000011c87c5a8 R23: 000000011c87d3a0 R24: 0000000000000000 R25: 0000000000000001 R26: c0000000e4f07cc8 R27: c00000004d1cc400 R28: c0080000031d00e8 R29: c00000004d23d800 R30: c00000004d1d2400 R31: c00000004d1d2540 NIP: c000000000167258 MSR: 8000000000009033 OR3: c000000000e9f474 CTR: 0000000000000000 LR: c000000000167250 XER: 0000000020040003 CCR: 0000000024088420 MQ: 0000000000000000 DAR: 6b6b6b6b6b6b6ba3 DSISR: c0000000e4f07920 Syscall Result: fffffffffffffff2 [NIP : release_resource+56] [LR : release_resource+48] #5 [c0000000e4f07a00] release_resource at c000000000167258 (unreliable) #6 [c0000000e4f07a30] remove_phb_dynamic at c000000000105648 #7 [c0000000e4f07ab0] dlpar_remove_slot at c0080000031a09e8 [rpadlpar_io] #8 [c0000000e4f07b50] remove_slot_store at c0080000031a0b9c [rpadlpar_io] #9 [c0000000e4f07be0] kobj_attr_store at c000000000817d8c #10 [c0000000e4f07c00] sysfs_kf_write at c00000000063e504 #11 [c0000000e4f07c20] kernfs_fop_write_iter at c00000000063d868 #12 [c0000000e4f07c70] new_sync_write at c00000000054339c #13 [c0000000e4f07d10] vfs_write at c000000000546624 #14 [c0000000e4f07d60] ksys_write at c0000000005469f4 #15 [c0000000e4f07db0] system_call_exception at c000000000030840 #16 [c0000000e4f07e10] system_call_vectored_common at c00000000000c168 To avoid it, we can take a reference to the host_bridge->dev until we're done using phb. Then when we drop the reference the phb will be freed. Fixes: 2dd9c11 ("powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)") Reported-by: David Dai <zdai@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lore.kernel.org/r/20220318034219.1188008-1-mpe@ellerman.id.au
Ido Schimmel says: ==================== net/sched: Better error reporting for offload failures This patchset improves error reporting to user space when offload fails during the flow action setup phase. That is, when failures occur in the actions themselves, even before calling device drivers. Requested / reported in [1]. This is done by passing extack to the offload_act_setup() callback and making use of it in the various actions. Patches #1-#2 change matchall and flower to log error messages to user space in accordance with the verbose flag. Patch #3 passes extack to the offload_act_setup() callback from the various call sites, including matchall and flower. Patches #4-#11 make use of extack in the various actions to report offload failures. Patch #12 adds an error message when the action does not support offload at all. Patches #13-#14 change matchall and flower to stop overwriting more specific error messages. [1] https://lore.kernel.org/netdev/20220317185249.5mff5u2x624pjewv@skbuf/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Use the actual length of volume coherency data when setting the xattr to avoid the following KASAN report. BUG: KASAN: slab-out-of-bounds in cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] Write of size 4 at addr ffff888101e02af4 by task kworker/6:0/1347 CPU: 6 PID: 1347 Comm: kworker/6:0 Kdump: loaded Not tainted 5.18.0-rc1-nfs-fscache-netfs+ #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014 Workqueue: events fscache_create_volume_work [fscache] Call Trace: <TASK> dump_stack_lvl+0x45/0x5a print_report.cold+0x5e/0x5db ? __lock_text_start+0x8/0x8 ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] kasan_report+0xab/0x120 ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] kasan_check_range+0xf5/0x1d0 memcpy+0x39/0x60 cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles] cachefiles_acquire_volume+0x2be/0x500 [cachefiles] ? __cachefiles_free_volume+0x90/0x90 [cachefiles] fscache_create_volume_work+0x68/0x160 [fscache] process_one_work+0x3b7/0x6a0 worker_thread+0x2c4/0x650 ? process_one_work+0x6a0/0x6a0 kthread+0x16c/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Allocated by task 1347: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 cachefiles_set_volume_xattr+0x76/0x350 [cachefiles] cachefiles_acquire_volume+0x2be/0x500 [cachefiles] fscache_create_volume_work+0x68/0x160 [fscache] process_one_work+0x3b7/0x6a0 worker_thread+0x2c4/0x650 kthread+0x16c/0x1a0 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888101e02af0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 4 bytes inside of 8-byte region [ffff888101e02af0, ffff888101e02af8) The buggy address belongs to the physical page: page:00000000a2292d70 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101e02 flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0000200 0000000000000000 dead000000000001 ffff888100042280 raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888101e02980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc ffff888101e02a00: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 >ffff888101e02a80: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 04 fc ^ ffff888101e02b00: fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc ffff888101e02b80: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc ================================================================== Fixes: 413a4a6 "cachefiles: Fix volume coherency attribute" Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20220405134649.6579-1-dwysocha@redhat.com/ # v1 Link: https://lore.kernel.org/r/20220405142810.8208-1-dwysocha@redhat.com/ # Incorrect v2
When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check
that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are
initialized with 0 value, but the check is done with pte_none. 0 pte
however does not necessarily mean that pte_none will return true. e.g.
on xtensa it returns false, resulting in the following runtime warnings:
WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108
CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
Call Trace:
dump_stack+0xc/0x40
__warn+0x8f/0x174
warn_slowpath_fmt+0x48/0xac
__kmap_local_sched_out+0x51/0x108
__schedule+0x71a/0x9c4
preempt_schedule_irq+0xa0/0xe0
common_exception_return+0x5c/0x93
do_wp_page+0x30e/0x330
handle_mm_fault+0xa70/0xc3c
do_page_fault+0x1d8/0x3c4
common_exception+0x7f/0x7f
WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0
CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
Call Trace:
dump_stack+0xc/0x40
__warn+0x8f/0x174
warn_slowpath_fmt+0x48/0xac
__kmap_local_sched_in+0x50/0xe0
finish_task_switch$isra$0+0x1ce/0x2f8
__schedule+0x86e/0x9c4
preempt_schedule_irq+0xa0/0xe0
common_exception_return+0x5c/0x93
do_wp_page+0x30e/0x330
handle_mm_fault+0xa70/0xc3c
do_page_fault+0x1d8/0x3c4
common_exception+0x7f/0x7f
Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.
Link: https://lkml.kernel.org/r/20220403235159.3498065-1-jcmvbkbc@gmail.com
Fixes: 5fbda3e ("sched: highmem: Store local kmaps in task struct")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes it is necessary to use a PLT entry to call an ftrace trampoline. This is handled by ftrace_make_call() and ftrace_make_nop(), with each having *almost* identical logic, but this is not handled by ftrace_modify_call() since its introduction in commit: 3b23e49 ("arm64: implement ftrace with regs") Due to this, if we ever were to call ftrace_modify_call() for a callsite which requires a PLT entry for a trampoline, then either: a) If the old addr requires a trampoline, ftrace_modify_call() will use an out-of-range address to generate the 'old' branch instruction. This will result in warnings from aarch64_insn_gen_branch_imm() and ftrace_modify_code(), and no instructions will be modified. As ftrace_modify_call() will return an error, this will result in subsequent internal ftrace errors. b) If the old addr does not require a trampoline, but the new addr does, ftrace_modify_call() will use an out-of-range address to generate the 'new' branch instruction. This will result in warnings from aarch64_insn_gen_branch_imm(), and ftrace_modify_code() will replace the 'old' branch with a BRK. This will result in a kernel panic when this BRK is later executed. Practically speaking, case (a) is vastly more likely than case (b), and typically this will result in internal ftrace errors that don't necessarily affect the rest of the system. This can be demonstrated with an out-of-tree test module which triggers ftrace_modify_call(), e.g. | # insmod test_ftrace.ko | test_ftrace: Function test_function raw=0xffffb3749399201c, callsite=0xffffb37493992024 | branch_imm_common: offset out of range | branch_imm_common: offset out of range | ------------[ ftrace bug ]------------ | ftrace failed to modify | [<ffffb37493992024>] test_function+0x8/0x38 [test_ftrace] | actual: 1d:00:00:94 | Updating ftrace call site to call a different ftrace function | ftrace record flags: e0000002 | (2) R | expected tramp: ffffb374ae42ed54 | ------------[ cut here ]------------ | WARNING: CPU: 0 PID: 165 at kernel/trace/ftrace.c:2085 ftrace_bug+0x280/0x2b0 | Modules linked in: test_ftrace(+) | CPU: 0 PID: 165 Comm: insmod Not tainted 5.19.0-rc2-00002-g4d9ead8b45ce #13 | Hardware name: linux,dummy-virt (DT) | pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : ftrace_bug+0x280/0x2b0 | lr : ftrace_bug+0x280/0x2b0 | sp : ffff80000839ba00 | x29: ffff80000839ba00 x28: 0000000000000000 x27: ffff80000839bcf0 | x26: ffffb37493994180 x25: ffffb374b0991c28 x24: ffffb374b0d70000 | x23: 00000000ffffffea x22: ffffb374afcc33b0 x21: ffffb374b08f9cc8 | x20: ffff572b8462c000 x19: ffffb374b08f9000 x18: ffffffffffffffff | x17: 6c6c6163202c6331 x16: ffffb374ae5ad110 x15: ffffb374b0d51ee4 | x14: 0000000000000000 x13: 3435646532346561 x12: 3437336266666666 | x11: 203a706d61727420 x10: 6465746365707865 x9 : ffffb374ae5149e8 | x8 : 336266666666203a x7 : 706d617274206465 x6 : 00000000fffff167 | x5 : ffff572bffbc4a08 x4 : 00000000fffff167 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : ffff572b84461e00 x0 : 0000000000000022 | Call trace: | ftrace_bug+0x280/0x2b0 | ftrace_replace_code+0x98/0xa0 | ftrace_modify_all_code+0xe0/0x144 | arch_ftrace_update_code+0x14/0x20 | ftrace_startup+0xf8/0x1b0 | register_ftrace_function+0x38/0x90 | test_ftrace_init+0xd0/0x1000 [test_ftrace] | do_one_initcall+0x50/0x2b0 | do_init_module+0x50/0x1f0 | load_module+0x17c8/0x1d64 | __do_sys_finit_module+0xa8/0x100 | __arm64_sys_finit_module+0x2c/0x3c | invoke_syscall+0x50/0x120 | el0_svc_common.constprop.0+0xdc/0x100 | do_el0_svc+0x3c/0xd0 | el0_svc+0x34/0xb0 | el0t_64_sync_handler+0xbc/0x140 | el0t_64_sync+0x18c/0x190 | ---[ end trace 0000000000000000 ]--- We can solve this by consistently determining whether to use a PLT entry for an address. Note that since (the earlier) commit: f1a54ae ("arm64: module/ftrace: intialize PLT at load time") ... we can consistently determine the PLT address that a given callsite will use, and therefore ftrace_make_nop() does not need to skip validation when a PLT is in use. This patch factors the existing logic out of ftrace_make_call() and ftrace_make_nop() into a common ftrace_find_callable_addr() helper function, which is used by ftrace_make_call(), ftrace_make_nop(), and ftrace_modify_call(). In ftrace_make_nop() the patching is consistently validated by ftrace_modify_code() as we can always determine what the old instruction should have been. Fixes: 3b23e49 ("arm64: implement ftrace with regs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Will Deacon <will@kernel.org> Tested-by: "Ivan T. Ivanov" <iivanov@suse.de> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220614080944.1349146-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ido Schimmel says: ==================== mlxsw: Unified bridge conversion - part 2/6 This is the second part of the conversion of mlxsw to the unified bridge model. Part 1 was merged in commit 4336487 ("Merge branch 'mlxsw-unified-bridge-conversion-part-1'") which includes details about the new model and the motivation behind the conversion. This patchset does not begin the conversion, but rather prepares the code base for it. Patchset overview: Patch #1 removes an unnecessary field from one of the FID families. Patches #2-#7 make various improvements in the layer 2 multicast code, making it more receptive towards upcoming changes. Patches #8-#10 prepare the CONFIG_PROFILE command for the unified bridge model. This command will be used to enable the new model in the last patchset. Patches #11-#13 perform small changes in the FID code, preparing it for upcoming changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Unified bridge conversion - part 4/6
This is the fourth part of the conversion of mlxsw to the unified bridge
model.
Unlike previous parts that prepared mlxsw for the conversion, this part
actually starts the conversion. It focuses on flooding configuration and
converts mlxsw to the more "raw" APIs of the unified bridge model.
The patches configure the different stages of the flooding pipeline in
Spectrum that looks as follows (at a high-level):
+------------+ +----------+ +-------+
{FID, | | {Packet type, | | | | MID
DMAC} | FDB lookup | Bridge type} | SFGC | MID base | | Index
+--------> (miss) +----------------> register +-----------> Adder +------->
| | | | | |
| | | | | |
+------------+ +----+-----+ +---^---+
| |
Table | |
type | | Offset
| +-------+ |
| | | |
| | | |
+----->+ Mux +------+
| |
| |
+-^---^-+
| |
FID| |FID
| |offset
+ +
The multicast identifier (MID) index is used as an index to the port
group table (PGT) that contains a bitmap of ports via which a packet
needs to be replicated.
From the PGT table, the packet continues to the multicast port egress
(MPE) table that determines the packet's egress VLAN. This is a
two-dimensional table that is indexed by port and switch multicast port
to egress (SMPE) index. The latter can be thought of as a FID. Without
it, all the packets replicated via a certain port would get the same
VLAN, regardless of the bridge domain (FID).
Logically, these two steps look as follows:
PGT table MPE table
+-----------------------+ +---------------+
| | {Local port, | | Egress
MID index | Local ports bitmap #1 | SMPE index} | | VID
+------------> ... +---------------> +-------->
| Local ports bitmap #N | | |
| | SMPE | |
+-----------------------+ +---------------+
Local port
Patchset overview:
Patch #1 adds a variable to guard against mixed model configuration.
Will be removed in part 6 when mlxsw is fully converted to the unified
model.
Patches #2-#5 introduce two new FID attributes required for flooding
configuration in the new model:
1. 'flood_rsp': Instructs the firmware to handle flooding configuration
for this FID. Only set for router FIDs (rFIDs) which are used to connect
a {Port, VLAN} to the router block.
2. 'bridge_type': Allows the device to determine the flood table (i.e.,
base index to the PGT table) for the FID. The first type will be used
for FIDs in a VLAN-aware bridge and the second for FIDs representing
VLAN-unaware bridges.
Patch #6 configures the MPE table that determines the egress VLAN of a
packet that is forwarded according to L2 multicast / flood.
Patches #7-#11 add the PGT table and related APIs to allocate entries
and set / clear ports in them.
Patches #12-#13 convert the flooding configuration to use the new PGT
APIs.
====================
Link: https://lore.kernel.org/r/20220627070621.648499-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ido Schimmel says: ==================== mlxsw: Unified bridge conversion - part 6/6 This is the sixth and final part of the conversion of mlxsw to the unified bridge model. It transitions the last bits of functionality that were under firmware's responsibility in the legacy model to the driver. The last patches flip the driver to the unified bridge model and clean up code that was used to make the conversion easier to review. Patchset overview: Patch #1 sets the egress VID for known unicast packets. For multicast packets, the egress VID is configured using the MPE table. See commit 8c2da08 ("mlxsw: spectrum_fid: Configure egress VID classification for multicast"). Patch #2 configures the VNI to FID classification that is used during decapsulation. Patch #3 configures ingress router interface (RIF) in FID classification records, so that when a packet reaches the router block, its ingress RIF is known. Care is taken to configure this in all the different flows (e.g., RIF set on a FID, {Port, VID} joins a FID that already has a RIF etc.). Patch #4 configures the egress VID for routed packets. For such packets, the egress VID is not set by the MPE table or by an FDB record at the egress bridge, but instead by a dedicated table that maps {Egress RIF, Egress port} to a VID. Patch #5 removes VID configuration from RIF creation as in the unified bridge model firmware no longer needs it. Patch #6 sets the egress FID to use in RIF configuration so that the device knows using which FID to bridge the packet after routing. Patches #7-#9 add a new 802.1Q family and associated VLAN RIFs. In the unified bridge model, we no longer need to emulate 802.1Q FIDs using 802.1D FIDs as VNI can be associated with both. Patches #10-#11 finally flip the driver to the unified bridge model. Patches #12-#13 clean up code that was used to make the conversion easier to review. v2: * Fix build failure [1] in patch #1. [1] https://lore.kernel.org/netdev/20220630201709.6e66a1bb@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
ASAN reports an use-after-free in btf_dump_name_dups:
ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
#0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
#1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
#2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
#3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
#4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
#5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
#6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
#7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
#8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
#9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
#10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
#11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#15 0xaaaab5d65990 (test_progs+0x185990)
0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
#7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
#8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#12 0xaaaab5d65990 (test_progs+0x185990)
previously allocated by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
#7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
#8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
#9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#13 0xaaaab5d65990 (test_progs+0x185990)
The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.
Fix it by storing duplicated string address in name_map.
Fixes: 351131b ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
ASAN reports an use-after-free in btf_dump_name_dups:
ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
#0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
#1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
#2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
#3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
#4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
#5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
#6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
#7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
#8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
#9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
#10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
#11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#15 0xaaaab5d65990 (test_progs+0x185990)
0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
#7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
#8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#12 0xaaaab5d65990 (test_progs+0x185990)
previously allocated by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
#7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
#8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
#9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#13 0xaaaab5d65990 (test_progs+0x185990)
The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.
Fix it by storing duplicated string address in name_map.
Fixes: 351131b ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Pull request for series with
subject: xdp: add a new helper for dev map multicast support
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=199872