Add block device option in runltplite.sh #8

HarryWeppner · 2014-03-04T00:49:34Z

Hi,

As previously reported on the mailing list, at least one test (access06) included in runltplite.sh requires a block device to be defined and reports a test failure if none is present. However, runltplite.sh provides no command line option to define a block device or the block device file system type.

With this change runltplite.sh sources the functions in runltp and provides an option to specify the block device (-b) and block device filesystem type (-B) just like runltp.

NOTE: the underlying assumption is that both runltp and runltplite.sh are located in the same directory.

Thanks & cheerio, Harry.

* At least one test (access06) included in runltplite.sh requires a block device and is reported as a failure if not specified.
* runltplite.sh now sources the functions in runltp and provides an option to specify the block device (-b) and block device filesystem type (-B) - just like runltp does.

metan-ucw · 2014-03-13T15:14:15Z

Applied in ltp.

Description of Problem:

There is a race condition if we map the same file in different processes. Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. When we do mmap, we don't grab hugetlb_instantiation_mutex, but only mmap_sem (exclusively). This doesn't prevent other tasks from modifying the region structure, so it can be modified by two processes concurrently.

Testcase hugemmap06.c triggers the following system crash:

    crash> bt -s
    PID: 4492 TASK: ffff88033e437520 CPU: 2 COMMAND: "hugemmap06"
    #0 [ffff88033dbb3960] machine_kexec+395 at ffffffff8103d1ab
    #1 [ffff88033dbb39c0] crash_kexec+114 at ffffffff810cc4f2
    #2 [ffff88033dbb3a90] oops_end+192 at ffffffff8153c840
    #3 [ffff88033dbb3ac0] die+91 at ffffffff81010f5b
    #4 [ffff88033dbb3af0] do_general_protection+338 at ffffffff8153c332
    #5 [ffff88033dbb3b20] general_protection+37 at ffffffff8153bb05
    [exception RIP: list_del+40]
    RIP: ffffffff812a3598 RSP: ffff88033dbb3bd8 RFLAGS: 00010292
    RAX: dead000000100100 RBX: ffff88013cf37340 RCX: 0000000000002dc2
    RDX: dead000000200200 RSI: 0000000000000046 RDI: 0000000000000009
    RBP: ffff88033dbb3be8 R8: 0000000000015598 R9: 0000000000000000
    R10: 000000000000000f R11: 0000000000000009 R12: 000000000000000a
    R13: ffff88033d64b9e8 R14: ffff88033e5b9720 R15: ffff88013cf37340
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
    #6 [ffff88033dbb3bf0] region_add+154 at ffffffff811698da
    #7 [ffff88033dbb3c40] alloc_huge_page+669 at ffffffff8116a61d
    #8 [ffff88033dbb3ce0] hugetlb_fault+1083 at ffffffff8116b9bb
    #9 [ffff88033dbb3d90] handle_mm_fault+917 at ffffffff81153295
    #10 [ffff88033dbb3e00] __do_page_fault+326 at ffffffff8104f156
    #11 [ffff88033dbb3f20] do_page_fault+62 at ffffffff8153e78e
    #12 [ffff88033dbb3f50] page_fault+37 at ffffffff8153bb35
    RIP: 00000000004027c6 RSP: 00007f7cadef9e80 RFLAGS: 00010297
    RAX: 000000005a49238f RBX: 00007ffcb2d19320 RCX: 000000357498e084
    RDX: 000000357498e0b0 RSI: 00007f7cadef9e5c RDI: 000000357498e4e0
    RBP: 0000000000000008 R8: 000000357498e0a0 R9: 000000357498e100
    R10: 00007f7cadefa9d0 R11: 0000000000000206 R12: 0000000000000007
    R13: 0000000000000002 R14: 0000000000000003 R15: 00002aaaac000000
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

The fix consists of all of the following commits:

    f522c3ac00 (mm, hugetlb: change variable name reservations to resv)
    9119a41e90 (mm, hugetlb: unify region structure handling)
    7b24d8616b (mm, hugetlb: fix race in region tracking)
    1406ec9ba6 (mm, hugetlb: improve, cleanup resv_map parameters)

Signed-off-by: Li Wang <liwang@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>

These cases occasionally failed on the RHEL platform:

    numa02 2 TFAIL : ltpapicmd.c:200: Test #2: NUMA hit and othernode increase in node0 is less than expected
    numa03 3 TFAIL : ltpapicmd.c:200: Test #3: NUMA interleave hit in node0 is less than expected
    numa08 8 TFAIL : ltpapicmd.c:200: Test #8: NUMA interleave hit in node0 is less than expected

The git log (commit e439df0) says "In RHEL collection of statistics take more time" and adds a 2 s sleep to the test. I looked into the details and found that reasonable: numastat counters grow slowly on RHEL systems, so the tests failed because the numastat update had not completed yet. Even so, sleeping for 2 seconds still did not work reliably during my testing.

This patch restructures all the tests around a new method that reads NUMA statistics more precisely with the command 'numastat -p $pid'.

It is worth mentioning that I used Cyril's proposal to add a few lines of code to the numa helper so that it can test shared memory just like the command below does:

`numactl --length=1M --file /dev/shm/numa_shm --interleave=all --touch`

Also, the original test7() has been removed and replaced by a new test for shared memory allocated on the preferred NUMA node (see: test3).

Signed-off-by: Li Wang <liwang@redhat.com>

Only newlib testcases support SAFE macros in cleanup(). When SAFE_UNLINK fails, it creates an infinite loop between tst_brk_ and cleanup:

    #0 tst_res__ at tst_res.c:153
    #1 0x0000000000407ba8 in tst_brk__ at tst_res.c:480
    #2 0x00000000004081fe in tst_brkm_ at tst_res.c:577
    #3 0x000000000040a7c9 in safe_unlink at safe_macros.c:358
    #4 0x0000000000404abd in cleanup () at pipeio.c:497
    #5 0x0000000000407bc7 in tst_brk__ at tst_res.c:498
    #6 0x00000000004081fe in tst_brkm_ at tst_res.c:577
    #7 0x000000000040c1d6 in def_handler at tst_sig.c:231
    #8 <signal handler called>
    #9 0x00007f29c2cbd1f7 in __GI_raise at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #10 0x00007f29c2cbe8e8 in __GI_abort () at abort.c:90
    #11 0x00000000004081af in tst_brkm_ at tst_res.c:581
    #12 0x000000000040a7c9 in safe_unlink at safe_macros.c:358
    #13 0x0000000000404abd in cleanup () at pipeio.c:497
    #14 0x0000000000407bc7 in tst_brk__ at tst_res.c:498
    #15 0x00000000004081fe in tst_brkm_ at tst_res.c:577
    #16 0x000000000040c1d6 in def_handler at tst_sig.c:231
    #17 <signal handler called>
    #18 0x00007f29c2cbd1f7 in __GI_raise at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #19 0x00007f29c2cbe8e8 in __GI_abort () at abort.c:90
    #20 0x00000000004081af in tst_brkm_ at tst_res.c:581
    #21 0x000000000040a7c9 in safe_unlink at safe_macros.c:358
    #22 0x0000000000404abd in cleanup () at pipeio.c:497
    ...

Signed-off-by: Jan Stancek <jstancek@redhat.com>
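The recursion in the backtrace above can be modelled with a small stand-alone sketch. The names `fake_brk`, `cleanup` and `cleanup_calls` are hypothetical stand-ins, not LTP APIs, and this is not necessarily how the actual patch was written. The only assumption is that the break helper runs cleanup, so cleanup must not report its own errors through a path that breaks again. Here cleanup calls plain unlink() and only prints a warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical counter so a caller can see how often cleanup ran. */
int cleanup_calls;

/* Cleanup without SAFE_* macros: a failure is reported, never escalated. */
void cleanup(const char *path)
{
	assert(path != NULL);
	cleanup_calls++;
	if (unlink(path) == -1 && errno != ENOENT)
		fprintf(stderr, "warning: unlink(%s): %s\n", path, strerror(errno));
}

/* Stand-in for a tst_brk-like helper: report, then run cleanup once. */
int fake_brk(const char *msg, const char *path)
{
	fprintf(stderr, "broken: %s\n", msg);
	cleanup(path);
	return cleanup_calls;
}
```

Because cleanup() never calls back into fake_brk(), a failing unlink() cannot re-enter the break path, so cleanup runs exactly once per break.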

There haven't been any major changes to this test in years, so presumably something changed in recent glibc that exposed this problem. I confirmed with glibc-2.28 that this test can hang quite reliably on a 2 CPU KVM guest. It reproduces more easily with a smaller number of loops for child_mapper() and a reduced overall test runtime (-p 20 -t 0.02).

The problem is that the child's signal handler and main function both call exit(), which can deadlock on __exit_funcs_lock:

    #0 0x00007f0619d72f8c in __lll_lock_wait_private () from /lib64/libc.so.6
    #1 0x00007f0619ca2f4b in __run_exit_handlers () from /lib64/libc.so.6
    #2 0x00007f0619ca3160 in exit () from /lib64/libc.so.6
    #3 0x00000000004039d8 in clean_mapper (sig=<optimized out>) at mmapstress10.c:898
    #4 <signal handler called>
    #5 0x00007f0619ca2fbd in __run_exit_handlers () from /lib64/libc.so.6
    #6 0x00007f0619ca3160 in exit () from /lib64/libc.so.6
    #7 0x0000000000403e7f in child_mapper (file=file@entry=0x40f530 "mmapstress10.out", procno=<optimized out>, nprocs=nprocs@entry=20) at mmapstress10.c:676
    #8 0x0000000000403833 in main (argc=<optimized out>, argv=<optimized out>) at mmapstress10.c:458

Switch all signal handlers to _exit().

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
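A minimal sketch of the pattern, not the mmapstress10 code itself: the handler calls _exit(), which is async-signal-safe and does not run the atexit handlers that exit() serializes with a lock. The names `clean_exit`, `signal_child_exit_code` and the exit code 42 are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HANDLER_EXIT_CODE 42

/* _exit() skips the exit handlers, so it cannot block on their lock. */
void clean_exit(int sig)
{
	(void)sig;
	_exit(HANDLER_EXIT_CODE);
}

/* Fork a child, signal it, and return its exit code (-1 on error). */
int signal_child_exit_code(void)
{
	struct sigaction sa, old;
	int status = 0;
	pid_t pid, waited;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = clean_exit;
	sigemptyset(&sa.sa_mask);
	/* Install before fork() so the child never sees the default action. */
	if (sigaction(SIGUSR1, &sa, &old) == -1)
		return -1;

	pid = fork();
	if (pid == 0) {
		for (;;)
			pause();	/* child waits until the signal arrives */
	}
	if (pid == -1) {
		sigaction(SIGUSR1, &old, NULL);
		return -1;
	}

	kill(pid, SIGUSR1);
	waited = waitpid(pid, &status, 0);
	sigaction(SIGUSR1, &old, NULL);	/* restore the parent's handler */
	assert(waited == pid);
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child terminates from inside the handler with the handler's exit code, regardless of what its main flow was doing when the signal arrived.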


Test started failing with recent glibc (glibc-2.34.9000-38.fc36), which detects that buffer in pread is potentially too small:

    tst_test.c:1431: TINFO: Timeout per run is 0h 05m 00s
    *** buffer overflow detected ***: terminated
    tst_test.c:1484: TBROK: Test killed by SIGIOT/SIGABRT!

    (gdb) bt
    #0 __pthread_kill_implementation at pthread_kill.c:44
    #1 0x00007ffff7e46f73 in __pthread_kill_internal at pthread_kill.c:78
    #2 0x00007ffff7df6a36 in __GI_raise at ../sysdeps/posix/raise.c:26
    #3 0x00007ffff7de082f in __GI_abort () at abort.c:79
    #4 0x00007ffff7e3b01e in __libc_message at ../sysdeps/posix/libc_fatal.c:155
    #5 0x00007ffff7ed945a in __GI___fortify_fail at fortify_fail.c:26
    #6 0x00007ffff7ed7dc6 in __GI___chk_fail () at chk_fail.c:28
    #7 0x00007ffff7ed8214 in __pread_chk at pread_chk.c:26
    #8 0x0000000000404d1a in pread at /usr/include/bits/unistd.h:74
    #9 verify_pread (n=<optimized out>) at pread02.c:44
    #10 0x000000000040dc19 in run_tests () at tst_test.c:1246
    #11 testrun () at tst_test.c:1331
    #12 fork_testrun () at tst_test.c:1462
    #13 0x000000000040e9a1 in tst_run_tcases
    #14 0x0000000000404bde in main

Extend it to number of bytes we are trying to read from fd.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
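The idea of the fix can be sketched as follows, assuming an error-path test that still passes a real byte count to pread(). The buffer is declared with the same size as that count, so a fortified libc has nothing to complain about even though the call is expected to fail. `NBYTES` and `pread_errno_on_pipe` are hypothetical names, not taken from pread02.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define NBYTES 1024

/* Returns the errno pread() reports for a pipe, 0 if it did not fail. */
int pread_errno_on_pipe(void)
{
	char buf[NBYTES];	/* as large as the count handed to pread() */
	int fds[2];
	int err = 0;

	if (pipe(fds) == -1)
		return -1;

	assert(sizeof(buf) >= NBYTES);
	errno = 0;
	/* Pipes are not seekable, so pread() is expected to fail here. */
	if (pread(fds[0], buf, NBYTES, 0) == -1)
		err = errno;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

The call still exercises the error path (ESPIPE for a pipe), while the buffer size and the requested count stay in agreement.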

The maximum field width of a string conversion does not include the null byte. So we can overflow the buffer by one byte.

This can be triggered in ioctl_loop01 with -fsanitize=address even if the file contents are far less than the buffer size:

    tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
    tst_device.c:93: TINFO: Found free device 1 '/dev/loop1'
    ioctl_loop01.c:85: TPASS: /sys/block/loop1/loop/partscan = 0
    ioctl_loop01.c:86: TPASS: /sys/block/loop1/loop/autoclear = 0
    =================================================================
    ==293==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xf5c03420 at pc 0xf7952bf8 bp 0xff9cf9f8 sp 0xff9cf5d0
    WRITE of size 1025 at 0xf5c03420 thread T0
    #0 0xf7952bf7 (/lib/libasan.so.8+0x89bf7) (BuildId: f8d5331e88e5c1b8a8a55eda0a8e20503ea0d2b9)
    #1 0xf7953879 in __isoc99_vfscanf (/lib/libasan.so.8+0x8a879) (BuildId: f8d5331e88e5c1b8a8a55eda0a8e20503ea0d2b9)
    #2 0x8071f85 in safe_file_scanf /home/rich/qa/ltp/lib/safe_file_ops.c:139
    #3 0x80552ea in tst_assert_str /home/rich/qa/ltp/lib/tst_assert.c:60
    #4 0x804f17a in verify_ioctl_loop /home/rich/qa/ltp/testcases/kernel/syscalls/ioctl/ioctl_loop01.c:87
    #5 0x8061599 in run_tests /home/rich/qa/ltp/lib/tst_test.c:1380
    #6 0x8061599 in testrun /home/rich/qa/ltp/lib/tst_test.c:1463
    #7 0x8061599 in fork_testrun /home/rich/qa/ltp/lib/tst_test.c:1592
    #8 0x806877a in tst_run_tcases /home/rich/qa/ltp/lib/tst_test.c:1686
    #9 0x804e01b in main ../../../../include/tst_test.h:394
    #10 0xf7188294 in __libc_start_call_main (/lib/libc.so.6+0x23294) (BuildId: 87c7a50c8792985dd164f5af2d45b8e91d9f4391)
    #11 0xf7188357 in __libc_start_main@@GLIBC_2.34 (/lib/libc.so.6+0x23357) (BuildId: 87c7a50c8792985dd164f5af2d45b8e91d9f4391)
    #12 0x804e617 in _start ../sysdeps/i386/start.S:111

    Address 0xf5c03420 is located in stack of thread T0 at offset 1056 in frame
    #0 0x805525f in tst_assert_str /home/rich/qa/ltp/lib/tst_assert.c:57

    This frame has 1 object(s):
    [32, 1056) 'sys_val' (line 58) <== Memory access at offset 1056 overflows this variable

Fixes: f4919b1 ("lib: Add TST_ASSERT_FILE_INT and TST_ASSERT_FILE_STR")
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
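The off-by-one can be shown on a small scale. This is a sketch with a hypothetical 8-byte buffer rather than the 1024-byte `sys_val` from the report: the field width passed to the `%s` conversion has to be the buffer size minus one, because scanf stores the terminating null byte in addition to the characters it counts. The macro and function names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VAL_BUF_SIZE 8	/* bytes in the destination buffer */
#define VAL_MAX_CHARS 7	/* field width: buffer size minus the null byte */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Scan one whitespace-delimited word; return how many chars were stored. */
size_t scanned_len(const char *text)
{
	char val[VAL_BUF_SIZE];

	assert(VAL_MAX_CHARS + 1 <= VAL_BUF_SIZE);
	/* Format expands to "%7s": at most 7 chars plus the trailing '\0'. */
	if (sscanf(text, "%" STR(VAL_MAX_CHARS) "s", val) != 1)
		return 0;
	return strlen(val);
}
```

With a width equal to the buffer size, a sufficiently long input would make scanf write one byte past the end, which matches the "WRITE of size 1025" into a 1024-byte object in the sanitizer report.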

On an Intel Sapphire Rapids server, the BIOS can allocate one memory block for a CXL node when the server boots, and this node's "MemUsed" is 0 while CXL is not in use, as follows:

"
cat /sys/devices/system/node/node2/meminfo
Node 2 MemTotal: 4194304 kB
Node 2 MemFree: 4194304 kB
Node 2 MemUsed: 0 kB
...
"

This caused the get_mempolicy01/02 and set_mempolicy01/02/03/04 cases to fail, as in the following sample:

"
tag=get_mempolicy01 stime=1683272855
cmdline="get_mempolicy01"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s
tst_numa.c:200: TINFO: Found 3 NUMA memory nodes
tst_numa.c:165: TWARN: Failed to parse '/sys/devices/system/node/node2/meminfo'
get_mempolicy01.c:188: TINFO: test #1: policy: MPOL_DEFAULT, no target
get_mempolicy01.c:191: TPASS: policy: MPOL_DEFAULT, no target passed
get_mempolicy01.c:188: TINFO: test #2: policy: MPOL_BIND
get_mempolicy01.c:191: TPASS: policy: MPOL_BIND passed
get_mempolicy01.c:188: TINFO: test #3: policy: MPOL_INTERLEAVE
get_mempolicy01.c:191: TPASS: policy: MPOL_INTERLEAVE passed
get_mempolicy01.c:188: TINFO: test #4: policy: MPOL_PREFERRED, no target
get_mempolicy01.c:191: TPASS: policy: MPOL_PREFERRED, no target passed
get_mempolicy01.c:188: TINFO: test #5: policy: MPOL_PREFERRED
get_mempolicy01.c:191: TPASS: policy: MPOL_PREFERRED passed
get_mempolicy01.c:188: TINFO: test #6: policy: MPOL_DEFAULT, flags: MPOL_F_ADDR, no target
get_mempolicy01.c:191: TPASS: policy: MPOL_DEFAULT, flags: MPOL_F_ADDR, no target passed
get_mempolicy01.c:188: TINFO: test #7: policy: MPOL_BIND, flags: MPOL_F_ADDR
get_mempolicy01.c:191: TPASS: policy: MPOL_BIND, flags: MPOL_F_ADDR passed
get_mempolicy01.c:188: TINFO: test #8: policy: MPOL_INTERLEAVE, flags: MPOL_F_ADDR
get_mempolicy01.c:191: TPASS: policy: MPOL_INTERLEAVE, flags: MPOL_F_ADDR passed
get_mempolicy01.c:188: TINFO: test #9: policy: MPOL_PREFERRED, flags: MPOL_F_ADDR, no target
get_mempolicy01.c:191: TPASS: policy: MPOL_PREFERRED, flags: MPOL_F_ADDR, no target passed
get_mempolicy01.c:188: TINFO: test #10: policy: MPOL_PREFERRED, flags: MPOL_F_ADDR
get_mempolicy01.c:191: TPASS: policy: MPOL_PREFERRED, flags: MPOL_F_ADDR passed

Summary:
passed 10
failed 0
broken 0
skipped 0
warnings 1
...
-------- ------ ----------
get_mempolicy01 FAIL 4
"

So fix the false failure that occurs when the CXL node memory is not being used.

Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
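One way to avoid this kind of false warning is to separate "field not found" from "field is 0". This is a hedged sketch, assuming the warning came from treating a MemUsed of 0 as a parse failure. `parse_mem_used` is a hypothetical helper and does not reproduce the tst_numa.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns MemUsed in kB, or -1 when the field is absent; 0 is valid. */
long parse_mem_used(const char *meminfo)
{
	const char *line = meminfo;
	long used;

	assert(meminfo != NULL);
	while (line && *line) {
		/* %*d skips the node number without storing it. */
		if (sscanf(line, "Node %*d MemUsed: %ld", &used) == 1)
			return used;
		line = strchr(line, '\n');
		if (line)
			line++;	/* continue with the next line */
	}
	return -1;
}
```

With the node2 contents shown above this returns 0, which the caller can accept as a legitimate value instead of warning about an unparsable file.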


63e8c1e introduced a regression on 32 bit compilation when compiled with -fstack-protector-strong, because struct timespec is probably too small for timespec64.

    PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig CFLAGS="-m32 -fstack-protector-strong" LDFLAGS="-m32 -fstack-protector-strong" ./configure
    ...
    # gdb ./abort01
    (gdb) set follow-fork-mode child
    (gdb) run
    Starting program: testcases/kernel/syscalls/abort/abort01
    Missing separate debuginfos, use: zypper install glibc-32bit-debuginfo-2.31-150300.52.2.x86_64
    tst_test.c:1690: TINFO: LTP version: 20230929-7-gff6cdc67f
    tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
    [Attaching after process 3357 fork to child process 3360]
    [New inferior 2 (process 3360)]
    [Detaching after fork from parent process 3357]
    [Inferior 1 (process 3357) detached]
    *** stack smashing detected ***: terminated

    Thread 2.1 "abort01" received signal SIGABRT, Aborted.
    [Switching to process 3360]
    0xf7fd2559 in __kernel_vsyscall ()
    (gdb) bt
    #0 0xf7fd2559 in __kernel_vsyscall ()
    #1 0xf7e08aa2 in raise () from /lib/libc.so.6
    #2 0xf7e09efd in abort () from /lib/libc.so.6
    #3 0xf7e4d91b in __libc_message () from /lib/libc.so.6
    #4 0xf7eeb2cc in __fortify_fail () from /lib/libc.so.6
    #5 0xf7eeb299 in __stack_chk_fail () from /lib/libc.so.6
    #6 0x0805c501 in syscall_supported_by_kernel (sysnr=403) at tst_clocks.c:27
    #7 0x0805c80d in tst_clock_gettime (clk_id=1, ts=0x807cfb0 <tst_start_time>) at tst_clocks.c:66
    #8 0x080531df in heartbeat () at tst_test.c:1374
    #9 0x08053ba2 in testrun () at tst_test.c:1458
    #10 fork_testrun () at tst_test.c:1608
    #11 0x08055afa in tst_run_tcases (argc=<optimized out>, argv=<optimized out>, self=<optimized out>) at tst_test.c:1704
    #12 0x0804b3f0 in main (argc=1, argv=0xffffc414) at ../../../../include/tst_test.h:401
    (gdb) Test timeouted, sending SIGKILL!
    Test timeouted, sending SIGKILL!

Fixes: 63e8c1e ("tst_clocks: Fix unaddressable byte warning")
Link: https://lore.kernel.org/ltp/20231012091546.607702-1-pvorel@suse.cz/
Reported-by: Petr Cervinka <pcervinka@suse.com>
Suggested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>

63e8c1e introduced a regression on 32 bit compilation when compiled with -fstack-protector-strong, because struct timespec is probably too small for timespec64.

    PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig CFLAGS="-m32 -fstack-protector-strong" LDFLAGS="-m32 -fstack-protector-strong" ./configure
    ...
    # gdb ./abort01
    (gdb) set follow-fork-mode child
    (gdb) run
    Starting program: testcases/kernel/syscalls/abort/abort01
    Missing separate debuginfos, use: zypper install glibc-32bit-debuginfo-2.31-150300.52.2.x86_64
    tst_test.c:1690: TINFO: LTP version: 20230929-7-gff6cdc67f
    tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
    [Attaching after process 3357 fork to child process 3360]
    [New inferior 2 (process 3360)]
    [Detaching after fork from parent process 3357]
    [Inferior 1 (process 3357) detached]
    *** stack smashing detected ***: terminated

    Thread 2.1 "abort01" received signal SIGABRT, Aborted.
    [Switching to process 3360]
    0xf7fd2559 in __kernel_vsyscall ()
    (gdb) bt
    #0 0xf7fd2559 in __kernel_vsyscall ()
    #1 0xf7e08aa2 in raise () from /lib/libc.so.6
    #2 0xf7e09efd in abort () from /lib/libc.so.6
    #3 0xf7e4d91b in __libc_message () from /lib/libc.so.6
    #4 0xf7eeb2cc in __fortify_fail () from /lib/libc.so.6
    #5 0xf7eeb299 in __stack_chk_fail () from /lib/libc.so.6
    #6 0x0805c501 in syscall_supported_by_kernel (sysnr=403) at tst_clocks.c:27
    #7 0x0805c80d in tst_clock_gettime (clk_id=1, ts=0x807cfb0 <tst_start_time>) at tst_clocks.c:66
    #8 0x080531df in heartbeat () at tst_test.c:1374
    #9 0x08053ba2 in testrun () at tst_test.c:1458
    #10 fork_testrun () at tst_test.c:1608
    #11 0x08055afa in tst_run_tcases (argc=<optimized out>, argv=<optimized out>, self=<optimized out>) at tst_test.c:1704
    #12 0x0804b3f0 in main (argc=1, argv=0xffffc414) at ../../../../include/tst_test.h:401
    (gdb) Test timeouted, sending SIGKILL!
    Test timeouted, sending SIGKILL!

Fixes: 63e8c1e ("tst_clocks: Fix unaddressable byte warning")
Reported-by: Petr Cervinka <pcervinka@suse.com>
Suggested-by: Cyril Hrubis <chrubis@suse.cz>
Reviewed-by: Marius Kittler <mkittler@suse.de>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
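The size mismatch suspected in the message can be made concrete with a short sketch. `struct ts64` is a hypothetical mirror of the 16-byte layout that 64-bit time syscalls are assumed to write; it is not the LTP or kernel definition. On a 32-bit build where the C library's struct timespec uses 32-bit fields, that struct is only 8 bytes, so a syscall writing 16 bytes through a pointer to it would run past the object and into the stack protector's canary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical mirror of the layout written by 64-bit time syscalls. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

static_assert(sizeof(struct ts64) == 16, "two 64-bit fields");

/* Bytes such a syscall is assumed to write through its pointer. */
size_t ts64_size(void)
{
	return sizeof(struct ts64);
}

/* 1 if the C library's struct timespec can hold that many bytes. */
int libc_timespec_fits(void)
{
	return sizeof(struct timespec) >= sizeof(struct ts64);
}
```

libc_timespec_fits() is platform dependent by design: it returns 1 on typical 64-bit builds and 0 on 32-bit builds with a 32-bit time_t, which is the configuration in the report.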

Recently, we started seeing the following segfault when running ksm01 and ksm02 tests on an s390 KSM guest:

"""
[ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
[ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
[ 119.302826] Fault in primary space mode while using user ASCE.
[ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
[ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
[ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
[ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
[ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
[ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
[ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
[ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
#000003ff915b14a8: e31022000036 pfd 1,512(%r2)
>000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
 000003ff915b14b8: b2220020 ipm %r2
 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
[ 119.302867] Last Breaking-Event-Address:
[ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
"""

This segfault is triggered by the memcmp() call in verify():

"""
memcmp(memory[start], s, (end - start) * (end2 - start2))
"""

In the default case, this call checks if the memory area starting in memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this assumes that the memory areas in memory[] are contiguous. This is wrong, since create_ksm_child() allocates 128 individual areas of 1MB each. As, in this particular case, memory[0] happens to be the last 1MB area in the VMA created by the kernel, we hit a segfault at the first byte beyond memory[0].

Now, the question is how this has worked for so long and why it may still work on arm64 and x86 (even on s390 it occasionally works).

For the s390 case, the reason is upstream kernel commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries"). Before this commit, the kernel would always map a library right after the memory[0] area in the process address space. This causes memcmp() to return non-zero when reading the first byte beyond memory[0], which in turn causes the nested loop in verify() to execute. The nested loop is correct (i.e. it doesn't assume the memory areas in memory[] are contiguous) so the test doesn't fail. The mentioned upstream commit causes the first byte beyond memory[0] not to be mapped most of the time on s390, which may result in a segfault.

Now, as it turns out, on arm64 and x86 the kernel still maps a library right after memory[0], which causes the test to succeed as explained above (this can be easily verified by printing the return value of memcmp()).

This commit fixes verify() to do a byte-by-byte check on each individual memory area. This also simplifies verify() a lot, which is what we want to avoid this kind of issue in the future.

Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Li Wang <liwang@redhat.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
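A per-area check of the kind described can be sketched as below. This is an illustration under assumptions, not the verify() from the patch: `count_bad_areas` and `demo_bad_areas` are hypothetical names, and the demo uses 4 small malloc()ed areas instead of 128 areas of 1MB. The point is that every access stays inside one individually allocated area, so no contiguity between areas is assumed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count areas that contain at least one byte different from expected. */
size_t count_bad_areas(char **areas, size_t n_areas, size_t area_size,
		       char expected)
{
	size_t bad = 0;

	for (size_t i = 0; i < n_areas; i++) {
		/* Never read past areas[i] + area_size. */
		for (size_t j = 0; j < area_size; j++) {
			if (areas[i][j] != expected) {
				bad++;
				break;
			}
		}
	}
	return bad;
}

/* Demo: separately allocated areas, optionally with one corrupted byte. */
size_t demo_bad_areas(int corrupt)
{
	enum { N_AREAS = 4, AREA_SIZE = 1024 };
	char *areas[N_AREAS];
	size_t bad;

	for (size_t i = 0; i < N_AREAS; i++) {
		areas[i] = malloc(AREA_SIZE);
		assert(areas[i] != NULL);
		memset(areas[i], 'c', AREA_SIZE);
	}
	if (corrupt)
		areas[2][AREA_SIZE - 1] = 'x';

	bad = count_bad_areas(areas, N_AREAS, AREA_SIZE, 'c');

	for (size_t i = 0; i < N_AREAS; i++)
		free(areas[i]);
	return bad;
}
```

A single memcmp() spanning n_areas * area_size bytes from areas[0] would only be valid if the allocator happened to place the areas back to back, which is exactly the assumption the commit removes.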

Harald Weppner added 2 commits March 3, 2014 14:10

Merge remote branch 'upstream/master'

2cf39b8

metan-ucw closed this Mar 13, 2014
