Bug #67565
closed
/tmp/ccxRxwnL.s: Fatal error: can't close fs/file.o: Bad file descriptor in kernel_untar_build.sh
Description
/a/yuriw-2024-08-02_15:42:13-powercycle-squid-release-distro-default-smithi/7833420
2024-08-03T06:55:27.453 INFO:tasks.workunit.client.0.smithi052.stderr:/tmp/ccxRxwnL.s: Assembler messages:
2024-08-03T06:55:27.453 INFO:tasks.workunit.client.0.smithi052.stderr:/tmp/ccxRxwnL.s: Fatal error: can't close fs/file.o: Bad file descriptor
2024-08-03T06:55:27.453 INFO:tasks.workunit.client.0.smithi052.stderr:make[3]: *** [scripts/Makefile.build:243: fs/file.o] Error 1
2024-08-03T06:55:27.453 INFO:tasks.workunit.client.0.smithi052.stderr:make[3]: *** Waiting for unfinished jobs....
...
2024-08-03T07:07:19.346 DEBUG:teuthology.orchestra.run:got remote process result: 2
2024-08-03T07:07:19.347 INFO:tasks.workunit.client.0.smithi052.stderr:make: *** [Makefile:234: __sub-make] Error 2
Looks like a genuine failure during the kernel build.
Updated by Brad Hubbard over 1 year ago · Edited
I reproduced this and when I try to change into the test directory I see the following.
$ cd /home/ubuntu/cephtest/mnt.0/client.0/tmp
-bash: cd: /home/ubuntu/cephtest/mnt.0/client.0/tmp: Transport endpoint is not connected
$ mount|grep cephtest
nsfs on /run/netns/ceph-ns--home-ubuntu-cephtest-mnt.0 type nsfs (rw)
nsfs on /run/netns/ceph-ns--home-ubuntu-cephtest-mnt.0 type nsfs (rw)
ceph-fuse on /home/ubuntu/cephtest/mnt.0 type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
In the logs I see this and I noticed the same stack trace in one of my other
reproducers, but not all. Chicken or egg?
2024-08-15T06:39:41.465+0000 7fa575ffb640 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa575ffb640 thread_name:ceph-fuse
 ceph version 19.1.0-1260-g26c3fb8e (26c3fb8e197dcf7a49a54d1f4c8a7362ee35a8ea) squid (rc)
 1: /lib64/libc.so.6(+0x3e6f0) [0x7fa59383e6f0]
 2: (Client::ll_write(Fh*, long, long, char const*)+0xd0) [0x55ede16eac40]
 3: ceph-fuse(+0x9167d) [0x55ede164b67d]
 4: /lib64/libfuse.so.2(+0x13f13) [0x7fa594b26f13]
 5: /lib64/libfuse.so.2(+0x1f9ac) [0x7fa594b329ac]
 6: /lib64/libfuse.so.2(+0x109ad) [0x7fa594b239ad]
 7: /lib64/libc.so.6(+0x89c02) [0x7fa593889c02]
 8: /lib64/libc.so.6(+0x10ec40) [0x7fa59390ec40]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Looking at the coredump now.
(gdb) bt
#0 0x00007fa59388b94c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fa59383e646 in raise () from /lib64/libc.so.6
#2 0x000055ede1724d9a in reraise_fatal (signum=11) at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/global/signal_handler.cc:88
#3 handle_oneshot_fatal_signal (signum=11) at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/global/signal_handler.cc:367
#4 <signal handler called>
#5 std::_Rb_tree<Fh*, Fh*, std::_Identity<Fh*>, std::less<Fh*>, std::allocator<Fh*> >::_S_left (__x=<optimized out>) at /usr/include/c++/11/bits/stl_function.h:447
#6 std::_Rb_tree<Fh*, Fh*, std::_Identity<Fh*>, std::less<Fh*>, std::allocator<Fh*> >::_M_lower_bound (__k=@0x7fa575ff8e38: 0x7fa51c267c10, __y=0x7fa4c8230ee0, __x=0x6be8e0, this=0x55ede205cb98)
at /usr/include/c++/11/bits/stl_tree.h:1922
#7 std::_Rb_tree<Fh*, Fh*, std::_Identity<Fh*>, std::less<Fh*>, std::allocator<Fh*> >::find (__k=@0x7fa575ff8e38: 0x7fa51c267c10, this=0x55ede205cb98) at /usr/include/c++/11/bits/stl_tree.h:2536
#8 std::set<Fh*, std::less<Fh*>, std::allocator<Fh*> >::count (__x=@0x7fa575ff8e38: 0x7fa51c267c10, this=0x55ede205cb98) at /usr/include/c++/11/bits/stl_set.h:749
#9 Client::_ll_fh_exists (f=0x7fa51c267c10, this=0x55ede205bea0) at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/client/Client.h:1054
#10 Client::ll_write (this=0x55ede205bea0, fh=0x7fa51c267c10, off=2624, len=40, data=0x7fa55a55b310 "") at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/client/Client.cc:15933
#11 0x000055ede164b67d in fuse_ll_write (req=0x7fa4f0b5f110, ino=<optimized out>, buf=0x7fa55a55b310 "", size=40, off=2624, fi=0x7fa575ff8f10) at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/client/fuse_ll.cc:906
#12 0x00007fa594b26f13 in do_write.lto_priv () from /lib64/libfuse.so.2
#13 0x00007fa594b329ac in fuse_ll_process_buf () from /lib64/libfuse.so.2
#14 0x00007fa594b239ad in fuse_do_work () from /lib64/libfuse.so.2
#15 0x00007fa593889c02 in start_thread () from /lib64/libc.so.6
#16 0x00007fa59390ec40 in clone3 () from /lib64/libc.so.6
(gdb) f 10
#10 Client::ll_write (this=0x55ede205bea0, fh=0x7fa51c267c10, off=2624, len=40, data=0x7fa55a55b310 "") at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/client/Client.cc:15933
15933 if (fh == NULL || !_ll_fh_exists(fh)) {
(gdb) down
#9 Client::_ll_fh_exists (f=0x7fa51c267c10, this=0x55ede205bea0) at /usr/src/debug/ceph-19.1.0-1260.g26c3fb8e.el9.x86_64/src/client/Client.h:1054
1054 return ll_unclosed_fh_set.count(f);
(gdb) whatis ll_unclosed_fh_set
type = std::set<Fh*>
(gdb) p ll_unclosed_fh_set
$5 = std::set with 45 elements = {
[0] = 0x7fa4a03f5590,
[1] = 0x7fa4a04062b0,
[2] = 0x7fa4ac3976c0,
[3] = 0x7fa4ac3a5a50,
[4] = 0x7fa4b44a3ec0,
[5] = 0x7fa4b44d31b0,
[6] = 0x7fa4b44eac70,
[7] = 0x7fa4b44eafd0,
[8] = 0x7fa4b44eb4a0,
[9] = 0x7fa4b44eb7e0,
[10] = 0x7fa4b44ed450,
[11] = 0x7fa4b44ee9e0,
[12] = 0x7fa4b44f1d00,
[13] = 0x7fa4b44f2550,
[14] = 0x7fa4bc4bf070,
[15] = 0x7fa4bc4c5800,
[16] = 0x7fa4c87accc0,
[17] = 0x7fa4c87f4090,
[18] = 0x7fa4d494a1f0,
[19] = 0x7fa4e810ff20,
[20] = 0x7fa4e814f8c0,
[21] = 0x7fa4ec1bc7a0,
[22] = 0x7fa4f0102450,
[23] = 0x7fa4f01f7600,
[24] = 0x7fa4f0ba37a0,
[25] = 0x7fa4fc1bb4a0,
[26] = 0x7fa50c252820,
[27] = 0x7fa50c252980,
[28] = 0x7fa5100aa540,
[29] = 0x7fa5100fc510,
[30] = 0x7fa51c1d1250,
[31] = 0x7fa51c24da30,
[32] = 0x7fa51c267350,
[33] = 0x7fa51c267c10,
[34] = 0x7fa51c272850,
[35] = 0x7fa5401e5ae0,
[36] = 0x7fa540c53f90,
[37] = 0x7fa5415b3cc0,
[38] = 0x7fa55811fff0,
[39] = 0x7fa559f44350,
[40] = 0x7fa55aab1c00,
[41] = 0x7fa55b0174c0,
[42] = 0x7fa55b088c20,
[43] = 0x7fa55b1f38c0,
[44] = 0x7fa55bb91250
}
(gdb) p f
$6 = (Fh *) 0x7fa51c267c10
So count() should have returned 1, since f is element [33] of the set, but the
lookup evidently hit corruption while walking the tree. It's not immediately clear why.
Updated by Brad Hubbard over 1 year ago · Edited
- Project changed from Ceph to CephFS
- Category deleted (qa)
Moving this to FS for triage there.
Could this be related to https://tracker.ceph.com/issues/66771 ?
Updated by Brad Hubbard over 1 year ago
- Related to Bug #67567: stdout:Probably out of disk space in /qa/workunits/suites/ffsb.sh added
Updated by Venky Shankar over 1 year ago
Brad Hubbard wrote in #note-2:
I reproduced this and when I try to change into the test directory I see the following.
[...]
In the logs I see this and I noticed the same stack trace in one of my other
reproducers, but not all. Chicken or egg?[...]
Looking at the coredump now.
[...]
So count() should have returned 1, since f is element [33] of the set, but the
lookup evidently hit corruption while walking the tree. It's not immediately clear why.
Likely because the call to _ll_fh_exists() should happen under client_lock.
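To illustrate the point about locking, here is a minimal sketch of the pattern, not the actual change in the pull request. HandleRegistry, its methods, and the stand-in Fh struct are hypothetical names made up for this example; only the idea (every access to the std::set<Fh*>, including a read-only count(), happens with the same mutex held) comes from the comment above.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Stand-in for the client's file handle type (assumption: the real Fh
// carries much more state; only its address matters for this sketch).
struct Fh {
  int id;
};

// Hypothetical registry of unclosed handles. It is NOT the Ceph Client
// class; it only mirrors the role of ll_unclosed_fh_set.
class HandleRegistry {
public:
  void add(Fh* fh) {
    std::lock_guard<std::mutex> l(lock_);
    unclosed_.insert(fh);
  }

  void remove(Fh* fh) {
    std::lock_guard<std::mutex> l(lock_);
    unclosed_.erase(fh);
  }

  // A lookup walks the red-black tree, so it must not run while another
  // thread inserts or erases (which may rebalance nodes). Taking the same
  // lock as the writers rules that out.
  bool exists(Fh* fh) const {
    std::lock_guard<std::mutex> l(lock_);
    return unclosed_.count(fh) != 0;
  }

private:
  mutable std::mutex lock_;   // plays the role of client_lock here
  std::set<Fh*> unclosed_;
};
```

With this shape, a write path that first checks exists(fh) cannot observe a half-rebalanced tree, because open and release paths that call add() or remove() serialize on the same mutex. Whether the real fix takes the lock inside the helper or in the caller is not stated in this thread.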
Updated by Venky Shankar over 1 year ago
- Category set to Correctness/Safety
- Status changed from In Progress to Fix Under Review
- Target version set to v20.0.0
- Backport set to reef,squid
- Pull request ID set to 59300
Updated by Venky Shankar over 1 year ago
Brad Hubbard wrote in #note-2:
I reproduced this and when I try to change into the test directory I see the following.
Thanks for reproducing this and for the crash backtrace; it helped immensely.
I checked the failed teuthology job and unfortunately there isn't a coredump captured o_O
Updated by Venky Shankar over 1 year ago
- Related to Bug #66030: dbench.sh fails with Bad file descriptor (fs:cephadm:multivolume) added
Updated by Venky Shankar over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67775: squid: /tmp/ccxRxwnL.s: Fatal error: can't close fs/file.o: Bad file descriptor in kernel_untar_build.sh added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67776: reef: /tmp/ccxRxwnL.s: Fatal error: can't close fs/file.o: Bad file descriptor in kernel_untar_build.sh added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Brad Hubbard about 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Patrick Donnelly 9 months ago
- Merge Commit set to 704d4f67a0d63a551e28aa7f6c4b729554357954
- Fixed In set to v19.3.0-4564-g704d4f67a0d6
Updated by Upkeep Bot 9 months ago
- Fixed In changed from v19.3.0-4564-g704d4f67a0d6 to v19.3.0-4564-g704d4f67a0d
- Upkeep Timestamp set to 2025-07-08T18:29:42+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-4564-g704d4f67a0d to v19.3.0-4564-g704d4f67a0
- Upkeep Timestamp changed from 2025-07-08T18:29:42+00:00 to 2025-07-14T17:10:23+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2140
- Upkeep Timestamp changed from 2025-07-14T17:10:23+00:00 to 2025-11-01T01:27:44+00:00