Bug #64390


client: ll_preadv_pwritev can leak refs

Added by Dhairya Parmar about 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

test case:

TEST_F(TestClient, LlreadvLlwritevDataPoolFull) {
  /* Test performing async I/O after filling the fs and make sure it handles
     the read/write gracefully */
  int mypid = getpid();
  char filename[256];

  client->unmount();
  TearDown();
  SetUp();

  snprintf(filename, sizeof(filename),
           "test_llreadvllwritevdatapoolfullfile%d", mypid);

  Inode *root, *file;
  root = client->get_root();
  ASSERT_NE(root, (Inode *)NULL);

  Fh *fh;
  struct ceph_statx stx;

  ASSERT_EQ(0, client->ll_createx(root, filename, 0666,
                                  O_RDWR | O_CREAT | O_TRUNC,
                                  &file, &fh, &stx, 0, 0, myperm));

  struct statvfs stbuf;
  int64_t rc;
  rc = client->ll_statfs(root, &stbuf, myperm);
  ASSERT_EQ(rc, 0);
  int64_t fs_available_space = stbuf.f_bfree * stbuf.f_bsize;
  ASSERT_GT(fs_available_space, 0);

  const int64_t BUFSIZE = 1024 * 1024 * 1024;
  int64_t bytes_written = 0, offset = 0;

  auto buf = std::make_unique<char[]>(BUFSIZE);
  memset(buf.get(), 0xCC, BUFSIZE);

  std::unique_ptr<char[]> small_buf = nullptr;

  while(fs_available_space > 0) {
    if (fs_available_space >= BUFSIZE) {
      bytes_written = client->ll_write(fh, offset, BUFSIZE, buf.get());
      ASSERT_EQ(bytes_written, BUFSIZE);
      offset += BUFSIZE;
      fs_available_space -= BUFSIZE;
    } else {
      small_buf = std::make_unique<char[]>(fs_available_space);
      memset(small_buf.get(), 0xDD, fs_available_space);
      bytes_written = client->ll_write(fh, offset, fs_available_space,
                                       small_buf.get());
      ASSERT_EQ(bytes_written, fs_available_space);
      offset += fs_available_space;
      break;
    }
  }

  auto writefinish = std::make_unique<C_SaferCond>(
      "test-nonblocking-writefinish-datapool-full");

  auto out_buf_0 = std::make_unique<char[]>(BUFSIZE);
  memset(out_buf_0.get(), 0xDD, BUFSIZE);
  auto out_buf_1 = std::make_unique<char[]>(BUFSIZE);
  memset(out_buf_1.get(), 0xFF, BUFSIZE);

  struct iovec iov_out[2] = {
    {out_buf_0.get(), BUFSIZE},
    {out_buf_1.get(), BUFSIZE}
  };

  rc = client->sync_fs();
  ASSERT_EQ(rc, 0);
  sleep(30);
  rc = client->ll_statfs(root, &stbuf, myperm);
  ASSERT_EQ(rc, 0);
  fs_available_space = stbuf.f_bfree * stbuf.f_bsize;
  ASSERT_EQ(fs_available_space, 0);

  rc = client->ll_preadv_pwritev(fh, iov_out, 2, 0, true, writefinish.get(),
                                 nullptr);
  ASSERT_EQ(rc, 0);
  bytes_written = writefinish->wait();
  ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC);

  client->ll_release(fh);
  ASSERT_EQ(0, client->ll_unlink(root, filename, myperm));
}

First, the assertion fails after the async write call:

2024-02-12T19:09:43.795+0530 7f84bac686c0 19 client.4304 C_Write_Finisher::try_complete this 0x5594b702bcc0 onuninlinefinished 1 iofinished 1 iofinished_r 2147483647 fsync_finished 1
2024-02-12T19:09:43.795+0530 7f84bac686c0 19 client.4304  complete with iofinished_r 2147483647
/home/dparmar/CephRepoForRunningTestsLocally/ceph/src/test/client/nonblocking.cc:800: Failure
Expected equality of these values:
  bytes_written
    Which is: 2147483647
  -28

I expected the API to return ENOSPC, but it reported 2 GiB written, i.e. 33% of the data (which shouldn't happen, since I had filled up all the available space in the first place).

We do get the ENOSPC error when releasing the file handle after this:

2024-02-12T19:09:43.795+0530 7f84bf65c9c0  1 client.4304 _release_fh 0x5594b6f277f0 on inode 0x10000000000.head(faked_ino=0 nref=8 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=0} mode=100666 size=106287857664/110582824960 nlink=1 btime=2024-02-12T18:42:52.646736+0530 mtime=2024-02-12T19:09:43.796040+0530 ctime=2024-02-12T19:09:43.796040+0530 change_attr=100 caps=p(0=p) flushing_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile1269955"] 0x7f84900088e0) caught async_err = (28) No space left on device

and after this, the call stalls:

2024-02-12T19:09:43.976+0530 7f84977fe6c0 20 client.4304 upkeep thread waiting interval 1.000000000s
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0 10 client.4304 unmounting: trim pass, size was 0+2
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0 20 client.4304 trim_cache size 0 max 16384
2024-02-12T19:09:44.614+0530 7f84b1ffb6c0 10 client.4304 unmounting: trim pass, size still 0+2
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 tick
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 collect_and_send_metrics
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 collect_and_send_global_metrics
2024-02-12T19:09:44.977+0530 7f84977fe6c0 10 client.4304 _put_inode on 0x1.head(faked_ino=0 nref=2 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2024-02-12T18:41:58.976066+0530 mtime=2024-02-12T18:42:52.646736+0530 ctime=2024-02-12T18:42:52.646736+0530 change_attr=1 caps=pAsLsXs(0=pAsLsXs) has_dir_layout 0x7f84900081e0) n = 1
2024-02-12T19:09:44.977+0530 7f84977fe6c0 10 client.4304 remove_cap mds.0 on 0x1.head(faked_ino=0 nref=1 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2024-02-12T18:41:58.976066+0530 mtime=2024-02-12T18:42:52.646736+0530 ctime=2024-02-12T18:42:52.646736+0530 change_attr=1 caps=pAsLsXs(0=pAsLsXs) has_dir_layout 0x7f84900081e0)
2024-02-12T19:09:44.977+0530 7f84977fe6c0 15 client.4304 remove_cap last one, closing snaprealm 0x7f84900080f0
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 put_snap_realm 0x1 0x7f84900080f0 2 -> 1
2024-02-12T19:09:44.977+0530 7f84977fe6c0 10 client.4304 _put_inode deleting 0x1.head(faked_ino=0 nref=1 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2024-02-12T18:41:58.976066+0530 mtime=2024-02-12T18:42:52.646736+0530 ctime=2024-02-12T18:42:52.646736+0530 change_attr=1 caps=- has_dir_layout 0x7f84900081e0)
2024-02-12T19:09:44.977+0530 7f84977fe6c0 10 client.4304 _put_inode on 0x10000000000.head(faked_ino=0 nref=4 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=0} mode=100666 size=106287857664/110582824960 nlink=1 btime=2024-02-12T18:42:52.646736+0530 mtime=2024-02-12T19:09:43.796040+0530 ctime=2024-02-12T19:09:43.796040+0530 change_attr=100 caps=p(0=p) flushing_caps=Fw objectset[0x10000000000 ts 0/0 objects 332 dirty_or_tx 0] 0x7f84900088e0) n = 2
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 trim_cache size 0 max 16384
2024-02-12T19:09:44.977+0530 7f84977fe6c0 20 client.4304 upkeep thread waiting interval 1.000000000s
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0  1 client.4304 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0 10 client.4304 unmounting: trim pass, size was 0+1
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0 20 client.4304 trim_cache size 0 max 16384
2024-02-12T19:09:45.682+0530 7f84b1ffb6c0 10 client.4304 unmounting: trim pass, size still 0+1
2024-02-12T19:09:45.977+0530 7f84977fe6c0 20 client.4304 tick
2024-02-12T19:09:45.977+0530 7f84977fe6c0 20 client.4304 collect_and_send_metrics
2024-02-12T19:09:45.977+0530 7f84977fe6c0 20 client.4304 collect_and_send_global_metrics
2024-02-12T19:09:45.977+0530 7f84977fe6c0 20 client.4304 trim_cache size 0 max 16384
2024-02-12T19:09:45.977+0530 7f84977fe6c0 20 client.4304 upkeep thread waiting interval 1.000000000s
2024-02-12T19:09:46.977+0530 7f84977fe6c0 20 client.4304 tick
Actions #1

Updated by Venky Shankar about 2 years ago

  • Category set to Correctness/Safety
  • Assignee set to Dhairya Parmar
  • Target version set to v19.0.0
Actions #2

Updated by Venky Shankar about 2 years ago

  • Component(FS) Client added
Actions #3

Updated by Dhairya Parmar about 2 years ago

  • Description updated (diff)
Actions #4

Updated by Patrick Donnelly almost 2 years ago

  • Target version changed from v19.0.0 to v20.0.0
Actions #5

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #6

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #7

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #8

Updated by Dhairya Parmar almost 2 years ago

I switched to using `unique_ptr` instead of `new` and `delete`; the test case still fails.

From logs:

2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.

It then tries to unmount and put the inode:

2024-05-28T01:20:11.695+0530 7f10fbc889c0  2 client.4315 unmount
2024-05-28T01:20:11.695+0530 7f10fbc889c0  2 client.4315 unmounting
2024-05-28T01:20:11.695+0530 7f10fbc889c0 20 client.4315 put_inode on 0x1.head(faked_ino=0 nref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2024-05-28T01:19:08.112554+0530 mtime=2024-05-28T01:19:23.954360+0530 ctime=2024-05-28T01:19:23.954360+0530 change_attr=1 caps=pAsLsXs(0=pAsLsXs) COMPLETE has_dir_layout 0x7f10cc0067f0) n = 1

and then stalls:

2024-05-28T01:20:11.710+0530 7f10fbc889c0 10 client.4315 _put_inode on 0x10000000000.head(faked_ino=0 nref=12 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pFxcwb(0=pFxcwb) objectset[0x10000000000 ts 0/0 objects 1001 dirty_or_tx 0] 0x7f10cc006dd0) n = 9
2024-05-28T01:20:11.710+0530 7f10fbc889c0  2 client.4315 cache still has 0+1 items, waiting (for caps to release?)
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315  mds.0 seq now 11
2024-05-28T01:20:11.710+0530 7f10e27fc6c0  5 client.4315 handle_cap_grant on in 0x10000000000 mds.0 seq 6 caps now pFxcwb was pFxcwb
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315 update_inode_file_time 0x10000000000.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pFxcwb(0=pFxcwb) objectset[0x10000000000 ts 0/0 objects 1001 dirty_or_tx 0] 0x7f10cc006dd0) pFxcwb ctime 2024-05-28T01:20:08.007684+0530 mtime 2024-05-28T01:20:08.007684+0530
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315 max_size 8589934592 -> 9604956160
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315   caps unchanged at pFxcwb
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315 handle_cap_grant calling signal_caps_inode
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315 unmounting: trim pass, size was 0+1
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:11.710+0530 7f10e27fc6c0 10 client.4315 unmounting: trim pass, size still 0+1
2024-05-28T01:20:12.538+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:12.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:12.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:12.538+0530 7f10cbfff6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:12.538+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
2024-05-28T01:20:13.539+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:13.539+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:13.539+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:13.539+0530 7f10cbfff6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:13.539+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
2024-05-28T01:20:14.538+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:14.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:14.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:14.538+0530 7f10cbfff6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:14.538+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
2024-05-28T01:20:15.538+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:15.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:15.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:15.538+0530 7f10cbfff6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:15.538+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
2024-05-28T01:20:16.539+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:16.539+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:16.539+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:16.539+0530 7f10cbfff6c0 20 client.4315 trim_cache size 0 max 16384
2024-05-28T01:20:16.539+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
2024-05-28T01:20:16.710+0530 7f10fbc889c0  1 client.4315 dump_cache
2024-05-28T01:20:16.710+0530 7f10fbc889c0  1 client.4315 dump_inode: DISCONNECTED inode 0x10000000000 #0x10000000000 ref 3 0x10000000000.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=5309988864/9604956160 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pFxcwb(0=pFxcwb) objectset[0x10000000000 ts 0/0 objects 1001 dirty_or_tx 0] 0x7f10cc006dd0)
Actions #9

Updated by Dhairya Parmar almost 2 years ago

A clearer view of the "End of buffer" error:

2024-05-28T01:20:11.537+0530 7f10fbc889c0 10 client.4315 get_caps 0x10000000000.head(faked_ino=0 nref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile110658"] 0x7f10cc006dd0) have pAsxLsXsxFsxcrwb need AsFw want Fb revoking -
2024-05-28T01:20:11.537+0530 7f10fbc889c0 20 client.4315 clear_suid_sgid 0x10000000000.head(faked_ino=0 nref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=1,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile110658"] 0x7f10cc006dd0); UserPerm(uid: 0, gid: 0) defer 0
2024-05-28T01:20:11.537+0530 7f10fbc889c0 10 client.4315  snaprealm snaprealm(0x1 nref=2 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[] last_modified=2024-05-28T01:19:08.112554+0530 change_attr=0)
2024-05-28T01:20:11.537+0530 7f10fbc889c0  5 client.4315 get_cap_ref got first FILE_BUFFER ref on 0x10000000000.head(faked_ino=0 nref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=1,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile110658"] 0x7f10cc006dd0)
2024-05-28T01:20:11.537+0530 7f10fbc889c0  5 client.4315 get_cap_ref got first FILE_CACHE ref on 0x10000000000.head(faked_ino=0 nref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=1,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:19:23.954360+0530 mtime=2024-05-28T01:20:08.007684+0530 ctime=2024-05-28T01:20:08.007684+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile110658"] 0x7f10cc006dd0)
2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 tick
2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_metrics
2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 collect_and_send_global_metrics
2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 trim_cache size 1 max 16384
2024-05-28T01:20:11.538+0530 7f10cbfff6c0 20 client.4315 upkeep thread waiting interval 1.000000000s
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T01:20:11.695+0530 7f10fbc889c0  2 client.4315 unmount
2024-05-28T01:20:11.695+0530 7f10fbc889c0  2 client.4315 unmounting
2024-05-28T01:20:11.695+0530 7f10fbc889c0 20 client.4315 put_inode on 0x1.head(faked_ino=0 nref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2024-05-28T01:19:08.112554+0530 mtime=2024-05-28T01:19:23.954360+0530 ctime=2024-05-28T01:19:23.954360+0530 change_attr=1 caps=pAsLsXs(0=pAsLsXs) COMPLETE has_dir_layout 0x7f10cc0067f0) n = 1

so somehow, after getting the `Fc` cap, a buffer overrun occurred

Actions #10

Updated by Dhairya Parmar almost 2 years ago · Edited

  • Description updated (diff)

Okay, the buffer issue was coming from `const int64_t MEGA_BUFSIZE = BUFSIZE * 2;`. I switched to using one gibibyte instead of two, but the original issue (data is written successfully even though the data pool is full) and the stall still exist:

it wrote two gibibytes of data (it shouldn't have; the data pool is full!):

2024-05-28T01:46:58.825+0530 7f5e11a566c0 19 client.4333 C_Write_Finisher::try_complete this 0x55d868594c60 onuninlinefinished 1 iofinished 1 iofinished_r 2147483647 fsync_finished 1
2024-05-28T01:46:58.825+0530 7f5e11a566c0 19 client.4333  complete with iofinished_r 2147483647
/home/dparmar/CephRepoForRunningTestsLocally/ceph/src/test/client/nonblocking.cc:792: Failure
Expected equality of these values:
  bytes_written
    Which is: 2147483647
  -28

although it does immediately acknowledge that the pool is full, this happens only after the assertion:

2024-05-28T01:46:58.848+0530 7f5dee7fc6c0 10 client.4333 _put_inode on 0x10000000000.head(faked_ino=0 nref=7 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:46:11.142016+0530 mtime=2024-05-28T01:46:58.825562+0530 ctime=2024-05-28T01:46:58.825562+0530 change_attr=6 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile130654"] 0x7f5df0006dd0) n = 2
2024-05-28T01:46:58.848+0530 7f5dee7fc6c0 20 client.4333 trim_cache size 1 max 16384
2024-05-28T01:46:58.848+0530 7f5dee7fc6c0 20 client.4333 upkeep thread waiting interval 1.000000000s
2024-05-28T01:46:58.886+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-05-28T01:46:58.886+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-05-28T01:46:58.886+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-05-28T01:46:58.962+0530 7f5e1647a9c0  2 client.4333 unmount
2024-05-28T01:46:58.962+0530 7f5e1647a9c0  2 client.4333 unmounting

and only at the time of releasing the file handle does it report ENOSPC:

2024-05-28T01:46:58.963+0530 7f5e1647a9c0  1 client.4333 _release_fh 0x55d8686de1f0 on inode 0x10000000000.head(faked_ino=0 nref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=0} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:46:11.142016+0530 mtime=2024-05-28T01:46:58.825562+0530 ctime=2024-05-28T01:46:58.825562+0530 change_attr=6 caps=p(0=p) flushing_caps=Fw objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile130654"] 0x7f5df0006dd0) caught async_err = (28) No space left on device

and then there is just the stall (`size still 0+1` confirms a reference was never released):

2024-05-28T01:46:59.849+0530 7f5dee7fc6c0 10 client.4333 _put_inode on 0x10000000000.head(faked_ino=0 nref=10 ll_ref=0 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=0} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-05-28T01:46:11.142016+0530 mtime=2024-05-28T01:46:58.825562+0530 ctime=2024-05-28T01:46:58.825562+0530 change_attr=6 caps=p(0=p) flushing_caps=Fw objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] 0x7f5df0006dd0) n = 8
2024-05-28T01:46:59.849+0530 7f5dee7fc6c0 20 client.4333 trim_cache size 0 max 16384
2024-05-28T01:46:59.849+0530 7f5dee7fc6c0 20 client.4333 upkeep thread waiting interval 1.000000000s
2024-05-28T01:47:00.848+0530 7f5dee7fc6c0 20 client.4333 tick
2024-05-28T01:47:00.848+0530 7f5dee7fc6c0 20 client.4333 collect_and_send_metrics
2024-05-28T01:47:00.848+0530 7f5dee7fc6c0 20 client.4333 collect_and_send_global_metrics
2024-05-28T01:47:00.848+0530 7f5dee7fc6c0 20 client.4333 trim_cache size 0 max 16384
2024-05-28T01:47:00.848+0530 7f5dee7fc6c0 20 client.4333 upkeep thread waiting interval 1.000000000s
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0 10 client.4333 unmounting: trim pass, size was 0+1
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0 20 client.4333 trim_cache size 0 max 16384
2024-05-28T01:47:00.869+0530 7f5e0cfed6c0 10 client.4333 unmounting: trim pass, size still 0+1
2024-05-28T01:47:01.849+0530 7f5dee7fc6c0 20 client.4333 tick
2024-05-28T01:47:01.849+0530 7f5dee7fc6c0 20 client.4333 collect_and_send_metrics
2024-05-28T01:47:01.849+0530 7f5dee7fc6c0 20 client.4333 collect_and_send_global_metrics
2024-05-28T01:47:01.849+0530 7f5dee7fc6c0 20 client.4333 trim_cache size 0 max 16384
2024-05-28T01:47:01.849+0530 7f5dee7fc6c0 20 client.4333 upkeep thread waiting interval 1.000000000s
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0  1 client.4333 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0 10 client.4333 unmounting: trim pass, size was 0+1
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0 20 client.4333 trim_cache size 0 max 16384
2024-05-28T01:47:01.986+0530 7f5e0cfed6c0 10 client.4333 unmounting: trim pass, size still 0+1

Actions #11

Updated by Dhairya Parmar almost 2 years ago · Edited

(Anyone looking at this: only https://tracker.ceph.com/issues/64390#note-10 is important; please ignore all previous comments.)

There are two issues here that I'm looking at:
1) Why was the write successful when the data pool was full?
2) What didn't get released, and why and where?

Targeting (2) first.

Actions #12

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #13

Updated by Dhairya Parmar almost 2 years ago

The stall happens after `ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC);` fails, so it looks like the async I/O passed (which is itself an issue, since the data pool had no space for it, but for now I'm focusing on the stall). I'm also wondering whether this is an issue at all, since `ASSERT_EQ` is a test-framework function and irrelevant to the code paths; but then how does it hamper execution? 0_o

Actions #14

Updated by Venky Shankar almost 2 years ago

  • Subject changed from client: async I/O stalls if the data pool gets full to client ll_preadv_pwritev can leak refs

This issue has gone through many loops. As it stands now, mounting the client isn't an issue: even with a full data pool, it's possible to mount the client.

Now, I was debugging this with Dhairya, and the issue is not that the async context callback isn't getting triggered - the assert ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC); is reached (although it fails) - but that the umount isn't finishing, due to references still being held. Please check ll_preadv_pwritev for ref leaks.

Actions #15

Updated by Venky Shankar almost 2 years ago

  • Subject changed from client ll_preadv_pwritev can leak refs to client: ll_preadv_pwritev can leak refs
Actions #16

Updated by Dhairya Parmar almost 2 years ago

Venky Shankar wrote in #note-14:

This issue has gone through many loops. As it stands now, mounting the client isn't an issue: even with a full data pool, it's possible to mount the client.

Now, I was debugging this with Dhairya, and the issue is not that the async context callback isn't getting triggered - the assert ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC); is reached (although it fails) - but that the umount isn't finishing, due to references still being held. Please check ll_preadv_pwritev for ref leaks.

the refs leak only under the condition that the data pool is full

Actions #17

Updated by Venky Shankar almost 2 years ago

Dhairya Parmar wrote in #note-16:

Venky Shankar wrote in #note-14:

This issue has gone through many loops. As it stands now, mounting the client isn't an issue: even with a full data pool, it's possible to mount the client.

Now, I was debugging this with Dhairya, and the issue is not that the async context callback isn't getting triggered - the assert ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC); is reached (although it fails) - but that the umount isn't finishing, due to references still being held. Please check ll_preadv_pwritev for ref leaks.

the refs leak only under the condition that the data pool is full

ISTM that it's a bug in the error-handling path then.

Actions #18

Updated by Xiubo Li almost 2 years ago

If the Fb cap is issued, the following write will be cached locally and won't trigger the osdmap FULL immediately:

bytes_written = client->ll_write(fh, offset, BUFSIZE, buf.get());

Maybe you should try to flush the caches and wait for a while; otherwise your following write could succeed too.

For this test case, I think you can refer to `qa/tasks/cephfs/test_full.py`.

Actions #19

Updated by Xiubo Li almost 2 years ago

Dhairya Parmar wrote:

test case:
[...]

First, the assertion fails after the async write call
[...]

I expected the API to return ENOSPC, but it reported 2 GiB written, i.e. 33% of the data (which shouldn't happen, since I had filled up all the available space in the first place).

We do get the ENOSPC error when releasing the file handle after this:
[...]

This is because closing the opened file flushes the cache, which then hits the -ENOSPC error. If my memory is correct, if you try to flush the file just before the release, it will fail in the flush step and the later release should succeed.

You can give it a try.

and after this, the call stalls:

[...]

Actions #20

Updated by Dhairya Parmar almost 2 years ago

Xiubo Li wrote in #note-18:

If the Fb cap is issued, the following write will be cached locally and won't trigger the osdmap FULL immediately:

If the client caches the writes locally, there must be some set time (say, a threshold value) or similar trigger at which it flushes them, no? Another question: shouldn't the write call do an fsync? Not always, I understand, since such frequent flushes would take a performance hit and generate a ton of unwanted I/O; but we need a tradeoff here, because IMO holding huge cached writes is as bad as issuing a ton of fsync calls. What I feel should have happened here: since the writes in my test case are so huge, the client should have periodically flushed the buffers out, because flushing huge buffers all at once will definitely take a toll on performance and cause huge intermittent I/O spikes.

Actions #21

Updated by Dhairya Parmar almost 2 years ago

Dhairya Parmar wrote in #note-20:

Xiubo Li wrote in #note-18:

If the Fb cap is issued, the following write will be cached locally and won't trigger the osdmap FULL immediately:

If the client caches the writes locally, there must be some set time (say, a threshold value) or similar trigger at which it flushes them, no? Another question: shouldn't the write call do an fsync? Not always, I understand, since such frequent flushes would take a performance hit and generate a ton of unwanted I/O; but we need a tradeoff here, because IMO holding huge cached writes is as bad as issuing a ton of fsync calls. What I feel should have happened here: since the writes in my test case are so huge, the client should have periodically flushed the buffers out, because flushing huge buffers all at once will definitely take a toll on performance and cause huge intermittent I/O spikes.

A follow-up question: if such huge buffers are being flushed out, it's obviously going to take a while; meanwhile, are other I/Os blocked? Queued? Something else?

Actions #22

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #23

Updated by Dhairya Parmar almost 2 years ago · Edited

From my understanding, the issue when unmounting the client isn't due to any cap ref leak; I think it is because of the MDS, since ceph status shows:

health: HEALTH_WARN
       1 MDSs report slow metadata IOs

To make sure what I'm saying makes sense, I ran the client test case again and it got stuck:

dparmar:build$ ./bin/ceph_test_client
2024-06-03T16:32:56.747+0530 7fd5750dda00 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-03T16:32:56.752+0530 7fd5750dda00 -1 WARNING: all dangerous and experimental features are enabled.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestClient
[ RUN      ] TestClient.LlreadvLlwritevDataPoolFull
2024-06-03T16:32:56.753+0530 7fd5674006c0 10 client.0 ms_handle_connect on v2:127.0.0.1:40756/0
2024-06-03T16:32:56.754+0530 7fd5750dda00 10 client.4333 Subscribing to map 'mdsmap'
2024-06-03T16:32:56.754+0530 7fd556a006c0 20 client.4333 tick
2024-06-03T16:32:56.754+0530 7fd556a006c0 20 client.4333 collect_and_send_metrics
2024-06-03T16:32:56.754+0530 7fd556a006c0 20 client.4333 collect_and_send_global_metrics
2024-06-03T16:32:56.754+0530 7fd556a006c0  5 client.4333 collect_and_send_global_metrics MDS rank 0 is not ready yet -- not sending metric
2024-06-03T16:32:56.754+0530 7fd556a006c0 20 client.4333 trim_cache size 0 max 16384
2024-06-03T16:32:56.754+0530 7fd556a006c0 20 client.4333 upkeep thread waiting interval 1.000000000s
2024-06-03T16:32:56.754+0530 7fd5674006c0  1 client.4333 handle_mds_map epoch 76
2024-06-03T16:32:56.754+0530 7fd5750dda00 20 client.4333 populate_metadata read hostname '<host_name_removed_intentionally>'
2024-06-03T16:32:56.754+0530 7fd5750dda00 10 client.4333 did not get mds through better means, so chose random mds 0
2024-06-03T16:32:56.754+0530 7fd5750dda00 20 client.4333 mds is 0
2024-06-03T16:32:56.754+0530 7fd5750dda00 10 client.4333 _open_mds_session mds.0
2024-06-03T16:32:56.754+0530 7fd5750dda00 10 client.4333 waiting for session to mds.0 to open
2024-06-03T16:32:56.754+0530 7fd5674006c0 10 client.4333 ms_handle_connect on v2:127.0.0.1:6830/2631976685
2024-06-03T16:32:57.754+0530 7fd556a006c0 20 client.4333 tick
2024-06-03T16:32:57.754+0530 7fd556a006c0 10 client.4333 renew_caps()
2024-06-03T16:32:57.754+0530 7fd556a006c0 15 client.4333 renew_caps requesting from mds.0
2024-06-03T16:32:57.754+0530 7fd556a006c0 10 client.4333 renew_caps mds.0
2024-06-03T16:32:57.754+0530 7fd556a006c0 20 client.4333 collect_and_send_metrics
2024-06-03T16:32:57.754+0530 7fd556a006c0 20 client.4333 collect_and_send_global_metrics
2024-06-03T16:32:57.754+0530 7fd556a006c0  5 client.4333 collect_and_send_global_metrics: no session with rank=0 -- not sending metric
2024-06-03T16:32:57.754+0530 7fd556a006c0 20 client.4333 trim_cache size 0 max 16384
2024-06-03T16:32:57.754+0530 7fd556a006c0 20 client.4333 upkeep thread waiting interval 1.000000000s

To confirm that the client is really suffering because of the MDS, I tried mounting:

dparmar:build$ mkdir mnt
dparmar:build$ ./bin/ceph-fuse mnt/
2024-06-03T16:33:38.247+0530 7fc1944ae500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-03T16:33:38.253+0530 7fc1944ae500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-03T16:33:38.255+0530 7fc1944ae500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-03T16:33:38.257+0530 7fc1944ae500 -1 init, newargv = 0x5649a16d0b40 newargc=15
ceph-fuse[566649]: starting ceph client

and had to terminate this abruptly with CTRL+C

So it seems that when the MDS is reporting slow metadata I/O, the client suffers.

EDIT: slow metadata I/O might be a symptom, not a cause; a better-fitting analysis is coming up.

Actions #24

Updated by Dhairya Parmar almost 2 years ago

  • Description updated (diff)
Actions #25

Updated by Dhairya Parmar almost 2 years ago · Edited

So, this is getting funny:

MDS is reporting slow metadata IO:

    health: HEALTH_WARN
            1 MDSs report slow metadata IOs

this time the test case didn't get stuck at `unmounting: trim pass, size still 0+1`:

2024-06-04T14:29:01.845+0530 7f0d990c5a00  2 client.4285 _close_mds_session mds.0 seq 10
2024-06-04T14:29:01.845+0530 7f0d990c5a00  2 client.4285 waiting for 1 mds session(s) to close (timeout: 30s)
2024-06-04T14:29:01.845+0530 7f0d8a0006c0 10 client.4285 unmounting: trim pass, size was 0+0
2024-06-04T14:29:01.845+0530 7f0d8a0006c0 20 client.4285 trim_cache size 0 max 16384
2024-06-04T14:29:01.845+0530 7f0d8a0006c0 10 client.4285 unmounting: trim pass, size still 0+0
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 10 client.4285 handle_client_session client_session(close) from mds.0
2024-06-04T14:29:01.893+0530 7f0d8a0006c0  5 client.4285 _closed_mds_session mds.0 seq 10
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 10 client.4285 remove_session_caps mds.0
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 10 client.4285 kick_requests_closed for mds.0
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 10 client.4285 unmounting: trim pass, size was 0+0
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 20 client.4285 trim_cache size 0 max 16384
2024-06-04T14:29:01.893+0530 7f0d8a0006c0 10 client.4285 unmounting: trim pass, size still 0+0
2024-06-04T14:29:01.893+0530 7f0d990c5a00  2 client.4285 unmounted.
2024-06-04T14:29:01.893+0530 7f0d990c5a00  1 client.4285 shutdown
2024-06-04T14:29:01.893+0530 7f0d990c5a00 20 client.4285 trim_cache size 0 max 16384
[  FAILED  ] TestClient.LlreadvLlwritevDataPoolFull (80395 ms)
[----------] 1 test from TestClient (80395 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (80395 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestClient.LlreadvLlwritevDataPoolFull

and I can even mount a client (wow!):

dparmar:build$ ./bin/ceph-fuse mnt/
2024-06-04T14:33:00.465+0530 7f4c7d26f500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-04T14:33:00.470+0530 7f4c7d26f500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-04T14:33:00.472+0530 7f4c7d26f500 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-04T14:33:00.473+0530 7f4c7d26f500 -1 init, newargv = 0x55c67a9d7440 newargc=15
ceph-fuse[107638]: starting ceph client
ceph-fuse[107638]: starting fuse
dparmar:build$

The same test case sometimes gets stuck at `unmounting: trim pass, size still 0+1` and sometimes doesn't (even with the MDS reporting slow metadata I/O).

The logs aren't revealing anything exciting. The MDS logs look normal, and the client logs don't show much apart from the handle_full_flag entries, which aren't very helpful. Not sure if I'm missing something in the logs.

I can see the OSD returning ENOSPC, but that's the only relevant log line I found. Maybe I'm missing something there, but I'm not well versed in the OSD code, so I'm not sure what to watch out for.

Actions #26

Updated by Dhairya Parmar almost 2 years ago

The OSD logs report a misconfigured cluster:

osd.0.log:6944990:2024-06-04T16:15:09.415+0530 7fa6ee6006c0 -1 bluestore(/home/dparmar/CephRepoForRunningTests/ceph/build/dev/osd0) ENOSPC from bluestore, misconfigured cluster
osd.0.log:6945709:2024-06-04T16:15:09.445+0530 7fa6efa006c0 -1 bluestore(/home/dparmar/CephRepoForRunningTests/ceph/build/dev/osd0) ENOSPC from bluestore, misconfigured cluster
Actions #27

Updated by Dhairya Parmar almost 2 years ago · Edited

The more I run this single test case, the more randomness I find in the failures. Sometimes the client failed to trim its cache and sometimes it succeeded, but after adding an fs sync call before doing the async I/O, i.e. the addition of

  rc = client->sync_fs();
  ASSERT_EQ(rc, 0);
  sleep(30);
  rc = client->ll_statfs(root, &stbuf, myperm);
  ASSERT_EQ(rc, 0);
  fs_available_space = stbuf.f_bfree * stbuf.f_bsize;
  ASSERT_EQ(fs_available_space, 0);

we see this:

2024-06-05T14:03:06.540+0530 7faa47667a00 10 client.4297 wait_sync_caps want 6 (last is 6, 0 total flushing)
2024-06-05T14:03:06.540+0530 7faa47667a00 15 client.4297 _sync_fs waiting on data to flush
2024-06-05T14:03:06.540+0530 7faa47667a00 15 client.4297 _sync_fs flush finished
2024-06-05T14:03:06.540+0530 7faa33e006c0 10 client.4297  mds.0 seq now 9
2024-06-05T14:03:06.540+0530 7faa33e006c0  5 client.4297 handle_cap_grant on in 0x10000000000 mds.0 seq 5 caps now pAsxLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
2024-06-05T14:03:06.540+0530 7faa33e006c0 10 client.4297 update_inode_file_time 0x10000000000.head(faked_ino=0 nref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100666 size=5309988864/8589934592 nlink=1 btime=2024-06-05T14:02:23.391154+0530 mtime=2024-06-05T14:03:04.971912+0530 ctime=2024-06-05T14:03:04.971912+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile147100"] 0x7faa10009600) pAsxLsXsxFsxcrwb ctime 2024-06-05T14:03:04.971912+0530 mtime 2024-06-05T14:03:04.971912+0530
2024-06-05T14:03:06.540+0530 7faa33e006c0 10 client.4297 max_size 8589934592 -> 9604956160
2024-06-05T14:03:06.540+0530 7faa33e006c0 10 client.4297   caps unchanged at pAsxLsXsxFsxcrwb
2024-06-05T14:03:06.540+0530 7faa33e006c0 10 client.4297 handle_cap_grant calling signal_caps_inode
2024-06-05T14:03:06.971+0530 7faa316006c0 20 client.4297 tick
2024-06-05T14:03:06.971+0530 7faa316006c0 20 client.4297 collect_and_send_metrics
2024-06-05T14:03:06.971+0530 7faa316006c0 20 client.4297 collect_and_send_global_metrics
2024-06-05T14:03:06.971+0530 7faa316006c0 10 client.4297 _put_inode on 0x10000000000.head(faked_ino=0 nref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100666 size=5309988864/9604956160 nlink=1 btime=2024-06-05T14:02:23.391154+0530 mtime=2024-06-05T14:03:04.971912+0530 ctime=2024-06-05T14:03:04.971912+0530 change_attr=5 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x1.head["test_llreadvllwritevdatapoolfullfile147100"] 0x7faa10009600) n = 1
2024-06-05T14:03:06.971+0530 7faa316006c0 20 client.4297 trim_cache size 1 max 16384
2024-06-05T14:03:06.971+0530 7faa316006c0 20 client.4297 upkeep thread waiting interval 1.000000000s
2024-06-05T14:03:07.971+0530 7faa316006c0 20 client.4297 tick
2024-06-05T14:03:07.971+0530 7faa316006c0 20 client.4297 collect_and_send_metrics
2024-06-05T14:03:07.971+0530 7faa316006c0 20 client.4297 collect_and_send_global_metrics
2024-06-05T14:03:07.971+0530 7faa316006c0 20 client.4297 trim_cache size 1 max 16384
2024-06-05T14:03:07.971+0530 7faa316006c0 20 client.4297 upkeep thread waiting interval 1.000000000s
2024-06-05T14:03:08.971+0530 7faa316006c0 20 client.4297 tick
2024-06-05T14:03:08.971+0530 7faa316006c0 20 client.4297 collect_and_send_metrics
2024-06-05T14:03:08.971+0530 7faa316006c0 20 client.4297 collect_and_send_global_metrics
2024-06-05T14:03:08.971+0530 7faa316006c0 20 client.4297 trim_cache size 1 max 16384
2024-06-05T14:03:08.971+0530 7faa316006c0 20 client.4297 upkeep thread waiting interval 1.000000000s
2024-06-05T14:03:09.972+0530 7faa316006c0 20 client.4297 tick
2024-06-05T14:03:09.972+0530 7faa316006c0 20 client.4297 collect_and_send_metrics
2024-06-05T14:03:09.972+0530 7faa316006c0 20 client.4297 collect_and_send_global_metrics
2024-06-05T14:03:09.972+0530 7faa316006c0 20 client.4297 trim_cache size 1 max 16384
2024-06-05T14:03:09.972+0530 7faa316006c0 20 client.4297 upkeep thread waiting interval 1.000000000s

`Client::trim_cache` revealed that the LRU size stays at 1, which means there is something in the cache that isn't getting trimmed; the question is why. The reason seems to be revealed by `ceph df`:

--- RAW STORAGE ---
CLASS    SIZE    AVAIL    USED  RAW USED  %RAW USED
hdd    15 GiB  633 MiB  15 GiB    15 GiB      95.92
TOTAL  15 GiB  633 MiB  15 GiB    15 GiB      95.92

--- POOLS ---
POOL           ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr            1    1  449 KiB        2  1.3 MiB   0.43    103 MiB
cephfs.a.meta   2   16   21 KiB       22  144 KiB   0.05    103 MiB
cephfs.a.data   3  128  4.9 GiB    1.25k   15 GiB  97.97    103 MiB

From what it looks like, since there is little to no space left, there is no room for the MDS to flush any metadata, hence the slow metadata I/O warnings too. The MDS seems to be bottlenecked there, and that makes client ops unstable.

Actions #28

Updated by Venky Shankar almost 2 years ago

I suggested @Dhairya Parmar create a separate tracker for the cluster-full issue (since that's a bit of a digression from the change intended to fix this tracker). Please work towards getting the relevant changes up for review/testing.

Actions #29

Updated by Dhairya Parmar almost 2 years ago

No ref leaks found with async I/O.

Actions #30

Updated by Dhairya Parmar almost 2 years ago

  • Status changed from New to Closed