octopus: osdc: add timeout configs for mons/osds#37530

Merged
yuriw merged 8 commits into ceph:octopus from batrick:i47734-octopus
Oct 29, 2020

Member

@batrick batrick commented Oct 3, 2020

@batrick batrick added the rbd, cephfs, and DNM labels Oct 3, 2020
@batrick batrick added this to the octopus milestone Oct 3, 2020
@batrick batrick changed the title osdc: add timeout configs for mons/osds octopus: osdc: add timeout configs for mons/osds Oct 3, 2020
@batrick batrick force-pushed the i47734-octopus branch 3 times, most recently from 3c8da8c to 7e1fb65 Compare October 4, 2020 05:13
Member Author

batrick commented Oct 5, 2020

@batrick batrick force-pushed the i47734-octopus branch 7 times, most recently from af6902f to 13f2292 Compare October 6, 2020 16:46
Member Author

batrick commented Oct 6, 2020

Member Author

batrick commented Oct 6, 2020

retest this please

@batrick batrick marked this pull request as draft October 6, 2020 19:51
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 0feabb4)
@batrick batrick marked this pull request as ready for review October 22, 2020 20:36
@batrick batrick removed the DNM label Oct 22, 2020
@batrick batrick marked this pull request as draft October 22, 2020 23:54
@batrick batrick force-pushed the i47734-octopus branch 2 times, most recently from 3e70ca0 to 079286c Compare October 23, 2020 00:32
Have the Objecter track the rados_(mon|osd)_op_timeout configs so that
it can be configured at runtime/startup. This is useful for the
MDS/ceph-fuse so that we can avoid waiting forever for a response from
the Monitors that will never come (statfs on a deleted file system's
pools).

Also: make these configs take a time value rather than double. This is
simpler to deal with in the code and allows time units to be used (e.g.
"5m" for 5 minutes).

Fixes: https://tracker.ceph.com/issues/47734
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit a8a2374)

Conflicts:
	src/client/Client.cc
	src/librados/RadosClient.cc
	src/mds/MDSRank.cc
	src/mgr/MgrStandby.cc
	src/mon/MonClient.h
	src/neorados/RADOSImpl.cc
	src/osd/OSD.cc
	src/osdc/Objecter.cc
	src/osdc/Objecter.h
	src/test/mon/test_mon_workloadgen.cc
	src/tools/cephfs/MDSUtility.cc

        Notes: different Objecter cons arguments. Added conf obs for
        RadosClient.
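
As an illustration of the time-valued configs described in the commit message above, here is a minimal Python sketch of how a string such as "5m" could be turned into seconds. The `parse_duration` helper and its unit table are hypothetical and exist only for this example; this is not Ceph's option parser, which may accept a different set of suffixes. Treating a bare number as seconds is also an assumption of the sketch.

```python
import re

# Hypothetical unit table for illustration only; Ceph's own parser
# may recognize a different set of suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a string such as "5m" or "30" into a number of seconds.

    A bare number is treated as seconds (assumption for this sketch).
    """
    match = re.fullmatch(r"\s*(\d+)\s*([smhd]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit or "s"]


print(parse_duration("5m"))  # 300
```

With a helper like this, a value written as "5m" and a value written as "300" describe the same timeout, which is the convenience the commit message points to.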
Otherwise we have unnecessary timeout waits.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit bc25bd7)

Conflicts:
	qa/tasks/cephfs/test_admin.py

        Notes: delete_all_filesystems method moved

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2432871)

Conflicts:
	qa/tasks/cephfs/fuse_mount.py

The mount.cleanup method will remove the mount point. This `rm -rf` will
always fail (with exit status 0).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8e7a890)

Conflicts:
	qa/tasks/cephfs/fuse_mount.py

        Notes: convert to cleanup call.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f8f607d)

Conflicts:
	qa/tasks/cephfs/mount.py

        Notes: skip as cleanup is abstract.

Now that the osdc Objecter obeys updates to these configs, let's use
them to avoid having them block forever on operations that may never
complete (or should complete in a timely manner).

Fixes: https://tracker.ceph.com/issues/47734
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit d060c9a)
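
To illustrate what the commit message above means by not blocking forever, here is a minimal Python sketch of a wait that gives up after a configured timeout. The `wait_for_reply` function is hypothetical and is not the Objecter's implementation; it only shows the general pattern of returning a timeout error instead of waiting indefinitely. Treating a timeout of 0 as "no timeout" is an assumption of the sketch.

```python
import errno
import threading


def wait_for_reply(reply_event, timeout_seconds):
    """Wait for reply_event to be set.

    Returns 0 if the reply arrived, or -ETIMEDOUT if the timeout expired.
    A timeout of 0 means "wait indefinitely" (assumption for this sketch).
    """
    timeout = timeout_seconds if timeout_seconds > 0 else None
    if reply_event.wait(timeout):
        return 0
    return -errno.ETIMEDOUT


# A reply that never arrives: the caller gets an error after a short
# wait instead of hanging.
never_answered = threading.Event()
print(wait_for_reply(never_answered, 0.05) == -errno.ETIMEDOUT)  # True
```

The point of the pattern is that the caller can handle the error and move on, rather than being stuck on an operation that may never complete.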
Otherwise, the umount process will fail because the mount still exists
when the mountpoint cleanup (rmdir) is started.

See:

    2020-10-04T22:08:24.448 INFO:teuthology.nuke.actions:Clearing teuthology firewall rules...
    2020-10-04T22:08:24.449 INFO:teuthology.orchestra.run.smithi063:> sudo sh -c 'iptables-save | grep -v teuthology | iptables-restore'
    2020-10-04T22:08:24.464 INFO:teuthology.orchestra.run.smithi189:> sudo sh -c 'iptables-save | grep -v teuthology | iptables-restore'
    2020-10-04T22:08:24.482 INFO:teuthology.nuke.actions:Cleared teuthology firewall rules.
    2020-10-04T22:08:24.483 INFO:teuthology.orchestra.run:Running command with timeout 900
    2020-10-04T22:08:24.483 INFO:teuthology.orchestra.run.smithi063:> (cd /home/ubuntu/cephtest && exec stat --file-system '--printf=%T
    2020-10-04T22:08:24.483 INFO:teuthology.orchestra.run.smithi063:> ' -- /home/ubuntu/cephtest/mnt.0)
    2020-10-04T22:08:34.550 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:08:34.553 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:09:04.592 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:09:04.596 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:09:34.727 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:09:34.730 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:10:04.815 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:10:04.818 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:10:34.876 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:10:34.880 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:11:04.923 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:11:04.926 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:11:34.996 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:11:35.000 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:12:05.064 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:12:05.067 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:12:35.202 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:12:35.205 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:13:05.316 INFO:teuthology.orchestra.run.smithi063:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:13:05.318 INFO:teuthology.orchestra.run.smithi189:> sudo logrotate /etc/logrotate.d/ceph-test.conf
    2020-10-04T22:13:24.520 INFO:teuthology.orchestra.run.smithi063.stderr:stat: cannot read file system information for '/home/ubuntu/cephtest/mnt.0': Connection timed out
    2020-10-04T22:13:24.521 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2020-10-04T22:13:24.522 INFO:tasks.cephfs.fuse_mount:mount point does not exist: /home/ubuntu/cephtest/mnt.0
    2020-10-04T22:13:24.640 INFO:teuthology.orchestra.run:Running command with timeout 300
    2020-10-04T22:13:24.641 INFO:teuthology.orchestra.run.smithi063:> (cd /home/ubuntu/cephtest && exec rm -rf /home/ubuntu/cephtest/mnt.0)
    2020-10-04T22:13:24.688 INFO:teuthology.orchestra.run.smithi063.stderr:rm: cannot remove '/home/ubuntu/cephtest/mnt.0': Is a directory
    2020-10-04T22:13:24.688 DEBUG:teuthology.orchestra.run:got remote process result: 1

From: /ceph/teuthology-archive/pdonnell-2020-10-04_21:51:57-fs-wip-pdonnell-testing-20201004.051319-octopus-distro-basic-smithi/5494771/teuthology.log
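
For reference, the time the `stat` call spent blocked can be read off the log excerpt above. The short Python sketch below subtracts the timestamp of the `stat` command from the timestamp of its `Connection timed out` message; the `elapsed_seconds` helper is hypothetical and only does the arithmetic.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


stat_started = "2020-10-04T22:08:24.483"  # stat command issued
stat_failed = "2020-10-04T22:13:24.520"   # "Connection timed out" reported
print(round(elapsed_seconds(stat_started, stat_failed)))  # 300
```

In other words, `stat` was blocked for about five minutes before it returned an error, after which the mount point cleanup was attempted.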

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
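
The ordering problem described in the commit message above (removing the mount point while the mount still exists) can be sketched in Python as follows. The `cleanup_mountpoint` function is hypothetical and is not the code in qa/tasks/cephfs; it only illustrates checking that a path is no longer mounted before removing the directory. The `is_mounted` parameter is an assumption added so the check can be replaced in tests.

```python
import os
import tempfile


def cleanup_mountpoint(path, is_mounted=os.path.ismount):
    """Remove an empty mount point directory.

    Refuses to proceed while the path is still a mount point, since the
    removal would fail in that state.
    """
    if is_mounted(path):
        raise RuntimeError(f"{path} is still mounted; unmount it before cleanup")
    os.rmdir(path)


# An ordinary temporary directory is not a mount point, so cleanup succeeds.
scratch = tempfile.mkdtemp()
cleanup_mountpoint(scratch)
print(os.path.exists(scratch))  # False
```

Raising an error while the path is still mounted makes the ordering requirement explicit: unmount first, then remove the directory.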
@batrick batrick marked this pull request as ready for review October 23, 2020 17:26
Member Author

batrick commented Oct 23, 2020

Member Author

batrick commented Oct 23, 2020

This is ready for wider testing in regular octopus QA.

Member Author

batrick commented Oct 26, 2020

@yuriw this needs to be tested with #37256.

Contributor

yuriw commented Oct 26, 2020

Contributor

yuriw commented Oct 27, 2020

Contributor

yuriw commented Oct 29, 2020

Passed rbd; approved by @dillaman.

Contributor

@yuriw yuriw left a comment

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

@yuriw yuriw merged commit 79c1ec4 into ceph:octopus Oct 29, 2020
@batrick batrick deleted the i47734-octopus branch December 11, 2020 23:38