
qa/tasks/cephfs/mount.py: defer deleting the netnses and bridge #35944

Merged
batrick merged 5 commits into ceph:master from lxbsz:client_delay
Jul 31, 2020

Member

@lxbsz lxbsz commented Jul 6, 2020

Fixes: https://tracker.ceph.com/issues/46282
Signed-off-by: Xiubo Li <xiubli@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@batrick batrick added cephfs Ceph File System needs-review labels Jul 6, 2020
@lxbsz lxbsz force-pushed the client_delay branch 6 times, most recently from 3a2d705 to 6793049 Compare July 8, 2020 13:01
Member Author

lxbsz commented Jul 9, 2020

jenkins retest this please

args = ["sudo", "bash", "-c",
"iptables -A FORWARD -o {0} -i ceph-brx -j ACCEPT".format(gw)]
args = ['sudo', 'iptables', '-A', 'FORWARD', '-o', '{0}'.format(gw), '-i', 'ceph-brx', '-j', 'ACCEPT']
self.client_remote.run(args=args, timeout=(5*60), omit_sudo=False)
Member

While you're here fixing this, why not do:

self.run_shell_payload(f"""
sudo iptables ...
sudo iptables ...
""", omit_sudo=False)

Member Author

Cool, I didn't notice this helper, will fix it.
Thanks.
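
The suggestion above batches several commands into one multi-line script instead of building one argument list per command. A minimal sketch of that idea outside teuthology follows. The helper name `build_payload_args` and the interface name `eth0` are hypothetical, and the helper only builds the argument list without running anything; the real `run_shell_payload` may behave differently.

```python
import textwrap


def build_payload_args(payload, use_sudo=True):
    """Hypothetical helper: wrap a multi-line script into one bash invocation."""
    # Remove the common indentation and the surrounding blank lines.
    script = textwrap.dedent(payload).strip()
    args = ["bash", "-c", script]
    if use_sudo:
        args = ["sudo"] + args
    return args


gw = "eth0"  # assumed gateway interface name, for illustration only
args = build_payload_args(f"""
    set -e
    iptables -A FORWARD -o {gw} -i ceph-brx -j ACCEPT
    iptables -A FORWARD -i {gw} -o ceph-brx -j ACCEPT
""")
print(args[-1])
```

With `set -e` at the top, the script stops at the first failing command, so one call covers the whole group of rules.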

@lxbsz lxbsz changed the title qa/tasks/cephfs/mount.py: delete veth pair instead of single peer int… qa/tasks/cephfs/mount.py: defer deleting the netnses and bridge Jul 9, 2020
Member Author

lxbsz commented Jul 10, 2020

jenkins retest this please

Member

@batrick batrick left a comment

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/mount.py:279:0: F541 f-string is missing placeholders
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/qa/.tox/flake8/bin/flake8 --select=F,E9 --exclude=venv,.tox (exited with code 1)

make check failure ^
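
For reference, F541 fires when a string carries the `f` prefix but contains no `{...}` placeholder. A small sketch of the two usual fixes, using commands that appear in this PR; the interface name `eth0` is an assumed value for illustration:

```python
gw = "eth0"  # assumed interface name, for illustration only

# Flagged by flake8 as F541: the f prefix does nothing without a placeholder.
#   payload = f"""
#   sudo ip link set ceph-brx down
#   """

# Fix 1: drop the prefix when nothing is interpolated.
static_payload = """
sudo ip link set ceph-brx down
sudo ip link delete ceph-brx
"""

# Fix 2: keep the prefix only where a value is actually substituted.
dynamic_payload = f"""
sudo iptables -A FORWARD -o {gw} -i ceph-brx -j ACCEPT
"""
```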

Member

@batrick batrick left a comment

Otherwise LGTM.


# This will cleanup the stale netnses, which are from the
# last failed test cases.
def cleanup_stale_netnses_and_bridge(remote):
Member

Let's make this a static method of CephFSMount.

Member Author

Sure, done.

@lxbsz lxbsz force-pushed the client_delay branch 2 times, most recently from 5f075a3 to 0af0bf7 Compare July 13, 2020 01:29
Member Author

lxbsz commented Jul 13, 2020

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/mount.py:279:0: F541 f-string is missing placeholders
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/qa/.tox/flake8/bin/flake8 --select=F,E9 --exclude=venv,.tox (exited with code 1)

make check failure ^

Fixed it. Thx.

@lxbsz lxbsz requested a review from batrick July 13, 2020 01:31
Member Author

lxbsz commented Jul 13, 2020

jenkins test dashboard backend

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Member Author

lxbsz commented Jul 16, 2020

Without setting omit_sudo to False, these commands will have no effect.

Done. thanks.

Member

batrick commented Jul 16, 2020

jenkins retest this please

@lxbsz lxbsz force-pushed the client_delay branch 2 times, most recently from f1e2b98 to 5c3db71 Compare July 17, 2020 01:18
Member Author

lxbsz commented Jul 17, 2020

jenkins test dashboard backend

Member

batrick commented Jul 28, 2020

Member

@batrick batrick left a comment

Found this error in QA:

2020-07-28T04:11:10.187 INFO:teuthology.orchestra.run.smithi094:> (cd /home/ubuntu/cephtest/mnt.0 && exec sudo bash -c '
2020-07-28T04:11:10.188 INFO:teuthology.orchestra.run.smithi094:>                 set -e
2020-07-28T04:11:10.188 INFO:teuthology.orchestra.run.smithi094:>                 sudo ip link add name ceph-brx type bridge
2020-07-28T04:11:10.189 INFO:teuthology.orchestra.run.smithi094:>                 sudo ip addr flush dev ceph-brx
2020-07-28T04:11:10.189 INFO:teuthology.orchestra.run.smithi094:>                 sudo ip link set ceph-brx up
2020-07-28T04:11:10.190 INFO:teuthology.orchestra.run.smithi094:>                 sudo ip addr add 192.168.255.254/16 brd 192.168.255.255 dev ceph-brx
2020-07-28T04:11:10.191 INFO:teuthology.orchestra.run.smithi094:>             ')
2020-07-28T04:11:10.232 INFO:teuthology.orchestra.run.smithi094.stderr:bash: line 0: cd: /home/ubuntu/cephtest/mnt.0: No such file or directory

From: /ceph/teuthology-archive/pdonnell-2020-07-28_03:46:25-fs-wip-pdonnell-testing-20200728.022107-distro-basic-smithi/5262950/teuthology.log

Xiubo, please run through teuthology when you've updated your PR so we can get this ready to merge ASAP.

sudo ip addr flush dev ceph-brx
sudo ip link set ceph-brx up
sudo ip addr add {ip}/{mask} brd {brd} dev ceph-brx
""", timeout=(5*60), omit_sudo=False)
Member

Suggested change
""", timeout=(5*60), omit_sudo=False)
""", timeout=(5*60), omit_sudo=False, cwd='/')
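
The QA log above fails at the `cd` step because the mountpoint directory does not exist yet when the bridge is configured. The same failure mode can be reproduced locally with `subprocess`, as a rough sketch: this assumes a POSIX system, the path `/no/such/dir/mnt.0` is made up, and `run_in` and `try_run` are hypothetical helpers unrelated to teuthology's own runner.

```python
import subprocess
import sys


def run_in(cwd):
    """Start a trivial child process with the given working directory."""
    result = subprocess.run(
        [sys.executable, "-c", "print('ok')"],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def try_run(cwd):
    """Return the child's output, or the error name if it cannot start."""
    try:
        return run_in(cwd)
    except OSError as exc:
        return type(exc).__name__


print(try_run("/no/such/dir/mnt.0"))  # the child never starts
print(try_run("/"))                   # a directory that always exists
```

Commands that only touch network state do not depend on the mountpoint, so running them from `/` removes the dependency on the directory being present.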

sudo iptables -A FORWARD -o {gw} -i ceph-brx -j ACCEPT
sudo iptables -A FORWARD -i {gw} -o ceph-brx -j ACCEPT
sudo iptables -t nat -A POSTROUTING -s {ip}/{mask} -o {gw} -j MASQUERADE
""", timeout=(5*60), omit_sudo=False)
Member

Suggested change
""", timeout=(5*60), omit_sudo=False)
""", timeout=(5*60), omit_sudo=False, cwd='/')

set -e
sudo ip netns add {self.netns_name}
sudo ip netns set {self.netns_name} {nsid}
""", timeout=(5*60), omit_sudo=False)
Member

Suggested change
""", timeout=(5*60), omit_sudo=False)
""", timeout=(5*60), omit_sudo=False, cwd='/')

sudo ip netns exec {self.netns_name} ip link set veth0 up
sudo ip netns exec {self.netns_name} ip link set lo up
sudo ip netns exec {self.netns_name} ip route add default via {brxip}
""", timeout=(5*60), omit_sudo=False)
Member

ditto

set -e
sudo ip link set brx.{nsid} up
sudo ip link set dev brx.{nsid} master ceph-brx
""", timeout=(5*60), omit_sudo=False)
Member

ditto

sudo ip link set brx.{self.nsid} down
sudo ip link delete dev brx.{self.nsid}
sudo ip netns delete {self.netns_name}
""", timeout=(5*60), omit_sudo=False)
Member

ditto

set -e
sudo ip link set ceph-brx down
sudo ip link delete ceph-brx
""", timeout=(5*60), omit_sudo=False)
Member

ditto

sudo iptables -D FORWARD -o {gw} -i ceph-brx -j ACCEPT
sudo iptables -D FORWARD -i {gw} -o ceph-brx -j ACCEPT
sudo iptables -t nat -D POSTROUTING -s {ip}/{mask} -o {gw} -j MASQUERADE
""", timeout=(5*60), omit_sudo=False)
Member

ditto

Member

batrick commented Jul 28, 2020

Xiubo, please run through teuthology when you've updated your PR so we can get this ready to merge ASAP.

Here's what I used to test:

teuthology-suite -m smithi -p 29 --suite fs --ceph wip-pdonnell-testing-20200728.022107 --filter client-recovery

lxbsz added 3 commits July 29, 2020 08:40
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Once the test cases have run and the ceph-brx bridge has been set up,
its config is saved in "/etc/sysconfig/network-scripts/ifcfg-ceph-brx"
or somewhere else, and it is kept after the ceph-brx bridge is removed.
The next time the ceph-brx bridge is created, the saved config is read
back, so configuring the bridge again fails with an error like:

    "RTNETLINK answers: File exists"

So we need to flush the bridge before configuring it.

Fixes: https://tracker.ceph.com/issues/45817
Signed-off-by: Xiubo Li <xiubli@redhat.com>
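
The ordering described in this commit message (create the bridge, flush any restored address, then configure it) can be sketched as a payload builder. The function name `bridge_setup_payload` is hypothetical; the command lines mirror the ones shown in the QA log earlier in this conversation, and the broadcast address is derived with the standard `ipaddress` module.

```python
import ipaddress


def bridge_setup_payload(cidr, bridge="ceph-brx"):
    """Hypothetical helper: build the bridge setup script for one address."""
    iface = ipaddress.ip_interface(cidr)
    brd = iface.network.broadcast_address
    return "\n".join([
        "set -e",
        f"sudo ip link add name {bridge} type bridge",
        # Flush first, so an address restored from a saved config does not
        # make the later "ip addr add" fail with "File exists".
        f"sudo ip addr flush dev {bridge}",
        f"sudo ip link set {bridge} up",
        f"sudo ip addr add {iface.with_prefixlen} brd {brd} dev {bridge}",
    ])


print(bridge_setup_payload("192.168.255.254/16"))
```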
If previous test cases failed, their netnses and bridge will be
left behind. Remove them when new test cases begin.

Fixes: https://tracker.ceph.com/issues/45806
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Member Author

lxbsz commented Jul 29, 2020

Xiubo, please run through teuthology when you've updated your PR so we can get this ready to merge ASAP.

Here's what I used to test:

teuthology-suite -m smithi -p 29 --suite fs --ceph wip-pdonnell-testing-20200728.022107 --filter client-recovery

Sure, fixed them all and I am now running the test, the branch is wip-lxbsz-testing-20200729-0855.

Thanks.

Member Author

lxbsz commented Jul 29, 2020

jenkins test dashboard backend

Member Author

lxbsz commented Jul 30, 2020

Updated it with a small fix. Some test cases call mount_a.kill() first, which suspends the netns by bringing the network interface down, and only later call kill_cleanup() to do the unmount. If the netns is reused in the next test case, it stays suspended, so we need to resume it.
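
A rough sketch of the suspend/resume bookkeeping described here. The class `NetnsLinkSketch` and its methods are hypothetical and only record the commands they would run; the assumption that suspending maps to bringing `brx.{nsid}` down and resuming to bringing it up is taken from the `ip link set` lines in the diff, not from the final code.

```python
class NetnsLinkSketch:
    """Hypothetical model of one netns whose host-side link can be suspended."""

    def __init__(self, nsid):
        self.nsid = nsid
        self.suspended = False
        self.commands = []  # commands that would be run on the remote

    def suspend(self):
        """What kill() is described as doing: bring the interface down."""
        self.commands.append(f"sudo ip link set brx.{self.nsid} down")
        self.suspended = True

    def resume(self):
        """Bring the interface back up, but only if it was suspended."""
        if self.suspended:
            self.commands.append(f"sudo ip link set brx.{self.nsid} up")
            self.suspended = False

    def reuse(self):
        """Called when a later test case mounts again on the same netns."""
        self.resume()


link = NetnsLinkSketch(nsid=0)
link.suspend()   # first test case kills the mount
link.reuse()     # next test case reuses the netns
print(link.commands)
```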

The netnses may be created and deleted many times over the whole test
run, so we can defer cleaning them up until the last mountpoint is
unmounted or the test is exiting.

Fixes: https://tracker.ceph.com/issues/46282
Signed-off-by: Xiubo Li <xiubli@redhat.com>
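
The deferral described in this commit message can be pictured as a small bookkeeping object: netnses are kept while any mountpoint is still active and are torn down in one pass afterwards. This is a sketch under assumptions, not the code from the PR; the class name `DeferredNetnsCleanup`, its methods, and the netns names are all hypothetical, and deletion is only recorded rather than executed.

```python
class DeferredNetnsCleanup:
    """Hypothetical tracker that postpones netns deletion until the last unmount."""

    def __init__(self):
        self.active = set()    # netnses with a live mountpoint
        self.created = set()   # netnses created and not yet deleted
        self.deleted = []      # record of what would have been removed

    def mount(self, name):
        self.created.add(name)
        self.active.add(name)

    def umount(self, name):
        self.active.discard(name)
        # Nothing is deleted while another mountpoint is still in use.
        if not self.active:
            self.cleanup()

    def cleanup(self):
        """Also intended to be called once when the test is exiting."""
        for name in sorted(self.created):
            self.deleted.append(name)
        self.created.clear()


tracker = DeferredNetnsCleanup()
tracker.mount("ns-a")
tracker.mount("ns-b")
tracker.umount("ns-a")   # ns-b is still mounted, so nothing is deleted yet
tracker.umount("ns-b")   # last mountpoint gone, both are cleaned up
print(tracker.deleted)
```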
Member Author

lxbsz commented Jul 30, 2020

Xiubo, please run through teuthology when you've updated your PR so we can get this ready to merge ASAP.

Here's what I used to test:

teuthology-suite -m smithi -p 29 --suite fs --ceph wip-pdonnell-testing-20200728.022107 --filter client-recovery

Test done and passed, please see: https://pulpito.ceph.com/xiubli-2020-07-30_04:53:06-fs-wip-lxbsz-testing-20200730-0901-distro-basic-smithi/

Member

batrick commented Jul 31, 2020

@batrick batrick merged commit ca7be68 into ceph:master Jul 31, 2020