Skip to content

qa: extend rank 1 lockup for test_quiesce_authpin_wait#56923

Merged
batrick merged 1 commit intoceph:mainfrom
batrick:i65508
Apr 30, 2024
Merged

qa: extend rank 1 lockup for test_quiesce_authpin_wait#56923
batrick merged 1 commit intoceph:mainfrom
batrick:i65508

Conversation

@batrick
Copy link
Member

@batrick batrick commented Apr 16, 2024

In teuthology, the lockup may not be long enough because clients are much faster there than in a vstart cluster where this test was designed.

Fixes: https://tracker.ceph.com/issues/65508

Checklist

  • Tracker (select at least one)
    • References tracker ticket
  • Component impact
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@batrick batrick added cephfs Ceph File System needs-review labels Apr 16, 2024
@batrick batrick requested a review from leonid-s-usov April 16, 2024 13:28
@github-actions github-actions bot added the tests label Apr 16, 2024
Copy link
Contributor

@leonid-s-usov leonid-s-usov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's see how it runs with the patch here

Copy link
Contributor

@leonid-s-usov leonid-s-usov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, this patch takes the approach too far, and it causes multiple failures. Let's be honest, locking up MDS was only good as long as it worked, but now we have a good reason to rethink the strategy behind this test

@batrick batrick force-pushed the i65508 branch 2 times, most recently from 85cc353 to 0a13d39 Compare April 16, 2024 21:00
In teuthology, the lockup may not be long enough because clients are much
faster there than in a vstart cluster where this test was designed.

Fixes: https://tracker.ceph.com/issues/65508
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Copy link
Member Author

batrick commented Apr 16, 2024

@batrick
Copy link
Member Author

batrick commented Apr 16, 2024

@leonid-s-usov have another look please.

@batrick
Copy link
Member Author

batrick commented Apr 17, 2024

This PR is under test in https://tracker.ceph.com/issues/65530.

Copy link
Contributor

@leonid-s-usov leonid-s-usov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not happy with the lockup approach. I'll wait for the test results, but I'd appreciate a different approach to running this test.

We already went as far as adding a dedicated lockup admin command. I'd rather change that to something more precise that would not involve holding the mds lock for a long time. Maybe a command to pause iner-rank messaging, or specific (arbitrary) peer request ops or their acks

@batrick
Copy link
Member Author

batrick commented Apr 17, 2024

I'm still not happy with the lockup approach. I'll wait for the test results, but I'd appreciate a different approach to running this test.

We already went as far as adding a dedicated lockup admin command. I'd rather change that to something more precise that would not involve holding the mds lock for a long time. Maybe a command to pause iner-rank messaging, or specific (arbitrary) peer request ops or their acks

Sitting on the mds_lock can be useful for a lot of tests I would think. This was a trivial matter of silencing the right warnings (and there were not a lot).

@batrick
Copy link
Member Author

batrick commented Apr 18, 2024

This PR is under test in https://tracker.ceph.com/issues/65562.

@batrick
Copy link
Member Author

batrick commented Apr 20, 2024

This PR is under test in https://tracker.ceph.com/issues/65596.

@batrick
Copy link
Member Author

batrick commented Apr 25, 2024

This PR is under test in https://tracker.ceph.com/issues/65661.

Copy link
Contributor

@leonid-s-usov leonid-s-usov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this as long as it's stable 👍🏻

batrick added a commit to batrick/ceph that referenced this pull request Apr 29, 2024
* refs/pull/56923/head:
	qa: extend rank 1 lockup for test_quiesce_authpin_wait
@batrick
Copy link
Member Author

batrick commented Apr 29, 2024

jenkins test make check arm64

@batrick
Copy link
Member Author

batrick commented Apr 29, 2024

This PR is under test in https://tracker.ceph.com/issues/65694.

@batrick
Copy link
Member Author

batrick commented Apr 30, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants