Skip to content

cls_lock: check expired lock before unlock#45762

Merged
yuriw merged 1 commit intoceph:masterfrom
NitzanMordhai:wip-nitzan-test-cls-lock-expiered-locks
May 11, 2022
Merged

cls_lock: check expired lock before unlock#45762
yuriw merged 1 commit intoceph:masterfrom
NitzanMordhai:wip-nitzan-test-cls-lock-expiered-locks

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Apr 4, 2022

Check if the lock was expired, if it is, unlock will return -ENOENT and not 0
that will cause the assert to error.

Fixes: https://tracker.ceph.com/issues/38357
Signed-off-by: Nitzan Mordechai nmordec@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@github-actions github-actions bot added the tests label Apr 4, 2022
@neha-ojha neha-ojha self-requested a review April 12, 2022 22:57
@neha-ojha neha-ojha assigned badone and unassigned badone Apr 12, 2022
@neha-ojha neha-ojha requested a review from badone April 12, 2022 22:57
@badone
Copy link
Contributor

badone commented Apr 13, 2022

This looks like a good solution Nitzan, thanks! Just a nit regarding the function name typo.

Check if the lock was expired, if it is, unlock will return -ENOENT and not 0
that will cause the assert to error.

Fixes: https://tracker.ceph.com/issues/38357
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-cls-lock-expiered-locks branch from 5c33e40 to cf9054d Compare April 13, 2022 04:21
@NitzanMordhai
Copy link
Contributor Author

https://shaman.ceph.com/builds/ceph/wip-yuri5-testing-2022-04-28-1007/847220033f596a97c5c107497f89b7e0027da12a/
http://pulpito.front.sepia.ceph.com/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi
http://pulpito.front.sepia.ceph.com/yuriw-2022-05-04_21:19:19-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi

A few rados tests failed

Failures, unrelated:
opened new trackers:
6813955 - Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown - RADOS - Ceph
6813921 - Bug #55607: tests CmpOmap.cmp_vals_u64_empty_default\CmpOmap.cmp_rm_keys_u64_empty failed - RADOS - Ceph

updated trackers:
6813824 - Bug #49591: no active mgr (MGR_DOWN)" in cluster log - RADOS - Ceph
6813940 - Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_rados.TestWatchNotify.test - RADOS - Ceph
6814082- Bug #48440: log [ERR] : scrub mismatch - RADOS - Ceph
6813848 - Bug #55531: octopus: No remote osd logs captured in dead jobs - Ceph - Ceph

Details:

  1. log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.y: unknown operation
  2. tests CmpOmap.cmp_vals_u64_empty_default\CmpOmap.cmp_rm_keys_u64_empty failed
  3. no active mgr (MGR_DOWN) - It seems that the monitor had trouble communicating with the manager: 2022-04-29T17:13:31.437489+0000 mon.a (mon.0) 87 : cluster [ERR] Health check failed: no active mgr (MGR_DOWN)
  4. test_rados.TestWatchNotify.test - known and unfixed issue with API test
  5. scrub mismatch after slow op.
  6. this looks like a Teuthology bug

@yuriw yuriw merged commit 3bc0354 into ceph:master May 11, 2022
k0ste pushed a commit to k0ste/ceph that referenced this pull request Aug 19, 2024
If the lock expired, the stat check shouldn't return -ENOENT,
We will change the lock duration to prevent lock expired before the
stat check.

Fixes: https://tracker.ceph.com/issues/56575
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
(cherry picked from commit d3457c6)

Comment from @idryomov:

There is a delta between the original commit and the backport and
explained it - due to ceph#45762 [1] not getting backported
anywhere (Rejected for octopus and pacific and still in
New for quincy).

[1] ceph#45762
k0ste pushed a commit to k0ste/ceph that referenced this pull request Aug 19, 2024
If the lock expired, the stat check shouldn't return -ENOENT,
We will change the lock duration to prevent lock expired before the
stat check.

Fixes: https://tracker.ceph.com/issues/56575
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
(cherry picked from commit d3457c6)

Comment from @idryomov:

There is a delta between the original commit and the backport due
to ceph#45762 [1] not getting backported to quincy.

[1] ceph#45762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants