test: increase retry duration when calculating manifest ref. count #43493

Merged: yuriw merged 1 commit into ceph:master from myoungwon:wip-52872 on Dec 2, 2021

myoungwon (Member) commented Oct 12, 2021

When the object is degraded and its recovery is delayed,
the retry time can expire before the object is recovered.
According to the log, recovery takes almost 6 minutes.

fixes: https://tracker.ceph.com/issues/52872,
https://tracker.ceph.com/issues/53219

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


neha-ojha (Member) left a comment

@myoungwon thanks for looking into this. Could you please add the "Fixes: https://tracker.ceph.com/issues/52872" line to the commit message?

In situation where the object is degraded and delayed,
retry time can expire before the object is recovered
---it takes almost 6 minutes to be recovered according to
the log.

Fixes: https://tracker.ceph.com/issues/52872

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>

myoungwon (Member, Author) commented

@neha-ojha Done.

myoungwon (Member, Author) commented Nov 12, 2021

@athanatos @neha-ojha Please take a look.

}
break;
}
ASSERT_TRUE(tries < 30);

Review comment on the line above (Contributor):

Can this actually fail? Why does tries need to be outside of the loop?

Reply (Member, Author):

I added this line for debugging. The two issues above produced error logs such as ASSERT_TRUE(src_refcount == expected_refcount);, so it is hard to find the cause because I have to read all the logs generated by the cluster. This line might help us recognize errors caused by the retry limit being reached.
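
For illustration only, here is a minimal sketch of the pattern described above: a bounded polling loop that keeps its attempt counter outside the loop so the caller can tell "ran out of retries" apart from "observed the wrong value". This is not the code from this PR. The helper poll_until_equal, its parameters, and the last_seen out-parameter are hypothetical names chosen for the example.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of the Ceph test suite.
// Calls read_value() until it returns `expected` or until max_tries
// attempts have been made, sleeping `delay` after each failed attempt.
// Returns the number of failed attempts: a result equal to max_tries
// means the retry budget was exhausted without seeing `expected`.
int poll_until_equal(const std::function<int()>& read_value,
                     int expected,
                     int max_tries,
                     std::chrono::milliseconds delay,
                     int* last_seen) {
  int tries = 0;  // declared outside the loop so it can be inspected afterwards
  for (; tries < max_tries; ++tries) {
    *last_seen = read_value();
    if (*last_seen == expected) {
      break;  // success: tries stays below max_tries
    }
    std::this_thread::sleep_for(delay);
  }
  return tries;
}
```

In a gtest-based test, checking tries < max_tries before comparing the observed value against the expected one would make a timeout show up as its own failure instead of as a reference-count mismatch.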

neha-ojha (Member) left a comment

I am fine with going ahead with this version until we find a more reliable fix, as discussed in https://tracker.ceph.com/issues/53219


4 participants