mon/AuthMonitor: fix potential repeated global id#55006
When we expand or shrink the monitor cluster, global IDs may be allocated repeatedly: the calculation of last_allocated_id depends on mon_num and mon_rank, and expanding or shrinking the cluster changes both. A simple fix is to reset last_allocated_id when the election completes.

Fixes: https://tracker.ceph.com/issues/63891
Signed-off-by: shimin <shimin@kuaishou.com>
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

Is anyone handling this issue?

The CephFS client depends on global_id, and a repeated global_id can cause inconsistency problems.

This is a bug and needs to be fixed.
```cpp
{
  std::lock_guard l(auth_lock);
  authmon()->_set_mon_num_rank(monmap->size(), rank);
  authmon()->_reset_last_allocated_id();
}
```
In my understanding this should lead to:

1. Temporarily ceasing issuance of new IDs:

```cpp
uint64_t AuthMonitor::_assign_global_id()
{
  ceph_assert(ceph_mutex_is_locked(mon->auth_lock));
  if (mon_num < 1 || mon_rank < 0) {
    dout(10) << __func__ << " inactive (num_mon " << mon_num
             << " rank " << mon_rank << ")" << dendl;
    return 0;
  }
  if (!last_allocated_id) {
    dout(10) << __func__ << " last_allocated_id == 0" << dendl;
    return 0;
  }
  // ...
}
```

2. Letting the leader maybe bump `max_global_id` up:

```cpp
int64_t AuthMonitor::assign_global_id(bool should_increase_max)
{
  uint64_t id;
  {
    std::lock_guard l(mon->auth_lock);
    id = _assign_global_id();
    if (should_increase_max) {
      should_increase_max = _should_increase_max_global_id();
    }
  }
  if (mon->is_leader() &&
      should_increase_max) {
    increase_max_global_id();
  }
  return id;
}

void AuthMonitor::increase_max_global_id()
{
  ceph_assert(mon->is_leader());
  Incremental inc;
  inc.inc_type = GLOBAL_ID;
  inc.max_global_id = max_global_id + g_conf()->mon_globalid_prealloc;
  dout(10) << "increasing max_global_id to " << inc.max_global_id << dendl;
  pending_auth.push_back(inc);
}
```

3. Maybe letting all monitors synchronize on the new `max_global_id`:

```cpp
void AuthMonitor::update_from_paxos(bool *need_bootstrap)
{
  // ...
  switch (inc.inc_type) {
  case GLOBAL_ID:
    max_global_id = inc.max_global_id;
    break;
  // ...
}
```

4. Finally, starting to accept clients again:

```cpp
void AuthMonitor::update_from_paxos(bool *need_bootstrap)
{
  // ...
  max_global_id = inc.max_global_id;
  // ...
  {
    std::lock_guard l(mon->auth_lock);
    if (last_allocated_id == 0) {
      last_allocated_id = max_global_id;
      dout(10) << __func__ << " last_allocated_id initialized to "
               << max_global_id << dendl;
    }
  }
  // ...
}
```

Steps 2 and 3 are rather improbable for the `AuthMonitor::_reset_last_allocated_id()` case.
I agree the patch should help with the sharding of the ID space if mon_num or mon_rank change. (I will take a second look this week.)
This PR is under test in https://tracker.ceph.com/issues/68089.
batrick left a comment:
This failed QA, see for example:
and the leader mon log:
OK, I will check the log and reply as soon as possible.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
When we expand or shrink the monitor cluster, global IDs may be allocated repeatedly: the calculation of last_allocated_id depends on mon_num and mon_rank, and expanding or shrinking the cluster changes both. A simple fix is to reset last_allocated_id when the election completes, which makes every monitor's last_allocated_id start from max_global_id.
Fixes: https://tracker.ceph.com/issues/63891
Signed-off-by: shimin <shimin@kuaishou.com>