Bug #66059: OSD: PG stat is not synchronized between osds after deep-scrub - RADOS - Ceph

Bug #66059

open

OSD: PG stat is not synchronized between osds after deep-scrub

Added by Md Mahamudur Rahaman Sajib almost 2 years ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: Ronen Friedman
Category:
Target version: Ceph - v20.0.0
% Done:
Source: Development
Backport: squid, reef
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v17.2.7
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

To reproduce:
1. Create a large omap object (more than 1 GB).
2. Deep-scrub the PG that contains the object.
3. The large omap object count becomes 1 (check `ceph health detail`).
4. Shut down the primary OSD of that PG.
5. A secondary OSD takes over as primary. Check `ceph health detail` again: the warning is gone.
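
The steps above could be scripted roughly as follows. This is only a sketch: the pool name, object name, PG id, OSD id, key count, and value are placeholders, the total omap size still has to exceed the large-omap threshold of the cluster, and stopping the OSD with `systemctl` assumes a non-containerized deployment. The helper `repro_commands` is hypothetical; it only builds the command lines and does not run them.

```python
def repro_commands(pool, obj, pgid, primary_osd, n_keys=4, value="x"):
    """Build the CLI calls for the reproduction steps (nothing is executed)."""
    cmds = []
    # Step 1: add omap keys to one object. n_keys and value are placeholders;
    # a real run needs enough data to cross the large-omap threshold.
    for i in range(n_keys):
        cmds.append(["rados", "-p", pool, "setomapval", obj, f"key{i}", value])
    # Step 2: deep-scrub the PG that holds the object.
    cmds.append(["ceph", "pg", "deep-scrub", pgid])
    # Step 3: the large omap warning should now be reported.
    cmds.append(["ceph", "health", "detail"])
    # Step 4: stop the primary OSD (assumes a systemd-managed OSD).
    cmds.append(["systemctl", "stop", f"ceph-osd@{primary_osd}"])
    # Step 5: check again once a former secondary has become primary.
    cmds.append(["ceph", "health", "detail"])
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands("testpool", "bigobj", "1.0", 3):
        print(" ".join(cmd))
```

The PG id and its primary OSD for a given object can be looked up with `ceph osd map <pool> <object>` before running step 2.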

A better way to observe the problem is to add logging in OSD::collect_pg_stat. After the large omap object is created, the PG stat on the primary OSD shows a large omap object count of 0. After deep-scrub it shows 1. If the primary OSD is then shut down, the log of the new primary OSD shows 0 in the PG stat for that PG.

Root cause: when an object is modified, the primary OSD submits a transaction through `void ReplicatedBackend::submit_transaction`, which eventually calls `Message * ReplicatedBackend::generate_subop`. There it ships the transaction together with the PG stat and log entries. Deep-scrub, however, changes the PG stat on the primary OSD but does not publish that PG stat to the other OSDs.
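
The hypothesis in the paragraph above can be illustrated with a toy model. This is not Ceph code: `ToyOSD`, `ToyPG`, and the stat dictionary are invented for illustration, and the model simply assumes that writes copy the primary's stats to the replicas while deep-scrub touches only the primary.

```python
class ToyOSD:
    """One OSD's local copy of the stats for a single PG (toy model)."""

    def __init__(self, osd_id):
        self.osd_id = osd_id
        self.pg_stat = {"num_objects": 0, "num_large_omap_objects": 0}


class ToyPG:
    def __init__(self, osd_ids):
        # acting[0] plays the role of the primary.
        self.acting = [ToyOSD(i) for i in osd_ids]

    @property
    def primary(self):
        return self.acting[0]

    def write(self):
        # Write path: the primary updates its stats and ships a copy
        # to every replica along with the (omitted) transaction.
        self.primary.pg_stat["num_objects"] += 1
        for replica in self.acting[1:]:
            replica.pg_stat = dict(self.primary.pg_stat)

    def deep_scrub(self, large_omap_found):
        # Scrub path as hypothesized: only the primary's copy changes.
        self.primary.pg_stat["num_large_omap_objects"] = large_omap_found

    def fail_primary(self):
        # The next OSD in the acting set becomes primary.
        self.acting.pop(0)


pg = ToyPG([0, 1, 2])
pg.write()
pg.deep_scrub(large_omap_found=1)
before = pg.primary.pg_stat["num_large_omap_objects"]
pg.fail_primary()
after = pg.primary.pg_stat["num_large_omap_objects"]
print(before, after)  # 1 0
```

In this model the new primary reports 0 because its copy of the stats was last refreshed by the write, before the scrub ran.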

A possible solution is a mechanism that publishes the PG stat to the non-primary OSDs after deep-scrub.
I can reproduce this on quincy v17.2.7, but other versions will have the problem as well.

Related issues 3 (0 open — 3 closed)

Updated by Ilya Dryomov almost 2 years ago

Project changed from rbd to RADOS

Updated by Radoslaw Zarzynski almost 2 years ago

Assignee set to Ronen Friedman

Updated by Md Mahamudur Rahaman Sajib almost 2 years ago

Assignee changed from Ronen Friedman to Md Mahamudur Rahaman Sajib
Pull request ID set to 57582

Updated by Md Mahamudur Rahaman Sajib almost 2 years ago

Status changed from New to In Progress

Updated by Radoslaw Zarzynski almost 2 years ago

Note from scrub: letting Ronen know.

Updated by Radoslaw Zarzynski almost 2 years ago

Scrub note: bump up!

Updated by Laura Flores almost 2 years ago

Status changed from In Progress to Fix Under Review

Updated by Igor Fedotov over 1 year ago

Backport set to squid, reef, quincy

Updated by Md Mahamudur Rahaman Sajib over 1 year ago

Status changed from Fix Under Review to Pending Backport

#10

Updated by Md Mahamudur Rahaman Sajib over 1 year ago

Copied to Backport #68439: quincy: OSD: PG stat is not synchronized between osds after deep-scrub added

#11

Updated by Md Mahamudur Rahaman Sajib over 1 year ago

Copied to Backport #68440: reef: OSD: PG stat is not synchronized between osds after deep-scrub added

#12

Updated by Md Mahamudur Rahaman Sajib over 1 year ago

Copied to Backport #68441: squid: OSD: PG stat is not synchronized between osds after deep-scrub added

#13

Updated by Md Mahamudur Rahaman Sajib over 1 year ago

Tags (freeform) set to backport_processed

#14

Updated by Ronen Friedman over 1 year ago

Status changed from Pending Backport to In Progress
Pull request ID deleted (~~57582~~)

Reverted the status to Open, as PR #57582 creates test failures and will be reverted.

#15

Updated by Ronen Friedman over 1 year ago · Edited

As far as I understand, the root-cause analysis in the description isn't correct.
Scrub-generated fixes to the 'info' are indeed performed on the Primary, but they are
published immediately (at the end of scrub_finish(), which, for our omap case, is a few
lines of code below the info update).

The update function is share_pg_info().

The part I am verifying now: it seems that PeeringState::proc_primary_info()
(1) only updates some scrub info data (and not the 'large omap' counter), and
(2) is only triggered when there are 'scrub errors' (num_scrub_errors > 0); num_scrub_errors is 0 if the only problem is large omap objects.
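
A toy model of the behavior suspected in (1) and (2), for illustration only. This is not Ceph code: `proc_primary_info_model` and the dictionary fields are hypothetical stand-ins that reuse the Ceph names as labels, and the logic encodes the unverified reading above rather than the actual implementation.

```python
def proc_primary_info_model(replica_stats, primary_stats):
    """Toy stand-in for the suspected replica-side handling of shared info."""
    updated = dict(replica_stats)
    # (2) Suspected: nothing is applied unless the primary reports scrub errors.
    if primary_stats["num_scrub_errors"] > 0:
        # (1) Suspected: only scrub-related fields are copied; the
        # large omap counter is left untouched.
        updated["num_scrub_errors"] = primary_stats["num_scrub_errors"]
    return updated


replica = {"num_scrub_errors": 0, "num_large_omap_objects": 0}

# Deep-scrub found a large omap object but no scrub errors.
primary_after_scrub = {"num_scrub_errors": 0, "num_large_omap_objects": 1}
result = proc_primary_info_model(replica, primary_after_scrub)

# Even with scrub errors present, the large omap counter is not copied.
primary_with_errors = {"num_scrub_errors": 2, "num_large_omap_objects": 1}
result_with_errors = proc_primary_info_model(replica, primary_with_errors)

print(result["num_large_omap_objects"], result_with_errors["num_large_omap_objects"])
```

Under either condition the replica's large omap counter stays at 0 in this model, which would match the symptom seen after the primary is shut down.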

#16