mon,osd: new mechanism for managing full and nearfull OSDs for luminous #13615
liewegas merged 16 commits into ceph:master
liewegas commented Feb 23, 2017
- per-osd nearfull and full flags
- full_ratio and nearfull_ratio stored in the osdmap
- new message from osd to mon requesting a state change (by the osd)
- mon switches cluster full behavior over from old pg-map scheme to osdmap scheme when require_luminous_osds is set
- new mon commands to adjust the osdmap thresholds
- some cleanup of the osd-side code
|
retest this please |
|
This still has a bunch of failures and they don't all look like noise. |
614cb33 to be14a35
|
@gregsfortytwo failures fixed |
523e109 to fcbb6f6
|
passed testing, awaiting final review |
Signed-off-by: Sage Weil <sage@redhat.com>
This used to live in PGMap; we're moving it here for luminous (which makes more sense anyway!). Signed-off-by: Sage Weil <sage@redhat.com>
...and make most of these methods private to clarify the public interface Signed-off-by: Sage Weil <sage@redhat.com>
    if ((osdmap.get_state(from) & mask) == m->state) {
      dout(7) << __func__ << " state already " << state << " for osd." << from
              << " " << m->get_orig_source_inst() << dendl;
      _reply_map(op, m->version);
src/osd/OSD.cc
Outdated
            << " -> " << get_full_state_name(new_state) << dendl;
    if (new_state == FAILSAFE) {
      clog->error() << "failsafe engaged, dropping updates, now "
                    << (int)(ratio * 100) << "% full";
My merged change uses (int)roundf(ratio * 100) which is being dropped in 2 places by your change
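To illustrate the difference this comment points at, here is a small sketch comparing the truncating cast with the `roundf` form. The helper names `truncated_percent` and `rounded_percent` are hypothetical and exist only for this illustration; they are not functions in the Ceph tree.

```cpp
#include <math.h>  // C header, so the global roundf() is declared

// Hypothetical helpers, not part of Ceph: they isolate the two
// conversions discussed in the review comment.

// Truncates toward zero, so a ratio of 0.956 is reported as 95.
int truncated_percent(float ratio) {
  return (int)(ratio * 100);
}

// Rounds to the nearest integer, so a ratio of 0.956 is reported as 96.
int rounded_percent(float ratio) {
  return (int)roundf(ratio * 100);
}
```

With the truncating form, the logged percentage can read one point lower than the value an operator would expect from the configured ratio, which is presumably why the rounding variant was merged earlier.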
    modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
    flags
    full_ratio 0
    nearfull_ratio 0
that's just the default osdmap value; the mon sets it to something better during mkfs, or from pgmap during upgrade
    if (!service.need_fullness_update())
      return;
    unsigned state = 0;
    if (service.is_full()) {
Should this also add "|| check_failsafe_full()"? Doesn't is_full() only check cur_state == FULL?
Better yet, make this code and need_fullness_update() consistent: change is_full() to return true for FULL or FAILSAFE, and fix need_fullness_update() to use is_full() and is_nearfull() to compute the want value.
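A minimal sketch of the suggestion above, assuming a simplified stand-in for the OSD-side state. The struct `FullnessTracker`, the method `want_flags()`, and the `FLAG_*` bits are hypothetical names for illustration only; the real OSDService locking and OSDMap state bits are not reproduced here.

```cpp
// Hypothetical, simplified model of the OSD-side fullness state.
enum FullState { NONE = 0, NEARFULL = 1, FULL = 2, FAILSAFE = 3 };

// Illustrative flag bits reported to the monitor (assumed values).
constexpr unsigned FLAG_NEARFULL = 1u << 0;
constexpr unsigned FLAG_FULL = 1u << 1;

struct FullnessTracker {
  FullState cur_state = NONE;

  // FAILSAFE is treated as a more severe kind of full.
  bool is_full() const { return cur_state >= FULL; }
  bool is_nearfull() const { return cur_state == NEARFULL; }

  // The state this OSD wants the monitor to record for it.
  unsigned want_flags() const {
    if (is_full())
      return FLAG_FULL;
    if (is_nearfull())
      return FLAG_NEARFULL;
    return 0;
  }

  // An update is needed only when the recorded flags differ from the
  // wanted ones; both paths go through is_full()/is_nearfull().
  bool need_fullness_update(unsigned recorded_flags) const {
    return want_flags() != recorded_flags;
  }
};
```

Because both the "do I need an update" check and the "what do I report" logic derive from the same two predicates, they cannot drift apart when a new state such as FAILSAFE is added.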
src/osd/OSD.cc
Outdated
    bool OSDService::is_nearfull()
    {
      Mutex::Locker l(full_status_lock);
      return cur_state >= NEARFULL;
I would have left this "== NEARFULL" because it could be argued that nearfull represents only the range from the nearfull ratio up to, but not including, the full ratio.
First, eliminate the useless nearfull failsafe--all it did was generate a log message, which we can do based on the OSDMap states. Add some new helpers. Unify the cluster nearfull/full vs failsafe states so that failsafe is a "really" full state that is more severe than full, so we have NONE, NEARFULL, FULL, FAILSAFE. Pull the full/nearfull ratios out of the OSDMap (remember that we require luminous mons, so these will be initialized). Signed-off-by: Sage Weil <sage@redhat.com>
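The unified NONE, NEARFULL, FULL, FAILSAFE ordering described in this commit message can be sketched as a single classification step. This is only an illustration under stated assumptions: the `Thresholds` struct, the `classify` function, and the use of a strict `>` comparison are hypothetical, not taken from the patch.

```cpp
// Hypothetical sketch, not the Ceph implementation.
enum class Fullness { NONE, NEARFULL, FULL, FAILSAFE };

// nearfull_ratio and full_ratio would come from the OSDMap; the failsafe
// ratio is assumed here to be a separate, higher local threshold.
struct Thresholds {
  double nearfull_ratio;
  double full_ratio;
  double failsafe_ratio;
};

// Check the most severe threshold first so each state is strictly
// more severe than the one below it.
Fullness classify(double used_ratio, const Thresholds& t) {
  if (used_ratio > t.failsafe_ratio)
    return Fullness::FAILSAFE;
  if (used_ratio > t.full_ratio)
    return Fullness::FULL;
  if (used_ratio > t.nearfull_ratio)
    return Fullness::NEARFULL;
  return Fullness::NONE;
}
```

For example, with thresholds of 0.85, 0.95, and 0.97 (example values for this sketch), a utilization of 0.96 classifies as FULL and 0.99 as FAILSAFE.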
This ensures that a down OSD that is marked full does not come back up, realize it is not actually full, and then clear its full flag. That would cause an unnecessary cluster-full blip. This can easily happen if the full_ratio in the osdmap is increased while the OSD is down. Signed-off-by: Sage Weil <sage@redhat.com>
For luminous, set cluster flags based on osd flags. Until require_luminous is set, stick with the old pgmap-based behavior. Move the new check to encode_pending so that the cluster flag is set in the same epoch that the osd state(s) change. Signed-off-by: Sage Weil <sage@redhat.com>
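As a rough sketch of deriving the cluster flags from per-OSD flags, the function below scans the per-OSD state words and picks a cluster-level result. All names and bit values (`OSD_*`, `CLUSTER_*`, `cluster_flags_from_osds`) are hypothetical, and the choice to report only the most severe flag is an assumption of this sketch rather than something stated in the commit message.

```cpp
#include <vector>

// Illustrative bit values only; not the real OSDMap constants.
constexpr unsigned OSD_NEARFULL = 1u << 0;
constexpr unsigned OSD_FULL = 1u << 1;
constexpr unsigned CLUSTER_NEARFULL = 1u << 0;
constexpr unsigned CLUSTER_FULL = 1u << 1;

// Derive the cluster-wide flag from the per-OSD states in one pass, so
// it can be computed in the same epoch in which the OSD states change.
unsigned cluster_flags_from_osds(const std::vector<unsigned>& osd_states) {
  bool any_full = false;
  bool any_nearfull = false;
  for (unsigned s : osd_states) {
    if (s & OSD_FULL)
      any_full = true;
    else if (s & OSD_NEARFULL)
      any_nearfull = true;
  }
  if (any_full)
    return CLUSTER_FULL;  // assumption: only the most severe flag is set
  if (any_nearfull)
    return CLUSTER_NEARFULL;
  return 0;
}
```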
Note that this tells us how many OSDs are full or nearfull; it does not include detailed warnings telling you exactly what the utilization is because we don't have the full osd_stat_t available. We leave it to ceph-mgr to generate those health messages. Signed-off-by: Sage Weil <sage@redhat.com>
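A count-only health summary of the kind this note describes might look like the following sketch. The flag bits, the `FullCounts` struct, and the message wording are all hypothetical; the point is only that counts can be produced from per-OSD flags alone, without any utilization figures.

```cpp
#include <string>
#include <vector>

// Illustrative per-OSD flag bits; not the real OSDMap constants.
constexpr unsigned OSD_FLAG_NEARFULL = 1u << 0;
constexpr unsigned OSD_FLAG_FULL = 1u << 1;

struct FullCounts {
  int full = 0;
  int nearfull = 0;
};

// Count OSDs by flag; an OSD flagged full is not also counted as nearfull.
FullCounts count_full_osds(const std::vector<unsigned>& osd_states) {
  FullCounts c;
  for (unsigned s : osd_states) {
    if (s & OSD_FLAG_FULL)
      ++c.full;
    else if (s & OSD_FLAG_NEARFULL)
      ++c.nearfull;
  }
  return c;
}

// Build summary lines from the counts only (hypothetical wording).
std::vector<std::string> health_summary(const FullCounts& c) {
  std::vector<std::string> lines;
  if (c.full > 0)
    lines.push_back(std::to_string(c.full) + " full osd(s)");
  if (c.nearfull > 0)
    lines.push_back(std::to_string(c.nearfull) + " nearfull osd(s)");
  return lines;
}
```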
|
Minor nit: cluster log changes like this need updates to the log whitelists in the fs suite (http://tracker.ceph.com/issues/19253) |