mon,osd: new mechanism for managing full and nearfull OSDs for luminous by liewegas · Pull Request #13615 · ceph/ceph

liewegas · 2017-02-23T21:37:36Z

per-osd nearfull and full flags
full_ratio and nearfull_ratio stored in the osdmap
new message from osd to mon requesting a state change (by the osd)
mon switches cluster full behavior over from old pg-map scheme to osdmap scheme when require_luminous_osds is set
new mon commands to adjust the osdmap thresholds
some cleanup of the osd-side code
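For readers skimming the summary, here is a minimal sketch of the idea, not the code in this PR: an OSD compares its utilization against the thresholds carried in the map and asks for a state change only when the map disagrees. The names `MapRatios`, `Fullness`, `wanted_state`, and `needs_state_change` are hypothetical, and the ratio values in the usage note are illustrative.

```cpp
// Hypothetical stand-in for the full_ratio / nearfull_ratio fields
// that this PR moves into the osdmap.
struct MapRatios {
  float nearfull_ratio;
  float full_ratio;
};

// Illustrative state names only; not the real Ceph identifiers or bit values.
enum class Fullness { None, Nearfull, Full };

// Decide which state an OSD would want recorded for a given utilization.
Fullness wanted_state(float used_ratio, const MapRatios& r) {
  if (used_ratio >= r.full_ratio) return Fullness::Full;
  if (used_ratio >= r.nearfull_ratio) return Fullness::Nearfull;
  return Fullness::None;
}

// The OSD only needs to message the mon when the map disagrees with it.
bool needs_state_change(Fullness in_map, Fullness wanted) {
  return in_map != wanted;
}
```

With thresholds of 0.85 and 0.95, a utilization of 0.90 maps to `Nearfull`, and a request would be sent only if the map still records `None` or `Full` for that OSD.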

liewegas · 2017-02-26T03:13:51Z

retest this please

liewegas · 2017-02-27T15:50:32Z

retest this please

gregsfortytwo · 2017-02-27T15:57:10Z

This still has a bunch of failures and they don't all look like noise.

liewegas · 2017-03-02T01:07:49Z

@gregsfortytwo failures fixed

liewegas · 2017-03-06T16:27:36Z

passed testing, awaiting final review


dzafman · 2017-03-06T19:25:08Z

src/mon/OSDMonitor.cc

+  if ((osdmap.get_state(from) & mask) == m->state) {
+    dout(7) << __func__ << " state already " << state << " for osd." << from
+	    << " " << m->get_orig_source_inst() << dendl;
+    _reply_map(op, m->version);


goto ignore?

dzafman · 2017-03-06T20:22:25Z

src/osd/OSD.cc

+	     << " -> " << get_full_state_name(new_state) << dendl;
+    if (new_state == FAILSAFE) {
+      clog->error() << "failsafe engaged, dropping updates, now "
+		    << (int)(ratio * 100) << "% full";


My merged change uses (int)roundf(ratio * 100) which is being dropped in 2 places by your change
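To make the difference concrete, a small standalone sketch (the helper names are hypothetical, not from the tree) comparing the plain cast with the `roundf` form:

```cpp
#include <cmath>

// Plain cast: truncates toward zero, so 95.9 is reported as 95.
int percent_truncated(float ratio) {
  return (int)(ratio * 100);
}

// roundf first: rounds to nearest, so 95.9 is reported as 96.
int percent_rounded(float ratio) {
  return (int)std::roundf(ratio * 100);
}
```

For a ratio of 0.959 the truncating version logs 95% while the rounded version logs 96%, so the message can under-report by up to one percentage point without `roundf`.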

dzafman · 2017-03-06T20:36:27Z

src/test/cli/osdmaptool/clobber.t

  modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
  flags 
+  full_ratio 0
+  nearfull_ratio 0


Why are these values 0?

that's just the default osdmap value; the mon sets it to something better during mkfs, or from pgmap during upgrade

dzafman · 2017-03-06T20:45:43Z

src/osd/OSD.cc

+  if (!service.need_fullness_update())
+    return;
+  unsigned state = 0;
+  if (service.is_full()) {


Should this also add "|| check_failsafe_full()"? is_full() only checks cur_state == FULL?

Better yet, make this code and need_fullness_update() consistent: change is_full() to return true if FULL or FAILSAFE, and fix need_fullness_update() to use is_full() and is_nearfull() to build the want value.
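A rough sketch of what that suggestion could look like, assuming a simplified tracker with no locking. `FullnessTracker`, `want_bits`, and the two `STATE_*` bit values are hypothetical stand-ins for the OSDService members and the per-OSD state bits:

```cpp
#include <cstdint>

// Hypothetical bit values for illustration only.
constexpr uint32_t STATE_NEARFULL = 1u << 0;
constexpr uint32_t STATE_FULL = 1u << 1;

enum FullState { NONE, NEARFULL, FULL, FAILSAFE };

struct FullnessTracker {
  FullState cur_state = NONE;

  // True for FULL and for the more severe FAILSAFE.
  bool is_full() const { return cur_state >= FULL; }
  // Matches the diff under review (>= NEARFULL), so callers must
  // test is_full() first to tell the two apart.
  bool is_nearfull() const { return cur_state >= NEARFULL; }

  // Build the bits we want in the map from the same two predicates.
  uint32_t want_bits() const {
    if (is_full()) return STATE_FULL;
    if (is_nearfull()) return STATE_NEARFULL;
    return 0;
  }

  // An update is needed when the map's bits differ from what we want.
  bool need_fullness_update(uint32_t bits_in_map) const {
    const uint32_t mask = STATE_NEARFULL | STATE_FULL;
    return (bits_in_map & mask) != want_bits();
  }
};
```

Because both the request path and the "do we need to ask" check go through `want_bits()`, a FAILSAFE OSD is treated as full in both places.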

Signed-off-by: Sage Weil <sage@redhat.com>

dzafman · 2017-03-06T22:12:18Z

src/osd/OSD.cc

+bool OSDService::is_nearfull()
+{
+  Mutex::Locker l(full_status_lock);
+  return cur_state >= NEARFULL;


I would have left this "== NEARFULL" because it could be argued that nearfull represents only the range from the nearfull ratio up to, but not including, the full ratio.
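The two readings differ only for states above NEARFULL. A tiny sketch with hypothetical free functions, using the NONE, NEARFULL, FULL, FAILSAFE ordering described in this PR:

```cpp
enum FullState { NONE, NEARFULL, FULL, FAILSAFE };

// Reading in the diff: "at least nearfull", so FULL and FAILSAFE also count.
bool is_nearfull_at_least(FullState s) { return s >= NEARFULL; }

// Reading suggested in the review: only the band between the two ratios.
bool is_nearfull_exact(FullState s) { return s == NEARFULL; }
```

Both agree for NONE and NEARFULL; for FULL and FAILSAFE the first returns true and the second returns false, so callers using the first form have to check for full before checking for nearfull.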

First, eliminate the useless nearfull failsafe--all it did was generate a log message, which we can do based on the OSDMap states. Add some new helpers. Unify the cluster nearfull/full vs failsafe states so that failsafe is a "really" full state that is more severe than full, so we have NONE, NEARFULL, FULL, FAILSAFE. Pull the full/nearfull ratios out of the OSDMap (remember that we require luminous mons, so these will be initialized). Signed-off-by: Sage Weil <sage@redhat.com>

This ensures that we don't have a down osd that is marked full go up, then realize it's not actually full, and then clear its full flag. That would result in a cluster full blip that isn't needed. This can easily happen if the full_ratio in the osdmap is increased while the OSD is down. Signed-off-by: Sage Weil <sage@redhat.com>

Signed-off-by: Sage Weil <sage@redhat.com>

For luminous, set cluster flags based on osd flags. Until require_luminous is set, stick with the old pgmap-based behavior. Move the new check to encode_pending so that the cluster flag is set in the same epoch that the osd state(s) change. Signed-off-by: Sage Weil <sage@redhat.com>
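As an illustration of deriving the cluster flag from the per-OSD bits, here is a minimal sketch. The bit values and the function name are hypothetical, and the rule that full takes precedence over nearfull is an assumption made for the example rather than something stated in the commit message:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bit values; the real ones are defined in the Ceph headers.
constexpr uint32_t OSD_NEARFULL = 1u << 0;
constexpr uint32_t OSD_FULL = 1u << 1;
constexpr uint32_t CLUSTER_NEARFULL = 1u << 0;
constexpr uint32_t CLUSTER_FULL = 1u << 1;

// Derive the cluster-wide flag from the per-OSD state bits.
// Assumption: any full OSD makes the cluster full; otherwise any
// nearfull OSD makes it nearfull.
uint32_t cluster_flags_from_osds(const std::vector<uint32_t>& osd_states) {
  bool any_full = false;
  bool any_nearfull = false;
  for (uint32_t s : osd_states) {
    if (s & OSD_FULL) any_full = true;
    if (s & OSD_NEARFULL) any_nearfull = true;
  }
  if (any_full) return CLUSTER_FULL;
  if (any_nearfull) return CLUSTER_NEARFULL;
  return 0;
}
```

Computing this at the point where the pending map is encoded is what lets the cluster flag land in the same epoch as the OSD state change that caused it.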

Signed-off-by: Sage Weil <sage@redhat.com>

Note that this tells us how many OSDs are full or nearfull; it does not include detailed warnings telling you exactly what the utilization is because we don't have the full osd_stat_t available. We leave it to ceph-mgr to generate those health messages. Signed-off-by: Sage Weil <sage@redhat.com>
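A sketch of the kind of count-only summary described here. The struct, the function names, and the message wording are all hypothetical; the point is only that the per-OSD bits are enough to count full and nearfull OSDs without any utilization figures:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit values for illustration only.
constexpr uint32_t OSD_NEARFULL = 1u << 0;
constexpr uint32_t OSD_FULL = 1u << 1;

struct FullCounts {
  unsigned full = 0;
  unsigned nearfull = 0;
};

// Count OSDs by state bit; a full OSD is not also counted as nearfull.
FullCounts count_full_nearfull(const std::vector<uint32_t>& osd_states) {
  FullCounts c;
  for (uint32_t s : osd_states) {
    if (s & OSD_FULL) ++c.full;
    else if (s & OSD_NEARFULL) ++c.nearfull;
  }
  return c;
}

// Produce count-only summary lines (made-up wording, no utilization).
std::vector<std::string> health_summary(const FullCounts& c) {
  std::vector<std::string> out;
  if (c.full) out.push_back(std::to_string(c.full) + " full osd(s)");
  if (c.nearfull) out.push_back(std::to_string(c.nearfull) + " nearfull osd(s)");
  return out;
}
```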

Signed-off-by: Sage Weil <sage@redhat.com>

jcsp · 2017-03-10T10:43:35Z

Minor nit: cluster log changes like this need updates to the log whitelists in the fs suite (http://tracker.ceph.com/issues/19253)

liewegas added core feature labels Feb 23, 2017

liewegas requested a review from gregsfortytwo February 23, 2017 21:37

liewegas added the mon label Feb 23, 2017

liewegas mentioned this pull request Feb 24, 2017

osd: various changes for preventing internal ENOSPC condition #13425

Merged

liewegas force-pushed the wip-osd-full branch from 614cb33 to be14a35 Compare February 27, 2017 21:34

liewegas force-pushed the wip-osd-full branch 4 times, most recently from 523e109 to fcbb6f6 Compare March 3, 2017 02:56

liewegas added 3 commits March 6, 2017 13:59

osd: add per-osd FULL and NEARFULL state bits

8a73202

Signed-off-by: Sage Weil <sage@redhat.com>

osd/OSDMap: add [near]full_ratio to OSDMap[::Incremental]

5c6b9d9

This used to live in PGMap; we're moving it here for luminous (which makes more sense anyway!). Signed-off-by: Sage Weil <sage@redhat.com>

osd: rename failsafe [near]full getters appropriately

4e9c362

...and make most of these methods private to clarify the public interface Signed-off-by: Sage Weil <sage@redhat.com>

liewegas force-pushed the wip-osd-full branch from fcbb6f6 to db71c80 Compare March 6, 2017 19:02

dzafman suggested changes Mar 6, 2017

liewegas added 6 commits March 6, 2017 16:42

mon/OSDMonitor: handle MOSDFull messages from OSDs

14b1ab1

Signed-off-by: Sage Weil <sage@redhat.com>

mon/OSDMonitor: set osdmap ratios on mkfs

15f8970

Signed-off-by: Sage Weil <sage@redhat.com>

mon/OSDMonitor: initialize osdmap ratios from pgmap on upgrade

0da7561

Signed-off-by: Sage Weil <sage@redhat.com>

mon/OSDMonitor: implement new 'osd set-[near]full-ratio ...' commands

6422e0a

Signed-off-by: Sage Weil <sage@redhat.com>

qa/workunits/cephtool/test.sh: change [near]full_ratio tests

03287f7

Signed-off-by: Sage Weil <sage@redhat.com>

mon/PGMonitor: disable old 'pg set_[near]full_ratio ...' in luminous

394e45a

Signed-off-by: Sage Weil <sage@redhat.com>

liewegas force-pushed the wip-osd-full branch from db71c80 to 1aaa75b Compare March 6, 2017 21:43

dzafman approved these changes Mar 6, 2017

dzafman reviewed Mar 6, 2017

liewegas added 6 commits March 6, 2017 17:21

osd: request a fullness state change during tick if needed

00a8bfa

Signed-off-by: Sage Weil <sage@redhat.com>

mon/PGMonitor: stop generating health warnings with luminous

8bab735

Signed-off-by: Sage Weil <sage@redhat.com>

test/cli/osdmaptool: fix osdmap output

699df7d

Signed-off-by: Sage Weil <sage@redhat.com>

liewegas force-pushed the wip-osd-full branch from 1aaa75b to 699df7d Compare March 6, 2017 22:21

liewegas added the wip-sage-testing label Mar 7, 2017

liewegas merged commit a681069 into ceph:master Mar 8, 2017

liewegas deleted the wip-osd-full branch March 8, 2017 03:33

smithfarm mentioned this pull request Apr 10, 2017

[DNM] jewel: osd ops (sent and?) arrive at osd out of order #13885

Closed

4 participants