fixes for CRC mismatch failures on OSD when upgrade by tchaikov · Pull Request #11258 · ceph/ceph

tchaikov · 2016-09-29T09:18:43Z

No description provided.

add an option named "osd_ignore_bad_map_crc" for OSD. the OSD will request for full maps from monitor if the CRC of the resulting full map after applying the incremental map does not match with the one stored in inc map being applied. in most cases, this behaviour is just annoying but harmless. it just adds up the burden of monitor add cause extra traffic in the network. but in a large deployment, after upgrading the monitor with change(s) in the encoding of OSDMap, any changes related to osd map, the monitors is obliged to reply the full map request from old clients which are not able to re-encode the full map with the expected CRC, this incurs unnecessary overhead of the network and the load on both peers. so we can choose to disable this functionality on the client side while upgrading the cluster. Fixes: http://tracker.ceph.com/issues/17386 Signed-off-by: Kefu Chai <kchai@redhat.com>

if OSD does not request for full map at seeing mismatched CRC of re-encoded osd map, the osd maps on the OSD and monitor will diverge over time. this could cause mysterious problem because of the divergence in the cluster. so we need to enable this option and reboot the osd if `osd_ignore_bad_map_crc` is enabled when upgrading this OSD in question. Fixes: http://tracker.ceph.com/issues/17386 Signed-off-by: Kefu Chai <kchai@redhat.com>

tchaikov · 2016-09-29T09:19:35Z

diff --git a/src/messages/MOSDMap.h b/src/messages/MOSDMap.h
index 06e79de..05ff0b6 100644
--- a/src/messages/MOSDMap.h
+++ b/src/messages/MOSDMap.h
@@ -85,7 +85,8 @@ public:
     if ((features & CEPH_FEATURE_PGID64) == 0 ||
        (features & CEPH_FEATURE_PGPOOL3) == 0 ||
        (features & CEPH_FEATURE_OSDENC) == 0 ||
-        (features & CEPH_FEATURE_OSDMAP_ENC) == 0) {
+        (features & CEPH_FEATURE_OSDMAP_ENC) == 0 ||
+       (features & CEPH_FEATURE_OSD_HITSET_GMT) == 0) {
       if ((features & CEPH_FEATURE_PGID64) == 0 ||
          (features & CEPH_FEATURE_PGPOOL3) == 0)
        header.version = 1;  // old old_client version
diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 7961d2f..ce3596d 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -404,7 +404,8 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
   // OSDMaps.  others should be passing around the canonical encoded
   // buffers from on high.  select out those callers by passing in an
   // "impossible" feature bit.
-  assert(features & CEPH_FEATURE_RESERVED);
+  // temporarily disabled for http://tracker.ceph.com/issues/17386
+  // assert(features & CEPH_FEATURE_RESERVED);
   features &= ~CEPH_FEATURE_RESERVED;

   size_t start_offset = bl.length();
@@ -1889,7 +1890,8 @@ void OSDMap::encode(bufferlist& bl, uint64_t features) const
   // OSDMaps.  others should be passing around the canonical encoded
   // buffers from on high.  select out those callers by passing in an
   // "impossible" feature bit.
-  assert(features & CEPH_FEATURE_RESERVED);
+  // assert(features & CEPH_FEATURE_RESERVED);
+  // temporarily disabled for http://tracker.ceph.com/issues/17386
   features &= ~CEPH_FEATURE_RESERVED;

   size_t start_offset = bl.length();

this patch is not included in this patchset.

tchaikov · 2016-09-29T09:21:43Z

@athanatos shall we document in the doc/install/upgrading-ceph.rst to suggest users plan to upgrade from 0.94.6 to upgrade OSD first?

athanatos · 2016-09-29T17:34:18Z

@tchaikov Wouldn't there still be crc mismatch?

athanatos · 2016-09-29T20:45:08Z

@liewegas Does this look right to you?

liewegas · 2016-09-29T21:26:16Z

That third patch is tricky, but I think it will work. We just need to be really careful to remove the ability once we add something to the OSDMap encoding that matters or else someone who does an upgrade will leave this option turned on and be unhappy later. Or rather we should only apply it to hammer... because if the OSDs have the first patch, which we'll backport to jewel, they wouldn't need the third one... it's only there so that current deployments can upgrade mons only and avoid pain. Right?

athanatos · 2016-09-29T21:32:33Z

Yeah, I'm not entirely convinced by that last patch (except as an emergency hotfix which then gets rolled back).

tchaikov · 2016-09-30T06:27:57Z

@athanatos yes, i was wrong, thought that the pg_pool_t::encode() took the GMT_HITSET feature bit into consideration.

it's only there so that current deployments can upgrade mons only and avoid pain. Right?

right, without the patch of #11258 (comment), this change does not make sense. i am dropping the "do not encode crc bits if crc_full is not valid anymore" commit from this changeset.

tchaikov · 2016-09-30T09:34:00Z

@athanatos shall we put this down in doc/install/upgrading-ceph.rst

for user upgrade from 0.94.6 to jewel, it is suggested to upgrade OSD first, and then monitor to workaround a known issue, where a large cluster could experience network saturation and slow requests caused CRC mismatch.

bstillwell · 2016-09-30T17:21:39Z

We're stuck in the middle of an upgrade from 0.94.6 to 0.94.9 because of this problem. The OSD nodes are about 10% on 0.94.9 and 90% on 0.94.6. All the mon nodes are on 0.94.9. Would it make sense to downgrade the mon nodes to 0.94.6, upgrade all the OSD nodes to 0.94.9, and then upgrade the mon nodes? I ask this because even with the hotfix we're still seeing some of the "cluster [WRN] failed to encode map eNNNN with expected crc" messages appear on our test clusters.

tchaikov · 2016-10-04T04:35:14Z

note to myself, we need to note down this in the upgrade doc:

for user planning to upgrade from 0.94.6 to 0.94.9 or up, a large cluster could experience network saturation and slow requests caused CRC mismatch.

a workaround of this known issue is to upgrade all OSD to 0.94.9 first, and then the monitors to jewel.

another workaround is to upgrade all monitors to a version with following fix first and then OSD, as the general practice of upgrade puts.

fixes for CRC mismatch failures on OSD when upgrade #11258 (comment) and

the dropped tricky fix which switches to the old OSDMap encoding without crc bits if the features does not have OSD_HITSET_GMT

tchaikov · 2016-10-24T10:12:56Z

@Sage, with your patches to encode the osdmap with the feature set allowed by the connection, i think we can dispense with this changeset?

liewegas · 2016-10-24T10:19:49Z

Yes and no. If we backport the changes ot jewel and/or hammer, and add a condition to encode the full osdmap without the hitset feature, then yeah. The current PR only pulls out a handful of features, though, and they have to be explicitly called out.

We could adjust that to mask against osdmap::get_up_osd_features too, I suppose, but I think it might be better to explicitly call out any changes that affect osdmap encoding.

liewegas · 2016-10-24T10:20:18Z

In any case, though, I think we can drop this. For now the simple workaround of upgrading OSDs before mons will always work, too.

When MDS is no longer laggy, it should process deferred messages first, then process newly received messages. Fix: ceph#11258 Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit ccdeaf8)

tchaikov added 2 commits September 28, 2016 17:11

tchaikov added bug-fix core labels Sep 29, 2016

tchaikov assigned athanatos Sep 29, 2016

athanatos assigned liewegas Sep 29, 2016

tchaikov force-pushed the wip-17386 branch from e5efa54 to ad0876e Compare September 30, 2016 06:17

tchaikov mentioned this pull request Oct 8, 2016

hammer: doc: add "Upgrading to Hammer" section #11372

Merged

liewegas closed this Oct 24, 2016

tchaikov deleted the wip-17386 branch October 24, 2016 10:30

ukernel mentioned this pull request Jun 27, 2016

hammer: mds: fix out-of-order messages #6754

Closed

Conversation

tchaikov commented Sep 29, 2016

Uh oh!

tchaikov commented Sep 29, 2016

Uh oh!

tchaikov commented Sep 29, 2016

Uh oh!

athanatos commented Sep 29, 2016

Uh oh!

athanatos commented Sep 29, 2016

Uh oh!

liewegas commented Sep 29, 2016

Uh oh!

athanatos commented Sep 29, 2016

Uh oh!

tchaikov commented Sep 30, 2016

Uh oh!

tchaikov commented Sep 30, 2016

Uh oh!

bstillwell commented Sep 30, 2016

Uh oh!

tchaikov commented Oct 4, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tchaikov commented Oct 24, 2016

Uh oh!

liewegas commented Oct 24, 2016

Uh oh!

liewegas commented Oct 24, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tchaikov commented Oct 4, 2016 •

edited

Loading