fixes for CRC mismatch failures on OSD when upgrade#11258
fixes for CRC mismatch failures on OSD when upgrade#11258tchaikov wants to merge 2 commits intoceph:masterfrom
Conversation
add an option named "osd_ignore_bad_map_crc" for OSD. the OSD will request for full maps from monitor if the CRC of the resulting full map after applying the incremental map does not match with the one stored in inc map being applied. in most cases, this behaviour is just annoying but harmless. it just adds up the burden of monitor add cause extra traffic in the network. but in a large deployment, after upgrading the monitor with change(s) in the encoding of OSDMap, any changes related to osd map, the monitors is obliged to reply the full map request from old clients which are not able to re-encode the full map with the expected CRC, this incurs unnecessary overhead of the network and the load on both peers. so we can choose to disable this functionality on the client side while upgrading the cluster. Fixes: http://tracker.ceph.com/issues/17386 Signed-off-by: Kefu Chai <kchai@redhat.com>
if OSD does not request for full map at seeing mismatched CRC of re-encoded osd map, the osd maps on the OSD and monitor will diverge over time. this could cause mysterious problem because of the divergence in the cluster. so we need to enable this option and reboot the osd if `osd_ignore_bad_map_crc` is enabled when upgrading this OSD in question. Fixes: http://tracker.ceph.com/issues/17386 Signed-off-by: Kefu Chai <kchai@redhat.com>
diff --git a/src/messages/MOSDMap.h b/src/messages/MOSDMap.h
index 06e79de..05ff0b6 100644
--- a/src/messages/MOSDMap.h
+++ b/src/messages/MOSDMap.h
@@ -85,7 +85,8 @@ public:
if ((features & CEPH_FEATURE_PGID64) == 0 ||
(features & CEPH_FEATURE_PGPOOL3) == 0 ||
(features & CEPH_FEATURE_OSDENC) == 0 ||
- (features & CEPH_FEATURE_OSDMAP_ENC) == 0) {
+ (features & CEPH_FEATURE_OSDMAP_ENC) == 0 ||
+ (features & CEPH_FEATURE_OSD_HITSET_GMT) == 0) {
if ((features & CEPH_FEATURE_PGID64) == 0 ||
(features & CEPH_FEATURE_PGPOOL3) == 0)
header.version = 1; // old old_client version
diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 7961d2f..ce3596d 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -404,7 +404,8 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
// OSDMaps. others should be passing around the canonical encoded
// buffers from on high. select out those callers by passing in an
// "impossible" feature bit.
- assert(features & CEPH_FEATURE_RESERVED);
+ // temporarily disabled for http://tracker.ceph.com/issues/17386
+ // assert(features & CEPH_FEATURE_RESERVED);
features &= ~CEPH_FEATURE_RESERVED;
size_t start_offset = bl.length();
@@ -1889,7 +1890,8 @@ void OSDMap::encode(bufferlist& bl, uint64_t features) const
// OSDMaps. others should be passing around the canonical encoded
// buffers from on high. select out those callers by passing in an
// "impossible" feature bit.
- assert(features & CEPH_FEATURE_RESERVED);
+ // assert(features & CEPH_FEATURE_RESERVED);
+ // temporarily disabled for http://tracker.ceph.com/issues/17386
features &= ~CEPH_FEATURE_RESERVED;
size_t start_offset = bl.length();this patch is not included in this patchset. |
|
@athanatos shall we document in the |
|
@tchaikov Wouldn't there still be crc mismatch? |
|
@liewegas Does this look right to you? |
|
That third patch is tricky, but I think it will work. We just need to be really careful to remove the ability once we add something to the OSDMap encoding that matters or else someone who does an upgrade will leave this option turned on and be unhappy later. Or rather we should only apply it to hammer... because if the OSDs have the first patch, which we'll backport to jewel, they wouldn't need the third one... it's only there so that current deployments can upgrade mons only and avoid pain. Right? |
|
Yeah, I'm not entirely convinced by that last patch (except as an emergency hotfix which then gets rolled back). |
|
@athanatos yes, i was wrong, thought that the
right, without the patch of #11258 (comment), this change does not make sense. i am dropping the "do not encode crc bits if crc_full is not valid anymore" commit from this changeset. |
|
@athanatos shall we put this down in
|
|
We're stuck in the middle of an upgrade from 0.94.6 to 0.94.9 because of this problem. The OSD nodes are about 10% on 0.94.9 and 90% on 0.94.6. All the mon nodes are on 0.94.9. Would it make sense to downgrade the mon nodes to 0.94.6, upgrade all the OSD nodes to 0.94.9, and then upgrade the mon nodes? I ask this because even with the hotfix we're still seeing some of the "cluster [WRN] failed to encode map eNNNN with expected crc" messages appear on our test clusters. |
|
note to myself, we need to note down this in the upgrade doc:
|
|
@Sage, with your patches to encode the osdmap with the feature set allowed by the connection, i think we can dispense with this changeset? |
|
Yes and no. If we backport the changes ot jewel and/or hammer, and add a condition to encode the full osdmap without the hitset feature, then yeah. The current PR only pulls out a handful of features, though, and they have to be explicitly called out. We could adjust that to mask against osdmap::get_up_osd_features too, I suppose, but I think it might be better to explicitly call out any changes that affect osdmap encoding. |
|
In any case, though, I think we can drop this. For now the simple workaround of upgrading OSDs before mons will always work, too. |
When MDS is no longer laggy, it should process deferred messages first, then process newly received messages. Fix: ceph#11258 Signed-off-by: Yan, Zheng <zyan@redhat.com> (cherry picked from commit ccdeaf8)
No description provided.