Conversation
|
@athanatos can you review it? thx. |
662ae2d to
42e81b1
Compare
|
I will try to get to this this week. |
|
Not going to get to it today, I'll review it Monday morning. |
athanatos
left a comment
There was a problem hiding this comment.
I commented on some things, but I don't really feel that I understand the intent here. Perhaps you could explain in a bit more detail how the additions in this patch work?
src/mon/NVMeofGwMap.cc
Outdated
|
|
||
| bool NVMeofGwMap::validate_number_locations(int num_gws, int num_locations) | ||
| { | ||
| return true; // This function may have not empty body in the nex commit |
There was a problem hiding this comment.
There is no next commit. Is this for a future PR? Perhaps something like // TODO: add validation
src/mon/NVMeofGwMap.cc
Outdated
|
|
||
| int NVMeofGwMap::cfg_set_location(const NvmeGwId &gw_id, | ||
| const NvmeGroupKey& group_key, | ||
| std::string &location, bool &propose_pending, bool test) { |
src/mon/NVMeofGwMap.cc
Outdated
| */ | ||
| #define MIN_NUM_ANA_GROUPS 0xFFF | ||
| uint32_t min_num_ana_groups_in_gw = MIN_NUM_ANA_GROUPS; | ||
| NvmeGwId min_gw_id = "empty"; |
There was a problem hiding this comment.
What's the intent here -- is empty not a valid gw id?
There was a problem hiding this comment.
@athanatos thanks, agree , will remove this assignment
src/mon/NVMeofGwMap.cc
Outdated
| for (auto& found_gw_state: gws_states) { | ||
| auto st = found_gw_state.second; | ||
| if (st.location == location) { | ||
| if(st.availability == gw_availability_t::GW_AVAILABLE && |
src/mon/NVMeofGwMap.cc
Outdated
| * for ana-grp in list make relocation. | ||
| * if all ana-grps in location active remove location from the map failbacks_in_progress | ||
| */ | ||
| FailbackLocation location = ""; |
There was a problem hiding this comment.
Why declare this here with a placeholder value? Seems like you could just declare it when set below at
location = failbacks_in_progress[group_key];
src/mon/NVMeofGwMap.cc
Outdated
| * function tries to found the minimum loaded gw in location | ||
| * if found, relocate ana grp to it using regular failback FSM | ||
| */ | ||
| #define MIN_NUM_ANA_GROUPS 0xFFF |
There was a problem hiding this comment.
So if any particular gw has more than MIN_NUM_ANA_GROUPS you won't migrate to it? Seems like a bad name.
There was a problem hiding this comment.
@athanatos , the problem is name of value ? what do you propose?
There was a problem hiding this comment.
Well, it seems like a max? MAX_NUM_ANA_GROUPS_FOR_RELOCATE? Also, don't use #defines for things like this, use a constexpr.
src/mon/NVMeofGwMap.cc
Outdated
| bool do_failback = false; | ||
| dout(10) << "Find failback GW for GW " << gw_id << dendl; | ||
| bool force_inter_location = false; | ||
| FailbackLocation location = ""; |
There was a problem hiding this comment.
Once again, why initialize this with a dummy value? It only seems to be used in the if block immediately below, just declare it when assigned to.
src/mon/NVMeofGwMap.cc
Outdated
| for (auto& found_gw_state: gws_states) { | ||
| auto st = found_gw_state.second; | ||
| if (st.availability == gw_availability_t::GW_AVAILABLE) | ||
| if (location == "" || st.location == location) { |
There was a problem hiding this comment.
you are right , this is a bad style because "" is also valid location, thought the current code is also working,
will add parameter to the function "ignore_location"
ea9fb24 to
b561d27
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
b561d27 to
60ebc15
Compare
60ebc15 to
25845c7
Compare
|
I'm still basically just not sure how these commands are supposed to work. Is there documentation to tell users when and how to actually make use of them? |
25845c7 to
08afd86
Compare
|
@athanatos , I added a description to the PR and motivations for the update. |
08afd86 to
3458e78
Compare
|
@athanatos will you be able to continue the review soon? thx. |
athanatos
left a comment
There was a problem hiding this comment.
I don't see any serious problems. I still don't entirely follow the nvme specific logic -- perhaps someone from the nvme team could do a review?
| } | ||
|
|
||
| bool NVMeofGwMap::is_location_in_disaster(const NvmeGroupKey& group_key, | ||
| NvmeLocation& location, bool &cleanup_in_process) { |
There was a problem hiding this comment.
If the goal isn't to be able to call this multiple times with the same cleanup_in_progress param, it would be a lot better to set it in every exit path.
There was a problem hiding this comment.
agree , will change
There was a problem hiding this comment.
@athanatos @baum reviewed and approved. Will you be able to approve.
3458e78 to
a0d57de
Compare
a0d57de to
9dac29a
Compare
GW commands added : set location and set admin state enable/disable added start-failback <location> command. failover logic is impacted by GW location implement GW admin commands enable/disable added map for location-failback-in-progress failback between locations happens only by monitor command implemented new ana-group relocation process used when inter-location failback command sent added upgrade rules fixes: https://tracker.ceph.com/issues/74210 Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
changes in behavior of failover/failback, additional
validations added
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
9dac29a to
380ebf0
Compare
athanatos
left a comment
There was a problem hiding this comment.
This should go through rados teuthology prior to merge.
remove invalid code from NVMeofGwSerialize.h put encode/decode at the file NVmeofGwMap.h Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
|
jenkins test this please |
This reverts commit 6dddf54, reversing changes made to 07ec509. Backport 6dddf54 introduced a new connection feature bit NVMEOF_BEACON_DIFF but there are plans (ceph#66624) to make further enhancements on that feature bit. This would cause the mons to crash during upgrades. However, this connection feature bit should not have been added to begin with. The correct way to do this is extend e55ad7b by @athanatos to require `CEPH_MON_FEATURE_INCOMPAT_NVMEOF_BEACON_DIFF` if all mons support it. This should be done by having mons add/update their supported features the MonMap via an update from `MMonJoin` (see for instance `crush_loc` which was recently added to `mon_info_t`). Once the supported features indicated for each mon in the `MonMap` show they understand the new NVMEOF_BEACON_DIFF, then it should be turned on globally in the `MonMap` as a required feature (added to the incompat set). Conflicts: src/mon/NVMeofGwMon.h: conflicts with header change from 19c9be2 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This reverts commit 6dddf54, reversing changes made to 07ec509. Backport 6dddf54 introduced a new connection feature bit NVMEOF_BEACON_DIFF but there are plans (ceph#66624) to make further enhancements on that feature bit. This would cause the mons to crash during upgrades. However, this connection feature bit should not have been added to begin with. The correct way to do this is extend e55ad7b by @athanatos to require `CEPH_MON_FEATURE_INCOMPAT_NVMEOF_BEACON_DIFF` if all mons support it. This should be done by having mons add/update their supported features the MonMap via an update from `MMonJoin` (see for instance `crush_loc` which was recently added to `mon_info_t`). Once the supported features indicated for each mon in the `MonMap` show they understand the new NVMEOF_BEACON_DIFF, then it should be turned on globally in the `MonMap` as a required feature (added to the incompat set). Conflicts: src/mon/NVMeofGwMon.h: conflicts with header change from 19c9be2 fix missing header change in ceph#66584 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The NVMe-oF GWs are currently unaware of the data center location. When all of the GWs in the group are collocated in the same location, there is no issue. But if GWs (within the same GW group) are located in different locations within the same Ceph cluster, the failover target might be wrong. Choosing the wrong failover target will bring into a situation where the compute is on one site but storage is on another site. This is causing a significant impact to the I/O performance.
Also when disaster cluster recovered failback of ANA groups to it should be done only by monitor command.
Last update
Introduce explicit commands to set and clear a disaster state for a location:
nvme-gw pool group disaster Set
nvme-gw pool group disaster Clear
These commands explicitly inform the monitor of the disaster status of a given location.
remove the command:
nvme-gw pool group start-failback
Expected Behavior
With this mechanism in place:
Failbacks from other locations are allowed for all gateways except those belonging to a location marked as being in a disaster state.
Gateways in a disaster location, even if partially recovered, will not acquire ANA group ownership during failovers caused by gateway failures in other locations.
When a location is marked as recovered using the “disaster Clear” command, the system triggers the same workflow previously initiated by the start-failback command, forcing failbacks to the gateways in the previously affected location.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.