KEP-3314: CSI Changed Block Tracking#4082
Conversation
Signed-off-by: Ivan Sim <ivan.sim@dell.com>
Signed-off-by: Ivan Sim <ivan.sim@dell.com>
Filled in the Proposal, Caveats and Risks. Put in the CSI spec in the Details section.
Clarified the proposal.
Initial structure. Filled in the Proposal, Caveats and Risks. Put in the CSI spec in the Details section.
|
|
|
|
||
| The `BlockMetadataType` enumeration specifies the style used: `FIXED_LENGTH` or `VARIABLE_LENGTH`. | ||
| When the *block-based* style (`FIXED_LENGTH`) is used it is up to the SP plugin to define the | ||
| block size. |
There was a problem hiding this comment.
How will backup clients make use of FIXED_LENGTH/VARIABLE_LENGTH?
There was a problem hiding this comment.
A backup client would, in general, have to map the "view" of the block device returned by the API to its own view of the block device. For example, a client could have its own concept of block size and would have to map, say, a VMware extent map, to a range of blocks; conversely, it may have to map a range of AWS EBS blocks to an extent map.
There was a problem hiding this comment.
The purpose of FIXED_LENGTH/VARIABLE_LENGTH is still not clear to me.
The API returns an ascending, non-overlapping sequence of BlockMetadata elements. Backup clients may accumulate adjacent BlockMetadata elements (i.e. convert from blocks to extents) and then align offsets/lengths to the backup client's internal block size, but this does not require BlockMetadataType.
Can you give a concrete example of how a backup client will use BlockMetadataType to clarify?
There was a problem hiding this comment.
When we discussed this in the CSI WG, the argument was thus:
- If the SP uses FIXED_LENGTH, it's a promise that lengths will be constant for the given pair of snapshots, which may unlock certain optimizations on the backup software side which wouldn't be possible with VARIABLE_LENGTH
- If the SP uses VARIABLE_LENGTH, it remains free to return whatever lengths it wants -- possible folding consecutive change blocks and thereby compressing the metadata, and the backup software has to cope with it.
If these details aren't clear from the CSI spec, then it's a failure of spec writing, and we should amend the spec to be more clear about the exact semantics of that field.
There was a problem hiding this comment.
@bswartz While the spec covers the effect of BlockMetadataType on the Length field, it's still unclear to me how a backup client would use this information.
The one example I can think of, making decisions about backup data layout (e.g. block size and the backup client's internal metadata parameters), doesn't seem to be well-served by BlockMetadataType:
BlockMetadataType only applies for the current API call. Maybe it could even change between successive calls on the same snapshot pair. Definitely between different snapshot pairs from the same volume.
Is a backup client that uses this information making the assumption that BlockMetadataType and the block size does not change? If yes, then either backup clients cannot use BlockMetadataType for this purpose or the API design needs adjustment.
An API that captures permanent attributes like block size would be something like GetVolumeMetadataHints(volume_id) -> VolumeMetadataHints where the spec guarantees the stability of the hints across snapshots of the volume.
Or maybe I'm still missing how backup clients could use BlockMetadataType?
There was a problem hiding this comment.
Recall that these new snapshot metadata APIs are stream-oriented, so "one API call" is actually the whole stream of metadata for a given pair of snapshots. If the SP commits to always use a Length of 4096 (for example) then the backup software can safely allocate a fixed-size bitmap where each bit represents 4k of change for the whole snasphot.
If an SP returns variable length blocks, then the backup software has to be prepared to use something other than a bitmap to represent the delta, or be prepared to guess a optimal bitmap granularity and be prepared to adjust delta blocks to fit the bitmap, possibly wasting small amounts of space by backing up unchanged data.
There was a problem hiding this comment.
FIXED_LENGTH is expensive over the wire because adjacent BlockMetadata elements cannot be sent merged. It would be more efficient for the SP to include an optional block size hint field in the response and allow the Length field to be vary instead of having BlockMetadataType.
There was a problem hiding this comment.
Good thing the API is alpha and we can evolve it. I agree this isn't the most efficient transport mechanism, but it's dramatically more efficient that what's being done in the absence of this feature. The designers were optimizing for simplicity rather than efficiency, which is the right place to focus in an alpha version of anything.
| repeated BlockMetadata block_metadata = 3; | ||
| } | ||
|
|
||
| message GetMetadataDeltaRequest { |
There was a problem hiding this comment.
Is there a difference between GetMetadataAllocatedRequest and GetMetadataDeltaRequest aside from the snapshot_id vs base_snapshot_id/target_snapshot_id fields?
If there is no difference, maybe make base_snapshot_id optional (like the CSI spec's Snapshot group_snapshot_id field) and unify the two requests to reduce duplication. When base_snapshot_id is empty then allocated data is reported. When base_snapshot_id is non-empty then the delta is reported.
There was a problem hiding this comment.
Is there a difference between GetMetadataAllocatedRequest and GetMetadataDeltaRequest aside from the snapshot_id vs base_snapshot_id/target_snapshot_id fields?
No, not really. We had considered your suggestion and decided to explicitly document separate methods for clarity.
| ``` | ||
| cbt.storage.k8s.io/driver: NAME_OF_THE_CSI_DRIVER | ||
| ``` | ||
| The presence of this label allows a backup application to efficiently locate |
There was a problem hiding this comment.
Can you have multiple CRs for the same driver?
Instead of using labels, can the name of the object == the name of the driver, which is what we require for the CSIDriver object: https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/
Or should we put it in the CSIDriver object itself?
There was a problem hiding this comment.
Yeah, I don't see why we can't use the object name, instead of labels. We explored using the CSIDriver object to store this information, but was told that it was mainly used for kubelet/storage related spec, not things like HTTP endpoints.
There was a problem hiding this comment.
It would be ambiguous to find multiple CRs for a given driver; which one should the application pick?
I agree the object could be named for the driver but the spec doesn't mandate that - it would totally depend on the driver installer. The reason is that the spec does not require these CRs to be in any particular namespace; the app finds them via a label search.
I think there were discussions on modifying the CSIDriver object in the WG and the concept was shot down.
There was a problem hiding this comment.
Which spec doesn't mandate the name of the CR? We could define this requirement as such like we did for the CSIDriver object.
There was a problem hiding this comment.
Sounds reasonable. I'll add something on the name of the CR.
There was a problem hiding this comment.
I had forgotten to remove the CR label requirement - fixed now!
| existing tests to make this code solid enough prior to committing the changes necessary | ||
| to implement this enhancement. | ||
|
|
||
| ##### Prerequisite testing updates |
There was a problem hiding this comment.
We need to make sure that our standard k8s-csi logging methods are not logging any tokens/secrets.
There was a problem hiding this comment.
I agree with the sentiment, but should we be stating that in the spec?
There was a problem hiding this comment.
No, I don't think this needs to be explicit in the spec. It's just an implementation requirement for our common libraries/sidecars as this has been a source of CVEs in the past
Signed-off-by: Ivan Sim <ihcsim@gmail.com>
|
/assign @liggitt |
| @@ -0,0 +1,3 @@ | |||
| kep-number: 3314 | |||
| alpha: | |||
| approver: "@johnbelamaric" | |||
There was a problem hiding this comment.
I've been looking at this before as shadow, and John is overbooked. So let's switch to me with PRR approval on this one.
|
/label tide/merge-method-squash |
soltysh
left a comment
There was a problem hiding this comment.
The PRR is mostly good, I'll wait for approval from sig-storage and sig-auth.
| - Components depending on the feature gate: | ||
| - [x] Other | ||
| - Describe the mechanism: | ||
| The new components will be implemented as part of the out-of-tree CSI framework. |
There was a problem hiding this comment.
This still holds, not a blocking, but would be nice to answer this and questions around monitoring and troubleshooting.
| to Kubernetes backup applications | ||
| by creating a [SnapshotMetadataService CR](#snapshot-metadata-service-custom-resource) | ||
| that contains the service's TCP endpoint address, CA certificate and | ||
| an audience string needed for token authentication. |
There was a problem hiding this comment.
Let's follow up offline on this, but we may want to enforce some syntax/restrictions on the allowed audience strings. The primary goal is to prevent reusing some reserved audience for kube-apiserver. cc @liggitt
|
/lgtm |
|
@soltysh could you provide approval for PRR? Thanks. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ihcsim, msau42, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Uh oh!
There was an error while loading. Please reload this page.