Added the SnapshotMetadata service. #551
bswartz merged 1 commit into container-storage-interface:master from
Conversation
What about this PR #522?

It has been closed.
```
// SNAPSHOT_METADATA_SERVICE indicates that the Plugin provides
// RPCs to retrieve metadata on the allocated blocks of a single
// snapshot, or the changed blocks between a pair of snapshots of
// the same block volume.
// The presence of this capability determines whether the CO will
// attempt to invoke the OPTIONAL SnapshotMetadata service RPCs.
SNAPSHOT_METADATA_SERVICE = 4 [(alpha_enum_value) = true];
```
I blindly copied this pattern for the SnapshotMetadata service as it is an optional service.
In the Kubernetes CO implementation proposal, the existence of this gRPC service would be advertised by the presence of a CR, so this capability is not strictly required.
It occurs to me that defining this capability implies that if an SP wanted to use a separate container for the SnapshotMetadata service that they'd have to also implement an Identity service in this container.
Feedback I received verbally on this is that every CSI server process must provide an Identity service.
f4a058e to e1bb128
cc @bswartz

cc @jdef
```
// allocated blocks of a snapshot: i.e. this identifies the data ranges
// that have valid data as they were the target of some previous write
// operation on the volume.
message GetAllocatedRequest {
```
Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and `GetDeltaRequest`) need to provide credentials like the `secrets` as in `CreateVolumeRequest`:
```
map<string, string> secrets = 5 [(csi_secret) = true];
```
I think that makes sense. Many providers might be using this for volume operations. @carlbraganza wdyt?
We also might want to add a way to pass provider-specific params, just like the way it's done with CreateVolumeRequest?
```
// Plugin specific creation-time parameters passed in as opaque
// key-value pairs. This field is OPTIONAL. The Plugin is responsible
// for parsing and validating these parameters. COs will treat
// these as opaque.
map<string, string> parameters = 4;
```
Parameters are probably not required, as the CSI-driver should be able to fetch those based on the snapshot_id.
@nixpanic In K8s, do the secret data come from either of:
- `csi.storage.k8s.io/snapshotter-secret-[name|namespace]` entries of the `VolumeSnapshotClass.Parameters`
- `csi.storage.k8s.io/provisioner-secret-[name|namespace]` entries of a `StorageClass.Parameters`
Ceph-CSI would like to implement this functionality. In order to communicate with the Ceph cluster, the requests (this one and `GetDeltaRequest`) need to provide credentials like the `secrets` as in `CreateVolumeRequest`: `map<string, string> secrets = 5 [(csi_secret) = true];`
why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?
why aren't backend secrets configured at the plugin level? why do they need to be part of the RPC here? is this an exercise in future-proofing?
Ceph-CSI for example has a single instance of the CSI-drivers, but (in Kubernetes) the StorageClass points to a particular cluster. Different StorageClasses can point to different clusters, each with their own secrets (also linked in the StorageClass). Because of the stateless nature, there is no guarantee that secrets can be (temporarily) stored for the plugin before APIs of this new SnapshotMetadata service are called.
@nixpanic In K8s, do the secret data come from either of:
- `csi.storage.k8s.io/snapshotter-secret-[name|namespace]` entries of the `VolumeSnapshotClass.Parameters`
- `csi.storage.k8s.io/provisioner-secret-[name|namespace]` entries of a `StorageClass.Parameters`
Yes, that is right. The kubernetes-csi/external-snapshotter resolves those secrets and passes the contents in the CSI procedures.
@jdef, I would say yes, this is an exercise in future proofing. Every other CSI RPC that we've added without a secrets field has later needed a secrets field to be added. My stance is that the default should be to require the CO to provide per-RPC secrets and to figure out what those look like, unless we can prove it doesn't make sense to do so.
The underlying problem is the few CSI drivers that use per-volume secrets which the CO is responsible for managing. Those drivers find it impossible to implement new features until a secrets field is added and the CO is updated to provide the appropriate per-volume secrets.
```
// in this list is controlled by the max_results value in the
// GetAllocatedRequest message.
// This field is OPTIONAL.
repeated BlockMetadata block_metadata = 3;
```
```
repeated BlockMetadata block_metadata = 3;

// Indicates there are no more allocated blocks in the list.
// This field is REQUIRED.
bool end_of_list = 4;
```
As I understand it, a gRPC stream returns an EOF on proper termination and an error otherwise. Would that not suffice?
My concern here is when max_results is specified, and you get an EOF, how do you know if you need to call it one more time? Or even if max_results is set to zero, can the SP decide to send less than all the results in one stream?
During my review I was going to add like a whole paragraph explaining the relationship between the max_results field and the number of elements in the block_metadata field, and I decided it was too confusing and it would be better to make it explicit.
The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.
The core of the problem is that we've introduced BOTH pagination and streaming and this means the client can't rely on the stream ending to indicate that there are no more pages.
Hmm ... I'm not sure I understand your confusion! Perhaps we can use this snippet of Go client prototype code (written by @PrasadG193) to dissect the problem?
```go
c.initGRPCClient(snapMetaSvc.Spec.CACert, snapMetaSvc.Spec.Address, saToken, snapNamespace)
stream, err := c.client.GetDelta(ctx, &pgrpc.GetDeltaRequest{
	BaseSnapshotId:   snapNamespace + "/" + baseVolumeSnapshot,
	TargetSnapshotId: snapNamespace + "/" + targetVolumeSnapshot,
	StartingOffset:   0,
	MaxResults:       uint32(256),
})
if err != nil {
	return err
}
done := make(chan bool)
fmt.Println("Resp received:")
go func() {
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			done <- true // means stream is finished
			return
		}
		if err != nil {
			log.Fatalf("cannot receive %v", err)
		}
		respJson, _ := json.Marshal(resp)
		fmt.Println(string(respJson))
	}
}()
<-done // we will wait until all responses are received
```
The example above doesn't really illustrate processing of the multiple entries in the returned `block_metadata` slice, as the response is simply dumped as a string in JSON format, but it is clear that instead of the `fmt.Println` a real client would process the `block_metadata` slice ("page") in the `resp`. The stream processing loop itself is exited only on a non-nil `err` value from `stream.Recv()`, where `io.EOF` is not really an error but the formal proper end-of-stream indicator.
The corresponding server (sidecar) side logic for this primarily consists of a loop doing this:
```go
func (s *Server) GetDelta(req *pgrpc.GetDeltaRequest, cbtClientStream pgrpc.SnapshotMetadata_GetDeltaServer) error {
	... // call the SP service after assembling the parameters
	done := make(chan bool)
	go func() {
		defer close(done) // signal completion so the receive below can return
		for {
			... // receive resp from the SP service
			log.Print("Received response from csi driver, proxying to client")
			if err := cbtClientStream.Send(resp); err != nil {
				log.Printf(fmt.Sprintf("cannot send %v", err))
				return
			}
		}
	}()
	<-done // we will wait until all responses are relayed
	return nil
}
```
(The prototype sidecar has very little error handling)
Oh I see. You're using an error to signal a normal condition. So you always take an extra trip through the loop, except when there are no records to return. My main concern was how we signal termination to the client, and the error semantics don't show up in the protobuf definitions so I overlooked it.
Yes, it appears to be the idiomatic way in Go.
I don't hate this way of doing it, but even if it's idiomatic in Go, I'm not sure we use errors in this way anywhere else in CSI. I just want to highlight that a boolean flag in the return message would achieve the same effect as using an error to signal the end of iteration.
Why not both? Conventionally, client side should keep the connection alive until io.EOF, to avoid leaks. The end_of_list is a confirmation from the server to the client that everything has been sent. If there is an io.EOF but end_of_list is false, then the client knows it got an incomplete list, and will need to handle it.
IMO, it's a bit paranoid!
I was thinking of a server-side graceful termination scenario, where the sidecar (or gRPC) had a chance to send an io.EOF before an unexpected termination, and not all the block metadata has been retrieved from the driver plugin. There is no way for the backup software to know the list it got is incomplete. Is there not a valid scenario?
```
  }
}

service SnapshotMetadata {
```
i'm wondering about "stream" for the responses. we don't stream any other type of response. i notice that each of the response types already has repeated block metadata. in practice, how large do we expect these metadata chains to be? it also looks like there's a mechanism to resume from some offset (perhaps in case the stream is interrupted?) - which means that clients already have to keep track of what they've consumed.
why not just send a very large batch of metadata for each request? the golang default gRPC max message size is 4MiB. is that not good enough?
is the use of streaming an example of premature optimization?
This is definitely a new gRPC pattern for the CSI API, but I don't think this is a case of premature optimization.
The amount of changed block data would be proportional to the size of the volume, the amount of change, and the mechanism chosen to represent changed areas (FIXED_LENGTH or VARIABLE_LENGTH). For example, assuming a 1TiB volume (2^40 bytes) and 1MiB blocks, there would be 2^20 (about a million) such blocks; with 8KiB blocks there would be 2^27 (about 128 million) such blocks. In the worst case for FIXED_LENGTH all the blocks could have changed. In the worst case for VARIABLE_LENGTH every alternate block could have changed. Each changed block datum takes 16 bytes (+ some overhead?), which should also be factored in; the total can approach or exceed this 4MiB threshold.
Beyond the length of a single stream response, assuming a fixed slice length forces the client to support this size, which could break some memory conservative clients. (This is why there is an option for the client to request a max_results limit).
Recall too, that an earlier iteration of this KEP was criticized for passing all this metadata through the Kubernetes API server. The direct connection from the backup app to this new CSI service bypasses the K8s API server but cannot reduce the quantity of metadata!
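A back-of-the-envelope sketch of the arithmetic above (the 16-byte per-tuple cost is the figure assumed in the comment; `blockCount` and `worstCaseMetadataBytes` are illustrative helpers, not spec names):

```go
package main

import "fmt"

const tupleBytes = 16 // approximate per-tuple cost (byte_offset + size_bytes)

// blockCount returns the number of fixed-size blocks in a volume.
func blockCount(volumeBytes, blockBytes int64) int64 {
	return volumeBytes / blockBytes
}

// worstCaseMetadataBytes assumes the FIXED_LENGTH worst case, in which
// every block of the volume has changed.
func worstCaseMetadataBytes(volumeBytes, blockBytes int64) int64 {
	return blockCount(volumeBytes, blockBytes) * tupleBytes
}

func main() {
	const TiB = int64(1) << 40
	fmt.Println(blockCount(TiB, 1<<20))             // 1048576 (2^20) blocks of 1MiB
	fmt.Println(blockCount(TiB, 8<<10))             // 134217728 (2^27) blocks of 8KiB
	fmt.Println(worstCaseMetadataBytes(TiB, 1<<20)) // 16777216 (16MiB) > 4MiB gRPC default
	fmt.Println(worstCaseMetadataBytes(TiB, 8<<10)) // 2147483648 (2GiB)
}
```

Even the coarse 1MiB-block case exceeds the 4MiB default message limit in the worst case, which supports the argument for streaming with a client-controlled `max_results`.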
@jdef We discussed this in the SIG Data Protection WG meeting. Some relevant observations:
- We're dealing with metadata access here, and not a control path. There is no existing precedent for such operations in the current CSI spec.
- The performance of a gRPC stream would be better than the comparative use of a classic pagination API.
- A classic pagination API would involve multiple round-trip calls, versus a single call with the stream API.
- A classic pagination API requires the SP to fill the next page of metadata to return. Depending on the underlying APIs, this may require the SP to reissue the original allocated/diff call and skip the metadata records already returned in previous calls. In contrast, an SP server processing a stream API can send multiple responses without reissuing the allocated/diff call.
the last bullet point is probably the most convincing here
@carlbraganza can you make sure you've signed the CLA. See https://github.com/container-storage-interface/spec/blob/master/CONTRIBUTING.md#licence-and-cla

Thanks. Looking into getting this signed.
```
// The FIXED_LENGTH value indicates that data ranges are
// returned in fixed size blocks.
FIXED_LENGTH=1;
```
nit: consistent spacing around = sign, here and below
```
#### GetMetadataAllocated RPC
```
nit: other RPC section headers don't end w/ RPC. i haven't scanned the entire doc, but whatever we do here should be consistent w/ the rest (or, at least, the majority of the rest)
Looks like other RPC sections have the name in the form: RPC Name (within backquotes). The errors subsection has no special formatting. Changing...
```
- The **FIXED_LENGTH** style requires a fixed value for the `size_bytes` field in all tuples in a given sequence.
- The **VARIABLE_LENGTH** style does not constrain the value of the `size_bytes` field in a sequence.

The Snapshot Metadata service permits either style at the discretion of the plugin as long as the style does not change mid-stream in any given RPC.
```
this seems to imply that subsequent invocations of the same RPC may specify different length styles - even for the same snapshot. is this intended? i hope so, because otherwise the plugin would need to maintain state to enforce consistent styles across multiple RPCs - and we probably don't want to force plugins to do that
Yes. The spec leaves it to the SP to choose the style on a per-RPC basis; consistency is required only within each RPC.
```
The Snapshot Metadata service is an optional service that is used to retrieve metadata on the allocated blocks of a single snapshot, or the changed blocks between a pair of snapshots of the same block volume.
Retrieval of the data blocks of a snapshot is not addressed by this service and it is assumed that existing mechanisms to read snapshot data will suffice.
```
There is general language around concurrency of RPCs issued by CO to a plugin for some volume - some recommendation meant to ease the burden on plugins. I don't see the same wording applied for snapshots (and we should clarify that, in general). But my hunch is that the spirit of such calls would be the same, limiting concurrency w/ respect to calls that pertain to a particular snapshot (or snapshot group). Has thought been given to the concurrency of these new RPCs? i.e. is the intent to align w/ the current spirit of the spec w/ respect to concurrency, or is a different concurrency model needed/wanted?
I had not looked at that in detail previously.
Looking at the Concurrency and Error Scheme sections it appears that returning an ABORTED error code would suffice if the SP decides to limit concurrent operations. The existing recommendation that the caller retry with an exponential back off is also sufficient.
jdef left a comment
thanks for being patient. completed another round of review
```
// The plugin will determine an appropriate value if 0, and is
// always free to send less than the requested value.
// This field is OPTIONAL.
uint32 max_results = 3;
```
another "max" field uses int32 instead, another int64. maybe be consistent and not specify unsigned?
Okay. I'll use the int32 data type used in ListVolumesRequest and ListSnapshotsRequest. The odd one out is NodeGetInfoResponse which uses int64.
```
map<string, string> secrets = 4 [(csi_secret) = true];
}

// GetMetadataAllocatedResponse messages are returned in gRPC stream.
```
grammar: ".. are returned in streaming fashion." or something
```
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;

// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is non-zero, then the number of entries in this list
```
is greater than zero (if we drop the unsigned)
```
// MUST be less than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than than the value of starting_offset in the
```
this is inconsistent w/ the downward rounding option granted to plugins as previously described - please reconcile
for example, if the CO requests a starting offset of 600 and the plugin realigns/rounds to boundary starting at offset 512, then the first block metadata response could be reported w/ an initial offset of 512 - which is not strictly increasing from the CO-specified starting offset.
or am I missing something?
No, I think you're correct here - this is not quite right. Rewording to:
```
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataAllocatedRequest
// message.
```
```
map<string, string> secrets = 5 [(csi_secret) = true];
}

// GetMetadataDeltaResponse messages are returned in gRPC stream.
```
```
// This value must be the same in all such messages returned by
// the stream.
// This field is REQUIRED.
uint64 volume_capacity_bytes = 2;

// This is a list of data range tuples.
// If the value of max_results in the GetMetadataDeltaRequest message
// is non-zero, then the number of entries in this list MUST be less
// than or equal to that value.
// The byte_offset field of each message MUST be strictly increasing.
// The SP MUST NOT return a BlockMetadata message with a byte_offset
// less than than the value of starting_offset in the
```
same conflict w/ earlier, optional rounding-down by plugin
Changed to:
```
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the GetMetadataDeltaRequest
// message.
```
@jdef This was subsequently modified in response to feedback from @bswartz. Please see this comment below: #551 (comment)
```
##### GetMetadataDelta Errors
If the plugin is unable to complete the `GetMetadataDelta` call successfully it must return a non-OK gRPC code in the gRPC status.

The following conditions are well defined:
```
similar concerns w/ snapshots in not-ready states: does NOT_FOUND work or some other code needed?
@jdef This comment was in an outdated part of the document so I'm not sure if it was addressed:
This was brought to our attention by @nixpanic, because the Ceph CSI plugin supports multiple sets of secrets (per Pool, I believe).

Thanks @jdef. I tried to address all the items you surfaced. PTAL.
```
// The plugins may round down this offset to the nearest alignment
// boundary based on the BlockMetadataType used.
```
Sorry to be pedantic about this but the explanation isn't clear enough that someone could write a test to verify this behavior. Is the SP only allowed to round down if the type is FIXED_LENGTH? And is it only allowed to round by the exactly the size of one block? Would an SP that rounds down by more than one block (because internally it operates on groups of blocks) be in violation of the spec? If an SP uses VARIABLE_LENGTH blocks, how much can it round down by?
My worry is that if we leave it up to SPs and COs to figure out who should discard the "repeated" metadata blocks that might result from an interrupted transmission, they might make incompatible assumptions such that in cases of actual interruption, one or more changed blocks would get missed, but only on certain SP+CO combinations, and the bug is unlikely to get caught because it's a rare thing for these connections to get interrupted unless you specifically test for it. This bug would result in very subtle data corruption, which wouldn't even have a chance of breaking anything until a corrupted backup is restored, and even then the corruption might not get noticed, but it would be there.
I want to be crystal clear about what the SP is allowed to do with this field and what assumptions the CO is allowed to make, so we can write automated tests that ensure there is no chance of a misunderstanding by a SP or a CO implementer resulting in data corruption. I realize that the earlier wording of this section (which I had recommended) no longer made sense after the additional changes we agreed to, but we still need more clarity here.
```
// The data range of the first BlockMetadata message in the list MUST
// include the starting_offset requested in the
// GetMetadataAllocatedRequest message.
```
Oh wait, this is the language I was looking for in my earlier comment. I think this may be clear enough because it's precise. Do we need to add any clarification in the case that this message is streamed and there are multiple responses for a single request? In that case the restriction mentioned only applies to the first response. The second and later responses must follow after the last block of the previous response (which is hopefully obvious).
Whew!
Okay. I'll clarify that it is the first tuple in the stream that must have this property. Will now read as:
```
// The data range of the first BlockMetadata message in the stream
// of GetMetadataAllocatedResponse messages returned MUST include the
// starting_offset requested in the GetMetadataAllocatedRequest
// message.
```
The language is likewise modified in the description of the GetMetadataDeltaResponse message.
Oops! @bswartz please see the follow up comment below: #551 (comment)
Thanks for pointing out the case about the block containing starting_offset not containing any changes -- I had missed this. I'm convinced we can be more precise -- I'm going to take a swing at recommending some new language which is impossible to misunderstand.
After responding to @bswartz's previous comment, it occurred to me that requiring the

The language is likewise modified in the description of the GetMetadataDeltaResponse message.

I don't know if this negatively impacts @bswartz's request for precise language that can be tested, but it is more accurate.
@bswartz wrote:

I think this is a great idea. I'm adding the following section between the description of the metadata format and the RPC definitions: (Updated after @jdef's feedback and formatted for display above.)
jdef left a comment
playing catch-up here, trying to review individual commits. additional word-smithing may be needed.
```
it MAY attempt to **continue** the operation by re-sending its request message but with a
starting_offset adjusted to the NEXT byte position beyond that of any metadata already received.
The SP MUST always examine the starting_offset and ensure that metadata sent in
the gRPC stream logically includes the starting_offset but not any byte position prior
```
"but not any byte position prior" seems to contradict the gRPC method docs, which seem to indicate that starting_offset can live in the window defined by (offset, offset+size). or am I reading this wrong?
Yes, you're correct! I think I got stuck in the previous thread on continuation. Changing to:
The SP MUST always ensure that the `starting_offset` requested be considered in the computation of the data range for the first message in the returned gRPC stream, though the data range of the first message is not required to actually include the `starting_offset` if there is no applicable data between the `starting_offset` and the start of the data range returned by the first message.
This took quite a few rewrites to be unambiguous and yet semi-readable; please feel free to offer alternatives.
```
// The SP MUST NOT return any data ranges for which the byte_offset
// plus the size_bytes is less than starting_offset.
// The SP MAY return data ranges with byte_offset less than
// starting_offset if starting_offset is less than byte_offset plus
```
"... byte_offset less than starting_offset if starting_offset is less than byte_offset ..."
sorry, i'm still confused by this phrasing. what am i missing?
This is language I recommended, and what I was trying to achieve is to constrain the SP's behavior in case of restarting an abnormally terminated stream. Basically I want to be very precise about what starting_offset means, so that SPs don't take liberties and start somewhere other than where the CO expects. Unfortunately, the only way to do that is to describe what mathematical relationships must hold between the values in the request and response.
Given a requested starting_offset S, and the first block_metadata B0 of the first SP response in a stream, the SP MUST ensure that B0.byte_offset <= S < B0.byte_offset + B0.size_bytes.
.. is this what you're getting at? if so, can we represent the mathematical relationship using language like this instead of the two sentences that attempt to model this in English? I see where I went wrong reading it the first time, and I probably would have not had the same trouble w/ the same in equation form.
Yeah I'm perfectly fine using language like that. I had worried that introducing new variables like S and B0 might be extra confusing, but this seems like a case where it will be a little confusing either way, and I do like the clarity of mathematical symbols to express what is really a mathematical relationship.
@carlbraganza do you think you can reword using the language that @jdef recommends?
I replaced the 2 sentences above with the following:
```
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
```
PrasadG193 left a comment
Change occurrences of "0 based" -> "zero based" to maintain consistency as suggested by jdef
```
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then the following must always be true:
// B0.byte_offset > S || S < B0.byte_offset + B0.size_bytes.
```
```
// S < B0.byte_offset or-else S < B0.byte_offset + B0.size_bytes.
```
^ this seems more readable? (for me, anyway); otherwise 👍
No problem. How about if I change it to:
```
...
in the response stream, then either (S < B0.byte_offset) must be
true or else (S < B0.byte_offset + B0.size_bytes) must be true.
```
I think this accurately replaces `||` in my original sentence.
bswartz left a comment
I found some more typos, and rather than just recommend the minimal fix, I am recommending a significant simplification of the language, because using mathematical expressions allows us to be more precise with less verbiage.
```
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that start before but not overlap
// the requested starting_offset: i.e. if S is the requested
// starting_offset, and B0 is block_metadata[0] of the first message
// in the response stream, then either (S < B0.byte_offset) must be
// true or else (S < B0.byte_offset + B0.size_bytes) must be true.
```
Now that we have precise math expressions, we can make the English verbiage a bit more terse. Note that I'm not nitpicking here, the previous language had typos that needed fixing anyways, so I'm going to recommend a wholesale simplification of this block. Also, the two inequalities expressed were redundant, because the second implies the first as long as size_bytes is non-negative (and we actually require it to be positive).
Suggested change, from:
| // The SP MUST ensure that the returned response stream does not
| // contain BlockMetadata tuples that start before but not overlap
| // the requested starting_offset: i.e. if S is the requested
| // starting_offset, and B0 is block_metadata[0] of the first message
| // in the response stream, then either (S < B0.byte_offset) must be
| // true or else (S < B0.byte_offset + B0.size_bytes) must be true.
to:
| // The SP MUST ensure that the returned response stream does not
| // contain BlockMetadata tuples that end before the requested
| // starting_offset: i.e. if S is the requested starting_offset, and B0 is
| // block_metadata[0] of the first message in the response stream,
| // then S < B0.byte_offset + B0.size_bytes must be true.
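The redundancy claim is easy to sanity-check mechanically. A small Python sketch (the helper name and the value ranges are illustrative, not from the spec):

```python
def disjunction_reduces(S, b, s):
    """True when "(S < b) or (S < b + s)" has the same truth value as
    "(S < b + s)" alone, for starting_offset S, byte_offset b, and
    size_bytes s."""
    return ((S < b) or (S < b + s)) == (S < b + s)

# With s > 0, (S < b) implies (S < b + s), so the disjunction always
# reduces to its second term; spot-check a small grid of values.
assert all(
    disjunction_reduces(S, b, s)
    for S in range(-4, 8)
    for b in range(0, 6)
    for s in range(1, 4)
)
```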
Thanks, Ben. That is simpler. Will now read as:
// The SP MUST ensure that the returned response stream does not
// contain BlockMetadata tuples that end before the requested
// starting_offset: i.e. if S is the requested starting_offset, and
// B0 is block_metadata[0] of the first message in the response
// stream, then (S < B0.byte_offset + B0.size_bytes) must be true.
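For concreteness, the final rule reduces to a single check on the first returned tuple. A minimal Python sketch (the function name and the (byte_offset, size_bytes) tuple representation are hypothetical, not part of the spec):

```python
def first_block_satisfies_start(starting_offset, block_metadata):
    """Check the spec constraint on the first returned tuple: the first
    block must not end at or before starting_offset, i.e.
    (S < B0.byte_offset + B0.size_bytes) must hold.
    block_metadata is a list of (byte_offset, size_bytes) tuples."""
    if not block_metadata:
        return True  # an empty stream trivially satisfies the constraint
    byte_offset, size_bytes = block_metadata[0]
    return starting_offset < byte_offset + size_bytes

# A block covering [0, 512) ends before starting_offset 1024: invalid.
assert not first_block_satisfies_start(1024, [(0, 512)])
# A block covering [512, 1536) overlaps starting_offset 1024: valid.
assert first_block_satisfies_start(1024, [(512, 1024)])
```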
| int64 byte_offset = 1;
|
| // This is the size of the data range.
| // size_bytes MUST be greater than 0.
Suggested change, from:
| // size_bytes MUST be greater than 0.
to:
| // size_bytes MUST be greater than zero.
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataAllocatedRequest
| // message is greater than zero, then the number of entries in this
| // list MUST be less than or equal to that value.
| // The byte_offset field of each message MUST be strictly increasing.
| // The computation of the data range in the first BlockMetadata
| // message in the stream of GetMetadataAllocatedResponse messages
| // returned MUST consider the starting_offset requested in the
| // GetMetadataAllocatedRequest message.
| // The SP MUST NOT return data ranges for which byte_offset plus
| // size_bytes is less than or equal to starting_offset.
| // Allocated data ranges which overlap the starting_offset
| // or for which byte_offset plus size_bytes is greater than
| // starting_offset MUST be returned by the SP.
| // BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
| // for any two BlockMetadata messages, A and B, if
| // A.byte_offset < B.byte_offset then
| // A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
Another simplification of the English since we can lean on mathematical expressions for precision. Also I think that we don't need to repeat the language about starting_offset here. A simple reference to the other section should be enough.
Suggested change, from:
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataAllocatedRequest
| // message is greater than zero, then the number of entries in this
| // list MUST be less than or equal to that value.
| // The byte_offset field of each message MUST be strictly increasing.
| // The computation of the data range in the first BlockMetadata
| // message in the stream of GetMetadataAllocatedResponse messages
| // returned MUST consider the starting_offset requested in the
| // GetMetadataAllocatedRequest message.
| // The SP MUST NOT return data ranges for which byte_offset plus
| // size_bytes is less than or equal to starting_offset.
| // Allocated data ranges which overlap the starting_offset
| // or for which byte_offset plus size_bytes is greater than
| // starting_offset MUST be returned by the SP.
| // BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
| // for any two BlockMetadata messages, A and B, if
| // A.byte_offset < B.byte_offset then
| // A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
to:
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataAllocatedRequest
| // message is greater than zero, then the number of entries in this
| // list MUST be less than or equal to that value.
| // The SP MUST respect the value of starting_offset in the request.
| // The byte_offset field of each BlockMetadata message MUST be
| // strictly increasing and BlockMetadata messages MUST NOT overlap:
| // i.e. for any two BlockMetadata messages, A and B, if A is returned
| // before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
| // be true.
Changed to read:
// This is a list of data range tuples.
// If the value of max_results in the GetMetadataAllocatedRequest
// message is greater than zero, then the number of entries in this
// list MUST be less than or equal to that value.
// The SP MUST respect the value of starting_offset in the request.
// The byte_offset fields of adjacent BlockMetadata messages
// MUST be strictly increasing and messages MUST NOT overlap:
// i.e. for any two BlockMetadata messages, A and B, if A is returned
// before B, then (A.byte_offset + A.size_bytes <= B.byte_offset)
// MUST be true.
// This MUST also be true if A and B are from block_metadata lists in
// different GetMetadataAllocatedResponse messages in the gRPC stream.
// This field is OPTIONAL.
I added an additional sentence to extend the notion of "adjacent" to include the entire gRPC stream.
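Taken together, these rules form an invariant a client could verify over the whole gRPC stream. A rough Python sketch (the function name and the (byte_offset, size_bytes) tuple representation are illustrative, not from the spec):

```python
def validate_block_stream(responses, max_results):
    """Validate the block_metadata lists across a stream of responses:
    - when max_results > 0, each list respects that cap;
    - size_bytes is positive for every tuple;
    - tuples are strictly increasing and non-overlapping, even across
      message boundaries in the stream.
    Each response is modeled as a list of (byte_offset, size_bytes)."""
    prev_end = None
    for block_metadata in responses:
        if max_results > 0 and len(block_metadata) > max_results:
            return False
        for byte_offset, size_bytes in block_metadata:
            if size_bytes <= 0:
                return False
            # Non-overlap: the previous tuple must end at or before this
            # one starts (A.byte_offset + A.size_bytes <= B.byte_offset);
            # with positive sizes this also forces strictly increasing
            # byte_offset values.
            if prev_end is not None and byte_offset < prev_end:
                return False
            prev_end = byte_offset + size_bytes
    return True

# Two messages whose blocks abut across the stream boundary: valid.
assert validate_block_stream([[(0, 512)], [(512, 512)]], max_results=1)
# Overlapping blocks in different messages of the stream: invalid.
assert not validate_block_stream([[(0, 1024)], [(512, 512)]], max_results=1)
```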
| // The SP MUST ensure that the returned response stream does not
| // contain BlockMetadata tuples that start before but not overlap
| // the requested starting_offset: i.e. if S is the requested
| // starting_offset, and B0 is block_metadata[0] of the first message
| // in the response stream, then either (S < B0.byte_offset) must be
| // true or else (S < B0.byte_offset + B0.size_bytes) must be true.
Suggested change, from:
| // The SP MUST ensure that the returned response stream does not
| // contain BlockMetadata tuples that start before but not overlap
| // the requested starting_offset: i.e. if S is the requested
| // starting_offset, and B0 is block_metadata[0] of the first message
| // in the response stream, then either (S < B0.byte_offset) must be
| // true or else (S < B0.byte_offset + B0.size_bytes) must be true.
to:
| // The SP MUST ensure that the returned response stream does not
| // contain BlockMetadata tuples that end before the requested
| // starting_offset: i.e. if S is the requested starting_offset, and B0 is
| // block_metadata[0] of the first message in the response stream,
| // then S < B0.byte_offset + B0.size_bytes must be true.
Handled like the previous case.
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataDeltaRequest message
| // is greater than zero, then the number of entries in this list MUST
| // be less than or equal to that value.
| // The byte_offset field of each message MUST be strictly increasing.
| // The computation of the data range in the first BlockMetadata
| // message in the stream of GetMetadataDeltaResponse messages
| // returned MUST consider the starting_offset requested in the
| // GetMetadataDeltaRequest message.
| // The SP MUST NOT return data ranges for which byte_offset plus
| // size_bytes is less than or equal to starting_offset.
| // Changed data ranges which overlap the starting_offset
| // or for which byte_offset plus size_bytes is greater than
| // starting_offset MUST be returned by the SP.
| // BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
| // for any two BlockMetadata messages, A and B, if
| // A.byte_offset < B.byte_offset then
| // A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
Suggested change, from:
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataDeltaRequest message
| // is greater than zero, then the number of entries in this list MUST
| // be less than or equal to that value.
| // The byte_offset field of each message MUST be strictly increasing.
| // The computation of the data range in the first BlockMetadata
| // message in the stream of GetMetadataDeltaResponse messages
| // returned MUST consider the starting_offset requested in the
| // GetMetadataDeltaRequest message.
| // The SP MUST NOT return data ranges for which byte_offset plus
| // size_bytes is less than or equal to starting_offset.
| // Changed data ranges which overlap the starting_offset
| // or for which byte_offset plus size_bytes is greater than
| // starting_offset MUST be returned by the SP.
| // BlockMetadata messages MUST NOT overlap for a given snapshot, i.e.
| // for any two BlockMetadata messages, A and B, if
| // A.byte_offset < B.byte_offset then
| // A.byte_offset + A.size_bytes <= B.byte_offset MUST be true.
to:
| // This is a list of data range tuples.
| // If the value of max_results in the GetMetadataDeltaRequest
| // message is greater than zero, then the number of entries in this
| // list MUST be less than or equal to that value.
| // The SP MUST respect the value of starting_offset in the request.
| // The byte_offset field of each BlockMetadata message MUST be
| // strictly increasing and BlockMetadata messages MUST NOT overlap:
| // i.e. for any two BlockMetadata messages, A and B, if A is returned
| // before B, then A.byte_offset + A.size_bytes <= B.byte_offset MUST
| // be true.
Handled like the previous case.
Skimmed through the changes yesterday; LGTM. The only remaining nit is the needless newline whitespace that pads the new protobuf blocks.
@carlbraganza Can you squash the commits?
@carlbraganza I can't find evidence of a signed CLA. Did that end up happening?
BTW, I can squash the commits too, but we should add some more detail to the squashed commit message.
f1eae06 to afc847a (force-pushed)
Rebased on master and squashed the commits.
As soon as we get a signed CLA from the author(s), we can merge this.
I can confirm we've received a CLA from Veeam, including Carl Braganza.
I looked at the build test failure; my guess is that I used the wrong version of protoc.
You probably need to rebase on HEAD and rebuild. We merged a patch that changed the Makefile significantly and also updated the protoc compiler version.
afc847a to aa0e11d (force-pushed)
* Add draft of CSI CBT KEP
  Signed-off-by: Ivan Sim <ivan.sim@dell.com>
* Update KEP status
  Signed-off-by: Ivan Sim <ivan.sim@dell.com>
* Initial structure. Filled in the Proposal, Caveats and Risks. Put in the CSI spec in the Details section.
* Removed distracting links to common K8s definitions. Clarified the proposal.
* More caveats. Better grammar.
* Use "snapshot access session".
* Addressed most of the feedback in the PR.
* Updated role figure.
* More refinements.
* Session figure. Renamed figure files.
* Fix background of session figure.
* Updated figures and roles.
* Propose a new role for session data.
* GRPC spec
* Don't propose roles.
* Add user stories in the proposal (#2)
  * Add user stories in the proposal
    Signed-off-by: Prasad Ghangal <prasad.ghangal@gmail.com>
  * Remove acceptance criteria for the user stories
  * Make changes suggested by Carl
  ---------
  Signed-off-by: Prasad Ghangal <prasad.ghangal@gmail.com>
* Added details to the manager, sidecar and SP service sections. Fixed session figure errors and rewrote the client gRPC description in the risks section.
* Called out UNRESOLVED issues. More on the SP service and sidecar.
* Resolved issues with expiry and advertising.
* Updated TOC
* Fixed typo and svg space rendering.
* Fixed typo in perms figure.
* Typo in session figure. More detail in user stories.
* Add SnapshotSession CRDs (#5)
  * Add SnapshotSession CRDs
  * Add CR descriptions
  * Address review comments
  * Address review comments
  * Remove typo
  * Remove unnecessary new line
* Added image of the flow when the TokenRequest and TokenReview APIs are used.
* Fixed figure spacing
* Updated permissions svg; removed session.
* Updated figures. Removed session figure.
* Added explanation of permissions.
* Updated overview and risks.
* Updated RPC and components.
* Completed remaining rewrite.
* Updated the CSI spec to reflect container-storage-interface/spec#551
* Removed the security_token and namespace from the gRPC spec. Pass the security token via the metadata authorization key. Pass the namespace as part of the K8s snapshot id string.
* Update sections on test plan, PRR and graduation criteria
  Signed-off-by: Ivan Sim <ihcsim@gmail.com>
* More neutral language on passing the auth token.
* Updated to reflect changes in the CSI spec PR.
* Use a separate gRPC API for the sidecar.
* Replaced authorization gRPC metadata with a security_token field in request messages.
* Fixed typo.
* Updated CSI spec; downplayed similarity between the K8s and CSI gRPC services.
* Add beta and GA graduation criteria
  Signed-off-by: Ivan Sim <ihcsim@gmail.com>
* Updated CSI spec again - no unsigned numbers used.
* Update KEP milestone to v1.30
  Signed-off-by: Ivan Sim <ihcsim@gmail.com>
* Update 'Scalability' section
  Signed-off-by: Ivan Sim <ihcsim@gmail.com>
* Add sig-auth as participating sigs
  Signed-off-by: Ivan Sim <ihcsim@gmail.com>
* Require that the CR be named for the driver.
* Removed the label requirement for the CR.
* Replaced johnbelamaric with soltysh for PRR approver.
* Bump up milestone to v1.31
* Change KEP status to implementable

---------
Signed-off-by: Ivan Sim <ivan.sim@dell.com>
Signed-off-by: Prasad Ghangal <prasad.ghangal@gmail.com>
Signed-off-by: Ivan Sim <ihcsim@gmail.com>
Co-authored-by: Carl Braganza <carl@kasten.io>
Co-authored-by: Prasad Ghangal <prasad.ghangal@gmail.com>
What type of PR is this?
This is a proposed addition to the CSI specification.
What this PR does / why we need it:
It adds a SnapshotMetadata service through which one can obtain allocated or changed block metadata for snapshots.
Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
Derived from kubernetes/enhancements#4082
Does this PR introduce an API-breaking change?: