Skip to content

Draft: supporting API minor versions (Do not submit)#14943

Closed
adisuissa wants to merge 10 commits intoenvoyproxy:mainfrom
adisuissa:deprecated_minor_version_v5
Closed

Draft: supporting API minor versions (Do not submit)#14943
adisuissa wants to merge 10 commits intoenvoyproxy:mainfrom
adisuissa:deprecated_minor_version_v5

Conversation

@adisuissa
Copy link
Copy Markdown
Contributor

Signed-off-by: Adi Suissa-Peleg adip@google.com

This is a draft PR based on the proposal in #8416, allowing to gather comments on whether we are heading in the right direction.
The plan is to gather some feedback, and introduce incremental PRs.

Roadmap:

  1. Introduce API version (X.Y.Z) & accessors - current draft (API_VERSION, source/common/version/api_version.h)
  2. Introduce proto’s deprecated fields/enum minor_version annotation - current draft + example (api/envoy/annotations/deprecation.proto)
  3. API feature life-cycle description - current draft (api/API_VERSIONING.md)
  4. Adding minor version range as context params - current draft (source/common/config/context_provider_impl.h)
  5. Adding DiscoveryResponseStatus - current draft (api/envoy/service/discovery/v3/discovery.proto)
  6. Test showing the minor version range request - current draft (ads_integration_test.cc)
  7. Adding patch version to a (Delta)DiscoveryRequest - current draft(api/envoy/service/discovery/v3/discovery.proto)
  8. Example of a test using minor version - current draft (ads_integration_test.cc)
  9. Example of a test using patch version - future work
  10. Introduce stats counters for failed resource queries due to versioning - future work
  11. Manifest file design doc - future work
  12. Manifest file implementation - future work
  13. Tools supporting minor version updates - future work
  14. Tools supporting patch version updates - future work
  15. Tools for Manifest file updates - future work

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #14943 was opened by adisuissa.

see: more, trace.

@adisuissa
Copy link
Copy Markdown
Contributor Author

@envoyproxy/api-shepherds PTAL

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@htuch htuch self-assigned this Feb 4, 2021
@htuch
Copy link
Copy Markdown
Member

htuch commented Feb 4, 2021

I'll take a first pass.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this is great to see coming to fruition. I've left some high level design comments. @markdroth probably should also take a pass (in particular around transport-level changes) as a next round.
/wait

@@ -0,0 +1 @@
3.7.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI at the beginning of Q2 I think we will be bumping to minor version 3.1.0. We will tag all deprecated fields as deprecated at 3.0.0 prior to that. Then in 2022 Q2, we will bump to 3.2.0 and remove support in Envoy for everything deprecated 3.0.0. This gives 2 years of support for minor version 3.0.0 and any field that is removed in 2022 will have had 1-2 years life as deprecated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the first minor version will be 3.1.0, and all deprecated fields will have the deprecated_at annotation set to 3.0.
At the moment it is set to 3.7.0 to "comply" with the api/envoy/config/core/v3/deprecaetd_example.proto example contents.

The minor version enables API stability and support, and to safely deprecate old
features and reduce the tech debt. A client can support a range of minor API
versions, and can request xDS resources that match this range, using the resource
[context parameters](https://github.com/cncf/udpa/blob/master/xds/core/v3/context_params.proto).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also Node will indicate for simple string names. I'd generally lead in with the old way of doing it and then point out the new way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added user_agent_latest_api_version and user_agent_oldest_api_version to Node.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Suggest also updating the docs here to cover both Node and xdstp://

// from a resource name to its status. A resource that doesn't appear in the map
// implies the resource response is valid. No status map implies that all the
// resources in the response are valid.
map<string, DiscoveryResponseStatus> resources_status = 7;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed IRL, we should probably move this to a separate PR for this feature in general. @markdroth would this help get rid of the "15 second rule"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think something like this is a good idea. But I do think it should probably be a separate PR.

Do we actually want this to be a separate map? An alternative would be to embed the DiscoveryResponseStatus inside of the Resource message in the normal resources field. That way, clients just need to look at one field for every resource, and some of those resources might be a special type that indicates that the server didn't have the requested resource.

Also, what is the expected semantic here when a subscribed resource does not exist and multiple responses are sent during the time it does not exist? For LDS and CDS, is the server obligated to send every non-existent resource in every response, just like it normally does? For other resource types, can the server send the response status just once, and the client assumes that the status doesn't change until/unless it receives a subsequent response that either populates the resource or changes its response status? (The semantic here might be a bit more obvious if the DiscoveryResponseStatus was inside of the resources field instead of being a separate map.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes more sense to add the DiscoveryResponseStatus to each specific resource (in Resource).
I'm probably missing something here, but IIUC the resources field in a SotW response is of type google.protobuf.Any, and the response is not required to return the resources in a wrapped Resource. If this implies that the Resource wrapper is required how do we plan to the gradual introduction of this to existing xDS services?

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…rsion_v5

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…rsion_v5

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Copy Markdown
Contributor Author

A description of the tools needed for minor/patch versioning verifications and updates (without the Manifest tools):
https://docs.google.com/document/d/1cZjuqBp8lIY4vOzppauFpexvRahtkRKpFlpEG0uQRkI/edit?usp=sharing

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
htuch
htuch previously approved these changes Feb 14, 2021
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I've left another round of feedback. LMK when to take another pass. Thanks.
/wait

The minor version enables API stability and support, and to safely deprecate old
features and reduce the tech debt. A client can support a range of minor API
versions, and can request xDS resources that match this range, using the resource
[context parameters](https://github.com/cncf/udpa/blob/master/xds/core/v3/context_params.proto).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Suggest also updating the docs here to cover both Node and xdstp://

versions, and can request xDS resources that match this range, using the resource
[context parameters](https://github.com/cncf/udpa/blob/master/xds/core/v3/context_params.proto).
Deprecated features, from previous minor or major versions (and their code), can be safely removed
from the client. In general, within a major and minor API version, we do not allow any breaking
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within a major version (including all minor versions within that major version), we do not allow ..

The minor version will be reset to 0 upon a release of a new major version.

The patch version tracks incremental changes to the API, such as, adding a new
package, adding a new field, deprecating a field, etc.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest making this a list and enumerating explicitly all things that affect path version, since this is basically the specification doc for versioning.

In general, within a package's major API version, we do not allow any breaking changes. The guiding
principle is that neither the wire format nor protobuf compiler generated language bindings should
experience a backward compatible break on a change. Specifically:
In general, within a major and minor API version, we do not allow any breaking changes the protobuf
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within a major version (including all minor versions within that major version), we do not allow (seems a repeat of above)

## Minor version lifecycle

A new API minor version is expected to be released once a year, at the Q2 release. This
allows for gradual introduction of new capabilities and features that will be supported
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allows for introduction of new capabilities and features that will be supported by Envoy and gradual deprecation of older feature support in Envoy.

(7 to 8), and features deprecated in minor version 6 will be removed.
When the API version becomes v3.9.0, `field1` will be removed from the API.

Assume the [xDS API shepherds](https://github.com/orgs/envoyproxy/teams/api-shepherds) decide that
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, delete this paragraph, I agree it is confusing to keep when reading the context around it.

When the minor version is cut (starting with minor version `Y+1`), the protoxform tool will first
verify that all deprecated fields have a `minor_version` annotations and remove any field or enum
value with minor version `Y-1`. Note that the API should not contain features of
version smaller than or equal to `Y-1` after the a new minor version is introduced.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No API field removal :) These will be human changes to the Envoy where we delete stuff that is not trully unsupported at minor version clock ticks.

// the minor and patch versions are updated upon incremental changes to the API.
message ApiVersionNumber {
// SemVer version of the API.
type.v3.SemanticVersion version = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if you anticipate any other fields in this message?


// [#not-implemented-hide:]
// An optional API patch version of the requested resource that is supported by
// the client. This is useful for use-cases where the client and server support new
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd phrase this as Latest xDS API patch version supported by the client.

envoy::config::core::v3::ApiVersionNumber ApiVersionInfo::makeApiVersion(const char* version) {
envoy::config::core::v3::ApiVersionNumber result;
// Split API_VERSION_NUMBER into version
std::regex ver_regex("([\\d]+)\\.([\\d]+)\\.([\\d]+)");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe just absl::StrSplit on .

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Copy link
Copy Markdown
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to review this; I've been pretty busy lately.

It might be a good idea for us to sync up in person to decide the best way to proceed with this. I'd still prefer to consider handling this via flexible range-based context parameter matching if possible.

Please let me know if you have any questions. Thanks!

@@ -0,0 +1,39 @@
syntax = "proto3";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file has a typo in it.

// from a resource name to its status. A resource that doesn't appear in the map
// implies the resource response is valid. No status map implies that all the
// resources in the response are valid.
map<string, DiscoveryResponseStatus> resources_status = 7;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think something like this is a good idea. But I do think it should probably be a separate PR.

Do we actually want this to be a separate map? An alternative would be to embed the DiscoveryResponseStatus inside of the Resource message in the normal resources field. That way, clients just need to look at one field for every resource, and some of those resources might be a special type that indicates that the server didn't have the requested resource.

Also, what is the expected semantic here when a subscribed resource does not exist and multiple responses are sent during the time it does not exist? For LDS and CDS, is the server obligated to send every non-existent resource in every response, just like it normally does? For other resource types, can the server send the response status just once, and the client assumes that the status doesn't change until/unless it receives a subsequent response that either populates the resource or changes its response status? (The semantic here might be a bit more obvious if the DiscoveryResponseStatus was inside of the resources field instead of being a separate map.)

The minor version enables API stability and support, and to safely deprecate old
features and reduce the tech debt. A client can support a range of minor API
versions, and can request xDS resources that match this range, using the resource
[context parameters](https://github.com/cncf/udpa/blob/master/xds/core/v3/context_params.proto).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still like to consider adding a flexible range-based matching semantic for minor versions. Can we do that as part of this effort, or do we want to decouple that from this?

Hmm... Actually, now that we've decided to stick with string representations for new-style resource names, doesn't that mean that we don't actually have a way to add more flexible matching syntax later if we don't do it now? When we were going to be using ResourceLocator on the wire, it would have been easy to add new matching types, because they'd be expressed in different fields. But with the string form, it's basically just "key=value&key=value&...". So if we're interpreting the values literally, then any new special syntax we add there may break existing users.

It seems like we need to either add the more flexible matching syntax now or carve out some sort of escape hatch in the URI syntax to allow for future changes. (Actually, the latter would probably be a good idea even if we do the former, since we may want to add other flexible matcher types in the future.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence on this one. I would like to future proof with some notion of escape hatch in the URI syntax, that's a no brainer. I think I'd like to understand the full set of flexible match objectives before deciding on a concrete syntax and what do use here instead of min/max version. How do you think is best to go forward on that @markdroth?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that range-based matching semantics parameters should be supported, but I'm not certain I understand many of the use-cases that it will be used for.
I think that eventually we can replace this with a parameter similar to this: xds.api.range.minor_version={"start":5,"end":6}.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the concern here that we don't want magic params that have prior agree to semantics? Meaning, if we have an oldest value and a newest value it seems not hard to just do the range matching manually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the issue of exact-matching is that the returned resource must have the same parameters.
Range-parameters enable explicit semantics. For example, a cache (or xds-relay) will be able to return the same resource with minor version 5 for a request with the range [4,5] and for a different request with the range [5,6].

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the issue is that the current exact-match semantic means that the context params in the result must exactly match the ones in the request. So if you have one client requesting xds.api.range.minor_version=[0, 1] and another requesting xds.api.range.minor_version=[1, 2], you can provide the contents at version 1 to both of them, but you need to return each one with a different context parameter. This will basically lead to cache pollution, where any caching proxy will need an increasing number of copies of each identical resource under different names, and the pollution will increase combinatorically.

What I am proposing is a range-based matching mechanism where the clients can ask for something like xds.api.range.minor_version=SPECIAL_RANGE_SYNTAX[0, 1] and xds.api.range.minor_version=SPECIAL_RANGE_SYNTAX[1, 2], and both can get back the same resource with context parameter xds.api.range.minor_version=1. In order to make that work, we need to define the syntax for range-based matching, make sure it doesn't conflict with any "flat" context parameter values (e.g., by defining some sort of escaping mechanism), and make sure that xDS servers, clients, and proxies understand it.

This is a bit more work, but I think it's also a cleaner solution. And I think this same mechanism will be able to be used for other use-cases in the future. (We already have one use-case internally for which we've been discussing something like this.)

@htuch, assuming that we have a reasonable way to future-proof the syntax in a way that would allow us to add more flexible matching types if we need to do so in the future, I think I'd advocate for restricting our initial changes to just the cases that we need immediately. For minor version, I think that's range-based matching. And if we want to handle the patch version the same way, we might also want a simple wildcard match for that. I think anything else can wait for the future.

I would be happy to write up a proposal for how this would work, but I need something to build off of. Where did we leave things on plans for the initial xRFC for the new naming scheme? Are you still planning to do that at some point?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks this makes sense to me. I agree we should do the work to implement range syntax. It doesn't seem too terribly difficult.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a proposal here for a syntax which is future proof and can initially accommodate simple range would help unblock work here I think. Thanks Mark.

// API features, before the next API minor version is rolled out.
// If not included, the patch version defaults to 0, stating that the client
// supports API features up to the client's minor version.
uint32 patch_version = 7;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here instead of being in the Node message? As currently designed, it's by definition something that works only when directly talking to an authoritative server, because it's not part of the resource name, so caching proxies can't handle it.

Note that if we do add a mechanism for flexible range-based context param matching, then we can make this part of the resource name, so it would work basically the same way as the minor version. I think that would actually make this mechanism easier for people to understand.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not federated, it can live in the Node message agreed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something but why isn't it effectively part of the resource name? Meaning, if we assume the patch level for a specific message is a recursive walk of all sub-messages, don't we know it on a per message/resource basis and then it might be different and should be set here? I guess maybe I'm then also confused why we don't need major/minor here also? Or we assume that those are sent in context params?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's related to the semantics of exact-match and of an optional patch support.
If this is part of the context-params, caches should only return resources with the patch, and caches will need to cache the same resource with different patch versions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks, this makes sense. Both this convo and the range convo does make me wonder a bit if we are thinking about context params wrong and the server should actually be able to return a result with a subset of matching params? Does that make sense?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's do range and move things to patch, thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there was a special wildcard param value? (Just thinking out loud here). This would then allow intermediate caches to be quite intelligent about mappings. Taking the patch value this would allow the server to respond with patch_value=* or whatever.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with moving patch into Node. Actually Node already has a field ApiVersionNumber user_agent_latest_api_version where a client can add the patch version it supports.
Just wanted to note that at the moment there's no support for patch version in the xdstp naming scheme version negotiation, and that should be fine because patch version support is optional.

What if there was a special wildcard param value? (Just thinking out loud here). This would then allow intermediate caches to be quite intelligent about mappings. Taking the patch value this would allow the server to respond with patch_value=* or whatever.

IMO the patch field should actually be understood as maximal supported patch version, so if the server responds with patch_value=*, the intermediate servers will not be able to know if the resource is supported in max_patch_version=3 or in max_patch_version=6, and they'll need that to process future requests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, the intermediate servers will not be able to know if the resource is supported in max_patch_version=3 or in max_patch_version=6, and they'll need that to process future requests.

But isn't that effectively by design? The management server shouldn't send * unless it really means it. I think what I'm getting at is it would give a way for the management server to effectively ignore patch that is cache compatible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to introduce a notion of cache-ignored context params. This is analogous to the ability in CDNs to ignore a list of query params. We've discussed adding this a few times to the spec.. I think the point becomes moot if we have range params though, since instead of max supported version N we can just say [0..N] with the same semantics and caches basically ignore it.

// from a resource name to its status. A resource that doesn't appear in the map
// implies the resource response is valid. No status map implies that all the
// resources in the response are valid.
map<string, DiscoveryResponseStatus> resources_status = 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that unlike the SotW protocol, which currently doesn't have a way to explicitly indicate that a subscribed resource does not exist, the delta protocol does have such a mechanism (the removed_resources field). So in this case, the only value of this field is to support the version-mismatch case. And if we handle the version via flexible range-based context parameter matching, then we wouldn't need this field at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we handle the version via flexible range-based context parameter matching, then we wouldn't need this field at all.

How does this relate? The management server can still be missing a version in the specified range.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but if the version was just a context parameter, then it would effectively be part of the name, so NO_MATCHING_RESOURCE and NO_MATCHING_VERSION_RESOURCE would be the same thing. And at that point, we would not need this field, because the existing removed_resources field would do what we need.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we've actually strictly lost information, but I agree it's a more consistent and logical way of viewing things. I think @mattklein123 was interested in a specific "no matching version" feedback signal, but maybe this would be sufficient.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the requirement here is that we have a clear way of providing status to the client about why a resource was not returned. The DiscoveryResponseStatus message seems to do that nicely. I don't have a strong preference on how we do it exactly but I agree with @htuch that we should not lose information on why something happened and should probably have it look similar for both SoTW and delta, whatever the best way of accomplishing that is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with whatever solution you all agree to on this as long as we maintain the data fidelity.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but if the version was just a context parameter, then it would effectively be part of the name, so NO_MATCHING_RESOURCE and NO_MATCHING_VERSION_RESOURCE would be the same thing. And at that point, we would not need this field, because the existing removed_resources field would do what we need.

Maybe I got it wrong, but I thought of the response status as a complimentary field to the removed resources, providing the reason why were they removed. It essentially allows the following differentiation: a requested resource name (no matter what version) doesn't exist in the management server; and a requested resource in a specific version range doesn't exist in the management server (the server only handles older vs newer versions).
At the moment, the client doesn't use this information (other than logging) but it does allow the flexibility to request the resource from other servers, or to notify the operator that there's an issue that needs to be addressed (the client needs to be updated).

Also, could we consider using google.rpc.Status instead of a custom DiscoveryResponseStatus enum? That seems a bit more flexible in the long run.

I agree it should be used, and stating that the code field in it will be one of the DiscoveryResponseStatusTypes that we define.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, but I tend to agree with Mark that resources_status could be a superset of removed_resources and we can just build it in a way that would allow differentiating the two cases?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the client doesn't use this information (other than logging) but it does allow the flexibility to request the resource from other servers, or to notify the operator that there's an issue that needs to be addressed (the client needs to be updated).

I suspect that there are no cases where the client will need to programmatically differentiate between these two cases. Knowing the difference is useful for humans but not really helpful for client implementations. Most clients will simply treat the resource as non-existing in both cases; I could imagine a client deciding to try another management server instead, but it seems unlikely that a client would choose to do so in one of these two cases but not the other.

That's why I favor simply deprecating the existing removed_resources field and replacing it with map<string, google.rpc.Status> resources_not_available, where the map value would be an error message indicating why the specified resource was not available. This approach would allow us to handle any arbitrary reason why the resource is not available, in case we run into cases other than the two we are currently considering.

I agree it should be used, and stating that the code field in it will be one of the DiscoveryResponseStatusTypes that we define.

I think the code in that proto needs to be one of the standard status code enum values. It would be very surprising to users for the code to be a new error space.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the primary requirement is debuggability; as long as we can include the version failure in the google.rpc.Status string it's all good.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting to see this happening. A few high level comments/questions.

The minor version enables API stability and support, and to safely deprecate old
features and reduce the tech debt. A client can support a range of minor API
versions, and can request xDS resources that match this range, using the resource
[context parameters](https://github.com/cncf/udpa/blob/master/xds/core/v3/context_params.proto).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the concern here that we don't want magic params that have prior agree to semantics? Meaning, if we have an oldest value and a newest value it seems not hard to just do the range matching manually.

// API features, before the next API minor version is rolled out.
// If not included, the patch version defaults to 0, stating that the client
// supports API features up to the client's minor version.
uint32 patch_version = 7;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something but why isn't it effectively part of the resource name? Meaning, if we assume the patch level for a specific message is a recursive walk of all sub-messages, don't we know it on a per message/resource basis and then it might be different and should be set here? I guess maybe I'm then also confused why we don't need major/minor here also? Or we assume that those are sent in context params?

// from a resource name to its status. A resource that doesn't appear in the map
// implies the resource response is valid. No status map implies that all the
// resources in the response are valid.
map<string, DiscoveryResponseStatus> resources_status = 8;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the requirement here is that we have a clear way of providing status to the client about why a resource was not returned. The DiscoveryResponseStatus message seems to do that nicely. I don't have a strong preference on how we do it exactly but I agree with @htuch that we should not lose information on why something happened and should probably have it look similar for both SoTW and delta, whatever the best way of accomplishing that is.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2021

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 4, 2021
@github-actions
Copy link
Copy Markdown

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Apr 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api stale stalebot believes this issue/PR has not been touched recently waiting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants