remove gogo nullable extension from repeated fields#7669
remove gogo nullable extension from repeated fields#7669htuch merged 5 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
|
@junr03 FYI. |
|
Simple example in Go: import (
"github.com/gogo/protobuf/proto"
xdsapi "github.com/envoyproxy/go-control-plane/envoy/api/v2"
)
func main() {
temp1 := &xdsapi.Listener{Name: "l1"}
temp2 := &xdsapi.Listener{ListenerFilters: []listener.ListenerFilter{{Name: "foo"}}}
proto.Merge(temp1, temp2)
}this code will segfault |
|
cc @kyessenov .. |
|
Will this require more code changes in istio / especially in lock step with proxy? |
|
Though it looks like this functionality was not being used until istio filter patch implementation. |
|
@mandarjog it will require fixing usage points in istio but should have zero changes to proxy as we are simply fixing the generated go code. |
|
This is a breaking change in all the clients of go-control-plane. Are you sure you want to use reflection-based cloning? One solution is to specialize the functions like it was done in protoc-gen-validate depending on gogoproto annotations. |
|
Btw, proto3 semantics doesn't say much about presence of repeated fields. It's a valid goproto option to have a long slice of nils for repeated fields, and is also a cause of bugs. |
|
Another solution would be to merge gogo/protobuf#569 into gogoproto. Seems like a small fix that got lost in the noise. |
|
This is not just for cloning. Its also for merging. Also the current generated code is very inconsistent. http filters is a slice of pointers, where the http_filters field is nullable. network filters are non nullable. |
|
Yes, it is inconsistent. It was started to enforce correct-by-construction discipline, where a field was required to be set due to validation constraints. For repeated fields, there was some corner case with a nil in the slice that would break, but I don't remember the details. |
|
@kyessenov do you mind to review this? Thanks! |
|
I dont think this is a "breaking change" as the go-control-plane is simply a library. So, until people update their go-control-plane imports, they wont be affected. And when they do update their imports, their code wont compile - giving them ample opportunities to fix the reference points. |
|
/lgtm I'm fine changing repeated fields to pointers. Slice of structs are read-only in golang, so I can imagine the merge would never work unless you change data representation. |
|
fyi @lita @adilhafeez and others at Lyft working on control plane. |
|
@kyessenov @rshriram |
|
gogo.nonnullable is a way to:
Please me mindful of the projects that consume go-control-plane. We don't need to cause unnecessary work there unless there is a good reason, as these annotations do end up causing compilation issues. |
|
I dont want to remove all usages of nullable as it would cause a lot of code fixes for all consumers. The current one is sufficient to eliminate the gogo issue around lists. As for using the nullable as an indirect way to enforce always setting certain fields, I think it will lead to more confusion than help. One can always call envoyproto.Validate() to make sure the generated proto is valid. Also, FYI (from gogo proto: https://github.com/gogo/protobuf/blob/master/extensions.md). |
|
@rshriram Okay, I just want to understand if there is any strong reason for keeping the "gogo.nonnullable" or is this PR a temporary workaround until gogo/protobuf#569 is merged. Do you expect we will need to add the "gogo.nonnullable" back once it's fixed in gogoproto side or in the long term we don't need the "gogo.nonnullable" for new API? |
I would say we dont need the nullable for newer fields introduced in the API, or if we do use it, then we have ot use it with care especially for repeated fields. Its very useful to have an array of pointers and not an array of values. |
|
Hey @rshriram and @kyessenov, can I get more details on how this will break for services using go-control-plane? What is the upgrade path once this is merged and integrated into go-control-plane's library? |
|
@lita it shouldn't break until you update your go-cp version in go mod or dep or whatever you happen to use. You will certainly have compiler errors due to essentially moving from array of values to array of pointers. |
|
Yes, but we need to keep our go-cp up to date, otherwise, we will be out of sync with Envoy OSS. So any breaking library updates that require code changes are big deal, as it prevents us from upgrading Envoy. If the community believes this is the right approach, then I am good with the changes, as we currently only use RDS and starting to migrate to LDS. So the code changes are somewhat minimal from Lyft's perspective. But of course, I would prefer to be able to make this fix without code changes if possible. Thanks! |
|
We can revert the changes to route.proto if need be - then there won’t be any issues at Lyft. But I personally think that it’s easier to maintain array of pointers than array of values. sharing vhosts across listeners, etc becomes easier. |
|
ping |
|
Yeah, that sounds like a reasonable plan. Let's roll this out now and revert if the changes are way too much. |
htuch
left a comment
There was a problem hiding this comment.
My mind is a bit blown by this thread. Is it possible to summarize why we actually have these nullable fields at all in the APIs? We are working on the next generation xDS APIs right now, and I'm very wary of having language-specific annotations scattered and then updated in potentially breaking ways on-the-fly.
I can understand why we would have a PGV annotation for non-nullability, expressing the semantic nullability information. This would then surface in all languages. Why do we have gogoproto here?
I'm sorry if this is super obvious to Go folks, but my #1 concern is API stability and consistency, so please explain like I'm a 4 year old :)
|
So the whole gogo thing is geared towards generating usable proto (go) code over the one that the default proto generates. That said the nullable thing was a bit of overkill imo. The idea behind the nullable extension (which comes with a lot of warnings) is that instead of having a pointer to an object (array obj or a msg obj), you have a value object (array of values in case of filters, vhosts, listener filters, or value object like the address field in a listener). The advantage as explained to me by the go purists is that when you instantiate a listener proto object in go, it will have an empty value for the address field instead of a nil pointer in the field. If the programmer forgets to properly set fields in the address object, the listener with an empty address object (not nil) ends up in envoy and envoy rejects it. And according to these guys, the advantage in the case of arrays is that you won’t have array members with nil as the value because the nullable annotation forces the generated code to have arrays of values instead of arrays of pointers. As I mentioned earlier, this was first present in listeners. Then when people copy pasted stuff for newer proto additions, this annotation got carried over. From my side, I don’t by the nullable argument because at the code level, you should be able to say whether a (non primitive) field is set or not. Secondly at code level, this has all the usual disadvantages of dealing with values instead of pointers when you pass things around in code. The worse piece is that if you ever want to mutate these arrays (insert or delete) you end up copying objects fully during the process as opposed to copying pointers. The code gets clunkier as well in several ways, as it prevents you from sharing the same object across several listeners. IMO, we should not have added these nullable annotations in the first place. Instead of removing it from all places, I restricted it to just the things that are most painful. When people update their go control plane dependency, they would see compilation errors due to the array signature change. It certainly won’t have an impact at runtime because the final proto on the wire is still the same. |
|
Thanks @rshriram, that's a great explanation. @dfawley can you comment on your perspective as a Go client implementor and how you think the above changes (and the gogo nullability stuff) interacts with gRPC Go? I'm pretty much against having language specific tweaks like this in UDPA (this is the next gen xDS API). If there are semantic properties, e.g. non-nullability, that make sense at the API level, they should be expressed in PGV. There then needs to be a machinery to translate this into whatever Go stubs can understand. The cognitive load is too high and the challenge around breaking changes becomes much greater with language specific annotations. For v2 xDS, we probably need to formulate a sensible strategy, since this will be around for at least 2 more years. |
Huge +1 from me. IMO we need to get rid of all Go specific stuff from these files including the gogo annotations as well as the separate bazel definitions for the go libraries. I think we need someone to invest in figuring this out who is passionate about Go. |
|
There are few concerns that were not mentioned yet:
|
grpc-go will not be affected, as we do not use gogo proto. I also agree that as much as possible, language-specific settings -- and especially custom-codegen-specific settings -- should not be present in the API definition. (Obviously some things cannot be avoided, like the java package name.)
I agree. For UDPA, and even for the v2 xDS APIs if possible, it would be ideal if the envoy/UDPA repo would own the generated code, at least for Go. In Go, there is a global proto message registration system that will cause panics if multiple definitions of the same proto message exist. This means everyone needs to use a single canonical source for the generated code. This would also provide an opportunity to visualize how proto file changes affect the generated output and determine if API breakages would be caused. For grpc-go, we go through some amount of extra effort to strip the gogo proto annotations and to coerce bazel into working. See grpc/grpc-go#2775 and grpc/grpc-go#2896 if you are interested in understanding how grpc-go does its codegen of the xds APIs. cc @zhangkun83 |
|
we have a code freeze in 10 days and this PR will help fix a nil pointer issue . In the interest of time, given that there is consensus to move away from esoteric go bindings like the nullable thing, can we go ahead with this PR? I have kept the impact radius to a minimum so that Lyft’s internal control plane implementation is also not impacted too much. |
|
Ping @htuch ? |
|
I suggest let's just remove all of these annotations in one PR (rip the bandaid) if we're going to accept the fact that we are making Go-language specific binding breaking changes. It aligns directionally with where we want to go. Also, let's have @kyessenov while this seems less optimal for Go implementations, do you think it would be acceptable to help us move in the direction we want to go with the API? Are there any situations that ^^ would not work? |
|
@htuch This is going to cause a significant churn in the downstream control planes like istio. Some may cause performance changes (using protobuf time/duration instead of golang time/duration, references instead of embedded structs) which are not obvious. Unless there is a strong need, I'd leave them as they are, but put automation to prevent adding more language-specific annotations. |
FWIW I think we should rip the bandaid and delete them all. If other projects like go-control-plane want to carry patches, etc. for this type of stuff I think that would be an option. Will defer to @htuch though. |
|
15a19b9 is critical to preserve if we decide to rip annotations. Without it, we get CPU spikes due to flip-flopping bytes in the serialized output. |
Agreed we can't rip this right now, however I'm wondering if this can be achieved in other ways, e.g. setting them in protoc options (which is passed to protoc-gen-gogo), or a setting in global level. @htuch @mattklein123 I believe we all agreed on removing gogo annotation, though removing them once seems causing too much churn at this moment, (esp. for the reason ^^). |
|
it’s not as invasive or pervasive as you think @htuch. There are only two gogo annotations that are present in all proto files (except for the marshalling order one): gogo stdduration and gogo nullable. Stdduration is innocuous. It actually does the right thing that go protobuf should have done (representing time in the language native way). The second is the nullable one. This is the thing that needs to go away. |
|
@rshriram is it an option to fix gogo protobuf lib to deal with the nullability issue? I'm hesitant to recommend what seems to be arbitrary removal of some of the annotations for a specific project consuming Envoy without a clear idea of how this fits into how we're going to deal with these annotations. Can someone from Istio open an issue to track how gogo protobuf annotations are going to be handled going forward in Envoy and describe how this PR fits in there? We can accommodate some expedient changes like this for the benefit of a major consumer of Envoy like Istio, in particular as it's directionally aligned with where we want to go with the API, but we should try and have a clear plan and owners for following up, so that we don't end up with a bunch of other arbitrary add/remove PRs for these annotations in the future. |
…lable Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
…lable Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
htuch
left a comment
There was a problem hiding this comment.
Approved at my end, @kyessenov is this going to be acceptable for you folks?
|
I don't have any more concerns that what was discussed. In the context of istio, @rshriram is a lead of networking there, so if this is OK with him, it's OK with me. |
|
@kyessenov ack, merged. |
Description:
Due to a seg fault issue with the gogo protobuf library
[https://github.com/gogo/protobuf/issues/568], non nullable repeated
fields in a proto will cause proto.Merge(dst, src) to panic.
The nullable field setting was first added by @kyessenov when he was
re-organizing the protos. Unfortunately, people have been copy pasting it
across several areas in the Envoy proto. To keep the impact radius to a minimum,
I have updated only the fields that are currently causing the segfault
(in go-control-plane) for us.
Its also partly against proto principles. You should be able to determine if
a field is set or not. This non-nullable setting in gogo will insist on initializing
the field to default values.
Signed-off-by: Shriram Rajagopalan rshriram@tetrate.io
Risk Level: to go control plane users
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
[Optional Fixes #Issue]
[Optional Deprecated:]