Skip to content

remove gogo nullable extension from repeated fields#7669

Merged
htuch merged 5 commits intoenvoyproxy:masterfrom
rshriram:nullable
Jul 30, 2019
Merged

remove gogo nullable extension from repeated fields#7669
htuch merged 5 commits intoenvoyproxy:masterfrom
rshriram:nullable

Conversation

@rshriram
Copy link
Copy Markdown
Member

Description:

Due to a seg fault issue with the gogo protobuf library
[https://github.com/gogo/protobuf/issues/568], non nullable repeated
fields in a proto will cause proto.Merge(dst, src) to panic.

The nullable field setting was first added by @kyessenov when he was
re-organizing the protos. Unfortunately, people have been copy pasting it
across several areas in the Envoy proto. To keep the impact radius to a minimum,
I have updated only the fields that are currently causing the segfault
(in go-control-plane) for us.

Its also partly against proto principles. You should be able to determine if
a field is set or not. This non-nullable setting in gogo will insist on initializing
the field to default values.

Signed-off-by: Shriram Rajagopalan rshriram@tetrate.io

Risk Level: to go control plane users
Testing: N/A
Docs Changes: N/A
Release Notes: N/A
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
@rshriram
Copy link
Copy Markdown
Member Author

@junr03 FYI.

@rshriram
Copy link
Copy Markdown
Member Author

Simple example in Go:

import (
"github.com/gogo/protobuf/proto"
xdsapi "github.com/envoyproxy/go-control-plane/envoy/api/v2"
)
func main() {
 temp1 := &xdsapi.Listener{Name: "l1"}
 temp2 := &xdsapi.Listener{ListenerFilters: []listener.ListenerFilter{{Name: "foo"}}}
 proto.Merge(temp1, temp2)
}

this code will segfault

@rshriram
Copy link
Copy Markdown
Member Author

cc @kyessenov ..

@mandarjog
Copy link
Copy Markdown
Contributor

Will this require more code changes in istio / especially in lock step with proxy?

@mandarjog
Copy link
Copy Markdown
Contributor

Though it looks like this functionality was not being used until istio filter patch implementation.

@rshriram
Copy link
Copy Markdown
Member Author

@mandarjog it will require fixing usage points in istio but should have zero changes to proxy as we are simply fixing the generated go code.

@kyessenov
Copy link
Copy Markdown
Contributor

This is a breaking change in all the clients of go-control-plane. Are you sure you want to use reflection-based cloning? One solution is to specialize the functions like it was done in protoc-gen-validate depending on gogoproto annotations.

@kyessenov
Copy link
Copy Markdown
Contributor

Btw, proto3 semantics doesn't say much about presence of repeated fields. It's a valid goproto option to have a long slice of nils for repeated fields, and is also a cause of bugs.

@kyessenov
Copy link
Copy Markdown
Contributor

Another solution would be to merge gogo/protobuf#569 into gogoproto. Seems like a small fix that got lost in the noise.

@rshriram
Copy link
Copy Markdown
Member Author

This is not just for cloning. Its also for merging. Also the current generated code is very inconsistent. http filters is a slice of pointers, where the http_filters field is nullable. network filters are non nullable.

@kyessenov
Copy link
Copy Markdown
Contributor

Yes, it is inconsistent. It was started to enforce correct-by-construction discipline, where a field was required to be set due to validation constraints. For repeated fields, there was some corner case with a nil in the slice that would break, but I don't remember the details.

@dio
Copy link
Copy Markdown
Member

dio commented Jul 21, 2019

@kyessenov do you mind to review this? Thanks!

@rshriram
Copy link
Copy Markdown
Member Author

I dont think this is a "breaking change" as the go-control-plane is simply a library. So, until people update their go-control-plane imports, they wont be affected. And when they do update their imports, their code wont compile - giving them ample opportunities to fix the reference points.

@kyessenov
Copy link
Copy Markdown
Contributor

/lgtm

I'm fine changing repeated fields to pointers. Slice of structs are read-only in golang, so I can imagine the merge would never work unless you change data representation.

@junr03
Copy link
Copy Markdown
Member

junr03 commented Jul 22, 2019

fyi @lita @adilhafeez and others at Lyft working on control plane.

@yangminzhu
Copy link
Copy Markdown
Contributor

@kyessenov
Would you mind to provide some more background about why do we need "gogoproto.nullable" in the first place?

@rshriram
I feel It's better to make the API consistent about the "gogoproto.nullable" option, so if this is not needed in the API, should we remove all of them in the API? https://github.com/envoyproxy/envoy/search?q=%22gogoproto.nullable%22&unscoped_q=%22gogoproto.nullable%22

@kyessenov
Copy link
Copy Markdown
Contributor

gogo.nonnullable is a way to:

  1. enforce required fields, an embedded struct is always set;
  2. relieve some pressure on GC; we had way too many pointers from protos and we have experienced chocking in GC not being able to keep up. A struct with embedded fields acts like a memory arena.

Please me mindful of the projects that consume go-control-plane. We don't need to cause unnecessary work there unless there is a good reason, as these annotations do end up causing compilation issues.

@rshriram
Copy link
Copy Markdown
Member Author

rshriram commented Jul 22, 2019

I dont want to remove all usages of nullable as it would cause a lot of code fixes for all consumers. The current one is sufficient to eliminate the gogo issue around lists.

As for using the nullable as an indirect way to enforce always setting certain fields, I think it will lead to more confusion than help. One can always call envoyproto.Validate() to make sure the generated proto is valid.

Also, FYI (from gogo proto: https://github.com/gogo/protobuf/blob/master/extensions.md).

Warning about nullable: according to the Protocol Buffer specification, 
you should be able to tell whether a field is set or unset. 
With the option nullable=false this feature is lost, since your 
non-nullable fields will always be set.

@yangminzhu
Copy link
Copy Markdown
Contributor

@rshriram Okay, I just want to understand if there is any strong reason for keeping the "gogo.nonnullable" or is this PR a temporary workaround until gogo/protobuf#569 is merged.

Do you expect we will need to add the "gogo.nonnullable" back once it's fixed in gogoproto side or in the long term we don't need the "gogo.nonnullable" for new API?

@rshriram
Copy link
Copy Markdown
Member Author

Do you expect we will need to add the "gogo.nonnullable" back once it's fixed in gogoproto side or in the long term we don't need the "gogo.nonnullable" for new API?

I would say we dont need the nullable for newer fields introduced in the API, or if we do use it, then we have ot use it with care especially for repeated fields. Its very useful to have an array of pointers and not an array of values.

@lita
Copy link
Copy Markdown

lita commented Jul 23, 2019

Hey @rshriram and @kyessenov, can I get more details on how this will break for services using go-control-plane? What is the upgrade path once this is merged and integrated into go-control-plane's library?

@rshriram
Copy link
Copy Markdown
Member Author

@lita it shouldn't break until you update your go-cp version in go mod or dep or whatever you happen to use. You will certainly have compiler errors due to
FilterChains []listener.FilterChain --> FilterChains []*listener.FilterChain
ListenerFilters []listener.ListenerFilter --> ListenerFilters []*listener.ListenerFilter
Filters []listener.Filter --> Filters []*listener.Filter
VirtualHosts []route.VirtualHost --> VirtualHosts []*route.VirtualHosts

essentially moving from array of values to array of pointers.

@lita
Copy link
Copy Markdown

lita commented Jul 23, 2019

Yes, but we need to keep our go-cp up to date, otherwise, we will be out of sync with Envoy OSS.

So any breaking library updates that require code changes are big deal, as it prevents us from upgrading Envoy.

If the community believes this is the right approach, then I am good with the changes, as we currently only use RDS and starting to migrate to LDS. So the code changes are somewhat minimal from Lyft's perspective. But of course, I would prefer to be able to make this fix without code changes if possible.

Thanks!

@rshriram
Copy link
Copy Markdown
Member Author

We can revert the changes to route.proto if need be - then there won’t be any issues at Lyft. But I personally think that it’s easier to maintain array of pointers than array of values. sharing vhosts across listeners, etc becomes easier.

@rshriram
Copy link
Copy Markdown
Member Author

ping

@lita
Copy link
Copy Markdown

lita commented Jul 24, 2019

Yeah, that sounds like a reasonable plan. Let's roll this out now and revert if the changes are way too much.

lizan
lizan previously approved these changes Jul 24, 2019
Copy link
Copy Markdown
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer merging to @htuch, IIUC this break go generated code compatibility, right?

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mind is a bit blown by this thread. Is it possible to summarize why we actually have these nullable fields at all in the APIs? We are working on the next generation xDS APIs right now, and I'm very wary of having language-specific annotations scattered and then updated in potentially breaking ways on-the-fly.

I can understand why we would have a PGV annotation for non-nullability, expressing the semantic nullability information. This would then surface in all languages. Why do we have gogoproto here?

I'm sorry if this is super obvious to Go folks, but my #1 concern is API stability and consistency, so please explain like I'm a 4 year old :)

@rshriram
Copy link
Copy Markdown
Member Author

So the whole gogo thing is geared towards generating usable proto (go) code over the one that the default proto generates. That said the nullable thing was a bit of overkill imo.

The idea behind the nullable extension (which comes with a lot of warnings) is that instead of having a pointer to an object (array obj or a msg obj), you have a value object (array of values in case of filters, vhosts, listener filters, or value object like the address field in a listener).

The advantage as explained to me by the go purists is that when you instantiate a listener proto object in go, it will have an empty value for the address field instead of a nil pointer in the field. If the programmer forgets to properly set fields in the address object, the listener with an empty address object (not nil) ends up in envoy and envoy rejects it.

And according to these guys, the advantage in the case of arrays is that you won’t have array members with nil as the value because the nullable annotation forces the generated code to have arrays of values instead of arrays of pointers.

As I mentioned earlier, this was first present in listeners. Then when people copy pasted stuff for newer proto additions, this annotation got carried over.

From my side, I don’t by the nullable argument because at the code level, you should be able to say whether a (non primitive) field is set or not.

Secondly at code level, this has all the usual disadvantages of dealing with values instead of pointers when you pass things around in code. The worse piece is that if you ever want to mutate these arrays (insert or delete) you end up copying objects fully during the process as opposed to copying pointers. The code gets clunkier as well in several ways, as it prevents you from sharing the same object across several listeners.

IMO, we should not have added these nullable annotations in the first place. Instead of removing it from all places, I restricted it to just the things that are most painful. When people update their go control plane dependency, they would see compilation errors due to the array signature change. It certainly won’t have an impact at runtime because the final proto on the wire is still the same.

@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 25, 2019

Thanks @rshriram, that's a great explanation. @dfawley can you comment on your perspective as a Go client implementor and how you think the above changes (and the gogo nullability stuff) interacts with gRPC Go?

I'm pretty much against having language specific tweaks like this in UDPA (this is the next gen xDS API). If there are semantic properties, e.g. non-nullability, that make sense at the API level, they should be expressed in PGV. There then needs to be a machinery to translate this into whatever Go stubs can understand.

The cognitive load is too high and the challenge around breaking changes becomes much greater with language specific annotations. For v2 xDS, we probably need to formulate a sensible strategy, since this will be around for at least 2 more years.

@mattklein123
Copy link
Copy Markdown
Member

The cognitive load is too high and the challenge around breaking changes becomes much greater with language specific annotations. For v2 xDS, we probably need to formulate a sensible strategy, since this will be around for at least 2 more years.

Huge +1 from me. IMO we need to get rid of all Go specific stuff from these files including the gogo annotations as well as the separate bazel definitions for the go libraries. I think we need someone to invest in figuring this out who is passionate about Go.

@kyessenov
Copy link
Copy Markdown
Contributor

There are few concerns that were not mentioned yet:

  1. gogoproto was primarily chosen for its better performance. Things may have changed as protoc golang support is being re-implemented.
  2. It is a standard practice in golang community to commit generated files. Thus, it is not strictly necessarily to have bazel files for go. The big difference is that go (OSS) uses one-package per proto directory rather than one-package per proto (Google practice). As such, it is possible to introduce a proto that has a symbol conflict in the same directory but not same proto.
  3. It should be possible to remove language-specific proto annotations, and let the language tooling patch the proto. The wire semantics is always preserved.

@dfawley
Copy link
Copy Markdown
Member

dfawley commented Jul 25, 2019

@dfawley can you comment on your perspective as a Go client implementor and how you think the above changes (and the gogo nullability stuff) interacts with gRPC Go?

grpc-go will not be affected, as we do not use gogo proto. I also agree that as much as possible, language-specific settings -- and especially custom-codegen-specific settings -- should not be present in the API definition. (Obviously some things cannot be avoided, like the java package name.)

The cognitive load is too high and the challenge around breaking changes
becomes much greater with language specific annotations.

I agree. For UDPA, and even for the v2 xDS APIs if possible, it would be ideal if the envoy/UDPA repo would own the generated code, at least for Go. In Go, there is a global proto message registration system that will cause panics if multiple definitions of the same proto message exist. This means everyone needs to use a single canonical source for the generated code. This would also provide an opportunity to visualize how proto file changes affect the generated output and determine if API breakages would be caused. For grpc-go, we go through some amount of extra effort to strip the gogo proto annotations and to coerce bazel into working. See grpc/grpc-go#2775 and grpc/grpc-go#2896 if you are interested in understanding how grpc-go does its codegen of the xds APIs.

cc @zhangkun83

@rshriram
Copy link
Copy Markdown
Member Author

we have a code freeze in 10 days and this PR will help fix a nil pointer issue . In the interest of time, given that there is consensus to move away from esoteric go bindings like the nullable thing, can we go ahead with this PR? I have kept the impact radius to a minimum so that Lyft’s internal control plane implementation is also not impacted too much.

@rshriram
Copy link
Copy Markdown
Member Author

Ping @htuch ?

@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 26, 2019

I suggest let's just remove all of these annotations in one PR (rip the bandaid) if we're going to accept the fact that we are making Go-language specific binding breaking changes. It aligns directionally with where we want to go. Also, let's have check_format block adding these back in.

@kyessenov while this seems less optimal for Go implementations, do you think it would be acceptable to help us move in the direction we want to go with the API? Are there any situations that ^^ would not work?

@kyessenov
Copy link
Copy Markdown
Contributor

@htuch This is going to cause a significant churn in the downstream control planes like istio. Some may cause performance changes (using protobuf time/duration instead of golang time/duration, references instead of embedded structs) which are not obvious. Unless there is a strong need, I'd leave them as they are, but put automation to prevent adding more language-specific annotations.

@mattklein123
Copy link
Copy Markdown
Member

mattklein123 commented Jul 26, 2019

@htuch This is going to cause a significant churn in the downstream control planes like istio. Some may cause performance changes (using protobuf time/duration instead of golang time/duration, references instead of embedded structs) which are not obvious. Unless there is a strong need, I'd leave them as they are, but put automation to prevent adding more language-specific annotations.

FWIW I think we should rip the bandaid and delete them all. If other projects like go-control-plane want to carry patches, etc. for this type of stuff I think that would be an option. Will defer to @htuch though.

@kyessenov
Copy link
Copy Markdown
Contributor

15a19b9 is critical to preserve if we decide to rip annotations. Without it, we get CPU spikes due to flip-flopping bytes in the serialized output.

@lizan
Copy link
Copy Markdown
Member

lizan commented Jul 26, 2019

15a19b9 is critical to preserve if we decide to rip annotations. Without it, we get CPU spikes due to flip-flopping bytes in the serialized output.

Agreed we can't rip this right now, however I'm wondering if this can be achieved in other ways, e.g. setting them in protoc options (which is passed to protoc-gen-gogo), or a setting in global level.

@htuch @mattklein123 I believe we all agreed on removing gogo annotation, though removing them once seems causing too much churn at this moment, (esp. for the reason ^^).

@rshriram
Copy link
Copy Markdown
Member Author

rshriram commented Jul 26, 2019

it’s not as invasive or pervasive as you think @htuch. There are only two gogo annotations that are present in all proto files (except for the marshalling order one): gogo stdduration and gogo nullable. Stdduration is innocuous. It actually does the right thing that go protobuf should have done (representing time in the language native way). The second is the nullable one. This is the thing that needs to go away.

@rshriram rshriram closed this Jul 26, 2019
@rshriram rshriram reopened this Jul 26, 2019
@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 29, 2019

@rshriram is it an option to fix gogo protobuf lib to deal with the nullability issue? I'm hesitant to recommend what seems to be arbitrary removal of some of the annotations for a specific project consuming Envoy without a clear idea of how this fits into how we're going to deal with these annotations. Can someone from Istio open an issue to track how gogo protobuf annotations are going to be handled going forward in Envoy and describe how this PR fits in there?

We can accommodate some expedient changes like this for the benefit of a major consumer of Envoy like Istio, in particular as it's directionally aligned with where we want to go with the API, but we should try and have a clear plan and owners for following up, so that we don't end up with a bunch of other arbitrary add/remove PRs for these annotations in the future.

rshriram added 2 commits July 29, 2019 09:00
…lable

Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #7669 was synchronize by rshriram.

see: more, trace.

@rshriram
Copy link
Copy Markdown
Member Author

allright. @htuch I have done the unspeakable and nuked all occurrences of gogo.nullable=false. Hopefully, no body else adds this annotation in future. @lita take note as well. This is going to be a one time annoyance to fix all the compilation errors.

rshriram added 2 commits July 29, 2019 20:15
Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
…lable

Signed-off-by: Shriram Rajagopalan <rshriram@tetrate.io>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #7669 was synchronize by rshriram.

see: more, trace.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved at my end, @kyessenov is this going to be acceptable for you folks?

@kyessenov
Copy link
Copy Markdown
Contributor

I don't have any more concerns that what was discussed. In the context of istio, @rshriram is a lead of networking there, so if this is OK with him, it's OK with me.

@htuch htuch merged commit b22d2b5 into envoyproxy:master Jul 30, 2019
@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 30, 2019

@kyessenov ack, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.