Skip to content

Polishing for messages, reporting, metrics#6946

Merged
costinm merged 30 commits intoistio:release-1.0from
costinm:10-version
Jul 14, 2018
Merged

Polishing for messages, reporting, metrics#6946
costinm merged 30 commits intoistio:release-1.0from
costinm:10-version

Conversation

@costinm
Copy link
Copy Markdown
Contributor

@costinm costinm commented Jul 9, 2018

  • added more info about the service, to report and possibly reject latest changes if they have conflicts
  • improved error messages
  • adding more metrics
    (I'll keep adding)

ResourceVersion string `json:"resourceVersion,omitempty"`

// CreationTimestamp records the creation time
CreationTimestamp meta_v1.Time `json:"resourceVersion,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json name is wrong ... looks like a copy & paste error from the ResourceVersion field. We should probably fix the original as well.

Resolution Resolution

// Name is a unique immutable identifier in a namespace
Name string `json:"name,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that Name refers to the name of the config ServiceEntry for this service? If that's the case, since a single ServiceEntry supports multiple hosts, would there potentially be multiple model.Services with the same Name?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use ConfigMeta for this stuff?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigMeta is in a different package - I cut&pasted from there. It's not the host name - but the name of the K8S resource that created the object, so we can display meaningful messages ("your config foo in namespace bar is bad, please delete")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot. Because we are not processing service updates in order. We also have no guarantees on order of addition or removal. If you are doing this based on Karl’s suggestion it won’t work unless you rewrite all of networking logic to handle the service array as a versioned array. If you try to pre validate inside k8s adapter, it might work but for that you don't need to add k8s stuff into the service model.

@nmittler
Copy link
Copy Markdown
Contributor

nmittler commented Jul 9, 2018

@costinm would it make sense to just add a ConfigMeta to model.Service, rather than adding individual fields?

Resolution Resolution

// Name is a unique immutable identifier in a namespace
Name string `json:"name,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use ConfigMeta for this stuff?

@ZackButcher
Copy link
Copy Markdown
Contributor

Hah, guess I should read the comments before I go and leave my own :)

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Jul 9, 2018

We can’t detect conflicts caused by new entries because we don’t have a strict log of services present currently.

Copy link
Copy Markdown
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the purpose for adding name/namespace into the platform agnostic model.Service. We have strived since day 1 to keep that struct independent of the platform (hence the choice of hostname over name+namespace). The moment you introduce these things, all networking code starts skewing towards k8s, with hacks/specializations everywhere. Not having this info has helped us keep that code mostly platform independent, not to mention allowing other platforms to add istio support

// not been stored and assigned a revision.
ResourceVersion string `json:"resourceVersion,omitempty"`

// CreationTimestamp records the creation time
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had explicitly stayed away from adding namespace, name etc to the service object right from day 1, to keep the main code platform independent. What exactly is the need to add this info ?

Namespace: svc.Namespace,
Name: svc.Name,
CreationTimestamp: svc.CreationTimestamp,
ResourceVersion: svc.ResourceVersion,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the purpose ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debugging. Any platform we support can have a CreationTimestamp.

I can rename Namespace/Name to 'ConfigLocation' or similar, and remove ResourceVersion if you feel it's too k8s specific. And I agree, the intent was not that.

What we really need is:

  • a way to tell user where the config that breaks pilot is coming from
  • have a timestamp - file or most other registries support that, and if they don't we can set to 0.

The timestamp in turn can provide an indication of ordering - so in case of conflict we can drop newest configs. Or just provide the info for the user.

Happy to rename - the intent is certainly not to add k8s-specific stuff.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. In that case drop name and namespace as hostname already has this in fqdn form. All you need is creationtimestamp. And then in model.Services() Call, sort the array by creation time. That should be sufficient to ignore conflicting new services. We can still display useful info (name and namespace) as the fqdn has it. Service entries are global (no namespace). So in all cases, hostname and time stamp should suffice.

@rshriram
Copy link
Copy Markdown
Member

/hold

@istio-testing istio-testing added the do-not-merge/hold Block automatic merging of a PR. label Jul 10, 2018
@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Jul 10, 2018

BTW - I'll add more stuff to this, the intent is to have more visibility and better metrics around what is wrong, and possibly add some better way to select a config in case of conflicts.

@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Jul 10, 2018

Also: model.Config has all this info, and we are also hoping to make model.Config platform agnostic.
I think it is reasonable to expect any platform to have at least an ID for each config, and a timestamp.

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 10, 2018

Codecov Report

Merging #6946 into release-1.0 will decrease coverage by 1%.
The diff coverage is 43%.

Impacted file tree graph

@@             Coverage Diff              @@
##           release-1.0   #6946    +/-   ##
============================================
- Coverage           72%     72%   -<1%     
============================================
  Files              355     356     +1     
  Lines            30374   30186   -188     
============================================
- Hits             21769   21481   -288     
- Misses            7694    7790    +96     
- Partials           911     915     +4
Impacted Files Coverage Δ
pilot/pkg/networking/plugin/health/health.go 53% <ø> (ø) ⬆️
...ilot/pkg/networking/plugin/authn/authentication.go 74% <ø> (ø) ⬆️
pilot/pkg/networking/core/v1alpha3/tls.go 18% <ø> (ø) ⬆️
pilot/pkg/model/context.go 63% <ø> (ø) ⬆️
pilot/pkg/networking/core/v1alpha3/gateway.go 0% <0%> (ø) ⬆️
...t/pkg/networking/plugin/envoyfilter/envoyfilter.go 20% <0%> (ø) ⬆️
pilot/pkg/serviceregistry/kube/client.go 8% <0%> (-1%) ⬇️
pilot/pkg/proxy/envoy/discovery.go 4% <0%> (ø) ⬇️
pilot/pkg/config/kube/ingress/conversion.go 24% <0%> (ø) ⬆️
pilot/pkg/networking/plugin/authz/rbac.go 79% <0%> (ø) ⬆️
... and 39 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed7a5b5...a6e345a. Read the comment docs.

func (cs *PushStatus) AfterPush() {
cs.Mutex.Lock()
defer cs.Mutex.Unlock()
LastPushStatus = cs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no mutex for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the mutex - probably need to do the RLock, but one step at a time.

}
}
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to a function and call this from both Addresses and NotReadyAddresses ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I confirm it does anything useful... Still testing.

ID string `json:"name,omitempty"`

// Time records the resource time (modified time if available, creation otherwise), if available.
Time time.Time `json:"mtime,omitempty"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort the services in the model.Services() function implementation

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort by time ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I think @nmittler beat you to it and has a PR

}
}

func (cs *PushStatus) AfterPush() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docs for all of these methods

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, first validating if that's the proper interface.

ID string `json:"name,omitempty"`

// Time records the resource time (modified time if available, creation otherwise), if available.
Time time.Time `json:"mtime,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be consistent with the json name. Either call them both modifiedTime or just time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreationTime is what we need for conflict resolution

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to CreationTime

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, technically modified time is the right one - at least for a filesystem based registry, and I suspect for a lot of other cases. We want to reject newest changes.

conflictingOutbound.Add(1)
pushStat := env.PushStatus
if pushStat != nil {
pushStat.Mutex.Lock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just make this a method and call it for both HTTP and TCP?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is different - and I'm still trying to simplify the interface, will do this after.

What I'm thinking is to have a method taking the metric we want to update - as well as additional info to
add to a more detailed list of errors ( including the actual config object, proxy, etc ). But step 1 is to replace the
bad gauge with one that is reset on each push, step 2 is to add metrics for all error cases - step 3 to
refactor or merge methods.


// PushStatusHandler dumps the last PushStatus
func (s *DiscoveryServer) PushStatusHandler(w http.ResponseWriter, req *http.Request) {
if model.LastPushStatus == nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to lock around accessing the push status?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object is going to be modified while the push is going, but once it gets copied to LastPushStatus it no loger
gets modified.

The pointer itself may change - and that may need a lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if c.Env != nil {
status := c.Env.PushStatus
if status != nil {
status.Mutex.Lock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the mutex private and add a method that does the locking.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the benefit ? I want to add a method that updates the map and does the locking - but not sure what are the right parameters and how to do it. In the end mutex will not be exposed.


}

type PushStatus struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not expose the mutex .... perhaps all of the fields should be private. Instead, add methods that do the appropriate locking.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it in a follow-up, right now priority is to verify it works and get the basics out.
There are few ways to make it more generic and avoid the mutex, but can wait for 1.1.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No let's do it now. Otherwise this will be forgotten about. It's an easy enough fix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a benefit at this point - will try to add an interface, but right now there are far bigger problems ( detecting when a push is over, the fact changes may happen independent of pushes, etc). Which may require even bigger changes to the interface. Once the basic infra works - and before we convert all warnings and errors to metrics - I'll have a better interface.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that's fine, but I still expect this cleanup to be part of this PR before submission.

"strconv"
"strings"

"time"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete newline above this (to merge the stdlib import sections)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// ID represents the location of the config used to create this Service. Optional.
// For k8s, it is namespace:name
ID string `json:"name,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you used ConfigMeta for all of this data then ConfigMeta.Key() would replace the need for this field.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's adding a k8s dependency, I agree with Shriram we don't want that.

@istio-testing
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

"Pods not found in the endpoint table, possibly invalid.",
)

// METRIC_PROXY_UNREADY represents proxies found not be ready.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment out of date.. I think lint will complain

METRIC_NO_INSTANCES = newPushMetric(
"pilot_eds_no_instances",
"Number of clusters without instances.",
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this meant to be a metric that changes often? Because services could be autoscaled. So, this metric could be misleading unless we indicate that this is a point in time snapshot

// Internally, the computation will be optimized to ensure that listeners are computed only
// once and shared across multiple invocations of this function.
BuildListeners(env model.Environment, node model.Proxy) ([]*v2.Listener, error)
BuildListeners(env *model.Environment, node *model.Proxy, push *model.PushStatus) ([]*v2.Listener, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this extra argument? env already has the push status

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded on slack - it is cleaner to pass it instead of using a global push status.
We want to report stats about a specific push - how many endpoints, timing, etc. If config changes
too often we may have overlapping pushes - which might be a problem but are trickier to avoid
in 1.0.

( push status will add a push version - and comparing the version in env with the one in the param
will allow us to know if a push can be abandoned - but in a separate PR)

Copy link
Copy Markdown
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biggest comment is to get rid of the unused extra arg added to all functions. That will cut down number of files changed.

Also, convert all ads.Infof(..) about xDS events to Debugf. Otherwise, we are going to see a ton of logs in Pilot.

listenerMapKey, node,
fmt.Sprintf("Port=%d Accepted=%s Rejected=%s Key=%s",
current.servicePort.Port,
current.service.Hostname,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no service variable in the listenerEntry. Multiple services share the same listener port. Thats the reason we don't store a service.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 498 - Nate added it. That's the first service added I suppose - if multiple tcp services share a tcp port the listenerEntry will not know all the services, but we can still report the first tcp service that got accepted and the http service that got rejected.

We could ( in a separate PR) collect the list of services for better reporting, and maybe the 'id' so users know what namespace/config to fix.

Port *model.Port

// Push holds stats and other information about the current push.
Push *model.PushStatus
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.. Its already in Environment struct

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the goal is to pass it explicitly - so we capture info about a specific push. The problem is that we can have overlapping pushes, and reconnects happening at the same time with a push, etc.

The one in Environment is needed short term - probably should add a comment that it's a workaround. That's why I added it as explicit param to the methods.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for remove. Please no more churn in the plugin interface.


s.modelMutex.RLock()
adsLog.Infof("XDS: Registry event, pushing. Services: %d, "+
"VirtualServices: %d, ConnectedEndpoints: %d", len(s.services), len(s.virtualServices), edsClientCount())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think it's still very useful - I greatly reduced the verbosity, but it helps to see the timing.
This one in particular is super useful - I can get rid of the individual EDS pushes for example.

con.Routes = routes
adsLog.Infof("ADS:RDS: REQ %s %s (%s) routes: %d", peerAddr, con.ConID, con.modelNode, len(con.Routes))
err := s.pushRoute(con)
adsLog.Infof("ADS:RDS: REQ %s %s routes: %d", peerAddr, con.ConID, len(con.Routes))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// CDS REQ is the first request an envoy makes. This shows up
// immediately after connect. It is followed by EDS REQ as
// soon as the CDS push is returned.
adsLog.Infof("ADS:CDS: REQ %s %v raw: %s ", con.ConID, peerAddr, discReq.String())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first request from an endpoint - happy to move the other REQ to debug, but this one I would keep.

}
if periodicRefreshDuration > 0 {
go periodicRefresh()
go out.periodicRefresh()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Periodicrefreshduration is set to 60s in the beginning of this file. I remember you removing that (and the deleted comment above also states the same). Did you intend to set the global periodicRefreshDuration to 0?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, set to 0.

defer ticker.Stop()
for range ticker.C {
PushAll()
adsLog.Infof("ADS: periodic push")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also debug. And need more text here. Periodic push of envoy configuration

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated text - log is needed for info, to differentiate the kinds.

versionMutex.Unlock()
// Push metrics are updated periodically (10s default)
func (s *DiscoveryServer) periodicRefreshMetrics() {
envOverride := os.Getenv("V2_METRICS")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't this be a CLI option? that would make things easy.
Also, PILOT_METRICS_PUSH_INTERVAL is better than V2_METRICS

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not yet clear what the final solution will be - or if this really needs user customization.
I like to avoid too many visible options, there is also an expectation of stability/long term support for flags. Env are more forgiving.

var err error

// This is a gross hack but Costin will insist on supporting everything from ancient Greece
if strings.Index(clusterName, "outbound") == 0 ||
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are in modern world, you can also get rid of this entire if strings.index(clusterName..)..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe separate PR ? I think more than enough for this one...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


adsLog.Infof("EDS: PUSH for %s %q clusters %d endpoints %d empty %d %v",
con.ConID, con.PeerAddr, len(con.Clusters), endpoints, emptyClusters, empty)
adsLog.Infof("EDS: PUSH for %s clusters %d endpoints %d empty %d",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely debug!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Jul 13, 2018

@rshriram the extra arg is important, the global push status was a way to get things working, but as we convert more errors to stats we'll need to use the param, which will be associated with a version.

The info -> debug is unrelated to this PR - they have been info, some could be moved to debug but can be done by a separate PR. As I mentioned in comments, some of them need to be info so we can correlate logs.

@kyessenov
Copy link
Copy Markdown
Contributor

Rebase or you will break the build.

@rshriram
Copy link
Copy Markdown
Member

why?

@kyessenov
Copy link
Copy Markdown
Contributor

circle is not adequate to detect incompatible merges. we use pilot plugin interface in other parts of the code base, so it is a contract, and should not be constantly changing.

@rshriram
Copy link
Copy Markdown
Member

Regarding the plugin, yes I agree. I think we should probably remove the push status thing from the plugin callbacks @costinm. Plugins get mutable.InputParams.. We can definitely pile more into that struct

@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Jul 13, 2018

The plugin interface may become a contract - but it is still WIP. Passing env/proxy by value was not ideal,
and we need the status object param, using a global is not ideal either.

@rshriram - we should do a quick round to make all methods take InputParam if we want to provide backward compat after 1.0

@costinm
Copy link
Copy Markdown
Contributor Author

costinm commented Jul 13, 2018

/retest

@costinm costinm merged commit d455f80 into istio:release-1.0 Jul 14, 2018
@costinm costinm deleted the 10-version branch July 14, 2018 00:18
@ymesika ymesika mentioned this pull request Jul 14, 2018
istio-testing pushed a commit that referenced this pull request Jul 23, 2018
* Remove api.* attributes from accesslog as they are yet to be implemented (#7019)

* mixer: setup initial liveness probe (#7065)

* Setup initial liveness probe for Mixer

* Prefer /version to /metrics

* remove unused legacy v1alpha1 routing apis (#7091)

* Updating Kiali to v0.5.0 in Helm Installer (#7007)

* Updating Kiali to v0.5.0

* Adding quotaspecs and quotaspecbindings in clusterole

* Upgrading Kiali Versrion on Requirements File

* better logging for attribute bag (#7031)

Signed-off-by: Kuat Yessenov <kuat@google.com>

* Istio-remote helm: allow setup pilot svc & endpoint for controlPlaneSecurity (#7094)

When controlPlaneSecurityEnabled the spiffe configured by sidecar injector
assumes that the pilot discoveryAddress is a hostname.namespace formatted
string.  This change makes the istio-remote helm chart setup a selectorless
service and endpoint from the pilotAddress when the remotePilotCreateSvcEndpoint
arg is true and sets the MeshConfig discoveryAddress to be the hostname
for istio-pilot.

* remove MeshPolicy/DefaultDestRule when mtls isn't enabled (#7095)

* Change job name to indicate release version (#7092)

* Polishing for messages, reporting, metrics (#6946)

* Make version, timestamp available in service for k8s

* More error and metrics cleanup

* Address Shriram's comments

* Format

* Fix verbose log, format, init

* More work - the push is async, and things are trickier than they look

* Rename mixer job, so upgrade works

* Add back iperf3 without sidecar

* Format

* Switch to *env, no copy by value

* Cleanup, resolve conflicts

* Also convert node to pointer, and add explicit param

* Polish, cleanups

* Revert changes moved to other PRs

* Logging too much

* Polish, less log

* Extra safety, sometimes a stale message is stuck until config change

* fmt and lint errors

* Addres review comments

* Add the ttls test

* Revert the review fix - it breaks tests (some send old style).
Will update post 1.1

* Fix for ingress conflict with tracing ingress (#7039)

* force to use proxyv2 for istio-telemetry pod (#7111)

* delete unix socket, before calling listen (#7110)

* typo: connctionDuration -> connectionDuration (#7134)

* Add RbacConfig CRD to Helm chart (#7125)

* Fix two data races (#7119)

* TCP routing cleanups/bug fixes: part 1 (#7101)

* TCP routing cleanups: part 1

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* missing sort

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* sort destination cidrs

* proxy sha update

* undo istio proxy sha

* fixups

* compile fixes

* nil check

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* nits

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Update proxy sha based on Piotrs changes

* undo Piotrs changes

* add reporterKind attribute to accesslog and tcpaccesslog (#7147)

* Improve outbound listener conflict messages (#7113)

This is needed to better support e2e testing of conflicts.

* Update accesslog config (#7158)

* Update hub and tag in values.yaml in use daily releases (#7116)

* add latest tag to values.yaml

* switch latest in istioctl

* configurable mixer port (#7156)

Signed-off-by: Kuat Yessenov <kuat@google.com>

* Adding unit test for TCP over TCP conflicts. (#7160)

* Adding unit test for TCP over TCP conflicts.

* addressing comments.

* addressing comments.

* addressing comments.

* Assorted small fixes for setup (#7123)

* Assorted small fixes

* spaces

* Fix certmanager (#7141)

* Adding e2e test for outbound listener conflicts (#7173)

* remove static route generation. improve gateway validation (#7175)

* remove static route generation

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* more validation for gateways

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* validation fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* fix test

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Updating Kiali to v0.5.0 in Ansible Installer (#7042)

* Multicluster make zipkin conditional (#7085)

This change makes the zipkin arguement in
the sidecar config map to be conditional
based on zipkin being used in remote clusters.

* Yet another place of the connctionDuration typo (#7177)

* Put back citadel templates and re-generate them when running updateVersion.sh (#7178)

* make bookinfo sample build repeatable (#7152)

* make bookinfo sample build repeatable

* update bookinfo image references to 1.8

* delete me later-- reference temporary images

* remove v1 registration from readiness probe and use debu/endpointz (#7189)

* bookinfo fixes (#7109)

* update bookinfo productpage service

* update bookinfo reviews service

* update routing rules

* fix me after-- update image references

* update e2e tests

* fix buildinfo build bug from #6988

* Cleaning up inject test files (#7188)

There are a lot of files and its confusing which files are used by
inject_test vs webhook_test. Splitting out the files into two
subdirectories under testdata/ to make it clear.  Making copies of
files that are shared.

* add TCP mixerfilter test (#7172)

Signed-off-by: Kuat Yessenov <kuat@google.com>

* Fix broken ingress (#7144)

* Fix ingress

* Move merging ingress into VirtualService and dedup to separate PR. Add back the previously and still broken gateway generation

* Another attempt to avoid duplicated vhosts and merge ingress

* Missed a pointer

* Remove again mergeIngress

* Improved messages. Will need to be turned into metrics soon

* Merge, assorted fixes

* Leave the push param in gateay, it will be needed later

* deps: update proxy (#7196)

* update sha

Signed-off-by: Kuat Yessenov <kuat@google.com>

* merge

Signed-off-by: Kuat Yessenov <kuat@google.com>

* fix tests

Signed-off-by: Kuat Yessenov <kuat@google.com>

* typo

Signed-off-by: Kuat Yessenov <kuat@google.com>

* rbac: add e2e tests for envoy based rbac. (#7193)

See #6377.

* Cleanup default metrics configuration (#7034)

* WIP: Cleanup default metrics configuration

* Remove last vestige of destination_namespace

* Update accesslog config

* Update syntax used to configure xDS servers. (#7201)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* Update go-control-plane to 8cef83f9. (#7202)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* bump bookinfo version (#7195)

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Add contents of install/ansible to release script (#7194)

Fixes: #4359, istio/old_issues_repo#223

* update for 1.0.0 path (#7088)

* update api sha (#7204)

* Delete namespace after tests when using helm (#7051)

* Delete namespace after tests when using helm

Signed-off-by: Neil Hickey <nhickey@pivotal.io>

* Do not return error if namespace deletion fails

If the helm chart being deleted has the namespace manifest then the HelmDelete would've already deleted the namespace

* Add an e2e test for RedisQuota Adapter  (#6720)

* Add an e2e test for RedisQuota Adapter

Fix RedisQuota Unit Test and lint error in kuberntes.go

Remove additional log added in redisquota adapter
delete the route rule after test finishes in  mixer_test.go

Fix vectorValue function and remove local port forwarding to redisquota server

Add an e2e test for RedisQuota Adapter

Fix RedisQuota Unit Test and lint error in kuberntes.go

Remove additional log added in redisquota adapter
delete the route rule after test finishes in  mixer_test.go

Add an e2e test for RedisQuota Adapter

Fix RedisQuota Unit Test and lint error in kuberntes.go

Remove additional log added in redisquota adapter
delete the route rule after test finishes in  mixer_test.go

Fix vectorValue function and remove local port forwarding to redisquota server

Add an e2e test for RedisQuota Adapter

Fix RedisQuota Unit Test and lint error in kuberntes.go

Remove additional log added in redisquota adapter
delete the route rule after test finishes in  mixer_test.go

Remove destination version label for quota test

Make calculations more tight for RedisQuota

Format test

Format test

* Fix tests based on  metrics change

* Scale istio-policy pod to 2 for redisquota test

* Fix spelling mistake in comment in kube_utils.go

* Fix lint error with mixer_test.go

* Fix lint error with mixer_test.go

* Fix path for networking and policy rules

* Move Kubernetes Ansible to proper location (#7207)

* Fixing a wrong Json key for the CreationTimestamp field (#7206)

* rename key for istio-remote chart repository->hub (#7209)

* Move "repository" to "hub" and hardcode image name (#7187)

* Move "repository" to "hub" and hardcode image name

The imagename hard code comes from the pattern laid down in certmanager.
I'm not super keen on this as it makes determining the image names for airgapped
installs a bit more difficult.  The good news is we can always add an "image"
field for each service if we want to expose that information in values.yaml
or just document it appropriately.

* Move hyperkube back to globals

* Fix a typo

* Fix error in change

* Sort configs by creation time before returning (#7216)

* Sort configs by creation time before returning

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* format

* Moved RBAC CRDs to group rbac.istio.io from config.istio.io (#6874)

* Moved RBAC CRDs to group rbac.istio.io from config.istio.io

* Review comments addressed

* Modify global RbacConfig group too

* Updated new files added in recent PRs

* Updated a recently added RbacConfig CRD too

* rbac.istio.io/v1alpha2 -> rbac.istio.io/v1alpha1

* Change protos version to v1alpha1

* Updated a test file that was recently added

* Add useful warnings when quota specs do not match (#7226)

* Add named logging scope for model

* Add warning and tests to QuotaSpecByDestination

1. Add warning messages under correct conditions to inform
  user that some common error conditions were encountered.
2. Add test coverage for QuotaSpecByDestination.
3. Add previously omitted support for simple wildcard match.

* review comments

* update envoy sha - fix memory leak (#7232)

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* More debug/troubleshooting (#7227)

* More debug/troubleshooting

* Build error

* Fix istio-remote sidecar-injector-configmap rendered yaml parse error (#7245)

Fixes: #7244

* use receive only channel (#6960)

* Tool to aggregate third party licenses (#7220)

* Tool to aggregate third party licenses

* Clean up and add match info as an option

* Add readme, change binary name

* Update readme, make --branch optional

* Don't run licensee if not required

* Lint

* part 2 TCP routing cleanups/bug fixes for header matcher (#7236)

* update envoy sha - fix memory leak

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Fixing assorted bugs in service entries using filter chain matches

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* format

* bug fix

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* lint

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* lint

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* multicluster bug fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* backward compatibility

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* use new header match specifier

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* mixer: add sidecar health probe (#7102)

* sidecar health

Signed-off-by: Kuat Yessenov <kuat@google.com>

* typos

Signed-off-by: Kuat Yessenov <kuat@google.com>

* remove tracing of health checks

Signed-off-by: Kuat Yessenov <kuat@google.com>

* change to 15093

Signed-off-by: Kuat Yessenov <kuat@google.com>

* Add proxy version to the proxy-status command (#7269)

* use new websocket option and remove deprecated one (#7247)

* use new websocket option and remove deprecated one

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* backward compatibility

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* consistency

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* cleanup

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* dont append upgrade configs

* remove requestedServerName attribute (#7278)

it does not work, requires debugging and possibly implementation, will not be in time before the release

* Handle virtual service sni_hosts matches in gateway. (#7192)

* dirty poc

* working poc

* comments + cleanup

* lint

* add simple e2e test to egress

* minor refactoring

* cleanups, lots of comments

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* more duplication for clarity

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* simplifying gateway opaque tcp logic

* final cleanups

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* nil pointer check

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Add add operation into expression language (#6776)

* add op.

* fix duplicated test.

* Mixer Conditional Quota (#7265)

* add tests for conditional and unknown quota

* add conditional and unknown quota

* Option to disable pilot sidecar (#7280)

* Remove whitespace between host: and port in 'istioctl authn tls-check' (#7084)

* Remove whitespace between host: and port

* Typo

err := s.pushAll(con, pushEv)
if err != nil {
return nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(reading through code and randomly noticed this) just to make sure, is it intentional that this returns nil instead of err?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Block automatic merging of a PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants