Skip to content

Add logic for pilot to read ConfigMap to override MeshConfig#3916

Closed
andraxylia wants to merge 40 commits intomasterfrom
andra-configmap
Closed

Add logic for pilot to read ConfigMap to override MeshConfig#3916
andraxylia wants to merge 40 commits intomasterfrom
andra-configmap

Conversation

@andraxylia
Copy link
Copy Markdown
Contributor

Fixes #3826

@andraxylia andraxylia requested a review from a team March 2, 2018 21:49
// Config file either wasn't specified or failed to load - use a default mesh.
defaultMesh := model.DefaultMeshConfig()
mesh = &defaultMesh
if _, mesh, err = GetMeshConfig(s.kubeClient, kube.IstioNamespace, kube.IstioConfigMap); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is the config file not specified (see below for current pilot deployment)? And when would that file read fail?

volumeMounts:
- name: config-volume
mountPath: /etc/istio/config

volumes:
- name: config-volume
configMap:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the configFile is used for VMs. In a K8s setup, it always fails, I tested it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pilot discovery is not run for VMs, only proxy.

You mean it can fail if the config map is not mounted ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't see how this works - if the config map is not yet ready, but the mount and the load would fail.

I agree we shouldn't use the defaults silently - but fail if config map is missing, but I'm not sure this (reading using the API) solves much.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.Mesh.ConfigFile should default to /etc/istio/config/mesh per

discoveryCmd.PersistentFlags().StringVar(&serverArgs.Mesh.ConfigFile, "meshConfig", "/etc/istio/config/mesh",
. Is that path not available in pilot-discovery container?

Copy link
Copy Markdown
Contributor Author

@andraxylia andraxylia Mar 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact the file was not available when running pilot from my mac. When running in a pod, the configMap gets mounted under /etc/istio/config, so there is no need to re-read the map.

The problem with the connectTimeout was in fact there are two of them in the config, one at the top level, and one in defaultConfig. Only the one at the top level is taken into account for CDS.

I can close this PR, unless we want to keep the ability of falling back on reading from a cluster config map when running pilot as a process locally. It seems to work fine.

@andraxylia andraxylia requested a review from a team March 2, 2018 22:07
@rshriram
Copy link
Copy Markdown
Member

rshriram commented Mar 2, 2018

Are you sure you need an entire config map reading function ? We already read the global mesh config from a config map, and we are trying to clean up that bit (cc @loewenstein / @xiaolanz ).

If the issue is only about connect timeout not being respected, then this should already be covered as part of the new v1alpha2 destination rules (https://github.com/istio/api/blob/master/networking/v1alpha3/destination_rule.proto#L276) and the connect timeout option must be removed from the global config map.

I think that is what is happening (we stopped reading the connect timeout from the config map and use the one in DestRule - it has a default value). [ cc @ijsnellf / @GregHanson ]

@andraxylia
Copy link
Copy Markdown
Contributor Author

This is the global config map not being read by pilot. It may have got lost in the refactoring, or maybe it was never there?

It is not just about the timeout, it is also about anything in pilot that needs to be overritten. For instance, I will use this to enable or disable validate_clusters.

Until or if we get rid of the configmap, this PR is fixing a bug.

@andraxylia
Copy link
Copy Markdown
Contributor Author

andraxylia commented Mar 2, 2018

For the moment, connectTimeout is also used for v1, and as I said in my previous comment this is needed in general.

For v2, we still need a value in the config map, that the user may overwrite on a per service basis, but if I need to change something globally, I do not want to create 1000 destination rules just for this. Nor do I want to change the code. BTW, inability to tune this easily stopped me from easily debugging the 503s issues.

xiaolanz and others added 10 commits March 2, 2018 15:39
* WIP - codecov package support

* Enabling package check

* Update codecov range

* Update codecov requirement
* Use schema value for group and version for test.

* Remove obsolete comment.
* A few enhancements

* Make --output to be a directory
Add --clear_cache_stats option

* add port-forward support so that it's even easier
to use this command outside the k8s cluster

* use port-forward to download envoy config
Copy link
Copy Markdown
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

This is the first step to also watch it - and it may help for pilot on VMs, where it can pull the config map from apiserver instead of having to copy it to each VM.

@istio-merge-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 5, 2018

Codecov Report

Merging #3916 into master will increase coverage by 1%.
The diff coverage is 77%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #3916    +/-   ##
=======================================
+ Coverage      76%     76%    +1%     
=======================================
  Files         297     295     -2     
  Lines       27075   26961   -114     
=======================================
+ Hits        20407   20468    +61     
+ Misses       5358    5184   -174     
+ Partials     1310    1309     -1
Impacted Files Coverage Δ
pilot/pkg/serviceregistry/kube/controller.go 69% <ø> (+5%) ⬆️
security/pkg/workload/server.go 74% <ø> (ø) ⬆️
mixer/pkg/config/crd/admit.go 59% <ø> (ø) ⬆️
pilot/pkg/serviceregistry/consul/conversion.go 91% <ø> (ø) ⬆️
pilot/pkg/kube/inject/inject.go 80% <ø> (ø) ⬆️
pilot/pkg/serviceregistry/kube/client.go 30% <ø> (ø) ⬆️
pilot/pkg/model/config.go 61% <ø> (ø) ⬆️
pilot/pkg/config/kube/crd/types.go 62% <ø> (+5%) ⬆️
pilot/pkg/serviceregistry/kube/queue.go 86% <ø> (ø) ⬆️
mixer/pkg/runtime/init.go 28% <100%> (ø) ⬆️
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 720be7a...6ac5d99. Read the comment docs.

@istio-merge-robot
Copy link
Copy Markdown

/lgtm cancel //PR changed after LGTM, removing LGTM. @andraxylia @costinm

@andraxylia andraxylia added the do-not-merge/hold Block automatic merging of a PR. label Mar 5, 2018
@andraxylia andraxylia requested a review from a team March 5, 2018 23:34
@andraxylia andraxylia closed this Mar 6, 2018
@istio-testing
Copy link
Copy Markdown
Collaborator

istio-testing commented Mar 6, 2018

@andraxylia: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-unit-tests.sh 6ac5d99 link /test istio-unit-tests
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ldemailly
Copy link
Copy Markdown
Member

can you link the replacement PR here or in #3826, tia ? (with just the change you want to make, this one seemed like it had unrelated stale commits...)

@andraxylia
Copy link
Copy Markdown
Contributor Author

@ldemailly Closed this because the set of files went into a bad state after the rebase. Opened instead #4024 from exactly the same branch, but the set of files changed is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. do-not-merge/hold Block automatic merging of a PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.