Skip to content

Fix Pilot port number config for Ingress and ConfigMap#4066

Merged
ldemailly merged 1 commit intoistio:masterfrom
myidpt:fixit2
Mar 9, 2018
Merged

Fix Pilot port number config for Ingress and ConfigMap#4066
ldemailly merged 1 commit intoistio:masterfrom
myidpt:fixit2

Conversation

@myidpt
Copy link
Copy Markdown

@myidpt myidpt commented Mar 8, 2018

Partially fixes #3757.
Pilot port number in mesh config and ingress are wrongly configured as 8080 when mTLS is configured. It should be 15005.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change it to optional: false instead of removing it fully. ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"s/optional: \(.*\)/optional: false/"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Contributor

@wattli wattli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@istio-merge-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wattli

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@mandarjog
Copy link
Copy Markdown
Contributor

/test e2e-suite-rbac-auth

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 8, 2018

Codecov Report

Merging #4066 into master will increase coverage by 1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #4066    +/-   ##
=======================================
+ Coverage      75%     75%    +1%     
=======================================
  Files         305     305            
  Lines       27871   27962    +91     
=======================================
+ Hits        20726   20940   +214     
+ Misses       5827    5699   -128     
- Partials     1318    1323     +5
Impacted Files Coverage Δ
mixer/adapter/stackdriver/helper/common.go 70% <0%> (-8%) ⬇️
mixer/adapter/prometheus/prometheus.go 80% <0%> (-3%) ⬇️
mixer/adapter/servicecontrol/testhelper.go 72% <0%> (-2%) ⬇️
pilot/pkg/config/kube/crd/controller.go 77% <0%> (-1%) ⬇️
mixer/adapter/kubernetesenv/cache.go 85% <0%> (-1%) ⬇️
mixer/adapter/servicecontrol/servicecontrol.go 62% <0%> (ø) ⬇️
mixer/adapter/list/list.go 100% <0%> (ø) ⬇️
mixer/adapter/stdio/zap.go 100% <0%> (ø) ⬆️
mixer/adapter/dogstatsd/dogstatsd.go 100% <0%> (ø) ⬆️
mixer/adapter/list/ipList.go 100% <0%> (ø) ⬆️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90ed026...aa7dd93. Read the comment docs.

@istio-merge-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@mandarjog
Copy link
Copy Markdown
Contributor

mandarjog commented Mar 8, 2018

2018-03-08T01:35:12.369997Z	info	Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 2 --parent-shutdown-time-s 3 --service-cluster istio-ingress --service-node ingress~~istio-ingress-66fb78d8c4-tdmj5.simple-auth-test-2a76e7f23f334f1f922f~simple-auth-test-2a76e7f23f334f1f922f.svc.cluster.local --max-obj-name-len 189 -l info --v2-config-only]
[2018-03-08 01:35:12.396][18][info][main] external/envoy/source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.256.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-03-08 01:35:12.404][18][critical][main] external/envoy/source/server/server.cc:71] error initializing configuration '/etc/istio/proxy/envoy-rev0.json': unable to read file: /etc/certs/root-cert.pem
2018-03-08T01:35:12.405985Z	warn	Epoch 0 terminated with an error: exit status 1

@myidpt @wattli do not merge ?
eventually

018-03-08T01:36:24.249865Z	info	Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 2 --parent-shutdown-time-s 3 --service-cluster istio-ingress --service-node ingress~~istio-ingress-66fb78d8c4-tdmj5.simple-auth-test-2a76e7f23f334f1f922f~simple-auth-test-2a76e7f23f334f1f922f.svc.cluster.local --max-obj-name-len 189 -l info --v2-config-only]
[2018-03-08 01:36:24.261][50][info][main] external/envoy/source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.256.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-03-08 01:36:24.265][50][info][config] external/envoy/source/server/configuration_impl.cc:52] loading 0 listener(s)
[2018-03-08 01:36:24.265][50][info][config] external/envoy/source/server/configuration_impl.cc:92] loading tracing configuration
[2018-03-08 01:36:24.265][50][info][config] external/envoy/source/server/configuration_impl.cc:106]   loading tracing driver: envoy.zipkin
[2018-03-08 01:36:24.265][50][info][config] external/envoy/source/server/configuration_impl.cc:119] loading stats sink configuration
[2018-03-08 01:36:24.265][50][info][main] external/envoy/source/server/server.cc:359] starting main dispatch loop
[2018-03-08 01:36:24.288][50][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:128] cm init: initializing cds
[2018-03-08 01:36:24.291][50][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-03-08 01:36:24.291][50][info][main] external/envoy/source/server/server.cc:343] all clusters initialized. initializing init manager
[2018-03-08 01:36:24.292][50][info][config] external/envoy/source/server/listener_manager_impl.cc:579] all dependencies initialized. starting workers
[2018-03-08 01:36:24.292][50][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error
[2018-03-08 01:36:25.771][50][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error
[2018-03-08 01:36:26.813][50][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error

@istio-merge-robot
Copy link
Copy Markdown

/lgtm cancel //PR changed after LGTM, removing LGTM. @myidpt @wattli

@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 8, 2018

@mandarjog How did you get the Envoy log?

@ldemailly
Copy link
Copy Markdown
Member

@myidpt it has not changed (and thankfully is still working well on prow... for now...): go to the failed build link, click artifact and click on the namespacename and then the pod's log

@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 8, 2018

@ldemailly I found it on prow, thanks.

@myidpt myidpt changed the title Pilot always starts with cert when auth is enabled. [WIP] Pilot always starts with cert when auth is enabled. Mar 8, 2018
@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 8, 2018

I saw the TestSimpleIngress still exhaust the retries. I will try to flip "optional: true" for secret mount on ingress (and maybe also others) to see the result.

@myidpt myidpt requested a review from a team March 9, 2018 00:03
@myidpt myidpt changed the title [WIP] Pilot always starts with cert when auth is enabled. [DO NOT MERGE] Pilot always starts with cert when auth is enabled. Mar 9, 2018
@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 9, 2018

@kyessenov
Copy link
Copy Markdown
Contributor

kyessenov commented Mar 9, 2018 via email

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be a wait for the cert not a sleep

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's just experimental :) I fixed it.

@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 9, 2018

Thanks to Kuat. We found the issue is due to --discoveryAddress is set to 8080 on ingress:
https://storage.googleapis.com/istio-prow/pull/istio_istio/4122/e2e-simpleTests/837/artifacts/simple-auth-test-4fc511534e9c42c7955e/yaml/istio-one-namespace-auth.yaml

It's a problem in updateVersion.sh

@ldemailly
Copy link
Copy Markdown
Member

ldemailly commented Mar 9, 2018

the cert and startup order still is an issue, glad you found both, let's fix both
(so the ingress is ready in less than 60s in addition to talking to pilot)

@myidpt myidpt changed the title [DO NOT MERGE] Pilot always starts with cert when auth is enabled. Fix Pilot port number config for Ingress and ConfigMap Mar 9, 2018
@mandarjog
Copy link
Copy Markdown
Contributor

That is great sleuth work @Kuat and @myipdt.
More reasons to move off updateVersion to helm!

@mandarjog
Copy link
Copy Markdown
Contributor

/test e2e-suite-rbac-auth

@kyessenov
Copy link
Copy Markdown
Contributor

kyessenov commented Mar 9, 2018 via email

@ldemailly
Copy link
Copy Markdown
Member

While the port number is clearly the most egregious bug and it's fantastic to solve it asap

I'd like to not lose the fact that #3757 is how that bug sneaked in undetected behind the "oh ingress fails already anyway" - so we do need to start pilot's envoy only when the certs are there and get those certs there faster by starting CA + service accounts first.

@myidpt
Copy link
Copy Markdown
Author

myidpt commented Mar 9, 2018

@ldemailly I will have a PR to fix the proxy bootstrapping issue as well.

@mandarjog
Copy link
Copy Markdown
Contributor

@cmluciano
Copy link
Copy Markdown
Member

/retest

@istio-testing
Copy link
Copy Markdown
Collaborator

istio-testing commented Mar 9, 2018

@myidpt: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-unit-tests.sh aa7dd93 link /test istio-unit-tests
prow/e2e-suite-rbac-auth.sh aa7dd93 link /test e2e-suite-rbac-auth
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ldemailly ldemailly merged commit 80c38f0 into istio:master Mar 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

regression/change: certs now take > 1min to show up (causing lds: fetch failure: network error and ingress tests failure)

10 participants