Skip to content

Conversation

@teemuteemu
Copy link

Fixes #4531, and likely containerd/cri#1419 (comment) is related as well.

Seems like retry currently works only for 404. Not sure if this is the best way to fix it but I tried to follow previous discussions (#3850 & #3868) and understood that retry always on error would be the way to go.

Let me know what you think.

… error code.

Signed-off-by: Teemu Kallio <teemu.kallio@pm.me>
@k8s-ci-robot
Copy link

Hi @teemuteemu. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Sep 18, 2020

Build succeeded.

@mxpv
Copy link
Member

mxpv commented Sep 19, 2020

/ok-to-test

@dmcgowan
Copy link
Member

More error handling logic needs to be done here to if expanding the status code which are handled. The logic of the for loop assumes that lastErr == nil at the end meant that either no registries were given or each registry returned a 404. For non-404 cases, lastErr should at least be set and used to indicate the first unexpected registry response.

@teemuteemu
Copy link
Author

I see, I'll take a better look at the error handling.

@JrCs
Copy link

JrCs commented Dec 9, 2020

@teemuteemu any news ?

@ggilley
Copy link

ggilley commented Jan 15, 2021

@teemuteemu @dmcgowan How about something like:

			if resp.StatusCode > 299 {
				// in case of error try next host
				if resp.StatusCode > 399 {
					if resp.StatusCode != http.StatusNotFound {
						lastErr = errors.Errorf("unexpected status code %v: %v", u, resp.Status)
					}
					continue
				}
				return "", ocispec.Descriptor{}, errors.Errorf("unexpected status code %v: %v", u, resp.Status)
			}

@mariuskiessling
Copy link

@teemuteemu @dmcgowan How about something like:

			if resp.StatusCode > 299 {
				// in case of error try next host
				if resp.StatusCode > 399 {
					if resp.StatusCode != http.StatusNotFound {
						lastErr = errors.Errorf("unexpected status code %v: %v", u, resp.Status)
					}
					continue
				}
				return "", ocispec.Descriptor{}, errors.Errorf("unexpected status code %v: %v", u, resp.Status)
			}

We run this patch in production for a few weeks now and it works great without any issues. We run Kraken as a p2p caching layer for image pulls within clusters that sometimes has some hiccups. This patch allows a much more reliant failover in these cases to directly contact our internal registry.

@teemuteemu Can you include the error handling changes in your PR? I would love to see this change in upstream containerd and stop maintaining the patch for new releases.

@ggilley
Copy link

ggilley commented Apr 3, 2021

@mariuskiessling Maybe you should create a new PR since @teemuteemu doesn't seem to be responding. And you have proof of it working :-)

@mariuskiessling
Copy link

@ggilley I would but it seems like that @haslersn has already done that. Thus, I am for closing this PR in favour of #5275.

@dmcgowan
Copy link
Member

dmcgowan commented Apr 5, 2021

agreed, closing in favor of #5275

@dmcgowan dmcgowan closed this Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Registry mirror fallbacks immediately errors if response code is 5xx instead of connection timeout/fail

7 participants