Conversation
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
| response.Ctx.Put(retryKey, strconv.Itoa(retries+1)) | ||
| retryAfter, convErr := strconv.Atoi(response.Headers.Get("Retry-After")) | ||
| if convErr == nil { | ||
| time.Sleep(time.Duration(retryAfter) * time.Second) |
There was a problem hiding this comment.
hmmmmm I think we need to have global retry. You see that will not work if others for same host will try in the same time 🤔
There was a problem hiding this comment.
Oh! I didn't think of this before! Will try to implement global retry!
There was a problem hiding this comment.
I thought about this and tested it a bit. I think we never make requests to the same host again or at the same time due to our present caching layer and retries can only occur after request! So this should work. 🙂
There was a problem hiding this comment.
I think it's fine for now, let's iterate (global retry will be hard to implement), but I think it's needed, paths are cached not hosts
| time.Sleep(time.Duration(retryAfter) * time.Second) | ||
| } | ||
|
|
||
| retryErr := response.Request.Retry() |
There was a problem hiding this comment.
I might think we need default back of maybe?
There was a problem hiding this comment.
Should this be in default?
There was a problem hiding this comment.
well default if no retry-after is set? maybe something like 1s
There was a problem hiding this comment.
Oh okay! Will try to implement this!
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
| switch response.StatusCode { | ||
| case http.StatusTooManyRequests: | ||
| if retries <= 0 { | ||
| var retryAfter int |
There was a problem hiding this comment.
| var retryAfter int | |
| var retryAfterSeconds int |
If it's int not time.Duration we need to mention unit in var name
| if convErr != nil { | ||
| retryAfter = 1 | ||
| } | ||
| time.Sleep(time.Duration(retryAfter) * time.Second) |
There was a problem hiding this comment.
| time.Sleep(time.Duration(retryAfter) * time.Second) | |
| select { | |
| case <- time.After(time.Duration(retryAfter) * time.Second): | |
| case <- response.Context.Done(): | |
| return | |
| } |
Let's also mock time.After so we can inject different times
There was a problem hiding this comment.
Colly allows us to pass in context and use that, but it isn't released yet. So pinning to needed commit for now! 🙂https://pkg.go.dev/github.com/gocolly/colly/v2@v2.1.1-0.20201013153555-8252c346cfb0#Collector
| v.remoteLinks[response.Ctx.Get(originalURLKey)] = errors.Wrapf(err, "%q rate limited even after retry; status code %v", response.Request.URL.String(), response.StatusCode) | ||
| } | ||
| case http.StatusMovedPermanently, http.StatusTemporaryRedirect, http.StatusServiceUnavailable: | ||
| if retries <= 0 { |
There was a problem hiding this comment.
Let's make more important block being left indendented
| retries, _ := strconv.Atoi(retriesStr) | ||
| switch response.StatusCode { | ||
| case http.StatusTooManyRequests: | ||
| if retries <= 0 { |
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
bwplotka
left a comment
There was a problem hiding this comment.
I think retrying will be annoying if we cannot interrupt it. Do you mind hooking the context properly. It will also show if we put it in right place... (:
| // MustNewValidator returns mdformatter.LinkTransformer that crawls all links. | ||
| func MustNewValidator(logger log.Logger, linksValidateConfig []byte, anchorDir string) mdformatter.LinkTransformer { | ||
| v, err := NewValidator(logger, linksValidateConfig, anchorDir) | ||
| v, err := NewValidator(context.TODO(), logger, linksValidateConfig, anchorDir) |
There was a problem hiding this comment.
We can currently interrupt retry. For instance, testing a file with envoyproxy link and setting retryAfterSeconds to 10, gives the following results. First is when I let it run and retry, second is when I interrupt,
time mdox fmt -l test.md
error: test.md: "https://envoyproxy.io" not accessible even after retry; status code 301: Moved Permanently
mdox fmt -l test.md 0.14s user 0.06s system 1% cpu 14.995 total
ime mdox fmt -l test.md
^Cinfo: caught signal. Exiting.: interrupt
mdox fmt -l test.md 0.12s user 0.05s system 2% cpu 6.762 total
So we can cancel retry without having to wait. But I'm not sure what would be the correct way to hook context?
There was a problem hiding this comment.
Looks like it works but we don't know why, right? That's not the best case too ;p Anyway, let's merge and think later (: I think we pass context to colly somewhere else?
There was a problem hiding this comment.
Yes, we pass it here! That's why we can cancel without waiting. Also passing it is sort of weird as colly also has a context package.

This PR adds retries to link checking based on status code. For status code 429, it uses the
Retry-Afterheader to wait before retrying and retries normally for status code 301, 307 and 503. Can try a maximum of three times before returning error. This can increase the time of processing docs if rate limit is encountered but not too excessively.Fixes #11.
Some questions:
--retryflag so that user can en/disable and set number of retries?Retry-Afterheader works well for sites like GitHub which returns 429 withRetry-Afterset to 60s. But other sites might make value excessive, so should a limit be placed?