chore(influxv2plugin): Increase accepted retry-after header values. #9619

ssoroka merged 3 commits into influxdata:master from
Conversation
plugins/outputs/influxdb_v2/http.go (outdated)
```go
// take the highest value of backoff and retry-after.
retry := math.Max(backoff, retryAfterHeader)
retry = math.Min(retry, defaultMaxWait)
return time.Duration(retry) * time.Second
```
Because of the rounding that happens here when converting to time.Duration (an int64), the first 6 retries have a retryDuration of 0s, meaning there's no backoff at all.
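A minimal sketch of the truncation being pointed out here (the function names, the backoff values, and the 600s cap are illustrative, not the plugin's actual code):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Assumed cap in seconds; this PR raises the plugin's cap to 10 minutes.
const defaultMaxWait = 600

// Mirrors the snippet above: time.Duration(retry) truncates the float64
// toward zero *before* multiplying by time.Second, so any retry value
// below 1.0 collapses to 0s.
func retryDurationTruncating(backoff, retryAfterHeader float64) time.Duration {
	retry := math.Max(backoff, retryAfterHeader)
	retry = math.Min(retry, defaultMaxWait)
	return time.Duration(retry) * time.Second // 0.25 -> 0s
}

// One way to keep sub-second backoff: scale in float space first.
func retryDurationPreserving(backoff, retryAfterHeader float64) time.Duration {
	retry := math.Max(backoff, retryAfterHeader)
	retry = math.Min(retry, defaultMaxWait)
	return time.Duration(retry * float64(time.Second)) // 0.25 -> 250ms
}

func main() {
	for _, backoff := range []float64{0.25, 0.5, 1.5} {
		fmt.Println(retryDurationTruncating(backoff, 0), retryDurationPreserving(backoff, 0))
	}
}
```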
Do we have a use case where 60s is not enough time for the backoff, or is this more academic? I have a lot of reservations about respecting really high retry-after durations. What's the max retry-after that influxdb is going to respond with? How does it calculate the duration?
I didn't change the max backoff (still 60s), but yes, today InfluxCloud2 may respond with a retry-after of up to 5 minutes for delete requests. Most customers don't use telegraf for deletes (none?), but the same logic applies for the other cloud2 rate limits: the retry-after may be as large as 5 minutes. It's just wasted effort all around to resend a message at 60s if telegraf already knows it's going to be rejected until e.g. 120s have gone by.
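To make the wasted-effort point concrete, a back-of-the-envelope sketch (the numbers are the ones quoted in this thread; the retry cadence is simplified to a fixed 60s):

```go
package main

import "fmt"

func main() {
	// Illustrative numbers: the old 60s cap versus a 5-minute Retry-After.
	const acceptedCap = 60 // seconds: largest Retry-After the old code honored
	const retryAfter = 300 // seconds: what the server actually asked for

	// Retrying every 60s means attempts at t=60, 120, 180, 240 are all doomed;
	// only the attempt at t=300 can succeed.
	wasted := retryAfter/acceptedCap - 1
	fmt.Printf("%d requests sent that the server already said it would reject\n", wasted)
}
```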
reimda left a comment
The influxdb service won't return a large Retry-After header value without good reason. It's not the client's place to second-guess the reason and ignore the header value. Looks good to me.
I believe this PR fixes: #9353
Updates the influxdata v2 output plugin that is used for influx cloud2 to respect larger retry-after header values. The current value of `60s` is well below the possible values that influx cloud2 may provide. The current largest value seen is 5 minutes. This PR increases accepted `Retry-After` header values to 10 minutes.

I know there is a limited cache size for buffered/delayed write requests and that this cache will begin to drop the oldest writes when the cache is full. However, if the server says that it will not accept a failed request until after 4 minutes (for example) in a `retry-after`, there is no benefit from trying it every 60s. That scenario creates unnecessary network and server load to reject a request that the client already knows will be rejected based on the `retry-after` header.

The `retry-after` header is specifically used in cloud2 by rate limiters. If a write/read/delete request is denied because the customer has exceeded a rate limit, the 429 response will include the `retry-after` header specifying when the resource will be available again.
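For readers unfamiliar with the mechanism described above, here is a self-contained sketch of honoring a `Retry-After` header on a 429 response; `retryAfter` and `maxRetryAfter` are hypothetical names for illustration, not the plugin's actual API:

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// maxRetryAfter caps how large a Retry-After value we honor; this PR raises
// the plugin's equivalent cap from 1 minute to 10 minutes.
const maxRetryAfter = 10 * time.Minute

// retryAfter extracts the server-requested delay from a 429 response.
// Cloud2's rate limiters send the delay as a number of seconds (in general,
// Retry-After may also be an HTTP date, which this sketch ignores).
func retryAfter(resp *http.Response) (time.Duration, bool) {
	if resp.StatusCode != http.StatusTooManyRequests {
		return 0, false
	}
	secs, err := strconv.ParseFloat(resp.Header.Get("Retry-After"), 64)
	if err != nil || secs < 0 {
		return 0, false
	}
	d := time.Duration(secs * float64(time.Second))
	if d > maxRetryAfter {
		d = maxRetryAfter
	}
	return d, true
}

func main() {
	resp := &http.Response{
		StatusCode: http.StatusTooManyRequests,
		Header:     http.Header{"Retry-After": []string{"300"}},
	}
	if d, ok := retryAfter(resp); ok {
		fmt.Println("server asked us to wait", d) // prints 5m0s
	}
}
```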
Required for all PRs:

resolves #