agent/cache: stop refreshing a cache entry after some number of failed attempts #9588

@dnephin

Related to #8541 (and other issues I can't find right now).

We've had a few reports of the logs being filled with errors caused by the background refresh of agent/cache entries. The agent/cache has a TTL of 72 hours, so when something is removed the refresh can produce a lot of logging noise for days. This problem can also cause the agent to OOM. If many requests for non-existent data are made, we start a goroutine for each one. Eventually all those goroutines can OOM the agent.

Instead of attempting to refresh the entry for 72 hours, and getting an error every time, we should stop refreshing after either some number of failed attempts or some period of time without a success.
