Whenever we fail to connect to Fleet, or are disconnected before getting a response, on any API endpoint, we should:
- Include more detail about the connection
  - How long was the connection open?
  - When there's a DNS failure, log the DNS transaction
- Include more detail about how Agent is going to handle the failure
  - Is the agent going to retry, and if so, how long from now?
We should also consider making these logs less scary to users, since these failures are expected to happen from time to time. I propose that we move the current logging from `warn` to `info`, with language explaining that the failure is intermittent, and only log at the `warn` level if we've encountered the same issue in 3 of the last 5 attempts. Today it will skip the first instance and log only if it happens twice in a row.
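The proposed "3 out of the last 5 attempts" rule could be sketched as a small sliding window, shown below in Go. The type and function names are hypothetical, and the sketch assumes every failure counts as "the same issue"; a real implementation would likely keep one window per error kind.

```go
package main

// level is a hypothetical stand-in for the logger's level type.
type level int

const (
	levelInfo level = iota
	levelWarn
)

// failureWindow remembers the outcome of the most recent attempts.
type failureWindow struct {
	outcomes  []bool // true means the attempt failed; most recent is last
	size      int    // how many attempts to remember (5 in the proposal)
	threshold int    // failures needed to escalate to warn (3 in the proposal)
}

func newFailureWindow(size, threshold int) *failureWindow {
	return &failureWindow{size: size, threshold: threshold}
}

// Record stores the outcome of an attempt and returns the level at which a
// failure should be logged: warn once the window holds at least `threshold`
// failures, info otherwise. Successful attempts always return info.
func (w *failureWindow) Record(failed bool) level {
	w.outcomes = append(w.outcomes, failed)
	if len(w.outcomes) > w.size {
		w.outcomes = w.outcomes[1:] // drop the oldest attempt
	}
	failures := 0
	for _, f := range w.outcomes {
		if f {
			failures++
		}
	}
	if failed && failures >= w.threshold {
		return levelWarn
	}
	return levelInfo
}
```

With `newFailureWindow(5, 3)`, two isolated failures stay at info, while a third failure within any five consecutive attempts is logged at warn.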