Skip to content

Log ingestion error on 206 response#2065

Merged
trask merged 11 commits into
mainfrom
log-206-partial-failure-messages
Jan 25, 2022
Merged

Log ingestion error on 206 response#2065
trask merged 11 commits into
mainfrom
log-206-partial-failure-messages

Conversation

@trask

@trask trask commented Jan 21, 2022

Copy link
Copy Markdown
Member

Related to #2064, which we never noticed since we didn't log it.

Now the error in #2064 would be logged as:

2022-01-21 15:30:10.108-08:00 WARN  c.m.a.a.i.telemetry.TelemetryChannel - Sending telemetry to the ingestion service (telemetry will be stored to disk on failure and retried later): 109: Field 'message' on type 'ExceptionDetails' is required but missing or empty. Expected: string, Actual: undefined (future warnings will be aggregated and logged once every 5 minutes)

@trask trask requested review from heyams and kryalama as code owners January 21, 2022 23:33
@trask

trask commented Jan 21, 2022

Copy link
Copy Markdown
Member Author

if you hide whitespace in the diff it will help

case 500: // INTERNAL SERVER ERROR
case 503: // SERVICE UNAVAILABLE
case 429: // TOO MANY REQUESTS
// TODO (heya) should we write to disk on any of these response codes?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure... if we don't write to disk, we should consider tracking how many requests we drop. Let's think about it.. need to come up with new property in customdimension.. it can be part of network statsbeat..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the todo to track data drop via statsbeat?

@trask trask changed the title Fix ingestion error on missing exception message Log ingestion error on 206 response Jan 22, 2022
@trask trask requested a review from heyams January 24, 2022 23:07
Comment on lines +305 to +314
case 408: // REQUEST TIMEOUT
case 429: // TOO MANY REQUESTS
case 500: // INTERNAL SERVER ERROR
case 503: // SERVICE UNAVAILABLE
operationLogger.recordFailure(
"received response code "
+ statusCode
+ " (telemetry will be stored to disk and retried later)");
onFailure.accept(true);
break;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the list of response codes that 2.x used to retry on, I think it's a good list, not sure when we diverged from it

@trask trask merged commit fbb276e into main Jan 25, 2022
@trask trask deleted the log-206-partial-failure-messages branch January 25, 2022 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants