We have a case where the GuardDuty datastream is collecting documents more than once. This appears to be due to the API returning the same pagination cursor more than once in a pagination sequence.
For example, in the pagination sequence that starts from this seed request (identifying details are obfuscated in all quoted logs):
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:12:21.233Z",
"message": "HTTP request",
"transaction.id": "... -391",
"url.original": "https://guardduty. ... .amazonaws.com/detector/.../findings",
"url.scheme": "https",
"url.path": "/detector/.../findings",
"url.domain": "guardduty. ... .amazonaws.com",
"url.port": "",
"url.query": "",
"http.request.method": "POST",
"user_agent.original": "...",
"http.request.body.content": "{\"findingCriteria\":{\"criterion\":{\"updatedAt\":{\"greaterThan\":\"1701140400000\",\"lessThan\":\"1701144000000\"}}},\"maxResults\":50,\"sortCriteria\":{\"attributeName\":\"updatedAt\",\"orderBy\":\"ASC\"}}",
"http.request.body.bytes": 183,
"http.request.mime_type": "application/json",
"ecs.version": "1.6.0"
}
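To make the request shape easier to compare across the logs, here is a minimal sketch of how a ListFindings body like the seed request above can be built. The helper name `build_list_findings_body` is hypothetical and not part of the integration. It only mirrors the fields visible in the logged bodies: an `updatedAt` window in `findingCriteria`, `maxResults` of 50, ascending `sortCriteria` on `updatedAt`, and `nextToken` added for follow-up pages.

```python
import json
from typing import Optional


def build_list_findings_body(greater_than_ms: int, less_than_ms: int,
                             max_results: int = 50,
                             next_token: Optional[str] = None) -> str:
    """Hypothetical helper: build a body shaped like the logged ListFindings requests."""
    body = {
        # Select findings whose updatedAt is greater than / less than the window bounds
        # (sent as strings, as in the logged bodies).
        "findingCriteria": {
            "criterion": {
                "updatedAt": {
                    "greaterThan": str(greater_than_ms),
                    "lessThan": str(less_than_ms),
                }
            }
        },
        "maxResults": max_results,
        # Oldest updates first, so each page should move forward through the window.
        "sortCriteria": {"attributeName": "updatedAt", "orderBy": "ASC"},
    }
    if next_token is not None:
        # Follow-up pages repeat the same criteria and add the pagination cursor.
        body["nextToken"] = next_token
    # Sorted keys and compact separators reproduce the key order and compact form in the logs.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


seed = build_list_findings_body(1701140400000, 1701144000000)
# len(seed) == 183, matching http.request.body.bytes of the seed request
```

Every request in a sequence carries the same window and sort order; only `nextToken` should change from page to page.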
we get the following request/response/request sequence (note the response's http.response.body.content and http.response.body.bytes, and the http.request.body.content of the final request):
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:12:22.216Z",
"message": "HTTP request",
"transaction.id": "... -395",
"url.original": "https://guardduty. ... .amazonaws.com/detector/.../findings",
"url.scheme": "https",
"url.path": "/detector/.../findings",
"url.domain": "guardduty. ... .amazonaws.com",
"url.port": "",
"url.query": "",
"http.request.method": "POST",
"user_agent.original": "...",
"http.request.body.content": "{\"findingCriteria\":{\"criterion\":{\"updatedAt\":{\"greaterThan\":\"1701140400000\",\"lessThan\":\"1701144000000\"}}},\"maxResults\":50,\"nextToken\":\"1701142787842-1701142787000-md5sum(4b32102b95b3986057cc16a660419069)\",\"sortCriteria\":{\"attributeName\":\"updatedAt\",\"orderBy\":\"ASC\"}}",
"http.request.body.bytes": 258,
"http.request.mime_type": "application/json",
"ecs.version": "1.6.0"
}
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:12:22.316Z",
"message": "HTTP response",
"transaction.id": "... -395",
"http.response.status_code": 200,
"http.response.body.content": md5sum(1d166755ca9ad65949bf55eee7e9b181),
"http.response.body.bytes": 1841,
"http.response.mime_type": "application/json",
"ecs.version": "1.6.0"
}
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:12:22.518Z",
"message": "HTTP request",
"transaction.id": "... -396",
"url.original": "https://guardduty. ... .amazonaws.com/detector/.../findings",
"url.scheme": "https",
"url.path": "/detector/.../findings",
"url.domain": "guardduty. ... .amazonaws.com",
"url.port": "",
"url.query": "",
"http.request.method": "POST",
"user_agent.original": "...",
"http.request.body.content": "{\"findingCriteria\":{\"criterion\":{\"updatedAt\":{\"greaterThan\":\"1701140400000\",\"lessThan\":\"1701144000000\"}}},\"maxResults\":50,\"nextToken\":\"1701142787857-1701142787000-md5sum(fcc045188185e1c27e14fdc6b66e2493)\",\"sortCriteria\":{\"attributeName\":\"updatedAt\",\"orderBy\":\"ASC\"}}",
"http.request.body.bytes": 258,
"http.request.mime_type": "application/json",
"ecs.version": "1.6.0"
}
In the same pagination sequence (i.e. not after a new seed request), roughly 15 minutes later, we get the following three logs, which show exactly the same http.request.body.content (including the nextToken) and http.response.body.content as the sequence above.
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:27:22.110Z",
"message": "HTTP request",
"transaction.id": "... -401",
"url.original": "https://guardduty. ... .amazonaws.com/detector/.../findings",
"url.scheme": "https",
"url.path": "/detector/.../findings",
"url.domain": "guardduty. ... .amazonaws.com",
"url.port": "",
"url.query": "",
"http.request.method": "POST",
"user_agent.original": "...",
"http.request.body.content": "{\"findingCriteria\":{\"criterion\":{\"updatedAt\":{\"greaterThan\":\"1701140400000\",\"lessThan\":\"1701144000000\"}}},\"maxResults\":50,\"nextToken\":\"1701142787842-1701142787000-md5sum(4b32102b95b3986057cc16a660419069)\",\"sortCriteria\":{\"attributeName\":\"updatedAt\",\"orderBy\":\"ASC\"}}",
"http.request.body.bytes": 258,
"http.request.mime_type": "application/json",
"ecs.version": "1.6.0"
}
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:27:22.219Z",
"message": "HTTP response",
"transaction.id": "... -401",
"http.response.status_code": 200,
"http.response.body.content": md5sum(1d166755ca9ad65949bf55eee7e9b181),
"http.response.body.bytes": 1841,
"http.response.mime_type": "application/json",
"ecs.version": "1.6.0"
}
{
"log.level": "debug",
"@timestamp": "2023-11-28T04:27:22.417Z",
"message": "HTTP request",
"transaction.id": "... -402",
"url.original": "https://guardduty. ... .amazonaws.com/detector/.../findings",
"url.scheme": "https",
"url.path": "/detector/.../findings",
"url.domain": "guardduty. ... .amazonaws.com",
"url.port": "",
"url.query": "",
"http.request.method": "POST",
"user_agent.original": "...",
"http.request.body.content": "{\"findingCriteria\":{\"criterion\":{\"updatedAt\":{\"greaterThan\":\"1701140400000\",\"lessThan\":\"1701144000000\"}}},\"maxResults\":50,\"nextToken\":\"1701142787857-1701142787000-md5sum(fcc045188185e1c27e14fdc6b66e2493)\",\"sortCriteria\":{\"attributeName\":\"updatedAt\",\"orderBy\":\"ASC\"}}",
"http.request.body.bytes": 258,
"http.request.mime_type": "application/json",
"ecs.version": "1.6.0"
}
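The repetition can be detected mechanically by recording every `nextToken` the client sends within one pagination sequence. A minimal sketch, assuming the tokens have already been extracted from the request bodies in send order; `find_repeated_tokens` is a hypothetical helper, and the token values are the obfuscated ones quoted above (requests between -396 and -401 are not quoted here and are omitted).

```python
from typing import Iterable, List, Tuple


def find_repeated_tokens(request_tokens: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (position, token) for each nextToken already sent earlier in the same sequence."""
    seen = set()
    repeats = []
    for position, token in enumerate(request_tokens):
        if token in seen:
            repeats.append((position, token))
        seen.add(token)
    return repeats


tok_a = "1701142787842-1701142787000-md5sum(4b32102b95b3986057cc16a660419069)"
tok_b = "1701142787857-1701142787000-md5sum(fcc045188185e1c27e14fdc6b66e2493)"
# nextToken values from transactions -395, -396, -401 and -402, in send order
repeats = find_repeated_tokens([tok_a, tok_b, tok_a, tok_b])
# repeats == [(2, tok_a), (3, tok_b)]
```

Any non-empty result means pages already collected in this sequence are being requested again.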
Where a cursor is repeated like this, the result is duplicated documents in the index. Fingerprinting should prevent the duplicates from being indexed, but that is a separate matter: we should not be receiving or collecting duplicated documents in the first place. If this is a limitation of the API, we need to find a workaround; if we are holding the API incorrectly, we need to fix that.
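One possible client-side workaround, if the API behaviour turns out to be a limitation, is to keep per-sequence state that refuses to follow a cursor twice and drops finding IDs already emitted. This is a minimal sketch, not the integration's implementation: the class `SequenceDeduper` is hypothetical, it assumes pages are ListFindings responses carrying a `findingIds` list, and it assumes the state is reset whenever a new seed request starts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SequenceDeduper:
    """Hypothetical per-sequence state; create a fresh instance for each seed request."""
    seen_tokens: Set[str] = field(default_factory=set)
    seen_ids: Set[str] = field(default_factory=set)

    def should_follow(self, next_token: Optional[str]) -> bool:
        # No token: the sequence is finished. Repeated token: following it
        # would re-request pages that were already collected.
        if next_token is None or next_token in self.seen_tokens:
            return False
        self.seen_tokens.add(next_token)
        return True

    def fresh_ids(self, page: Dict) -> List[str]:
        # Keep only finding IDs not yet emitted in this sequence, in page order.
        fresh = []
        for finding_id in page.get("findingIds", []):
            if finding_id not in self.seen_ids:
                self.seen_ids.add(finding_id)
                fresh.append(finding_id)
        return fresh


d = SequenceDeduper()
first = d.fresh_ids({"findingIds": ["a", "b"]})   # ["a", "b"]
second = d.fresh_ids({"findingIds": ["b", "c"]})  # ["c"]
```

The trade-off: keying on the finding ID alone also suppresses a finding that is genuinely updated again inside the same `updatedAt` window, and stopping at a repeated cursor may skip pages if the API later serves new results behind it.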
Steps to address
blocked