[AWS][Cloudfront Logs] - Implemented GROK processor based ipv6/v4 parsing in AWS Cloudfront Logs data stream#11829
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
kcreddy
left a comment
There are several IPv6 addresses that fail with the current change, but are successfully matched by the previous regex.
Examples:
1::
1:2:3:4:5:6:7::
1::7:8
1:2:3:4:5::7:8
1::6:7:8
1:2:3:4::6:7:8
1::5:6:7:8
1:2:3::5:6:7:8
1::4:5:6:7:8
1:2::4:5:6:7:8
1::3:4:5:6:7:8
::2:3:4:5:6:7:8
Examples taken from here
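These compressed forms are all legal IPv6 notation. As a verification sketch (not part of the pipeline itself), Python's standard-library ipaddress module confirms each one parses:

```python
# Verify that the compressed IPv6 forms rejected by the old regex are in fact
# valid addresses, using Python's standard-library ipaddress module.
import ipaddress

rejected_by_old_regex = [
    "1::", "1:2:3:4:5:6:7::", "1::7:8", "1:2:3:4:5::7:8",
    "1::6:7:8", "1:2:3:4::6:7:8", "1::5:6:7:8", "1:2:3::5:6:7:8",
    "1::4:5:6:7:8", "1:2::4:5:6:7:8", "1::3:4:5:6:7:8", "::2:3:4:5:6:7:8",
]

for addr in rejected_by_old_regex:
    # ip_address() raises ValueError for anything that is not a valid address
    parsed = ipaddress.ip_address(addr)
    assert parsed.version == 6
print("all", len(rejected_by_old_regex), "addresses parse as valid IPv6")
```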
@ShourieG, would it be possible to replace the script processor with …
@kcreddy, should we try it for both IPv4/IPv6, or only IPv6 and leave IPv4 as it currently is?
@kcreddy, I have updated the logic with a relevant GROK processor, but some scripts were still required for cleanup and for handling the localhost edge case. There are some removals in the IP section based on the GROK patterns.
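For reference, the localhost handling could look roughly like the following script processor. This is a hedged sketch only: the field name `_tmp.ip` and the mapping to the loopback address are illustrative assumptions, not taken from the actual pipeline.

```yaml
- script:
    lang: painless
    description: Handle 'localhost' edge case, currently not handled via grok.
    # Field name and the loopback mapping are illustrative assumptions.
    if: ctx._tmp?.ip == 'localhost'
    source: |
      ctx._tmp.ip = '127.0.0.1';
```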
We could use the same conditional that's used in the original script processor:
if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'
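As a sketch of that suggestion (the processor shown and the field names are assumptions for illustration), the same guard could be attached as the processor's `if` condition so it does not run on missing or `'-'` values:

```yaml
# Illustrative only: reuse the original null/'-' guard as the `if` condition.
- split:
    field: _tmp.x_forwarded_for
    separator: ','
    if: ctx._tmp?.x_forwarded_for != null && ctx._tmp.x_forwarded_for != '-'
```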
packages/aws/data_stream/cloudfront_logs/_dev/test/pipeline/test-cloudfront.log-expected.json
kcreddy
left a comment
Just a nit, LGTM 👍🏼
Could you also add the exact error to the PR?
There was a problem hiding this comment.
We could possibly move the array-existence check to the base pipeline so that it is not run for every entry.
Or we could change the script to an append processor with allow_duplicates: false, which would be cleaner.
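The append-based alternative mentioned here could look like the following sketch; the target field and the use of `_ingest._value` (the current element inside a foreach) are assumptions, not the actual pipeline's field names:

```yaml
# Sketch: append dedupes for free with allow_duplicates: false.
- append:
    field: aws.cloudfront.x_forwarded_for.ip
    value: '{{{_ingest._value}}}'
    allow_duplicates: false
```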
Same for invalid IPs.
efd6
left a comment
Suggested commit message body.
The regex pattern used for the ipv6 was overly complex and caused errors in
Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6
parsing by adding a new helper pipeline invoked iteratively over ip fields
via a foreach. Also add a painless script to handle 'localhost' literal
addresses.
Also run elastic-package format.
I think that probably the format should have been in a separate PR.
Suggestion: change
description: handle 'localhost' edge case scenario, currently not handled via grok
to
description: Handle 'localhost' edge case, currently not handled via grok.
LGTM! Just a suggestion: can we consider adding a sample or samples to the pipeline test to capture the pattern observed in the reported issue, the handling of localhost, etc.?
agithomas
left a comment
Approved with comments.
💚 Build Succeeded
cc @ShourieG
Current tests already contain the localhost pattern, @agithomas. The reported issue, on the other hand, was about errors with the existing regex pattern when compiling in Elasticsearch.
Package aws - 2.32.0 containing this change is available at https://epr.elastic.co/package/aws/2.32.0/
…sing in AWS Cloudfront Logs data stream (elastic#11829) * The regex pattern used for the ipv6 was overly complex and caused errors in Elasticsearch. This change simplifies and fixes the grok-based IPv4/IPv6 parsing by adding a new helper pipeline invoked iteratively over ip fields via a foreach. Also add a painless script to handle 'localhost' literal addresses. * Also run elastic-package format.


Type of change
Please label this PR with one of the following labels, depending on the scope of your change:
Proposed commit message
WHAT: Implemented GROK processor based IPv6/IPv4 parsing. Cleaned up formatting issues across all data streams via elastic-package format.
WHY: The regex pattern used for the IPv6 was overly complex and caused errors in Elasticsearch. The cleanup was a byproduct of running elastic-package format.
HOW: A new helper pipeline was added containing the new grok processor and helper scripts, which are invoked in a foreach processor from the parent pipeline. This approach had to be taken because of existing limitations: a foreach allows only one sub-processor, and grok cannot append to existing lists/arrays.
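A minimal sketch of that shape (pipeline and field names here are assumed for illustration, not the actual package's names): the parent pipeline iterates the collected IPs and delegates each element to the helper pipeline that holds the grok processor and cleanup scripts.

```yaml
# Parent pipeline: foreach permits only a single sub-processor, so that one
# slot is spent on a pipeline processor delegating to the helper pipeline.
- foreach:
    field: _tmp.ips
    ignore_missing: true
    processor:
      pipeline:
        name: '{{ IngestPipeline "helper_ip_parsing" }}'
```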
The Elasticsearch error produced is as follows:
NOTE
The latest commit (elastic-package build) cleaned up a lot of formatting issues, hence there are added changes across many data streams. I think these are good in the long run, so I kept them in.
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots