Update: Add support for x_forwarded_for headers in apaches access logs#3251
Update: Add support for x_forwarded_for headers in apaches access logs#3251redcinelli merged 17 commits intoelastic:mainfrom redcinelli:apache/support_x_forwarded_for_headers
Conversation
🌐 Coverage report
|
|
I believe my pr to be good for review 🤞 At least open for discussion. Btw seeing the whole mess the |
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/_dev/test/pipeline/test-access-vhost.log-expected.json
Outdated
Show resolved
Hide resolved
|
@andrewkroh I believe it is worth taking a look again ^^ Thanks a lot for helping me. |
andrewkroh
left a comment
There was a problem hiding this comment.
I was reviewing this from technical perspective on the code/config in the pipeline. I didn't really consider the big picture with respect to ECS mappings, the field meanings, and whether this is a breaking change.
Taking a step back and looking at the approach here, this looks like a change in behavior w.r.t. to how source is populated. IMO the source.address should be the IP (or hostname) where traffic came from (the last hop). That is %{c}h or %h or (barring any changes from things like mod_remoteip) in mod_log_config format. This would be the same source address observed if we looked at a packet capture.
The ECS network.forwarded_ip would be populated only when X-Forwarded-For is used because this field is only used when the source address represents a proxy. I think the first address in the X-Forwarded-For list would always be used to fill network.forwarded_ip.
I think the new apache.access.remote_ip_list field should be the x-forwarded-for list plus the remote hostname (%h value). This would mean that all the hops from the client through the last proxy are accounted for in the list. And given the reference info you pointed me to, perhaps the name of this field is better as apache.access.remote_addresses (data type keyword) so that we can take the raw values without caring if they are IPs.
In order to have both %h and %{X-Forwarded-For}i available in the log I think we should prescribe formats in the docs that it will accept. We want to continue accepting the current format as is and we also want to allow X-Forwarded-For to be added by users. So perhaps we should allow another header to be tacked on to the end like:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" X-Forwarded-For=\"%{X-Forwarded-For}i\"
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
andrewkroh
left a comment
There was a problem hiding this comment.
Overall LGTM, but I think there is one thing missing -- docs. Users need to know how to configure there HTTPD service to output one of the formats the integration expects. Can you take a look at the readme (should be in packages/apache/_dev/...) and see if you can add a blurb about the accepted formats.
andrewkroh
left a comment
There was a problem hiding this comment.
The doc additions look good. The package needs to be build to update the generated readme file (e.g. elastic-package build). Then there is a conflict with the changelog and version that need resolved.
andrewkroh
left a comment
There was a problem hiding this comment.
LGTM.
This will require a review from @elastic/obs-service-integrations.
|
@elastic/obs-service-integrations Hi guys ? have you had the time to look at this PR ? I'll fix the conflict asap |
This Pr is heavily inpired by the work done in [this PR](elastic/beats#4417) It is adressing [this ER](elastic/enhancements#14402). `Grok` pattern has been updated to match logs starting with a list of IP adresses and store all those ip in `apache.access.remote_ip`. This pattern is heavily insipred by the one in the nginx integration. I also decided to fill a new field `network.forwarded_ip` as it seems to be the perfect fit.
It want the expression ot be as lean as possible.
I removed the ?: non capturing group
- we want to capture it
- It was ignored somehow
I removed the %{NOTSPACE:source.address} match :
- I am not sure why it is was there in the first place
As x-forwarded-for mostly send ip, grok pattern is updated, to match only ip. To improve searchability remote_ip type is set to ip
apache.access.remote_adresses contain all IP/HOST the request rebounded of X-Forwarded-For is optional, should be tacked a the end of the logs network.forwarded_ip exist only when X-Forwarded_for is found Merge 2 similar pattern in GROK processor
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
| - external: ecs | ||
| name: message | ||
| - external: ecs | ||
| name: network.forwarded_ip |
There was a problem hiding this comment.
We are setting the value of network.forwarded_ip in the ingest pipeline.
Do we need to use the ecs version of this here ?
There was a problem hiding this comment.
I believe we have to.
If I do not I get this issue
FAILURE DETAILS:
apache/access test-access-basic.log:
[0] field "network.forwarded_ip" is undefined
while running this command elastic-package build && elastic-package stack up -d -v --services=elasticsearch && eval "$(elastic-package stack shellinit)" && elastic-package test pipeline --generate
There was a problem hiding this comment.
Let's move it to fields.yml then as assignment is happening under ingest pipeline
There was a problem hiding this comment.
done in ddaed22
There was a problem hiding this comment.
We decided to undo ddaed22
What does this PR do?
Tldr;
Enable support for apache logs with the x_forwarded_for header.
Heavily inspired by the work done in this PR
Addressing this Enhancement Request.
Grokpattern has been updated to match logs starting with a list of IP, storing it inapache.access.remote_ip.This pattern is heavily inspired by the nginx integration.
A painless scrip extract the first public ip, and store the result in
source.address.network.forwarded_ipfield is being filled bysource.ip, as it seems to be the perfect fit.Checklist
changelog.ymlfile.Author's Checklist
x_forward_for?httpd.conf?How to test this PR locally
Following the steps in the documentation
Related issues
None
Screenshots
None