Hello,
When a CloudFront log is ingested via Elastic's pipeline, a URL path like `/my` is parsed correctly, but one like `/en(test)` is not.
Here's a way to reproduce this with a CloudFront log line:
```
POST /_ingest/pipeline/cloudfront/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2024-07-13\t15:29:45\tEWR53-C1\t198083\t127.0.0.1\tGET\txxxxxxxxxxxxx.cloudfront.net\t/en(test)\t404\thttps://domain.tld/\tUser-Agent:%20Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.1;%20360SE)\t-\t-\tError\tsomevalidbase64==\tdomain.tld\thttps\t609\t0.318\t-\tTLSv1.3\tTLS_AES_128_GCM_SHA256\tError\tHTTP/1.1\t-\t-\t50294\t0.318\tError\ttext/html\t-\t-\t-"
      }
    }
  ]
}
```
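For reference, the failing value `/en(test)` is the 8th tab-separated field of the sample message, which AWS's standard CloudFront log format calls `cs-uri-stem`. A quick Python check (a sketch; only the first nine fields of the message above are copied here):

```python
# First nine tab-separated fields of the sample CloudFront log line.
message = (
    "2024-07-13\t15:29:45\tEWR53-C1\t198083\t127.0.0.1\tGET"
    "\txxxxxxxxxxxxx.cloudfront.net\t/en(test)\t404"
)
fields = message.split("\t")

# Fields are counted from 1 in the question; Python indexes from 0.
field_8 = fields[7]
print(field_8)  # /en(test)
```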
Depending on the input, the pipeline fails with one of the following errors:

```
Provided Grok expressions do not match field value
```

or

```
grok pattern matching was interrupted after [1000] ms
```
Both are bad, and the second also slows down writes significantly.
The thing is, these URLs are technically valid (RFC 3986 allows parentheses in path segments) and they do show up in real traffic.
Do you think it would be better to switch the Grok pattern for field 8 (`cs-uri-stem`) in the pipeline from `UNIXPATH` to `URIPATH`?
Thanks in advance.
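A rough way to compare the two patterns outside Elasticsearch is sketched below. Assumptions: the regex bodies are transcribed from the legacy Grok pattern set bundled with Elasticsearch (the standard set calls the URL-path pattern `URIPATH`), the CloudFront pipeline's field 8 resolves to the same `UNIXPATH` definition, and Python's `re` behaves like Elasticsearch's Joni engine for these simple patterns. The helper `full_match` is hypothetical. A plausible but unverified explanation for the 1000 ms timeout is that `UNIXPATH` nests a `+` inside `*` inside `+`, so a long value that fails near the end can backtrack exponentially; `URIPATH` has only a single character class under each quantifier.

```python
import re

# Assumed transcriptions of the legacy Grok definitions.
UNIXPATH = r"(/([\w_%!$@:.,+~-]+|\\.)*)+"
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+"


def full_match(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


for path in ["/my", "/en(test)"]:
    # Prints the path, then whether UNIXPATH and URIPATH accept it.
    print(path, full_match(UNIXPATH, path), full_match(URIPATH, path))
```

With these definitions, `/my` matches both patterns, while `/en(test)` matches only `URIPATH`. One difference worth checking before switching: `URIPATH` accepts only ASCII letters and digits and has no backslash-escape branch, whereas `\w` in `UNIXPATH` is broader. It also excludes `?`, which should be harmless here because CloudFront logs the query string in a separate field.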