[aws] Cloudfront logs Ingest pipeline faulty uri stem grok pattern #10507

@georgivalentinov

Hello,

When a CloudFront log is ingested via Elastic's pipeline, the URL path /my is parsed successfully, but /en(test) is not.
Here's a way to reproduce this with a CloudFront log line:

POST /_ingest/pipeline/cloudfront/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2024-07-13\t15:29:45\tEWR53-C1\t198083\t127.0.0.1\tGET\txxxxxxxxxxxxx.cloudfront.net\t/en(test)\t404\thttps://domain.tld/\tUser-Agent:%20Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.1;%20360SE)\t-\t-\tError\tsomevalidbase64==\tdomain.tld\thttps\t609\t0.318\t-\tTLSv1.3\tTLS_AES_128_GCM_SHA256\tError\tHTTP/1.1\t-\t-\t50294\t0.318\tError\ttext/html\t-\t-\t-"
      }
    }
  ]
}

The simulation fails with one of the following messages:

Provided Grok expressions do not match field value

or

grok pattern matching was interrupted after [1000] ms

Both are bad. The second case also slows down writes significantly, since matching runs until the 1000 ms timeout before the document is rejected.
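To try other URL paths against the same pipeline, the simulate request body can be generated rather than edited by hand. The following is a minimal sketch that reuses the sample log line from the request above and swaps only field 8 (the URI stem). The names `SAMPLE_FIELDS`, `URI_STEM_INDEX` and `simulate_body` are hypothetical helpers introduced for illustration; the script only prints the JSON body and does not contact a cluster.

```python
import json

# Fields of the sample CloudFront log line from the request above.
# In the log itself they are separated by tab characters.
SAMPLE_FIELDS = [
    "2024-07-13", "15:29:45", "EWR53-C1", "198083", "127.0.0.1", "GET",
    "xxxxxxxxxxxxx.cloudfront.net", "/en(test)", "404", "https://domain.tld/",
    "User-Agent:%20Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.1;%20360SE)",
    "-", "-", "Error", "somevalidbase64==", "domain.tld", "https", "609",
    "0.318", "-", "TLSv1.3", "TLS_AES_128_GCM_SHA256", "Error", "HTTP/1.1",
    "-", "-", "50294", "0.318", "Error", "text/html", "-", "-", "-",
]

URI_STEM_INDEX = 7  # field 8 of the log line, zero-based


def simulate_body(uri_stem: str) -> dict:
    """Build a _simulate request body whose log line uses the given URI stem."""
    fields = list(SAMPLE_FIELDS)
    fields[URI_STEM_INDEX] = uri_stem
    return {"docs": [{"_source": {"message": "\t".join(fields)}}]}


if __name__ == "__main__":
    # Print one request body per path; paste it after
    # POST /_ingest/pipeline/cloudfront/_simulate
    for stem in ("/my", "/en(test)"):
        print(json.dumps(simulate_body(stem), indent=2))
```
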

These URLs are technically valid and do show up in the wild.
Do you think it would be better to switch the Grok pattern for field 8 (the URI stem) in the pipeline from UNIXPATH to URIPATH?
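The difference between the two patterns can be illustrated outside Elasticsearch. The sketch below uses what I believe are the legacy Logstash definitions of UNIXPATH and URIPATH (the stock URI path pattern); treat them as assumptions, since the patterns bundled with a given Elasticsearch version may differ, and Python's `re` engine is not identical to the one Grok uses. `matches_whole_field` is a hypothetical helper.

```python
import re

# Assumed definitions, taken from the legacy Logstash grok pattern set.
# The patterns shipped with a particular Elasticsearch version may differ.
UNIXPATH = r"(/([\w_%!$@:.,+~-]+|\\.)*)+"
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+"


def matches_whole_field(pattern: str, value: str) -> bool:
    """Return True if the pattern consumes the entire field value.

    In the log line the URI stem sits between two tab separators, so the
    pattern has to cover the field in full; fullmatch approximates that.
    """
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for path in ("/my", "/en(test)"):
        print(
            path,
            "UNIXPATH:", matches_whole_field(UNIXPATH, path),
            "URIPATH:", matches_whole_field(URIPATH, path),
        )
```

Under these assumptions, /my matches both patterns, while /en(test) matches only URIPATH, because the UNIXPATH character class does not include parentheses.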

Thanks in advance.

Labels: Integration:aws (AWS), Team:Obs-InfraObs (Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations])
