[Custom AWS Logs] Add multiline support for aws-s3 input#6081

Merged: kaiyan-sheng merged 2 commits into elastic:main from kaiyan-sheng:multiline on May 16, 2023

@kaiyan-sheng kaiyan-sheng commented May 3, 2023

What does this PR do?

This PR adds multiline support to the Custom AWS Logs integration. Here is what the policy looks like after the multiline parser is added:

    streams:
      - id: aws-s3-aws_logs.generic-4cc5036f-5f6b-4040-8e4c-ca2720f1481e
        data_stream:
          dataset: aws_logs.generic
        bucket_arn: test_arn
        number_of_workers: 1
        bucket_list_interval: 120s
        max_bytes: 10MiB
        max_number_of_messages: 5
        sqs.max_receive_count: 5
        sqs.wait_time: 20s
        file_selectors: null
        access_key_id: aa
        secret_access_key: bb
        tags:
          - forwarded
        publisher_pipeline.disable_host: true
        parsers:
          - multiline:
              pattern: ^<Event
              negate: true
              match: after
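To make the effect of `pattern: ^<Event`, `negate: true`, `match: after` concrete, here is a minimal Python sketch of the grouping rule as I understand it: a line that matches the pattern starts a new event, and every following line that does not match is appended to it. The function name `group_multiline` is hypothetical, and this is only a simulation for illustration, not the Beats implementation (it ignores limits such as maximum lines and flush timeouts).

```python
import re


def group_multiline(lines, pattern=r"^<Event"):
    """Simulate negate: true / match: after.

    Lines matching `pattern` begin a new event; non-matching lines are
    appended to the event currently being built.
    """
    start = re.compile(pattern)
    events, current = [], []
    for line in lines:
        # A matching line closes the event in progress (if any).
        if start.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


# Illustrative input shaped like an XML event log (assumed sample data).
sample = [
    "<Event><Data>",
    "\tA",
    "\tB",
    "\tC</Data></Event>",
    "<Event><Data>",
    "\tD",
    "\tE",
    "\tF</Data></Event>",
]
events = group_multiline(sample)
print(len(events))  # 2
```

Each of the two resulting events carries the full `<Event>...</Event>` block as a single message, which is the behavior the parser configuration above is meant to produce.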

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

Step 1: Create an S3 bucket and upload a file containing a multiline log sample. For example, this is the file I'm using:

kaiyansheng ~/Documents  $ cat test_multiline.log 
<Event><Data>
	A
	B
	C</Data></Event>
<Event><Data>
	D
	E
	F</Data></Event>
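If you prefer to script this step, the following is a rough sketch that writes a well-formed two-event sample modeled on the file above and optionally uploads it. The helper names `write_sample` and `upload_sample` and the bucket name in the usage note are hypothetical. The upload uses boto3's `upload_file`, assumes boto3 is installed and AWS credentials are already configured, and is imported lazily so the file-writing part runs without it.

```python
from pathlib import Path

# Two multiline events, each starting with "<Event" at the beginning of a line.
SAMPLE = (
    "<Event><Data>\n\tA\n\tB\n\tC</Data></Event>\n"
    "<Event><Data>\n\tD\n\tE\n\tF</Data></Event>\n"
)


def write_sample(path):
    """Write the sample log to `path` and return it as a Path."""
    path = Path(path)
    path.write_text(SAMPLE, encoding="utf-8")
    return path


def upload_sample(path, bucket, key="test_multiline.log"):
    """Upload the sample to S3 (assumes boto3 and valid AWS credentials)."""
    import boto3  # third-party dependency, imported only when uploading

    boto3.client("s3").upload_file(str(path), bucket, key)
```

For example, `upload_sample(write_sample("test_multiline.log"), "my-test-bucket")` would create the file locally and push it to a bucket you own; replace the bucket name with your own.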

Step 2: Set up an agent and an agent policy that uses the Custom AWS Logs integration, with valid AWS credentials and the S3 bucket ARN. The agent policy looks like this:

inputs:
  - id: aws-s3-aws_logs-a6d98725-c4ee-4292-acb7-558e747ac909
    name: aws_logs-1
    revision: 1
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.4.0
    data_stream:
      namespace: default
    package_policy_id: a6d98725-c4ee-4292-acb7-558e747ac909
    streams:
      - id: aws-s3-aws_logs.generic-a6d98725-c4ee-4292-acb7-558e747ac909
        data_stream:
          dataset: aws_logs.generic
        bucket_arn: 'arn:aws:s3:::test-cloudfront-ks'
        number_of_workers: 1
        bucket_list_interval: 120s
        max_bytes: 10MiB
        max_number_of_messages: 5
        sqs.max_receive_count: 5
        sqs.wait_time: 20s
        file_selectors: null
        access_key_id: foo
        secret_access_key: boo
        tags:
          - preserve_original_event
          - forwarded
        publisher_pipeline.disable_host: true
        parsers:
          - multiline:
              pattern: ^<Event
              negate: true
              match: after

Step 3: Once the agent starts running, you should see test_multiline.log being read and its events stored in Elasticsearch:
Screenshot 2023-05-05 at 11 01 57 AM

Screenshots

Screenshot 2023-05-03 at 3 45 58 PM

@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner May 3, 2023 21:46
@kaiyan-sheng kaiyan-sheng self-assigned this May 3, 2023

elasticmachine commented May 3, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-05-05T17:13:17.639+0000

  • Duration: 14 min 33 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 2
  • Skipped: 0
  • Total: 2

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      100.0% (0/0)                💚
Files         100.0% (0/0)                💚
Classes       100.0% (0/0)                💚
Methods       66.667% (2/3)               👎 -29.487
Lines         100.0% (0/0)                💚
Conditionals  100.0% (0/0)                💚

@kaiyan-sheng kaiyan-sheng added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label May 5, 2023
@tdancheva tdancheva self-requested a review May 9, 2023 16:32

@tdancheva tdancheva left a comment

Hey, looks good to me, works as expected!

P.S. A side note on the parsers: I noticed they have been in beta for a long time now, even though they are used in many places. After a short chat with @kaiyan-sheng, she explained this was due to security checks that were pending at the time, and that we should open an issue to call for GA. Putting it here as a reminder to do so.

@kaiyan-sheng

Thanks @tdancheva! @elastic/security-external-integrations, do you guys have a timeline for making the parsers GA in Beats?

@kaiyan-sheng kaiyan-sheng merged commit d06406c into elastic:main May 16, 2023
@kaiyan-sheng kaiyan-sheng deleted the multiline branch May 16, 2023 02:03
@elasticmachine

Package aws_logs - 0.4.0 containing this change is available at https://epr.elastic.co/search?package=aws_logs

@andrewkroh andrewkroh added the Integration:aws_logs Custom AWS Logs label Jul 22, 2024