[O11y][AWS] Rally benchmark aws.vpcflow#9242
Conversation
🚀 Benchmarks report
To see the full report, comment with `/test benchmark fullreport`.
```json
},
"log": {
    "file": {
        "path": "https://elastic-package-aws-bucket-63461.s3.us-east-1.amazonaws.com/extra-samples.log"
```
```json
"name": "{{ $aws_s3_bucket_arn }}-{{ div $long_num 10000 }}"
},
"object": {
    "key": "extra-samples.log"
```
it's more likely that the s3 object changes, rather than the bucket
Agree. I have kept some ranges which will limit the generation of bucket names, and generated the object values as well.
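For illustration, integer division is one way the template above keeps bucket-name cardinality low: `div $long_num 10000` maps a wide range of generated numbers onto a handful of suffixes. A minimal Go sketch (the sample values are hypothetical, not taken from the benchmark config):

```go
package main

import "fmt"

func main() {
	// Hypothetical generated long_num values; the template's
	// `div $long_num 10000` collapses them into few bucket-name suffixes.
	suffixes := map[int64]bool{}
	for _, n := range []int64{10150, 19999, 29999, 63461} {
		s := n / 10000 // integer division, like sprig's `div`
		suffixes[s] = true
		fmt.Printf("{{ $aws_s3_bucket_arn }}-%d\n", s)
	}
	fmt.Printf("distinct suffixes: %d\n", len(suffixes)) // 3 for these 4 inputs
}
```

Four widely spread inputs produce only three distinct bucket names, which is the cardinality-limiting effect described above.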
```
{{- $network_direction := generate "network_direction" }}
{{- $duration_start := generate "duration_start" }}
{{- $duration_end := generate "duration_end" }}
{{- $aws_vpcflow_start := generate "timestamp" | date_modify (print $duration_start) }}
```
Beware of calling generate multiple times on the same field.
For example, here you have timestamp with period: -24h. You could expect 20000 timestamps evenly distributed across 24 hours, but since you are calling generate for it 3 times in the template you end up with 60000 (20000 x 3). They will be evenly distributed across 24 hours, but each group of 3 sequential values will belong to the same document.
In general, every generate call applies the generation logic and produces a new value.
By the way, I remember having worked on a vpcflow template before: it was for schema-A (data coming from the logs source, i.e. the actual vpcflow log files), rather than for schema-B (data sent from the Agent as generated by the integration, i.e. what you have here).
You can still reuse most of it and merge in what you have added here that isn't covered.
Look at elastic/elastic-package#984 (comment) for the Data Schemas (schema-A, schema-B, etc.).
Updated to reuse the timestamp.
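A rough sketch of what the reuse looks like (variable and function names follow the diff above; this is not the exact committed template):

```
{{- $timestamp := generate "timestamp" }}
{{- $aws_vpcflow_start := $timestamp | date_modify (print $duration_start) }}
{{- $aws_vpcflow_end := $timestamp | date_modify (print $duration_end) }}
```

With this shape, `generate "timestamp"` runs once per document, so both derived times share the same base value and the -24h period yields exactly one timestamp per document.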
/test benchmark fullreport

/test
… into aws_benchmark_vpcflow
💚 Build Succeeded
cc @aliabbas-elastic
Proposed commit message
Rally benchmark for the vpcflow data stream of AWS.
Sample Response
sample_event.json
```json
{
  "agent": { "name": "aws-scale-123456", "id": "de42127b-4db8-4471-824e-a7b14f478663", "ephemeral_id": "22ed892c-43bd-408a-9121-65e2f5b6a56e", "type": "filebeat", "version": "8.8.0" },
  "benchmark_metadata": { "info": { "run_id": "bdb49b49-8b44-4088-8c8c-ccb37a9f62a6", "benchmark": "vpcflow-benchmark" } },
  "log": { "file": { "path": "https://elastic-package-aws-bucket-63461.s3.us-east-1.amazonaws.com/extra-samples.log" }, "offset": 338 },
  "elastic_agent": { "id": "de42127b-4db8-4471-824e-a7b14f478663", "version": "8.8.0", "snapshot": false },
  "destination": { "address": "176.195.180.251", "port": 33232, "ip": "176.195.180.251" },
  "source": { "address": "79.27.140.59", "port": 52081, "bytes": 335, "ip": "79.27.140.59", "packets": 90 },
  "tags": [ "preserve_original_event", "forwarded", "aws-vpcflow" ],
  "network": { "community_id": "1:yIcWQWM3N3+OxH8joWOdJgiZusE=", "bytes": 335, "transport": "tcp", "type": "ipv4", "iana_number": "6", "packets": 90, "direction": "ingress" },
  "cloud": { "availability_zone": "us-east-1e", "instance": { "id": "i-101502913101502913" }, "provider": "aws", "region": "ap-northeast-1", "account": { "id": "295670701461" } },
  "input": { "type": "aws-s3" },
  "@timestamp": "2024-02-26T13:09:57.000Z",
  "ecs": { "version": "8.0.0" },
  "related": { "ip": [ "79.27.140.59", "176.195.180.251" ] },
  "data_stream": { "namespace": "ep", "type": "logs", "dataset": "aws.vpcflow" },
  "aws": {
    "s3": {
      "bucket": { "name": "goat-bone-kicker-zirconviper-10150", "arn": "arn:aws:s3:::goat-bone-kicker-zirconviper-10150" },
      "object": { "key": "extra-samples.log" }
    },
    "vpcflow": {
      "vpc_id": "vpc-flameservant101502913",
      "pkt_srcaddr": "249.157.214.239",
      "pkt_src_service": "AMAZON_APPFLOW",
      "type": "IPv4",
      "traffic_path": "1",
      "tcp_flags": "1",
      "action": "ACCEPT",
      "pkt_dstaddr": "33.66.229.152",
      "tcp_flags_array": [ "fin" ],
      "version": "2",
      "instance_id": "i-101502913101502913",
      "account_id": "295670701461",
      "log_status": "SKIPDATA",
      "pkt_dst_service": "DYNAMODB",
      "interface_id": "bear-thunderpython",
      "subnet_id": "subnet-copperspirit101502913",
      "sublocation": { "id": "gingerleopard", "type": "wavelength" }
    }
  },
  "event": {
    "agent_id_status": "auth_metadata_missing",
    "ingested": "2024-02-26T13:20:00Z",
    "original": "2 295670701461 bear-thunderpython 79.27.140.59 176.195.180.251 52081 33232 6 90 335 1708949997 1708952997 ACCEPT SKIPDATA vpc-flameservant101502913 subnet-copperspirit101502913 i-101502913101502913 1 IPv4 249.157.214.239 33.66.229.152 ap-northeast-1 us-east-1e wavelength gingerleopard AMAZON_APPFLOW DYNAMODB ingress 1",
    "kind": "event",
    "start": "2024-02-26T12:19:57.000Z",
    "end": "2024-02-26T13:09:57.000Z",
    "type": [ "connection" ],
    "category": [ "network" ],
    "dataset": "aws.vpcflow",
    "outcome": "success"
  }
}
```
Checklist
How to test this PR locally
Run these commands from the package root:

```shell
elastic-package benchmark rally --benchmark vpcflow-benchmark -v
elastic-package benchmark stream --benchmark vpcflow-benchmark -v
```

Related issues
Screenshots