
[AWS] [EMR] Metrics data stream #6290

@gpop63

Overview

Amazon EMR (formerly Amazon Elastic MapReduce) is a managed cluster platform for processing and analyzing large volumes of data. It runs big data frameworks such as Apache Hadoop, Apache Spark, Apache Hive, and Apache Flink, so users can process data with these tools without operating the underlying cluster infrastructure themselves.

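Amazon EMR clusters send metrics to CloudWatch at 5-minute intervals. According to the AWS documentation for EMR metrics, this interval is not configurable (the 1-minute detailed monitoring option applies to EC2 instance metrics, not to EMR cluster metrics).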

Collect EMR Hadoop 2.x metrics from AWS EMR clusters.
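
As a rough illustration of what a collector has to ask CloudWatch for, the sketch below builds the query list for a `GetMetricData` request against the `AWS/ElasticMapReduce` namespace, using a 300-second period to match the 5-minute reporting interval. It makes no network calls. The helper names `build_metric_queries` and `aligned_window` and the cluster ID `j-EXAMPLE` are assumptions made for illustration, not part of the integration.

```python
from datetime import datetime, timedelta, timezone

EMR_NAMESPACE = "AWS/ElasticMapReduce"  # CloudWatch namespace used by Amazon EMR
PERIOD_SECONDS = 300                    # matches the 5-minute reporting interval


def build_metric_queries(cluster_id, metric_names, stat="Average"):
    """Build a MetricDataQueries list for a CloudWatch GetMetricData call.

    Hypothetical helper: it only assembles the request structure.
    """
    queries = []
    for i, name in enumerate(metric_names):
        queries.append({
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": EMR_NAMESPACE,
                    "MetricName": name,
                    # EMR cluster metrics are keyed by the JobFlowId dimension
                    "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
                },
                "Period": PERIOD_SECONDS,
                "Stat": stat,
            },
        })
    return queries


def aligned_window(now, periods=2):
    """Return (start, end) aligned to PERIOD_SECONDS boundaries in UTC."""
    epoch = int(now.timestamp())
    end_epoch = epoch - (epoch % PERIOD_SECONDS)  # round down to the period
    end = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    return end - timedelta(seconds=periods * PERIOD_SECONDS), end


if __name__ == "__main__":
    queries = build_metric_queries("j-EXAMPLE", ["IsIdle", "AppsRunning"])
    start, end = aligned_window(datetime.now(timezone.utc))
    print(len(queries), start.isoformat(), end.isoformat())
```

With boto3 installed and credentials configured, the result could be passed to `get_metric_data(MetricDataQueries=queries, StartTime=start, EndTime=end)` on a CloudWatch client. In the integration itself, the Metricbeat module is expected to do this work.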

Steps/Tasks

  • Set up AWS EMR cluster
    • Set up an EMR cluster with Hadoop 2.x.
  • Fetch metrics using the generic CloudWatch Metricbeat module
    • Use the CloudWatch Metricbeat module to fetch the required EMR Hadoop 2.x metrics.
  • Create AWS EMR metrics integration
    • Integrate the collected metrics into our existing monitoring infrastructure.
    • Ensure proper handling and processing of EMR Hadoop 2.x metrics.
  • Create AWS EMR metrics data stream documentation
    • Create documentation detailing the newly created data stream for EMR Hadoop 2.x metrics.
    • Include comprehensive information about metric names, dimensions, and their meanings.
  • Add pipeline & pipeline tests (if needed)
    • Implement a pipeline, if necessary, to process the collected metrics.
    • Develop tests to verify the correctness of the pipeline.
  • Add system tests using Terraform
    • Use Terraform to create automated system tests that validate end-to-end metrics collection.
    • Verify the accurate collection and ingestion of the metrics.
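
The pipeline and pipeline-test tasks above can be pictured with a minimal sketch: one function shapes a CloudWatch datapoint into a nested document, and another merges datapoints for the same cluster and timestamp. The field layout `aws.emr.metrics.<name>.<stat>` and the function names are hypothetical, chosen only for illustration. The real integration would define its own mapping, most likely in an ingest pipeline rather than in Python.

```python
def to_event(cluster_id, metric_name, stat, value, timestamp):
    """Shape one CloudWatch datapoint into a nested document.

    The field layout is a hypothetical choice for illustration,
    not the integration's confirmed mapping.
    """
    return {
        "@timestamp": timestamp,
        "aws": {
            "dimensions": {"JobFlowId": cluster_id},
            "emr": {"metrics": {metric_name: {stat: value}}},
        },
    }


def merge_events(events):
    """Merge events that share a timestamp and cluster into one document."""
    merged = {}
    for ev in events:
        key = (ev["@timestamp"], ev["aws"]["dimensions"]["JobFlowId"])
        if key not in merged:
            merged[key] = {
                "@timestamp": ev["@timestamp"],
                "aws": {
                    "dimensions": dict(ev["aws"]["dimensions"]),
                    "emr": {"metrics": {}},
                },
            }
        target = merged[key]["aws"]["emr"]["metrics"]
        for name, stats in ev["aws"]["emr"]["metrics"].items():
            # Keep every statistic reported for the same metric name
            target.setdefault(name, {}).update(stats)
    return list(merged.values())


if __name__ == "__main__":
    ts = "2023-05-01T12:05:00Z"
    docs = merge_events([
        to_event("j-EXAMPLE", "IsIdle", "avg", 0.0, ts),
        to_event("j-EXAMPLE", "AppsRunning", "sum", 3.0, ts),
    ])
    print(docs)
```

A pipeline test in this spirit would feed known datapoints in and assert on the resulting document, which is the kind of check the "pipeline tests" task calls for.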

Docs

Metrics and dimensions that should be collected for Hadoop 2.x clusters:
