[AWS] [EMR] Metrics data stream #6290

## Overview

Amazon EMR (formerly Amazon Elastic MapReduce) is a managed cluster platform for processing and analyzing large volumes of data. It runs big data frameworks such as Apache Hadoop, Apache Spark, Apache Hive, and Apache Flink.

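AWS EMR clusters push metrics to CloudWatch at 5-minute intervals. According to the EMR documentation linked under Docs, this interval is not configurable; the 1-minute detailed monitoring option applies to the underlying EC2 instances, not to the EMR metrics themselves.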
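Because datapoints arrive on a fixed grid, a collector would typically query whole periods rather than arbitrary windows. The sketch below shows one way to compute the most recent complete window; the function name `aligned_window` and the 300-second default are illustrative assumptions, not part of any existing module.

```python
from datetime import datetime, timedelta, timezone

# Assumption: default publishing interval of five minutes, in seconds.
DEFAULT_PERIOD_SECONDS = 300


def aligned_window(now, period_seconds=DEFAULT_PERIOD_SECONDS):
    """Return (start, end) of the most recent complete period before `now`.

    `now` is expected to be a timezone-aware datetime.
    """
    epoch = int(now.timestamp())
    # Round down to the nearest period boundary.
    end_epoch = epoch - (epoch % period_seconds)
    end = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    start = end - timedelta(seconds=period_seconds)
    return start, end


if __name__ == "__main__":
    start, end = aligned_window(datetime.now(timezone.utc))
    print(start.isoformat(), end.isoformat())
```

Querying with a period shorter than the publishing interval would mostly return empty windows, which is why the period is passed in rather than hard-coded at the call site.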

Collect EMR Hadoop 2.x metrics from AWS EMR clusters.

## Steps/Tasks

- Set up an AWS EMR cluster
    - Set up an EMR cluster with Hadoop 2.x.
- Fetch metrics using the generic `cloudwatch` metricset of the Metricbeat `aws` module
    - Use the `cloudwatch` metricset to fetch the required EMR Hadoop 2.x metrics.
- Create AWS EMR metrics integration
    - Integrate the collected metrics into our existing monitoring infrastructure.
    - Ensure proper handling and processing of EMR Hadoop 2.x metrics.
- Create AWS EMR metrics data stream documentation
    - Create documentation detailing the newly created data stream for EMR Hadoop 2.x metrics.
    - Include comprehensive information about metric names, dimensions, and their meanings.
- Add pipeline & pipeline tests (if needed)
    - Implement a pipeline, if necessary, to process the collected metrics.
    - Develop tests to verify the correctness of the pipeline.
- Add system tests using Terraform
    - Use Terraform to create automated system tests that validate end-to-end metrics collection.
    - Verify the accurate collection and ingestion of the metrics.
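The verification step at the end of the list can be sketched as a simple completeness check. This is only an illustration: the document shape (a `metrics` mapping keyed by metric name) and the helper name `missing_metrics` are hypothetical and do not reflect the field layout the integration will actually emit.

```python
def missing_metrics(expected, documents):
    """Return the expected metric names found in none of the documents, sorted.

    Assumption: each document is a dict with an optional "metrics" mapping
    whose keys are CloudWatch metric names. The real field layout may differ.
    """
    seen = set()
    for doc in documents:
        seen.update(doc.get("metrics", {}).keys())
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    # Hypothetical ingested documents, for illustration only.
    docs = [
        {"metrics": {"IsIdle": {"avg": 1.0}}},
        {"metrics": {"AppsRunning": {"avg": 0.0}}},
    ]
    print(missing_metrics(["IsIdle", "AppsRunning", "HDFSUtilization"], docs))
```

A system test could fail when the returned list is non-empty, which surfaces metrics that were expected but never ingested.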

## Docs

Metrics and dimensions that should be collected for Hadoop 2.x clusters:
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html
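To make the metric and dimension structure concrete, the following sketch builds the query list for the CloudWatch `GetMetricData` API. It is not the Metricbeat configuration itself, only an illustration of the request shape underneath it. The namespace `AWS/ElasticMapReduce` and the `JobFlowId` dimension come from the page linked above; the metric list is a small illustrative subset, and `build_queries` and the cluster ID are hypothetical.

```python
EMR_NAMESPACE = "AWS/ElasticMapReduce"

# Illustrative subset of Hadoop 2.x metrics; the linked page has the full list.
SAMPLE_METRICS = ["IsIdle", "AppsRunning", "AppsPending", "HDFSUtilization", "LiveDataNodes"]


def build_queries(cluster_id, metric_names, period=300, stat="Average"):
    """Build a MetricDataQueries list with one query per metric name."""
    queries = []
    for i, name in enumerate(metric_names):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{i}",
            "MetricStat": {
                "Metric": {
                    "Namespace": EMR_NAMESPACE,
                    "MetricName": name,
                    # JobFlowId is the EMR cluster ID.
                    "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


if __name__ == "__main__":
    # "j-EXAMPLE" is a placeholder, not a real cluster ID.
    for query in build_queries("j-EXAMPLE", SAMPLE_METRICS):
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

With boto3 installed and credentials configured, the result could be passed as `MetricDataQueries` to `boto3.client("cloudwatch").get_metric_data(...)` together with `StartTime` and `EndTime`.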
