With the Dataproc 2.2 image version, we recommend installing the Google Cloud Ops Agent to obtain system metrics.
This initialization action installs the Ops Agent on a Google Cloud Dataproc cluster and provides metrics similar to those from the --metric-sources=monitoring-agent-defaults setting, which was supported on image versions up to Dataproc 2.1.
This page highlights differences in metric collection between the Ops Agent and the legacy monitoring agent.
We provide two variants of this initialization action:
- opsagent.sh installs the Ops Agent. By default, it collects syslogs and system (node) metrics.
- opsagent_nosyslog.sh installs the Ops Agent and also specifies a user configuration that skips syslog collection from your cluster nodes. Without this user configuration, the Ops Agent collects syslogs in addition to the system (node) metrics. You can further customize this configuration to collect logs and metrics from other third-party applications.
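To make the "user configuration" idea concrete, here is a minimal sketch of a configuration that clears the receivers of the Ops Agent's default logging pipeline, so no syslogs are collected. The helper name write_nosyslog_config and the CONFIG_PATH variable are hypothetical, and the actual contents of opsagent_nosyslog.sh may differ. The sketch writes to a scratch path; on a cluster node the target is assumed to be /etc/google-cloud-ops-agent/config.yaml, followed by a restart of the google-cloud-ops-agent service.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: writes an Ops Agent user configuration that leaves the
# default logging pipeline with no receivers, so syslog is not collected.
# Metrics settings are not overridden, so built-in metric collection is kept.
write_nosyslog_config() {
  config_path="$1"
  mkdir -p "$(dirname "${config_path}")"
  cat > "${config_path}" <<'EOF'
logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []
EOF
}

# Assumption: on a real node this would be /etc/google-cloud-ops-agent/config.yaml
# and the agent would need a restart afterwards. A scratch path is used here so
# the sketch can run without root privileges.
CONFIG_PATH="${CONFIG_PATH:-$(mktemp -d)/config.yaml}"
write_nosyslog_config "${CONFIG_PATH}"
cat "${CONFIG_PATH}"
```

Additional receivers and pipelines for third-party applications would be added to the same file.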
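Starting August 18, 2025, newly created Dataproc clusters have cluster-level syslog collection enabled by default, with the dataproc.logging.syslog.enabled property set to true. This new default behavior can lead to log duplication if the Ops Agent is also configured to collect syslogs.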
To prevent duplicate logs, we recommend using opsagent_nosyslog.sh. If you need to disable cluster-level syslog collection entirely, you can set the dataproc.logging.syslog.enabled property to false during cluster creation. For more details, refer to the Dataproc Release Notes and Dataproc Logs documentation.
If you are looking to match the behavior of Dataproc image versions up to 2.1 with --metric-sources=monitoring-agent-defaults without ingesting syslogs, please use opsagent_nosyslog.sh and additionally set the dataproc.logging.syslog.enabled property to false during cluster creation.
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
--image-version=2.2 \
--region=${REGION} \
--properties dataproc:dataproc.logging.syslog.enabled=false \
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent_nosyslog.sh

The following approach, which uses opsagent.sh, is not recommended as of August 18, 2025, because cluster-level syslog collection is enabled by default for newly created clusters:
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
--image-version=2.2 \
--region=${REGION} \
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent.sh
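
The two recommended setups described above (keep cluster-level syslog collection and avoid duplicates, or ingest no syslogs at all) can be sketched as a small dry-run helper. It only prints the command and does not call gcloud. The function name build_create_command and the mode names cluster and none are hypothetical labels for this sketch, not gcloud options.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints (does not run) a cluster-creation command.
#   cluster : keep the default cluster-level syslog collection; the Ops Agent
#             skips syslogs, so logs are not duplicated.
#   none    : additionally disable cluster-level syslog collection, matching
#             the pre-2.2 monitoring-agent-defaults behavior without syslogs.
build_create_command() {
  mode="$1"; cluster="$2"; region="$3"
  props=""
  case "${mode}" in
    cluster) ;;
    none) props=" --properties dataproc:dataproc.logging.syslog.enabled=false" ;;
    *) echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
  # Both modes use opsagent_nosyslog.sh so the Ops Agent never collects syslogs.
  printf 'gcloud dataproc clusters create %s --image-version=2.2 --region=%s%s --initialization-actions=gs://goog-dataproc-initialization-actions-%s/opsagent/opsagent_nosyslog.sh\n' \
    "${cluster}" "${region}" "${props}" "${region}"
}

# Example values only; substitute your own cluster name and region.
build_create_command none my-cluster us-central1
```

Printing the command first makes it easy to review the property and script combination before creating a cluster.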