Extend metrics schema #273

Merged
THardy98 merged 10 commits into main from extend-metrics-schema
Dec 15, 2025

@THardy98
Contributor

THardy98 commented Dec 12, 2025

What was changed

Minor cleanup/code movement and adding additional fields to MetricLine.

Added fields still need to be plumbed through (will be empty for now):

  • Environment: environment that the scenario was run from
  • BuildID: build ID of the worker
  • Scenario: scenario name
  • RunConfigProfile: profile name of the run configuration
  • WorkerConfigProfile: profile name of the worker configuration

where a profile maps to a configuration.
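As a rough illustration of the extended schema, here is a minimal sketch of what the struct and a query-side filter could look like. The five new field names come from the list above; `Metric`, `Value`, `filterLines`, and the profile names are hypothetical placeholders, since the existing shape of MetricLine is not shown in this PR description.

```go
package main

import "fmt"

// MetricLine is a sketch of one exported metrics row.
// Metric and Value are assumed placeholders for the existing fields.
type MetricLine struct {
	Metric string
	Value  float64

	// Fields added by this PR (empty until plumbed through).
	Environment         string // environment the scenario was run from
	BuildID             string // build ID of the worker
	Scenario            string // scenario name
	RunConfigProfile    string // profile name of the run configuration
	WorkerConfigProfile string // profile name of the worker configuration
}

// filterLines is a hypothetical helper showing the kind of filtering the
// new fields allow: keep rows for one scenario and one worker profile.
func filterLines(lines []MetricLine, scenario, workerProfile string) []MetricLine {
	var out []MetricLine
	for _, l := range lines {
		if l.Scenario == scenario && l.WorkerConfigProfile == workerProfile {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []MetricLine{
		{Metric: "cpu", Value: 0.4, Scenario: "throughput_stress", WorkerConfigProfile: "small"},
		{Metric: "cpu", Value: 0.9, Scenario: "throughput_stress", WorkerConfigProfile: "large"},
		{Metric: "cpu", Value: 0.1, Scenario: "other", WorkerConfigProfile: "small"},
	}
	for _, l := range filterLines(lines, "throughput_stress", "small") {
		fmt.Printf("%s=%v\n", l.Metric, l.Value)
	}
}
```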

I'd like some feedback to establish the metrics schema. The goal is a schema that we don't anticipate having to change.

Plumbing to come in a subsequent PR (in progress, will link).

Why?

It lets us filter queries on additional parameters that will be useful.
It establishes the schema that the data team will create omni topics for.

How was this tested:

go run ./cmd run-scenario-with-worker \
    --scenario throughput_stress \
    --run-id test-$(date +%s) \
    --language go \
    --duration 10m \
    --prom-listen-address 127.0.0.1:9091 \
    --worker-prom-listen-address 127.0.0.1:9092 \
    --prom-instance-addr 127.0.0.1:9090 \
    --prom-instance-config \
    --prom-snapshot \
    --prom-export-worker-metrics go-10m.parquet \
    --prom-export-worker-job omes-worker \
    --prom-export-metrics-step 3

…, would be nice to avoid having to strip the "worker-" prefix when passing args to worker start command
THardy98 requested a review from a team as a code owner December 12, 2025 17:26
THardy98 requested a review from Sushisource December 12, 2025 18:22
@THardy98
Contributor Author

THardy98 commented Dec 12, 2025

FWIW @Sushisource, we can't just pass flags through to plumb this. The metric line contains state from both the scenario runner and the worker, but they'll be running in separate processes and, in cloud, on separate VMs.

The prom instance already has the address of the worker, I want to add an /info endpoint to the worker's prom metrics server, which we can query to get the worker's run config at metrics export time.
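To make the /info idea concrete, here is a minimal sketch using only the Go standard library. `WorkerInfo`, its JSON field names, `newWorkerMux`, and `probe` are hypothetical names for illustration; in the real worker the handler would presumably be registered on the same mux that already serves /metrics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// WorkerInfo is an assumed payload: the worker-side values that the
// scenario runner cannot know when the two run as separate processes.
type WorkerInfo struct {
	BuildID             string `json:"build_id"`
	WorkerConfigProfile string `json:"worker_config_profile"`
}

// infoHandler serves the worker's static run info as JSON on GET.
func infoHandler(info WorkerInfo) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := json.Marshal(info)
		if err != nil {
			http.Error(w, "failed to encode worker info", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
}

// newWorkerMux registers /info; the existing /metrics handler would be
// registered on the same mux (omitted here to stay stdlib-only).
func newWorkerMux(info WorkerInfo) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/info", infoHandler(info))
	return mux
}

// probe issues an in-memory request against h and returns status and body.
func probe(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newWorkerMux(WorkerInfo{BuildID: "example-build", WorkerConfigProfile: "example-profile"})
	code, body := probe(mux, http.MethodGet, "/info")
	fmt.Println(code, body)
}
```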

I also want to separate MetricsOptions into distinct ClientMetricsOptions and WorkerMetricsOptions. They're distinct enough that it doesn't make sense for them to share the same logic or flags:

  • PromInstanceOptions will only be in ClientMetricsOptions (this ties the lifetime of the prom instance to the client, which means it's scoped to a scenario. I don't see any benefit to running a prom instance tied to the worker's lifetime)
  • options to set up the /metrics server will only be in WorkerMetricsOptions (I don't see any benefit from capturing the metrics of the scenario runner)
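The proposed split could be sketched roughly as follows. The three type names come from the comment above and the flag names are taken from the test command in the PR description; the field names, the `AddFlags` methods, the parse helpers, and the use of the standard `flag` package are assumptions made for illustration (the project may well use a different flag library).

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// PromInstanceOptions configures the scenario-scoped prom instance.
type PromInstanceOptions struct {
	Addr     string
	Snapshot bool
}

// ClientMetricsOptions is owned by the scenario runner only.
type ClientMetricsOptions struct {
	PromInstance PromInstanceOptions
}

// WorkerMetricsOptions is owned by the worker only.
type WorkerMetricsOptions struct {
	PromListenAddress string
}

// AddFlags registers client-side flags; the worker never sees these.
func (o *ClientMetricsOptions) AddFlags(fs *flag.FlagSet) {
	fs.StringVar(&o.PromInstance.Addr, "prom-instance-addr", "", "address of the prom instance")
	fs.BoolVar(&o.PromInstance.Snapshot, "prom-snapshot", false, "take a snapshot on shutdown")
}

// AddFlags registers the worker's /metrics server flags.
func (o *WorkerMetricsOptions) AddFlags(fs *flag.FlagSet) {
	fs.StringVar(&o.PromListenAddress, "prom-listen-address", "", "address for the /metrics server")
}

func parseClientFlags(args []string) (ClientMetricsOptions, error) {
	var o ClientMetricsOptions
	fs := flag.NewFlagSet("client", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage noise out of the example
	o.AddFlags(fs)
	err := fs.Parse(args)
	return o, err
}

func parseWorkerFlags(args []string) (WorkerMetricsOptions, error) {
	var o WorkerMetricsOptions
	fs := flag.NewFlagSet("worker", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	o.AddFlags(fs)
	err := fs.Parse(args)
	return o, err
}

func main() {
	c, err := parseClientFlags([]string{"--prom-instance-addr", "127.0.0.1:9090", "--prom-snapshot"})
	fmt.Println(c.PromInstance.Addr, c.PromInstance.Snapshot, err)
	w, err := parseWorkerFlags([]string{"--prom-listen-address", "127.0.0.1:9092"})
	fmt.Println(w.PromListenAddress, err)
}
```

With separate flag sets, passing a prom-instance flag to the worker fails at parse time rather than being silently accepted.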

LMK if this makes sense and I'll have the PR up

THardy98 force-pushed the extend-metrics-schema branch from 1faa3c1 to 201fc64 December 12, 2025 18:43
@Sushisource
Member

@THardy98

, I want to add an /info endpoint to the worker's prom metrics server, which we can query to get the worker's run config at metrics export time.

Is this just for reporting it with the tests? Shouldn't we know already because we set it? (I'm not necessarily opposed, just asking).

The separation of types makes sense to me

@THardy98
Contributor Author

Yes - it'd be for reporting, but I can see the client having worker config/info being useful generally.

If we spawn the worker and client in separate processes, their configurations are not provided to each other. run-scenario-with-worker is an exception because we spawn both from a single process, so the client has complete information.
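For the separate-process case, the client side could look something like the sketch below: at export time it asks the worker for its info over HTTP instead of relying on shared flags. `WorkerInfo`, its JSON field names, `fetchWorkerInfo`, and `demo` are hypothetical; the throwaway `httptest` server only stands in for a real worker so the example runs on its own.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// WorkerInfo is the assumed payload returned by the worker's /info endpoint.
type WorkerInfo struct {
	BuildID             string `json:"build_id"`
	WorkerConfigProfile string `json:"worker_config_profile"`
}

// fetchWorkerInfo queries baseURL+"/info" and decodes the JSON response.
func fetchWorkerInfo(ctx context.Context, client *http.Client, baseURL string) (WorkerInfo, error) {
	var info WorkerInfo
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/info", nil)
	if err != nil {
		return info, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return info, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return info, fmt.Errorf("unexpected status %d from worker /info", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&info)
	return info, err
}

// demo starts a stand-in worker on a loopback port and fetches its info.
func demo() (WorkerInfo, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(WorkerInfo{BuildID: "example-build", WorkerConfigProfile: "example-profile"})
	}))
	defer srv.Close()

	// Bound the call so a hung worker cannot stall the metrics export.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return fetchWorkerInfo(ctx, srv.Client(), srv.URL)
}

func main() {
	info, err := demo()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("build=%s profile=%s\n", info.BuildID, info.WorkerConfigProfile)
}
```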

THardy98 merged commit 1145d9b into main Dec 15, 2025
52 of 53 checks passed
THardy98 deleted the extend-metrics-schema branch December 15, 2025 17:49