feat[batch-submitter]: add metrics #722

Merged
annieke merged 6 commits into master from feat/bs-metrics on May 4, 2021

Conversation

@annieke annieke (Contributor) commented May 1, 2021

Description
Adds key metrics to the batch submitter so we can monitor them on a dashboard.

Additional context
See the issue for the metric list.

@annieke annieke requested a review from karlfloersch as a code owner May 1, 2021 01:23
changeset-bot bot commented May 1, 2021

🦋 Changeset detected

Latest commit: 9a26456

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name | Type
@eth-optimism/batch-submitter | Patch


@annieke annieke requested a review from snario May 1, 2021 01:23
if (!wasBatchTruncated && !this._shouldSubmitBatch(batchSizeInBytes)) {
return
}
this.bsMetrics.numTxPerBatch.observe(endBlock - startBlock)
Contributor
Shouldn't this only be getting measured after the batch is successfully appended?

Contributor Author
that was my original plan, but when we get the transaction receipt we no longer have access to the endBlock and startBlock variables.

passing it through seems heavy for observing this value, and so did using the abi to parse the logs from the receipt. adding this here felt like a decent tradeoff since at this point logically we should be submitting the batch. let me know if you think differently though!

Contributor
Given that we have seen many errors related to submission of batches, I worry this will give us false data.

Couldn't this be on line 265?

Contributor Author

^that function may be called multiple times to retry for gas and thus also push the metric multiple times; i'll look into retrieving this value from the logs or elsewhere

Contributor

We could check for the receipt, keep an in-memory flag so it only runs upon 100% inclusion, etc.

Also, you just made me think that batchSubmissionRetriesCount may be a useful metric as well. Ideally we're getting transactions included immediately.

Contributor Author

i played around with the logs and there isn't a lightweight way to parse them, so ruling that out for now. i think we could potentially filter for numTxPerBatch observations that land alongside a submission timestamp metric, so we know which ones actually make it through?

i'll think about the retries count metrics - might be easier to observe when it attempts to submit and when it actually does, which we can deduce from this metric vs the actual submission timestamp 👀
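To make the tradeoff discussed above concrete, here is a hypothetical sketch (not the PR's code) of deferring the numTxPerBatch observation until the receipt confirms inclusion: the block range is carried alongside the submission so it doesn't have to be re-derived from logs, and a retried submission only records the metric once. The `HistogramLike`, `BatchRange`, and `recordBatchSize` names are invented for illustration.

```typescript
// Minimal stand-in for a prom-client Histogram, so the sketch is
// self-contained.
interface HistogramLike {
  observe(value: number): void
}

interface BatchRange {
  startBlock: number
  endBlock: number
}

// In-memory histogram that just records samples.
class RecordingHistogram implements HistogramLike {
  readonly samples: number[] = []
  observe(value: number): void {
    this.samples.push(value)
  }
}

// Observe the batch size only after the receipt reports success
// (status === 1), so gas-retry resubmissions don't double-count.
function recordBatchSize(
  metric: HistogramLike,
  range: BatchRange,
  receiptStatus: number
): void {
  if (receiptStatus === 1) {
    metric.observe(range.endBlock - range.startBlock)
  }
}
```

The cost is passing `BatchRange` through to wherever the receipt is awaited, which is exactly the plumbing the author calls "heavy" above; the benefit is that a failed or retried submission never pushes a sample.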

Comment on lines +247 to +250
private _registerMetrics(metrics: Metrics): BatchSubmitterMetrics {
metrics.registry.clear()

return {
Contributor Author

@snario i'm not familiar yet with how registries work, but i have a feeling that the failing integration test might be due to registering the same metric so i added this line - is this standard practice or a hack? i could also only reset these specific metrics i'm registering here

Contributor

I don't think we should ever be registering the same metrics within the same process. It either indicates we ought to be mocking out metrics in the test or using different processes to test the behaviour of multiple batch submitters.

Contributor Author

i think it actually may be when we initialize both transaction and state batch submitters here:

const txBatchSubmitter = new TransactionBatchSubmitter(
sequencerSigner,
l2Provider,
parseInt(requiredEnvVars.MIN_L1_TX_SIZE, 10),
parseInt(requiredEnvVars.MAX_L1_TX_SIZE, 10),
parseInt(requiredEnvVars.MAX_TX_BATCH_COUNT, 10),
parseInt(requiredEnvVars.MAX_BATCH_SUBMISSION_TIME, 10) * 1_000,
parseInt(requiredEnvVars.NUM_CONFIRMATIONS, 10),
parseInt(requiredEnvVars.RESUBMISSION_TIMEOUT, 10) * 1_000,
requiredEnvVars.ADDRESS_MANAGER_ADDRESS,
parseFloat(requiredEnvVars.SAFE_MINIMUM_ETHER_BALANCE),
MIN_GAS_PRICE_IN_GWEI,
MAX_GAS_PRICE_IN_GWEI,
GAS_RETRY_INCREMENT,
GAS_THRESHOLD_IN_GWEI,
logger.child({ name: TX_BATCH_SUBMITTER_LOG_TAG }),
new Metrics({
prefix: TX_BATCH_SUBMITTER_LOG_TAG,
labels: { environment, release, network },
}),
DISABLE_QUEUE_BATCH_APPEND,
autoFixBatchOptions
)
const stateBatchSubmitter = new StateBatchSubmitter(
sequencerSigner,
l2Provider,
parseInt(requiredEnvVars.MIN_L1_TX_SIZE, 10),
parseInt(requiredEnvVars.MAX_L1_TX_SIZE, 10),
parseInt(requiredEnvVars.MAX_STATE_BATCH_COUNT, 10),
parseInt(requiredEnvVars.MAX_BATCH_SUBMISSION_TIME, 10) * 1_000,
parseInt(requiredEnvVars.NUM_CONFIRMATIONS, 10),
parseInt(requiredEnvVars.RESUBMISSION_TIMEOUT, 10) * 1_000,
parseInt(requiredEnvVars.FINALITY_CONFIRMATIONS, 10),
requiredEnvVars.ADDRESS_MANAGER_ADDRESS,
parseFloat(requiredEnvVars.SAFE_MINIMUM_ETHER_BALANCE),
MIN_GAS_PRICE_IN_GWEI,
MAX_GAS_PRICE_IN_GWEI,
GAS_RETRY_INCREMENT,
GAS_THRESHOLD_IN_GWEI,
logger.child({ name: STATE_BATCH_SUBMITTER_LOG_TAG }),
new Metrics({
prefix: STATE_BATCH_SUBMITTER_LOG_TAG,
labels: { environment, release, network },
}),
FRAUD_SUBMISSION_ADDRESS
)
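The collision the author suspects can be sketched in miniature (hypothetical code, not prom-client's actual API): a registry rejects duplicate metric names, so two submitters registering into one shared registry in the same process will throw, while distinct per-submitter prefixes (as the `TX_BATCH_SUBMITTER_LOG_TAG` / `STATE_BATCH_SUBMITTER_LOG_TAG` prefixes above suggest) keep a shared registry conflict-free without needing `clear()`.

```typescript
// Toy registry that enforces unique metric names, mimicking why
// registering the same metric twice in one process fails.
class MetricRegistry {
  private names = new Set<string>()

  register(name: string): void {
    if (this.names.has(name)) {
      throw new Error(`metric ${name} already registered`)
    }
    this.names.add(name)
  }

  clear(): void {
    this.names.clear()
  }
}

// Prefixing each submitter's metric names makes a shared registry safe:
// tx_batch_submitter_num_tx_per_batch vs state_batch_submitter_num_tx_per_batch.
function registerSubmitterMetrics(
  registry: MetricRegistry,
  prefix: string
): void {
  for (const name of ['num_tx_per_batch', 'eth_balance']) {
    registry.register(`${prefix}_${name}`)
  }
}
```

Under this model the `registry.clear()` call is a workaround for re-registering the same names, whereas unique prefixes (or one registry per submitter instance) remove the collision at the source.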

@annieke annieke requested a review from snario May 3, 2021 18:03
protected chainContract: Contract
protected l2ChainId: number
protected syncing: boolean
protected lastBatchSubmissionTimestamp: number = 0
Contributor
Could setting this to 0 potentially lead to edge cases as opposed to explicitly leaving it undefined until we have fetched this information?

Contributor Author

^i think undefined might leave more edge cases since we're trying to do a subtraction!

Contributor

Right, I guess it's just a matter of how we prefer to treat "un-fetched data". If we treat it as 0 then we check for 0 in the code; otherwise we check for undefined. TypeScript has some typing around undefined values being potentially operated on, which is why I recommended undefined, but I'm not super opinionated.
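The two conventions being weighed can be sketched side by side (hypothetical helper names, not the PR's code). With a 0 default the subtraction always type-checks but needs a sentinel check to avoid reporting a huge "time since last batch" at startup; with `undefined`, strict null checks force the caller to handle the un-fetched case before subtracting.

```typescript
// Convention 1: 0 means "not yet fetched"; callers must remember the
// sentinel check or they report `now - 0` at startup.
function secondsSinceLastBatchZero(last: number, now: number): number | null {
  return last === 0 ? null : now - last
}

// Convention 2: undefined means "not yet fetched"; with strictNullChecks
// the compiler rejects `now - last` until the undefined case is handled.
function secondsSinceLastBatchUndef(
  last: number | undefined,
  now: number
): number | null {
  return last === undefined ? null : now - last
}
```

Both force a branch somewhere; the difference is whether the compiler or the programmer is responsible for remembering it.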


afterEach(() => {
sinon.restore()
testMetrics.registry.clear()
Contributor

I think we should just be mocking the library.

Contributor Author

didn't find a library for this, and the graph's usage was much cleaner than ours so they didn't need to do this 😭 i've moved this temporarily into the constructor; i will make a note of it in my bs refactor doc. let's observe whether this surfaces any problems once we have dashboards running?
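The "mock the library" approach the reviewer suggests can be sketched without any mocking framework (hypothetical names; the real `BatchSubmitterMetrics` shape is defined in this PR): tests inject a recording fake in place of the real metrics object, so no registry is ever populated and `afterEach` needs no `clear()`.

```typescript
// Minimal slice of the metrics interface, enough for the sketch.
interface BatchSubmitterMetricsLike {
  numTxPerBatch: { observe(v: number): void }
}

// Test double that records observations instead of registering real
// prometheus metrics.
function makeMockMetrics(): BatchSubmitterMetricsLike & { observed: number[] } {
  const observed: number[] = []
  return {
    observed,
    numTxPerBatch: {
      observe: (v: number) => observed.push(v),
    },
  }
}
```

A test would pass `makeMockMetrics()` into the submitter and assert on `observed`, sidestepping the shared-registry problem entirely.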

@snario snario (Contributor) left a comment

Conditional approval, but I left a few last things I think ought to be revised.

@annieke annieke merged commit 12dbd81 into master May 4, 2021
@annieke annieke deleted the feat/bs-metrics branch May 4, 2021 22:53
InoMurko pushed a commit to omgnetwork/optimism that referenced this pull request May 25, 2021
* add ethBalance gauge

* add rest of bs metrics

* add changeset
theochap pushed a commit that referenced this pull request Dec 10, 2025