Skip to content

Vdb 1154 transform on demand#32

Merged
elizabethengelman merged 13 commits intostagingfrom
vdb-1154-transform-on-demand
Feb 4, 2020
Merged

Vdb 1154 transform on demand#32
elizabethengelman merged 13 commits intostagingfrom
vdb-1154-transform-on-demand

Conversation

@elizabethengelman
Copy link
Copy Markdown

@elizabethengelman elizabethengelman commented Jan 31, 2020

to run ./vulcanizedb transformEventsOnDemand --block-to-transform 9254881 --config=../vdb-mcd-transformers/environments/mcdTransformers.toml --log-level trace

@elizabethengelman elizabethengelman force-pushed the vdb-1154-transform-on-demand branch from 1f9a9d6 to d895b3f Compare January 31, 2020 15:51
@elizabethengelman elizabethengelman force-pushed the vdb-1154-transform-on-demand branch from d895b3f to c11b94a Compare January 31, 2020 20:38

func earlierStartingBlockNumber(transformerBlock, watcherBlock int64) bool {
return transformerBlock < watcherBlock
func (extractor *LogExtractor) OverrideStartingAndEndingBlocks(startingBlock, endingBlock *int64) {
Copy link
Copy Markdown
Author

@elizabethengelman elizabethengelman Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having these Override... methods, was thinking of passing these arguments into the Initializer. But we're not currently using the initializer in the tests, and if we were to rework the tests to do that, we'd potentially need to pass in ~4 more things as arguments that we're currently mocking in the tests. Maybe an ExtractorConfig struct to pass in would solve this?

Thought I'd get the PR up as is and get some feedback/think on it a bit.

@elizabethengelman elizabethengelman force-pushed the vdb-1154-transform-on-demand branch from c11b94a to 07473a1 Compare January 31, 2020 20:51
@elizabethengelman elizabethengelman force-pushed the vdb-1154-transform-on-demand branch from 07473a1 to 79e4c83 Compare January 31, 2020 20:53
Comment on lines +103 to +106
// Overriding the normal behavior to just transform one block
extractor.OverrideStartingAndEndingBlocks(&blockNumber, &blockNumber)
extractor.OverrideRecheckHeaderCap(currentCheckCount)
ew.UnsetExpectedExtractorError()
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 lines are the crux of the changes that make this command work for just one block.

  • extractor.OverrideStartingAndEndingBlocks allows us to set the starting and ending blocks for the extractor, regardless of the ones that are configured for the transformers so that we can narrow down the scope of what this command does - just transforms one block's worth of logs.
  • extractor.OverrideRecheckHeaderCap allows us to check a header more than the constants recheckHeaderCap of 5. It can be set to anything, but currently I'm fetching the check_count on the current header, and then checking it one more time.
  • ew.UnsetExpectedExtractorError() removes the expected NoUncheckedHeaders error so that once the header is checked that one more time (or however we choose to configure the recheckHeaderCap) the watcher will return an error and the command exits. This was done to make sure that the command doesn't loop forever without doing anything productive.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dig this approach, but am also wondering whether we might want to do some deeper refactoring to facilitate graceful rechecking + exiting. Seems like UnsetExpectedExtractorError could create some unnecessary noise if it means that there are several retries and logged errors. Personally I'd be fine with introducing some duplication if writing new but very similar functions enables us to recheck and then exit with a success status, but also don't mind if we want to postpone while getting a first version live.

@elizabethengelman elizabethengelman changed the title [WIP] Vdb 1154 transform on demand Vdb 1154 transform on demand Jan 31, 2020
Copy link
Copy Markdown

@rmulhol rmulhol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Accidentally slow rolled the 💡 that went off while reviewing - what if this command just set the header's check count to zero?

@@ -0,0 +1,134 @@
// Copyright © 2020 NAME HERE <EMAIL ADDRESS>
//
// Licensed under the Apache License, Version 2.0 (the "License");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to include an Apache license

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, not sure why my editor added the apache license. 🤔 I'll switch it to GNU GPL.

Long: ``,
Run: func(cmd *cobra.Command, args []string) {
SubCommand = cmd.CalledAs()
LogWithCommand = *logrus.WithField("SubCommand", SubCommand)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - do we need to initialize this here for the logging to work? I may need to update the extractDiffs command... 😅

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do - I was getting an error (below) when trying to run the command without initializing the LogWithCommand.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2c pc=0x442dd8c]

goroutine 1 [running]:
github.com/sirupsen/logrus.(*Logger).level(...)


currentCheckCount, err := getCurrentCheckCount(blockNumber, &db)
if err != nil {
panic("Failed to get current check_count from the headers table")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to panic here.

As an aside, I'm wondering if we might want to have executeOnDemand (and similar functions in the cmd directory) return an error so that we can use LogWithCommand.Fatal only at the highest level. Based on this, it seems like that might best facilitate clean shutdown by allowing in progress tasks to wrap up (and defers to be called, etc)

Copy link
Copy Markdown
Author

@elizabethengelman elizabethengelman Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, great call. That article is really helpful. I'll probably add a story to the top of the backlog for this sort of cleanup in other commands. But in the mean time, I've switched to using Cobra.RunE instead of Cobra.Run for resetHeaderCheckCount which eventually does return Fatal.log:
b6362a1

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate story to handle other commands: https://makerdao.atlassian.net/browse/VDB-1185

Comment on lines +103 to +106
// Overriding the normal behavior to just transform one block
extractor.OverrideStartingAndEndingBlocks(&blockNumber, &blockNumber)
extractor.OverrideRecheckHeaderCap(currentCheckCount)
ew.UnsetExpectedExtractorError()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dig this approach, but am also wondering whether we might want to do some deeper refactoring to facilitate graceful rechecking + exiting. Seems like UnsetExpectedExtractorError could create some unnecessary noise if it means that there are several retries and logged errors. Personally I'd be fine with introducing some duplication if writing new but very similar functions enables us to recheck and then exit with a success status, but also don't mind if we want to postpone while getting a first version live.

}
wg := sync.WaitGroup{}
wg.Add(1)
go transformEthEvents(&ew, &wg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we might be able to execute this synchronously and remove the wait group stuff

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, good call!


func getCurrentCheckCount(blockNumber int64, db *postgres.DB) (int64, error) {
var count int64
err := db.Get(&count, `SELECT check_count from headers where block_number = $1`, blockNumber)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be cool to have this live in a repository to avoid including sql in the cmd directory, but I'm not sure if that concern merits initializing a repo 🤔


// Setup bc and db objects
blockChain := getBlockChain()
db := utils.LoadPostgres(databaseConfig, blockChain.Node())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticing that a lot of this code is similar to executeTransformers and wondering if we want to either extract shared code to a helper function or perhaps even configure this behavior as a CLI option for that command (rather than introducing a new command)

I know we've got medium/long term plans to enable extracting logs to run in a separate command from transforming logs - maybe it'd be worthwhile to add a note to that story/create a new linked story to address some of this stuff then?

return true
}

return *extractorBlock != int64(-1) && currentTransformerBlock > *extractorBlock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through this code to reset numbers got me thinking... if we're introducing a new command to recheck a given header on demand, maybe that command could just zero out the check_count for the header in question? I think that would enable the standard execute command to automatically recheck that block, without needing to actually load/run the transformers here

Copy link
Copy Markdown
Author

@elizabethengelman elizabethengelman Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, so you're thinking that if we have a command to reset check_count to zero, then execute will automatically eventually pick up that header to fetch logs for, transform, etc? Great idea! I'll dig into that a bit further.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed up some changes to remove the transformEventsOnDemand command in favor of a resetHeaderCheckCount command. Left some of the refactoring that was done perviously, like creating NewLogDelegator and NewLogExtractor method and passing those both into NewEventWatcher just in case we want to make some config changes to those in the future. But would be happy to remove those changes as well.

SubCommand = cmd.CalledAs()
LogWithCommand = *logrus.WithField("SubCommand", SubCommand)
LogWithCommand.Infof("Updating check_count for header %v set to 0.", blockNumber)
return resetHeaderCount(int64(blockNumber))
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the downsides to using RunE over Run is that I haven't figured out how to include a SubCommand. I am thinking of just wrapping this error with SubCommand information for now.

Copy link
Copy Markdown

@rmulhol rmulhol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Can probably do this in a separate story, but I wonder if we want to create a Dockerfile for this command

var resetHeaderCheckCountCmd = &cobra.Command{
Use: "resetHeaderCheckCount",
Short: "Resets header check_count for the given block number",
Long: `Resets check_count to zero for the given header so that the execute command may rechecked that header's logs
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:%s/rechecked/recheck ?

return true
}

return *extractorBlock != int64(-1) && currentTransformerBlock > *extractorBlock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a minor aesthetic thing (and maybe I've been looking at too much dss code 😂), but I'd be tempted to give these predicates names and combine them - something like:

isCurrentEndingBlockNil := extractorBlock == nil
isTransformerEndingBlockNil := currentTransformerBlock == int64(-1)
isCurrentEndingBlockAssigned := *extractorBlock != int64(-1)
isTransformerEndingBlockGreater := currentTransformerBlock > *extractorBlock

return isCurrentEndingBlockNil || isTransformerEndingBlockNil || (isCurrentEndingBlockAssigned && isTransformerEndingBlockGreater)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! The one thing is that we have to return from the method before setting isCurrentEndingBlockAssigned and isTransformerEndingBlockGreater if either of the blocks it's checking are nil. Otherwise we end up with a nil point exception.

return nil
}

func resetStartingBlockNumber(currentTransformerBlock int64, extractorBlock *int64) bool {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly very minor but I wonder if we should name these in a way that indicates they're checking a condition rather than actually resetting anything?

@@ -28,6 +28,7 @@ type AddressRepository interface {
type CheckedHeadersRepository interface {
MarkHeaderChecked(headerID int64) error
MarkHeadersUnchecked(startingBlockNumber int64) error
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new function has me thinking maybe we should name this MarkHeadersUncheckedSince to communicate how much it does - wouldn't want to call this one by accident!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep definitely! That's partially why I named this new method MarkSingleHeaderUnchecked. But I'll update this method too, just to be explicit.

})

Describe("MarkHeadersUnchecked", func() {
It("removes rows for headers <= starting block number", func() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe more accurate to say "marks header with matching block number unchecked" or something?

markHeaderOneCheckedErr := repo.MarkHeaderChecked(headerIdOne)
Expect(markHeaderOneCheckedErr).NotTo(HaveOccurred())
markHeaderTwoCheckedErr := repo.MarkHeaderChecked(headerIdTwo)
Expect(markHeaderTwoCheckedErr).NotTo(HaveOccurred())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reluctant to suggest this bc of the amount of setup already required, but I wonder if we want a third block added to verify it doesn't mark subsequent headers unchecked

Fetcher: fetcher.NewLogFetcher(bc),
LogRepository: repositories.NewEventLogRepository(db),
Syncer: transactions.NewTransactionsSyncer(db, bc),
RecheckHeaderCap: constants.RecheckHeaderCap,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to moving this to assignment in the initializer - would be cool to be able to configure it via the CLI

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely. I think that being able to se this, and the Starting and Ending blocks via the CLI will help make it more flexible in the future. Especially for development. 🚀

- Name starting and ending block predicates in Extractor
- Rename methods to determine if an extractor's block should be updated
- Fix a typo in resetHeaderCheckCount command
- Fix uncheck header repo tests
- Rename MarkHeadersUnchecked -> MarkHeadersUncheckedSince
@elizabethengelman elizabethengelman merged commit 9973fed into staging Feb 4, 2020
@elizabethengelman elizabethengelman deleted the vdb-1154-transform-on-demand branch February 4, 2020 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants