-
Notifications
You must be signed in to change notification settings - Fork 2.2k
graph/db+sqldb: Make the SQL migration retry-safe/idempotent #10161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @ellemouton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on making the graph SQL migration process retry-safe and idempotent. This is a critical improvement as the source database may execute retries, and the migration's use of callbacks necessitates robust handling of these scenarios. The changes involve introducing new idempotent database operations, implementing state reset mechanisms within migration callbacks, and expanding test coverage to thoroughly validate the retry-safe behavior. The overall goal is to ensure that the migration can be re-executed without errors or data corruption, even in the presence of retries.
Highlights
- Enhanced Migration Idempotency: The SQL migration process for nodes, channels, and policies has been modified to be retry-safe. This ensures that if a migration step is re-executed (e.g., due to a retry from the source database), it will not cause errors or duplicate entries, maintaining data integrity.
- New Idempotent Database Operations: New database operations (
InsertNodeMig,InsertChannelMig,InsertEdgePolicyMig) and corresponding SQL queries have been introduced. These queries utilizeON CONFLICT DO UPDATE SETorDO NOTHINGclauses to gracefully handle existing records during migration, making the insertion process idempotent. - State Reset for Callbacks: Reset logic has been implemented within migration callbacks to ensure that internal state variables (such as counters and batches) are correctly reset when a migration step is retried. This prevents inconsistencies and ensures accurate progress tracking during retries.
- Expanded Test Coverage: New test cases have been added, and existing ones modified, to explicitly verify the retry-safety and idempotency of the graph SQL migration. This includes running migrations multiple times to confirm consistent and correct results, providing strong assurance of the changes.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively makes the SQL migration process retry-safe and idempotent by introducing new migration-specific SQL queries with ON CONFLICT clauses and adding reset logic to the iteration helpers. The tests have been appropriately expanded to verify this idempotency by running the migration twice. The refactoring of some helper functions to improve code reuse is also a good improvement. I've found a few minor style guide violations related to line length in comments, which are detailed below.
guggero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Have a couple of questions and suggestions, but nothing major or blocking.
LGTM 🎉
| -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because | ||
| -- otherwise, the "RETURNING id" part does not work. | ||
| DO UPDATE SET | ||
| alias = EXCLUDED.alias, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does update these fields. I assume that's what we want? We could also turn this into a no-op by only updating pub_key = EXCLUDED.pub_key and version = EXCLUDED.version. Since these are the conflicting fields, we know they are identical to what's already there. So nothing happens but we still get the ID returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the decision above, if we decide to make this an upsert and not a no-op, I also suggest renaming this to UpsertNodeMig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool yeah - going with keeping it just for correctness of the query & since it is only used for the migration
| ) VALUES ( | ||
| $1, $2, $3, $4 | ||
| ); | ||
| ) ON CONFLICT (node_id, type, position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here: Do we want this to become an upsert or should it be a no-op? If upsert is what we want, I suggest we rename the query to UpsertNodeAddress to make it more clear.
And I assume that's also the correct way to function in normal operation (meaning outside of the migration logic, in sql_store.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool yes, will rename 👍
| VALUES ($1, $2, $3) | ||
| ON CONFLICT (channel_id, type) | ||
| -- Update the value if a conflict occurs on channel_id and type. | ||
| DO UPDATE SET value = EXCLUDED.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to UpsertChannelExtraType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed 👍
| $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12 | ||
| ) ON CONFLICT (scid, version) | ||
| -- If a conflict occurs, we have already migrated this channel. However, we | ||
| -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here: make it an upsert or be a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question goes with queries in next commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say maybe keeping these as is is also on the table, as this is a special migration insert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes keeping as is given the *Mig tag
bhandras
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, really nice! 🌟
|
|
||
| // Write the node to the SQL database. | ||
| id, err := upsertNode(ctx, sqlDB, node) | ||
| id, err := insertNodeSQLMig(ctx, sqlDB, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: (in commit message) is updates => is updated
| -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because | ||
| -- otherwise, the "RETURNING id" part does not work. | ||
| DO UPDATE SET | ||
| alias = EXCLUDED.alias, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.
| $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12 | ||
| ) ON CONFLICT (scid, version) | ||
| -- If a conflict occurs, we have already migrated this channel. However, we | ||
| -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say maybe keeping these as is is also on the table, as this is a special migration insert.
| ON CONFLICT (channel_policy_id, type) | ||
| -- If a conflict occurs on channel_policy_id and type, then we update the | ||
| -- value. | ||
| DO UPDATE SET value = EXCLUDED.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'd say it makes sense to rename to upsert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed 👍
This will help us test idempotency later on, but it also ensures that TestMigrateGraphToSQL is properly testing writes to the graph_channel_policy_extra_types table.
Currently, the graph SQL migration is not retry safe. Meaning that if the source DB exeutes a retry under the hood, this could result in the migration failing. In preparation for fixing this, we adust the migration test accordingly.
In preparation for making the channel & policy migration logic idempotent in a step-by-step manner, we add a test here that only tests the migration of channels _without_ policies so that we can first focus on just making the channel migration idempotent.
In preparation for handling retries on the source DB side, we thread through the `reset` call-backs properly so that we can reset appropriate variables.
In this commit, the graph SQL migration is updated so that the node migration step is retry-safe. This is done by using migration specific logic & queries that do not use the same node-update-constraint as the normal node upsert logic. For normal "run-time" logic, we always expect a node update to have a newer timestamp than any previously stored one. But for the migration, we will only ever be dealing with a single announcement for a given node & to make things retry-safe, we dont want the query to error if we re-insert the exact same node.
There is no need to use the "collect-then-update" pattern for node insertion during the SQL migration since if we do have any previously persisted data for the node and happen to re-run the insertion for that node, the data will be exactly the same. So we can make use of "On conflict, no nothing" here too.
In this commit, we make the channel part of the graph SQL migration idempotent (retry-safe!). We do this by adding a migration-only channel insert query that will not error out if a the query is called and a chanenl with the given scid&version already exists. We also ensure that errors are not thrown if existing channel features & extra types are re-added.
Finally, we make the channel-policy part of the SQL migration idempotent by adding a migration-only policy insert query which will not error out if the policy already exists and does not have a timestamp that is newer than the existing records timestamp. To keep the commit simple, a insertChanEdgePolicyMig function is added which is basically identical to the updateChanEdgePolicy function except for the fact that it uses the newly added query. In the next commit, it will be simplified even more.
This commit simplifies insertChanEdgePolicyMig. Much of the logic can be removed given that this method is only used in the context of the graph SQL migration. This should improve the performance of the migration quite a lot since it removes the extra GetChannelAndNodesBySCID call.
5c41339 to
ce1df9d
Compare
In this PR, we make sure that the graph SQL migration is retry safe. This is necessary since the source DB in the migration may execute retries & since we make use of call-backs to execute our migration to the destination DB, we need to make sure that the migration does not error if a retry occurs on the source DB.
See the commit messages for details on how this has been done. Appropriate unit tests have been expanded to cover retry-safety.
Part of #9795