Skip to content

Conversation

@ellemouton
Copy link
Collaborator

In this PR, we make sure that the graph SQL migration is retry safe. This is necessary since the source DB in the migration may execute retries & since we make use of call-backs to execute our migration to the destination DB, we need to make sure that the migration does not error if a retry occurs on the source DB.

See the commit messages for details on how this has been done. Appropriate unit tests have been expanded to cover retry-safety.

Part of #9795

@ellemouton ellemouton self-assigned this Aug 15, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on making the graph SQL migration process retry-safe and idempotent. This is a critical improvement as the source database may execute retries, and the migration's use of callbacks necessitates robust handling of these scenarios. The changes involve introducing new idempotent database operations, implementing state reset mechanisms within migration callbacks, and expanding test coverage to thoroughly validate the retry-safe behavior. The overall goal is to ensure that the migration can be re-executed without errors or data corruption, even in the presence of retries.

Highlights

  • Enhanced Migration Idempotency: The SQL migration process for nodes, channels, and policies has been modified to be retry-safe. This ensures that if a migration step is re-executed (e.g., due to a retry from the source database), it will not cause errors or duplicate entries, maintaining data integrity.
  • New Idempotent Database Operations: New database operations (InsertNodeMig, InsertChannelMig, InsertEdgePolicyMig) and corresponding SQL queries have been introduced. These queries utilize ON CONFLICT DO UPDATE SET or DO NOTHING clauses to gracefully handle existing records during migration, making the insertion process idempotent.
  • State Reset for Callbacks: Reset logic has been implemented within migration callbacks to ensure that internal state variables (such as counters and batches) are correctly reset when a migration step is retried. This prevents inconsistencies and ensures accurate progress tracking during retries.
  • Expanded Test Coverage: New test cases have been added, and existing ones modified, to explicitly verify the retry-safety and idempotency of the graph SQL migration. This includes running migrations multiple times to confirm consistent and correct results, providing strong assurance of the changes.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively makes the SQL migration process retry-safe and idempotent by introducing new migration-specific SQL queries with ON CONFLICT clauses and adding reset logic to the iteration helpers. The tests have been appropriately expanded to verify this idempotency by running the migration twice. The refactoring of some helper functions to improve code reuse is also a good improvement. I've found a few minor style guide violations related to line length in comments, which are detailed below.

Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Have a couple of questions and suggestions, but nothing major or blocking.
LGTM 🎉

-- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
-- otherwise, the "RETURNING id" part does not work.
DO UPDATE SET
alias = EXCLUDED.alias,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does update these fields. I assume that's what we want? We could also turn this into a no-op by only updating pub_key = EXCLUDED.pub_key and version = EXCLUDED.version. Since these are the conflicting fields, we know they are identical to what's already there. So nothing happens but we still get the ID returned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the decision above, if we decide to make this an upsert and not a no-op, I also suggest renaming this to UpsertNodeMig.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool yeah - going with keeping it just for correctness of the query & since it is only used for the migration

) VALUES (
$1, $2, $3, $4
);
) ON CONFLICT (node_id, type, position)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here: Do we want this to become an upsert or should it be a no-op? If upsert is what we want, I suggest we rename the query to UpsertNodeAddress to make it more clear.
And I assume that's also the correct way to function in normal operation (meaning outside of the migration logic, in sql_store.go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool yes, will rename 👍

VALUES ($1, $2, $3)
ON CONFLICT (channel_id, type)
-- Update the value if a conflict occurs on channel_id and type.
DO UPDATE SET value = EXCLUDED.value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to UpsertChannelExtraType?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed 👍

$1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12
) ON CONFLICT (scid, version)
-- If a conflict occurs, we have already migrated this channel. However, we
-- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here: make it an upsert or be a no-op?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question goes with queries in next commits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say maybe keeping these as is is also on the table, as this is a special migration insert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes keeping as is given the *Mig tag

Copy link
Collaborator

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, really nice! 🌟


// Write the node to the SQL database.
id, err := upsertNode(ctx, sqlDB, node)
id, err := insertNodeSQLMig(ctx, sqlDB, node)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (in commit message) is updates => is updated

-- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
-- otherwise, the "RETURNING id" part does not work.
DO UPDATE SET
alias = EXCLUDED.alias,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.

$1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12
) ON CONFLICT (scid, version)
-- If a conflict occurs, we have already migrated this channel. However, we
-- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say maybe keeping these as is is also on the table, as this is a special migration insert.

ON CONFLICT (channel_policy_id, type)
-- If a conflict occurs on channel_policy_id and type, then we update the
-- value.
DO UPDATE SET value = EXCLUDED.value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'd say it makes sense to rename to upsert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed 👍

This will help us test idempotency later on, but it also ensures that
TestMigrateGraphToSQL is properly testing writes to the
graph_channel_policy_extra_types table.
Currently, the graph SQL migration is not retry safe. Meaning that if
the source DB exeutes a retry under the hood, this could result in the
migration failing. In preparation for fixing this, we adust the
migration test accordingly.
In preparation for making the channel & policy migration logic
idempotent in a step-by-step manner, we add a test here that only tests
the migration of channels _without_ policies so that we can first focus
on just making the channel migration idempotent.
In preparation for handling retries on the source DB side, we thread
through the `reset` call-backs properly so that we can reset appropriate
variables.
In this commit, the graph SQL migration is updated so that the node
migration step is retry-safe. This is done by using migration specific
logic & queries that do not use the same node-update-constraint as the
normal node upsert logic. For normal "run-time" logic, we always expect
a node update to have a newer timestamp than any previously stored one.
But for the migration, we will only ever be dealing with a single
announcement for a given node & to make things retry-safe, we dont want
the query to error if we re-insert the exact same node.
There is no need to use the "collect-then-update" pattern for node
insertion during the SQL migration since if we do have any previously
persisted data for the node and happen to re-run the insertion for that
node, the data will be exactly the same. So we can make use of "On
conflict, no nothing" here too.
In this commit, we make the channel part of the graph SQL migration
idempotent (retry-safe!). We do this by adding a migration-only channel
insert query that will not error out if a the query is called and a
chanenl with the given scid&version already exists. We also ensure that
errors are not thrown if existing channel features & extra types are
re-added.
Finally, we make the channel-policy part of the SQL migration idempotent
by adding a migration-only policy insert query which will not error out
if the policy already exists and does not have a timestamp that is newer
than the existing records timestamp. To keep the commit simple, a
insertChanEdgePolicyMig function is added which is basically identical
to the updateChanEdgePolicy function except for the fact that it uses
the newly added query. In the next commit, it will be simplified even
more.
This commit simplifies insertChanEdgePolicyMig. Much of the logic can be
removed given that this method is only used in the context of the graph
SQL migration.

This should improve the performance of the migration quite a lot since
it removes the extra GetChannelAndNodesBySCID call.
@ellemouton ellemouton merged commit cd6971e into lightningnetwork:master Sep 1, 2025
36 of 39 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in lnd v0.20 Sep 1, 2025
@ellemouton ellemouton deleted the graphRetry branch September 1, 2025 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants