graph/db+sqldb: Make the SQL migration retry-safe/idempotent #10161

ellemouton · 2025-08-15T09:02:39Z

In this PR, we make sure that the graph SQL migration is retry safe. This is necessary since the source DB in the migration may execute retries & since we make use of call-backs to execute our migration to the destination DB, we need to make sure that the migration does not error if a retry occurs on the source DB.

See the commit messages for details on how this has been done. Appropriate unit tests have been expanded to cover retry-safety.

Part of #9795

gemini-code-assist

Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on making the graph SQL migration process retry-safe and idempotent. This is a critical improvement as the source database may execute retries, and the migration's use of callbacks necessitates robust handling of these scenarios. The changes involve introducing new idempotent database operations, implementing state reset mechanisms within migration callbacks, and expanding test coverage to thoroughly validate the retry-safe behavior. The overall goal is to ensure that the migration can be re-executed without errors or data corruption, even in the presence of retries.

Highlights

Enhanced Migration Idempotency: The SQL migration process for nodes, channels, and policies has been modified to be retry-safe. This ensures that if a migration step is re-executed (e.g., due to a retry from the source database), it will not cause errors or duplicate entries, maintaining data integrity.
New Idempotent Database Operations: New database operations (InsertNodeMig, InsertChannelMig, InsertEdgePolicyMig) and corresponding SQL queries have been introduced. These queries utilize ON CONFLICT DO UPDATE SET or DO NOTHING clauses to gracefully handle existing records during migration, making the insertion process idempotent.
State Reset for Callbacks: Reset logic has been implemented within migration callbacks to ensure that internal state variables (such as counters and batches) are correctly reset when a migration step is retried. This prevents inconsistencies and ensures accurate progress tracking during retries.
Expanded Test Coverage: New test cases have been added, and existing ones modified, to explicitly verify the retry-safety and idempotency of the graph SQL migration. This includes running migrations multiple times to confirm consistent and correct results, providing strong assurance of the changes.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request effectively makes the SQL migration process retry-safe and idempotent by introducing new migration-specific SQL queries with ON CONFLICT clauses and adding reset logic to the iteration helpers. The tests have been appropriately expanded to verify this idempotency by running the migration twice. The refactoring of some helper functions to improve code reuse is also a good improvement. I've found a few minor style guide violations related to line length in comments, which are detailed below.

graph/db/sql_migration.go

sqldb/sqlc/queries/graph.sql

guggero

Very nice! Have a couple of questions and suggestions, but nothing major or blocking.
LGTM 🎉

guggero · 2025-08-19T05:53:22Z

sqldb/sqlc/queries/graph.sql

+    -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
+    -- otherwise, the "RETURNING id" part does not work.
+    DO UPDATE SET
+        alias = EXCLUDED.alias,


This does update these fields. I assume that's what we want? We could also turn this into a no-op by only updating pub_key = EXCLUDED.pub_key and version = EXCLUDED.version. Since these are the conflicting fields, we know they are identical to what's already there. So nothing happens but we still get the ID returned.

Depending on the decision above, if we decide to make this an upsert and not a no-op, I also suggest renaming this to UpsertNodeMig.

I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.

cool yeah - going with keeping it just for correctness of the query & since it is only used for the migration

guggero · 2025-08-19T05:58:36Z

sqldb/sqlc/queries/graph.sql

 ) VALUES (
    $1, $2, $3, $4
- );
+) ON CONFLICT (node_id, type, position)


Same question here: Do we want this to become an upsert or should it be a no-op? If upsert is what we want, I suggest we rename the query to UpsertNodeAddress to make it more clear.
And I assume that's also the correct way to function in normal operation (meaning outside of the migration logic, in sql_store.go?

cool yes, will rename 👍

guggero · 2025-08-19T06:01:04Z

sqldb/sqlc/queries/graph.sql

+VALUES ($1, $2, $3)
+    ON CONFLICT (channel_id, type)
+    -- Update the value if a conflict occurs on channel_id and type.
+    DO UPDATE SET value = EXCLUDED.value;


nit: rename to UpsertChannelExtraType?

renamed 👍

guggero · 2025-08-19T06:01:35Z

sqldb/sqlc/queries/graph.sql

+    $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12
+) ON CONFLICT (scid, version)
+    -- If a conflict occurs, we have already migrated this channel. However, we
+    -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because


Same question here: make it an upsert or be a no-op?

Same question goes with queries in next commits.

I'd say maybe keeping these as is is also on the table, as this is a special migration insert.

yes keeping as is given the *Mig tag

bhandras

LGTM, really nice! 🌟

bhandras · 2025-08-25T11:36:24Z

graph/db/sql_migration.go


 		// Write the node to the SQL database.
-		id, err := upsertNode(ctx, sqlDB, node)
+		id, err := insertNodeSQLMig(ctx, sqlDB, node)


nit: (in commit message) is updates => is updated

bhandras · 2025-08-25T11:40:41Z

sqldb/sqlc/queries/graph.sql

+    -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because
+    -- otherwise, the "RETURNING id" part does not work.
+    DO UPDATE SET
+        alias = EXCLUDED.alias,


I'd say overwriting existing values is expected given the name of this query, so I'd keep it as is. It's also only ever used during migration time.

bhandras · 2025-08-25T11:44:23Z

sqldb/sqlc/queries/graph.sql

+    $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12
+) ON CONFLICT (scid, version)
+    -- If a conflict occurs, we have already migrated this channel. However, we
+    -- still need to do an "UPDATE SET" here instead of "DO NOTHING" because


I'd say maybe keeping these as is is also on the table, as this is a special migration insert.

bhandras · 2025-08-25T11:45:47Z

sqldb/sqlc/queries/graph.sql

+ON CONFLICT (channel_policy_id, type)
+    -- If a conflict occurs on channel_policy_id and type, then we update the
+    -- value.
+    DO UPDATE SET value = EXCLUDED.value;


Here I'd say it makes sense to rename to upsert.

indeed 👍

This will help us test idempotency later on, but it also ensures that TestMigrateGraphToSQL is properly testing writes to the graph_channel_policy_extra_types table.

Currently, the graph SQL migration is not retry safe. Meaning that if the source DB exeutes a retry under the hood, this could result in the migration failing. In preparation for fixing this, we adust the migration test accordingly.

In preparation for making the channel & policy migration logic idempotent in a step-by-step manner, we add a test here that only tests the migration of channels _without_ policies so that we can first focus on just making the channel migration idempotent.

In preparation for handling retries on the source DB side, we thread through the `reset` call-backs properly so that we can reset appropriate variables.

In this commit, the graph SQL migration is updated so that the node migration step is retry-safe. This is done by using migration specific logic & queries that do not use the same node-update-constraint as the normal node upsert logic. For normal "run-time" logic, we always expect a node update to have a newer timestamp than any previously stored one. But for the migration, we will only ever be dealing with a single announcement for a given node & to make things retry-safe, we dont want the query to error if we re-insert the exact same node.

There is no need to use the "collect-then-update" pattern for node insertion during the SQL migration since if we do have any previously persisted data for the node and happen to re-run the insertion for that node, the data will be exactly the same. So we can make use of "On conflict, no nothing" here too.

In this commit, we make the channel part of the graph SQL migration idempotent (retry-safe!). We do this by adding a migration-only channel insert query that will not error out if a the query is called and a chanenl with the given scid&version already exists. We also ensure that errors are not thrown if existing channel features & extra types are re-added.

Finally, we make the channel-policy part of the SQL migration idempotent by adding a migration-only policy insert query which will not error out if the policy already exists and does not have a timestamp that is newer than the existing records timestamp. To keep the commit simple, a insertChanEdgePolicyMig function is added which is basically identical to the updateChanEdgePolicy function except for the fact that it uses the newly added query. In the next commit, it will be simplified even more.

This commit simplifies insertChanEdgePolicyMig. Much of the logic can be removed given that this method is only used in the context of the graph SQL migration. This should improve the performance of the migration quite a lot since it removes the extra GetChannelAndNodesBySCID call.

ellemouton self-assigned this Aug 15, 2025

ellemouton added the no-changelog label Aug 15, 2025

gemini-code-assist bot reviewed Aug 15, 2025

View reviewed changes

ellemouton mentioned this pull request Aug 14, 2025

[epic]: graph db: SQL implementation & migration #9795

Closed

84 tasks

gemini-code-assist bot reviewed Aug 15, 2025

View reviewed changes

ellemouton requested review from bhandras and guggero August 15, 2025 10:50

ellemouton mentioned this pull request Aug 15, 2025

multi: switch on graph SQL migration #10163

Merged

saubyk added this to lnd v0.20 Aug 15, 2025

saubyk moved this to In review in lnd v0.20 Aug 15, 2025

guggero approved these changes Aug 19, 2025

View reviewed changes

bhandras approved these changes Aug 25, 2025

View reviewed changes

ellemouton added 10 commits September 1, 2025 07:56

graph/db: let test policy have some extra opaque data

9019bca

This will help us test idempotency later on, but it also ensures that TestMigrateGraphToSQL is properly testing writes to the graph_channel_policy_extra_types table.

graph/db: let migration test test retry safety

68e4970

Currently, the graph SQL migration is not retry safe. Meaning that if the source DB exeutes a retry under the hood, this could result in the migration failing. In preparation for fixing this, we adust the migration test accordingly.

graph/db: thread through reset call-backs

2c8ac0c

In preparation for handling retries on the source DB side, we thread through the `reset` call-backs properly so that we can reset appropriate variables.

graph/db: let the rapid migration test also tests idempotency

ce1df9d

ellemouton force-pushed the graphRetry branch from 5c41339 to ce1df9d Compare September 1, 2025 06:08

ellemouton merged commit cd6971e into lightningnetwork:master Sep 1, 2025
36 of 39 checks passed

github-project-automation bot moved this from In review to Done in lnd v0.20 Sep 1, 2025

ellemouton deleted the graphRetry branch September 1, 2025 07:23

graph/db+sqldb: Make the SQL migration retry-safe/idempotent #10161

graph/db+sqldb: Make the SQL migration retry-safe/idempotent #10161

Uh oh!

Conversation

ellemouton commented Aug 15, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guggero left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bhandras left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants