Skip to content

Handle missing graph nodes in DB#2279

Closed
halseth wants to merge 8 commits intolightningnetwork:masterfrom
halseth:edge-policy-cleanups
Closed

Handle missing graph nodes in DB#2279
halseth wants to merge 8 commits intolightningnetwork:masterfrom
halseth:edge-policy-cleanups

Conversation

@halseth
Copy link
Contributor

@halseth halseth commented Dec 5, 2018

This commit attempts to handle a case of inconsistent database status, namely where an edge is in the database, but one of the corresponding nodes is not. This would lead to migration failures.

  1. We make sure the router won't attempt to add channel policies for unknown edges. This would previously (probably) not be done anyway, since the gossiper will also do this check. But hey, belts and suspenders, and we are able to remove some code. Unit tests for this behavior is added.

  2. We amend the failing migration to add the missing nodes. A unit test that ensures we can recover from all cases of missing information is added.

Fixes #1929.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using graph functions is risky in the context of migrations. They may change in the future. Also first adding all data and then removing data again is quite indirect.

I think it may be better to set up the database using as low level functions as possible/still convenient. Using putLightningNode, putChanEdgeInfo and putChanEdgePolicy it shouldn't be necessary to do the add/remove combination either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the test though, do you still think we should add the data in a more low-level way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale for doing it this way was that it made the code shorter to use the high level methods, then remove the specific data I wanted to test missing, than add all the non-missing data bit by bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a more general question of how to test database migrations. To test a migration, you'd need to have a database with the old structure. In a test, this database needs to be set up. But setting that old db up with functions that assume the new structure already, is it robust?

Suppose one of those graph functions changes. It will impact this test too, but the test may still pass. Only at that point, it isn't testing the intended migration anymore.

I think it is better to setup the old database in a way that cannot change with future commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question becomes "how low-level" is enough? The high-level methods again invoke sub-methods that invoice other methods etc. I see your concern, but I'm afraid if we start requiring this, then we end up writing code over and over again that potentially will never change.

I think a better approach is to handle this at time of change: If database formats are altered, then we must make sure all callers either 1) supports the new format or 2) calls a version of the method that is not altered.

(2) is what was done in 149a8ce

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think normally, callers don't need to be aware of a new format. That is what the high level function is for. Calls from migrations are a special case, because they rely on the specific implementation of the function. It may help to add a comment to those high level functions pointing to the migration relying on it.

I still have the feeling that this test code would be better when the key/values were serialized in the test in the very specific format that is required for this db version migration without using the high level functions.

Interested to hear a second reviewer's opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are going through all combinations anyway, can we assert here too that there are no policies without a corresponding edgeinfo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but I think that would increase the scope of the migration. Currently it checks all edgeinfos and add corresponding policies if they are missing.

In this case we would need to iterate over all policies, and delete any where we cannot find the info. Do you think this is worthwhile to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just thinking that we now do a "fix" migration that still leaves some undesired state in the database. If we don't do it now, it may need another fix migration and db version number later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added the cleanup 👍

Note that still will only happen for users with this db version. Users already past this migration won't get this cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Yes, users already past the migration won't get it, but hopefully the code is now in such a state that we don't leave things behind anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a node that already upgraded the db to the latest version, but still has a bad policy requiring a strategic continue. This pr is not going to help me there. I was wondering, shouldn't this cleanup be a new db version so that it is beneficial for everyone?

@Roasbeef Roasbeef added database Related to the database/storage of LND graph channels P2 should be fixed if one has time needs review PR needs review by regular contributors bug fix labels Dec 11, 2018
@halseth halseth force-pushed the edge-policy-cleanups branch from a2c754a to ad1ec1c Compare December 21, 2018 08:45
@halseth halseth force-pushed the edge-policy-cleanups branch from ad1ec1c to 08c100f Compare January 2, 2019 14:10
Fetching edge policies from the DB requires the corresponding node to be
found in the DB. We have seen cases of old, inconsistent DBs where these
are not present, which will make the policy migration fail.

This commit attempts to restore such DBs by checking for the existence
of an edge's two nodes before migrating the policy, and adding the node
if missing.
This commit adds a test for the edge policy migration logic, assuring
that we properly migrate the database to the format where missing
policies for unknown edges are marked as "unknown".

To make sure we also handle cases of missing information from the
database, we generate all combinations of missing information and
asserts that we end up in a consistent db state after migration.
@halseth halseth force-pushed the edge-policy-cleanups branch from 42ddcad to 3643466 Compare January 28, 2019 08:31
@Roasbeef
Copy link
Member

This still relevant?

@halseth
Copy link
Contributor Author

halseth commented Jan 14, 2020

We haven't seen this problem in a while (I think), so I think this can be closed. There are a few commits are still relevant, I'll make a new PR for those and close this one.

@halseth
Copy link
Contributor Author

halseth commented Jan 14, 2020

The commits I had in mind has already been merged: #2549

@halseth halseth closed this Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix channels database Related to the database/storage of LND graph needs review PR needs review by regular contributors P2 should be fixed if one has time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to apply migration #4; unable to open channeldb: unable to update edge policies: unable to fetch node: ****, unable to find node

3 participants