Handle missing graph nodes in DB#2279
Handle missing graph nodes in DB#2279halseth wants to merge 8 commits intolightningnetwork:masterfrom
Conversation
channeldb/migrations_test.go
Outdated
There was a problem hiding this comment.
Using graph functions is risky in the context of migrations. They may change in the future. Also first adding all data and then removing data again is quite indirect.
I think it may be better to set up the database using as low level functions as possible/still convenient. Using putLightningNode, putChanEdgeInfo and putChanEdgePolicy it shouldn't be necessary to do the add/remove combination either.
There was a problem hiding this comment.
This is just the test though, do you still think we should add the data in a more low-level way?
There was a problem hiding this comment.
My rationale for doing it this way was that it made the code shorter to use the high level methods, then remove the specific data I wanted to test missing, than add all the non-missing data bit by bit.
There was a problem hiding this comment.
It is a more general question of how to test database migrations. To test a migration, you'd need to have a database with the old structure. In a test, this database needs to be set up. But setting that old db up with functions that assume the new structure already, is it robust?
Suppose one of those graph functions changes. It will impact this test too, but the test may still pass. Only at that point, it isn't testing the intended migration anymore.
I think it is better to setup the old database in a way that cannot change with future commits.
There was a problem hiding this comment.
Then the question becomes "how low-level" is enough? The high-level methods again invoke sub-methods that invoice other methods etc. I see your concern, but I'm afraid if we start requiring this, then we end up writing code over and over again that potentially will never change.
I think a better approach is to handle this at time of change: If database formats are altered, then we must make sure all callers either 1) supports the new format or 2) calls a version of the method that is not altered.
(2) is what was done in 149a8ce
There was a problem hiding this comment.
I think normally, callers don't need to be aware of a new format. That is what the high level function is for. Calls from migrations are a special case, because they rely on the specific implementation of the function. It may help to add a comment to those high level functions pointing to the migration relying on it.
I still have the feeling that this test code would be better when the key/values were serialized in the test in the very specific format that is required for this db version migration without using the high level functions.
Interested to hear a second reviewer's opinion.
channeldb/migrations_test.go
Outdated
There was a problem hiding this comment.
Now that we are going through all combinations anyway, can we assert here too that there are no policies without a corresponding edgeinfo?
There was a problem hiding this comment.
Good idea, but I think that would increase the scope of the migration. Currently it checks all edgeinfos and add corresponding policies if they are missing.
In this case we would need to iterate over all policies, and delete any where we cannot find the info. Do you think this is worthwhile to add?
There was a problem hiding this comment.
I am just thinking that we now do a "fix" migration that still leaves some undesired state in the database. If we don't do it now, it may need another fix migration and db version number later.
There was a problem hiding this comment.
Ok, added the cleanup 👍
Note that still will only happen for users with this db version. Users already past this migration won't get this cleanup.
There was a problem hiding this comment.
Nice. Yes, users already past the migration won't get it, but hopefully the code is now in such a state that we don't leave things behind anymore.
There was a problem hiding this comment.
I have a node that already upgraded the db to the latest version, but still has a bad policy requiring a strategic continue. This pr is not going to help me there. I was wondering, shouldn't this cleanup be a new db version so that it is beneficial for everyone?
a2c754a to
ad1ec1c
Compare
ad1ec1c to
08c100f
Compare
08c100f to
42ddcad
Compare
Fetching edge policies from the DB requires the corresponding node to be found in the DB. We have seen cases of old, inconsistent DBs where these are not present, which will make the policy migration fail. This commit attempts to restore such DBs by checking for the existence of an edge's two nodes before migrating the policy, and adding the node if missing.
This commit adds a test for the edge policy migration logic, assuring that we properly migrate the database to the format where missing policies for unknown edges are marked as "unknown". To make sure we also handle cases of missing information from the database, we generate all combinations of missing information and asserts that we end up in a consistent db state after migration.
42ddcad to
3643466
Compare
|
This still relevant? |
|
We haven't seen this problem in a while (I think), so I think this can be closed. There are a few commits are still relevant, I'll make a new PR for those and close this one. |
|
The commits I had in mind has already been merged: #2549 |
This commit attempts to handle a case of inconsistent database status, namely where an edge is in the database, but one of the corresponding nodes is not. This would lead to migration failures.
We make sure the router won't attempt to add channel policies for unknown edges. This would previously (probably) not be done anyway, since the gossiper will also do this check. But hey, belts and suspenders, and we are able to remove some code. Unit tests for this behavior is added.
We amend the failing migration to add the missing nodes. A unit test that ensures we can recover from all cases of missing information is added.
Fixes #1929.