Treat superseded indexes correctly by aawsome · Pull Request #2839 · restic/restic

aawsome · 2020-07-17T20:01:05Z

What is the purpose of this change? What does it change?

Treat superseded index files correctly, if they still exist.
This means, they are now ignored for most commands and check prints a hint if they exist.

Was the change discussed in an issue or in the forum before?

closes #2824

Is also used within #2718.

Checklist

I have read the Contribution Guidelines
I have enabled maintainer edits for this PR
I have added tests for all changes in this PR - tests are added within Reimplementation of prune #2718
I have added documentation for the changes (in the manual)
There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
I have run gofmt on the code in all commits
All commit messages are formatted in the same style as the other commits in the repo
I'm done, this Pull Request is ready for review

MichaelEischer · 2020-07-25T13:22:16Z

This seemingly simple change has far reaching implications:

Different restic versions will interpret the index differently. This could become a problem if obsolete indexes still exist. A new restic version would then ignore an obsolete index, but older versions will still read those as before. If these obsolete indexes refer to no longer existing blobs, we now have a recipe for data loss.

I'm not sure I completely understand in what regard this PR treats the index files correcter than before. Is it just not-ignoring the "supersedes" field or also something else? The only scenario where the current behavior could causes data loss is when prune starts to delete pack files without successfully deleting old index files. But that case can be easily checked for during prune.

Using the supersedes field has a major downside: An interrupted index upload now leaves the repository with an incomplete index as the old one is superseded and the new one not yet complete (that could be avoided by precisely converting one or more old index files into a new one, but that would be rather complex as the new index would have to completely cover the contained old indexes; and the problem with incompatibilities remains; and an incremental index update would also work without supersedes). This in turn requires an index rebuild to repair the repository. So this moves the uncertainty from maybe too many index files to possibly too few index files. And in that regard I think the current behavior is the better trade-off.

Another benefit of this PR is that as obsolete indexes are dropped while loading, there's no memory usage overhead due to duplicate indexes. When no using the supersedes field, that could also be achieved by immediately merging indexes and in that process skipping duplicate entries.

aawsome · 2020-07-25T16:53:15Z

I'm not sure I completely understand in what regard this PR treats the index files correcter than before. Is it just not-ignoring the "supersedes" field or also something else?

In fact it is only about not-ignoring "supersedes" and in fact using and interpreting it (if given) correctly. IMO the only correct handling if a supersedes is given is to ignore this index file, if it is still present.

As you also pointed out, NOT ignoring it during backup may cause data loss: A still-existing superseded index file may make backup think that a needed blob exists in the repo which in fact doesn't.

The only scenario where the current behavior could causes data loss is when prune starts to delete pack files without successfully deleting old index files.

It is during the rebuilding of the index which is run within prune and rebuild-index. Currently, if deletion of an old index file fails, restic prints out a warning and continues (deleting the other index files and in case of prune also deleting the obsolete pack files, which may still be referenced within this specific index file). And of course there is no better way to do at this point, as the new index files have already been written...

But that case can be easily checked for during prune.

See above. There is also the problem, that identifying an error when deleting a file relies on the backend. When using a eventually consistent backend, a nil error may just mean that the delete command was accepted. In edge cases where the storage has to decide in an unclear situation whether to delete a file or not I do not think that any implementation may think that NOT deleting in this case may cause data loss...

Using the supersedes field has a major downside: An interrupted index upload now leaves the repository with an incomplete index as the old one is superseded and the new one not yet complete (that could be avoided by precisely converting one or more old index files into a new one, but that would be rather complex as the new index would have to completely cover the contained old indexes; and the problem with incompatibilities remains; and an incremental index update would also work without supersedes). This in turn requires an index rebuild to repair the repository. So this moves the uncertainty from maybe too many index files to possibly too few index files. And in that regard I think the current behavior is the better trade-off.

You are right about the possibility that this PR may result in too less index entries. However, the conclusion is really simple: Too less index entries may cause problems with restoring (which can be repaired by rebuilding the index) while too much index entries may cause data loss during backup...

aawsome · 2020-08-02T05:37:22Z

When no using the supersedes field, that could also be achieved by immediately merging indexes and in that process skipping duplicate entries.

Actually this can be implemented independently from this PR; I'll tackle this shortly.

Let me please explain, why I'm proposing this PR in the scope of optimizing prune:
In the actual prune implementation the index is rebuilt at the end by reading all "finally exisiting" (i.e. the state after all rewrites and pack deletion) pack headers and writes them to the index. The current index files are not used at all, so current prune heals the situation of superseded
However, this re-reading of all pack headers is one of the things that slows down current prune.
So I propose to use the in-memory index loaded from the index files to use in prune. In #2718 it is actually used twice:

The first place is when determining what to do in prune, this is still in #2718. Here superseded but still existing and not ignored index files would result in:

maybe some extra index entries which point to non-existing pack files which are actually not used. This will cause a security check in new prune to abort.
maybe more duplicates which would be wrongly marked for repacking depending on used the prune parameters. This would not break anything but let prune do some unneeded repacking work.

The second place is the index rebuilding, now in #2842. Here the "wrong duplicates" will be still written twice.

So I agree that for the proposed prune changes, it is sufficient to skip exact duplicates while merging.

aawsome · 2020-08-02T06:29:03Z

I added the deletion of still existing superseded index files here (and removed it from #2718). So this can now be tackled independently from #2718. Or postponed, if this change needs more discussion 😉

MichaelEischer · 2020-10-01T19:25:17Z

It is during the rebuilding of the index which is run within prune and rebuild-index. Currently, if deletion of an old index file fails, restic prints out a warning and continues (deleting the other index files and in case of prune also deleting the obsolete pack files, which may still be referenced within this specific index file). And of course there is no better way to do at this point, as the new index files have already been written...

As #2674 was merged, this is no longer a problem.

You are right about the possibility that this PR may result in too less index entries. However, the conclusion is really simple: Too less index entries may cause problems with restoring (which can be repaired by rebuilding the index) while too much index entries may cause data loss during backup...

I don't think that way of error handling is a good idea as it introduces a new failure mode in restic: Currently when a backup client or the network connections fails, the worst case (unless you hit a bug) is that you have to unlock the repository and run the command again. A failed prune run actually always leaves the repository in a working state, it just contains lots and lots of garbage data. Rebuilding the index for recovery will work, however, in the meantime the repository is unusable. And rebuild-index takes ages for large repositories, which is part of why I don't want this to be part of normal operation (once #2718 and #2842 are merged, the index is in fact no longer rebuilt from scratch during normal operation). I actually regard rebuild-index to be a last-resort command to repair a repository where the backend lost / damaged some of the data. That is it would make recovering from failed prune runs much more complicated / time intensive than right now.

As you also pointed out, NOT ignoring it during backup may cause data loss: A still-existing superseded index file may make backup think that a needed blob exists in the repo which in fact doesn't.

If we make this change I want to maintain backwards compatibility. When the supersedes field is suddenly interpreted we should ensure that this won't break older restic clients (beyond already existing problems). I'm pretty sure that there are several restic users which use different versions for different clients which still backup to the same repository. This also implies that continuing after failing to delete an old index file is not an option. Safely requiring clients to interpret the supersedes field would require a repository format change.

There is also the problem, that identifying an error when deleting a file relies on the backend. When using a eventually consistent backend, a nil error may just mean that the delete command was accepted. In edge cases where the storage has to decide in an unclear situation whether to delete a file or not I do not think that any implementation may think that NOT deleting in this case may cause data loss...

This is in my opinion the main problem which this PR solves. With the above requirements, for a strongly consistent backend (e.g. local filesystem) the only other issue is avoiding some unnecessary repack work due to duplicate blobs. And that is "only" an optimization.

We could avoid the problem of an incomplete index by uploading a "tombstone" index as the last step of rewriting the index. That way old indexes are only ignored once the new index was uploaded. To fix the problem of reappearing deleted indexes this would be enough, as packs are only deleted after the tombstone index was uploaded. However I've just made the assumption, that the tombstone index is visible to the next restic run. In that regard, adding the supersedes field to every index file would be more reliable, as it's available starting from the first part of the new index and also contained in many more files.

I just noticed that this change is related to how a non-blocking prune would work. There unreferenced blobs are first marked for deletion at which point they may no longer be used in later backup runs, but are still accessible for retrieving data from the repository. Entries from superseded indexes are quite similar to the ones marked for deletion.

It's probably a good idea to postpone this PR until after the main prune changes are completed. I also have to think a bit more about the assumptions regarding the required backend consistency guarantees.

- don't use superseded index files in masterindex - add hint to check if superseded index files still exist

MichaelEischer

I've rebased the PR to the current master branch. With the new prune implementation there are now three possible scenarios:

Either there are no obsolete indexes in which case the PR is a noop
The storage backend didn't (yet) delete some indexes. This case could cause problems for older restic versions and would be fixed by ignoring superseded indexes. I wonder whether we should let backup report an error in this case to warn users.
prune or rebuild-index was interrupted. In this case there are lots of obsolete indexes, which can be safely ignored. In this case it would probably be a good idea to not throw around lots of warnings. The main benefit of ignoring the obsolete indexes would be a lower memory usage. As this case should be more frequent than the previous one, the current error reporting in this PR should be sufficient.

That is right now it should be quite safe to merge this PR. There's just one caveat. Currently when saving the master index, an old index is marked as obsolete before all entries in it have been transferred. That is the following code lines must be moved after copying the index contents.

restic/internal/repository/master_index.go

Lines 324 to 340 in 66382b2

    
           if idx.Final() { 
        
           	ids, err := idx.IDs() 
        
           	if err != nil { 
        
           		debug.Log("index %d does not have an ID: %v", err) 
        
           		return err 
        
           	} 
        
           	debug.Log("adding index ids %v to supersedes field", ids) 
        
           	err = newIndex.AddToSupersedes(ids...) 
        
           	if err != nil { 
        
           		return err 
        
           	} 
        
           	obsolete.Merge(restic.NewIDSet(ids...)) 
        
           } else { 
        
           	debug.Log("index %d isn't final, don't add to supersedes field", i) 
        
           }

This means that for the upgrade path it might be necessary to run rebuild-index if a partially rewritten index exists after an interrupted prune or rebuild-index.

MichaelEischer · 2021-08-18T12:17:33Z

cmd/restic/cmd_prune.go

 	}

+	// delete obsolete index files (index files that have already been superseded)
+	obsoleteIndexes := (repo.Index()).(*repository.MasterIndex).Obsolete()


I think we also need something similar for rebuild-index?

MichaelEischer · 2021-08-18T12:45:44Z

internal/repository/repository.go

 	}

+	// remove obsolete indexes
+	validIndex.Sub(r.idx.Obsolete())


Wouldn't that cause the obsolete indexes to be downloaded over and over again? After all these are still stored in the repository.

MichaelEischer · 2024-01-20T14:18:01Z

By now I've spent some time on planning how index updates should work for a non-blocking prune implementation (there's nothing to show yet). What's becoming clear is however that marking old index data obsolete should not work on an index level, but rather at the granularity of pack files. That is far more flexible and also allows for rewriting only index files that contain too many deleted pack files.

As currently the supersedes field is not interpreted by restic (which would change after merging this PR), this allows us to completely drop the old mechanism. Merging this PR however would require us to keep the old mechanism and thus would complicate the future evolution of the repository format.

aawsome · 2024-01-23T21:24:24Z

What's becoming clear is however that marking old index data obsolete should not work on an index level, but rather at the granularity of pack files. That is far more flexible and also allows for rewriting only index files that contain too many deleted pack files.

Fully agree. If you have something to show up, I'm looking forward to a discussion. As you surely know I claim that rustic already uses a non-blocking prune, the corresponding index files additions are here: https://github.com/rustic-rs/rustic_core/blob/main/crates/core/src/repofile/indexfile.rs#L26 (and L52ff)
This may be a starting point and if you think something needs to be improved, let's discuss it!

aawsome force-pushed the use-superseded branch from 0f09fe1 to 0dc629e Compare July 17, 2020 20:20

This was referenced Jul 17, 2020

Reimplementation of prune #2718

Merged

prune: Stricter error checks #2674

Merged

aawsome force-pushed the use-superseded branch 2 times, most recently from e28d96f to 2890f9c Compare July 25, 2020 08:03

aawsome force-pushed the use-superseded branch from 2890f9c to 99f64f8 Compare August 2, 2020 06:27

This was referenced Aug 2, 2020

Don't save exact duplicates in merged index #2863

Merged

Rebuild index in prune by using in-memory index #2842

Merged

MichaelEischer mentioned this pull request Jan 8, 2021

check says prune would do more work right after prune #3161

Closed

Treat superseded indexes correctly

0aaf5b7

- don't use superseded index files in masterindex - add hint to check if superseded index files still exist

MichaelEischer force-pushed the use-superseded branch from 99f64f8 to 0aaf5b7 Compare August 18, 2021 12:11

MichaelEischer requested changes Aug 18, 2021

View reviewed changes

MichaelEischer mentioned this pull request Oct 9, 2021

b2: Successful delete if file does not exist #3544

Merged

7 tasks

MichaelEischer closed this Jan 20, 2024

aawsome deleted the use-superseded branch February 24, 2024 22:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Treat superseded indexes correctly#2839

Treat superseded indexes correctly#2839
aawsome wants to merge 1 commit intorestic:masterfrom
aawsome:use-superseded

aawsome commented Jul 17, 2020

Uh oh!

MichaelEischer commented Jul 25, 2020 •

edited

Loading

Uh oh!

aawsome commented Jul 25, 2020 •

edited

Loading

Uh oh!

aawsome commented Aug 2, 2020 •

edited

Loading

Uh oh!

aawsome commented Aug 2, 2020

Uh oh!

MichaelEischer commented Oct 1, 2020

Uh oh!

MichaelEischer left a comment •

edited

Loading

Uh oh!

MichaelEischer Aug 18, 2021

Uh oh!

MichaelEischer Aug 18, 2021

Uh oh!

MichaelEischer commented Jan 20, 2024

Uh oh!

aawsome commented Jan 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	if idx.Final() {
	ids, err := idx.IDs()
	if err != nil {
	debug.Log("index %d does not have an ID: %v", err)
	return err
	}

	debug.Log("adding index ids %v to supersedes field", ids)

	err = newIndex.AddToSupersedes(ids...)
	if err != nil {
	return err
	}
	obsolete.Merge(restic.NewIDSet(ids...))
	} else {
	debug.Log("index %d isn't final, don't add to supersedes field", i)
	}

Conversation

aawsome commented Jul 17, 2020

What is the purpose of this change? What does it change?

Was the change discussed in an issue or in the forum before?

Checklist

Uh oh!

MichaelEischer commented Jul 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aawsome commented Jul 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aawsome commented Aug 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aawsome commented Aug 2, 2020

Uh oh!

MichaelEischer commented Oct 1, 2020

Uh oh!

MichaelEischer left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaelEischer Aug 18, 2021

Choose a reason for hiding this comment

Uh oh!

MichaelEischer Aug 18, 2021

Choose a reason for hiding this comment

Uh oh!

MichaelEischer commented Jan 20, 2024

Uh oh!

aawsome commented Jan 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MichaelEischer commented Jul 25, 2020 •

edited

Loading

aawsome commented Jul 25, 2020 •

edited

Loading

aawsome commented Aug 2, 2020 •

edited

Loading

MichaelEischer left a comment •

edited

Loading