Repository format v2 #628
Closed
Labels: category: project; misc: repository format (issues requiring repository format changes); type: discussion (undecided topics needing supplementary input)
Description
I'd like to start the discussion on changing the repository format to version 2. This is needed in order to support compression (see #21).
The following list will be updated when new proposals come in.
Accepted:
- Pack files: Move the header to the start of the file. At the moment, the header is at the end. I originally thought it would be nice to just write the file and write the header once that is done. However, it turned out that in order to retry failed backend requests, we need to buffer the file locally anyway. So we can write the content (blobs) to a tempfile, and then write the header first when uploading the pack file to the backend. This makes reading the header easier, since we don't need to start from the end of the file.
- Pack files: At the moment the pack file header is a custom binary structure (see the design document). This is inflexible, requires a custom parser and does not allow extension without changing the repository format. I'd like to rebuild the pack header as a JSON data structure, similar to the way the tree objects are stored in the repo. This allows extension without having to change the underlying data format.
- Pack files/Index: When the pack header is changed, add support for compression (algorithm, compressed/uncompressed length). Also add the compressed/uncompressed size to the index files.
- Snapshot files: Allow packed snapshots so that having a lot of snapshots becomes usable (cf. "[architecture] Performance of `restic snapshots` with high-latency remote", #523)
- Add a `README` file into new repositories which describes what this directory contains.
- Remove username and hostname from key files ("Key metadata is stored unencrypted", #2128)
To be discussed:
- Is there a way to add error-correcting codes to the files? Other ideas to recover from data errors?
- Change the Index format to improve memory usage
- Add an encryption indirection: Write down in the header which key is used for authentication/encryption of each blob (so that "Support asymmetric backups", #187, is easier to implement later on)
Postponed/rejected:
- Switch to a faster hash function (SHA3/Keccak/Blake2 instead of SHA256)
- Support asymmetric crypto
Anything else?