Repository format v2 #628

@fd0

I'd like to start the discussion on changing the repository format to version 2. This is needed in order to support compression (see #21).

The following list will be updated when new proposals come in.

Accepted:

  • Pack files: Move the header to the start of the file. At the moment, the header is at the end. I thought it'd be nice to just write the blobs and then append the header once that is done. However, it turned out that we need to buffer the file locally anyway in order to be able to retry failed backend requests. So we can write the content (blobs) to a tempfile, and then write the header first when uploading the pack file to the backend. This makes reading the header easier, since we don't need to start from the end of the file.
  • Pack files: At the moment the pack file header is a custom binary structure (see the design document). This is inflexible, requires a custom parser and does not allow extension without changing the repository format. I'd like to rebuild the pack header as a JSON data structure, similar to the way the tree objects are stored in the repo. This allows extension without having to change the underlying data format.
  • Pack files/Index: When the pack header is changed, add support for compression (algorithm, compressed/uncompressed length). Also add the compressed/uncompressed size to the index files.
  • Snapshot files: Allow packed snapshots so that repositories with a lot of snapshots remain usable (cf. #523, "Performance of restic snapshots with high-latency remote")
  • Add a README file to new repositories that describes what the directory contains.
  • Remove username and hostname from key files (see #2128, "Key metadata is stored unencrypted")
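
To make the three pack file items above more concrete, here is a minimal sketch of what a header-first pack with a JSON header and per-blob compression fields could look like. Everything in it is an assumption for illustration and not a proposed format: the 4-byte length prefix, the field names (`version`, `blobs`, `offset`, `compression`, `length`, `uncompressed_length`), and the use of zlib as a stand-in algorithm. Encryption and authentication of the header and blobs are left out entirely.

```python
import io
import json
import struct
import zlib


def write_pack(blobs):
    """Build a pack as: 4-byte big-endian header length, JSON header, blob data.

    `blobs` is a list of (id, bytes) pairs. The layout and field names are
    hypothetical; zlib only stands in for whatever algorithm gets chosen.
    """
    body = io.BytesIO()  # plays the role of the local tempfile
    entries = []
    for blob_id, data in blobs:
        compressed = zlib.compress(data)
        entries.append({
            "id": blob_id,
            "offset": body.tell(),  # relative to the start of the blob area
            "compression": "zlib",
            "length": len(compressed),
            "uncompressed_length": len(data),
        })
        body.write(compressed)
    # The header is only complete once all blobs are buffered, but it is
    # emitted first when the pack is assembled for upload.
    header = json.dumps({"version": 2, "blobs": entries}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body.getvalue()


def read_header(pack):
    """Return (header dict, offset where the blob area starts)."""
    (header_len,) = struct.unpack_from(">I", pack, 0)
    header = json.loads(pack[4:4 + header_len].decode("utf-8"))
    return header, 4 + header_len


def read_blob(pack, blob_id):
    """Look up a blob in the header and return its decompressed content."""
    header, data_start = read_header(pack)
    for entry in header["blobs"]:
        if entry["id"] == blob_id:
            start = data_start + entry["offset"]
            data = zlib.decompress(pack[start:start + entry["length"]])
            if len(data) != entry["uncompressed_length"]:
                raise ValueError("uncompressed length mismatch")
            return data
    raise KeyError(blob_id)
```

A reader only needs the first few bytes of the file to find the header, and an older reader can ignore JSON fields it does not know, which is the extensibility argument made above.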

To be discussed:

  • Is there a way to add error-correcting codes to the files? Other ideas to recover from data errors?
  • Change the Index format to improve memory usage
  • Add an encryption indirection: Write down in the header which key is used for authentication/encryption of each blob (so that we can implement asymmetric backups, #187, more easily later on)

Postponed/rejected:

  • Switch to a faster hash function (SHA3/Keccak/Blake2 instead of SHA256)
  • Support asymmetric crypto

Anything else?
