Absolute Path in Storage Directory prevents Moving #74231

@smklein

My Assumption

Based on comments from cockroach devs, I'm under the impression that storage directories should be movable:

"moving data should work as you describe. A data directory isn’t tied to its location or the underlying file system (well, shouldn’t be and we’ve never seen any bugs with that regard in our testing)."

"(some issue) is likely because we're using an absolute path somewhere. This shouldn't happen as this makes it impossible to move the data directories."

This is a big assumption of mine! If this is wrong, the rest of this issue should likely be considered somewhat differently.

Background

At my company, we've been using cockroach as our storage layer. For the vast majority of our tests, we're interested in running it locally as a single node. However, doing so, we noticed some latency in populating tables + indices for every single test. As an optimization, we do the following:

  1. During our build, run cockroach as a single node, using a "well-known" storage directory. Populate the tables + indices once. Terminate cockroach. We refer to this storage directory as our "seed" storage.
  2. For all subsequent tests, copy the seed storage directory used in step (1). Each test gets a unique storage directory, which should look identical (to start), and which should never overlap.
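The copy in step (2) can be sketched roughly as follows. This is a minimal illustration rather than our actual test harness: the function name `clone_seed` and the `crdb-test-` prefix are hypothetical, and it assumes cockroach has already been terminated so the seed directory is no longer being written to.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def clone_seed(seed_dir: Path, parent: Optional[Path] = None) -> Path:
    """Copy the seed storage directory into a fresh, uniquely named directory."""
    # mkdtemp picks a unique name, so concurrent tests never share a copy.
    test_dir = Path(tempfile.mkdtemp(prefix="crdb-test-", dir=parent))
    # mkdtemp already created test_dir, so copytree must be allowed to reuse it.
    shutil.copytree(seed_dir, test_dir, dirs_exist_ok=True)
    return test_dir
```

Each test would call `clone_seed` once and point its own cockroach instance at the returned directory.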

This cuts out several seconds of latency for all tests, which we thought was a big win, except...

The Problem

We've been seeing corruptions and test flakes, and have largely narrowed them down to the temp-dirs-record.txt file kept in the storage directory: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/config.go#L74

This file contains an absolute path to the original storage directory, which is not updated, even after moving the storage directory. This means that if we create a storage directory in /tmp/foo, it'll contain a temp-dirs-record.txt file that always points to /tmp/foo, even if the storage directory is moved to /tmp/bar.

In our use-case, this means the temporary directory in the "seed" storage directory may be modified concurrently by a number of cockroach instances, which is unexpected. However, even in the case of "simply moving" the storage directory, it seems like a bad idea to keep a reference like an absolute path around to an old directory.
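One way to confirm the stale reference in a copied or moved directory is to scan the record file for absolute paths that fall outside the directory it lives in. The sketch below assumes the record file holds one path per line, which I have not verified against every cockroach version; `stale_temp_paths` is a hypothetical helper, not part of cockroach.

```python
from pathlib import Path
from typing import List

RECORD_FILE = "temp-dirs-record.txt"


def stale_temp_paths(storage_dir: Path) -> List[Path]:
    """Return recorded temp paths that do not live under storage_dir."""
    record = storage_dir / RECORD_FILE
    if not record.exists():
        return []
    base = storage_dir.resolve()
    stale = []
    # Assumption: the record file lists one temp directory path per line.
    for line in record.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        path = Path(line)
        # An absolute path outside this directory still refers to wherever
        # the storage directory lived when the record was written.
        if path.is_absolute() and base not in path.resolve().parents:
            stale.append(path)
    return stale
```

Run against the seed directory this should return an empty list, while a copy of the seed would report the seed's temp directory.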

This concurrent access by multiple cockroach instances has resulted in a variety of bad behaviors:

  • I've seen ERROR: could not cleanup temporary directories from record file: unlinkat /tmp/crdb-base/cockroach-temp315287726: directory not empty, as multiple cockroach instances attempt clearing the directory concurrently
  • We've seen data corruption when just starting new cockroach processes, presumably because they don't expect to share a temp directory with other instances.

The Proposed Fix

Would it be possible for all paths stored in storage directories to be relative to the base of the storage directory?
This would enable both the moving and "cloning" use-case we described.
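To illustrate the idea (not how cockroach implements or would implement it), here is a small sketch of recording a temp directory relative to the storage directory and resolving it again after a move. Both function names are hypothetical.

```python
import os
from pathlib import Path


def to_recorded(base: Path, temp_dir: Path) -> str:
    """Express temp_dir relative to the storage directory before recording it."""
    return os.path.relpath(temp_dir, start=base)


def from_recorded(base: Path, recorded: str) -> Path:
    """Resolve a recorded entry against wherever the storage directory is now."""
    return base / recorded


# Recorded while the storage directory lives at /tmp/foo ...
entry = to_recorded(Path("/tmp/foo"), Path("/tmp/foo/cockroach-temp1"))
# ... and resolved after it has been moved or copied to /tmp/bar.
resolved = from_recorded(Path("/tmp/bar"), entry)
```

With this scheme the recorded entry never mentions /tmp/foo, so a moved or cloned directory resolves the entry under its own base.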

Environment:

  • CockroachDB version: Observed in 21.1.10, still seems present in 21.1.12
  • Server OS: Linux

Jira issue: CRDB-11980

Labels:

  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
  • O-community: Originated from the community
  • T-storage: Storage Team
  • X-blathers-triaged: blathers was able to find an owner
  • X-stale
  • no-issue-activity
