My Assumption
Based on comments from cockroach devs, I'm under the impression that storage directories should be movable:
moving data should work as you describe. A data directory isn’t tied to its location or the underlying file system (well, shouldn’t be and we’ve never seen any bugs with that regard in our testing).
(some issue) is likely because we're using an absolute path somewhere. This shouldn't happen as this makes it impossible to move the data directories.
This is a big assumption of mine! If this is wrong, the rest of this issue should likely be considered somewhat differently.
Background
At my company, we've been using cockroach as our storage layer. For the vast majority of our tests, we're interested in running it locally as a single node. However, doing so, we noticed some latency in populating tables + indices for every single test. As an optimization, we do the following:
- During our build, run cockroach as a single-node, using a "well-known" storage directory. Populate the tables + indices once. Terminate cockroach. We refer to this storage directory as our "seed" storage.
- For all subsequent tests, copy the seed storage directory created in the first step. Each test gets its own unique copy, which starts out identical to the seed and never overlaps with any other test's copy.
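As a rough illustration of the two steps above, the per-test copy could look something like the sketch below. The function name `clone_seed_store` and the `crdb-test-` prefix are made up for this example; our real harness differs in details, and nothing here calls into cockroach itself.

```python
import shutil
import tempfile
from pathlib import Path


def clone_seed_store(seed_dir: Path) -> Path:
    """Copy the seed storage directory into a fresh, unique location.

    Hypothetical helper: the name and directory layout are assumptions
    made for this sketch, not part of any cockroach tooling.
    """
    # mkdtemp guarantees a unique parent directory per call, so two
    # tests never end up sharing a storage directory.
    parent = Path(tempfile.mkdtemp(prefix="crdb-test-"))
    target = parent / "store"
    # copytree creates `target` itself and copies everything under seed_dir.
    shutil.copytree(seed_dir, target)
    return target
```

Each test would then start its single-node cockroach against the returned directory instead of the seed.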
This cuts out several seconds of latency for all tests, which we thought was a big win, except...
The Problem
We've been seeing corruptions and test flakes, and have largely narrowed them down to the temp-dirs-record.txt file in the storage directory: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/config.go#L74
This file contains an absolute path to the original storage directory, and that path is not updated when the storage directory is moved. For example, a storage directory created in /tmp/foo contains a temp-dirs-record.txt file that always points to /tmp/foo, even after the storage directory is moved to /tmp/bar.
In our use-case, this means the temporary directory in the "seed" storage directory may be modified concurrently by a number of cockroach instances, which is unexpected. However, even in the case of "simply moving" the storage directory, it seems like a bad idea to keep a reference like an absolute path around to an old directory.
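For anyone wanting to check whether a copied or moved store is affected, something like the following sketch can flag record entries that point outside the store. It assumes the record file holds one absolute path per line, which is my reading of the file and not something I've confirmed in the cockroach source; `stale_temp_dir_entries` is a hypothetical name.

```python
from pathlib import Path
from typing import List

RECORD_FILE = "temp-dirs-record.txt"


def stale_temp_dir_entries(store_dir: Path) -> List[str]:
    """Return record entries that do not live under store_dir.

    Assumption: the record file lists one absolute path per line.
    """
    record = store_dir / RECORD_FILE
    if not record.exists():
        return []
    stale = []
    for line in record.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        try:
            # relative_to raises ValueError when entry is outside store_dir.
            Path(entry).relative_to(store_dir)
        except ValueError:
            stale.append(entry)
    return stale
```

On a store created in /tmp/foo and then copied to /tmp/bar, this would report the /tmp/foo entry for the copy and nothing for the original.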
This concurrent access by multiple cockroach instances has resulted in a variety of bad behaviors:
- I've seen "ERROR: could not cleanup temporary directories from record file: unlinkat /tmp/crdb-base/cockroach-temp315287726: directory not empty" as multiple cockroach instances attempt to clear the directory concurrently.
- We've seen data corruption when merely starting new cockroach processes, presumably because they don't expect to share a temp directory.
The Proposed Fix
Would it be possible for all paths stored in storage directories to be relative to the base of the storage directory?
This would enable both the moving and the "cloning" use cases described above.
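To make the idea concrete, here is a minimal sketch of what I mean by relative paths: the path is stored relative to the store root when written and resolved against wherever the store currently lives when read. This is not how cockroach implements the record file; `to_record_entry` and `from_record_entry` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath


def to_record_entry(store_dir: PurePosixPath, temp_dir: PurePosixPath) -> str:
    """Encode temp_dir relative to the store root.

    Raises ValueError if temp_dir is not located under store_dir.
    """
    return str(temp_dir.relative_to(store_dir))


def from_record_entry(store_dir: PurePosixPath, entry: str) -> PurePosixPath:
    """Resolve a recorded entry against the store's current location."""
    return store_dir / entry


# An entry written while the store lived in /tmp/foo ...
entry = to_record_entry(
    PurePosixPath("/tmp/foo"), PurePosixPath("/tmp/foo/cockroach-temp1")
)
# ... follows the store once it has been moved or copied to /tmp/bar.
moved = from_record_entry(PurePosixPath("/tmp/bar"), entry)
print(entry)  # cockroach-temp1
print(moved)  # /tmp/bar/cockroach-temp1
```

With this scheme, every copy of the seed store would clean up only its own temp directory.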
Environment:
- CockroachDB version: Observed in 21.1.10, still seems present in 21.1.12
- Server OS: Linux
Jira issue: CRDB-11980