Skip to content

libroach: timebound iterator metadata doesn't consider intents #28358

@danhhz

Description

@danhhz

Note: I haven't confirmed this behavior in practice, it's currently theoretical from reading code. Holding off on the severity label until it's confirmed. The fix will need to be backported into 2.0

When computing which timestamps are present in an sstable for the time bound iterator performance optimization, all keys without a timestamp are ignored, which includes intents. This means that certain (probably unlikely) compactions will result in a situation where a time bounded iterator reading from [t1,t2] will miss an intent at t2.

rocksdb::Status AddUserKey(const rocksdb::Slice& user_key, const rocksdb::Slice& value,
rocksdb::EntryType type, rocksdb::SequenceNumber seq,
uint64_t file_size) override {
rocksdb::Slice unused;
rocksdb::Slice ts;
if (SplitKey(user_key, &unused, &ts) && !ts.empty()) {
ts.remove_prefix(1); // The NUL prefix.
if (ts_max_.empty() || ts.compare(ts_max_) > 0) {
ts_max_.assign(ts.data(), ts.size());
}
if (ts_min_.empty() || ts.compare(ts_min_) < 0) {
ts_min_.assign(ts.data(), ts.size());
}
}
return rocksdb::Status::OK();
}

Backup uses timebounded iterators and does a read at a certain timestamp until all intents below that timestamp have been resolved one way or another. This bug means that backup could incorrectly proceed with an unresolved intent. If this intent is later rolled back, the backup would contain a value that was never committed.

CDC will have similar correctness issues.

Several fixes are possible with various performance tradeoffs. An partial list:

  • Include intents in the time bound metadata. This requires unmarshalling every mvcc metadata proto in the sst and is probably too expensive, especially for inline values like timeseries.
  • Disable time bound metadata for any sstable containing intents. This will cause the time bound iterator to always look in the file, so will be correct but a performance pessimization.
  • Some combination of the two based on how many intents are in the file or where in the keyspace it is.

Metadata

Metadata

Assignees

Labels

A-cdcChange Data CaptureA-disaster-recoveryC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.S-1High impact: many users impacted, serious risk of high unavailability or data loss

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions