encryption: handle non-atomic file operations #9115

@yiwu-arbug

Bug Report

When encryption (TDE) is enabled, file operations such as deleting, renaming, and copying a file are non-atomic. Each of these operations must also update the encryption metadata via encryption::DataKeyManager, and that update cannot be combined with the file system operation into a single atomic step. If TiKV crashes between the file operation and the metadata update, the encryption metadata can become inconsistent with the file system. This is acceptable provided that, as long as an encrypted file exists in the file system, its corresponding entry in the encryption metadata also exists. Whenever we find an obsolete metadata entry with no corresponding file, the entry should be ignored and removed. The current implementation does not handle this consistently across all DataKeyManager call sites. We need to refactor the code to provide helper methods that work as follows:

  • delete file: works in the order 1. fs::delete_file(), 2. key_manager.delete_file()
  • rename file: works in the order 1. key_manager.new_file(dst), 2. fs::rename_file(), 3. key_manager.delete_file(src)
  • link (copy) file: works in the order 1. key_manager.new_file(dst), 2. fs::copy_file()

Whenever one of these operations sees an unexpected existing entry in the metadata (for example, linking a file when metadata for dst already exists), it should check whether the file exists in the file system. If it does not, ignore and remove the entry.
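The ordering above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual TiKV or RocksDB code: `MockKeyManager` is a hypothetical stand-in for encryption::DataKeyManager that tracks entries in memory (it does not model keys), the helper names are made up, and `std::fs` stands in for the `fs::` calls. Returning an error when a dst entry exists and the file is also present, and tolerating a missing file on delete, are assumptions; the issue does not specify either case.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for encryption::DataKeyManager. It only records
/// which paths have a metadata entry; real key material is not modeled.
#[derive(Default)]
pub struct MockKeyManager {
    entries: HashSet<PathBuf>,
}

impl MockKeyManager {
    /// Register a metadata entry; fails if one already exists.
    pub fn new_file(&mut self, path: &Path) -> io::Result<()> {
        if !self.entries.insert(path.to_path_buf()) {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "metadata entry already exists",
            ));
        }
        Ok(())
    }

    /// Remove a metadata entry; a missing entry is not an error.
    pub fn delete_file(&mut self, path: &Path) {
        self.entries.remove(path);
    }

    pub fn has_entry(&self, path: &Path) -> bool {
        self.entries.contains(path)
    }
}

/// If `path` has a metadata entry but no file on disk, the entry is
/// obsolete (for example, left by a crash), so drop it.
/// Assumption: an entry whose file still exists is reported as an error.
fn clear_stale_entry(km: &mut MockKeyManager, path: &Path) -> io::Result<()> {
    if km.has_entry(path) {
        if path.exists() {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "destination file and its metadata both exist",
            ));
        }
        km.delete_file(path);
    }
    Ok(())
}

/// Delete: 1. remove the file, 2. remove the metadata entry.
pub fn delete_file(km: &mut MockKeyManager, path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => {}
        // Assumption: the file may already be gone after an earlier crash.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    km.delete_file(path);
    Ok(())
}

/// Rename: 1. add metadata for dst, 2. rename the file, 3. drop metadata for src.
pub fn rename_file(km: &mut MockKeyManager, src: &Path, dst: &Path) -> io::Result<()> {
    clear_stale_entry(km, dst)?;
    km.new_file(dst)?;
    fs::rename(src, dst)?;
    km.delete_file(src);
    Ok(())
}

/// Link (copy): 1. add metadata for dst, 2. copy the file.
pub fn link_file(km: &mut MockKeyManager, src: &Path, dst: &Path) -> io::Result<()> {
    clear_stale_entry(km, dst)?;
    km.new_file(dst)?;
    fs::copy(src, dst).map(|_| ())
}
```

In every helper the metadata entry is created before the file appears and removed only after the file is gone, so a crash at any point leaves at worst an obsolete entry, never an encrypted file without metadata.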

The task includes changing both TiKV code and RocksDB code.

What version of TiKV are you using?

master, 4.0.x

What operating system and CPU are you using?

N/A

Steps to reproduce

See #9099 for an example.

What did you expect?

The encryption logic should correctly handle non-atomic file operations such as deleting, renaming, and linking files.

What actually happened?
