Bug Report
When encryption (TDE) is enabled, file operations such as deleting, renaming, and copying files are non-atomic. Each of these operations must also update the encryption metadata via encryption::DataKeyManager, and that update cannot be combined with the actual file operation into a single atomic step. If TiKV crashes between the file operation and the metadata update, the encryption metadata can become inconsistent with the file system. This is acceptable provided that, as long as an encrypted file exists in the file system, its corresponding entry in the encryption metadata also exists. Whenever we find an obsolete metadata entry with no corresponding file, the entry should be ignored and removed. The current implementation does not handle this consistently across all DataKeyManager call sites. We need to refactor the code to provide helper methods that work as follows:
- delete file: works in the order 1. fs::delete_file(), 2. key_manager.delete_file()
- rename file: works in the order 1. key_manager.new_file(dst), 2. fs::rename_file(), 3. key_manager.delete_file(src)
- link (copy) file: works in the order 1. key_manager.new_file(dst), 2. fs::copy_file()
Whenever one of these operations finds an unexpected existing entry in the metadata (for example, linking a file when metadata for dst already exists), it should check whether the file exists in the file system. If it does not, the entry should be ignored and removed.
The task includes changing both TiKV code and RocksDB code.
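The ordering rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual TiKV or RocksDB code: `KeyManager` is a hypothetical in-memory stand-in for encryption::DataKeyManager that tracks only which paths have a metadata entry (no real keys), and `new_file_checked`, `has_entry`, and the free functions are illustrative names. Treating a missing file as success in `delete_file` is also an assumption made here so that a delete interrupted by a crash can be retried.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for encryption::DataKeyManager.
/// It only records which paths have a metadata entry.
#[derive(Default)]
pub struct KeyManager {
    entries: HashSet<PathBuf>,
}

impl KeyManager {
    /// Create a metadata entry; fails if one already exists.
    pub fn new_file(&mut self, path: &Path) -> io::Result<()> {
        if !self.entries.insert(path.to_path_buf()) {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "metadata entry already exists",
            ));
        }
        Ok(())
    }

    /// Remove a metadata entry; a missing entry is not an error.
    pub fn delete_file(&mut self, path: &Path) {
        self.entries.remove(path);
    }

    pub fn has_entry(&self, path: &Path) -> bool {
        self.entries.contains(path)
    }
}

/// Create metadata for `dst`. If an entry already exists but the file does
/// not, the entry is treated as obsolete: it is removed and recreated.
fn new_file_checked(km: &mut KeyManager, dst: &Path) -> io::Result<()> {
    if km.has_entry(dst) {
        if dst.exists() {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "destination file and its metadata both exist",
            ));
        }
        km.delete_file(dst); // obsolete entry left by an earlier crash
    }
    km.new_file(dst)
}

/// Delete: 1. remove the file, 2. remove its metadata entry.
pub fn delete_file(km: &mut KeyManager, path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => {}
        // Assumption: the file may already be gone after a crash; still clean up.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    km.delete_file(path);
    Ok(())
}

/// Rename: 1. create metadata for dst, 2. rename the file, 3. drop src metadata.
pub fn rename_file(km: &mut KeyManager, src: &Path, dst: &Path) -> io::Result<()> {
    new_file_checked(km, dst)?;
    fs::rename(src, dst)?;
    km.delete_file(src);
    Ok(())
}

/// Link (copy): 1. create metadata for dst, 2. copy the file.
pub fn copy_file(km: &mut KeyManager, src: &Path, dst: &Path) -> io::Result<()> {
    new_file_checked(km, dst)?;
    fs::copy(src, dst)?;
    Ok(())
}
```

In every helper the metadata entry is created before the file appears and removed only after the file is gone, so a crash at any point can leave at most an obsolete entry, never an encrypted file without metadata.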
What version of TiKV are you using?
master, 4.0.x
What operating system and CPU are you using?
N/A
Steps to reproduce
#9099 for example
What did you expect?
The encryption logic should correctly handle non-atomic file operations such as deleting, renaming, and linking files.
What happened?