etcd fails to start after power failure #16596

### Bug report criteria

- [X] This bug report is not security related, security issues should be disclosed privately via security@etcd.io.
- [X] This is not a support request, support requests should be raised in the etcd [discussion forums](https://github.com/etcd-io/etcd/discussions).
- [X] You have read the etcd [bug reporting guidelines](https://github.com/etcd-io/etcd/blob/main/Documentation/contributor-guide/reporting_bugs.md).
- [x] Existing open issues along with etcd [frequently asked questions](https://etcd.io/docs/latest/faq) have been checked and this is not a duplicate.

### What happened?

If a power failure occurs while an etcd server is bootstrapping, the server can no longer recover and fails to restart.
  
This issue occurs in both single-node and three-node clusters. The root cause is that some writes to the `member/snap/db` file are larger than the typical size of a page in the page cache. Because the pages of the page cache can be flushed out of order, a power failure can produce a "torn write", in which only part of the write's payload is persisted and the rest is lost. Several references describe this problem:
- https://www.usenix.org/conference/osdi14/technical-sessions/presentation/pillai
- https://dl.acm.org/doi/pdf/10.1145/2872362.2872406
- https://mariadb.com/kb/en/atomic-write-support/
- https://pages.cs.wisc.edu/~remzi/OSTEP/file-journaling.pdf (page 9)
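
To make the torn-write scenario concrete, here is a minimal in-memory sketch. It does not use etcd or LazyFS; the 4096-byte page size, the `torn_write` helper, and the assumption that unflushed pages read back as zeros are all illustrative choices, not something taken from etcd's code.

```python
import random

PAGE_SIZE = 4096  # assumed page-cache page size (hypothetical, typical on Linux x86-64)


def torn_write(payload: bytes, flushed_pages: set[int]) -> bytes:
    """Return the on-disk image if only `flushed_pages` reach disk before a crash.

    Assumption of this sketch: pages that were never flushed read back as zeros.
    """
    disk = bytearray(len(payload))
    for start in range(0, len(payload), PAGE_SIZE):
        if start // PAGE_SIZE in flushed_pages:
            disk[start:start + PAGE_SIZE] = payload[start:start + PAGE_SIZE]
    return bytes(disk)


# A 16384-byte payload with no zero bytes, so lost pages are easy to spot.
payload = bytes(random.Random(0).randrange(1, 256) for _ in range(16384))

# The write spans four 4096-byte pages (0..3).
# Suppose only the last two pages are flushed before power is lost.
image = torn_write(payload, flushed_pages={2, 3})

print(image[:8192] == bytes(8192))     # True: the first half never reached disk
print(image[8192:] == payload[8192:])  # True: the second half is intact
print(image == payload)                # False: the file as a whole is torn
```

An application that assumes the whole 16384-byte write is atomic would see a file whose tail is valid but whose head is missing, which matches the failure mode described here.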


### What did you expect to happen?

The server that suffered the power failure should restart correctly.

### How can we reproduce it (as minimally and precisely as possible)?

This issue can be reproduced using [LazyFS](https://github.com/dsrhaslab/lazyfs), which can now simulate out-of-order persistence of file system pages to disk. The problematic operation is a 16384-byte write to the file `member/snap/db`. LazyFS persists portions of this write (8192 bytes each) out of order and then crashes, simulating a power failure.
To reproduce this problem, follow these steps:

1. Mount LazyFS on the directory where etcd data will be saved, with a specified root directory. Assuming the data path for etcd is `/home/data/data.etcd` and the root directory is `/home/data-r/data.etcd`, add the following lines to the default configuration file (`config/default.toml`):
```toml
[[injection]]
type="split_write"
file="/home/data-r/data.etcd/member/snap/db"
persist=[2]
parts=2
occurrence=1
```
These lines define the fault to be injected. A power failure will be simulated after a write to the `/home/data-r/data.etcd/member/snap/db` file. Since this write is large (16384 bytes), it is split into 2 **parts** (8192 bytes each), and only the second part is **persisted**. The **occurrence** parameter specifies that the fault targets the first write issued to this file.

2. Start LazyFS with the following command:
`./scripts/mount-lazyfs.sh -c config/default.toml -m /home/data/data.etcd -r /home/data-r/data.etcd -f`

3. Start etcd with the command `./etcd --data-dir '/home/data/data.etcd'`. 

Immediately after this step, etcd will shut down because LazyFS was unmounted, simulating the power failure. At this point, you can analyze the logs produced by LazyFS to see the system calls issued until the moment of the fault. Here is a simplified version of the log:
```
{'syscall': 'create', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/.touch', 'mode': 'O_TRUNC'}
{'syscall': 'release', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/.touch'}
{'syscall': 'create', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/.touch', 'mode': 'O_TRUNC'}
{'syscall': 'release', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/.touch'}
{'syscall': 'create', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/db', 'mode': 'O_RDWR'}
{'syscall': 'write', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/db', 'size': '16384', 'off': '0'}
{'syscall': 'fault'}
```

4. Remove the fault from the configuration file and unmount the filesystem with `fusermount -uz /home/data/data.etcd`.
5. Mount LazyFS again with the previously provided command.
6. Attempt to start etcd (it fails).

By following these steps, you can replicate the issue and analyze the effects of the power failure on etcd's restart process. 
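
When analyzing a longer LazyFS log, it can help to locate the last write issued before the fault automatically. The sketch below assumes the log lines have the same dict-like shape as the simplified excerpt in step 3 (the real LazyFS log format may differ), and `last_write_before_fault` is a hypothetical helper, not part of LazyFS.

```python
import ast

# The last three lines of the simplified log from step 3.
LOG = """\
{'syscall': 'create', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/db', 'mode': 'O_RDWR'}
{'syscall': 'write', 'path': '/home/gsd/etcd-v3.4.25-linux-amd64/data-r/data.etcd/member/snap/db', 'size': '16384', 'off': '0'}
{'syscall': 'fault'}
"""


def last_write_before_fault(log_text: str):
    """Return the last 'write' entry seen before the first 'fault' entry, or None."""
    last_write = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = ast.literal_eval(line)  # each line is a Python-style dict literal
        if entry["syscall"] == "fault":
            return last_write
        if entry["syscall"] == "write":
            last_write = entry
    return None  # no fault entry found in the log


write = last_write_before_fault(LOG)
print(write["path"].rsplit("/", 1)[-1], int(write["size"]), int(write["off"]))
# prints: db 16384 0
```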

The same problem (but with a different error) happens when only the first 8192 bytes of the write are persisted. To reproduce this variant, change the `persist` parameter to \[1\].

Note that no problem occurs when `persist` is changed to \[1,2\]: the whole write is persisted and etcd restarts successfully.
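
The three `persist` settings can be summarized by the byte ranges of the 16384-byte write that survive the simulated crash. This is a small sketch of the behavior as described in this report (equal-sized, 1-indexed parts); `persisted_ranges` is a hypothetical helper and does not reflect how LazyFS is implemented.

```python
def persisted_ranges(size: int, parts: int, persist: list[int]) -> list[tuple[int, int]]:
    """Byte ranges (start, end) that survive, for equal-sized, 1-indexed parts."""
    if size % parts != 0:
        raise ValueError("this sketch only handles writes that split evenly")
    part_size = size // parts
    return [((p - 1) * part_size, p * part_size) for p in sorted(persist)]


for persist in ([2], [1], [1, 2]):
    print(persist, persisted_ranges(16384, parts=2, persist=persist))
# prints:
# [2] [(8192, 16384)]
# [1] [(0, 8192)]
# [1, 2] [(0, 8192), (8192, 16384)]
```

Only the last configuration covers the full range `[0, 16384)`, which is the case where etcd restarts without error.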

### Anything else we need to know?

Here is the output produced by etcd when it is restarted. The first file contains the error reported after persisting only the first 8192 bytes of the `member/snap/db` file, and the second file contains the error reported after persisting only the second 8192 bytes.
- [persist_first_part.txt](https://github.com/etcd-io/etcd/files/12607696/persist_first_part.txt)
- [persist_second_part.txt](https://github.com/etcd-io/etcd/files/12607697/persist_second_part.txt)


### Etcd version (please run commands below)

<details>

```console
$ etcd --version
etcd Version: 3.4.25
Git SHA: 94593e63d
Go Version: go1.19.8
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.4.25
API version: 3.4
```

</details>


### Etcd configuration (command line flags or environment variables)

<details>

--data-dir 'data/data.etcd'

</details>


### Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

<details>

```console
$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
```

</details>


### Relevant log output

_No response_
