Description
Overview
The .wrr WAL format is incompatible with the OSS 2.x WAL format. In order to avoid data loss when performing an in-place conversion to OSS 2.x, a check should be implemented for .wrr files that have not been committed to TSM files. If uncommitted .wrr files are found, startup should abort with an error informing the user to get the .wrr WAL files committed before proceeding with the in-place conversion.
A .wrr file is considered uncommitted if no .wrr.snapshot file is newer than it. Conversely, if there is a .wrr.snapshot file newer than a given .wrr file, that .wrr file is considered committed and should not block startup, because committed .wrr files carry no risk of data loss.
Changes required
- On startup, scan WAL directories for .wrr and .wrr.snapshot files and sort the resulting file set.
- If there are .wrr files newer than the newest .wrr.snapshot file, abort startup with an explanatory error.
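The check described above could be sketched roughly as follows. This is an illustration only, not the actual InfluxDB implementation: the function names uncommittedWRR and checkWALRoot are hypothetical, and the sketch assumes that file names within a single WAL directory sort lexically in creation order (for example, zero-padded sequence numbers), so that "newer" can be decided by sorting names. If ordering has to come from modification times or another source, the comparison would need to change.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sort"
	"strings"
)

// uncommittedWRR returns the .wrr names that sort after the newest
// .wrr.snapshot name in one WAL directory.
// ASSUMPTION: names sort lexically in creation order.
func uncommittedWRR(names []string) []string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)

	var pending []string
	for _, n := range sorted {
		switch {
		case strings.HasSuffix(n, ".wrr.snapshot"):
			// Every .wrr file older than a snapshot counts as committed.
			pending = nil
		case strings.HasSuffix(n, ".wrr"):
			pending = append(pending, n)
		}
	}
	return pending
}

// checkWALRoot walks walRoot (hypothetical helper) and returns an error
// if any directory below it contains uncommitted .wrr files.
func checkWALRoot(walRoot string) error {
	byDir := map[string][]string{}
	err := filepath.WalkDir(walRoot, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			dir := filepath.Dir(path)
			byDir[dir] = append(byDir[dir], d.Name())
		}
		return nil
	})
	if err != nil {
		return err
	}

	// Visit directories in a stable order so the reported error is deterministic.
	dirs := make([]string, 0, len(byDir))
	for dir := range byDir {
		dirs = append(dirs, dir)
	}
	sort.Strings(dirs)

	for _, dir := range dirs {
		if pending := uncommittedWRR(byDir[dir]); len(pending) > 0 {
			return fmt.Errorf("uncommitted .wrr WAL files in %s: %v; commit them to TSM files before converting to OSS 2.x", dir, pending)
		}
	}
	return nil
}
```

A startup path would call checkWALRoot with the WAL directory and refuse to continue if it returns a non-nil error. The check is evaluated per directory because each directory is assumed to carry its own sequence of WAL files.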
Remediation
- The preferred way to handle uncommitted .wrr files is to start an InfluxDB edition that uses .wrr files with the --storage-wal-flush-on-shutdown flag, then cleanly shut down that InfluxDB instance to get all .wrr files committed.
- The .wrr files could also be deleted or moved out of the way, but this will result in losing any uncommitted data in the .wrr files.
Tooling recommendations
No operational changes are required for standard InfluxDB OSS 2.x installations. For applications that may switch between InfluxDB OSS 2.x and editions which use the .wrr WAL format, the following recommendations apply:
- Start influxd processes with the --storage-wal-flush-on-shutdown flag set.
- A clean shutdown should occur before switching between versions with incompatible WAL formats.
- The use of a PID file (--pid-file=/path/to/pidfile) may be helpful to determine when the instance finishes shutting down.
- When switching to an OSS 2.x instance, ensure that there are no .wrr files newer than the newest .wrr.snapshot file before performing the switch. Follow remediation steps if there are .wrr files newer than the newest .wrr.snapshot file.
- Once the .wrr files have been properly committed to TSM files, the switch to InfluxDB OSS 2.x can happen without issue.