Incompatible WAL format check on startup #25503

@gwossum

Overview

The .wrr WAL format is incompatible with the OSS 2.x WAL format. To avoid data loss during an in-place conversion to OSS 2.x, startup should check for .wrr files that have not been committed to TSM files. If uncommitted .wrr files are found, startup should abort with an error instructing the user to commit the .wrr WAL files before proceeding with the in-place conversion.

A .wrr file is uncommitted to a TSM file if no .wrr.snapshot file is newer than it. Conversely, if any .wrr.snapshot file is newer than a given .wrr file, that .wrr file is considered committed and should not block startup, because committed .wrr files carry no risk of data loss.

Changes required

  1. On startup, scan WAL directories for .wrr and .wrr.snapshot files and sort the resulting file set.
  2. If any .wrr files are newer than the newest .wrr.snapshot file, abort startup with an explanatory error.

Remediation

  • The preferred way to handle uncommitted .wrr files is to start an InfluxDB edition that uses .wrr files with the --storage-wal-flush-on-shutdown flag, then cleanly shut down that instance so that all .wrr files are committed.
  • The .wrr files could also be deleted or moved out of the way, but this will result in losing any uncommitted data in the .wrr files.

Tooling recommendations

No operational changes are required for standard InfluxDB OSS 2.x installations. For applications that may switch between InfluxDB OSS 2.x and editions which use the .wrr WAL format, the following recommendations apply:

  1. Start influxd processes with the --storage-wal-flush-on-shutdown flag set.
  2. Perform a clean shutdown before switching between versions with incompatible WAL formats.
  3. A PID file (--pid-file=/path/to/pidfile) can help determine when the instance has finished shutting down.
  4. When switching to an OSS 2.x instance, ensure that there are no .wrr files newer than the newest .wrr.snapshot file before performing the switch. Follow remediation steps if there are .wrr files newer than the newest .wrr.snapshot file.
  5. Once the .wrr files have been properly committed to TSM files, the switch to InfluxDB OSS 2.x can happen without issue.
