Hitting max map count can cause data to become unqueryable #23172

@lesam

Description

Steps to reproduce:

  1. Set vm.max_map_count 'too low'
  2. Create many (likely thousands of) shards with TSI enabled
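To judge whether the limit is 'too low' for a running process, one rough approach is to compare the number of lines in /proc/<pid>/maps (one line per mapping) against the kernel limit in /proc/sys/vm/max_map_count. The following is a minimal sketch of that comparison, not InfluxDB code; the helper names and the 80% warning threshold are assumptions chosen for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMaxMapCount parses the contents of /proc/sys/vm/max_map_count.
func parseMaxMapCount(s string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(s))
}

// countMappings counts the non-empty lines of a /proc/<pid>/maps dump.
// Each line describes one memory mapping.
func countMappings(maps string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(maps))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) != "" {
			n++
		}
	}
	return n
}

// nearLimit reports whether used is at or above the given fraction of limit.
// The fraction is a hypothetical tuning knob, not a kernel concept.
func nearLimit(used, limit int, fraction float64) bool {
	return limit > 0 && float64(used) >= fraction*float64(limit)
}

func main() {
	limitRaw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read vm.max_map_count:", err)
		return
	}
	limit, err := parseMaxMapCount(string(limitRaw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse vm.max_map_count:", err)
		return
	}
	mapsRaw, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/self/maps:", err)
		return
	}
	used := countMappings(string(mapsRaw))
	fmt.Printf("mappings in use: %d of %d\n", used, limit)
	if nearLimit(used, limit, 0.8) {
		fmt.Fprintln(os.Stderr, "warning: approaching vm.max_map_count")
	}
}
```

To inspect the database rather than the checker itself, the same helpers could be pointed at /proc/<pid>/maps for the influxd process.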

Expected behavior:

  • The database should fail gracefully with a complaint that vm.max_map_count is too low. This part is now covered by "Log when approaching linux kernel limits" #23439.
  • The database should distinguish between the different errors that can occur while opening TSM files, and only move a file aside for an error that actually indicates corruption.

Actual behavior:

  1. Because we are out of mmap space, random memory allocations fail, crashing the application
  2. Because we are out of mmap space, opening TSM files fails in a way that causes the TSM files to be marked as corrupt and moved aside.

Environment info:

  • System info: Any Linux version
  • InfluxDB version: 1.9.6
  • Other relevant environment details: n/a

Logs:

The most concerning error is "Cannot read corrupt tsm file, renaming", which moves the TSM files aside even though they are not actually corrupt.
