Skip to content

vinyl: fix gc vs vylog race leading to duplicate record#10130

Merged
locker merged 1 commit intotarantool:masterfrom
locker:vy-invalid-vylog-fix
Jun 13, 2024
Merged

vinyl: fix gc vs vylog race leading to duplicate record#10130
locker merged 1 commit intotarantool:masterfrom
locker:vy-invalid-vylog-fix

Conversation

@locker
Copy link
Member

@locker locker commented Jun 11, 2024

Vinyl run files aren't always deleted immediately after compaction, because we need to keep run files corresponding to checkpoints for backups. Such run files are deleted by the garbage collection procedure, which performs the following steps:

  1. Loads information about all run files from the last vylog file.
  2. For each loaded run record that is marked as dropped: a. Tries to remove the run files. b. On success, writes a "forget" record for the dropped run, which will make vylog purge the run record on the next vylog rotation (checkpoint).

(see vinyl_engine_collect_garbage())

The garbage collection procedure writes the "forget" records asynchronously using vy_log_tx_try_commit(), see vy_gc_run(). This procedure can be successfully executed during vylog rotation, because it doesn't take the vylog latch. It simply appends records to a memory buffer which is flushed either on the next synchronous vylog write or vylog recovery.

The problem is that the garbage collection isn't necessarily loads the latest vylog file because the vylog file may be rotated between it calls vy_log_signature() and vy_recovery_new(). This may result in a "forget" record written twice to the same vylog file for the same run file, as follows:

  1. GC loads last vylog N
  2. GC starts removing dropped run files.
  3. CHECKPOINT starts vylog rotation.
  4. CHECKPOINT loads vylog N.
  5. GC writes a "forget" record for run A to the buffer.
  6. GC is completed.
  7. GC is restarted.
  8. GC finds that the last vylog is N and blocks on the vylog latch trying to load it.
  9. CHECKPOINT saves vylog M (M > N).
  10. GC loads vylog N. This triggers flushing the forget record for run A to vylog M (not to vylog N), because vylog M is the last vylog at this point of time.
  11. GC starts removing dropped run files.
  12. GC writes a "forget" record for run A to the buffer again, because in vylog N it's still marked as dropped and not forgotten. (The previous "forget" record was written to vylog M).
  13. Now we have two "forget" records for run A in vylog M.

Such duplicate run records aren't tolerated by the vylog recovery procedure, resulting in a permanent error on the next checkpoint:

ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run XXXX forgotten but not registered

To fix this issue, we move vy_log_signature() under the vylog latch to vy_recovery_new(). This makes sure that GC will see vylog records that it's written during the previous execution.

Catching this race in a function test would require a bunch of ugly error injections so let's assume that it'll be tested by fuzzing.

Closes #10128

@locker locker requested a review from a team as a code owner June 11, 2024 15:36
@locker locker requested a review from nshy June 11, 2024 15:46
@coveralls
Copy link

Coverage Status

coverage: 87.099% (-0.01%) from 87.11%
when pulling 0c8d013 on locker:vy-invalid-vylog-fix
into 9b63ced
on tarantool:master
.

Vinyl run files aren't always deleted immediately after compaction,
because we need to keep run files corresponding to checkpoints for
backups. Such run files are deleted by the garbage collection procedure,
which performs the following steps:

 1. Loads information about all run files from the last vylog file.
 2. For each loaded run record that is marked as dropped:
    a. Tries to remove the run files.
    b. On success, writes a "forget" record for the dropped run,
       which will make vylog purge the run record on the next
       vylog rotation (checkpoint).

(see `vinyl_engine_collect_garbage()`)

The garbage collection procedure writes the "forget" records
asynchronously using `vy_log_tx_try_commit()`, see `vy_gc_run()`.
This procedure can be successfully executed during vylog rotation,
because it doesn't take the vylog latch. It simply appends records
to a memory buffer which is flushed either on the next synchronous
vylog write or vylog recovery.

The problem is that the garbage collection isn't necessarily loads
the latest vylog file because the vylog file may be rotated between
it calls `vy_log_signature()` and `vy_recovery_new()`. This may
result in a "forget" record written twice to the same vylog file
for the same run file, as follows:

  1. GC loads last vylog N
  2. GC starts removing dropped run files.
  3. CHECKPOINT starts vylog rotation.
  4. CHECKPOINT loads vylog N.
  5. GC writes a "forget" record for run A to the buffer.
  6. GC is completed.
  7. GC is restarted.
  8. GC finds that the last vylog is N and blocks on the vylog latch
     trying to load it.
  9. CHECKPOINT saves vylog M (M > N).
 10. GC loads vylog N. This triggers flushing the forget record for
     run A to vylog M (not to vylog N), because vylog M is the last
     vylog at this point of time.
 11. GC starts removing dropped run files.
 12. GC writes a "forget" record for run A to the buffer again,
     because in vylog N it's still marked as dropped and not forgotten.
     (The previous "forget" record was written to vylog M).
 13. Now we have two "forget" records for run A in vylog M.

Such duplicate run records aren't tolerated by the vylog recovery
procedure, resulting in a permanent error on the next checkpoint:

```
ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run XXXX forgotten but not registered
```

To fix this issue, we move `vy_log_signature()` under the vylog latch
to `vy_recovery_new()`. This makes sure that GC will see vylog records
that it's written during the previous execution.

Catching this race in a function test would require a bunch of ugly
error injections so let's assume that it'll be tested by fuzzing.

Closes tarantool#10128

NO_DOC=bug fix
NO_TEST=tested manually with fuzzer
@locker locker force-pushed the vy-invalid-vylog-fix branch from 0c8d013 to c46f10c Compare June 13, 2024 07:29
@coveralls
Copy link

Coverage Status

coverage: 87.097% (-0.03%) from 87.127%
when pulling c46f10c on locker:vy-invalid-vylog-fix
into 19d1f1c
on tarantool:master
.

Copy link
Contributor

@nshy nshy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@locker locker assigned locker and unassigned nshy Jun 13, 2024
@locker locker added the full-ci Enables all tests for a pull request label Jun 13, 2024
@locker locker merged commit 9d3859b into tarantool:master Jun 13, 2024
@locker locker deleted the vy-invalid-vylog-fix branch June 13, 2024 12:51
@locker
Copy link
Member Author

locker commented Jun 13, 2024

Cherry-picked to 2.11 and 3.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run 9455 forgotten but not registered

4 participants