gc: use *.snap st_mtime to schedule after restart #11074

Merged
sergepetrenko merged 1 commit into tarantool:master from LevKats:gh-9820-checkpoint-interval-after-restart on Mar 27, 2025

@LevKats LevKats commented Jan 31, 2025

Introduce a `timestamp` field in `gc_checkpoint` so that `gc.{c,h}` know the actual time of each checkpoint, which matters because this subsystem is responsible for checkpoint scheduling. The unix time of a new checkpoint is now passed through the new `timestamp` argument of `gc_add_checkpoint`. This lets the scheduler take into account checkpoints made before a server restart, and also handle reconfiguration of `checkpoint_interval`.
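
As an illustration of the idea in the title, the sketch below reads a file's `st_mtime` with POSIX `stat()` and stores it as the timestamp of a checkpoint record. `struct checkpoint_info`, `file_mtime` and `checkpoint_info_from_file` are hypothetical names invented for this sketch; they are not the `gc_checkpoint` or `gc_add_checkpoint` definitions from the patch, and the way the patch actually obtains the value may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical stand-in for a checkpoint record (not the real gc_checkpoint). */
struct checkpoint_info {
	/* Illustrative identifier of the checkpoint. */
	int64_t signature;
	/* Unix time of the checkpoint, in seconds. */
	double timestamp;
};

/*
 * Return the modification time of the file at `path` as unix time,
 * or -1 if stat() fails (for example, the file does not exist).
 */
static double
file_mtime(const char *path)
{
	struct stat st;
	if (stat(path, &st) != 0)
		return -1.0;
	return (double)st.st_mtime;
}

/*
 * Fill `out` using the mtime of the file at `path` as the timestamp.
 * Returns 0 on success and -1 if the mtime cannot be read; `out` is
 * left untouched on failure.
 */
static int
checkpoint_info_from_file(struct checkpoint_info *out, int64_t signature,
			  const char *path)
{
	assert(out != NULL && path != NULL);
	double ts = file_mtime(path);
	if (ts < 0)
		return -1;
	out->signature = signature;
	out->timestamp = ts;
	return 0;
}
```

Passing the timestamp in from the caller, rather than having the scheduler look at files itself, matches the engine-independent design described in this PR.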

This approach was chosen over scanning `snap_dir` directly in `gc.c` because it is engine-independent. Note that even if more than `2 * checkpoint_interval` has passed since the last snapshot made before the restart, we do not start checkpointing immediately, because that could cause high disk load when there are multiple instances. In this case we schedule a checkpoint at a random moment within the first `checkpoint_interval` seconds after the restart. It appears that even with this strategy a snapshot will eventually be created, even if the instance keeps restarting.
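
The scheduling decision can be sketched roughly as follows. This is a simplified model, not the code from the patch: `next_checkpoint_delay` is a hypothetical helper, the random value is passed in as `rnd` so the function stays deterministic, and both the on-time branch and the exact boundary at which a checkpoint counts as overdue are assumptions (the description above only covers the long-overdue case).

```c
#include <assert.h>

/*
 * Hypothetical helper: return the delay, in seconds from `now`, until
 * the next checkpoint should be made.
 *
 *   now      - current unix time
 *   last_ts  - unix time of the last known checkpoint
 *   interval - checkpoint interval in seconds, must be positive
 *   rnd      - caller-supplied random value in [0, 1)
 */
static double
next_checkpoint_delay(double now, double last_ts, double interval, double rnd)
{
	assert(interval > 0);
	assert(rnd >= 0 && rnd < 1);
	double elapsed = now - last_ts;
	if (elapsed >= 0 && elapsed < interval) {
		/*
		 * Assumption: the last checkpoint is recent enough, so
		 * keep the original cadence and wait out the remainder.
		 */
		return interval - elapsed;
	}
	/*
	 * The checkpoint is overdue (or the clock moved backwards).
	 * Do not checkpoint right away; pick a random moment within
	 * the first `interval` seconds to avoid a disk load spike.
	 */
	return rnd * interval;
}
```

For example, with an interval of 3600 seconds and a last checkpoint 100 seconds old, the helper returns 3500; with a last checkpoint several hours old, it returns a value somewhere in `[0, 3600)` depending on `rnd`.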

Fixes #9820
NO_DOC=bugfix

@LevKats LevKats requested a review from Serpentian January 31, 2025 17:14
@LevKats LevKats requested a review from a team as a code owner January 31, 2025 17:14
coveralls commented Jan 31, 2025

Coverage Status

coverage: 87.47% (+0.02%) from 87.455% when pulling 3150a20 on LevKats:gh-9820-checkpoint-interval-after-restart into f001417 on tarantool:master.

@LevKats LevKats requested a review from nshy February 5, 2025 11:20
@nshy nshy unassigned nshy and Serpentian Feb 5, 2025
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch 2 times, most recently from 85f4157 to 339e7a7 Compare February 11, 2025 07:28
@LevKats LevKats requested a review from nshy February 11, 2025 07:55
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch 2 times, most recently from ba54302 to b2c20d6 Compare February 18, 2025 15:01
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch from b2c20d6 to 90d1cdd Compare February 21, 2025 14:56
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch 3 times, most recently from 2f3e401 to 83e8b0f Compare February 28, 2025 17:05
@LevKats LevKats requested a review from nshy February 28, 2025 17:06
@nshy nshy assigned LevKats and unassigned nshy Mar 3, 2025
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch from 83e8b0f to 06fd730 Compare March 7, 2025 17:18
@nshy nshy self-requested a review March 24, 2025 08:13
@nshy nshy removed their assignment Mar 24, 2025
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch from c3c3fe4 to 95c758d Compare March 25, 2025 06:52
@LevKats LevKats requested a review from nshy March 25, 2025 07:16
@nshy nshy removed their assignment Mar 25, 2025
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch from 95c758d to 279a2d9 Compare March 26, 2025 08:08
@LevKats LevKats force-pushed the gh-9820-checkpoint-interval-after-restart branch from 279a2d9 to 3150a20 Compare March 26, 2025 08:18
@LevKats LevKats requested a review from nshy March 26, 2025 09:37
@nshy nshy removed their assignment Mar 26, 2025
@LevKats LevKats added the full-ci Enables all tests for a pull request label Mar 26, 2025
@sergepetrenko sergepetrenko added the backport/2.11 (Automatically create a 2.11 backport PR), backport/3.2 (Automatically create a 3.2 backport PR), and backport/3.3 (Automatically create a 3.3 backport PR) labels and removed the full-ci (Enables all tests for a pull request) label Mar 27, 2025
@sergepetrenko sergepetrenko merged commit d590c72 into tarantool:master Mar 27, 2025
95 of 97 checks passed
@TarantoolBot

Successfully created backport PR for release/2.11:

@TarantoolBot

Successfully created backport PR for release/3.2:

@TarantoolBot

Successfully created backport PR for release/3.3:

@TarantoolBot

Backport summary

Labels

backport/2.11 Automatically create a 2.11 backport PR
backport/3.2 Automatically create a 3.2 backport PR
backport/3.3 Automatically create a 3.3 backport PR

Development

Successfully merging this pull request may close these issues.

checkpoint_interval doesn't account the time from the last snapshot creation to the restart

7 participants