Feature Request: change vtbackup_duration_by_phase to binary vtbackup_duration #12972

@maxenglander

Feature Description

After using vtbackup_duration_by_phase for a few weeks in production, I can confidently say that it is pretty awkward to use.

I recommend changing this metric to vtbackup_phase, a binary-valued gauge similar to K8s metrics like kube_pod_status_phase. Here's an example of what these metrics could look like:

# HELP vtbackup_phase Active phase.
# TYPE vtbackup_phase gauge
vtbackup_phase{phase="CatchUpReplication"} 0
vtbackup_phase{phase="InitialBackup"} 0
vtbackup_phase{phase="RestoreLastBackup"} 0
vtbackup_phase{phase="TakeNewBackup"} 1
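For illustration only, here is a minimal sketch of how output in that shape could be produced, with exactly one phase set to 1 and the rest set to 0. It is written in Python rather than vtbackup's own code, and the PHASES list and render_phase_gauge function are hypothetical names, not anything that exists in Vitess.

```python
# Phase names are taken from the example above; everything else is hypothetical.
PHASES = ["CatchUpReplication", "InitialBackup", "RestoreLastBackup", "TakeNewBackup"]


def render_phase_gauge(active_phase):
    """Render vtbackup_phase in Prometheus text exposition format.

    Exactly one phase gets the value 1; all others get 0.
    """
    if active_phase not in PHASES:
        raise ValueError(f"unknown phase: {active_phase}")
    lines = [
        "# HELP vtbackup_phase Active phase.",
        "# TYPE vtbackup_phase gauge",
    ]
    for phase in PHASES:
        value = 1 if phase == active_phase else 0
        # Doubled braces emit literal { and } around the label set.
        lines.append(f'vtbackup_phase{{phase="{phase}"}} {value}')
    return "\n".join(lines)


print(render_phase_gauge("TakeNewBackup"))
```

The point of the one-hot layout is that the gauge would be updated as soon as a phase starts, so a scrape taken mid-phase already shows which phase is running.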

At any given moment, only one phase would be active. In order to calculate how long a phase has been active, you could do something like this:

sum_over_time(vtbackup_phase{phase="TakeNewBackup"}[<range>]) * <interval>

Where <range> is the lookback window to sum over (for example, 1h) and <interval> is the number of seconds between data points.
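To make the arithmetic concrete, here is a rough sketch of the same calculation outside of PromQL. The sample values and the 15-second scrape interval are made up for illustration, and estimate_phase_seconds is a hypothetical helper. The result is only an approximation, accurate to about one scrape interval.

```python
def estimate_phase_seconds(samples, interval_seconds):
    """Approximate time spent in a phase.

    Each sample is 0 or 1, so the sum counts the scrapes during which the
    phase was active; multiplying by the scrape interval gives seconds.
    """
    return sum(samples) * interval_seconds


# Hypothetical scrapes of vtbackup_phase{phase="TakeNewBackup"}, 15s apart.
samples = [0, 0, 1, 1, 1, 1]
print(estimate_phase_seconds(samples, 15))  # 4 active samples * 15s = 60
```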

Use Case(s)

Some issues that would be resolved by the proposed change:

  1. vtbackup currently doesn't report that a phase is active. It only reports the phase duration once that phase completes. This means that there's no way to tell what phase vtbackup is currently in, unless you know enough about the internals of the program to infer the current state from other metrics and logs.
  2. If vtbackup exits before completing a phase, it won't report the time it spent in that phase.
  3. After completing the last phase (TakeNewBackup), vtbackup exits pretty much right away. This means that there might only be a few seconds between vtbackup reporting that phase for the first time and vtbackup exiting, which might not be enough time for the metric collector (e.g. Prometheus) to collect that metric. This necessitates using something awkward like --keep-alive-timeout to keep vtbackup alive long enough for the collector to do at least one scrape.
