Bug Report: Enabling the transaction throttler can lead to vttablet crash #12619

@ejortegau

Overview of the Issue

Enabling the transaction throttler can lead to vttablet segfaulting:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x100ebf282]

goroutine 496 [running]:
vitess.io/vitess/go/vt/throttler.replicationLagRecord.lag(...)
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/replication_lag_record.go:40
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).recalculateRate(0xc00039c540, {{0xc0fbf3ac9824f050, 0x11539634, 0x1055b9260}, {{0x0, 0x0}, 0xc0002c01a0, 0xc0006a2360, 0x0, 0x0, ...}})
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:306 +0x122
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).processRecord(0xc00039c540, {{0xc0fbf3ac9824f050, 0x11539634, 0x1055b9260}, {{0x0, 0x0}, 0xc0002c01a0, 0xc0006a2360, 0x0, 0x0, ...}})
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:272 +0x165
vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).ProcessRecords(0xc00039c540)
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:261 +0x125
created by vitess.io/vitess/go/vt/throttler.(*MaxReplicationLagModule).Start
        /Users/eduardo.ortega/git/vitess/go/vt/throttler/max_replication_lag_module.go:169 +0x8e

Process finished with the exit code 2

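The panic comes from the MaxReplicationLagModule processing a lag record whose Stats field is nil: per the stack trace, recalculateRate calls replicationLagRecord.lag(), which dereferences it.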
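To illustrate the failure mode, here is a minimal sketch that does not use the actual Vitess code. The types tabletStats and lagRecord and the helpers safeLag and panics are hypothetical stand-ins made up for this example. It only shows how an unconditional dereference of a nil Stats pointer panics, and one possible way a caller could guard against it.

```go
package main

import "fmt"

// tabletStats is a hypothetical stand-in for the health stats attached to a record.
type tabletStats struct {
	ReplicationLagSeconds uint32
}

// lagRecord is a simplified, hypothetical stand-in for the throttler's record type.
type lagRecord struct {
	Stats *tabletStats // nil when no stats were attached to the record
}

// lag mirrors the failing pattern: it dereferences Stats unconditionally.
func (r lagRecord) lag() int64 {
	return int64(r.Stats.ReplicationLagSeconds)
}

// safeLag is one possible guard: it reports ok=false instead of panicking.
func safeLag(r lagRecord) (lag int64, ok bool) {
	if r.Stats == nil {
		return 0, false
	}
	return r.lag(), true
}

// panics reports whether calling f triggers a runtime panic.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	bad := lagRecord{} // Stats is nil, as in the crash
	good := lagRecord{Stats: &tabletStats{ReplicationLagSeconds: 7}}

	fmt.Println(panics(func() { _ = bad.lag() })) // unguarded call panics
	fmt.Println(safeLag(bad))                     // guarded call reports ok=false
	fmt.Println(safeLag(good))                    // guarded call returns the lag
}
```

Whether the right fix is to skip such records, or to stop them from reaching the module at all, is a separate question that this sketch does not answer.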

Reproduction Steps

  1. Start the local deployment with ./101_initial_cluster.sh.
  2. Bring down the primary vttablet with CELL=zone1 KEYSPACE=commerce TABLET_UID=100 bash -x ../common/scripts/vttablet-down.sh
  3. Start it up again, adding the TxThrottler arguments shown below:
     --enable-tx-throttler --tx-throttler-config "target_replication_lag_sec: 5 max_replication_lag_sec: 20 initial_rate: 10000 max_increase: 1 emergency_decrease: 0.5 min_duration_between_increases_sec: 2 max_duration_between_increases_sec:5 min_duration_between_decreases_sec: 1 spread_backlog_across_sec: 1 age_bad_rate_after_sec: 180 bad_rate_increase: 0.1 max_rate_approach_threshold: 0.9" --tx-throttler-healthcheck-cells zone1
  4. See it crash with the message above.

Binary Version

$ ./bin/vttablet --version
Version: 17.0.0-SNAPSHOT (Git revision c89fa57b6330a4a878aea82350f8525b50ad4b77 branch 'main') built on Mon Mar 13 18:00:40 CET 2023 by eduardo.ortega@eduardo-ltmnjyh.internal.salesforce.com using go1.20.2 darwin/amd64

Operating System and Environment details

macOS Ventura 13.2.1
Darwin 22.3.0
x86_64

Log Fragments

No response
