A cockroachdb service failed to start up after mupdate to c8f8332bc #7221

@leftwo

The dogfood rack was mupdated to omicron commit c8f8332.

After the mupdate, everything came online except for one cockroachdb zone on sled 17:

BRM42220017 # svcs -xZ
svc:/oxide/cockroachdb:default (CockroachDB)
  Zone: oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739
 State: maintenance since Tue Dec 10 16:33:26 2024
Reason: Restarting too quickly.
   See: http://illumos.org/msg/SMF-8000-L5
   See: /pool/ext/ae56280b-17ce-4266-8573-e1da9db6c6bb/crypt/zone/oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739/root/var/svc/log/oxide-cockroachdb:default.log
Impact: This service is not running.

The service log does not provide much information beyond showing that the processes exited and the start method was run again:

[ Dec 10 16:31:15 Enabled. ]                                                                                       
[ Dec 10 16:31:15 Rereading configuration. ]
[ Dec 10 16:31:16 Rereading configuration. ]                                                                       
[ Dec 10 16:31:28 Executing start method ("/opt/oxide/lib/svc/manifest/cockroachdb.sh"). ]
+ set -o errexit                          
+ set -o pipefail                                                                                                  
+ . /lib/svc/share/smf_include.sh                                                                                  
++ SMF_EXIT_OK=0                                                                                                                                                                                                                      
++ SMF_EXIT_NODAEMON=94        
++ SMF_EXIT_ERR_FATAL=95                           
++ SMF_EXIT_ERR_CONFIG=96        
++ SMF_EXIT_MON_DEGRADE=97     
++ SMF_EXIT_MON_OFFLINE=98 
++ SMF_EXIT_ERR_NOSMF=99                                                                                           
++ SMF_EXIT_ERR_PERM=100                        
++ svcprop -c -p config/listen_addr svc:/oxide/cockroachdb:default
+ LISTEN_ADDR='[fd00:1122:3344:109::3]:32221'                                                                      
++ svcprop -c -p config/store svc:/oxide/cockroachdb:default
+ DATASTORE=/data                                                                                                  
++ /opt/oxide/internal-dns-cli/bin/dnswait cockroach
++ head -n 5     
++ tr '\n' ,                     
note: configured to log to "/dev/stderr"
16:31:28.240Z INFO dnswait: using system configuration
+ JOIN_ADDRS=3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide
.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
+ [[ -z 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.inte
rnal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221, ]]
+ args=('--insecure' '--listen-addr' "$LISTEN_ADDR" '--http-addr' '127.0.0.1:8080' '--store' "$DATASTORE" '--join' "$JOIN_ADDRS")
+ exec /opt/oxide/cockroachdb/bin/cockroach start --insecure --listen-addr '[fd00:1122:3344:109::3]:32221' --http-addr 127.0.0.1:8080 --store /data --join 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:322
21,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.
:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
[ Dec 10 16:31:28 Method "start" exited with status 0. ]
*                
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
*                                                
* This mode is intended for non-production testing only.
*                                       
* In this mode:                                       
* - Your cluster is open to any client that can access fd00:1122:3344:109::3.                                                                                                                                                         
* - Intruders with access to your machine or network can observe client-server traffic.                                                                                                                                               
* - Intruders can log in without password and read or write any data in the cluster.                                                                                                                                                  
* - Intruders can consume all your server's resources and cause unavailability.                                                                                                                                                       
*                                                                                                                                                                                                                                     
*                                                                                                                  
* INFO: To start a secure server without mandating TLS for clients,                                                                                                                                                                   
* consider --accept-sql-without-tls instead. For other options, see:                                                                                                                                                                  
*                                                                                                                  
* - https://go.crdb.dev/issue-v/53404/v22.1
* - https://www.cockroachlabs.com/docs/v22.1/secure-a-cluster.html
*
CockroachDB node starting at 2024-12-10 16:31:59.237331244 +0000 UTC (took 30.1s)
build:               OSS v22.1.22-27-g76e176e260 @ 2024/10/23 21:38:21 (go1.17.13)
webui:               http://127.0.0.1:8080
sql:                 postgresql://root@[fd00:1122:3344:109::3]:32221/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://[fd00:1122:3344:109::3]:32221/defaultdb?sslmode=disable&user=root
RPC client flags:    /opt/oxide/cockroachdb/bin/cockroach <client cmd> --host=[fd00:1122:3344:109::3]:32221 --insecure
logs:                /data/logs
temp dir:            /data/cockroach-temp3047379273
external I/O path:   /data/extern
store[0]:            path=/data
storage engine:      pebble
clusterID:           2a348c29-7ccb-4d77-9afd-f1e37b9abb40 
status:              restarted pre-existing node
nodeID:              1
[ Dec 10 16:33:25 Stopping because all processes in service exited. ]
[ Dec 10 16:33:25 Executing stop method (:kill). ]
[ Dec 10 16:33:25 Executing start method ("/opt/oxide/lib/svc/manifest/cockroachdb.sh"). ]
+ set -o errexit
+ set -o pipefail
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/listen_addr svc:/oxide/cockroachdb:default
+ LISTEN_ADDR='[fd00:1122:3344:109::3]:32221'
++ svcprop -c -p config/store svc:/oxide/cockroachdb:default
+ DATASTORE=/data
++ ++ ++ head -n 5
/opt/oxide/internal-dns-cli/bin/dnswait cockroach
tr '\n' ,
note: configured to log to "/dev/stderr"
16:33:25.957Z INFO dnswait: using system configuration
+ JOIN_ADDRS=3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide
.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
+ [[ -z 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.inte
rnal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221, ]]
+ args=('--insecure' '--listen-addr' "$LISTEN_ADDR" '--http-addr' '127.0.0.1:8080' '--store' "$DATASTORE" '--join' "$JOIN_ADDRS")
+ [ Dec 10 16:33:26 Method "start" exited with status 0. ]
exec /opt/oxide/cockroachdb/bin/cockroach start --insecure --listen-addr '[fd00:1122:3344:109::3]:32221' --http-addr 127.0.0.1:8080 --store /data --join 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221
,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:3
2221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,

No core files were found in the expected places.
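
For anyone skimming the log above, the SMF state transitions are easier to follow with the xtrace output filtered away. A minimal sketch, assuming a grep that supports `-E` and `-o` (the function name `smf_events` is made up for illustration and is not part of omicron or SMF):

```shell
# Hypothetical helper: print only the bracketed SMF event records
# (e.g. "[ Dec 10 16:33:25 Stopping because ... ]") from a service log.
# grep -o is used so that records interleaved with xtrace output on the
# same line (such as "+ [ Dec 10 16:33:26 Method ... ]") are still found.
# Reads the named files, or stdin when no file is given.
smf_events() {
    grep -oE '\[ [A-Z][a-z]{2} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [^]]*\]' "$@"
}
```

Run against the log file named in the `svcs -xZ` output, this would leave just the timestamped start, stop, and exit records, which makes the gap between each start and the following "all processes in service exited" record easy to read off.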
