SQLState(08006) error causing HTTP 500 error upstream #31645

@tim-o

Describe the problem
While conducting a routine migration, a customer encountered the following:

  1. Context canceled / aborted transactions in the logs, for example:
    W181019 15:24:49.629554 61831020 internal/client/txn.go:556 [n2,client=130.211.2.195:59869,user=foo] failure aborting transaction: HandledRetryableTxnError: TransactionAbortedError: txn aborted "sql txn" id=581b6de4 key=/Table/82/1/"\xe4\x15\xd4\xe7\xab\xc4G\x1d\x9c5\xd99\x98\xab\xe2\xf5"/0 rw=true pri=0.03491282 iso=SERIALIZABLE stat=PENDING epo=0 ts=1539962689.619304580,0 orig=1539962689.619304580,0 max=1539962690.119304580,0 wto=false rop=false seq=2; abort caused by: failed to send RPC: sending to all 3 replicas failed; last error: {<nil> context canceled}
    I181019 15:24:56.600105 181 gossip/gossip.go:488 [n2] gossip status (ok, 3 nodes)
  2. Upstream SQLSTATE(08006) errors in HikariCP, for example:
    WARN c.z.hikari.pool.ProxyConnection - roach - Connection org.postgresql.jdbc.PgConnection@594e7e23 marked as broken because of SQLSTATE(08006), ErrorCode(0)
  3. HTTP 500 errors in their application, caused by the broken connection and the 08006 error.
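
For context on item 2: SQLSTATE codes are five characters long and are grouped by their first two characters. In the PostgreSQL error code table, class 08 is the connection exception class and 08006 is connection_failure, which is consistent with a pool treating the connection as broken. A minimal Python sketch of that grouping follows. The function name and the category labels are hypothetical, chosen for illustration only, and this is not HikariCP's actual logic.

```python
# SQLSTATE values are five characters; the first two identify the class.
CONNECTION_EXCEPTION_CLASS = "08"   # e.g. 08006 connection_failure
TRANSACTION_ROLLBACK_CLASS = "40"   # e.g. 40001 serialization_failure


def classify_sqlstate(sqlstate: str) -> str:
    """Return a coarse, hypothetical category for a SQLSTATE code."""
    if len(sqlstate) != 5:
        raise ValueError(f"SQLSTATE must be 5 characters, got {sqlstate!r}")
    sqlstate_class = sqlstate[:2]
    if sqlstate_class == CONNECTION_EXCEPTION_CLASS:
        # The connection itself is considered unusable.
        return "connection_broken"
    if sqlstate_class == TRANSACTION_ROLLBACK_CLASS:
        # The transaction was rolled back; the connection is still usable.
        return "retry_transaction"
    return "other"


if __name__ == "__main__":
    for code in ("08006", "40001", "42601"):
        print(code, classify_sqlstate(code))
```

Under this grouping, a transaction abort reported as a class 40 code would leave the connection in the pool, while the same abort reported as 08006 gets the connection discarded.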

Looking in our code, I see that code 08006 corresponds to CodeConnectionFailureError. It is used in only two places: schema_changer.go and inbound.go:99

The comment above inbound.go:99 does seem to describe the condition in the log warning. However, it's not clear why this race results in an 08006 error: that code usually indicates a problem with the connection, while (at least as far as I can see) the underlying issue is just context cancellation.
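
To make the distinction concrete, here is a rough Python sketch of choosing a SQLSTATE from the underlying cause, so that a cancellation is not reported as a connection failure. It is not CockroachDB code and not a proposed patch. `sqlstate_for_cause` is a hypothetical helper, `asyncio.CancelledError` stands in for Go's context cancellation, and using 57014 (query_canceled) for the cancellation case is an assumption made for illustration.

```python
import asyncio

QUERY_CANCELED = "57014"        # PostgreSQL: query_canceled
CONNECTION_FAILURE = "08006"    # PostgreSQL: connection_failure
INTERNAL_ERROR = "XX000"        # PostgreSQL: internal_error


def sqlstate_for_cause(cause: BaseException) -> str:
    """Pick a SQLSTATE based on what actually went wrong (hypothetical mapping)."""
    if isinstance(cause, asyncio.CancelledError):
        # The work was canceled; the connection itself may be healthy.
        return QUERY_CANCELED
    if isinstance(cause, ConnectionError):
        # The transport really failed (reset, refused, aborted, broken pipe).
        return CONNECTION_FAILURE
    return INTERNAL_ERROR


if __name__ == "__main__":
    print(sqlstate_for_cause(asyncio.CancelledError()))
    print(sqlstate_for_cause(ConnectionResetError()))
```

With a mapping along these lines, only genuine transport failures would fall into class 08 and lead a pool to mark the connection as broken.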

To Reproduce
This doesn't reproduce easily, but did result in 4 errors today for the customer during routine migrations.

Environment:
Postgres JDBC 42.1.3
Hibernate 5.1.8.Final
HikariCP 3.2.0
CRDB version to be provided.

Additional context
What was the impact?
HTTP 500 errors potentially sent to end users.

@vivekmenezes, assigning to you first since @andreimatei indicated you wrote this portion of the code. If there's a better home let me know.

Labels
A-sql-execution: Relating to SQL execution.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3: Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)
