Skip to content

sql: leak in root SQL bytes monitor/account #95324

@irfansharif

Description

@irfansharif

Describe the problem

Was running a write-heavy workload against a cluster, and eventually it ran into "memory budget exceeded" errors coming out of our memory monitoring infrastructure. Eventually the throughput completely collapses, so marking it as an S-2.

To Reproduce

Using 76e3ea8:

roachprod put $CLUSTER:10 $CRDB/artifacts/cockroach ./cockroach
roachprod run $CLUSTER:10 -- ./cockroach workload init kv --splits 5000 (roachprod pgurl $CLUSTER:1)
roachprod run $CLUSTER:10 -- ./cockroach workload run kv --min-block-bytes=262144 --max-block-bytes=262144 \
  --concurrency=50 --splits=5000 --read-percent=0 \
  --max-rate=150 --ramp 1m --duration 1h \
  --tolerate-errors (roachprod pgurl $CLUSTER:1-9)

Where after a while, the workload generator starts tripping up with the following:

  843.0s        0          148.9          150.0     10.5     12.6     14.2     15.7 write
  844.0s        0          151.0          150.0     11.0     14.2     18.9     21.0 write
  845.0s        0          150.0          150.0     10.5     14.2     15.2     21.0 write
  846.0s        0          150.1          150.0     10.5     13.6     17.8     18.9 write
  847.0s        0          150.0          150.0     10.5     13.6     15.7     15.7 write
  848.0s        0          149.9          150.0     10.5     14.2     16.8     19.9 write
  849.0s        0          150.1          150.0     11.0     14.7     67.1    142.6 write
  850.0s        0          150.1          150.0     11.0     13.6     17.8     28.3 write
  851.0s        0          150.0          150.0     11.0     14.2     16.3     22.0 write
  852.0s        0          150.0          150.0     11.5     15.2     16.8     18.9 write
  853.0s        0          149.9          150.0     11.0     14.7     18.9     21.0 write
  854.0s        0          150.0          150.0     10.5     13.1     19.9     19.9 write
  855.0s        0          150.0          150.0     10.5     14.2     46.1     62.9 write
I230116 20:06:08.350421 258 workload/pgx_helpers.go:79  [-] 3  pgx logger [error]: Exec logParams=map[args:[-3254532635867646659 4c1b9ff64253350810d57807dd2c134f16ca3ffb73861498c871c2436ae912b5353f3fc6dc06a8aef7d8681b27d87f35ed68a1f1caefa80f082c1bb73ca9eed9 (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:2689528 sql:kv-2 time:4.360028ms]
E230116 20:06:08.350626 1 workload/cli/run.go:546  [-] 4  ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200)
I230116 20:06:08.396573 270 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[3139555668903684332 aadd31241468115cbc8b7a33050c2c992b97ccfc046a1f4997eccaff5db827c0ff630bab707f44c7ec441fdb9c758682da0cee83195e54e91217b4964df82bee (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997327360 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:9659884 sql:kv-2 time:3.732324ms]
I230116 20:06:08.409964 257 workload/pgx_helpers.go:79  [-] 6  pgx logger [error]: Exec logParams=map[args:[-4919977530161583301 3d274121bc7a59bec1da3d9a11270ecf60a40f2d6542d1cd6c624bf35c50ab2439df3e4a4fd0fd3d9d2b93b674e530a53f3eed08e45f2d6585f98a10fb3fbb92 (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:2689528 sql:kv-2 time:3.907051ms]
I230116 20:06:08.459162 271 workload/pgx_helpers.go:79  [-] 7  pgx logger [error]: Exec logParams=map[args:[8383264568401713769 ceb2976f5150b90dd0ecab40bb4d45a36bcc76e1a0fa06ef48463dc4a816c3357825d6349d79521230c149d8e7e6a704f57963346d9b55ff6b642eae73035216 (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997327360 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:9659884 sql:kv-2 time:6.312705ms]
I230116 20:06:08.470333 242 workload/pgx_helpers.go:79  [-] 8  pgx logger [error]: Exec logParams=map[args:[-1869484712175301760 11ec9ff410bcab8fb72b5ab3e680329b7804e4c62f0a7cb9d3a046f36b659553c2d93be44a57377111b0083538b44b3cc2897fa85eccc2287b051f41745af16b (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:2689528 sql:kv-2 time:3.82225ms]
  856.0s        5          150.1          150.0     10.5     13.1     19.9     21.0 write
I230116 20:06:08.516808 272 workload/pgx_helpers.go:79  [-] 9  pgx logger [error]: Exec logParams=map[args:[-1239462385144657077 967ec05c376c0ab6cb0ad883fa9b55de86d3f6ec44fc54201d2811dc8fd2bf0979b05ba3a7f9048391e976d957a8bcdf835d75857880022afd80d3113096bd53 (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997327360 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:9659884 sql:kv-2 time:3.977102ms]
I230116 20:06:08.531356 243 workload/pgx_helpers.go:79  [-] 10  pgx logger [error]: Exec logParams=map[args:[4977677096930467774 439f0d2bfdb948ff4bae63c89c41e8914f583341b7dcb8d98fd997114b2437cb0671b58fbc1d7737e60133e216bb8a970203234f40405d8b548bca2ba0558b8b (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:2689528 sql:kv-2 time:5.16205ms]
I230116 20:06:08.577092 256 workload/pgx_helpers.go:79  [-] 11  pgx logger [error]: Exec logParams=map[args:[-5650848376985580255 d31b31496c032afa7f691d56c50c6088748835534c5b1e8eff2efa0c7e68714a585d721cfeacb0ff4edc37b2df4e73f1ca9d705228eb7497b5155abdcf0f837f (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997327360 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:9659884 sql:kv-2 time:3.670825ms]
I230116 20:06:08.590932 95 workload/pgx_helpers.go:79  [-] 12  pgx logger [error]: Exec logParams=map[args:[-4134403631077663780 dd90037c1ca751308cb73ba893be96d9ae4298304eb545cb48537ca9998c251a3eb6ed3099a7bd91caaecd1efaecf87d2acdd29d4e99d5c40c0c9f0fdd9651b7 (truncated 262080 bytes)] err:ERROR: root: memory budget exceeded: 245760 bytes requested, 7997399040 currently allocated, 7997484032 bytes in budget (SQLSTATE 53200) pid:2689528 sql:kv-2 time:4.447271ms]

The metrics suggest that we're never free-ing up the memory budget acquired. Do we have a leak? The red annotations, in order:

  1. Workload started.
  2. Memory usage stopped growing, also where we start running into errors. Throughput starts collapsing but bytes in stays steady.
  3. Workload stopped. Bytes in will go to zero.

image

Jira issue: CRDB-23459

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.S-2Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errorsT-sql-queriesSQL Queries Team

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions