Followup from #9866 -- see that issue for context.
With CockroachDB, if we take a table with a sufficiently large amount of data in it, and then create an index (in, e.g., a migration), it will fail with an out of memory error.
Steps to reproduce
Run this reproduction script in Omicron (so that Omicron's copy of cockroach is in the PATH): https://gist.github.com/sunshowers/a5e690e8aa40df4ef00882b1e71e1f2e
This fails with:
[nexus1] ERROR: transaction committed but schema change aborted with error: (53200): failed to construct index entries during backfill: root: memory budget exceeded: 2437120 bytes requested, 132620137 currently allocated, 134217728 bytes in budget
[nexus1] SQLSTATE: XXA00
[nexus1] HINT: Consider increasing --max-sql-memory startup parameter.
[nexus1] --
[nexus1] Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
[nexus1] Manual inspection may be required to determine the actual state of the database.
[nexus1] --
[nexus1] See: https://github.com/cockroachdb/cockroach/issues/42061
[nexus1] Failed running "sql"
What does "sufficiently large amount of data" mean?
TODO, still analyzing this. A quick summary is that for the example in the repro script above, 1.7M rows doesn't fail but 1.8M rows does.
Possible options
Reducing the batch size
For index backfills, bulkio.index_backfill.batch_size controls how many rows are loaded in at once. The default value of this tunable is 50,000, and we don't alter it. Reducing the batch size sufficiently appears to get past this issue (TODO still need to verify this claim for real.)
cockroachdb/cockroach#135563 uses a halving/backoff strategy for the batch size. But due to licensing issues we can't use it directly. If we decide to fix this in our fork of CockroachDB we'll need to reimplement this idea ourselves.
But we can also do this at the migration driver layer. For example, we can either conditionally (if the migration is a CREATE INDEX DDL) or unconditionally reduce the batch size to a smaller value of, say, 5000. That should get us past this issue.
--max-sql-memory
The --max-sql-memory parameter is set on each node at startup time. By default it is 128 MiB, and we do not alter the default:
|
args=( |
|
'--insecure' |
|
'--listen-addr' "$LISTEN_ADDR" |
|
'--http-addr' '127.0.0.1:8080' |
|
'--store' "$DATASTORE" |
|
'--join' "$JOIN_ADDRS" |
|
) |
|
|
|
exec /opt/oxide/cockroachdb/bin/cockroach start "${args[@]}" & |
Bumping this up to a larger value can help.
Followup from #9866 -- see that issue for context.
With CockroachDB, if we take a table with a sufficiently large amount of data in it, and then create an index (in, e.g., a migration), it will fail with an out of memory error.
Steps to reproduce
Run this reproduction script in Omicron (so that Omicron's copy of
cockroachis in the PATH): https://gist.github.com/sunshowers/a5e690e8aa40df4ef00882b1e71e1f2eThis fails with:
What does "sufficiently large amount of data" mean?
TODO, still analyzing this. A quick summary is that for the example in the repro script above, 1.7M rows doesn't fail but 1.8M rows does.
Possible options
Reducing the batch size
For index backfills,
bulkio.index_backfill.batch_sizecontrols how many rows are loaded in at once. The default value of this tunable is 50,000, and we don't alter it. Reducing the batch size sufficiently appears to get past this issue (TODO still need to verify this claim for real.)cockroachdb/cockroach#135563 uses a halving/backoff strategy for the batch size. But due to licensing issues we can't use it directly. If we decide to fix this in our fork of CockroachDB we'll need to reimplement this idea ourselves.
But we can also do this at the migration driver layer. For example, we can either conditionally (if the migration is a
CREATE INDEXDDL) or unconditionally reduce the batch size to a smaller value of, say, 5000. That should get us past this issue.--max-sql-memoryThe
--max-sql-memoryparameter is set on each node at startup time. By default it is 128 MiB, and we do not alter the default:omicron/smf/cockroachdb/method_script.sh
Lines 25 to 33 in 79fac7d
Bumping this up to a larger value can help.