storage: splits can hit command is too large error #25233
Closed
Labels
A-kv-replication: Relating to Raft, consensus, and coordination.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3-ux-surprise: Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
Description
We've seen cases where a split request can hit a "command is too large" error. For instance, this is what happened in https://forum.cockroachlabs.com/t/trying-to-split-same-key-range/1592, where splits continuously failed with the following error:
I180501 02:24:15.005467 26305083 internal/client/db.go:556 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] failed batch: command is too large: 87076188 bytes (max: 67108864)
We should figure out how this is possible and whether there's anything that needs fixing. Off the top of my head, I'm not even sure which factors cause command growth for splits.
Here is the full set of relevant logs from a single split attempt:
Table/54/4/1921-09-04T06:01:02.831999999Z/1010711/605/"_activity"/114510941/1524994500000 [r42877]
I180501 02:24:10.652441 26305083 storage/log.go:54 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] Range Event: "split", range: 186, info: updated_desc:<range_id:186 start_key:"\276\213\370\017l&\024\371Z\343*i\371\005{\317\000\211\022_name\000\001\375Rc\230\334\264g+e\373\001c\006z,W" end_key:"\276\214\024\204\245\031\370\037\204\365\374\205\377\370\017l\027\367\002]\022_activity\000\001\371\006\323L]\373\001c\020\301e\240" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:3 store_id:3 replica_id:2 > replicas:<node_id:2 store_id:2 replica_id:3 > next_replica_id:4 > new_desc:<range_id:42877 start_key:"\276\214\024\204\245\031\370\037\204\365\374\205\377\370\017l\027\367\002]\022_activity\000\001\371\006\323L]\373\001c\020\301e\240" end_key:"\276\214\024\204\245\034\313\373\204\314a\343\177\370\017l%\211\022_canonical\000\001\375_k[-\367P\244!\373\001c\006\225\"}" replicas:<node_id:1 store_id:1 replica_id:1 > replicas:<node_id:3 store_id:3 replica_id:2 > replicas:<node_id:2 store_id:2 replica_id:3 > next_replica_id:4 >
I180501 02:24:10.652544 26305083 sql/planner.go:338 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal query:
INSERT INTO system.rangelog (
timestamp, "rangeID", "storeID", "eventType", "otherRangeID", info
)
VALUES(
$1, $2, $3, $4, $5, $6
)
I180501 02:24:10.652586 26305083 sql/planner.go:340 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] placeholders: ["2018-05-01 02:24:10.654522612 +0000 UTC" "186" "2" "split" "42877" "{\"UpdatedDesc\":{\"range_id\":186,\"start_key\":\"vov4D2wmFPla4ypp+QV7zwCJEl9uYW1lAAH9UmOY3LRnK2X7AWMGeixX\",\"end_key\":\"vowUhKUZ+B+E9fyF//gPbBf3Al0SX2FjdGl2aXR5AAH5BtNMXfsBYxDBZaA=\",\"replicas\":[{\"node_id\":1,\"store_id\":1,\"replica_id\":1},{\"node_id\":3,\"store_id\":3,\"replica_id\":2},{\"node_id\":2,\"store_id\":2,\"replica_id\":3}],\"next_replica_id\":4},\"NewDesc\":{\"range_id\":42877,\"start_key\":\"vowUhKUZ+B+E9fyF//gPbBf3Al0SX2FjdGl2aXR5AAH5BtNMXfsBYxDBZaA=\",\"end_key\":\"vowUhKUcy/uEzGHjf/gPbCWJEl9jYW5vbmljYWwAAf1fa1st91CkIfsBYwaVIn0=\",\"replicas\":[{\"node_id\":1,\"store_id\":1,\"replica_id\":1},{\"node_id\":3,\"store_id\":3,\"replica_id\":2},{\"node_id\":2,\"store_id\":2,\"replica_id\":3}],\"next_replica_id\":4}}"]
I180501 02:24:10.652766 26305083 sql/table.go:220 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] planner acquiring lease on table 'system.system.rangelog'
I180501 02:24:10.652794 26305083 sql/table.go:220 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] planner acquiring lease on table 'system.public.rangelog'
I180501 02:24:10.655802 26305083 sql/plan.go:338 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] statement INSERT INTO system.public.rangelog("timestamp", "rangeID", "storeID", "eventType", "otherRangeID", info) VALUES ($1, $2, $3, $4, $5, $6) compiled to:
0 insert ()
0 .into rangelog(timestamp, rangeID, storeID, eventType, otherRangeID, info, uniqueID) ()
0 .default 0 (NULL)[unknown] ()
0 .default 1 (NULL)[unknown] ()
0 .default 2 (NULL)[unknown] ()
0 .default 3 (NULL)[unknown] ()
0 .default 4 (NULL)[unknown] ()
0 .default 5 (NULL)[unknown] ()
0 .default 6 (unique_rowid())[int] ()
1 values ()
1 .size 7 columns, 1 row ()
1 .row 0, expr 0 ($1)[timestamp] ()
1 .row 0, expr 1 ($2)[int] ()
1 .row 0, expr 2 ($3)[int] ()
1 .row 0, expr 3 ($4)[string] ()
1 .row 0, expr 4 ($5)[int] ()
1 .row 0, expr 5 ($6)[string] ()
1 .row 0, expr 6 (unique_rowid())[int] ()
I180501 02:24:10.655836 26305083 util/mon/bytes_usage.go:624 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal-txn: requesting 10240 bytes from the pool
I180501 02:24:10.655853 26305083 util/mon/bytes_usage.go:585 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal-root: now at 10240 bytes (+10240) - util/mon.(*BoundAccount).Grow:519,util/mon.(*BytesMonitor).increaseBudget:627,util/mon.(*BytesMonitor).reserveBytes:558,util/mon.(*BoundAccount).Grow:519,sql/sqlbase.(*RowContainer).AddRow:272
I180501 02:24:10.655867 26305083 util/mon/bytes_usage.go:585 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal-txn: now at 10240 bytes (+10240) - util/mon.(*BoundAccount).Grow:519,sql/sqlbase.(*RowContainer).AddRow:272,sql.(*valuesNode).startExec:166,sql.startExec.func2:510,sql.(*planVisitor).visit.func1:90
I180501 02:24:10.661211 26305083 sql/exec_log.go:173 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal-exec "" {} "INSERT INTO system.public.rangelog(\"timestamp\", \"rangeID\", \"storeID\", \"eventType\", \"otherRangeID\", info) VALUES ($1, $2, $3, $4, $5, $6)" {$1:"'2018-05-01 02:24:10.654523+00:00'", $2:"186", $3:"2", $4:"'split'", $5:"42877", $6:"'{\"UpdatedDesc\":{\"range_id\":186,\"start_key\":\"vov4D2wmFPla4ypp+QV7zwCJEl9uYW1lAAH9UmOY3LRnK2X7AWMGeixX\",\"end_key\":\"vowUhKUZ+B+E9fyF//gPbBf3Al0SX2FjdGl2aXR5AAH5BtNMXfsBYxDBZaA=\",\"replicas\":[{\"node_id\":1,\"store_id\":1,\"replica_id\":1},{\"node_id\":3,\"store_id\":3,\"replica_id\":2},{\"node_id\":2,\"store_id\":2,\"replica_id\":3}],\"next_replica_id\":4},\"NewDesc\":{\"range_id\":42877,\"start_key\":\"vowUhKUZ+B+E9fyF//gPbBf3Al0SX2FjdGl2aXR5AAH5BtNMXfsBYxDBZaA=\",\"end_key\":\"vowUhKUcy/uEzGHjf/gPbCWJEl9jYW5vbmljYWwAAf1fa1st91CkIfsBYwaVIn0=\",\"replicas\":[{\"node_id\":1,\"store_id\":1,\"replica_id\":1},{\"node_id\":3,\"store_id\":3,\"replica_id\":2},{\"node_id\":2,\"store_id\":2,\"replica_id\":3}],\"next_replica_id\":4}}'"} 7.346 1 ""
I180501 02:24:10.661241 26305083 util/mon/bytes_usage.go:609 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] internal-txn: now at 0 bytes (-10240) - util/mon.(*BoundAccount).Close:494,sql/sqlbase.(*RowContainer).Close:221,sql.(*valuesNode).Close:198,sql.(*insertNode).Close:308,sql.(*planTop).close:401
I180501 02:24:15.005467 26305083 internal/client/db.go:556 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] failed batch: command is too large: 87076188 bytes (max: 67108864)
I180501 02:24:15.005491 26305083 internal/client/txn.go:997 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] failed batch: command is too large: 87076188 bytes (max: 67108864)
I180501 02:24:15.005502 26305083 internal/client/txn.go:604 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] rolling back transaction
I180501 02:24:15.022413 26305083 storage/queue.go:642 [split,n2,s2,r186/3:/Table/54/{3/10107…-4/1921-…}] done 4.90105918s
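For context on what the logs show: the split's Raft command (87076188 bytes, roughly 83 MiB) exceeded the 64 MiB cap (67108864 bytes) and was rejected before it could be proposed. The sketch below illustrates that size check; `checkCommandSize` and `maxCommandSize` are hypothetical stand-ins for the real check in the storage layer, not CockroachDB's actual API.

```go
package main

import "fmt"

// maxCommandSize mirrors the 64 MiB limit reported in the error
// (67108864 bytes). This constant is an assumption for illustration;
// it is not the actual identifier used in the codebase.
const maxCommandSize = 64 << 20 // 67108864 bytes

// checkCommandSize is a hypothetical stand-in for the proposal-size
// check that produces the "command is too large" error above.
func checkCommandSize(size int) error {
	if size > maxCommandSize {
		return fmt.Errorf("command is too large: %d bytes (max: %d)",
			size, maxCommandSize)
	}
	return nil
}

func main() {
	// The failing split proposed a command of 87076188 bytes,
	// about 30% over the cap, so the check rejects it.
	fmt.Println(checkCommandSize(87076188))
}
```

Note that 87076188 / 67108864 ≈ 1.3, so the command isn't marginally over the limit; whatever the split is carrying has grown well past it.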