changefeedccl: Improve JSON encoder performance by miretskiy · Pull Request #88064 · cockroachdb/cockroach

miretskiy · 2022-09-16T19:25:44Z

Rewrite JSON encoder to improve its performance.

Prior to this change, the JSON encoder was very inefficient, for several reasons:

* New Go map objects were constructed for each event.
* The underlying JSON conversion functions had inefficiencies
  (tracked in tree: Improve performance of tree.AsJSON #87968).
* Converting Go maps to JSON incurs the cost of sorting the keys
  for each row. Sorting, particularly when rows are wide, has a
  significant cost.
* Each conversion to JSON allocated a new array builder (to encode keys)
  and a new object builder; that too has a cost.
* The underlying code structure, while attempting to reuse code when
  constructing different "envelope" formats, made the code less efficient.

This PR addresses all of the above. In particular, since
a schema version for the table is guaranteed to have
the same set of primary key and value columns, we can construct
JSON builders once. The expensive sort operation can be performed
once per version; builders can be memoized and cached.
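
The idea of sorting once per schema version can be sketched as follows. This is a minimal illustration, not the code from this PR: `schemaVersion`, `rowEncoder`, and `encoderCache` are hypothetical names, and values are assumed to arrive as already-encoded JSON fragments.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// schemaVersion is a hypothetical stand-in for a (table, descriptor version) pair.
type schemaVersion struct {
	tableID uint32
	version uint32
}

// rowEncoder holds the column order computed once for a schema version.
type rowEncoder struct {
	sortedCols []string // column names in sorted (output) order
	srcIdx     []int    // srcIdx[i] is the position of sortedCols[i] in the input row
}

// encoderCache memoizes one rowEncoder per schema version.
type encoderCache struct {
	encoders map[schemaVersion]*rowEncoder
	builds   int // how many times the sort actually ran
}

func newEncoderCache() *encoderCache {
	return &encoderCache{encoders: make(map[schemaVersion]*rowEncoder)}
}

// get returns the cached encoder for v, sorting the columns only on a miss.
func (c *encoderCache) get(v schemaVersion, cols []string) *rowEncoder {
	if e, ok := c.encoders[v]; ok {
		return e
	}
	idx := make([]int, len(cols))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return cols[idx[a]] < cols[idx[b]] })
	sorted := make([]string, len(cols))
	for i, j := range idx {
		sorted[i] = cols[j]
	}
	e := &rowEncoder{sortedCols: sorted, srcIdx: idx}
	c.encoders[v] = e
	c.builds++
	return e
}

// encode writes one row as a JSON object using the precomputed order.
// %q is adequate for the plain ASCII column names used in this sketch.
func (e *rowEncoder) encode(vals []string) string {
	var sb strings.Builder
	sb.WriteByte('{')
	for i, name := range e.sortedCols {
		if i > 0 {
			sb.WriteString(", ")
		}
		fmt.Fprintf(&sb, "%q: %s", name, vals[e.srcIdx[i]])
	}
	sb.WriteByte('}')
	return sb.String()
}

func main() {
	cache := newEncoderCache()
	v := schemaVersion{tableID: 52, version: 1}
	cols := []string{"b", "a", "c"}
	for _, row := range [][]string{{"1", "2", "3"}, {"4", "5", "6"}} {
		fmt.Println(cache.get(v, cols).encode(row))
	}
	fmt.Println("sorts performed:", cache.builds)
}
```

Both rows share one schema version, so the sort runs once and every later row only pays for a map lookup and the write loop.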

The performance impact is significant:

* Key encoding is 5-30% faster, depending on the number of primary
  key columns.
* Value encoding is 30-60% faster (the slowest case being the "wrapped"
  envelope with diff, which effectively encodes 2x values).
* Byte allocations per row are reduced by over 70%, with the number
  of allocations reduced similarly.

Release note (enterprise change): Changefeed JSON encoder
performance improved by 50%.
Release justification: performance improvement

miretskiy · 2022-09-16T19:27:15Z

Full benchmark results:

name                                                            old time/op    new time/op    delta
Encoders/json/encodeKey/1cols-10                                   248ns ± 1%     235ns ± 0%   -5.34%  (p=0.000 n=15+14)
Encoders/json/encodeKey/2cols-10                                   427ns ± 1%     364ns ± 0%  -14.96%  (p=0.000 n=13+14)
Encoders/json/encodeKey/3cols-10                                  1.17µs ± 1%    0.82µs ± 2%  -30.10%  (p=0.000 n=15+15)
Encoders/json/encodeKey/4cols-10                                  1.43µs ± 4%    1.03µs ± 1%  -27.84%  (p=0.000 n=15+13)
Encoders/json/encodeValue/1cols/envelope=row/diff=false-10        3.35µs ± 1%    1.49µs ± 2%  -55.60%  (p=0.000 n=14+14)
Encoders/json/encodeValue/1cols/envelope=row/diff=true-10         3.35µs ± 2%    1.48µs ± 1%  -55.81%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=false-10    3.79µs ± 2%    1.80µs ± 0%  -52.47%  (p=0.000 n=14+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=true-10     4.42µs ± 1%    2.85µs ± 0%  -35.47%  (p=0.000 n=14+14)
Encoders/json/encodeValue/2cols/envelope=row/diff=false-10        6.73µs ± 0%    3.04µs ± 2%  -54.82%  (p=0.000 n=14+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=true-10         6.73µs ± 0%    3.05µs ± 2%  -54.74%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=false-10    6.87µs ± 0%    3.45µs ± 1%  -49.79%  (p=0.000 n=13+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=true-10     8.30µs ± 0%    6.00µs ± 2%  -27.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=false-10        13.7µs ± 1%     5.3µs ± 0%  -61.04%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=true-10         13.7µs ± 1%     5.3µs ± 0%  -61.13%  (p=0.000 n=15+12)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=false-10    14.9µs ± 1%     6.3µs ± 3%  -57.92%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=true-10     18.1µs ± 0%    11.0µs ± 0%  -39.31%  (p=0.000 n=14+13)
Encoders/json/encodeValue/4cols/envelope=row/diff=false-10        29.8µs ± 0%    12.6µs ± 1%  -57.56%  (p=0.000 n=14+14)
Encoders/json/encodeValue/4cols/envelope=row/diff=true-10         29.7µs ± 0%    12.7µs ± 0%  -57.26%  (p=0.000 n=14+13)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=false-10    31.2µs ± 0%    13.8µs ± 1%  -55.66%  (p=0.000 n=15+14)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=true-10     39.4µs ± 0%    25.3µs ± 1%  -35.73%  (p=0.000 n=12+15)


name                                                            old alloc/op   new alloc/op   delta
Encoders/json/encodeKey/1cols-10                                    105B ± 0%       73B ± 0%  -30.48%  (p=0.000 n=15+15)
Encoders/json/encodeKey/2cols-10                                    181B ± 0%      101B ± 0%  -44.20%  (p=0.000 n=15+15)
Encoders/json/encodeKey/3cols-10                                    649B ± 0%      295B ± 0%  -54.55%  (p=0.000 n=15+15)
Encoders/json/encodeKey/4cols-10                                    757B ± 0%      381B ± 0%  -49.67%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=row/diff=false-10        2.53kB ± 0%    0.46kB ± 0%  -81.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=row/diff=true-10         2.53kB ± 0%    0.46kB ± 0%  -81.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=false-10    2.76kB ± 0%    0.55kB ± 0%  -80.01%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=true-10     3.10kB ± 0%    0.93kB ± 0%  -70.08%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=false-10        4.19kB ± 0%    0.96kB ± 0%  -77.06%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=true-10         4.19kB ± 0%    0.96kB ± 0%  -77.06%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=false-10    3.93kB ± 0%    1.08kB ± 0%  -72.47%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=true-10     4.61kB ± 0%    1.95kB ± 0%  -57.59%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=false-10        9.85kB ± 0%    1.83kB ± 0%  -81.38%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=true-10         9.85kB ± 0%    1.83kB ± 0%  -81.38%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=false-10    10.6kB ± 0%     2.1kB ± 0%  -79.70%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=true-10     11.7kB ± 0%     3.9kB ± 0%  -66.82%  (p=0.000 n=14+15)
Encoders/json/encodeValue/4cols/envelope=row/diff=false-10        20.0kB ± 0%     4.2kB ± 0%  -78.93%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=row/diff=true-10         20.0kB ± 0%     4.2kB ± 0%  -78.93%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=false-10    20.7kB ± 0%     4.6kB ± 0%  -77.75%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=true-10     23.6kB ± 0%     8.7kB ± 0%  -63.08%  (p=0.000 n=15+15)
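
As a reading aid for the delta column: it is the relative change from the old figure to the new one. The sketch below recomputes it from the rounded values printed in the tables, so the results come out close to, but not exactly equal to, the printed deltas (which are presumably derived from unrounded means). The helper names `percentDelta` and `speedup` are made up for this illustration.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldV to newV, in percent.
// A negative result means the new value is smaller (faster, or fewer bytes).
func percentDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// speedup returns how many times faster the new figure is than the old one.
func speedup(oldV, newV float64) float64 {
	return oldV / newV
}

func main() {
	// Rounded figures copied from the time/op table above.
	fmt.Printf("encodeKey/1cols:        %.2f%%\n", percentDelta(248, 235))
	fmt.Printf("encodeValue/4cols/row:  %.2f%% (%.2fx)\n",
		percentDelta(29.8, 12.6), speedup(29.8, 12.6))
}
```

Note that a time reduction of about 55% corresponds to a speedup of a little over 2x, since the speedup is old/new rather than the percentage itself.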

HonoreDB

Reviewed 2 of 2 files at r1, 11 of 11 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @stevendanna)

miretskiy · 2022-09-18T22:01:22Z

@HonoreDB: This needs a more careful review.
In particular, I had to change how the "bare" envelope encodes key_in_value.
Putting "key" as a top-level tuple is not okay (building the JSON object would fail if the table also has a column named
key); therefore, I moved the key inside crdb and made the corresponding changes in the encoder and in the testfeed implementation.

In addition, I removed the forced setting of key_in_value when using the bare envelope -- why would you force that?
If you select *, you have the key -- so if you want the option, then specify it.
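
The collision described above can be illustrated with a small sketch. It is not the encoder from this PR: the functions `encodeKeyTopLevel` and `encodeKeyNested` are hypothetical, the rows are modeled as plain Go maps, and the metadata field name `crdb` is taken as written in the comment above (the actual field name in the product may differ).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metaField is the metadata object name used in this sketch (an assumption;
// the real field name may differ).
const metaField = "crdb"

// copyRow returns a shallow copy of row with room for one extra field.
func copyRow(row map[string]any) map[string]any {
	out := make(map[string]any, len(row)+1)
	for k, v := range row {
		out[k] = v
	}
	return out
}

// encodeKeyTopLevel places the primary key at the top level under "key".
// It fails when the table itself has a column named "key".
func encodeKeyTopLevel(row map[string]any, key []any) (map[string]any, error) {
	out := copyRow(row)
	if _, clash := out["key"]; clash {
		return nil, fmt.Errorf(`column "key" collides with the key_in_value field`)
	}
	out["key"] = key
	return out, nil
}

// encodeKeyNested places the primary key inside the metadata object,
// so a user column named "key" is left untouched.
func encodeKeyNested(row map[string]any, key []any) map[string]any {
	out := copyRow(row)
	out[metaField] = map[string]any{"key": key}
	return out
}

func main() {
	row := map[string]any{"id": 1, "key": "user-supplied"}
	pk := []any{1}

	if _, err := encodeKeyTopLevel(row, pk); err != nil {
		fmt.Println("top-level:", err)
	}

	b, err := json.Marshal(encodeKeyNested(row, pk))
	if err != nil {
		panic(err)
	}
	fmt.Println("nested:   ", string(b))
	// nested:    {"crdb":{"key":[1]},"id":1,"key":"user-supplied"}
}
```

Nesting only moves the problem to the metadata field name, so that name has to be one that user columns are unlikely (or not allowed) to use.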

HonoreDB

Reviewed 11 of 13 files at r5, 18 of 23 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @stevendanna)

Add JSON encoder benchmark. Release note: None Release justification: test only change

miretskiy · 2022-09-22T12:41:45Z

bors r=honoredb

craig · 2022-09-22T13:35:20Z

Build succeeded:

Bazel Essential CI (Cockroach)

miretskiy · 2022-09-22T13:40:03Z

Adding backport-22.2 label; but will not backport to 22.2.0; will wait until at least 22.2.1

miretskiy requested a review from a team as a code owner September 16, 2022 19:25

miretskiy requested review from stevendanna and removed request for a team September 16, 2022 19:25

miretskiy force-pushed the json_encoder branch from ab46f61 to 8c9c047 Compare September 16, 2022 19:25

miretskiy requested review from HonoreDB and ajwerner September 16, 2022 19:27

miretskiy force-pushed the json_encoder branch from 8c9c047 to f7dee74 Compare September 16, 2022 19:37

HonoreDB approved these changes Sep 16, 2022

shermanCRL mentioned this pull request Sep 16, 2022

changefeedccl: macro benchmarks over several performance improvements #87921

Closed

miretskiy force-pushed the json_encoder branch from f7dee74 to d7a672a Compare September 16, 2022 20:10

miretskiy requested a review from a team September 16, 2022 20:10

miretskiy force-pushed the json_encoder branch 3 times, most recently from 8ccf155 to e0b2467 Compare September 18, 2022 21:53

miretskiy force-pushed the json_encoder branch 5 times, most recently from fc0c3a5 to 63e48ca Compare September 20, 2022 12:34

HonoreDB approved these changes Sep 20, 2022

This was referenced Sep 20, 2022

cdc: fix assignment to nil map panic in json encoder #88241

Merged

cdc: changefeeds with AS SELECT * panic when encoding a delete message #88239

Closed

miretskiy force-pushed the json_encoder branch from 63e48ca to 0d91118 Compare September 20, 2022 16:33

changefeedccl: Add JSON encoder benchmark.

0835676

miretskiy force-pushed the json_encoder branch 3 times, most recently from 9ac0374 to 575f5dd Compare September 21, 2022 21:02

miretskiy force-pushed the json_encoder branch from 575f5dd to e819c0c Compare September 21, 2022 22:12

miretskiy force-pushed the json_encoder branch from e819c0c to c281a29 Compare September 22, 2022 00:45

craig bot merged commit a18db70 into cockroachdb:master Sep 22, 2022

miretskiy added the backport-22.2.x label Sep 22, 2022

cockroach-teamcity mentioned this pull request Sep 22, 2022

PR #88064 - changefeedccl: Improve JSON encoder performance cockroachdb/docs#15172

Closed

miretskiy mentioned this pull request Sep 28, 2022

release-22.2: Changefeed backfill performance fixes #88915

Closed

shermanCRL mentioned this pull request Nov 9, 2022

changefeedccl: avoid per-value allocations in jsonEncoder.EncodeValue #84417

Closed

craig[bot] merged 2 commits into cockroachdb:master from miretskiy:json_encoder

3 participants