
changefeedccl: Improve JSON encoder performance #88064

Merged: craig[bot] merged 2 commits into cockroachdb:master from miretskiy:json_encoder on Sep 22, 2022

@miretskiy
Contributor

Rewrite JSON encoder to improve its performance.

Prior to this change, the JSON encoder was very inefficient.
This inefficiency had multiple underlying reasons:

  • New Go map objects were constructed for each event.
  • Underlying json conversion functions had inefficiencies
    (tracked in tree: Improve performance of tree.AsJSON #87968)
  • Conversion of Go maps to JSON incurs the cost
    of sorting the keys -- for each row. Sorting,
    particularly when rows are wide, has significant cost.
  • Each conversion to JSON allocated a new array builder
    (to encode keys) and a new object builder; that too has a cost.
  • The underlying code structure, while attempting to reuse
    code when constructing different "envelope" formats,
    caused the code to be less efficient.

This PR addresses all of the above. In particular, since
a schema version for the table is guaranteed to have
the same set of primary key and value columns, we can construct
JSON builders once. The expensive sort operation can be performed
once per version; builders can be memoized and cached.
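
The idea of sorting once per schema version and reusing the result can be sketched as follows. This is a minimal illustration, not the code from this PR: `rowEncoder`, `encoderCache`, and `demo` are hypothetical names, values are restricted to `int64`, and keys are sorted in plain lexicographic order (an assumption; the real JSON builders may order keys differently).

```go
package main

import (
	"bytes"
	"sort"
	"strconv"
)

// rowEncoder is a hypothetical stand-in for a per-schema-version builder.
// It sorts the column names once and remembers which input column feeds
// each output position.
type rowEncoder struct {
	keys  [][]byte // pre-quoted, sorted column names, each followed by ':'
	order []int    // order[i] is the index into the row for keys[i]
}

func newRowEncoder(cols []string) *rowEncoder {
	order := make([]int, len(cols))
	for i := range order {
		order[i] = i
	}
	// The sort happens here, once, instead of once per row.
	sort.Slice(order, func(a, b int) bool { return cols[order[a]] < cols[order[b]] })
	keys := make([][]byte, len(cols))
	for i, idx := range order {
		// Go quoting matches JSON quoting only for simple ASCII names;
		// that is good enough for this sketch.
		keys[i] = append(strconv.AppendQuote(nil, cols[idx]), ':')
	}
	return &rowEncoder{keys: keys, order: order}
}

// encode appends the JSON object for one row to buf without building a map
// and without sorting.
func (e *rowEncoder) encode(buf *bytes.Buffer, row []int64) {
	buf.WriteByte('{')
	for i, idx := range e.order {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.Write(e.keys[i])
		buf.WriteString(strconv.FormatInt(row[idx], 10))
	}
	buf.WriteByte('}')
}

// encoderCache memoizes one rowEncoder per schema version.
type encoderCache struct {
	byVersion map[uint32]*rowEncoder
	builds    int // number of encoders actually constructed
}

func (c *encoderCache) get(version uint32, cols []string) *rowEncoder {
	if e, ok := c.byVersion[version]; ok {
		return e
	}
	e := newRowEncoder(cols)
	c.byVersion[version] = e
	c.builds++
	return e
}

// demo encodes a single row whose columns arrive as (b, a).
func demo() string {
	c := &encoderCache{byVersion: map[uint32]*rowEncoder{}}
	var buf bytes.Buffer
	c.get(1, []string{"b", "a"}).encode(&buf, []int64{2, 1})
	return buf.String()
}
```

Every subsequent row with the same schema version reuses the cached encoder, so the per-row work is a linear walk over precomputed keys.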

The performance impact is significant:

  • Key encoding is 5-30% faster, depending on the number of primary
    key columns.
  • Value encoding is roughly 30-60% faster (the slowest case being the
    "wrapped" envelope with diff, which effectively encodes 2x values).
  • Bytes allocated per row are reduced by over 70% in most
    value-encoding benchmarks, with the number of allocations reduced
    similarly.

Release note (enterprise change): Changefeed JSON encoder
performance improved by 50%.
Release justification: performance improvement

@miretskiy miretskiy requested a review from a team as a code owner September 16, 2022 19:25
@miretskiy miretskiy requested review from stevendanna and removed request for a team September 16, 2022 19:25
@cockroach-teamcity
Member

This change is Reviewable

@miretskiy
Contributor Author

Full benchmark results:

name                                                            old time/op    new time/op    delta
Encoders/json/encodeKey/1cols-10                                   248ns ± 1%     235ns ± 0%   -5.34%  (p=0.000 n=15+14)
Encoders/json/encodeKey/2cols-10                                   427ns ± 1%     364ns ± 0%  -14.96%  (p=0.000 n=13+14)
Encoders/json/encodeKey/3cols-10                                  1.17µs ± 1%    0.82µs ± 2%  -30.10%  (p=0.000 n=15+15)
Encoders/json/encodeKey/4cols-10                                  1.43µs ± 4%    1.03µs ± 1%  -27.84%  (p=0.000 n=15+13)
Encoders/json/encodeValue/1cols/envelope=row/diff=false-10        3.35µs ± 1%    1.49µs ± 2%  -55.60%  (p=0.000 n=14+14)
Encoders/json/encodeValue/1cols/envelope=row/diff=true-10         3.35µs ± 2%    1.48µs ± 1%  -55.81%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=false-10    3.79µs ± 2%    1.80µs ± 0%  -52.47%  (p=0.000 n=14+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=true-10     4.42µs ± 1%    2.85µs ± 0%  -35.47%  (p=0.000 n=14+14)
Encoders/json/encodeValue/2cols/envelope=row/diff=false-10        6.73µs ± 0%    3.04µs ± 2%  -54.82%  (p=0.000 n=14+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=true-10         6.73µs ± 0%    3.05µs ± 2%  -54.74%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=false-10    6.87µs ± 0%    3.45µs ± 1%  -49.79%  (p=0.000 n=13+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=true-10     8.30µs ± 0%    6.00µs ± 2%  -27.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=false-10        13.7µs ± 1%     5.3µs ± 0%  -61.04%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=true-10         13.7µs ± 1%     5.3µs ± 0%  -61.13%  (p=0.000 n=15+12)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=false-10    14.9µs ± 1%     6.3µs ± 3%  -57.92%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=true-10     18.1µs ± 0%    11.0µs ± 0%  -39.31%  (p=0.000 n=14+13)
Encoders/json/encodeValue/4cols/envelope=row/diff=false-10        29.8µs ± 0%    12.6µs ± 1%  -57.56%  (p=0.000 n=14+14)
Encoders/json/encodeValue/4cols/envelope=row/diff=true-10         29.7µs ± 0%    12.7µs ± 0%  -57.26%  (p=0.000 n=14+13)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=false-10    31.2µs ± 0%    13.8µs ± 1%  -55.66%  (p=0.000 n=15+14)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=true-10     39.4µs ± 0%    25.3µs ± 1%  -35.73%  (p=0.000 n=12+15)


name                                                            old alloc/op   new alloc/op   delta
Encoders/json/encodeKey/1cols-10                                    105B ± 0%       73B ± 0%  -30.48%  (p=0.000 n=15+15)
Encoders/json/encodeKey/2cols-10                                    181B ± 0%      101B ± 0%  -44.20%  (p=0.000 n=15+15)
Encoders/json/encodeKey/3cols-10                                    649B ± 0%      295B ± 0%  -54.55%  (p=0.000 n=15+15)
Encoders/json/encodeKey/4cols-10                                    757B ± 0%      381B ± 0%  -49.67%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=row/diff=false-10        2.53kB ± 0%    0.46kB ± 0%  -81.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=row/diff=true-10         2.53kB ± 0%    0.46kB ± 0%  -81.65%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=false-10    2.76kB ± 0%    0.55kB ± 0%  -80.01%  (p=0.000 n=15+15)
Encoders/json/encodeValue/1cols/envelope=wrapped/diff=true-10     3.10kB ± 0%    0.93kB ± 0%  -70.08%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=false-10        4.19kB ± 0%    0.96kB ± 0%  -77.06%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=row/diff=true-10         4.19kB ± 0%    0.96kB ± 0%  -77.06%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=false-10    3.93kB ± 0%    1.08kB ± 0%  -72.47%  (p=0.000 n=15+15)
Encoders/json/encodeValue/2cols/envelope=wrapped/diff=true-10     4.61kB ± 0%    1.95kB ± 0%  -57.59%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=false-10        9.85kB ± 0%    1.83kB ± 0%  -81.38%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=row/diff=true-10         9.85kB ± 0%    1.83kB ± 0%  -81.38%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=false-10    10.6kB ± 0%     2.1kB ± 0%  -79.70%  (p=0.000 n=15+15)
Encoders/json/encodeValue/3cols/envelope=wrapped/diff=true-10     11.7kB ± 0%     3.9kB ± 0%  -66.82%  (p=0.000 n=14+15)
Encoders/json/encodeValue/4cols/envelope=row/diff=false-10        20.0kB ± 0%     4.2kB ± 0%  -78.93%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=row/diff=true-10         20.0kB ± 0%     4.2kB ± 0%  -78.93%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=false-10    20.7kB ± 0%     4.6kB ± 0%  -77.75%  (p=0.000 n=15+15)
Encoders/json/encodeValue/4cols/envelope=wrapped/diff=true-10     23.6kB ± 0%     8.7kB ± 0%  -63.08%  (p=0.000 n=15+15)



Contributor

@HonoreDB HonoreDB left a comment


Reviewed 2 of 2 files at r1, 11 of 11 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @stevendanna)

@miretskiy
Contributor Author

@HonoreDB: This needs a more careful review.
In particular, I had to change how the "bare" envelope encodes key_in_value.
Putting "key" as a top-level tuple is not okay (building the JSON object would fail if the table also has a column named
key); therefore, I put the key inside crdb and made the corresponding changes in the encoder as well as in the testfeed implementation.

In addition, I removed the forced setting of key_in_value when using the bare envelope. Why would you force that?
If you select *, you already have the key, so if you want the option, specify it.
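
The collision described above can be illustrated with a small sketch. It is an assumption-laden example, not the encoder from this PR: `topLevelKey`, `nestedKey`, `render`, and the reserved field name `__meta__` are all hypothetical, and the sketch assumes the reserved name itself never appears as a column name.

```go
package main

import "encoding/json"

// metaField is a hypothetical reserved field name used only in this sketch.
const metaField = "__meta__"

// topLevelKey tries to add the primary key as a top-level "key" field.
// It reports false when the row already has a column named "key".
func topLevelKey(row map[string]any, key []any) (map[string]any, bool) {
	if _, clash := row["key"]; clash {
		return nil, false
	}
	out := make(map[string]any, len(row)+1)
	for k, v := range row {
		out[k] = v
	}
	out["key"] = key
	return out, true
}

// nestedKey places the primary key under the reserved metadata field,
// so a user column named "key" is left untouched.
func nestedKey(row map[string]any, key []any) map[string]any {
	out := make(map[string]any, len(row)+1)
	for k, v := range row {
		out[k] = v
	}
	out[metaField] = map[string]any{"key": key}
	return out
}

// render marshals the object; encoding/json emits map keys in sorted order.
func render(obj map[string]any) (string, error) {
	b, err := json.Marshal(obj)
	return string(b), err
}
```

With a row that has its own `key` column, `topLevelKey` has to reject the row, while `nestedKey` keeps both values distinct.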

@miretskiy miretskiy force-pushed the json_encoder branch 5 times, most recently from fc0c3a5 to 63e48ca Compare September 20, 2022 12:34
Contributor

@HonoreDB HonoreDB left a comment


Reviewed 11 of 13 files at r5, 18 of 23 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @stevendanna)

Add JSON encoder benchmark.

Release note: None
Release justification: test only change
@miretskiy miretskiy force-pushed the json_encoder branch 3 times, most recently from 9ac0374 to 575f5dd Compare September 21, 2022 21:02
@miretskiy
Contributor Author

bors r=honoredb

@craig
Contributor

craig bot commented Sep 22, 2022

Build succeeded:

@craig craig bot merged commit a18db70 into cockroachdb:master Sep 22, 2022
@miretskiy
Contributor Author

Adding backport-22.2 label; but will not backport to 22.2.0; will wait until at least 22.2.1
