Skip to content

workload/ycsb: add flag to use column families#32704

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/ycsbFam
Nov 29, 2018
Merged

workload/ycsb: add flag to use column families#32704
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/ycsbFam

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Nov 29, 2018

This change adds a new --families flag to the ycsb workload. Now
that #18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

I just confirmed that this still provides a huge speedup. On a 24 cpu machine:

workload run ycsb --init --workload='A' --concurrency=128 --duration=1m


--families=false gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         103089         1718.0     13.9      0.6      1.8    604.0   1476.4  read
   60.0s        0         102947         1715.6     59.5      5.2     11.5   2281.7   8321.5  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         206036         3433.6     36.7      3.3      8.9   1342.2   8321.5


--families=true gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         333477         5557.8      9.2      0.6      6.0    302.0   1275.1  read
   60.0s        0         332366         5539.3     13.7      6.8     17.8     54.5   4831.8  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         665843        11097.1     11.5      3.9     16.3    268.4   4831.8

cc. @robert-s-lee @drewdeally

Release note: None

This change adds a new `--families` flag to the ycsb workload. Now
that cockroachdb#18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

Release note: None
@nvb nvb requested a review from m-schneider November 29, 2018 00:36
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Nov 29, 2018

bors r+

craig bot pushed a commit that referenced this pull request Nov 29, 2018
32704: workload/ycsb: add flag to use column families r=nvanbenschoten a=nvanbenschoten

This change adds a new `--families` flag to the ycsb workload. Now
that #18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

I just confirmed that this still provides a huge speedup. On a 24 cpu machine:
```
workload run ycsb --init --workload='A' --concurrency=128 --duration=1m


--families=false gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         103089         1718.0     13.9      0.6      1.8    604.0   1476.4  read
   60.0s        0         102947         1715.6     59.5      5.2     11.5   2281.7   8321.5  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         206036         3433.6     36.7      3.3      8.9   1342.2   8321.5


--families=true gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         333477         5557.8      9.2      0.6      6.0    302.0   1275.1  read
   60.0s        0         332366         5539.3     13.7      6.8     17.8     54.5   4831.8  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         665843        11097.1     11.5      3.9     16.3    268.4   4831.8
```

cc. @robert-s-lee @drewdeally 

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 29, 2018

Build succeeded

@craig craig bot merged commit 16ffee3 into cockroachdb:master Nov 29, 2018
@nvb nvb deleted the nvanbenschoten/ycsbFam branch November 29, 2018 18:54
nvb added a commit to nvb/cockroach that referenced this pull request Mar 5, 2020
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing
so allows the load generator to take advantage of cockroachdb#44239, which avoids
the KV lookup on the primary column family entirely when querying one of
the other column families. The query plans before and after demonstrate
this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |      description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per
field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as
of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row
fetch, we don't want them acquiring upgrade locks on the primary column
family of the row they are intending to update a single column in. This
re-introduces the contention between writes to different columns in the
same row that column families helped avoid (see cockroachdb#32704). By marking each
column as NOT NULL, we can continue to avoid this contention.
RichardJCai pushed a commit to RichardJCai/cockroach that referenced this pull request Mar 9, 2020
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing
so allows the load generator to take advantage of cockroachdb#44239, which avoids
the KV lookup on the primary column family entirely when querying one of
the other column families. The query plans before and after demonstrate
this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |      description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per
field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as
of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row
fetch, we don't want them acquiring upgrade locks on the primary column
family of the row they are intending to update a single column in. This
re-introduces the contention between writes to different columns in the
same row that column families helped avoid (see cockroachdb#32704). By marking each
column as NOT NULL, we can continue to avoid this contention.
RichardJCai pushed a commit to RichardJCai/cockroach that referenced this pull request Mar 9, 2020
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing
so allows the load generator to take advantage of cockroachdb#44239, which avoids
the KV lookup on the primary column family entirely when querying one of
the other column families. The query plans before and after demonstrate
this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |      description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per
field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as
of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row
fetch, we don't want them acquiring upgrade locks on the primary column
family of the row they are intending to update a single column in. This
re-introduces the contention between writes to different columns in the
same row that column families helped avoid (see cockroachdb#32704). By marking each
column as NOT NULL, we can continue to avoid this contention.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants