Skip to content

workload/ycsb: correctly populate columns during import#50475

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
petermattis:pmattis/ycsb-import
Jun 22, 2020
Merged

workload/ycsb: correctly populate columns during import#50475
craig[bot] merged 1 commit intocockroachdb:masterfrom
petermattis:pmattis/ycsb-import

Conversation

@petermattis
Copy link
Copy Markdown
Collaborator

Fix fixtures import ycsb to import correctly size values instead of
empty values.

Switch to using golang.org/x/exp/rand from math/rand which is faster
and lower memory overhead. This follows the path blazed by other
workloads such as tpcc.

Release note: None

@petermattis petermattis requested a review from nvb June 22, 2020 14:15
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

				key := cb.ColVec(0).Bytes()
				key.Reset()

Out of curiosity, why is Reset needed?

@petermattis petermattis force-pushed the pmattis/ycsb-import branch from f0d7695 to 830dea2 Compare June 22, 2020 14:28
Copy link
Copy Markdown
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Out of curiosity, why is Reset needed?

I'm not sure. I cargo-culted this from the bank workload. I've now cargo-culted the comment from there as well.

@petermattis petermattis force-pushed the pmattis/ycsb-import branch 2 times, most recently from be1c0ef to 2aa1af2 Compare June 22, 2020 16:51
Copy link
Copy Markdown
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/ccl/workloadccl/allccl/all_test.go, line 270 at r2 (raw file):

		`tpcc`:       0xab32e4f5e899eb2f,
		`tpch`:       0xdd952207e22aa577,
		`ycsb`:       0x1244ea1c29ef67f6,

@nvanbenschoten CI caught this. I believe the value just needed to be updated and that there is no requirement that it remain unchanged.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)


pkg/ccl/workloadccl/allccl/all_test.go, line 270 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

@nvanbenschoten CI caught this. I believe the value just needed to be updated and that there is no requirement that it remain unchanged.

Yeah, this might have been a little more important back when we used pre-generated fixtures all over the place. It should be fine to change now.


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not sure. I cargo-culted this from the bank workload. I've now cargo-culted the comment from there as well.

Does the workload fail without it? 466fb1a indicates that it should.

Copy link
Copy Markdown
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does the workload fail without it? 466fb1a indicates that it should.

I didn't actually try. Looks like there is some commentary on coldata.Bytes to prefer AppendVal instead of Set.

@asubiotto or @yuzefovich What is the correct pattern to be following here?

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I didn't actually try. Looks like there is some commentary on coldata.Bytes to prefer AppendVal instead of Set.

@asubiotto or @yuzefovich What is the correct pattern to be following here?

If we want to reuse the same underlying memory of a Bytes columnar vector, then Reset needs to be called.

In terms of which method, AppendVal or Set, should be used - under the hood they behave very similar in a sense that there is no data movement going on in the big flat []byte slice that is underlying the vector (since we allow Sets to be only in ascending order of positions). The difference is that Sets can occur for a position within the length of Bytes vector (so it is similar to something like vec[idx] = value for which idx < len(vec)) whereas AppendVal appends past the current length of the vector (so it is similar to vec = append(vec, value)). The current way of using Set operation seems fine to me (alternative way would be

vec.SetLength(0)
for ... {
  vec.AppendVal(value)
}

if all Sets are for consequent indices).

Copy link
Copy Markdown
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, yuzefovich wrote…

If we want to reuse the same underlying memory of a Bytes columnar vector, then Reset needs to be called.

In terms of which method, AppendVal or Set, should be used - under the hood they behave very similar in a sense that there is no data movement going on in the big flat []byte slice that is underlying the vector (since we allow Sets to be only in ascending order of positions). The difference is that Sets can occur for a position within the length of Bytes vector (so it is similar to something like vec[idx] = value for which idx < len(vec)) whereas AppendVal appends past the current length of the vector (so it is similar to vec = append(vec, value)). The current way of using Set operation seems fine to me (alternative way would be

vec.SetLength(0)
for ... {
  vec.AppendVal(value)
}

if all Sets are for consequent indices).

Thanks @yuzefovich. Something I'm confused about. We're using a bufalloc.ByteAllocator for allocating the fields before calling coldata.Bytes.Set. That allocation doesn't seem to be necessary as coldata.Bytes.Set is copying the data. Is the use of bufalloc here unnecessary? Could I instead use a temp buffer?

Note that I was following the pattern in workload/bank/bank.go. I'd want to make both pieces of code look similar. It is possible other workloads are also using bufalloc.ByteAllocator unnecessarily.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Is the use of bufalloc here unnecessary? Could I instead use a temp buffer?

Yes, I think the usage of temporary buffer is acceptable here since we do perform a full copy on Bytes.Set.

We seem to be using bufalloc in many workloads. I think the reason was that in initial implementation of coldata.Bytes we were using [][]byte under the hood, so a separate allocator was beneficial (since we might have not been performing a deep copy or maybe to improve data locality or something), but we changed the implementation to use a single "flat" []byte with a deep copy after 19.2 release, so we probably don't need to use bufalloc anymore.

Fix `fixtures import ycsb` to import correctly size values instead of
empty values.

Switch to using `golang.org/x/exp/rand` from `math/rand` which is faster
and lower memory overhead. This follows the path blazed by other
workloads such as `tpcc`.

Release note: None
@petermattis petermattis force-pushed the pmattis/ycsb-import branch from 2aa1af2 to 69f7933 Compare June 22, 2020 18:50
Copy link
Copy Markdown
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @nvanbenschoten)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, yuzefovich wrote…

Is the use of bufalloc here unnecessary? Could I instead use a temp buffer?

Yes, I think the usage of temporary buffer is acceptable here since we do perform a full copy on Bytes.Set.

We seem to be using bufalloc in many workloads. I think the reason was that in initial implementation of coldata.Bytes we were using [][]byte under the hood, so a separate allocator was beneficial (since we might have not been performing a deep copy or maybe to improve data locality or something), but we changed the implementation to use a single "flat" []byte with a deep copy after 19.2 release, so we probably don't need to use bufalloc anymore.

Thanks @yuzefovich. I've removed the use of bufalloc.ByteAllocator. Everything still seems to work, so I'll bors when CI is green. I'm leaving fixing up of other workloads to someone else. Do you want to pick this up?

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @nvanbenschoten)


pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Thanks @yuzefovich. I've removed the use of bufalloc.ByteAllocator. Everything still seems to work, so I'll bors when CI is green. I'm leaving fixing up of other workloads to someone else. Do you want to pick this up?

Yeah, sounds good.

@petermattis
Copy link
Copy Markdown
Collaborator Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 22, 2020

Build failed (retrying...)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 22, 2020

Build succeeded

@craig craig bot merged commit 71feb47 into cockroachdb:master Jun 22, 2020
@petermattis petermattis deleted the pmattis/ycsb-import branch June 22, 2020 23:26
robert-s-lee added a commit that referenced this pull request Jun 29, 2020
Merge changes from PR #50475 which  populates the fields[0-9] during the init
robert-s-lee added a commit that referenced this pull request Jul 1, 2020
--insert-hash=true creates user1, user2 for ycsb_key
--zero-padding=2 creates user01, user02 for ycsb_key
--time-string=true creates field[0-9] with example
`2020-06-23 13:23:58.529711+00:00DwYFKxzTOKqgYwdHZNmbzdkRfINphVojmwiDbhaFsEnFHRpxSJQdSRnXOrQbjJhjYzko`
instead of
`rmArXhKIjbpTAPwyjDNsmkGTLXWNQHQCLAyKwWFOrlopXliOKqQANpakAxsmqfPzYSvAhgYYVQQDhUEuXCVuvsEdJrrguNSaXDbx`

Closes #50545 and #50544

Release note (general change): added new options to YCSB that mirrors OpenSource YCSB options

merge PR 50475

Merge changes from PR #50475 which  populates the fields[0-9] during the init
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants