workload/ycsb: correctly populate columns during import#50475
workload/ycsb: correctly populate columns during import#50475craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
nvb
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
key := cb.ColVec(0).Bytes() key.Reset()
Out of curiosity, why is Reset needed?
f0d7695 to
830dea2
Compare
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Out of curiosity, why is
Resetneeded?
I'm not sure. I cargo-culted this from the bank workload. I've now cargo-culted the comment from there as well.
be1c0ef to
2aa1af2
Compare
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/ccl/workloadccl/allccl/all_test.go, line 270 at r2 (raw file):
`tpcc`: 0xab32e4f5e899eb2f, `tpch`: 0xdd952207e22aa577, `ycsb`: 0x1244ea1c29ef67f6,
@nvanbenschoten CI caught this. I believe the value just needed to be updated and that there is no requirement that it remain unchanged.
nvb
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
pkg/ccl/workloadccl/allccl/all_test.go, line 270 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
@nvanbenschoten CI caught this. I believe the value just needed to be updated and that there is no requirement that it remain unchanged.
Yeah, this might have been a little more important back when we used pre-generated fixtures all over the place. It should be fine to change now.
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm not sure. I cargo-culted this from the
bankworkload. I've now cargo-culted the comment from there as well.
Does the workload fail without it? 466fb1a indicates that it should.
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does the workload fail without it? 466fb1a indicates that it should.
I didn't actually try. Looks like there is some commentary on coldata.Bytes to prefer AppendVal instead of Set.
@asubiotto or @yuzefovich What is the correct pattern to be following here?
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I didn't actually try. Looks like there is some commentary on
coldata.Bytesto preferAppendValinstead ofSet.@asubiotto or @yuzefovich What is the correct pattern to be following here?
If we want to reuse the same underlying memory of a Bytes columnar vector, then Reset needs to be called.
In terms of which method, AppendVal or Set, should be used - under the hood they behave very similar in a sense that there is no data movement going on in the big flat []byte slice that is underlying the vector (since we allow Sets to be only in ascending order of positions). The difference is that Sets can occur for a position within the length of Bytes vector (so it is similar to something like vec[idx] = value for which idx < len(vec)) whereas AppendVal appends past the current length of the vector (so it is similar to vec = append(vec, value)). The current way of using Set operation seems fine to me (alternative way would be
vec.SetLength(0)
for ... {
vec.AppendVal(value)
}
if all Sets are for consequent indices).
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, yuzefovich wrote…
If we want to reuse the same underlying memory of a
Bytescolumnar vector, thenResetneeds to be called.In terms of which method,
AppendValorSet, should be used - under the hood they behave very similar in a sense that there is no data movement going on in the big flat[]byteslice that is underlying the vector (since we allowSets to be only in ascending order of positions). The difference is thatSets can occur for a position within the length ofBytesvector (so it is similar to something likevec[idx] = valuefor whichidx < len(vec)) whereasAppendValappends past the current length of the vector (so it is similar tovec = append(vec, value)). The current way of usingSetoperation seems fine to me (alternative way would bevec.SetLength(0) for ... { vec.AppendVal(value) }if all
Sets are for consequent indices).
Thanks @yuzefovich. Something I'm confused about. We're using a bufalloc.ByteAllocator for allocating the fields before calling coldata.Bytes.Set. That allocation doesn't seem to be necessary as coldata.Bytes.Set is copying the data. Is the use of bufalloc here unnecessary? Could I instead use a temp buffer?
Note that I was following the pattern in workload/bank/bank.go. I'd want to make both pieces of code look similar. It is possible other workloads are also using bufalloc.ByteAllocator unnecessarily.
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Is the use of
bufallochere unnecessary? Could I instead use a temp buffer?
Yes, I think the usage of temporary buffer is acceptable here since we do perform a full copy on Bytes.Set.
We seem to be using bufalloc in many workloads. I think the reason was that in initial implementation of coldata.Bytes we were using [][]byte under the hood, so a separate allocator was beneficial (since we might have not been performing a deep copy or maybe to improve data locality or something), but we changed the implementation to use a single "flat" []byte with a deep copy after 19.2 release, so we probably don't need to use bufalloc anymore.
Fix `fixtures import ycsb` to import correctly size values instead of empty values. Switch to using `golang.org/x/exp/rand` from `math/rand` which is faster and lower memory overhead. This follows the path blazed by other workloads such as `tpcc`. Release note: None
2aa1af2 to
69f7933
Compare
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, yuzefovich wrote…
Is the use of
bufallochere unnecessary? Could I instead use a temp buffer?Yes, I think the usage of temporary buffer is acceptable here since we do perform a full copy on
Bytes.Set.We seem to be using
bufallocin many workloads. I think the reason was that in initial implementation ofcoldata.Byteswe were using[][]byteunder the hood, so a separate allocator was beneficial (since we might have not been performing a deep copy or maybe to improve data locality or something), but we changed the implementation to use a single "flat"[]bytewith a deep copy after 19.2 release, so we probably don't need to usebufallocanymore.
Thanks @yuzefovich. I've removed the use of bufalloc.ByteAllocator. Everything still seems to work, so I'll bors when CI is green. I'm leaving fixing up of other workloads to someone else. Do you want to pick this up?
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 323 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Thanks @yuzefovich. I've removed the use of
bufalloc.ByteAllocator. Everything still seems to work, so I'll bors when CI is green. I'm leaving fixing up of other workloads to someone else. Do you want to pick this up?
Yeah, sounds good.
|
TFTR! bors r+ |
Build failed (retrying...) |
Build succeeded |
Merge changes from PR #50475 which populates the fields[0-9] during the init
--insert-hash=true creates user1, user2 for ycsb_key --zero-padding=2 creates user01, user02 for ycsb_key --time-string=true creates field[0-9] with example `2020-06-23 13:23:58.529711+00:00DwYFKxzTOKqgYwdHZNmbzdkRfINphVojmwiDbhaFsEnFHRpxSJQdSRnXOrQbjJhjYzko` instead of `rmArXhKIjbpTAPwyjDNsmkGTLXWNQHQCLAyKwWFOrlopXliOKqQANpakAxsmqfPzYSvAhgYYVQQDhUEuXCVuvsEdJrrguNSaXDbx` Closes #50545 and #50544 Release note (general change): added new options to YCSB that mirrors OpenSource YCSB options merge PR 50475 Merge changes from PR #50475 which populates the fields[0-9] during the init
Fix
fixtures import ycsbto import correctly size values instead ofempty values.
Switch to using
golang.org/x/exp/randfrommath/randwhich is fasterand lower memory overhead. This follows the path blazed by other
workloads such as
tpcc.Release note: None