roachtest: add native libraries to test specification and verify#86451
Conversation
renatolabs
left a comment
There was a problem hiding this comment.
Great job! 🎉
Question: should we make it easier for people to indicate the test needs libgeos? The only real value we ever pass to NativeLibs is []string{"libgeos.so", "libgeos_c.so"} right now (and this may continue to be the case for the short/medium term).
Maybe something like
// cluster.go
var LibGEOS = []string{"libgeos.so", "libgeos_c.so"}
// roachtest
NativeLibs: LibGEOS,Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)
pkg/cmd/roachtest/cluster.go line 1646 at r1 (raw file):
// at the specified location. func (c *clusterImpl) PutLibraries( ctx context.Context, libraryDir string, libraries []string,
Documentation needs to change to reflect that only the libraries in libraries will be staged.
pkg/cmd/roachtest/cluster.go line 1652 at r1 (raw file):
} libSet := map[string]bool{}
Nits:
- sets in Go are slightly more idiomatic if defined as
map[T]struct{} - maybe we don't need a set at all and linear search here is fine (adding a
containsfunction to this package). Would probably increase readability too:if !contains(libraries, filename) { continue }
pkg/cmd/roachtest/cluster.go line 1681 at r1 (raw file):
} func (c *clusterImpl) VerifyLibraries(requiredLibs []string) error {
This function would benefit from comments, especially if the logic involving local and hasRemoteSuffix is indeed necessary.
pkg/cmd/roachtest/cluster.go line 1681 at r1 (raw file):
} func (c *clusterImpl) VerifyLibraries(requiredLibs []string) error {
Also, could we make this a package-level function? It doesn't not use any Cluster-related state and only uses package-level state. Added benefit of not having to change the Cluster interface.
pkg/cmd/roachtest/cluster.go line 1682 at r1 (raw file):
func (c *clusterImpl) VerifyLibraries(requiredLibs []string) error { if len(requiredLibs) == 0 {
Nit: since libraryFilePaths will be fairly small for the foreseeable future, I'd just remove the short-circuit logic and let the return nil emerge naturally as a consequence of the empty requiredLibs.
pkg/cmd/roachtest/cluster.go line 1693 at r1 (raw file):
} else if local && hasRemoteSuffix { continue }
I'm confused by the logic here. Why do we ignore libraries in lib if this is a remote run and vice-versa? Is it necessary? It also concerns me that this function is coupled to the directory order defined in findBinaryOrLibrary, so a change there could lead to this function behaving differently which is unexpected.
I also noticed that this verification will fail on my gceworker (which has SOs on both the lib and lib.docker_amd64 directories) if I run the test locally.
pkg/cmd/roachtest/cluster_test.go line 243 at r1 (raw file):
err := c.VerifyLibraries(tc.verifyLibs) if tc.expectedError != (err != nil) { t.Fatalf("expected %t, but found %t", tc.expectedError, err != nil)
Some suggestions here:
- make
expectedErrora string instead of a boolean, with the actual error message you are expecting. Makes the test stronger, and the list of test cases easier to understand. - print the error message you got in
t.Fatalf(). This makes the test output a lot more actionable. Right now we just inform that two booleans didn't match, but I'd have no idea what error the function returned without running it again (in this case the function is deterministic -- imagine if this was a complicated non-deterministic test!)
srosenberg
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @smg260)
pkg/cmd/roachtest/cluster.go line 100 at r1 (raw file):
const ( defaultEncryptionProbability = 1 remoteLibrarySuffix = ".docker_amd64"
Methinks this is an artifact of the make build [1] whereas the cross-compiler puts them under BAZEL_BIN, copies them to lib.docker_amd64 (presumably for compatibility with make?) and symlinks lib[2]. Since roachtest is already on bazel, we can probably drop make compatibility, i.e., the libs should be under /lib. @rickystewart Does that sound right?
PS: note, we could also have arm64 (when cross-building) although that's in scope for a follow-up PR.
[1] https://github.com/cockroachdb/cockroach/blob/master/build/builder.sh#L147
[2] https://github.com/cockroachdb/cockroach/blob/master/build/teamcity/cockroach/nightlies/roachtest_compile_bits.sh#L22
srosenberg
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @smg260)
pkg/cmd/roachtest/tests/activerecord.go line 257 at r1 (raw file):
Owner: registry.OwnerSQLExperience, Cluster: r.MakeClusterSpec(1), NativeLibs: []string{"libgeos.so", "libgeos_c.so"},
findLibrary takes care of the suffix conditional on runtime.GOOS; why do we need it here?
herkolategan
left a comment
There was a problem hiding this comment.
Thanks for the thorough review @renatolabs, I appreciate it. This will definitely help me get up to speed quicker.
Suggestion on this comment makes sense to me, the thought did cross my mind as well.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @smg260)
pkg/cmd/roachtest/cluster.go line 1646 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Documentation needs to change to reflect that only the libraries in
librarieswill be staged.
Good catch. I might revert this method based on previous comments, but if I keep it I'll be sure to update the documentation.
pkg/cmd/roachtest/cluster.go line 1652 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Nits:
- sets in Go are slightly more idiomatic if defined as
map[T]struct{}- maybe we don't need a set at all and linear search here is fine (adding a
containsfunction to this package). Would probably increase readability too:if !contains(libraries, filename) { continue }
Good idea, no need for cumbersome code. And thanks for the hint on how to define sets
pkg/cmd/roachtest/cluster.go line 1681 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
This function would benefit from comments, especially if the logic involving
localandhasRemoteSuffixis indeed necessary.
Yes I agree, since the reasoning behind hasRemoteSuffix has a troublesome history. I'm most likely going to remove this logic, but I'll add commentary.
pkg/cmd/roachtest/cluster.go line 1681 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Also, could we make this a package-level function? It doesn't not use any
Cluster-related state and only uses package-level state. Added benefit of not having to change theClusterinterface.
I was contemplating this as well! And then completely forgot about it. I'll amend.
pkg/cmd/roachtest/cluster.go line 1682 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Nit: since
libraryFilePathswill be fairly small for the foreseeable future, I'd just remove the short-circuit logic and let thereturn nilemerge naturally as a consequence of the emptyrequiredLibs.
Fair enough
pkg/cmd/roachtest/cluster.go line 1693 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
I'm confused by the logic here. Why do we ignore libraries in
libif this is a remote run and vice-versa? Is it necessary? It also concerns me that this function is coupled to the directory order defined infindBinaryOrLibrary, so a change there could lead to this function behaving differently which is unexpected.I also noticed that this verification will fail on my gceworker (which has SOs on both the
libandlib.docker_amd64directories) if I run the test locally.
The libraries in lib and lib.docker_amd64 should be the same, but for different architectures, hence you only need to copy the correct set, depending on where the test will run.
I tested this on Linux which produces ".so" libs in /lib & /lib.docker_amd64 and it passed, similar to a test like this, which should actually be part of the test suite to confirm this scenario:
{
name: "conflict both SO",
local: true,
verifyLibs: []string{"geos.so"},
libraryFilePaths: []string{"/lib/geos.so", fmt.Sprintf("/lib.%s/geos.so", remoteLibrarySuffix)},
expectedError: false,
},
This will pass for local: false as well.
Having said that, this logic is hard to read and make sense of. I gave a long-winded response on a comment @srosenberg left to explain the scenario.
You are absolutely right about relying on the output of findBinaryOrLibrary which I believe could cause issues, especially when more than one Linux distribution is involved between local & remote.
I believe thefindBinaryOrLibrary should be updated to only add good candidates to the list that is compatible with the target execution cluster, and would appreciate @srosenberg and your opinion on that.
The resolution here could also be to simplify this logic and to address the possible problems with findBinaryOrLibrary in a different PR / issue.
pkg/cmd/roachtest/cluster_test.go line 243 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Some suggestions here:
- make
expectedErrora string instead of a boolean, with the actual error message you are expecting. Makes the test stronger, and the list of test cases easier to understand.- print the error message you got in
t.Fatalf(). This makes the test output a lot more actionable. Right now we just inform that two booleans didn't match, but I'd have no idea what error the function returned without running it again (in this case the function is deterministic -- imagine if this was a complicated non-deterministic test!)
Makes perfect sense, thanks for the detailed explanation.
pkg/cmd/roachtest/tests/activerecord.go line 257 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
findLibrarytakes care of the suffix conditional onruntime.GOOS; why do we need it here?
Good catch, and no good reason; this is an oversight which I shall fix.
herkolategan
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, @rickystewart, @smg260, and @srosenberg)
pkg/cmd/roachtest/cluster.go line 100 at r1 (raw file):
Excuse the long-winded response, but there are some subtle intricacies that come into play here that I want to point out:
Having said that, I think you're right and there are some things here that need revision, but it might not be in the scope of this PR as it solves a niche problem that I'll try to articulate:
I tried to solve a problem here which happens when your local system is a different flavour of Linux than the target cluster for roachtest. In this scenario it's possible that the test will execute and use locally built incompatible libraries against the target since the libs in "/lib" are also ".so". For remote clusters the existing code assumes as long as the suffix is ".so" it can be added to the found libraries, but in essence a cross compilation should have been done using docker. Judging by this code snippet [1] the assumption is also that the remote target cluster is always Linux based.
The question here is if we assume that remote clusters are always Linux based, can we assume that we should be checking only for binaries / libraries in the Linux target docker output (lib.docker_amd64), and ignore the local "/lib" folder since it could be incompatible with the remote target even if it contains ".so" libs. The FindBinaryOrLibrary implementation already has more specific logic to look explicitly for the cross compiled version [2], and maybe this should be applicable to the libraries as well.
But I think this should be resolved in a different PR, and I should rely on the output provided by FindLibrary for this PR. If the FindBinaryOrLibrary code is incorrect in its assumption that any ".so" will suffice for a remote cluster we can create a different issue for that.
[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/cluster.go#L131
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/cluster.go#L201
PS:
copies them to lib.docker_amd64 (presumably for compatibility with make?) and symlinks lib[2].
I think this might only be for teamcity. On a local build the "/lib" folder is not sym linked to "/lib.docker_amd64", and my assumption is that "/lib" is meant for local architecture, and "/lib.docker_amd64" is meant for remote architecture.
I agree further improvements on the library resolution/lookup should be addressed in a different PR and this one should focus on removing
I wonder if this was the intention. I would expect the directory order in |
8523e60 to
5a6262e
Compare
herkolategan
left a comment
There was a problem hiding this comment.
CI Failing due to #86838
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @rickystewart, @smg260, and @srosenberg)
Your understanding is correct. I would like to see the whole |
5a6262e to
6fbef4b
Compare
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @rickystewart, @smg260, and @srosenberg)
a discussion (no related file):
@srosenberg @renatolabs ready for another pass at your leisure.
renatolabs
left a comment
There was a problem hiding this comment.
This looks much simpler now!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @rickystewart, @smg260, and @srosenberg)
pkg/cmd/roachtest/test_runner.go line 621 at r3 (raw file):
t.spec.Owner = oldOwner } else { c.setTest(t)
Nit: do we need this for PutLibraries? We could keep this call where it used to be, right after the "running test" status is posted.
pkg/cmd/roachtest/test_runner.go line 622 at r3 (raw file):
} else { c.setTest(t) c.status("copying libraries")
Nit: we don't need this, as PutLibraries itself will print a similar message. Also, it could be confusing in the case when there are no libraries to be copied.
pkg/cmd/roachtest/test_runner.go line 624 at r3 (raw file):
c.status("copying libraries") if len(t.spec.NativeLibs) != 0 { if err = c.PutLibraries(ctx, "./lib", t.spec.NativeLibs); err != nil {
Nit: PutLibraries could handle an empty NativeLibs slice; otherwise, callers will only have to check before calling. I know this is the only calling site right now, so this is more of a general suggestion/API comment.
renatolabs
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @rickystewart, @smg260, and @srosenberg)
pkg/cmd/roachtest/cluster_test.go line 184 at r3 (raw file):
func TestVerifyLibraries(t *testing.T) { testCases := []struct {
Forgot to mention: while not a big deal in this case specifically, it's good to keep the package's state untouched after running this test (at the end of it, the libraryFilePaths package variable will be set to a test value).
We can add the following at the top to that end:
originalLibraryPaths = libraryFilePaths
defer func() { libraryFilePaths = originalLibraryPaths }()This commit adds native libraries to the test specifications. Each native library that is required should be specified by name. The spec is verified in the test runner and uses the existing logic that populates libraryFilePaths to confirm that libraries are present. Note was taken on this change that findLibrary might need to be revised at a later stage to ensure it only provides libraries compatible with the target cluster. Verification happens before the cluster is provisioned. The test will fail quickly if libraries are not present to support the planned execution. Resolves: cockroachdb#81081 Release justification: test-only change. Release note: None.
6fbef4b to
7f491f5
Compare
This commit adds native libraries to the test specifications. Each native library that is required should be specified by name. The spec is verified in the test runner and uses the existing logic that populates libraryFilePaths to confirm that libraries are present. Note was taken on this change that findLibrary might need to be revised at a later stage to ensure it only provides libraries compatible with the target cluster. Verification happens before the cluster is provisioned. The test will fail quickly if libraries are not present to support the planned execution. Resolves: cockroachdb#81081 Release justification: test-only change. Release note: None.
7f491f5 to
b46bde6
Compare
|
bors r=renatolabs |
|
Build succeeded: |
|
blathers backport 22.2 |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 81a81fb to blathers/backport-release-22.2-86451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
This commit adds native libraries to the test specifications.
Each native library that is required should be specified by
filename.
The spec is verified in the test runner and also takes into account
whether the test is running locally or remotely. For local tests
the lib directory is verified, and for remote tests the docker lib
(amd64) directory is verified.
Verification happens before the cluster is provisioned. The test
will fail quickly if libraries are not present to support the
planned execution.
Resolves: #81081
Release justification: test-only change.
Release note: None.