The default of tests.jvms should not be capped #35259
The default of tests.jvms should not be capped #35259alpar-t merged 6 commits intoelastic:masterfrom
Conversation
Lift the upper limit of the default. We heuristically divide by two because it was found that increasing past the number of physical cores does not yield benefits.
|
Pinging @elastic/es-core-infra |
jasontedor
left a comment
There was a problem hiding this comment.
I left a comment to trigger some discussion.
| project.tasks.withType(RandomizedTestingTask) { | ||
| jvm "${project.runtimeJavaHome}/bin/java" | ||
| parallelism System.getProperty('tests.jvms', 'auto') | ||
| parallelism System.getProperty('tests.jvms', String.valueOf(Runtime.getRuntime().availableProcessors() / 2)) |
There was a problem hiding this comment.
I have some concerns with this. Not all machines have hyper-threading, and not all machines have hyper-threading enabled. Some folks are choosing to disable hyper-threading because of security concerns. Intel is even releasing high-end processors that do not have hyper-threading (e.g., the i7-9700K, a workstation-class processor). On some machines, this will be too conservative.
There was a problem hiding this comment.
ES itself uses Runtime.getRuntime().availableProcessors() as the default for the number of processors. Is there any reason we shouldn't match it? Maybe because that'd be a bit aggressive for tests. Some of them are multi-threaded and they all benefit from multi-threaded GC.
Maybe we make a hasHyperthreading variable? It'd be OS dependant, but easy enough to check on linux and mac. I dunno about windows.
There was a problem hiding this comment.
Dropping my 2 cents in, I feel like we should go with Runtime.getRuntime().availableProcessors(). I think the optimal value varies based on hardware, but I found on an i7-6850k (6 core, 12 threads) that -Dtests.jvms=12 was faster than -Dtests.jvms=6; I'll caveat saying that the last time I tested this was 6 months ago and I didn't test values in between so there could be an optimal value between number of physical cores and number of threads that a processor can handle. Given my feeling that this is going to be hardware dependent, I think we could take the simplest approach first and if needed consider adding detection for hyper threading later on?
There was a problem hiding this comment.
CI has 8 cores 16 threads right now, and running with 8 is faster than with 16 for sure.
It might not be related to hardware alone, but also to the number of the tests, as if there are few tests per worker the cost of starting up those workers is not worth it.
One can easily add systemProp.tests.jvms=6 to ~/.gradle/gradle.properties to customize this for their own hardware if the default is not appropriate.
@jasontedor I'm not entirely convinced it will be too conservative. On the processor you mentioned it will make no difference as it has 8 cores, and we'll continue using 4 on that processor, so there will be no change, as we currently cap at 4.
We will use fewer test workers on CPUs with fewer than 8 cores that have no or disabled hyper-threading.
I don't have a strong preference on the value we set this to. We could set it to the number of processors, but would prefer not to do hyper-threading detection ( or anything OS specific for that matter ) until we are absolutely sure we need it.
Maybe adding some documentation in TESTING will be enough.
There was a problem hiding this comment.
@jaymode In my experience, going above the number of physical cores has led to worse performance. I've seen this on an i7-5930K and an i7-6950X. 🤷♀️
There was a problem hiding this comment.
@atorok By conservative, I mean that it sets the default value at one that is lower than what I think the optimal value is.
There was a problem hiding this comment.
@jasontedor do you think that we should implement hyper-threading detection now ? I was trying to make the point that while the configuration will not be optimal for users with hyper threading disabled or absent, it will still be the same or close as we have Today as we cap to 4 threads right now. If you agree that this is the case we could start with this change to get some benefits in CI right away and then look at what else we can do to have a more generally useful default if this is common enough that we feel having systemProp.tests.jvms documented will not be sufficient.
|
@jasontedor @nik9000 @rjernst ready for review with hyper threading detection on Linux |
| static void applyCommonTestConfig(Project project) { | ||
| String defaultParallel = 'auto' | ||
| if (project.file("/proc/cpuinfo").exists()) { | ||
| Map<String, Integer> socketToCore = [:] |
There was a problem hiding this comment.
Maybe add a comment that we expect this branch for all linux distros. I'm not sure if they all have proper /proc filesystems but I think this is fairly reasonable. At worst they get auto instead.
| project.tasks.withType(RandomizedTestingTask) { | ||
| jvm "${project.runtimeJavaHome}/bin/java" | ||
| parallelism System.getProperty('tests.jvms', 'auto') | ||
| parallelism System.getProperty('tests.jvms', defaultParallel) |
TESTING.asciidoc
Outdated
| By default the tests run on up to 4 JVMs based on the number of cores. If you | ||
| want to explicitly specify the number of JVMs you can do so on the command | ||
| By default the tests run on multiple processes using half the available processors | ||
| as reported by the JVM. |
There was a problem hiding this comment.
Could you update this to mention physical cores?
| if (name == "cpu cores") { | ||
| assert currentID.isEmpty() == false | ||
| socketToCore[currentID] = Integer.valueOf(value) | ||
| } |
There was a problem hiding this comment.
Clear currentID here too I think.
| } | ||
| } | ||
| }) | ||
| defaultParallel = socketToCore.values().sum().toString(); |
There was a problem hiding this comment.
This gets the number of logical cores, I think. I thought we wanted the number of physical cores here. So socketToCore.size(). I think.
There was a problem hiding this comment.
Socket is for when you might have multiple CPU socket, as in multiple CPU chips, thus the sum.
|
Thanks @nik9000 . Ready for another round. |
|
@elasticmachine test thsi please |
|
@elasticmachine test this please |
nik9000
left a comment
There was a problem hiding this comment.
LGTM. I'll get the macOS one once you merge this.
Default to the number of physical cores across all sockets instead.
In elastic#35259 we switched the default number of VMs to fork for unit tests to the number of physical CPU cores. But because we could only get an accurate count on machines with a normal `/proc` filesystem, macOS machine did not pick up the new default. Given that macOS is a huge portion of developer machines, we'd like to get the right default there. This does that. It also moves the default-finding process from happening once per testing task to happening once at startup. This seems like a good choice in general, but a very good choice for macOS because we have to run a command to list the count.
In #35259 we switched the default number of VMs to fork for unit tests to the number of physical CPU cores. But because we could only get an accurate count on machines with a normal `/proc` filesystem, macOS machine did not pick up the new default. Given that macOS is a huge portion of developer machines, we'd like to get the right default there. This does that. It also moves the default-finding process from happening once per testing task to happening once at startup. This seems like a good choice in general, but a very good choice for macOS because we have to run a command to list the count.
In #35259 we switched the default number of VMs to fork for unit tests to the number of physical CPU cores. But because we could only get an accurate count on machines with a normal `/proc` filesystem, macOS machine did not pick up the new default. Given that macOS is a huge portion of developer machines, we'd like to get the right default there. This does that. It also moves the default-finding process from happening once per testing task to happening once at startup. This seems like a good choice in general, but a very good choice for macOS because we have to run a command to list the count.
Lift the upper limit of the default.
We heuristically divide by two because it was found that increasing past
the number of physical cores does not yield benefits.