Prevent Task Hosts from communicating with other nodes Fixes #5423#5439
Prevent Task Hosts from communicating with other nodes Fixes #5423#5439Forgind merged 2 commits intodotnet:masterfrom
Conversation
rainersigwald
left a comment
There was a problem hiding this comment.
We need to figure out if this needs to go into 16.7p3 QB mode at the last minute.
If the problem was the FileVersionHash's eighth bit, can you double check the p3 version's hash to make sure it doesn't have that?
There was a problem hiding this comment.
Can you update this comment to match current reality, please?
683b24c to
e5f43ca
Compare
e5f43ca to
67921e6
Compare
There was a problem hiding this comment.
I'm afraid that the casts to long in the original code were important. The result of shifting an int left by N bits is still an int and all but the lowest 5 bits of N are discarded.
E.g.
(nodeHandshakeSalt << 44)
is the same as
(nodeHandshakeSalt << 12)
here. The fact that it's being assigned to a long doesn't make a difference.
rainersigwald
left a comment
There was a problem hiding this comment.
Notes from triage:
- Looks like the 16.7p3 build did not happen to have a 1 in the bit location that would expose the bug, so this is not a critical fix for that preview release.
- We thought about replacing the shifting and combining with a struct with all relevant fields + a generated hash-combination. But then there could be a collision between (vN + taskhost) and (vM + not-taskhost).
- Switch to
unsigned longto avoid extending the sign bit to create a1111prefix.
67921e6 to
3516d9d
Compare
rainersigwald
left a comment
There was a problem hiding this comment.
I think this looks like what we discussed in triage today.
There was a problem hiding this comment.
+ has higher precedence than << so this is not doing the right thing.
Also:
The reason you're xor'ing the result with the salt is that you can use the full 32 bits of the salt now, is that correct? It introduces potential collisions between the salt the other fields, however. For example:
nodeType = NodeType.None, FileVersionHash = 0, nodeHandshakeSalt = 256
and
nodeType = NodeType.TaskHost, FileVersionHash = 1, nodeHandshakeSalt = 0
both yield the same baseHandshake of 1099511628032 so this has the same problem as just encapsulating all these integers in a struct and computing its hash. I'd expect this to be undesired and would rather be open to more collisions within the handshake salt but keeping the guarantee that a task host will never be mistaken for a non-task host. What do you think?
There was a problem hiding this comment.
I should have checked that, but I'd thought all bitwise operators were higher precedence than arithmetic ones. Thanks!
I may have misunderstood what we wanted to do, but I thought that was actually what we were trying for. The salt can still be used to prevent nodes from communicating as long as you don't choose one of the few salts that would cause a collision, and this means that if we have a mistake in the hash computation, we can ask users to set the salt to a specific value, and their problems may magically disappear pending a real fix.
There was a problem hiding this comment.
Thinking about this with Monday Brain, I think I'm back on "salt only the non-options bits".
I was thinking that it'd be nice to be able to drop particular bits out of the hash computation. But that doesn't make sense in practice, because the environment variable will be inherited to child processes. So suppose we decided to mask TaskHost--we'd need to have one salt to set it up right in the scheduler process, and another to set it up in the TaskHost workers, or we'd get the same kind of cross-node-type communication we're getting with the current buggy code.
Sorry for the waffling here @Forgind.
ab525db to
1da1216
Compare
There was a problem hiding this comment.
You can do this at compile time instead
| int numHandshakeOptions = (int)Math.Pow(2, Enum.GetNames(HandshakeOptions.None.GetType()).Length - 1); | |
| int numHandshakeOptions = (int)Math.Pow(2, Enum.GetNames(typeof(HandshakeOptions)).Length - 1); |
There was a problem hiding this comment.
Thinking about this with Monday Brain, I think I'm back on "salt only the non-options bits".
I was thinking that it'd be nice to be able to drop particular bits out of the hash computation. But that doesn't make sense in practice, because the environment variable will be inherited to child processes. So suppose we decided to mask TaskHost--we'd need to have one salt to set it up right in the scheduler process, and another to set it up in the TaskHost workers, or we'd get the same kind of cross-node-type communication we're getting with the current buggy code.
Sorry for the waffling here @Forgind.
There was a problem hiding this comment.
Why is this CommunicationsUtilities.Trace instead of an xunit logging trace? Or for that matter just a failed assertion?
There was a problem hiding this comment.
I can switch to _output.WriteLine. I don't want a failed assertion because if there are multiple collisions, I want to see them all.
There was a problem hiding this comment.
nit: It's effectively truncated to 11 bits because the most significant byte is masked out in GenerateHostHandshakeFromBase(). And I don't think the session ID is necessarily an 8-bit value either.
Considering the above and taking a step back, I'm wondering why we insist on encoding everything in a single long instead of defining a proper back- and forward-compatible structure.
struct Handshake
{
byte zeroForBackCompat = 0;
byte lengthForForwardCompat = sizeof(Handshake);
NodeType nodeType;
int handshakeSalt;
int versionHash;
int sessionId;
// ...
}There was a problem hiding this comment.
That does make logging on the rejecting side easier too. Should we push this to unblock VS insertions and other PRs and then switch to that?
There was a problem hiding this comment.
I think it would make sense to merge this PR now and track making further improvements as a lower-pri work item.
There was a problem hiding this comment.
I'd even think about dropping the version hash and just sending the 4 parts of AssemblyFileVersion?
There was a problem hiding this comment.
I put my first draft here.
There are two parts that I think still need fixing:
- fileVersion is currently a string because the four parts were returned together. @rainersigwald, is there a way to get the parts individually and turn them into ints if necessary, is there a premade way to do that, or do I have to roll my own (probably regex) parser?
- It's claiming I need to import ValueTuple before I can have
internal IEnumerable<int part, int index)> RetrieveHandshakeComponents(). Then it's making me sayhandshake.RetrieveHandshakeComponents().First().Item1instead ofhandshake.RetrieveHandshakeComponents().First().partRemove usages of Tuples #5270 made it look like I could just have the valueTuples, and it would work. What do I need to do differently?
There was a problem hiding this comment.
fileVersion is currently a string because the four parts were returned together
You should be able to call Assembly.GetName().Version to get to the four version parts but it looks like in the current code we use AssemblyInformationalVersion which could be an arbitrary string and not necessarily just integers e.g. "16.7.0-dev-20324-01+ee1c9fd0c7e0e43174785a4e2ed177f14f5856a0". Is it maybe too restrictive to have the handshake consist only of integers?
For the new handshake code please consider all these scenarios, both from the client and host perspective:
- The code you are writing trying to connect to a very old MSBuild with an 8-bit handshake.
- The code you are writing trying to connect to a not-so-old MSBuild with a 64-bit handshake.
- The code you are writing trying to connect to a newer MSBuild with an updated
Handshakestructure.
and make sure that the handshake will reliably fail without hanging.
Convert to unsigned longs after converting to unsigned ints to prevent overwriting higher order bits after conversion from negative numbers.
f855e19 to
49bf37a
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
TaskHost bit was sometimes being overwritten by a FileVersionHash with a 1 in its eighth bit. This prevents that from being a problem by switching to xor instead of or.
Fixes #5423.