Skip to content

Prevent Task Hosts from communicating with other nodes Fixes #5423#5439

Merged
Forgind merged 2 commits intodotnet:masterfrom
Forgind:fix-semicolon-test
Jun 23, 2020
Merged

Prevent Task Hosts from communicating with other nodes Fixes #5423#5439
Forgind merged 2 commits intodotnet:masterfrom
Forgind:fix-semicolon-test

Conversation

@Forgind
Copy link
Copy Markdown
Contributor

@Forgind Forgind commented Jun 17, 2020

TaskHost bit was sometimes being overwritten by a FileVersionHash with a 1 in its eighth bit. This prevents that from being a problem by switching to xor instead of or.

Fixes #5423.

Copy link
Copy Markdown
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out if this needs to go into 16.7p3 QB mode at the last minute.

If the problem was the FileVersionHash's eighth bit, can you double check the p3 version's hash to make sure it doesn't have that?

Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Comment on lines 337 to 338
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this comment to match current reality, please?

Comment thread src/Build.UnitTests/BackEnd/NodeEndpointInProc_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/NodeEndpointInProc_Tests.cs Outdated
Comment thread src/Build.UnitTests/BackEnd/NodeEndpointInProc_Tests.cs Outdated
@Forgind Forgind force-pushed the fix-semicolon-test branch from 683b24c to e5f43ca Compare June 17, 2020 21:38
@Forgind Forgind changed the title Fix semicolon test Fixes #5423 Prevent Task Hosts from communicating with other nodes Fixes #5423 Jun 17, 2020
@Forgind Forgind force-pushed the fix-semicolon-test branch from e5f43ca to 67921e6 Compare June 17, 2020 22:51
Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that the casts to long in the original code were important. The result of shifting an int left by N bits is still an int and all but the lowest 5 bits of N are discarded.

E.g.

(nodeHandshakeSalt << 44)

is the same as

(nodeHandshakeSalt << 12)

here. The fact that it's being assigned to a long doesn't make a difference.

Copy link
Copy Markdown
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from triage:

  1. Looks like the 16.7p3 build did not happen to have a 1 in the bit location that would expose the bug, so this is not a critical fix for that preview release.
  2. We thought about replacing the shifting and combining with a struct with all relevant fields + a generated hash-combination. But then there could be a collision between (vN + taskhost) and (vM + not-taskhost).
  3. Switch to unsigned long to avoid extending the sign bit to create a 1111 prefix.

@Forgind Forgind force-pushed the fix-semicolon-test branch from 67921e6 to 3516d9d Compare June 18, 2020 17:30
Copy link
Copy Markdown
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks like what we discussed in triage today.

Comment thread src/Build.UnitTests/Utilities_Tests.cs Outdated
Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ has higher precedence than << so this is not doing the right thing.

Also:

The reason you're xor'ing the result with the salt is that you can use the full 32 bits of the salt now, is that correct? It introduces potential collisions between the salt the other fields, however. For example:

nodeType = NodeType.None, FileVersionHash = 0, nodeHandshakeSalt = 256

and

nodeType = NodeType.TaskHost, FileVersionHash = 1, nodeHandshakeSalt = 0

both yield the same baseHandshake of 1099511628032 so this has the same problem as just encapsulating all these integers in a struct and computing its hash. I'd expect this to be undesired and would rather be open to more collisions within the handshake salt but keeping the guarantee that a task host will never be mistaken for a non-task host. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have checked that, but I'd thought all bitwise operators were higher precedence than arithmetic ones. Thanks!

I may have misunderstood what we wanted to do, but I thought that was actually what we were trying for. The salt can still be used to prevent nodes from communicating as long as you don't choose one of the few salts that would cause a collision, and this means that if we have a mistake in the hash computation, we can ask users to set the salt to a specific value, and their problems may magically disappear pending a real fix.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this with Monday Brain, I think I'm back on "salt only the non-options bits".

I was thinking that it'd be nice to be able to drop particular bits out of the hash computation. But that doesn't make sense in practice, because the environment variable will be inherited to child processes. So suppose we decided to mask TaskHost--we'd need to have one salt to set it up right in the scheduler process, and another to set it up in the TaskHost workers, or we'd get the same kind of cross-node-type communication we're getting with the current buggy code.

Sorry for the waffling here @Forgind.

@Forgind Forgind force-pushed the fix-semicolon-test branch from ab525db to 1da1216 Compare June 22, 2020 15:30
Comment thread src/Build.UnitTests/Utilities_Tests.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this at compile time instead

Suggested change
int numHandshakeOptions = (int)Math.Pow(2, Enum.GetNames(HandshakeOptions.None.GetType()).Length - 1);
int numHandshakeOptions = (int)Math.Pow(2, Enum.GetNames(typeof(HandshakeOptions)).Length - 1);

Comment thread src/Build.UnitTests/Utilities_Tests.cs Outdated
Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this with Monday Brain, I think I'm back on "salt only the non-options bits".

I was thinking that it'd be nice to be able to drop particular bits out of the hash computation. But that doesn't make sense in practice, because the environment variable will be inherited to child processes. So suppose we decided to mask TaskHost--we'd need to have one salt to set it up right in the scheduler process, and another to set it up in the TaskHost workers, or we'd get the same kind of cross-node-type communication we're getting with the current buggy code.

Sorry for the waffling here @Forgind.

Comment thread src/Build.UnitTests/Utilities_Tests.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this CommunicationsUtilities.Trace instead of an xunit logging trace? Or for that matter just a failed assertion?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can switch to _output.WriteLine. I don't want a failed assertion because if there are multiple collisions, I want to see them all.

Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Comment thread src/Shared/CommunicationsUtilities.cs Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's effectively truncated to 11 bits because the most significant byte is masked out in GenerateHostHandshakeFromBase(). And I don't think the session ID is necessarily an 8-bit value either.

Considering the above and taking a step back, I'm wondering why we insist on encoding everything in a single long instead of defining a proper back- and forward-compatible structure.

struct Handshake
{
  byte zeroForBackCompat = 0;
  byte lengthForForwardCompat = sizeof(Handshake);
  NodeType nodeType;
  int handshakeSalt;
  int versionHash;
  int sessionId;
  // ...
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does make logging on the rejecting side easier too. Should we push this to unblock VS insertions and other PRs and then switch to that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to merge this PR now and track making further improvements as a lower-pri work item.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd even think about dropping the version hash and just sending the 4 parts of AssemblyFileVersion?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rainersigwald, @ladipro

I put my first draft here.

There are two parts that I think still need fixing:

  1. fileVersion is currently a string because the four parts were returned together. @rainersigwald, is there a way to get the parts individually and turn them into ints if necessary, is there a premade way to do that, or do I have to roll my own (probably regex) parser?
  2. It's claiming I need to import ValueTuple before I can have internal IEnumerable<int part, int index)> RetrieveHandshakeComponents(). Then it's making me say handshake.RetrieveHandshakeComponents().First().Item1 instead of handshake.RetrieveHandshakeComponents().First().part Remove usages of Tuples #5270 made it look like I could just have the valueTuples, and it would work. What do I need to do differently?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fileVersion is currently a string because the four parts were returned together

You should be able to call Assembly.GetName().Version to get to the four version parts but it looks like in the current code we use AssemblyInformationalVersion which could be an arbitrary string and not necessarily just integers e.g. "16.7.0-dev-20324-01+ee1c9fd0c7e0e43174785a4e2ed177f14f5856a0". Is it maybe too restrictive to have the handshake consist only of integers?

For the new handshake code please consider all these scenarios, both from the client and host perspective:

  • The code you are writing trying to connect to a very old MSBuild with an 8-bit handshake.
  • The code you are writing trying to connect to a not-so-old MSBuild with a 64-bit handshake.
  • The code you are writing trying to connect to a newer MSBuild with an updated Handshake structure.

and make sure that the handshake will reliably fail without hanging.

Forgind added 2 commits June 22, 2020 14:17
Convert to unsigned longs after converting to unsigned ints to prevent
overwriting higher order bits after conversion from negative numbers.
@Forgind Forgind force-pushed the fix-semicolon-test branch from f855e19 to 49bf37a Compare June 22, 2020 21:18
@Forgind
Copy link
Copy Markdown
Contributor Author

Forgind commented Jun 23, 2020

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ItemTransformContainingSemicolon_InTaskHost test failure

3 participants