Allow generating execution graphs from MPI runs #154
Conversation
Could you add a test to check that the MPI calls record what we expect them to?
force-pushed from c631709 to 2257f29
force-pushed from fb8c201 to ba74dcf
force-pushed from ba74dcf to 102b79c
```cpp
// Update the result for the master message
sch.setFunctionResult(msg);
// Wait for the scheduler to set the result on the MPI non-master messages
SLEEP_MS(500);
```
Rather than sleeping here, can we instead call getFunctionResult for each message in turn? Although some sleeps still exist in the tests, we should try to avoid them wherever possible.
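A minimal sketch of what that could look like, assuming the scheduler exposes a blocking getFunctionResult(messageId, timeoutMs) call; the helper name and the message collection here are illustrative, not the PR's actual code:

```cpp
#include <vector>

#include <faabric/proto/faabric.pb.h>
#include <faabric/scheduler/Scheduler.h>

// Block on the result of each non-master MPI message rather than sleeping
// for a fixed period.
void waitForMpiMessages(faabric::scheduler::Scheduler& sch,
                        const std::vector<faabric::Message>& nonMasterMsgs)
{
    for (const auto& msg : nonMasterMsgs) {
        // Returns once the scheduler has set this message's result
        // (or fails after the given timeout in milliseconds)
        sch.getFunctionResult(msg.id(), 1000);
    }
}
```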
tests/utils/message_utils.cpp (outdated)
```cpp
REQUIRE(msgA.mpiworldid() == msgB.mpiworldid());
REQUIRE(msgA.mpirank() == msgB.mpirank());
REQUIRE(msgA.mpiworldsize() == msgB.mpiworldsize());
}
```
I see the need for this method, but is it really just doing the same thing as checkMessageEquality, just without checking the ID? Could we instead add a bool flag to checkMessageEquality, something like checkIDs, which is true by default, then pass false when we need to do this sort of check?
Then we could change the parameter on checkExecGraphEquality to also be checkIDs with a default of true.
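For illustration, a sketch of the flag being suggested; the signature and headers are assumptions based on the helpers discussed in this thread, not the actual test utilities:

```cpp
#include <catch2/catch.hpp>

#include <faabric/proto/faabric.pb.h>

// Single comparison helper with an optional checkIDs flag, true by default
void checkMessageEquality(const faabric::Message& msgA,
                          const faabric::Message& msgB,
                          bool checkIDs = true)
{
    if (checkIDs) {
        REQUIRE(msgA.id() == msgB.id());
    }

    REQUIRE(msgA.mpiworldid() == msgB.mpiworldid());
    REQUIRE(msgA.mpirank() == msgB.mpirank());
    REQUIRE(msgA.mpiworldsize() == msgB.mpiworldsize());
    // ... all other fields checked unconditionally
}
```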
The reason we check only these fields is that they are the ones MPI changes, and thus the ones we manually set in the test.
Any other field we also wanted to check, we'd have to (afaict) set manually on each message ourselves.
But if we don't touch any of the others, is there any harm in checking them? They will all be uninitialised and therefore pass the checks, won't they? We'll just be comparing lots of blank strings/zeros etc.
I.e. can we avoid having to add a new comparison function for every permutation of edits we might make to a message?
Not really; for instance, the executedHost or finishtimestamp fields are set by the scheduler when the chained functions finish (i.e. when the world is destroyed).
Won't the executed host be the current host, so we can check that? We deal with the finish timestamp issue in another test here by just overriding it: https://github.com/faasm/faabric/blob/master/tests/test/scheduler/test_scheduler.cpp#L663
The reason I'm against having more than one message comparison function is that it's easy to forget to add fields to them; if we just have one, it's easier to keep track.
It's a sort of blacklisting/whitelisting problem: rather than saying "we want to check that this subset of fields is the same", we're saying "we only expect these fields to have changed, and all others to be the same". The latter is more strict (and more verbose), but catches problems more frequently as a result.
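A rough sketch of the overriding approach referenced above; the helper name is hypothetical, and the protobuf setters assume the field names mentioned in this thread:

```cpp
#include <faabric/proto/faabric.pb.h>

// Hypothetical helper: copy the fields the scheduler sets at finish time
// (executed host and finish timestamp) from the actual message onto the
// expected one, so the single full-equality check can still be used.
void alignSchedulerSetFields(faabric::Message& expected,
                             const faabric::Message& actual)
{
    expected.set_executedhost(actual.executedhost());
    expected.set_finishtimestamp(actual.finishtimestamp());
}
```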
force-pushed from bb95f65 to 6801be6
force-pushed from 6d87aa3 to 5246a0e
In this PR I add support for generating execution graphs from MPI runs.

Before, only chained calls would log subsequent invocations (i.e. calls to sch.callFunctions) to Redis. Logging all invocations to callFunctions would prevent needing more fixes like this one, but I am not sure whether we want to do that. Plus, we'd have to pass the current message id to the scheduler, but that's a minor issue I guess.

I also add a test checking that MPI world creation generates the expected execution graph (i.e. the root process fanning out to size - 1 processes). Given that we don't have control over the messages that MPI uses to generate that graph, we can only imitate it in the expected graph for the test. In particular, some fields such as msg.id() will differ between the actual graph and the expected one, so we only compare the fields that MpiWorld::create() sets.
size - 1processes). Given that we don't have control over the messages that MPI uses to generate that graph, we are just able to imitate it in theexpectedgraph for the test. In particular some fields likemsg.id()will differ from theactualgraph and theexpectedone. Thus we only compare the fields thatMpiWorld::create()sets.