Allow generating execution graphs from MPI runs #154
Conversation
Could you add a test to check that the MPI calls record what we expect them to?
force-pushed from c631709 to 2257f29
force-pushed from fb8c201 to ba74dcf
force-pushed from ba74dcf to 102b79c
```cpp
// Update the result for the master message
sch.setFunctionResult(msg);
// Wait for the scheduler to set the result on the MPI non-master messages
SLEEP_MS(500);
```
Rather than sleeping here, can we instead call getFunctionResult for each message in turn? Although some sleeps still exist in the tests, we should try to avoid them wherever possible.
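A minimal sketch of what that could look like, assuming the scheduler exposes a blocking getFunctionResult(messageId, timeoutMs) call; the helper name and the message collection here are illustrative, not the PR's actual code:

```cpp
#include <vector>

#include <faabric/proto/faabric.pb.h>
#include <faabric/scheduler/Scheduler.h>

// Block on the result of each non-master MPI message rather than sleeping
// for a fixed period.
void waitForMpiMessages(faabric::scheduler::Scheduler& sch,
                        const std::vector<faabric::Message>& nonMasterMsgs)
{
    for (const auto& msg : nonMasterMsgs) {
        // Returns once the scheduler has set this message's result
        // (or fails after the given timeout in milliseconds)
        sch.getFunctionResult(msg.id(), 1000);
    }
}
```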
tests/utils/message_utils.cpp (outdated)
```cpp
REQUIRE(msgA.mpiworldid() == msgB.mpiworldid());
REQUIRE(msgA.mpirank() == msgB.mpirank());
REQUIRE(msgA.mpiworldsize() == msgB.mpiworldsize());
}
```
I see the need for this method, but is it really just doing the same thing as checkMessageEquality, just without checking the ID? Could we instead add a bool flag to checkMessageEquality, something like checkIDs, which is true by default, then pass false when we need to do this sort of check?
Then we could change the parameter on checkExecGraphEquality to also be checkIDs with a default of true.
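For illustration, a sketch of the flag being suggested; the signature and headers are assumptions based on the helpers discussed in this thread, not the actual test utilities:

```cpp
#include <catch2/catch.hpp>

#include <faabric/proto/faabric.pb.h>

// Single comparison helper with an optional checkIDs flag, true by default
void checkMessageEquality(const faabric::Message& msgA,
                          const faabric::Message& msgB,
                          bool checkIDs = true)
{
    if (checkIDs) {
        REQUIRE(msgA.id() == msgB.id());
    }

    REQUIRE(msgA.mpiworldid() == msgB.mpiworldid());
    REQUIRE(msgA.mpirank() == msgB.mpirank());
    REQUIRE(msgA.mpiworldsize() == msgB.mpiworldsize());
    // ... all other fields checked unconditionally
}
```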
The reason we check only these fields is that they are the ones MPI changes, and thus the ones we manually set in the test.
Any other field we also wanted to check, we'd have to (afaict) set manually on each message ourselves.
But if we don't touch any of the others, is there any harm in checking them? They will all be uninitialised and therefore pass the checks, won't they? We'll just be comparing lots of blank strings/zeros etc.
I.e. can we avoid having to add a new comparison function for every permutation of edits we might make to a message?
Not really; for instance, the executedHost or finishtimestamp fields are set by the scheduler when the chained functions finish (i.e. when the world is destroyed).
Won't the executed host be the current host, so we can check that? We deal with the finish timestamp issue in another test here by just overriding it: https://github.com/faasm/faabric/blob/master/tests/test/scheduler/test_scheduler.cpp#L663
The reason I'm against having more than one message comparison function is that it's easy to forget to add fields to them; if we just have one, it's easier to keep track.
It's a sort of blacklisting/whitelisting problem: rather than saying "we want to check that this subset of fields is the same", we're saying "we only expect these fields to have changed, and all others to be the same". The latter is more strict (and more verbose), but catches problems more frequently as a result.
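A rough sketch of the overriding approach referenced above; the helper name is hypothetical, and the protobuf setters assume the field names mentioned in this thread:

```cpp
#include <faabric/proto/faabric.pb.h>

// Hypothetical helper: copy the fields the scheduler sets at finish time
// (executed host and finish timestamp) from the actual message onto the
// expected one, so the single full-equality check can still be used.
void alignSchedulerSetFields(faabric::Message& expected,
                             const faabric::Message& actual)
{
    expected.set_executedhost(actual.executedhost());
    expected.set_finishtimestamp(actual.finishtimestamp());
}
```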
force-pushed from bb95f65 to 6801be6
force-pushed from 6d87aa3 to 5246a0e
In this PR I add support for generating execution graphs from MPI runs.

Before, only chained calls would log subsequent invocations (i.e. calls to sch.callFunctions) to Redis. Logging all invocations to callFunctions would prevent needing more fixes like this one, but I am not sure whether we want to do that. Plus, we'd have to pass the current message id to the scheduler, but that's a minor issue I guess.

I also add a test checking that MPI world creation generates the expected execution graph (i.e. the root process fanning out to size - 1 processes). Given that we don't have control over the messages that MPI uses to generate that graph, we can only imitate it in the expected graph for the test. In particular, some fields such as msg.id() will differ between the actual graph and the expected one, so we only compare the fields that MpiWorld::create() sets.
size - 1processes). Given that we don't have control over the messages that MPI uses to generate that graph, we are just able to imitate it in theexpectedgraph for the test. In particular some fields likemsg.id()will differ from theactualgraph and theexpectedone. Thus we only compare the fields thatMpiWorld::create()sets.