Skip to content

Potential scheduler bug? nextMsgIdx not updated when scheduling to other hosts #65

@eigenraven

Description

@eigenraven

I'm adapting my compute/storage division-aware scheduler patches for faasm-ndp to the new code, and I noticed that in the blocks of code responsible for scheduling off to other nodes when the current one is busy, the nextMsgIndex used for calculations later is not updated - it was updated in the for loop stepping above, maybe it was missed in a copy-paste?

https://github.com/faasm/faabric/blob/master/src/scheduler/Scheduler.cpp#L287-L391

for (; nextMsgIdx < nLocally; nextMsgIdx++) { // for loop - nextMsgIdx updated
...
remainder -= nOnThisHost; // but nextMsgIdx not updated

From a code review standpoint it seems that there are multiple variables manually tracked: nextMsgIdx, remainder (which should just always be total minus nextMsgIdx), and the actual messages vector. The vector is not mutated, so that's not a problem, but I think replacing either nextMsgIdx or remainder with a formula based on the other one (you could put it in a capture-by-reference local lambda to avoid duplication) would eliminate this bug and also prevent making similar mistakes. Also marking locals like nMessages, isThreads, cores as const would make it clear these are not changing in the function.

const auto remainder = [&](){ return nMessages - nextMsgIdx; };
if (remainder() > 0) {...}

or

if (nextMsgIdx < nMessages) {...} // without remainder

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions