Disable speculative execution in ingest framework / document in input/output format docs #170

@chrisbennight

Hadoop speculative execution is on by default for mappers and reducers
(ref: https://hadoop.apache.org/docs/r2.5.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml )

Based on certain criteria, this will start up duplicate instances of mappers/reducers. Only one "wins" and the rest are terminated, but if their actions have side effects (i.e. the output format + data schema isn't idempotent), then duplicates occur.

It's also dubious whether we want to potentially generate extra reads/writes through Accumulo.

Might warrant more discussion, but I think in the ingest framework we probably want to disable this by default, disable it in the prototype deduplicating mapper/reducers, and document it in the input/output format examples.
(Looks like there's a convenience method on the Job class to turn it off. I'm assuming it applies to both mappers and reducers. ref: http://hadoop.apache.org/docs/r2.5.2/api/org/apache/hadoop/mapreduce/Job.html#setSpeculativeExecution(boolean) )
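
For the documentation side, here is a minimal sketch of what "disabled by default" could look like at the configuration level. It uses only the JDK so it can be read without a Hadoop classpath. The class `SpeculativeExecutionSettings` and its methods are hypothetical and are not part of the ingest framework. The two property names are the ones listed in the Hadoop 2.x `mapred-default.xml` referenced above, where both default to `true`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing framework class.
final class SpeculativeExecutionSettings {
    // Property names from Hadoop 2.x mapred-default.xml (both default to true).
    static final String MAP_SPECULATIVE = "mapreduce.map.speculative";
    static final String REDUCE_SPECULATIVE = "mapreduce.reduce.speculative";

    private SpeculativeExecutionSettings() {}

    // Overrides that turn speculative execution off for mappers and reducers.
    static Map<String, String> disabledOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put(MAP_SPECULATIVE, "false");
        overrides.put(REDUCE_SPECULATIVE, "false");
        return overrides;
    }

    // Renders overrides as "-D property=value" pairs, the generic-option form
    // accepted by Hadoop drivers that run through ToolRunner.
    static List<String> asGenericOptions(Map<String, String> overrides) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the arguments one could append to a job submission command.
        System.out.println(String.join(" ", asGenericOptions(disabledOverrides())));
    }
}
```

In a job driver, the same entries could presumably be applied to the job's Hadoop `Configuration` with `conf.set(key, value)` before the job is created, or replaced by a single `job.setSpeculativeExecution(false)` call if the convenience method behaves as assumed above.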
