Hadoop speculative execution is on by default for mappers and reducers
(ref: https://hadoop.apache.org/docs/r2.5.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml )
Based on certain criteria, this starts duplicate attempts of the same map/reduce task. Only one attempt "wins" - the rest are killed - but if the attempts have side effects (e.g. the output format and data schema aren't idempotent) then duplicate records can result.
It's also questionable whether we want to generate extra reads/writes against Accumulo.
Might warrant more discussion, but in the ingest framework I think we probably want to disable this by default, disable it in the prototype deduplicating mappers/reducers, and document it in the input/output format examples.
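For reference, a sketch of how the framework-level default could be disabled cluster- or site-wide, using the property names from the 2.5.2 mapred-default.xml linked above:

```xml
<!-- mapred-site.xml: turn off speculative execution for all jobs -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```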
(looks like there's a convenience method on the Job class to turn it off - I'm assuming it covers both mappers and reducers: ref: http://hadoop.apache.org/docs/r2.5.2/api/org/apache/hadoop/mapreduce/Job.html#setSpeculativeExecution(boolean) )
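A minimal sketch of using that method during job setup (the class and job names here are illustrative, not part of the actual framework):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class IngestJobSetup {
    public static Job createJob(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "ingest-prototype");
        // Disable speculative execution for this job's map and reduce tasks,
        // so no duplicate task attempts are launched with side effects.
        job.setSpeculativeExecution(false);
        return job;
    }
}
```

This sets the job-level properties rather than the cluster defaults, so it protects the ingest jobs even on clusters where speculation is left on.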