Keep your big data pipelines running predictably and with full visibility.
Automate job scheduling
Get full control over timing, dependencies and retries.
Centralize big data
Manage MapReduce alongside your other workloads.
Accelerate data processing
Automate submission, tracking and exception handling.
MapReduce is the Apache Hadoop framework programming model used to access and process large amounts of data stored in the Hadoop Distributed File System (HDFS). The adapter serves as the job client, using the Hadoop API to submit and monitor MapReduce jobs with Tidal’s full scheduling capabilities and parameter support, automating the execution of MapReduce jobs as part of Tidal-managed processes. As a platform-independent solution, the adapter can run on any platform where the Tidal Master runs.
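In outline, the adapter's submit-and-poll cycle can be pictured with the sketch below. The client class, its method names, and its states are illustrative stand-ins, not Tidal's or Hadoop's actual API:

```python
class FakeJobClient:
    """Simulates submitting a MapReduce job and polling it to completion.

    A real adapter would call the Hadoop API here; this fake client just
    walks through a fixed sequence of job states.
    """
    _states = ["PREP", "RUNNING", "RUNNING", "SUCCEEDED"]

    def submit(self, job_name):
        # Hand the job to the cluster; a real client would return a job ID.
        self.job_name = job_name
        self._poll = iter(self._states)
        return job_name

    def status(self):
        # Report the job's current state; stays terminal once finished.
        return next(self._poll, "SUCCEEDED")

def run_to_completion(client, job_name):
    # Submit the job, then poll until it reaches a terminal state,
    # mirroring how a scheduler adapter tracks a remote job.
    client.submit(job_name)
    while True:
        state = client.status()
        if state in ("SUCCEEDED", "FAILED", "KILLED"):
            return state

print(run_to_completion(FakeJobClient(), "wordcount"))  # → SUCCEEDED
```

The key point of the design is that the scheduler never runs the work itself: it only submits, polls, and reacts to the terminal state, which is what lets retries and exception handling be automated.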
As the job client, the adapter assumes responsibility for submitting each job, tracking its progress, and reporting status and results back to the scheduler.
Because MapReduce tasks run as pre-scheduled or event-based jobs, you can monitor them as you would any other type of job in Tidal using the Job Details dialog. You can also use Business Views to monitor job activity and see when the jobs are active.
The scheduler also provides standard job control capabilities, such as holding, canceling, and rerunning, for the current process or the entire job.
An adapter job splits the input dataset into independent chunks, which the map tasks process in parallel. The framework sorts the outputs of the map tasks and passes them to the reduce tasks. Input and output are typically stored in HDFS. The framework also schedules tasks, monitors them, and re-executes any that fail.
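The map, sort/shuffle, and reduce phases above can be sketched with a self-contained word count in Python. The function names and single-process execution are illustrative only; a real Hadoop job distributes these phases across cluster nodes:

```python
from collections import defaultdict
from itertools import chain

def map_fn(chunk):
    # Map task: emit a (word, 1) pair for every word in its input chunk.
    return [(word, 1) for word in chunk.split()]

def reduce_fn(key, values):
    # Reduce task: combine all counts emitted for one key.
    return key, sum(values)

def run_job(chunks):
    # Map phase: each chunk could be processed on a separate node in parallel.
    mapped = chain.from_iterable(map_fn(c) for c in chunks)
    # Sort/shuffle phase: sort the map output and group it by key.
    groups = defaultdict(list)
    for key, value in sorted(mapped):
        groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = run_job(["big data", "big jobs run on big clusters"])
print(counts["big"])  # → 3
```

Because each chunk is mapped independently and each key is reduced independently, both phases parallelize naturally, which is what makes the model scale across many nodes.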
At a minimum, applications define the input and output locations and supply map and reduce functions via the appropriate interfaces or abstract classes. These, together with other job parameters, comprise the job configuration. The Hadoop job client then submits the job (jar/executable, etc.) and configuration to the Job Tracker, which distributes them to the worker nodes, schedules and monitors the tasks, and provides status and diagnostic information back to the client.
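A job configuration can be pictured as a small bundle of required fields plus optional parameters. The sketch below is a hypothetical, Hadoop-free illustration; the field names are not real Hadoop configuration keys:

```python
def build_job_config(input_path, output_path, mapper, reducer, **params):
    """Assemble the minimal configuration a job client would submit.

    Illustrative only: real Hadoop jobs set these values through the
    framework's own configuration classes, not a plain dict.
    """
    config = {
        "input": input_path,    # where the input dataset lives (typically HDFS)
        "output": output_path,  # where the reduce output is written
        "mapper": mapper,       # map function or class name
        "reducer": reducer,     # reduce function or class name
    }
    config.update(params)       # any other job parameters
    # A job is not submittable until the minimal fields are all present.
    missing = [k for k in ("input", "output", "mapper", "reducer") if not config[k]]
    if missing:
        raise ValueError(f"job configuration incomplete: {missing}")
    return config

cfg = build_job_config("hdfs:///data/in", "hdfs:///data/out",
                       "WordCountMapper", "WordCountReducer", reduces=4)
```

Validating the configuration before submission reflects the division of labor in the text: the application describes the job, and the client hands that description to the Job Tracker for execution.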
Resources
Hadoop Distributed File System (HDFS) is the storage layer of Hadoop, while MapReduce is the processing engine. MapReduce uses data stored in HDFS as input for distributed computation across a cluster.
MapReduce is ideal for large-scale, batch-oriented processing jobs, such as data aggregation, log analysis, indexing, or sorting, that can be parallelized across many nodes.
MapReduce is used to process vast amounts of structured or unstructured data in parallel, breaking jobs into smaller subtasks (map) and combining the results (reduce) efficiently across a Hadoop cluster.