2. Hadoop MapReduce Basic Tutorial

This extends the Part 1 tutorial, “1. Hadoop MapReduce Basic Tutorial”. The key difference in this tutorial is that it uses “TextInputFormat” instead of “KeyValueTextInputFormat”.

TextInputFormat reads each line of the scores.data file as one key/value pair. The key is the byte offset at which the line starts in the file (starting from 0), and the value is the whole line as text, e.g. “Science, 80, 75, 89, 90”.


Science, 80, 75, 89, 90
Maths,  90, 87, 78, 92
English, 78, 88, 65, 99

Mapper input with TextInputFormat.
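Because the keys are byte offsets rather than line numbers, the second key is not 1. The following plain-Java sketch mimics the key/value pairs TextInputFormat would hand to the mapper for the sample data. It has no Hadoop dependency, the class name TextInputFormatSketch is hypothetical, and it assumes '\n' line endings and the whole file in a single input split. It is an illustration only, not Hadoop's actual LineRecordReader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (no Hadoop needed) of TextInputFormat's output:
// key = byte offset where the line starts, value = the line text.
// Assumes '\n' line endings and a single input split.
final class TextInputFormatSketch {

    // Simple holder for one (offset, line) pair.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    static List<Record> read(String fileContents) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new Record(offset, line));
            // Advance by the line's length in bytes plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String scores = "Science, 80, 75, 89, 90\n"
                + "Maths,  90, 87, 78, 92\n"
                + "English, 78, 88, 65, 99\n";
        for (Record r : read(scores)) {
            System.out.println(r.offset + "/" + r.line);
        }
    }
}
```

For the three sample lines exactly as written above, this prints the keys 0, 24 and 47, one per line, each followed by "/" and the line text.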

Hadoop MapReduce Steps

Step 1: The Hadoop-based mapper class “ScoreMapper” can be executed in parallel by multiple nodes. It processes each input line as a key/value pair, e.g. 0/Science, 80, 75, 89, 90. Note that the input key is of type “LongWritable” instead of type “Text”, because TextInputFormat supplies the line's byte offset as the key.


package com.mytutorial;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ScoreMapper extends Mapper<LongWritable, Text, Text, Text> {

    // key   = byte offset of the line (supplied by TextInputFormat, not used here)
    // value = the line itself, e.g. "Science, 80, 75, 89, 90"
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException,
            InterruptedException {

        String line = value.toString();
        if (line.trim().isEmpty()) {
            return; // skip blank lines instead of failing on nextToken()
        }

        StringTokenizer st = new StringTokenizer(line, ",");
        String subject = st.nextToken().trim();           // first field is the subject
        StringBuilder mappedString = new StringBuilder(); // for the scores, e.g. "80_75_89_90_"
        while (st.hasMoreTokens()) {
            mappedString.append(st.nextToken().trim()).append('_');
        }

        // subject is the new key; the '_'-joined scores are the value
        context.write(new Text(subject), new Text(mappedString.toString()));
    }

}
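To check the mapper's string handling without starting a cluster, the logic inside ScoreMapper.map can be copied into a plain Java method. The class ScoreMapperLogic and its mapLine method below are hypothetical helpers for testing, not part of the Hadoop API or of the original code.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java copy of the string handling in ScoreMapper.map,
// so the map logic can be checked without Hadoop.
final class ScoreMapperLogic {

    // Turns "Science, 80, 75, 89, 90" into ("Science", "80_75_89_90_").
    // Returns null for a blank line, which the mapper skips.
    static Map.Entry<String, String> mapLine(String line) {
        if (line.trim().isEmpty()) {
            return null;
        }
        StringTokenizer st = new StringTokenizer(line, ",");
        String subject = st.nextToken().trim();
        StringBuilder scores = new StringBuilder();
        while (st.hasMoreTokens()) {
            scores.append(st.nextToken().trim()).append('_');
        }
        return new AbstractMap.SimpleEntry<>(subject, scores.toString());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = mapLine("Science, 80, 75, 89, 90");
        System.out.println(kv.getKey() + "/" + kv.getValue()); // Science/80_75_89_90_
    }
}
```

The trailing "_" matches what the mapper emits; the reducer's split("_") discards the empty string after it.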

Step 2: The Hadoop-based reducer class “ScoreReducer” can also be executed in parallel by multiple nodes. It receives each subject key together with all the mapped values for that key, e.g. Science/[80_75_89_90_], and outputs key/value pairs such as Science/max score is: 90. This is the same as in the Part 1 example.



package com.mytutorial;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ScoreReducer extends Reducer<Text, Text, Text, Text> {

    // key    = subject, e.g. "Science"
    // values = all mapped score strings for that subject, e.g. "80_75_89_90_"
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        long maxScore = 0;

        for (Text val : values) {
            // "80_75_89_90_".split("_") drops the trailing empty string
            for (String score : val.toString().split("_")) {
                if (score.trim().isEmpty()) {
                    continue;
                }
                // Long.parseLong replaces the deprecated new Long(String)
                long tempValue = Long.parseLong(score.trim());
                if (tempValue > maxScore) {
                    maxScore = tempValue;
                }
            }
        }

        context.write(key, new Text("max score is: " + maxScore));
    }

}
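Between the mapper and the reducer, Hadoop groups the map output by key and sorts the keys. The plain-Java sketch below imitates that grouping with a TreeMap, assuming a single reducer, and then applies the same max-score logic as ScoreReducer. The class name ShuffleAndReduceSketch is hypothetical, and the "key<TAB>value" lines mirror TextOutputFormat's default tab separator; this is not how Hadoop implements the shuffle internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of shuffle + reduce for this job.
final class ShuffleAndReduceSketch {

    // Group (subject, "80_75_89_90_") pairs by subject; TreeMap keeps keys sorted.
    static TreeMap<String, List<String>> shuffle(List<String[]> mapOutput) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Same logic as ScoreReducer.reduce: the highest '_'-separated score.
    static long maxScore(List<String> values) {
        long max = 0;
        for (String v : values) {
            for (String s : v.split("_")) {
                if (s.trim().isEmpty()) {
                    continue;
                }
                max = Math.max(max, Long.parseLong(s.trim()));
            }
        }
        return max;
    }

    // Build output lines in a "key<TAB>value" layout.
    static List<String> run(List<String[]> mapOutput) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(mapOutput).entrySet()) {
            lines.add(e.getKey() + "\tmax score is: " + maxScore(e.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"Science", "80_75_89_90_"},
                new String[] {"Maths", "90_87_78_92_"},
                new String[] {"English", "78_88_65_99_"});
        run(mapOutput).forEach(System.out::println);
    }
}
```

With the sample data this prints English (99), Maths (92) and Science (90), in sorted key order.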

Step 3: Finally, the executable main Java class “MaxScoreMain” ties everything together.



package com.mytutorial;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MaxScoreMain {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated "new Job(conf, name)" constructor
        Job job = Job.getInstance(conf, "maxscoremain");

        job.setJarByClass(MaxScoreMain.class);
        job.setMapperClass(ScoreMapper.class);
        job.setReducerClass(ScoreReducer.class);

        // map output: (subject, "80_75_89_90_")
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // final reducer output: (subject, "max score is: 90")
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // input key = byte offset (LongWritable), input value = line (Text)
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/Users/arulk/projects/scores.data"));
        // the output directory must not already exist, otherwise the job fails
        FileOutputFormat.setOutputPath(job, new Path("/Users/arulk/tempMapreduce"));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }

}

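As you can see, “job.setInputFormatClass” is set to “TextInputFormat”. TextInputFormat is also Hadoop's default input format, so this line could be omitted, but setting it explicitly makes the intent clear. The output results will be exactly the same as in Part 1.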
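The practical difference between the two input formats is how one line becomes the mapper's key/value pair. The plain-Java sketch below contrasts them under simplifying assumptions. The class InputFormatContrast and its methods are hypothetical, a tab is assumed as KeyValueTextInputFormat's default separator, and "," is passed only for illustration, since this section does not state which separator Part 1 used.

```java
// Hypothetical plain-Java sketch contrasting how each input format turns
// one line into the mapper's (key, value) pair. No Hadoop classes are used.
final class InputFormatContrast {

    // TextInputFormat: key = byte offset of the line, value = the whole line.
    static String[] textInputFormat(long byteOffset, String line) {
        return new String[] {Long.toString(byteOffset), line};
    }

    // KeyValueTextInputFormat: split at the FIRST separator character;
    // if it is absent, the whole line is the key and the value is empty.
    static String[] keyValueTextInputFormat(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String line = "Science, 80, 75, 89, 90";
        String[] a = textInputFormat(0, line);
        String[] b = keyValueTextInputFormat(line, ',');
        System.out.println("TextInputFormat:         " + a[0] + " / " + a[1]);
        System.out.println("KeyValueTextInputFormat: " + b[0] + " / " + b[1]);
    }
}
```

With TextInputFormat the mapper must extract the subject itself, which is why ScoreMapper tokenizes the whole line and ignores the LongWritable key.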

Arulkumaran Kumaraswamipillai

In 1999, I pivoted from mechanical engineering to become a self-taught Java expert. Over the past 19+ years, I have built a high-demand contracting career, delivering mission-critical solutions for 13+ top-tier organisations. With a proven track record that has generated 130+ job offers, I help enterprises solve complex data challenges at scale.

Author of the book "Java/J2EE job interview companion", which sold 35,000+ copies on amazon.com and has been superseded by this site, which has 2000+ registered users.

Email: java-interview@hotmail.com

The contents in this Java-Success are copyrighted and from EmpoweringTech pty ltd. The EmpoweringTech pty ltd has the right to correct or enhance the current content without any prior notice. These are general advice only, and one needs to take his/her own circumstances into consideration. The EmpoweringTech pty ltd will not be held liable for any damages caused or alleged to be caused either directly or indirectly by these materials and resources. Any trademarked names or labels used in this blog remain the property of their respective trademark owners. Links to external sites do not imply endorsement of the linked-to sites. Privacy Policy
