Apache Groovy Closures: Beyond the basics
We’ve explored the power of closures; now get ready for a deeper dive into using them with Groovy’s rich ecosystem of libraries. We’ll also look at how they help you write cleaner, more concise code in your projects. (If you haven’t installed Groovy yet, please read the intro to this series.)
Let’s start with the kind of dataset that all Groovy script writers run into from time to time — the delimited text file. I’m going to use the thermometry dataset generously provided by OpenIntro.
It looks like this:
body.temp,gender,heart.rate
96.3,male,70
96.7,male,71
96.9,male,74
97,male,80
97.1,male,73
97.1,male,75
97.1,male,82
97.2,male,64
97.3,male,69
...
The first line provides the column names, and the subsequent rows hold the corresponding data. Fields in each row are delimited by commas.
Now write a simple Groovy script to calculate the mean and standard deviation of temperature for males and females:
 1 if (args.length != 1) {
 2     System.err.println "Usage: groovy Groovy15a.groovy input-file"
 3     System.exit(1)
 4 }
 5 // the data:
 6 // body.temp,gender,heart.rate
 7 // 96.3,male,70
 8 // 96.7,male,71
 9 // ...
10 def count = [male: 0, female: 0]
11 def tempTotal = [male: 0d, female: 0d]
12 def tempSumSq = [male: 0d, female: 0d]
13 new File(args[0]).withReader { reader ->
14     def fieldNames = reader.
15         readLine().
16         split(',') as ArrayList<String>
17     def fieldsByName = fieldNames.
18         withIndex().
19         collectEntries { name, index ->
20             [(name): index]
21         }
22     reader.splitEachLine(',') { fieldList ->
23         double t = fieldList[fieldsByName['body.temp']] as Double
24         String g = fieldList[fieldsByName['gender']]
25         count[g]++
26         tempTotal[g] += t
27         tempSumSq[g] += (t * t)
28     }
29 }
30 def tempMean = tempTotal.collectEntries { k, v ->
31     int n = count[k]
32     [(k): v / n]
33 }
34 def tempStdDev = tempSumSq.collectEntries { k, v ->
35     int n = count[k]
36     double t = tempMean[k]
37     double t2 = t * t
38     [(k): Math.sqrt((v - n * t2) / (n - 1))]
39 }
40 println "tempMean $tempMean"
41 println "tempStdDev $tempStdDev"
When you run this, you’ll see:
$ groovy Groovy15a.groovy Groovy15_thermometry.csv
tempMean [male:98.1046153846154, female:98.39384615384616]
tempStdDev [male:0.6987557623247544, female:0.7434877527343278]
$
Let’s take a closer look at the code above.
Lines 1-9 check the usage and show a sample of the data, as comments.
Lines 10-12 define the accumulators: count records the number of temperature observations by gender, tempTotal records the sum of all the temperature observations by gender, and tempSumSq records the sum of squares of all the temperature observations by gender. Each accumulator is a map keyed by the gender value.
Line 13 opens a Reader instance on the file named on the command line using the withReader method. This method takes a Closure instance and passes it the opened reader. The withReader method is an example of the Groovy design pattern “loan my resource”. It handles the overhead of opening and closing the reader so that the programmer can focus on what’s being done with the reader.
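To make the “loan my resource” idea concrete, here is a minimal sketch of a method written in that style. The helper name withTempFile is hypothetical (it is not part of Groovy); it only illustrates the shape of the pattern: acquire the resource, loan it to the caller’s closure, and clean up afterwards no matter what happens.

```groovy
// Hypothetical helper illustrating the "loan my resource" pattern:
// it creates a temporary file, loans it to the closure, then cleans up.
def withTempFile(String contents, Closure worker) {
    File f = File.createTempFile('loan', '.txt')
    try {
        f.text = contents
        return worker(f)      // loan the resource to the caller's closure
    } finally {
        f.delete()            // cleanup runs even if the closure throws
    }
}

// The caller only describes what to do with the resource.
def firstLine = withTempFile('body.temp,gender,heart.rate\n96.3,male,70\n') { file ->
    // withReader is itself a loan: it opens the reader, calls the closure,
    // closes the reader, and returns the closure's result.
    file.withReader { reader -> reader.readLine() }
}
println firstLine
```

The caller never opens, closes, or deletes anything, which is the same division of labor that withReader provides for readers.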
Lines 14-16 read the first line of the file and split it into the field names.
Lines 17-21 create a map whose keys are the field names and whose values are the field indexes. I generally do this so I’m able to refer to the fields by name as opposed to by index. This seems to make my code less prone to bugs.
Line 22 iterates over the remaining lines in the file, splitting each one into fields and passing the list of fields to a Closure instance.
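As a small illustration of the name-to-index map, the following sketch applies the same withIndex() and collectEntries calls to a header line and one data row held in strings (taken from the sample above) rather than read from a file:

```groovy
// Header and one data row from the sample, held in memory for illustration.
def header = 'body.temp,gender,heart.rate'
def fieldNames = header.split(',') as ArrayList<String>

// Pair each name with its position, then turn the pairs into a map.
def fieldsByName = fieldNames.withIndex().collectEntries { name, index ->
    [(name): index]
}

def fieldList = '96.3,male,70'.split(',') as List
println fieldsByName
println fieldList[fieldsByName['gender']]   // look the field up by name, not position
```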
Lines 23-24 get the temperature as a double and the gender as a String.
Lines 25-27 accumulate the count, tempTotal and tempSumSq by gender.
Lines 28-29 end the line processing Closure and the reader processing Closure respectively.
Lines 30-33 process the tempTotal map, dividing the totals by the count to get the mean, by gender.
Lines 34-39 process the tempSumSq map, calculating the standard deviation by gender.
Lines 40-41 print the final results.
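The standard deviation in lines 34-39 is computed from the count, the sum, and the sum of squares, which is what lets the script get away with a single pass over the file. As a quick sanity check, this sketch compares that shortcut with the textbook two-pass definition, using the first five temperatures from the sample data. The variable names are mine, chosen for illustration.

```groovy
// First five temperatures from the sample data.
def temps = [96.3d, 96.7d, 96.9d, 97.0d, 97.1d]

int n = temps.size()
double total = temps.sum()
double sumSq = temps.collect { it * it }.sum()
double mean = total / n

// One-pass shortcut, as in the script: sqrt((sum of squares - n * mean^2) / (n - 1))
double shortcut = Math.sqrt((sumSq - n * mean * mean) / (n - 1))

// Two-pass definition: sqrt(sum of squared deviations from the mean / (n - 1))
double twoPass = Math.sqrt(temps.collect { (it - mean) * (it - mean) }.sum() / (n - 1))

// The two agree up to floating-point rounding.
assert Math.abs(shortcut - twoPass) < 1e-6
println "shortcut $shortcut, two-pass $twoPass"
```

The shortcut subtracts two large, nearly equal numbers, so it can lose some precision on big datasets; for a file of this size that is not a practical concern.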
The code in lines 10-29 isn’t the most elegant Groovy code. There’s a decent argument to be made for a more functional approach that applies Groovy’s inject() to the collection of values resulting from reading the lines.
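For a flavor of what the inject() approach might look like, here is a rough sketch. It assumes the rows have already been read and split into lists of fields (three rows from the sample data are hard-coded here), and the shape of the accumulator map is my own choice rather than anything prescribed by Groovy.

```groovy
// Three rows from the sample data, already split into fields.
def rows = [
    ['96.3', 'male', '70'],
    ['96.7', 'male', '71'],
    ['96.9', 'male', '74'],
]

// inject() threads an accumulator through the rows; each step returns a new
// map instead of mutating variables declared outside the closure.
def stats = rows.inject([:]) { acc, row ->
    double t = row[0] as Double
    String g = row[1]
    def s = acc[g] ?: [count: 0, total: 0d, sumSq: 0d]
    acc + [(g): [count: s.count + 1, total: s.total + t, sumSq: s.sumSq + t * t]]
}

println stats
```

Because the accumulator starts empty and gains a key the first time a gender is seen, this version does not need to know the gender values in advance.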
There are probably a lot of scripts you could write that need to take a delimited text file, do stuff with the fields on its rows and return a result. You could create a method that applies the “loan my resource” pattern to handle all the overhead, leaving it up to a closure to carry out the specific “stuff” necessary for each row.
Take a look:
 1 if (args.length != 1) {
 2     System.err.println "Usage: groovy Groovy15b.groovy input-file"
 3     System.exit(1)
 4 }
 5 // the data:
 6 // body.temp,gender,heart.rate
 7 // 96.3,male,70
 8 // 96.7,male,71
 9 // ...
10 def reduce(fileName, delimiter, result, worker) {
11     new File(fileName).withReader { reader ->
12         def fieldNames = reader.
13             readLine().
14             split(delimiter) as ArrayList<String>
15         reader.splitEachLine(delimiter) { fieldList ->
16             def fieldValues = fieldNames.
17                 withIndex().
18                 collectEntries { name, index ->
19                     [(name): fieldList[index]]
20                 }
21             result = worker(result, fieldValues)
22         }
23     }
24     result
25 }
26 def intermediate = reduce(args[0], ',', [
27     count: [male: 0, female: 0],
28     tempTotal: [male: 0d, female: 0d],
29     tempSumSq: [male: 0d, female: 0d]
30 ]) { result, fieldValues ->
31     double t = fieldValues['body.temp'] as Double
32     String g = fieldValues['gender']
33     if (g == 'male')
34         [
35             count: [male: result.count.male + 1,
36                     female: result.count.female],
37             tempTotal: [male: result.tempTotal.male + t,
38                         female: result.tempTotal.female],
39             tempSumSq: [male: result.tempSumSq.male + t*t,
40                         female: result.tempSumSq.female]
41         ]
42     else
43         [
44             count: [male: result.count.male,
45                     female: result.count.female + 1],
46             tempTotal: [male: result.tempTotal.male,
47                         female: result.tempTotal.female + t],
48             tempSumSq: [male: result.tempSumSq.male,
49                         female: result.tempSumSq.female + t*t]
50         ]
51 }
52 def tempMean = intermediate.tempTotal.collectEntries { k, v ->
53     int n = intermediate.count[k]
54     [(k): v / n]
55 }
56 def tempStdDev = intermediate.tempSumSq.collectEntries { k, v ->
57     int n = intermediate.count[k]
58     double m = tempMean[k]
59     double m2 = m * m
60     [(k): Math.sqrt((v - n * m2) / (n - 1))]
61 }
62 println "tempMean $tempMean"
63 println "tempStdDev $tempStdDev"
When you run this, you see:
$ groovy Groovy15b.groovy Groovy15_thermometry.csv
tempMean [male:98.1046153846154, female:98.39384615384616]
tempStdDev [male:0.6987557623247544, female:0.7434877527343278]
$
Let’s review the code.
Lines 1-9 do the same job as before: they check the usage and provide a reminder of the structure of the data.
Lines 10-25 are new. Here you’re defining a method reduce() that you’re going to apply to the input to produce the summary data.
Line 10 shows that reduce() has four parameters: the fileName, the delimiter character or regular expression, the result accumulator, which initially is expected to be set to “zeros”, and the worker Closure instance.
Line 11 opens a Reader instance on the file named by fileName and passes it to a Closure instance that handles the reader.
Lines 12-14 read the first line of the file and split it into the field names. Note that in this version you don’t bother with building a map of field name to field index. Instead, you copy the field values into a map (see below) on each line processed.
Line 15 iterates over the remaining lines in the file, splitting each one into fields and passing the list of field values to a Closure instance.
Lines 16-20 copy the field values into a map whose keys are the field names and whose values are the field values.
Line 21 calls the worker Closure instance, passing it the result so far and the field value map, and expects the updated result, having processed the line, to be returned.
Line 22 ends the line processing closure started in line 15.
Line 23 ends the reader processing closure started in line 11.
Line 24 returns the result of the processing.
Line 25 ends the reduce() method.
Lines 26-51 use the reduce() method to calculate the intermediate results. In detail:
Lines 26-30 call the reduce() method with the first argument to the script as the file name, comma as the delimiter, a map of maps that is the result initialized to zeros, and a Closure instance to process each line. The closure receives the result so far and the map of fieldValues.
Lines 31-32 get the temperature as a double and the gender as a String.
Lines 33-50 update either the male or the female intermediate results for count, tempTotal, and tempSumSq according to the gender of the data in the line.
Line 51 ends the Closure instance that processes each line.
Lines 52-55 calculate the mean male and female temperatures by dividing the corresponding totals by the number of observations. This is done in a manner similar to the first example.
Lines 56-61 calculate the standard deviation of the male and female temperatures. This is done in a manner similar to the first example.
Lines 62-63 print the results.
Hopefully, you can see how reduce() takes care of the administrative overhead of opening the file and dealing with its structure. If so, you might be ready for the next step, which is to create a new class to contain the reduce() method code so that you can refer to it from your scripts.
Those who find the map-of-maps result structure awkward could also create a version that behaves more like the first example, in which the results are declared and initialized outside the Closure instance and its body refers to them.
This is a good point to suggest a diversion to the reference descriptions for Collection, List, Map, and Interface, conveniently found together in the Groovy GDK documentation. You should also read the Groovy documentation on Closure.
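One possible shape for such a class is sketched below. The class name DelimitedFile is purely hypothetical, and the temporary file stands in for a real input file so that the sketch runs on its own. In practice you would more likely put the class in its own source file on the classpath.

```groovy
// Hypothetical utility class; the name DelimitedFile is an assumption.
class DelimitedFile {
    // Same logic as the script's reduce(), packaged as a static method.
    static Object reduce(String fileName, String delimiter, Object result, Closure worker) {
        new File(fileName).withReader { reader ->
            def fieldNames = reader.readLine().split(delimiter) as ArrayList<String>
            reader.splitEachLine(delimiter) { fieldList ->
                def fieldValues = fieldNames.withIndex().collectEntries { name, index ->
                    [(name): fieldList[index]]
                }
                result = worker(result, fieldValues)
            }
        }
        result
    }
}

// A tiny stand-in input file built from the sample data.
def f = File.createTempFile('thermometry', '.csv')
f.deleteOnExit()
f.text = 'body.temp,gender,heart.rate\n96.3,male,70\n96.7,male,71\n'

// A trivial worker that just counts the data rows.
def rowCount = DelimitedFile.reduce(f.path, ',', 0) { result, fieldValues -> result + 1 }
println "rows: $rowCount"
```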
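A sketch of that alternative might look like the following. The method name eachRow is hypothetical, and the temporary file again stands in for the real input. The method no longer carries a result around; the closure simply updates accumulators declared in the enclosing scope.

```groovy
// Hypothetical helper: calls worker once per data row with a map of field values.
def eachRow(fileName, delimiter, worker) {
    new File(fileName).withReader { reader ->
        def fieldNames = reader.readLine().split(delimiter) as ArrayList<String>
        reader.splitEachLine(delimiter) { fieldList ->
            worker(fieldNames.withIndex().collectEntries { name, index ->
                [(name): fieldList[index]]
            })
        }
    }
}

// A tiny stand-in input file built from the sample data.
def f = File.createTempFile('thermometry', '.csv')
f.deleteOnExit()
f.text = 'body.temp,gender,heart.rate\n96.3,male,70\n96.7,male,71\n'

// Accumulators declared outside the closure, as in the first example.
def count = [male: 0, female: 0]
def tempTotal = [male: 0d, female: 0d]

eachRow(f.path, ',') { fieldValues ->
    String g = fieldValues['gender']
    count[g]++
    tempTotal[g] += (fieldValues['body.temp'] as Double)
}
println "count $count"
println "tempTotal $tempTotal"
```

The trade-off is the usual one: this version is shorter and reads like the first script, while the reduce() version keeps all state inside the value passed from step to step.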
Conclusion
Groovy closures predate Java lambdas and offer a few interesting advantages — they aren’t restricted to referring to effectively final variables in the containing scope. They are (from my perspective at least) easier to learn since they don’t depend on a whole new and complicated set of classes involving streams, functions and the like.
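To illustrate the point about effectively final variables, here is a tiny sketch in which a closure reassigns a local variable from the enclosing scope. The numbers are the first three heart rates in the sample data.

```groovy
def total = 0

// The closure reassigns a local variable declared outside it.
[70, 71, 74].each { total += it }

assert total == 215
println total
```

The corresponding Java lambda that assigns to a local variable of the enclosing method would be rejected by the compiler, because such a variable must be final or effectively final.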
In my own experience, I find that I use closures as anonymous functions almost every time I write something in Groovy. Much less commonly do I create code like the reduce() method above that calls a closure to handle its processing. Nevertheless, knowing how to do so means having some great reusable scripting tools in the box.
Stay tuned for the next tutorial where you’ll learn how to use sort and spaceship operators in Groovy.
