
Archive for the ‘Java’ Category

Hi everybody, JavaOne 2015 is already underway. For some reason my talks are all concentrated toward the end of the conference this year; in fact, three are on Wednesday! Here’s my talk schedule:

♦ API Design with Java 8 Lambdas and Streams [CON6851]
(with Brian Goetz)
Wed 2015-10-28 – 8:30am
Hilton Continental Ballroom 5
Slides: CON6851-API-Design-v2 (PDF)
Video: https://youtu.be/o10ETyiNIsM?t=24m (61 minutes)

♦ New Tricks for Old Dogs: Collections Enhancements in Java 8 [CON7432]
(with Mike Duigou)
Wed 2015-10-28 – 11:30am
Hilton Continental Ballroom 1/2/3
Slides: CON7432-Marks-CollectionsNewTricks-v3 (PDF)
See also JEP 269, “Convenience Factory Methods for Collections” (JDK 9 work-in-progress)

♦ Saving the Future from the Past: Innovations in Deprecation [CON6856]
(presented by Dr Deprecator)
Wed 2015-10-28 – 3:00pm
Hilton Continental Ballroom 5
Slides: CON6856-Marks-Deprecation-v3 (PDF)
Video: https://youtu.be/o10ETyiNIsM?t=6h54m41s (61 minutes)
News flash! JEP 277 “Enhanced Deprecation” has been posted.

♦ 20 Years of APIs: A Retrospective [CON6891]
Thu 2015-10-29 – 9:00am
Hilton Continental Ballroom 5
Slides: CON6891-Marks-API-Retrospective-v2 (PDF)
Video: https://youtu.be/0KlJSNb8GZU?t=26m25s (61 minutes)

Sorry, there’s no lambda tutorial (“Jump-Starting Lambda”) this year, nor is there a Lambda Hands on Lab. This is most unfortunate. I was planning to work with Simon Ritter (Twitter: @speakjava) on those sessions this year, with Simon taking the lead. Unfortunately, Simon was laid off from Oracle just a few weeks ago, leaving no time to rearrange the program or to find someone else to work on them. There are a number of Lambda and Streams talks that I can recommend, however:

♦ Programming with Lambdas [CON8366]
Venkat Subramaniam
Mon 2015-10-26 – 4:00pm
Hilton Continental Ballroom 5
Video: https://youtu.be/8RhwmJlZQgs?t=7h54m50s

♦ Journey’s End: Collection and Reduction in the Stream API [TUT5906]
Maurice Naftalin
Tue 2015-10-27 – 8:30am
Hilton Continental Ballroom 4

♦ Streams: the Real Powerhouse in Java 8 [CON8367]
Venkat Subramaniam
Tue 2015-10-27 – 11:00am
Hilton Continental Ballroom 4

♦ Effective Java Streams [CON7066]
Paul Sandoz
Tue 2015-10-27 – 2:30pm
Hilton Continental Ballroom 5
Video: https://youtu.be/iHHSa39p48I?t=6h15m55s

♦ Shooting the Rapids: Maximizing Performance of Java 8 Streams [CON5931]
Maurice Naftalin & Kirk Pepperdine
Wed 2015-10-28 – 3:00pm
Hilton Continental Ballroom 4

Enjoy the conference!

Read Full Post »

Since this year’s Java Day Tokyo 2015 is about to happen, I figure I should post my article about last year’s event. Unfortunately I won’t be able to attend this year. But last year I traveled to Japan for Java Day Tokyo 2014 and for a Japan Java User Group event. The trip was packed with events. I brought my family along with me, and fortunately we did have a couple days to travel around Tokyo to relax and do some sightseeing.

JJUG CCC 2014 Spring, 18 May

The first event was the JJUG CCC Spring 2014 (Japan Java Users Group, Cross-Community Conference). This is a twice-per-year gathering of several JUGs from around Japan where they stage a full-day conference. It turned out that I was one of the keynote speakers! I was told there were over 300 people attending, making it one of the biggest JJUG events ever. Wow, I’m honored.

My presentation was Overview of Java 8 Lambda and Streams, which covered not only those topics but also default methods and method references. That’s a lot to cover, and I couldn’t go very fast because I had to pause after every sentence for consecutive translation. Still, people said they enjoyed the presentation and that they found it helpful.

Here are some pictures Yuichi Sakuraba took at the event. (He seems to be the designated conference photographer in Japan, when he’s not busy taking pictures of food.)

(photo: Yuichi Sakuraba, 2014-05-18, CC BY-NC 2.0, original on Flickr)

(photo: Yuichi Sakuraba, 2014-05-18, CC BY-NC 2.0, original on Flickr)

Yuichi has posted a Flickr photo set of the entire event, including a few more of me.

Java Day Tokyo, 22 May 2014

This was the main event. It was jam-packed with sessions, including a set of keynotes in the morning and five tracks in parallel in the afternoon. Here’s the agenda, and here are slides and videos from the subset of sessions that were recorded. I had two sessions in the afternoon: the first on Java 8 Lambdas, and the second on Java 8’s new Streams API. Here are some pictures I took during the keynotes.

Nandini Ramani (former VP, Oracle Java Platform Group) and Shin Ishiguro (NEC) showing off the NEC PaPeRo robot featuring Embedded Java SE:

 

(photos)

Stephen Chin and Cameron Purdy demonstrating the Lego Duke balancing on two wheels:

(photos)

That evening, after a full day of sessions, there was a two-hour “Ask the Experts” panel, and I was one of the panelists. David Buck (Oracle JVM Sustaining) was pressed into service doing consecutive translation in both directions between the audience and the panelists. I think he did quite well considering he’s not a professional translator.

Not surprisingly (as Java 8 had just been released) most of the questions were about Lambdas and Streams. There were some pretty good questions. One question asked about some details of how lambdas are implemented. I replied that I’d try to be brief and hold my remarks to under half an hour. That got a laugh out of the audience (a Japanese audience — a first for me!). David did pretty well translating my answer, until I got to the part about the “lambda metafactory.” I’m not the real expert at this, though. Brian Goetz is, and he’s given a talk called Lambda: A Peek Under The Hood that explains the lambda implementation in great detail.

The following day (not officially part of the conference), we had a hands-on lab in the Oracle offices where we let participants try their hand at a set of exercises that can be solved using Java 8 Lambdas and Streams. This is similar to labs we’ve had at JavaOne and Devoxx and other conferences:

(photo)
As in most labs, after a brief introduction most of the participants went heads-down and worked steadily on the problems. They must have been pretty good problems, since most people were still working on them when we ran out of time!


I’m sad to be missing this year’s Japan event. Make sure you go if you get a chance. It looks like it’ll be as good if not better than last year’s!

Read Full Post »

That was what I tweeted the other day. Of course, it’s an homage to this famous xkcd cartoon. The tweet was a link to this Stack Overflow question and my answer. This article provides a bit more background and describes a little adventure I had computing the probability I gave in the answer.

The background is that the original poster (OP) of the question was generating a million data objects and was storing them in a TreeSet. But the TreeSet ended up with only 975,000 elements in it. The obvious reason that there would be fewer elements in a set than were added is that some of the data objects are duplicates. Somebody asked about this, and the OP said the chance of generating duplicate data objects was “minuscule.” To somebody like me, this is like waving a red flag in front of a bull, so I had to go investigate. (You know, Duty Calls.)

I estimated that there was a possible space of 18,000,000 data objects, and the OP was generating 1,000,000 of them at random. What’s the probability of there being at least one pair of duplicates among the generated objects? This is a variation of the Birthday Problem, also known as the Birthday Paradox. It’s not really a paradox. It is, however, quite counterintuitive how quickly the probability approaches certainty that there will be a duplicate as the number of trials increases.

Briefly, the birthday problem is, given a certain number of people, what’s the probability that two will have the same birthday? The probability reaches 50% at only 23 people, and at 70 people it has risen above 99.9%. Most people find this pretty surprising. Certainly the OP did; given 1,000,000 generated objects out of a space of 18,000,000, the probability is not minuscule at all, but is in fact astonishingly close to 100%.

It’s actually a bit easier to talk about the probability of there not being a duplicate, that is, the probability of the choices all being unique, so I’ll talk about that. (Of course, the probability of the choices being unique is simply 1.0 minus the probability of a collision.) The Wikipedia article gives a couple formulas for computing this probability. One is an approximation:

\displaystyle \left(\frac{d - 1}{d}\right)^{n(n-1)/2}

The second is a product involving a large number of factors:

\displaystyle \prod\limits_{k=1}^{n-1}(1 - \textstyle\frac{k}{d})

In both formulas, d is the number of possible values in the domain, and n is the number of elements chosen. Naturally, the closed form approximation involves many fewer computations, so let’s start with that one.

What should we use to do the computation? Well I’m an old Unix hack, so I immediately reached for the venerable bc program. First let’s try some of the cases from the original birthday problem to see if we have this right (bold italic text is the program’s output):

$ bc -l
(364/365)^(23*22/2)
.49952284596341798480
(364/365)^(70*69/2)
.00132609259546606814

These are only approximations, but they seem about right. Let’s try the exact computations:

p=1.0
for (k=1; k<23; k++) p *= (1 - k/365)
p
.49270276567601459277
p=1.0
for (k=1; k<70; k++) p *= (1 - k/365)
p
.00084042403484290862

The result for 23 people matches the figure given in the Wikipedia article (at least, to six decimal places) so it seems accurate. Great! Now let’s try the real problem.

d=18000000
n=1000000
((d-1)/d)^(n*(n-1)/2)
Runtime error (func=(main), adr=19): exponent too large in raise

Hm, that didn’t work. If the power operator isn’t working, let’s try the old trick of taking the logarithm, multiplying, and then exponentiating:

e(l((d-1)/d)*n*(n-1)/2)

I let this run at 100% CPU for five minutes and I didn’t get any output. I don’t know whether it was an infinite loop or what, but it certainly didn’t seem promising. All right then, let’s just try the exact computation:

p=1.0
for (k=1; k<n; k++) p *= (d-k)/d
p
0

Zero. Crap, underflow. The probabilities get pretty small, so I guess I shouldn’t be surprised. Let’s try Java instead.

static void doubleApprox() {
    double d = 18_000_000.0;
    double n =  1_000_000.0;
    System.out.println(Math.pow((d-1.0)/d, n*(n-1.0)/2.0));
}
0.0

Underflow again. At least it ran quickly instead of looping infinitely. Let’s try the exact computation:

static void doubleProduct() {
    int d = 18_000_000;
    int n =  1_000_000;
    double p = 1.0;
    for (int k = 1; k < n; k++) {
        p *= (double)(d - k) / (double)d;
    }
    System.out.println(p);
}
4.4E-323

Aha! Now we’re getting somewhere. I put this into the initial version of my answer and declared it done.

But there were a couple of suspicious things nagging me about this result. First, the exponent of -323 seemed awfully familiar. Second, there are only two digits of precision, where a floating-point double usually gives about 15-17 significant decimal digits. It turns out that this result is very close to Double.MIN_VALUE, which is about 4.9E-324. When numbers are this small, they are denormalized. As they get smaller, they have fewer and fewer digits of precision. With such a huge loss of precision, continued multiplication by a fraction such as (d - k) / d becomes highly inaccurate.
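To see just how coarse the grid of representable doubles is down at that scale, here’s a quick sketch of my own (not from the original investigation):

```java
public class DenormalSpacing {
    public static void main(String[] args) {
        // The smallest positive double is denormalized:
        System.out.println(Double.MIN_VALUE);              // 4.9E-324

        // Adjacent doubles near 4.4E-323 are a full Double.MIN_VALUE apart:
        System.out.println(Math.ulp(4.4E-323));            // 4.9E-324

        // So there are only about nine representable values between
        // 4.4E-323 and zero. Precision has almost entirely vanished.
        System.out.println(4.4E-323 / Double.MIN_VALUE);   // 9.0
    }
}
```

With only nine representable values between the result and zero, it’s no wonder the repeated multiplications went badly astray.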

It turns out that this result of 4.4E-323 is incredibly inaccurate. (In fact, as we’ll see later, it’s off by nearly twelve thousand orders of magnitude.) In order to combat the underflow problem, I put a little hack into the loop to scale up the partial product by 10 until it was above 1.0. That should keep the values well within range, so we avoid precision loss. Of course, I kept track of the number of times I scaled by 10. It’s negative because scaling up by 10 means a negative exponent. (I have no idea whether this is acceptable numeric computation practice, but it seemed to work out in the end.) Here’s the code to do that, and the result.

static void doubleScaled() {
    int d = 18_000_000;
    int n =  1_000_000;
    int scale = 0;
    double p = 1.0;
    for (int k = 1; k < n; k++) {
        p *= (double)(d - k) / (double)d;
        while (p < 1.0) {
            p *= 10.0;
            scale--;
        }
    }
    System.out.printf("%11.9fE%d%n", p, scale);
}
2.843374644E-12294

Ten to the minus twelve thousandth? Nah, that can’t be right. Can it?

I wasn’t sure how to verify this, so I talked to my friend and colleague Joe Darcy (blog, twitter). He suggested I use the floating-point mode of Java’s BigDecimal. Floating point? I thought BigDecimal only supported fixed-point arithmetic. In fact, there are variations of the BigDecimal operations that take a MathContext object, and if you set it up properly, it will perform floating point decimal arithmetic. Cool! Joe also mentioned that when used in this mode, BigDecimal stores its exponent as an int, so this should help avoid underflow.

Let’s try out the approximation first:

static void bdApprox() {
    int id = 18_000_000;
    int n  =  1_000_000;
    MathContext mc = new MathContext(10, RoundingMode.HALF_EVEN);
    BigDecimal d = new BigDecimal(id, mc);
    BigDecimal base = d.subtract(BigDecimal.ONE, mc).divide(d, mc);
    BigDecimal result = base.pow(n * (n - 1) / 2, mc);
    System.out.println(result);
}
622319181.9

WAT. This is totally wrong. Has Joe led me astray? Well, no. It turns out that BigDecimal.pow() takes an int argument as the exponent, and given n = 1,000,000 this clearly overflows an int. Whoops. All right then, let’s just go straight to the exact product computation:

static void bdExact() {
    int id = 18_000_000;
    int n  =  1_000_000;
    MathContext mc = new MathContext(20, RoundingMode.HALF_EVEN);
    BigDecimal prob = new BigDecimal(1, mc);
    BigDecimal d = new BigDecimal(id, mc);

    for (int k = 1; k < n; k++) {
        BigDecimal num = new BigDecimal(id - k, mc);
        prob = prob.multiply(num, mc)
                   .divide(d, mc);
    }

    System.out.println(prob);
}
2.8433746444606670057E-12294

Whoa. Look at that: the same answer, to as many significant digits as I printed out from the scaled double precision computation.

That’s a pretty amazing number. The probability of choosing 1,000,000 unique, random values from a space of 18,000,000 is ten to the minus fricken’ twelve thousand. That’s what I call minuscule. And it totally explains why the Stack Overflow poster was getting duplicates.

Math. It works, bitches.

And BigDecimal too.
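One more cross-check of my own (not part of the original investigation): summing base-10 logarithms in plain double arithmetic sidesteps the underflow entirely, since the running sum stays comfortably within double range, and it lands on the same answer:

```java
public class BirthdayLogSum {
    public static void main(String[] args) {
        int d = 18_000_000;
        int n =  1_000_000;

        // Sum the log of each factor instead of multiplying the factors;
        // no individual term or partial sum ever underflows.
        double log10p = 0.0;
        for (int k = 1; k < n; k++) {
            log10p += Math.log10((double) (d - k) / d);
        }
        System.out.println(log10p);  // about -12293.55, i.e. p is about 2.84E-12294
    }
}
```

The sum comes out to about -12293.55, which agrees with 2.843E-12294 from both the scaled-double and BigDecimal computations.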

Read Full Post »

The distinct() stream operation compares the stream’s elements using Object.equals(). That is, for any set of stream elements that are all equals() to each other, the distinct() operation will let just one of them through. However, sometimes you want the notion of “distinct” to be based on some property or other value derived from the stream element, but not the value itself. You could use map() to map the stream element into some derived value and use distinct() on those, but the result would be a stream of those derived values, not the original stream element.
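For instance (a made-up illustration, not from any real API): suppose we want one string per distinct length. Mapping to the length first gives us distinct lengths, but the strings themselves are gone:

```java
import java.util.*;
import java.util.stream.*;

public class MapDistinctLoss {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "pear", "plum", "cherry");

        // map() has already replaced each word with its length, so
        // distinct() operates on -- and yields -- the lengths, not the words.
        List<Integer> lengths = words.stream()
                                     .map(String::length)
                                     .distinct()
                                     .collect(Collectors.toList());
        System.out.println(lengths);   // [5, 4, 6]
    }
}
```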

It would be nice if there were some construct like

distinct(Function<T,U> keyExtractor)

that would call keyExtractor to derive the values that are compared for uniqueness, but there isn’t. However, it’s not too difficult to write your own.

The first insight is that you can think of the distinct() operation as a stateful filter. It’s like a filter() operation, which takes a predicate that determines whether to let the element through. It’s stateful because whether it lets an element through is determined by what elements it has seen previously.

This state needs to be maintained somewhere. Internally, the distinct() operation keeps a Set that contains elements that have been seen previously, but it’s buried inside the operation and we can’t get to it from application code. But we could write something similar ourselves. The usual way to maintain state in Java is to create a class that has fields in which the state is maintained. We need a predicate, and that predicate could be a method on that class. This will work, but it’s rather cumbersome.

The second insight is that lambdas can capture local variables from their enclosing lexical environment. These local variables cannot be mutated, but if they are references to mutable objects, those objects can be mutated. Thus we can write a higher-order function whose local variables contain references to the state objects, and we can have our higher-order function return a lambda that captures those locals and does its processing based on the captured, mutable state.

This function will want to take a keyExtractor function that’s used to derive a value from each stream element. Conceptually we’ll want a Set to keep track of values we’ve seen already. However, in case our stream is run in parallel, we’ll want some thread-safe data structure. A ConcurrentHashMap is a simple way to do this, with each existing key representing membership in the set, and the value being a dummy object such as the empty string. (That’s how many Set implementations in the JDK work already.) Ideally we’d want to use an existing object as the dummy value and not create one each time. The empty string literal is used many times in the core JDK classes, so it’s certainly already in the constant pool.

Here’s what the code looks like:

public static <T> Predicate<T> distinctByKey(
        Function<? super T,Object> keyExtractor) {
    Map<Object,String> seen = new ConcurrentHashMap<>();
    return t -> seen.put(keyExtractor.apply(t), "") == null;
}

This is a bit subtle. This is intended to be used within a filter() operation, so we’re returning a lambda that’s a predicate that computes a boolean based on whether it’s seen the value before. This value is derived from the stream element by calling the key extractor function. The put() method returns the previous value in the map, or null if there was no value. That’s the case we’re interested in, so if it returns null, we want the predicate to return true just for this first time. Subsequent times it will return non-null, so we return false those times, so the filter operation won’t pass through the element in those cases. I had used putIfAbsent() at first, since it has first-time-only semantics, but it turns out to be unnecessary, and using put() makes the code a bit shorter.
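As an aside, here’s a variation of my own (not from the original post): Java 8’s ConcurrentHashMap.newKeySet() gives you a thread-safe Set directly, and Set.add() returns true only the first time a key is added, which is exactly the predicate we want:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.*;
import java.util.stream.*;

public class DistinctByKeySet {
    // Same idea as the map-with-dummy-value version, but using a
    // concurrent Set: add() returns true only if the key was absent.
    public static <T> Predicate<T> distinctByKey(
            Function<? super T, Object> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    public static void main(String[] args) {
        // One word per initial letter:
        Stream.of("apple", "avocado", "banana", "blueberry", "cherry")
              .filter(distinctByKey(s -> s.charAt(0)))
              .forEach(System.out::println);
    }
}
```

This prints apple, banana, and cherry, one word per initial letter, and reads a bit more directly than the map-based version.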

Here’s how it’s used. Suppose we have a Book class that has fields for title and author, and the obvious constructor and getters, and we have a list of books that we want to process:

List<Book> list = Arrays.asList(
    new Book("This Side of Paradise", "F. Scott Fitzgerald"),
    new Book("The Beautiful and Damned", "F. Scott Fitzgerald"),
    new Book("The Great Gatsby", "F. Scott Fitzgerald"),
    new Book("Tender is the Night", "F. Scott Fitzgerald"),
    new Book("The Sound and the Fury", "William Faulkner"),
    new Book("Absalom, Absalom!", "William Faulkner"),
    new Book("Intruder in the Dust", "William Faulkner"),
    new Book("The Sun Also Rises", "Ernest Hemingway"),
    new Book("A Farewell to Arms", "Ernest Hemingway"),
    new Book("The Old Man and the Sea", "Ernest Hemingway"),
    new Book("For Whom the Bell Tolls", "Ernest Hemingway"),
    new Book("A Moveable Feast", "Ernest Hemingway")
);

If we wanted one book from each author, we could do this:

list.stream()
    .filter(distinctByKey(Book::getAuthor))
    .forEach(System.out::println);

The output from running this is:

Book[This Side of Paradise,F. Scott Fitzgerald]
Book[The Sound and the Fury,William Faulkner]
Book[The Sun Also Rises,Ernest Hemingway]

Since this is a sequential stream, the first book by each author is the one that ends up in the output. If we were to run this stream in parallel, it would still work “correctly” in that one book from each author would be output. However, which book is output would differ from run to run.

It takes a bit of tinkering to use higher-order functions to write stateful stream operations. You can see the evolution of my thinking by looking at my answer to a Stack Overflow question on this topic. I started out writing a class, but after chipping away at it a bit I realized a class was no longer necessary and that a higher-order function could be used instead. This is a powerful technique that can be used to write all kinds of stateful stream operations. You just have to be careful that they’re thread-safe.

Read Full Post »

JavaOne 2014 Schedule

I have a jam-packed schedule for JavaOne 2014. My sessions are as follows:


TUT3371 Jump-Starting Lambda (Tue 30 Sep 0830 Hilton Yosemite B/C)

This is my gentle introduction to lambda tutorial. Download presentation (PDF).

CON3374 Lambda Q&A Panel (Tue 30 Sep 1230 Hilton Yosemite B/C)

This panel session will explore the impact of Java 8 Lambdas on the Java ecosystem.

BOF6244 You’ve Got Your Streams On My Collections! (Tue 30 Sep 1900 Hilton Yosemite A)

Community discussion of collections and the new streams APIs.

IGN12431 Ignite Session (Tue 30 Sep 1900 Hilton Imperial A)

This is at the same time as the BOF, so my session will be later on, perhaps 2000 or so. Make sure to come; I have a little surprise planned!

CON3372 Parallel Streams Workshop (Wed 1 Oct 1000 Hilton Yosemite A)

Writing parallel streams code can be easy and effective, but you have to avoid some pitfalls. Download Presentation (PDF)

CON6377 Debt and Deprecation (Wed 1 Oct 1500 Hilton Yosemite A)

Given by my alter ego, Dr. Deprecator, this talk explores the principles and prescriptions of deprecation. Download Presentation (PDF)

HOL3373 Lambda Programming Laboratory (Thu 2 Oct 1200-1400 Hilton Franciscan A/B)

This is your chance to try out some lambda expressions and stream APIs introduced in Java 8, in order to solve a couple dozen challenging exercises. View Introductory Slide Presentation. Download Exercises (NetBeans Project).


See you in San Francisco!

Read Full Post »

Here’s the slide presentation I gave at the Silicon Valley JavaFX Users Group this evening, June 4th 2014:

Java 8 Lambda and Streams Overview Slides (PDF)

This is an approximately one-hour talk that covers lambdas, default methods, method references, and the streams API. It necessarily leaves out a lot of details, but it serves as a reasonably brief overview of the new features we’ve added in Java 8. (This is the same presentation I gave at the Japan JUG a couple weeks ago.)

The Meetup page for the SVJUGFX event is here:

Meetup Page

The video replay isn’t available as of this writing, but I imagine that a link will be placed there once the replay is available.

Finally, the source code for the “Bug Dates” charting application I showed in the talk is here:

Bug Dates demo (github)

Read Full Post »

Another Shell Test Pitfall

We discovered another bug in a shell script test the other day. It’s incredibly simple and stupid but it’s been covering up errors for years. As usual, there’s a story to be told.

About a month ago, my colleague Tristan Yan fixed some tests in RMI to remove the use of fixed ports. The bug for this is JDK-7190106 and this is the changeset. Using a fixed port causes intermittent failures when there happens to be something else already running that’s using the same port. Over the past year or so we’ve been changing the RMI tests to use system-assigned ports in order to avoid such collisions. Tristan’s fix was another step in ridding the RMI test suite of its use of fixed ports. Using system-assigned ports requires using a Java-based test library, and these tests were shell scripts, so part of Tristan’s fix was also converting them to Java.

The converted tests worked mostly fine, except that over the following few weeks an occasional test failure would occur. The serialization benchmark, newly rewritten into Java, would occasionally fail with a StackOverflowError. Clearly, something must have changed, since we had never seen this error occur with the shell script version of the serialization benchmark. Could the conversion from shell script to Java have changed something that caused excessive stack usage?

It turns out that the serialization benchmark does use a larger-than-ordinary amount of stack space. One of the tests loads a class that is fifty levels deep in the class inheritance hierarchy. This wasn’t a case of infinite recursion. Class loading uses a lot of stack space, and this test sometimes caused a StackOverflowError. Why didn’t the stack overflow occur with the shell script test?

The answer is … it did! It turns out that the shell script version of the serialization test was throwing StackOverflowError occasionally all along, but never reported failure. It’s pretty easy to force the test to overflow the stack by specifying a very small stack size (e.g., -Xss200k). Even when it threw a StackOverflowError, the test would still indicate that it passed. Why did this happen?

After the test preamble, the last line of the shell script invoked the JVM to run the serialization benchmark like this:

$TESTJAVA/bin/java \
    ${TESTVMOPTS} \
    -cp $TESTCLASSES \
    bench.serial.Main \
    -c $TESTSRC/bench/serial/jtreg-config &

Do you see the bug?

The pass/fail status of a shell script test is the exit status of the script. By usual UNIX conventions, a zero exit status means pass and a nonzero exit status means failure. In turn, the exit status of the script is the exit status of the last command the script executes. The problem here is that the last command is executed in the background, and the “exit status” of a command run in the background is always zero. This is true regardless of whether the background command is still running or whether it has exited with a nonzero status. Thus, no matter what happens in the actual test, this shell script will always indicate that it passed!
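You can watch this happen from Java, for what it’s worth. This is my own demonstration, and it assumes a POSIX sh is on the path:

```java
import java.io.IOException;

public class BackgroundStatus {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // Foreground: the failure of the last command propagates
        // as the script's exit status.
        int fg = new ProcessBuilder("sh", "-c", "false").start().waitFor();

        // Background: the '&' makes the "exit status" of the last
        // command 0, so the script exits 0 no matter what 'false' did.
        int bg = new ProcessBuilder("sh", "-c", "false &").start().waitFor();

        System.out.println("foreground: " + fg + ", background: " + bg);
    }
}
```

The foreground script exits 1, the backgrounded one exits 0 — which is exactly how the serialization test "passed" while throwing StackOverflowError.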

It’s a simple matter to experiment with the -Xss parameter in both the shell script and Java versions of the test to verify they both use comparable amounts of stack space. And given that the test workload sometimes overflowed the stack, the fix is to specify a sufficiently large stack size to ensure that this doesn’t happen. See JDK-8030284 and the changeset that fixes it.

How did this happen in the first place? I’m not entirely sure, but this serialization benchmark test was probably derived from another shell script test right nearby, an RMI benchmark. Tristan also rewrote the RMI benchmark into a Java test, but it’s a bit more complicated. The RMI benchmark needs to run a server and a client in separate JVMs. Simplified, the RMI benchmark shell script looked something like this:

echo "Starting RMI benchmark server "
java bench.rmi.Main -server &

# wait for the server to start
sleep 10 

echo "Starting RMI benchmark client "
java bench.rmi.Main -client

When the serialization benchmark script was derived from the RMI benchmark script, the original author simply deleted the invocation of the RMI client side and modified the server-side invocation command line to run the serialization benchmark instead of the RMI benchmark, and left it running in the background.

(This test also exhibits another pathology, that of sleeping for a fixed amount of time in order to wait for a server to start. If the server is slow to start, this can result in an intermittent failure. If the server starts quickly, the test must still wait the full ten seconds. The Java rewrite fixes this as well, by starting the server in the first JVM, and forking the client JVM only after the server has been initialized.)
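That rewrite pattern can be sketched like so. This is my own minimal sketch, not the actual benchmark code: bind a server socket to a system-assigned port, and fork the client only after the socket exists, so there’s neither a fixed port nor a sleep:

```java
import java.io.*;
import java.net.*;

public class ReadyHandshake {
    public static void main(String[] args) throws Exception {
        // Bind to port 0: the system assigns a free port, so there's
        // no fixed-port collision and nothing to guess.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();  // known once the socket exists

            // Start the "client" only after the server socket is ready;
            // no sleep, no race.
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("localhost", port);
                     DataInputStream in =
                         new DataInputStream(s.getInputStream())) {
                    System.out.println("client got: " + in.readUTF());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            client.start();

            try (Socket s = server.accept();
                 DataOutputStream out =
                     new DataOutputStream(s.getOutputStream())) {
                out.writeUTF("hello");
            }
            client.join();
        }
    }
}
```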

This is clearly another example of the fragility of shell script tests. A single character editing error in the script destroyed this test’s results!

Further Reading

  1. Testing Pitfalls (pdf). I gave this presentation at the TestFest around the time of Devoxx UK in March, 2013. This presentation has a lot of material on the problems that can arise with shell script tests in OpenJDK that use the jtreg test harness.
  2. Jon Gibbons (maintainer of jtreg) has an article entitled Shelling Tests that also explains some of the issues with shell script tests. It also describes the progress being made converting shell tests to Java in the langtools repository of OpenJDK.

Read Full Post »

This year’s JavaOne has begun! The keynotes were today, in Moscone for the first time since Oracle acquired Sun. It was a bit strange, having a little bit of JavaOne in the midst of Oracle OpenWorld. Red was everywhere. The rest of the week, JavaOne is in The Zone in the hotels by Union Square.

As usual, I’m involved in several activities at JavaOne. Oddly enough I’m not on any normal technical sessions. But I have one of almost everything else: a tutorial, a BOF, and a Hands-on lab.

Jump-starting Lambda Programming [TUT3877] – 10:00am-12:00pm Monday.

A gentle introduction to lambdas. This is early in the schedule, so you should start here, and then progress to some of the other, more advanced lambda sessions later in the conference.

[update] Slides available here: TUT3877_Marks-JumpStartingLambda-v6.pdf

Ten Things You Should Know When Writing Good Unit Test Cases in Java [BOF4255] – 6:30pm-7:15pm Monday.

Paul Thwaite (IBM) submitted this and invited me to contribute. I think we have some good ideas to share. Ideally a BOF should be a conversation among the audience members and the speakers. This might be difficult, as it looks like over 250 people have signed up so far! It’s great that there’s so much interest in testing.

[update] Paul has posted his slides.

Lambda Programming Laboratory [HOL3970] – 12:30pm-2:30pm Wednesday.

Try your hand at solving a dozen lambda-based exercises. They start off simple but they can get quite challenging. You’ll also have a chance to play with a JavaFX application that illustrates how some Streams library features work.

[update] I’ve uploaded the lab exercises in the form of a NetBeans project (zip format). Use the JDK 8 Developer Preview build or newer and use NetBeans 7.4 RC1 or newer.

Java DEMOgrounds in the Exhibition Hall – 9:30am-5:00pm Monday through Wednesday.

The Java SE booth in the DEMOgrounds has a small lambda demo running in NetBeans. I wrote it (plug, plug). I plan to be here from 2:00pm-3:00pm on Monday (the dedicated exhibition hours, when no sessions are running) so drop by to chat, ask questions, or to play around with the demo code.

Enjoy the show!

Read Full Post »

I gave a talk at Devoxx UK 2013 entitled Accelerated Lambda Programming. Here is the slide presentation from that talk.

There are just a few introductory slides in the slide deck, after which most of the talk consisted of live programming demos in NetBeans. Below is the sample code from the demo, cleaned up, merged into a single file, and updated for Lambda build b88. The conference was several weeks ago, and I did the demos using build b82. The APIs have changed a little bit, but not that much, certainly much less than the amount they changed in the few weeks leading up to b82.

(Here is a link to JDK 8 early access builds with lambda support. The lambda support is at this writing still being integrated into the JDK 8 mainline, so it may still be a few weeks before you can run this code on the mainline JDK 8 builds. Also, here is a link to NetBeans builds with lambda support. Most recent builds should work fine.)

I’ve included extensive annotations along with the code that attempt to capture the commentary I had made verbally while giving the talk. At some point the video is supposed to be posted, but it isn’t yet (and in any case subscription might be required to view the video).

The APIs are still in flux and may change further. Please let me know if you have trouble getting this code to work. When all the lambda APIs are integrated into the JDK 8 mainline, I’ll update the sample code here if necessary.

Enjoy!
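[update] A couple of the APIs used in the code below changed again between b88 and the final JDK 8 release. The substream(begin, end) operation was removed; use skip(begin) followed by limit(end - begin) instead. The three-argument LongStream.range(start, end, step) was also removed; a stepped sequence can be built with iterate() plus limit(). Here's a quick sketch of the equivalents for a released JDK 8 (the class name is just for illustration, not part of the original demo):

```java
import java.util.*;
import java.util.stream.*;

public class GaEquivalents {
    static List<String> strings = Arrays.asList(
        "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten");

    // Replacement for strings.stream().substream(2, 5):
    // skip the first 2 elements, then keep the next 3.
    static List<String> slice() {
        return strings.stream()
                      .skip(2)
                      .limit(3)
                      .collect(Collectors.toList());
    }

    // Replacement for LongStream.range(start, start + count, 2L):
    // generate an unbounded stepped sequence and cap its length.
    static long[] everyOther(long start, long count) {
        return LongStream.iterate(start, n -> n + 2L)
                         .limit((count + 1) / 2)
                         .toArray();
    }

    public static void main(String[] args) {
        System.out.println(slice());                               // [three, four, five]
        System.out.println(Arrays.toString(everyOther(1L, 10L)));  // [1, 3, 5, 7, 9]
    }
}
```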

package com.example;

import java.io.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.*;

/**
 * Sample code from my "Accelerated Lambda Programming" talk at Devoxx UK,
 * March 2013. I went through most of the examples in the talk, but I think
 * I missed a couple. At least, I had intended to present all of the examples
 * shown here. (-:
 *
 * @author smarks
 */

public class AcceleratedLambda {
    static List<String> strings = Arrays.asList(
        "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten");

    // ========== SOURCES AND OPERATIONS ==========

    static void sources_and_operations() throws IOException {

        // Use a List as a stream source.
        // This just prints out each string.
        strings.stream()
               .forEach(x -> System.out.println(x));

        // Use an int range as a stream source. Note that parameter x is now
        // inferred to be an int whereas above it was a String.
        IntStream.range(0, 10)
                 .forEach(x -> System.out.println(x));

        // Use a string as a stream of chars
        // This prints integers 97, 98, 99, ... huh? The chars() method
        // provides an IntStream so println prints numbers.
        "abcdef".chars()
                .forEach(x -> System.out.println(x));

        // Cast values to char so that it prints 'a' through 'f'.
        "abcdef".chars()
                .forEach(x -> System.out.println((char)x));

        // Reads and prints each line of a text file.
        try (FileReader in = new FileReader("input.txt");
             BufferedReader br = new BufferedReader(in)) {
            br.lines()
              .forEach(x -> System.out.println(x));
        }

        // Change the terminal operation to collect the values
        // into a List.
        List<String> result =
            strings.stream()
                   .collect(Collectors.toList());
        System.out.println("result = " + result);

        // Collect the values into a Set instead of a List.
        // This will probably print the values in a different order,
        // since the set's iteration order is probably different from
        // the order of insertion.
        Set<String> result2 =
            strings.stream()
                   .collect(Collectors.toSet());
        System.out.println("result2 = " + result2);

        // Counts the number of values in the stream.
        long result3 =
            strings.stream()
                   .count();
        System.out.println("result3 = " + result3);

        // Check that every value matches a predicate.
        boolean result4 =
            strings.stream()
                   .allMatch(s -> s.length() < 10);
        System.out.println("result4 = " + result4);

        // Check that at least one value matches a predicate.
        boolean result5 =
            strings.stream()
                   .anyMatch(s -> s.length() > 5);
        System.out.println("result5 = " + result5);

        // Now add an intermediate operation: filter the stream
        // through a predicate, which passes through only the values
        // that match the predicate.
        List<String> result6 =
            strings.stream()
                   .filter(s -> s.length() > 3)
                   .collect(Collectors.toList());
        System.out.println("result6 = " + result6);

        // Map (transform) a value into a different value.
        List<String> result7 =
            strings.stream()
                   .map(s -> s.substring(0,1))
                   .collect(Collectors.toList());
        System.out.println("result7 = " + result7);

        // Take a slice of a stream.
        // Also try substream(n) and limit(n).
        List<String> result8 =
            strings.stream()
                   .substream(2, 5)
                   .collect(Collectors.toList());
        System.out.println("result8 = " + result8);

        // Add intermediate operations to form a longer pipeline.
        // The peek method calls the lambda with each value as it passes by.
        List<String> result9 =
            strings.stream()
                   .filter(s -> s.length() > 4)
                   .peek(s -> System.out.println("  peeking at " + s))
                   .map(s -> s.substring(0,1))
                   .collect(Collectors.toList());
        System.out.println("result9 = " + result9);

        // Most operations are stateless in that they operate on each
        // value as it passes by. Some operations are "stateful". The
        // distinct() operation builds up a set internally and passes
        // through only the values it hasn't seen yet.
        List<String> result10 =
            strings.stream()
                   .map(s -> s.substring(0,1))
                   .distinct()
                   .collect(Collectors.toList());
        System.out.println("result10 = " + result10);

        // The sorted() operation is also stateful, but it has to buffer
        // up all the incoming values and sort them before it can emit
        // the first value downstream.
        List<String> result11 =
            strings.stream()
                   .map(s -> s.substring(0,1))
                   .sorted()
                   .collect(Collectors.toList());
        System.out.println("result11 = " + result11);
    }

    // ========== SEQUENTIAL, LAZY, AND PARALLEL PROCESSING ==========

    // A fairly stupid primality tester, useful for consuming a lot
    // of CPU given a small amount of input. Don't use this code for
    // anything that really needs prime numbers.
    static boolean isPrime(long num) {
        if (num <= 1)
            return false;

        if (num % 2 == 0)
            return num == 2;

        long limit = (long)Math.sqrt(num);
        for (long i = 3L; i <= limit; i += 2L) {
            if (num % i == 0)
                return false;
        }
        return true;
    }

    public static void lazy_and_parallel() {
        // Adjust these parameters to change the amount of CPU time
        // consumed by the prime-checking routine.
        long start = 1_000_000_000_000_000_001L; // 10^18 (quintillion) + 1
        long count = 100L;

        // Sequential check. Takes about 30 seconds on a 2009 MacBook Pro
        // (2.66GHz Core2Duo). There should be four primes found.
        long time0 = System.currentTimeMillis();
        LongStream.range(start, start + count, 2L)
                  .filter(n -> isPrime(n))
                  .forEach(n -> System.out.println(n));
        long time1 = System.currentTimeMillis();
        System.out.printf("sequential: %.1f seconds%n", (time1-time0)/1000.0);

        // Truncate the stream after three results. The full range need not
        // be checked, so this completes more quickly (about 23 seconds).
        LongStream.range(start, start + count, 2L)
                  .filter(n -> isPrime(n))
                  .limit(3)
                  .forEach(n -> System.out.println(n));
        long time2 = System.currentTimeMillis();
        System.out.printf("limited: %.1f seconds%n", (time2-time1)/1000.0);

        // Run the full range in parallel. With two cores, this takes about
        // half the time of the first run (plus some overhead), typically
        // completing in about 16 seconds. Note that results will probably
        // be printed in a different order.

        // Where are the threads? A parallel stream is split into tasks
        // which are executed by the "common fork-join thread pool," new in
        // Java SE 8. See java.util.concurrent.ForkJoinPool.
        LongStream.range(start, start + count, 2L)
                  .parallel()
                  .filter(n -> isPrime(n))
                  .forEach(n -> System.out.println(n));
        long time3 = System.currentTimeMillis();
        System.out.printf("parallel: %.1f seconds%n", (time3-time2)/1000.0);
    }

    // ========== REDUCTION ==========

    static List<String> words = Arrays.asList(
        "Experience", "is", "simply", "the", "name",
        "we", "give", "our", "mistakes."); // Oscar Wilde

    // Compute the sum of the lengths of the words.
    // The non-streamy approach, using a for-loop.
    static void length0() {
        int total = 0;
        for (String s : words)
            total += s.length();
        System.out.println(total);
    }

    // An attempt at a streamy approach, mutating a captured local variable.
    // DOES NOT WORK. Captured locals cannot be mutated (they must be
    // effectively final).

    // If captured locals could be mutated, they would need to outlive
    // their enclosing scope, thus they'd have to reside on the heap. This
    // implies (a) they'd be visible from multiple threads and be subject
    // to race conditions; and (b) they'd be susceptible to memory leaks.
    // See: http://www.lambdafaq.org/
    //          what-are-the-reasons-for-the-restriction-to-effective-immutability/

//    static void length1() {
//        int total = 0;
//        words.stream()
//             .map(s -> s.length())
//             .forEach(len -> total += len);
//        System.out.println(total);
//    }

    // Work around the inability to mutate a captured local by using a
    // single-element array. DO NOT DO THIS. THIS IS BAD STYLE. This basically
    // buys into all the disadvantages local variables would have if they
    // were moved to the heap. Array elements cannot be synchronized, and
    // they cannot be volatile either, so it is essentially impossible to
    // write race-free algorithms with them. AGAIN, DO NOT DO THIS. YOU WILL
    // GET BURNED.

    static void length2() {
        int[] total = new int[1];
        words.stream()
             .map(s -> s.length())
             .forEach(len -> total[0] += len);
        System.out.println(total[0]);
    }

    // Work around the inability to mutate a captured local variable by
    // mutating an AtomicInteger. This is allowed, since the reference to
    // the AtomicInteger is final, but the AtomicInteger itself can be
    // mutated. What's more, it can be mutated safely from multiple threads.
    // This works, but is poor style, as it results in contention among
    // the threads all attempting to mutate the same variable. See
    // slides 8-9.

    static void length3() {
        AtomicInteger total = new AtomicInteger(0);
        words.stream()
             .map(s -> s.length())
             .forEach(n -> total.addAndGet(n));
        System.out.println(total.get());
    }

    // Summation using reduction. See slides 10-14.
    static void length4() {
        int total =
            words.stream()
                 .map(s -> s.length())
                 .reduce(0, (i, j) -> i + j);
        System.out.println(total);
    }

    // Use method reference instead of a lambda for addition.
    static void length5() {
        int total =
            words.stream()
                 .map(s -> s.length())
                 .reduce(0, Integer::sum);
        System.out.println(total);
    }

    // Use the convenience method sum() instead of an explicit reduce().
    // Note that we've switched from map() to mapToInt() here. The
    // plain map() results in a Stream<Integer>, which works fine above,
    // but it adds boxing and unboxing overhead. Using mapToInt()
    // results in an IntStream, which is not only more efficient but
    // also has the convenience sum() method on it.
    static void length6() {
        int total =
            words.stream()
                 .mapToInt(s -> s.length())
                 .sum();
        System.out.println(total);
    }

    // ========== GROUPING ==========

    // The next two grouping examples illustrate "mutable reduction"
    // operations. (See the java.util.stream package documentation.)
    // Many kinds of reductions, such as summation, combine values to
    // create new values. Sometimes we want to build up a data structure
    // like a map. We could combine maps to create new maps, but this would
    // result in excessive copying. Instead, we perform careful mutation in
    // a collect() call at the end of a pipeline.

    // A "Collector" is an object that represents a set of functions that
    // handle the accumulation and combining of intermediate results. Note
    // that even when the reduction runs in parallel, the intermediate
    // containers need not be thread-safe: each one is confined to a single
    // thread, and the framework combines the partial results afterward.

    // A full explanation of a Collector is beyond the scope of this example.
    // A set of prepared Collector objects can be obtained from the Collectors
    // utility class. We will show a particular kind of Collector that does
    // grouping. The idea is, for each value in the stream, a "classifier"
    // function is run over it. Typically, multiple values in the stream will
    // produce the same result from the classifier function. The values from
    // the stream are then gathered into a Map, whose keys are the results
    // of the classifier function, and whose values are lists of values that
    // correspond to the classifier results.

    // This example groups words by their first letter. Thus, given a stream
    // of strings
    //
    //     one two three four five six seven eight nine ten
    //
    // the resulting map would have key-value pairs
    //
    //     "t" => ["two", "three", "ten"]
    //     "f" => ["four", "five"]
    //     "o" => ["one"]
    //
    // and so forth.

    static void grouping1() {
        Map<String, List<String>> grouping1 =
        strings.stream()
               .collect(Collectors.groupingBy(s -> s.substring(0,1)));
        System.out.println("grouping1 = " + grouping1);
    }

    // The example above has a hard-coded policy of accumulating
    // the grouped stream values into a list. What if we didn't want
    // a list, but instead we wanted to combine the grouped values
    // together?

    // The groupingBy() method has an overload that takes a "downstream"
    // collector, which combines each grouped value with the other values
    // in the same group.

    // In this example we don't want to combine all the grouped values into
    // a list, but instead we want to get the sum of their lengths. To
    // do this, we use a similar groupingBy() call, but add a second
    // argument Collectors.reducing() to which we specify how to combine
    // (reduce) the values. For a reduction we have to provide an initial
    // value of zero; the second argument is how to get the length of a
    // single string, and the third argument is how to combine the length
    // of the current string with the running total so far. Note, this
    // reduction occurs *within* each group.

    // Thus, the result is a Map whose values aren't lists, but instead are
    // integers representing the sums of the lengths of the strings in
    // that group:
    //
    //     "t" => 11
    //     "f" => 8
    //     "o" => 3
    //
    // and so forth.

    static void grouping2() {
        Map<String, Integer> grouping2 =
        strings.stream()
               .collect(Collectors.groupingBy(s -> s.substring(0,1),
                        Collectors.reducing(0, s -> s.length(), Integer::sum)));
        System.out.println("grouping2 = " + grouping2);
    }

    // Comment or uncomment the statements below to control
    // what you want to run.
    public static void main(String[] args) throws IOException {
        // sources_and_operations();
        // lazy_and_parallel();
        // length0();
        // length2();
        // length3();
        // length4();
        // length5();
        // length6();
        // grouping1();
        grouping2();
    }
}
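One aside that didn't make it into the talk: before reaching for the prepared Collectors, it helps to see mutable reduction in its rawest form. Stream also has a three-argument collect(supplier, accumulator, combiner) that spells out the three functions directly. Here's a minimal, self-contained sketch (the class name is just for illustration):

```java
import java.util.*;

public class MutableReductionSketch {
    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
            "one", "two", "three", "four", "five");

        // supplier:    creates a fresh result container for each chunk
        //              of a (possibly parallel) stream
        // accumulator: folds one element into a container
        // combiner:    merges two containers when chunks are joined
        List<String> copy =
            strings.stream()
                   .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);

        System.out.println(copy); // [one, two, three, four, five]
    }
}
```

Collectors.toList() is essentially a packaged-up version of these same three functions.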

Read Full Post »

I’m finally catching up with my backlog of items dating back to Devoxx UK 2013, which was in March!

There were a couple of testing-related events I participated in. The first was an OpenJDK TestFest, sponsored by the London Java Community, IBM, and Oracle. This wasn’t officially part of Devoxx. It was held at the Oracle offices in London the Saturday prior to Devoxx itself. There were several presentations; I gave a brief talk on OpenJDK Testing Pitfalls (slides). People spent some time hacking on actual tests, but I thought the presentations and the ensuing discussion were very helpful as well.

The other testing-related event I participated in was an OpenJDK BOF with Martijn Verburg. Hm, it was entitled “OpenJDK Hack Session,” but there wasn’t that much hacking there, though Martijn did demonstrate the new OpenJDK build system. I presented some additional material on Testing OpenJDK (slides). This presentation was less about writing tests for OpenJDK than about the difficulties we have testing OpenJDK. The biggest problem, I think, is unreliable tests. One would hope that a failing test means there is a bug in the system under test. Unfortunately, in OpenJDK we have a lot of tests that are only 99% reliable. If you run the test suite regularly, especially across all platforms, it becomes very unlikely that you’ll ever get a run with 100% of tests passing, even if there are no bugs in the system. Worse, there are bugs in the system, so test failures caused by actual bugs are mixed in with spurious test failures.
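To put some numbers on that: if each flaky test passes independently, the chance of an all-green run is the product of the individual pass rates, and it decays quickly. A quick back-of-the-envelope calculation (the 99% figure matches the reliability mentioned above; the test counts are illustrative):

```java
public class FlakyRuns {
    public static void main(String[] args) {
        // If k tests each pass independently 99% of the time, the
        // probability that a run comes back completely green is 0.99^k.
        for (int k : new int[] { 10, 50, 200 }) {
            System.out.printf("%3d flaky tests -> P(all green) = %.3f%n",
                              k, Math.pow(0.99, k));
        }
        // Even 50 such tests drop the odds of a clean run to about 60%,
        // and a few hundred make clean runs rare.
    }
}
```

So even a small fraction of unreliable tests in a 4,000-test suite is enough to keep nearly every run from being completely clean.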

You can see this in the test results that Balchandra Vaidya has been posting to the OpenJDK quality-discuss mailing list. See the jdk8 b88 test results posting, for example. If you click through some links to find the Results Archive page, you’ll see that there have been 12-17 failures out of 4,000 or so tests in the JDK test suite, for the past twenty or so JDK 8 builds. Worse, they haven’t been the same failures every time, since the code and tests are constantly being modified, and the set of tests being run is shifting around as well.

This is clearly a serious problem, one that I and others at Oracle hope to make progress on in the coming months.

Read Full Post »

« Newer Posts - Older Posts »