Posts tagged ‘Java’
The First Multi-Lingual, Valid Measure of CS1 Knowledge: Allison Tew Defends
Allison Elliott Tew has been working for five years to figure out how we can compare different approaches to teaching CS1. As Alan Kay noted in his comments to my recent post on computing education research, there are lots of factors, like who is taking the class and what they’re doing in the class. But to make a fair comparison in terms of the inputs, we need a stable measure of the output. Allison made a pass in 2005, but became worried when she couldn’t replicate her results in later semesters. She decided that the problem was that we had no scientific tool that we could rely on to measure CS1 knowledge. We have had no reliable, valid way of measuring what students learn in CS1 that was independent of language or approach. Allison set out to create one.
Allison defends this week. She took a huge gamble — at the end of her dissertation work, she collected two multiple choice question exams from each of 952 subjects. If you get that wrong, you can’t really try again.
She doesn’t need to. She won.
Her dissertation had three main questions.
(1) How do you do this? All the standard educational assessment methods involve comparing new methods to old methods in order to validate them. How do you bootstrap a new test when one has never been created before? She developed a multi-step process for validating her exam, and she carefully defined the range of the test using a combination of text analysis and curriculum standards.
(2) Can you use pseudo-code to make the test language-independent? First, she developed 3 open-ended versions of her test in MATLAB, Python, and Java, then had subjects take those. By analyzing those, she was able to find three distractors (wrong answers) for every question that covered the top three wrong answers in each language — which by itself was pretty amazing. I wouldn’t have guessed that the same mistakes would be made in all three languages.
Then she developed her pseudo-code test. She ran subjects through two sessions (counter-balanced). In one session, they took the test in their “native” language (whatever their CS1 was in), and in another (a week later, to avoid learning effects), the pseudo-code version.
The pseudo-code and native language tests were strongly correlated. The social scientists say that, in this kind of comparison, a correlation statistic r over 0.37 is considered the same test. She beat that on every language.
Notice that the Python correlation was only .415. She then split out the Python CS1 with only CS majors, from the one with mostly non-majors. That’s the .615 vs. the .372 — CS majors will always beat non-majors. One of her hypotheses was that this transfer from native code to pseudo-code would work best for the best students. She found that that was true. She split her subjects into quartiles and the top quartile was significantly different than the third, the third from the second, and so on. I think that this is really important for all those folks who might say, “Oh sure, your students did badly. Our students would rock that exam!” (As I mentioned, the average score on the pseudo-code test was 33.78%, and 48.61% on the “native” language test.) Excellent! Allison’s test works even better as a proxy test for really good students. Do show us better results, then publish it and tell us how you did it!
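For readers who want to see what the correlation statistic r in that comparison computes, here is a minimal sketch in Python. It is not Allison's analysis code: the function name `pearson_r` and the five pairs of scores are made up purely for illustration, and the only number taken from the post is the 0.37 threshold.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores (fraction correct) for five made-up students.
# These are NOT data from the dissertation.
native_scores = [0.30, 0.45, 0.50, 0.65, 0.80]
pseudo_scores = [0.20, 0.35, 0.30, 0.50, 0.60]

SAME_TEST_THRESHOLD = 0.37  # the cutoff quoted in the post

r = pearson_r(native_scores, pseudo_scores)
print("r =", round(r, 3), "above threshold:", r > SAME_TEST_THRESHOLD)
```

A real analysis would pair each subject's native-language score with that same subject's pseudo-code score and would also report significance, which this sketch leaves out.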
(3) Then comes the validity argument: is this test really testing what’s important? Is it a good test? Like I said, she had a multi-step process. First, she had a panel of experts review her test for reasonableness of coverage. Second, she did think-alouds with 12 students to make sure that they were reading the exam the way she intended. Third, she ran IRT analysis to show that her problems were reasonable. Finally, she correlated performance on her pseudo-code test (FCS1) with the final exam grades. That one is the big test for me: is this test measuring what we think is important, across two universities and four different classes? Another highly significant set of correlations, but it’s this scatterplot that really tells the story for me.
Next, Allison defends, and takes a job as a post-doc at University of British Columbia. She plans to make her exam available for other researchers to use — in comparison of CS1 approaches and languages. Want to know if your new Python class is leading to the same learning as your old Java class? This is your test! But she’ll never post it for free on the Internet. If there’s any chance that a student has seen the problems first, the argument for validity fails. So, she’ll be carefully controlling access to the test.
Allison’s work is a big deal. We need it in our “Georgia Computes!” work, as do our teachers. As we change our approaches to broaden participation, we need to show that learning isn’t impacted. In general, we need it in computing education research. We finally have a yardstick by which we can start comparing learning. This isn’t the final and end-all assessment. For example, there are no objects in this test, and we don’t know if it’ll be valid for graphical languages. But it’s the first test like this, and that’s a big step. I hope that others will follow the trail Allison made so that we end up with lots of great learning measures in computing education research.
Exciting new paper on MediaComp with Majors and Year-Later Results
Beth Simon just let me know that her paper has just been accepted to ITiCSE 2010. She shared the submitted draft with me, and I’ve been biting my lip, wanting to talk about it here. Now that it’s accepted, I can talk about it, while still leaving the real thunder for Beth’s paper and her presentation this summer. For me, it’s exciting to see two years’ worth of data with CS majors, including following the students into their second year. Beth deals head-on with one of the criticisms of Media Computation (e.g., no, it’s not a tour of all-things-Java; you won’t cover as many language features as you used to) and provides the answers that really matter (e.g., you retain more students, they learn more about problem-solving, and they do really well in the next course). I’ll quote her abstract here:
Previous reports of a media computation approach to teaching programming have either focused on pre-CS1 courses or courses for non-majors. We report the adoption of a media computation context in a majors’ CS1 course at a large, selective R1 institution in the U.S. The main goal was to increase retention of majors, but do so by replacing the traditional CS1 course directly (fully preparing students for the subsequent course). In this paper we provide an experience report for instructors interested in this approach. We compare a traditional CS1 with a media computation CS1 in terms of desired student competencies (analyzed via programming assignments and exams) and find the media computation approach to focus more on problem solving and less on language issues. In comparing student success (analyzed via pass rates and retention rates one year later) we find pass rates to be statistically significantly higher with media computation both for majors and for the class as a whole. We give examples of media computation exam questions and programming assignments and share student and instructor experiences including advice for the new instructor.
Language Choice = f(Number of Copies)
Last night, a user reported a bug in our latest version of JES, the Jython IDE that we use in our Media Computation classes. In cleaning up the code for release, one of the developers renamed the short variable “pict” to “picture” in all but one spot. The function that broke (with a “name not found” error in the Jython function) is writePictureTo, a really important function for being able to share the images resulting from playing with Media Computation. This was particularly disappointing because this release was a big one (e.g., moving from one-based to zero-based indexing) and was our most careful development effort (e.g., long testing cycle with careful bug tracking). But at the end, there was a “simple clean-up” that certainly (pshaw!) wasn’t worth re-running the regression tests, or so the developer thought. And now, Versions 3.2.1 and 4.2.1 (for zero- and one-based indexing in the media functions) will be out later today.
This has got me wondering about the wisdom of developing an application used by hundreds, if not thousands, of students in Python (or Jython). I’ve done other “largish” (defined here, for a non-Systems-oriented CS professor, as “anything that takes more than three days to code”) systems in Python. I built a case library which generated multiple levels of scaffolding from a small set of base case material, called STABLE. Running the STABLE generator was aggravating because it would run for a while…then hit one of my typos. Over and over, I would delete all the HTML pages generated so far, make the 5 second fix, and start the run all over. It was annoying, but it wasn’t nearly as painful as this bug, which requires everyone who downloaded JES 3.2/4.2 to download it again.
I’m particularly sensitized to this issue after this summer, where I taught workshops (too often) where I literally switched Python<->Java every day. I became aware of the strengths and weaknesses of each for playing around with media. Python is by far more fun for trying out a new idea, generating a new kind of sound or image effect. But this bug wouldn’t have happened in Java! The compiler would have caught the misnamed variable. I built another “largish” system in Squeak (Swiki), which also would have caught this bug at compile time.
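To make the shape of that bug concrete, here is a minimal Python sketch. It is a hypothetical reconstruction, not the JES source: `write_picture_to` is a stand-in name, and the function body is invented for illustration. The point is that Python byte-compiles the broken code without complaint, and the leftover name only fails when the line actually runs.

```python
# Hypothetical reconstruction of the bug's shape; this is not the JES source.
BUGGY_SOURCE = '''
def write_picture_to(picture, path):
    # `pict` was renamed to `picture` everywhere except this one line.
    return "saving %s to %s" % (pict, path)
'''

# Step 1: byte-compiling succeeds, because Python does not check at
# compile time whether a name will be bound when the line runs.
code = compile(BUGGY_SOURCE, "<sketch>", "exec")

# Step 2: running the module body only defines the function, so still no error.
namespace = {}
exec(code, namespace)

# Step 3: the mistake surfaces only when the broken line is executed.
try:
    namespace["write_picture_to"]("beach.jpg", "out.jpg")
    error_message = None
except NameError as err:
    error_message = str(err)

print(error_message)
```

This is why skipping the regression tests after a rename is risky in a dynamic language: only a test that actually calls the function would reach the broken line.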
My growing respect for good compilers doesn’t change my attitude about good first languages for students of computing. The first language should be fun, with minimal error messages (even at compile time), with rapid response times and lots of opportunities for feedback. So where does one make the transition, as a student? Why is it important to have good compilers in one place and not in the other?
I am not a software engineering researcher, so I haven’t thought about this as deeply as they have. My gut instinct is that your choice of language is a function (at least in part) of the number of copies of the code that will ever exist. If you’re building an application that’s going to live on hundreds, thousands, or millions of boxes, then you have to be very careful: correcting a bug is very expensive. You need a good compiler helping you find mistakes. However, if you’re building an application for the Web, I can see why dynamic, scripting languages make so much sense. They’re fun and flexible (letting you build new features quickly, as Paul Graham describes), and fixing a bug is cheap and easy. If there’s only one copy of the code, it’s as easy as fixing a piece of code for yourself.
First-time programmers should only be writing code for themselves. It should be a fun, personal, engaging experience. They should use programming languages that are flexible and responsive, without a compiler yelling at them. (My students using Java always complain about “DrJava’s yelling at me in yellow!” when the error system highlights the questionable line of code.) But they should also be told in no uncertain terms that they should not believe that they are creating code for others. If they want to produce application software for others, they need to step up to another level of discipline and care in what they do, and that usually means new tools.
I still strongly believe that the first course in computing should not be a course in software engineering. Students should not have to learn the discipline of creating code for others, while just starting to make sense of the big ideas of computing. The first course should be personal, about making code for your expression, your exploration, and your ideas. But when students start building code for others, engineering practice and discipline is required. Just don’t start there.
“Exploring Wonderland” is out: Encouraging transfer between Alice and Java
I just got my copy of the new book by Wanda Dann, Steve Cooper, and Barbara Ericson, “Exploring Wonderland.”

I’m really interested to see how this book works in classrooms. As the title suggests, the book integrates Alice and Java programming with Media Computation. It’s not 1/2 Alice and 1/2 Java. Rather, both are integrated around the context of storytelling. You might use Media Computation to create grayscale images or sounds at different frequencies or echoes in your Alice stories. Or you might use Alice to create perfect greenscreens for doing chromakey in Media Computation. Students can put themselves into an Alice movie, or take Alice characters and have them interact with live action video. This isn’t Java to learn Java. This is Java as the special effects studio for Alice storytelling.
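For readers who haven't seen these effects, here is a rough sketch of grayscale and chromakey on a tiny list of pixels. It is written in plain Python rather than the book's Java, it uses no Media Computation or Alice library calls, and the "green enough" rule in `is_green_screen` is an assumption chosen for simplicity rather than the book's method.

```python
def to_grayscale(pixels):
    """Replace each (r, g, b) pixel with its average intensity on all channels."""
    result = []
    for r, g, b in pixels:
        gray = (r + g + b) // 3
        result.append((gray, gray, gray))
    return result

def is_green_screen(pixel):
    """Assumed test for 'green enough': green exceeds red and blue combined."""
    r, g, b = pixel
    return g > r + b

def chromakey(foreground, background):
    """Swap green-screen pixels in foreground for the matching background pixel."""
    return [bg if is_green_screen(fg) else fg
            for fg, bg in zip(foreground, background)]

# Made-up three-pixel "images": two green-screen pixels around one actor pixel.
foreground = [(10, 250, 20), (200, 180, 160), (0, 255, 0)]
background = [(30, 60, 90), (31, 61, 91), (32, 62, 92)]

composite = chromakey(foreground, background)
gray = to_grayscale(composite)
print(composite)
print(gray)
```

A perfectly uniform greenscreen, like one rendered in Alice, is what makes a simple threshold like this workable; real video footage usually needs a more forgiving test.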
The order of the book goes back-and-forth. First, students use Alice to learn about variables and objects, then they do the same thing with turtles in Java. Back to Alice for iteration and conditionals, then see the same things in Java. There’s a real effort to encourage transfer between the two languages.
That explicit effort to transfer within a context is what makes this effort so interesting. Efforts that I’ve seen at Georgia Tech to teach two languages in a first course have failed. It’s just too hard to learn any one thing well enough to get it to transfer. The advantage of a contextualized computing education approach is that it encourages higher time-on-task: we know from studies at multiple schools with multiple contexts that students will do more with the context if they buy into it, if they’re engaged. Will storytelling work to get students to engage so that the first language is learned well enough to transfer to the second? And if so, do the students end up learning more because they have this deeper, transferable knowledge?
Nudging Computing Education
I’m spending Father’s Day reading. Just finished Terry Pratchett’s Equal Rites (the first appearance of Granny Weatherwax, which I had never read before), and have now just started Nudge: Improving Decisions about Health, Wealth, and Happiness by Thaler and Sunstein. I’d heard of behavioral economics before, especially in the context of how these ideas are influencing the Obama administration. I’m recognizing implications for computing education as well.
The basic premise of behavioral economics is that people are bad decision makers, and those decisions are easily biased by factors like the ordering of choices. Consider the choice between a cupcake and a piece of fruit. The worse choice there only has consequences much later and the direct feedback (“You gained weight because you chose the cupcake!”) is weak. Thaler and Sunstein promote libertarian paternalism. The idea is that we want to offer choices to people, but most people will make bad choices. Libertarian paternalism suggests that we make the default or easiest choice the one which we (paternalistically) define as the best one — that’s a nudge. It’s not always easy to decide which is the best choice, and we want to emphasize making choices that people would make for themselves (as best as we can) if they had more time and information.
An obvious implication for computing education is our choice of first programming language. Alan Kay has pointed out many times that people are sometimes like Lorenz’s ducks, who were convinced that Lorenz was their parent: people “imprint” on the first choice they see. Thaler and Sunstein would probably agree that the first language someone learns will be their default choice when facing a new problem. We want to make sure that that’s a good default choice.
How do we choose the first, “best choice” language? If our students are going to become software engineers, then choosing a language which is the default (most common, most popular) in software engineering would make sense: C++ or Java. But what if our students are not going to become software engineers? Then we’ve made their first language harder to learn (because it’s always harder as a novice to learn the tool used by experts), and the students don’t have the vocational aspirations to make the extra effort worthwhile. That choice might then lead to higher failure/withdrawal rates and students regretting trying computer science. Hmm, that seems familiar…
Another choice might be to show students a language in which the best thinking about computer science is easiest. For example, Scheme is a great language for pointing out powerful ideas in computer science. I believe that Structure and Interpretation of Computer Programs by Abelson and Sussman is the best computer science textbook ever written. Its power stems, in part, from its use of Scheme for exemplifying its ideas.
The challenge of using Scheme is that it is not naturally the language of choice for whatever problem comes the student’s way. Sure, you can write anything in Scheme, but few people do, even people who know Scheme. Libraries of reusable tools that make it easy to solve common problems tend to appear in the languages that more people are using. If students were well-informed (or are/become informed), would they choose Scheme? If the answer to that question is “No,” the teacher appears coercive and constraining, and the course is perceived as being irrelevant. That’s another familiar story.
The ideas of Nudge have implications for teachers, too. I am on the Commission to design the new Advanced Placement (AP) Computer Science exam in “Computer Science: Principles.” (This exam is in contrast to the existing Level A CS AP exam in computer science programming in Java.) We just met for the first time this last week. There will be programming in the new APCS exam, and there’s interest in providing teachers with choices of what language they teach. Providing infinite choice makes it really hard to write a standardized, national exam. Teachers will likely be offered a menu of choices. How will those choices be ordered? How will teachers make these choices? While there are some wonderful high school teachers, there are too few high school CS teachers. The new APCS exam will only be successful if most of the teachers offering it are brand new to computer science. These teachers need help in making these choices, with reasonable default values, because they simply won’t have the experience yet to make well-informed choices.

