I’m acting as one of the ICER 2025 program chairs, and that has me thinking about reviewer workload in Computing Education Research. (This post is entirely my own personal views, and does not represent the views of the other organisers, ICER as a conference, SIGCSE, etc.)
When we think about reviewing, what we all want is, of course, to get our papers accepted; failing that, a review system that is expert, fair, consistent, and provides reasoning behind its decisions. As we all know, reviewers are not paid; there is a quid pro quo that academics who write papers must also review papers in order to keep the system balanced. Since not everyone writing papers (e.g. early PhD students) is qualified to review, academics must in general write more reviews than they receive each year. That is a lot of reviewing, so it is worth thinking about how much time it takes us all.
The status quo
I don’t teach, which I figure gives me a bit more time than other people, and I am bad at saying no, so last year I reviewed or meta-reviewed for the SIGCSE Technical Symposium, SIGCSE Virtual, Koli Calling, UKICER, ICER, TOCE and CSE. I’ve also reviewed in recent years for ITiCSE, WiPSCE and CEP. So I have reviewed for most of the Computing Education Research venues (as both reviewer and meta-reviewer), and most of the conferences have gradually converged on quite homogeneous review processes. Papers are reviewed using a review form with scores and text over a 3-4 week period, reviewers discuss for 1-2 weeks supervised by meta-reviewers, meta-reviewers summarise the discussion into a meta-review with a score and text, and then the program chairs use all this information to choose which papers to accept.
All of this takes time for the people involved. And as the process has grown more elaborate, the time required has also grown. All of the changes have been made with good intentions: a more consistent and higher quality set of reviews. But designing a review system reminds me of designing software: everyone proposes new features, each of which is small and makes sense on its own, but if you add them all then the whole design becomes overburdened and eventually collapses in on itself. In the case of reviewing systems, people will simply refuse to review, and the system will fail to operate. I think we are on the verge of that happening, so it is very important for publication venues to consider the workload they are placing on their volunteers. I have some thoughts about reducing reviewer workload, some of which we’re implementing at ICER, and some of which might be considered by other venues.
The review form
There are two attitudes to review forms:
- Here’s a single text box, and a score/recommendation dropdown. Go for it.
- Here’s a tightly structured review form with many sections and scores to fill in.
The first has definite advantages in terms of workload. The second is better at ensuring that all the reviewers are evaluating against the same criteria. ICER is very much in the second category. I think the review form can be seen as the central part of a publication venue: the criteria tell you what is valued and what kinds of paper are accepted. For this reason I have never understood why conferences don’t post it publicly; this year we’ve posted the ICER review form on the website.
The ICER review form has gradually been evolving in ways people might not have noticed. A few years ago reviewers (myself included!) found it frustrating to fill in the mandatory “Was theory used?” section for papers where it was inappropriate. That has already been changed to say “… if appropriate”, and merged into the prior-work section. Sections have also been added about reproducibility.
We’ve continued to change the form this year, but with an eye to shortening it. With some combining and rearranging we’ve removed two whole sections. We have also tried to place more emphasis on a paper’s contribution and on whether the paper’s claims align with its execution. Our feeling was that ICER had a slight bias towards papers that were executed well but did not make a major contribution, and against papers with a larger potential contribution but more caveats. We’ll see how it turns out.
The length limit
There are broadly two opinions on paper length limits: either specify one, or say “it should be reasonable according to the content”. TOCE are currently doing the latter, which I think is a reasonable choice for a journal. All conferences that I know of specify a limit. ICER is a bit of an outlier here, for two reasons:
- The limit was specified in words, not pages.
- It has a longer limit than everywhere else.
The idea of having a word limit rather than a page limit at ICER was a worthy experiment. It stops all kinds of busywork (shaving off those single trailing words at the ends of paragraphs, shrinking figures) but introduces other problems. Papers can be engorged with figures, which don’t count towards the limit (more on that below). And counting the words in a paper is surprisingly fiddly, causing a lot of stress for authors (and work for chairs!) to check it. ICER ended up with 1.5 pages of explanatory text about how to do a word count. So we’re fixing the word-count awkwardness (and the figure explosion) by going back to a page limit.
I think having a longer limit suits ICER. My vision for ICER is that it should be the flagship conference for research papers, and the kind of depth that ICER papers go into is suited to a longer limit. If you’re over 6 pages double-column excluding references, there are only a few venues to go to, and ICER is one of them. However, it’s clear that longer papers mean increased reviewer workload. How long should ICER papers be?
Some might say that even more detail leads to even higher quality papers, but first, there is surely a limit to that (should we all be submitting PhD theses as papers?), and second, it becomes too much for reviewers. When you are asked to review for a journal, you are asked to review one paper at a time. ICER asks for 4-6 reviews in a 3-4 week period. How can you expect quality reviews if the papers are all so long that reviewers barely have time to read them?
For example, I co-wrote a paper at ICER 2023 that ended up at 20 pages, double-column. That’s 29 pages in single-column: a journal-length paper. Author-me was happy with all the detail we could provide, and it was all within the rules. Chair-me knows that author-me needs to be stopped; we can’t be sending out so many papers of that length to reviewers.
So: it’s time to push back. We’re starting with an 18-page limit for single-column format (excluding references, but including everything else: figures, appendices, etc.). As a compromise for those who feel constrained by this, we are allowing an optional extra 3 pages, but you have to specifically argue for them in your submission form, and reviewers are allowed to ask for them to be removed. I’d like to reduce these limits further, but this year is at least a start in reversing direction on paper length.
Discussion periods
Personally, I am not sure that discussion periods are very helpful most of the time. Here are the outcomes that usually seem to happen:
- Reviewers change their scores immediately to match the other reviews. Maybe they are junior or lack confidence; maybe it’s just the general human instinct not to cause conflict or be the outlier. I am often reviewer 1 because I try to get reviews out of the way early (even though in spirit I’m a reviewer 2), and I frequently see review 2 or 3 get entered, then re-entered 2-3 minutes later with a score closer to mine. Maybe they’re swayed by my persuasive argument (I doubt it!), but it happens so quickly that I think they are just adjusting their score. This defeats the purpose of the discussion period, which is to discuss why we were so far apart.
- No-one really changes their mind after discussion. Maybe I’m too stubborn, but in general: I read the paper, I give my opinion, and I stand by it. Occasionally I miss something (e.g. an inappropriate analysis) and will tweak my opinion. I don’t see other people changing their minds too often either. Usually disagreements are not factual; they are caused by people having different views on whether something is the right topic for the venue, or whether the flaws in the paper outweigh its contribution, and so on. These are things on which reasonable people may differ, and at that point it is the chairs or meta-reviewers who must decide; the reviewers discussing it is futile, and the discussion often grinds to a halt because of that.
- Review scores get changed (perhaps at the meta-reviewer’s insistence) to coalesce around one number, which creates an impression of consensus. This “consensus” usually amounts to a reviewer saying “Ok, I’ll live with this being rejected if that’s what everyone else thinks, so I’ll change my score”. That could be achieved in other ways besides a discussion period.
I suggest that the discussion period isn’t doing much to improve review quality, but does add workload for everyone involved. Maybe everyone else’s discussions operate differently, or my impression is wrong: as chair this year I’ll get to see them all, so I can find out whether it’s just me acting as a bad reviewer or meta-reviewer, and get a wider view. I also note that journals don’t have discussion periods (often they don’t even let reviewers see each other’s reviews), and yet they are not generally criticised for a lack of review quality.
Binning the discussion period entirely might be a bit too dramatic for ICER, but we’re at least going to remove the obviously unnecessary part: there’s no point discussing if everyone already agrees. We’re hoping to do away with the futile “Hi, meta-reviewer here, you all seem to be in rough agreement but let’s start a discussion to see if we’ve missed anything” request. Discussion periods will only be used where they are needed, for the meta-reviewer to explore disagreements before writing their summary.
Review timelines
Having acted as meta-reviewer, editor, and program chair, I have seen the timelines on which reviews come in. In my experience, there are two types of reviewer: the one who returns their reviews in the first week, and the one who returns them in the last week before the deadline. I’m not passing judgement (both types do good work), just making an observation. If you give 3 weeks, hardly any reviews come in during the middle week. If you give 5 weeks, you won’t see many during the middle 3 weeks. My flippant-yet-kinda-serious suggestion is that we should give everyone 2 weeks for reviews. It wouldn’t fly, of course: everyone would be up in arms about the incredibly tight timeline. But my personal bet is that if that were just the way it was, it wouldn’t make a big difference to how most people organised their time for reviewing. We’re not implementing this one at ICER, though!
Also: if you do file reviews early within a long reviewing period, the discussion period becomes harder, because it can be 3-4 weeks since you read the paper (alongside 5+ others), so it’s more work to go back and look at it again to resolve any discussions.
Meta-reviewing
I’ve left my most extreme idea to last, and it’s one that may be better suited to smaller venues. Do you need meta-reviewers at all? I suggest that not every venue does. As I understand it, SIGCSE introduced them to try to increase consistency of reviewing. But SIGCSE is absolutely massive. If you are operating a regional conference that accepts 10 papers, do you need meta-reviewers?
One thing that I think is often overlooked is that meta-reviewers reduce your reviewer pool. If you ask all your most experienced volunteers to be meta-reviewers, your reviews will come from the less experienced volunteers, which might make it seem like the meta-reviewers are contributing by refining and improving those opinions… but what if you just had them acting as normal reviewers? Less workload all round, shorter timescales, and for a small conference the chairs can resolve any conflicting reviews themselves.
Conclusion
I think the workload placed on reviewers and meta-reviewers has become too high, and it’s time for program chairs everywhere to look at how to reduce it. We’ve tried to take some first steps at ICER: shortening the review form, shortening the papers, and removing unnecessary discussion. Other conferences might consider some or all of these, plus the more dramatic steps of doing away with discussion periods and/or meta-reviewers. These are my personal views, but I’m happy to hear yours in the comments, whether as an author, reviewer, or meta-reviewer.