Robo-graders are a recent phenomenon developed to score and provide feedback on essay writing. The notion of automated essay grading was first proposed some 50 years ago, but the computational power of the 1960s was insufficient to make it a reality (2). Today, robo-graders are being independently developed by a number of organizations, including Educational Testing Service, EdX and Pearson Education (1, 2).
With the technology now readily available, robo-graders are sparking debate about the traditionally subjective practice of essay grading. Robotic graders are beneficial insofar as they can turn a potentially biased, time-consuming task into an objective matter of seconds. Mark Shermis, dean of the College of Education at the University of Akron, says, “automated essay scoring is nonjudgmental. And it can be done 24/7. If students finish an essay at 10 p.m., they get feedback at 10:01” (2). The robot graders can score around 30 writing samples in an hour, far faster than a human grader (1).
Shermis calls the robot “nonjudgmental,” and in this regard superior to humans. Teachers are subject to diminishing returns: as they tire, they may grade essays less efficiently or less fairly. Factors that matter little to effective writing, such as handwriting, measurably influence human graders, whereas the robot, reading only electronic copies, is indifferent to neatness. “The reality is, humans are not very good at doing this,” said Steve Graham, a Vanderbilt University professor who researches essay-grading techniques (2).
ETS, the organization that administers standardized tests such as the GRE, has deployed its version of the robo-grader, called E-rater, to score a few standardized tests such as the TOEFL, a test of English fluency taken by foreign students (1, 2). For high-stakes exams, however, like those required for college admissions such as the SAT and ACT, human grading remains the standard (2). EdX, a nonprofit group created as a joint venture between Harvard and MIT, is about to launch a free Internet service to grade student essays without teacher input; recent studies have shown that, in general, the robot’s scoring often strongly agrees with that of a human (3). For example, a 2010 study by Attali, Bridgeman and Trapani found that the E-rater’s scoring agreement with a human rater on the TOEFL and GRE exams was higher than the agreement between two independent human graders (4).
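Agreement of this kind is usually summarized with an inter-rater statistic. As a rough illustration only, and not the specific analysis used in the cited study, the sketch below computes quadratic weighted kappa, a measure commonly used to compare automated and human essay scores; the 1–6 scale and the score values are invented for the example:

    from collections import Counter

    def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
        # Agreement between two lists of integer scores on the same essays.
        # Illustrative only; the cited study's exact methodology may differ.
        n_levels = max_score - min_score + 1
        n = len(rater_a)
        observed = Counter(zip(rater_a, rater_b))            # joint counts of (a, b) score pairs
        hist_a, hist_b = Counter(rater_a), Counter(rater_b)  # marginal counts per rater
        disagreement, chance = 0.0, 0.0
        for i in range(min_score, max_score + 1):
            for j in range(min_score, max_score + 1):
                weight = (i - j) ** 2 / (n_levels - 1) ** 2
                disagreement += weight * observed.get((i, j), 0) / n
                chance += weight * (hist_a.get(i, 0) / n) * (hist_b.get(j, 0) / n)
        return 1.0 - disagreement / chance  # 1.0 = perfect agreement, 0.0 = chance level

    # Invented example: one human rater vs. the machine on a 1-6 essay scale.
    human = [4, 5, 3, 6, 2, 4, 5, 3]
    machine = [4, 5, 4, 6, 2, 4, 4, 3]
    print(round(quadratic_weighted_kappa(human, machine, 1, 6), 3))

On this toy data the statistic comes out around 0.9, indicating strong agreement; a value of 0 would mean the two raters agree no more often than chance.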
Despite the efficiency and apparent accuracy of the robot’s standards for writing, loopholes and inconsistencies remain in the robots’ grading algorithms. For example, the E-rater favors longer paragraphs and sentences, connecting words such as “however” and “moreover,” and a polysyllabic lexicon, and it penalizes sentences that begin with words such as “or” and “and” (1). Les Perelman, director of writing at MIT, fed the robo-grader an essay full of amusingly incorrect sentences that nonetheless played to the robot’s preference for long words and sentences, and received a top score. The robot rewards proper grammar and sentence complexity over logical coherence, topicality and subjective qualities such as “flow” (5).
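To make that kind of loophole concrete, the sketch below scores text using only the surface proxies just described: sentence length, long words, connective words, and a penalty for sentences opening with “and” or “or.” It is a deliberately simplified illustration with invented features and weights, not E-rater’s actual algorithm:

    import re

    CONNECTIVES = {"however", "moreover", "therefore", "furthermore"}

    def toy_surface_score(essay):
        # Deliberately naive scorer using only shallow surface proxies;
        # the features and weights are invented, not E-rater's actual model.
        sentences = [s.strip() for s in re.split(r"[.!?]+", essay) if s.strip()]
        words = re.findall(r"[A-Za-z']+", essay.lower())
        if not sentences or not words:
            return 0.0
        avg_sentence_len = len(words) / len(sentences)                   # longer sentences score higher
        long_word_ratio = sum(len(w) >= 7 for w in words) / len(words)   # crude polysyllabic-lexicon proxy
        connective_ratio = sum(w in CONNECTIVES for w in words) / len(words)
        bad_openers = sum(s.lower().startswith(("and ", "or ")) for s in sentences)
        score = (0.2 * avg_sentence_len + 30 * long_word_ratio
                 + 50 * connective_ratio - 1.0 * bad_openers)
        return max(0.0, min(10.0, score))  # clamp to a 0-10 scale

    # A vacuous but polysyllabic sentence maxes out the scale.
    print(toy_surface_score("However, the multifaceted considerations notwithstanding, "
                            "the consequential ramifications remain extraordinarily significant."))

The example sentence says nothing, yet the scorer awards it the maximum mark, which is precisely the weakness Perelman exploited.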
Shermis notes, for example, that a paper on Christopher Columbus could still be rated highly even if it claimed that Queen Isabella sailed with 1492 soldiers to the Island of Ferdinand, an obvious factual error built from correct keywords used incorrectly (2). The computer also has no grading scale for the “beauty” of writing; in this regard, Thomas Jehn, director of the Writing Program at Harvard College, worries that students won’t endeavor to write well and creatively, with metaphors or analogies, if these aren’t part of the grading criteria. David Williamson, a senior research director at ETS, responds that the robots aren’t meant to identify a great poet, but rather to measure how effectively a writer can argue and communicate in a basic way. According to the most recent national writing assessment, effective communication is a skill that three out of four high school students lack; robot graders could identify these flaws efficiently and return feedback to help writers develop their skills (2).
Though the technology behind robo-grading is viable and its results have been shown to agree with human grading, it is still a long way from full implementation. Robot readers have too many exploitable algorithmic loopholes to serve as the standard for essay grading. Nevertheless, they can be a useful tool for some schools, where robo-grading is simply a means of improving a draft and a teacher grades the final version of a paper. If advances in the technology can produce a system that discourages students from playing to the machine’s grading parameters, robo-graders can become a constructive tool for helping students improve their writing.
References
1. Michael Winerip, Facing a Robo-Grader? Just Keep Obfuscating Mellifluously (20 April 2013). Available at http://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html?pagewanted=all&_r=0 (19 April 2013).
2. Stephanie Simon, Robo-readers: The New Teachers’ Helper in the U.S. (19 April 2013). Available at http://www.reuters.com/article/2012/03/29/us-usa-schools-grading-idUSBRE82S0ZN20120329 (29 March 2013).
3. Jessica Smith, Robo-graders Like Long Words, Not So Big on Intellectual Coherence (20 April 2013). Available at http://sciencereview.berkeley.edu/robo-graders-like-long-words-not-so-big-on-intellectual-coherence/ (8 May 2012).
4. Educational Testing Service, Frequently Asked Questions About the E-rater Technology (20 April 2013). Available at http://www.ets.org/erater/about/faq/.
5. Sarah Gardner, Beware the Rise of the Robo-grader (20 April 2013). Available at http://www.marketplace.org/topics/tech/beware-rise-robo-grader (5 April 2013).