What is Automated Essay Scoring?
Automated essay scoring (AES) is an important application of machine learning and artificial intelligence to the field of psychometrics and assessment. In fact, it’s been around far longer than “machine learning” and “artificial intelligence” have been buzzwords with the general public! The field of psychometrics has been doing this kind of groundbreaking work for decades.
So how does AES work, and how can you apply it?
What is automated essay scoring?
The first and most critical thing to know is that there is not an algorithm that “reads” the student essays. Instead, you need to train an algorithm. That is, if you are a teacher and don’t want to grade your essays, you can’t just throw them into an essay scoring system. You have to actually grade the essays (or at least a large sample of them) and then use that data to fit a machine learning algorithm. Data scientists use the term “train the model,” which sounds complicated, but if you have ever done simple linear regression, you have experience with training models.
There are three steps for automated essay scoring:
- Establish your data set. Begin by gathering a substantial collection of student essays, ensuring a diverse range of topics and writing styles. Each essay should be meticulously graded by human experts to create a reliable and accurate benchmark. This data set forms the foundation of your automated scoring system, providing the necessary examples for the machine learning model to learn from.
- Determine the features. Identify the key features that will serve as predictor variables in your model. These features might include grammar, syntax, vocabulary usage, coherence, structure, and argument strength. Carefully selecting these attributes is crucial as they directly impact the model’s ability to assess essays accurately. The goal is to choose features that are indicative of overall writing quality and are relevant to the scoring criteria.
- Train the machine learning model. Use the established data set and selected features to train your machine learning model. This involves feeding the graded essays into the model, allowing it to learn the relationship between the features and the assigned grades. Through iterative training and validation processes, the model adjusts its algorithms to improve accuracy. Continuous refinement and testing ensure that the model can reliably score new, unseen essays with a high degree of precision.
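Putting those three steps together, here is a minimal sketch in Python. It is only an illustration under assumptions: the file name scored_essays.csv and its columns “text” and “score” are hypothetical, and TF-IDF word weights stand in for whatever features you actually choose.

```python
# Minimal sketch of the three steps; assumes a hypothetical CSV of
# human-scored essays with columns "text" (essay) and "score" (0-5).
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Step 1: establish your data set (already graded by humans).
essays = pd.read_csv("scored_essays.csv")

# Step 2: determine the features. TF-IDF word weights are a simple
# stand-in for whatever features you select.
vectorizer = TfidfVectorizer(max_features=2000)
X = vectorizer.fit_transform(essays["text"])
y = essays["score"]

# Step 3: train the model, holding out 25% of essays to validate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = Ridge().fit(X_train, y_train)

# Check agreement with the human graders on unseen essays.
r, _ = pearsonr(y_test, model.predict(X_test))
print(f"Correlation with human scores on held-out essays: {r:.2f}")
```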
Here’s an extremely oversimplified example:
- You have a set of 100 student essays, which you have scored on a scale of 0 to 5 points.
- The essay is on Napoleon Bonaparte, and you want students to know certain facts, so you want to give them “credit” in the model if they use words like: Corsica, Consul, Josephine, Emperor, Waterloo, Austerlitz, St. Helena. You might also add other features such as word count, number of grammar errors, number of spelling errors, etc.
- You create a map of which students used each of these words, as 0/1 indicator variables. You can then fit a multiple regression with 7 predictor variables (did they use each of the 7 words) and the 5-point scale as your criterion variable. You can then use this model to predict each student’s score from just their essay text (see the sketch after this list).
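Here is that sketch in Python. The essay texts and scores are invented for illustration, and three essays are far too few in practice; it only shows the mechanics.

```python
# Toy version of the Napoleon example: 0/1 keyword indicators feeding
# a multiple regression. All essays and scores below are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

KEYWORDS = ["corsica", "consul", "josephine", "emperor",
            "waterloo", "austerlitz", "st. helena"]

def keyword_indicators(essay_text):
    """Return a 0/1 indicator for each keyword found in the essay."""
    text = essay_text.lower()
    return [1 if word in text else 0 for word in KEYWORDS]

# In practice, these come from your 100 human-scored essays.
essays = [
    "Napoleon was born in Corsica and died in exile on St. Helena.",
    "He crowned himself Emperor and won a great victory at Austerlitz.",
    "Napoleon lost a big battle and was sent away.",
]
human_scores = [5, 4, 1]  # invented 0-5 scores

X = np.array([keyword_indicators(e) for e in essays])
model = LinearRegression().fit(X, human_scores)

# Predict a score for a new, ungraded essay from its text alone.
new_essay = "After Waterloo, the former Emperor was exiled."
print(model.predict([keyword_indicators(new_essay)]))
```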
Obviously, this example is too simple to be of use, but the same general idea drives real systems, built from massive, complex studies. The establishment of the core features (predictor variables) can be much more sophisticated, and the models are typically much more complex than multiple regression (neural networks, random forests, support vector machines).
Here’s an example of the very start of a data matrix for features, from an actual student essay. Imagine that you also have data on the final scores, 0 to 5 points. You can see how this is then a regression situation.
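To make the shape concrete, here is an invented miniature of such a matrix; every value below is made up, while a real one would be computed from your scored essays.

```python
# Invented illustration of the start of a feature matrix: one row per
# essay, one column per feature, plus the human score to be predicted.
import pandas as pd

features = pd.DataFrame({
    "word_count":      [412, 287, 501],
    "spelling_errors": [3, 9, 1],
    "grammar_errors":  [2, 6, 0],
    "used_waterloo":   [1, 0, 1],
    "used_emperor":    [1, 1, 1],
    "human_score":     [4, 2, 5],  # the 0-5 criterion variable
})
print(features)
```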
How do you score the essay?
If the essays are on paper, then automated essay scoring won’t work unless you have extremely good optical character recognition (OCR) software that converts them to a digital database of text. Most likely, you have delivered the exam as an online assessment and already have the database. If so, your platform should include functionality to manage the scoring process, including multiple custom rubrics. An example of our FastTest platform is provided below.
Some rubrics you might use:
- Supporting arguments
- Organization
- Vocabulary / word choice
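If you are building the plumbing yourself, one hypothetical way to represent multiple custom rubrics is a simple mapping from rubric name to score scale, with each essay receiving one score per rubric (and, eventually, one AES model per rubric). This structure is an assumption for illustration, not how any particular platform stores it.

```python
# Hypothetical representation of multiple custom rubrics; each rubric
# gets its own score per essay and can get its own trained AES model.
rubrics = {
    "supporting_arguments": {"min": 0, "max": 5},
    "organization":         {"min": 0, "max": 5},
    "vocabulary":           {"min": 0, "max": 5},
}

# One human-assigned score per rubric for a single essay.
essay_scores = {"supporting_arguments": 4, "organization": 3, "vocabulary": 5}

# Validate that every rubric score falls on its defined scale.
for name, scale in rubrics.items():
    assert scale["min"] <= essay_scores[name] <= scale["max"]
```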
How do you pick the features?
This is one of the key research problems. In some cases, it might be something similar to the Napoleon example. Suppose you had a complex item on accounting, where examinees review reports and spreadsheets and need to summarize a few key points. You might pull out a few key terms (mortgage amortization) or numbers (2.375%) and treat them as features. I saw a presentation at Innovations In Testing 2022 that did exactly this. Think of it as giving the students “points” for using those keywords, though because you are using complex machine learning models, each keyword is not worth a single fixed point; it contributes to a regression-like model with a positive slope.
In other cases, you might not know. Maybe it is an item on an English test being delivered to English language learners, and you ask them to write about what country they want to visit someday. You have no idea what they will write about. But what you can do is tell the algorithm to find the words or terms that are used most often, and try to predict the scores with those. Maybe words like “jetlag” or “edification” show up in essays from students who tend to get high scores, while words like “clubbing” or “someday” tend to be used by students with lower scores. The AI might also pick up on spelling errors. I worked as an essay scorer in grad school, and I can’t tell you how many times I saw kids use “ludacris” (the name of an American rap artist) instead of “ludicrous” when trying to describe an argument. They had literally never seen the word used or spelled correctly. Maybe the AI model learns to give that a negative weight. That’s the next section!
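Here is a hedged sketch of that “let the algorithm find the words” approach: count the most frequent terms across essays, fit a linear model, and inspect which terms pick up positive or negative weights. The essays and scores are invented, and Ridge regression is just one reasonable model choice.

```python
# Sketch of discovering predictive terms automatically: keep the most
# frequent words, fit a linear model, inspect the learned weights.
# Essays and scores below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

essays = [
    "I dream of visiting Japan despite the jetlag, for edification.",
    "Someday I want to go clubbing in Spain, it would be ludacris.",
    "Visiting France would be an edification in culture and history.",
]
human_scores = [5, 1, 4]

vectorizer = CountVectorizer(max_features=500)  # most frequent terms
X = vectorizer.fit_transform(essays)
model = Ridge().fit(X, human_scores)

# Terms with the most negative and most positive learned weights.
weights = sorted(zip(model.coef_, vectorizer.get_feature_names_out()))
print("most negative:", weights[:3])
print("most positive:", weights[-3:])
```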
How do you train a model?
Well, if you are familiar with data science, you know there are TONS of models, and many of them have a bunch of parameterization options. This is where more research is required. What model works the best on your particular essay, and doesn’t take 5 days to run on your data set? That’s for you to figure out. There is a trade-off between simplicity and accuracy. Complex models might be accurate but take days to run. A simpler model might take 2 hours but with a 5% drop in accuracy. It’s up to you to evaluate.
If you have experience with Python or R, you know that there are many packages that provide this analysis out of the box; it is a matter of selecting a model that works.
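For example, here is one way to run that trade-off study with scikit-learn: cross-validate a simple model and a complex one on the same features and compare accuracy against wall-clock time. This sketch assumes the feature matrix X and human scores y from the earlier examples already exist.

```python
# Sketch of the simplicity-vs-accuracy trade-off: cross-validate a
# simple and a complex model on the same data and time both.
# Assumes X (feature matrix) and y (human scores) already exist.
import time
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

for model in [Ridge(), RandomForestRegressor(n_estimators=500)]:
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    elapsed = time.perf_counter() - start
    print(f"{type(model).__name__}: mean R^2 = {scores.mean():.2f}, "
          f"time = {elapsed:.1f}s")
```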
How effective is automated essay scoring?
Well, as psychometricians love to say, “it depends.” You need to do the model-fitting research for each prompt and rubric. It will work better for some than others. The general consensus in the research is that AES algorithms work as well as a second human, and therefore serve very well in that role. But you shouldn’t use them as the only score, though in many cases having humans score everything is simply not feasible.
Here’s a graph from some research we did on our algorithm, showing the correlation between human and AES scores. The three lines are for the proportion of the sample used in the training set; we saw decent results from only 10% in this case! Some of the models correlated above 0.80 with humans, even though this is a small data set. We found that the Cubist model took a fraction of the time needed by complex models like neural nets or random forests; in this case, it might be sufficiently powerful.
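You can run a miniature version of that experiment yourself: train on different proportions of the sample and correlate the model’s predictions with the human scores on the held-out essays. As before, this sketch assumes a feature matrix X and human scores y already exist; Ridge regression stands in for whichever model you are evaluating.

```python
# Miniature training-proportion experiment: fit on 10%, 25%, or 50%
# of the sample and correlate predictions with held-out human scores.
# Assumes X (feature matrix) and y (human scores) already exist.
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

for train_frac in [0.10, 0.25, 0.50]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, random_state=42
    )
    model = Ridge().fit(X_tr, y_tr)
    r, _ = pearsonr(y_te, model.predict(X_te))
    print(f"trained on {train_frac:.0%} of sample: r = {r:.2f}")
```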
How can I implement automated essay scoring without writing code from scratch?
There are several products on the market. Some are standalone, and some are integrated with a human-based essay scoring platform. ASC’s platform for automated essay scoring is SmartMarq; click here to learn more. It currently takes a standalone approach, as you see below, making it extremely easy to use. It is also in the process of being integrated into our online assessment platform, alongside human scoring, to provide an efficient and easy way of obtaining a second or third rater for QA purposes.
Want to learn more? Contact us to request a demonstration.
- Latest Posts
Nathan Thompson, PhD
Latest posts by nathan thompson, phd ( see all ).
- What is an Assessment-Based Certificate? - October 12, 2024
- What is Psychometrics? How does it improve assessment? - October 12, 2024
- What is RIASEC Assessment? - September 29, 2024
Online Assessment
Psychometrics.
Automated Essay Scoring
- © 2022
- Beata Beigman Klebanov 0 ,
- Nitin Madnani 1
Educational Testing Service, USA
You can also search for this author in PubMed Google Scholar
Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)
2658 Accesses
5 Citations
1 Altmetric
This is a preview of subscription content, log in via an institution to check access.
Access this book
Subscribe and save.
- Get 10 units per month
- Download Article/Chapter or eBook
- 1 Unit = 1 Article or 1 Chapter
- Cancel anytime
- Available as PDF
- Read on any device
- Instant download
- Own it forever
- Compact, lightweight edition
- Dispatched in 3 to 5 business days
- Free shipping worldwide - see info
Tax calculation will be finalised at checkout
Other ways to access
Licence this eBook for your library
Institutional subscriptions
About this book
Similar content being viewed by others.
Automated Essay Scoring Systems
Automated Essay Feedback Generation in the Learning of Writing: A Review of the Field
Table of contents (13 chapters), front matter, introduction, should we do it can we do it.
Beata Beigman Klebanov, Nitin Madnani
Getting Hands-On
Building an automated essay scoring system, from lessons to guidelines, a deep dive: models, features, architecture, and evaluation, generic features, genre- and task-specific features, automated scoring systems: from prototype to production, evaluating for real-world use, further afield: feedback, content, speech, and gaming, automated feedback, automated scoring of content, automated scoring of speech, fooling the system: gaming strategies, summary and discussion, looking back, looking ahead, back matter, authors and affiliations, about the authors, bibliographic information.
Book Title : Automated Essay Scoring
Authors : Beata Beigman Klebanov, Nitin Madnani
Series Title : Synthesis Lectures on Human Language Technologies
DOI : https://doi.org/10.1007/978-3-031-02182-4
Publisher : Springer Cham
eBook Packages : Synthesis Collection of Technology (R0) , eBColl Synthesis Collection 11
Copyright Information : Springer Nature Switzerland AG 2022
Softcover ISBN : 978-3-031-01054-5 Published: 12 November 2021
eBook ISBN : 978-3-031-02182-4 Published: 31 May 2022
Series ISSN : 1947-4040
Series E-ISSN : 1947-4059
Edition Number : 1
Number of Pages : XX, 294
Topics : Artificial Intelligence , Natural Language Processing (NLP) , Computational Linguistics
- Publish with us
Policies and ethics
- Find a journal
- Track your research
IMAGES
COMMENTS
Automated essay scoring. Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete ...
Automated essay scoring (AES) is an important application of machine learning and artificial intelligence to the field of psychometrics and assessment. In fact, it’s been around far longer than “machine learning” and “artificial intelligence” have been buzzwords in the general public!
Automated essay scoring (AES) is a computer-based assessment system that automatically scores or grades the student responses by considering appropriate features. The AES research started in 1966 with the Project Essay Grader (PEG) by Ajay et al. (1973).
coring of EssaysSemire DikliIntroductionAutomated Essay Scoring (AES) is defined as the computer technology that evaluates and scores the written prose (Shermis & Barrera, 2002; Shermis & Burstei. , 2003; Shermis, Raymat, & Barrera, 2003). AES sys-tems are developed to assist teachers in low-stakes classroom assessment as well as testing ...
The first widely known automated scoring system, Project Essay Grader (PEG), was conceptualized by Ellis Battan Page in late 1960s (Page, 1966, 1968).PEG relies on proxy measures, such as average word length, essay length, number of certain punctuation marks, and so forth, to determine the quality of an open-ended response item.
Automated Essay Scoring (AES) is a service or software that can predictively grade essay based on a pre-trained computational model. It has gained a lot of research interest in educational ...
Automated essay scoring (AES) and automated writing evaluation (AWE), which rely on artificial intelligence (AI)-based natural language processing to score and give real-time writing-focused feedback to students as they write, have the potential to save teachers time and, perhaps, make it more feasible to give students more writing opportunities.
This book discusses the state of the art of automated essay scoring, its challenges and its potential. One of the earliest applications of artificial intelligence to language data (along with machine translation and speech recognition), automated essay scoring has evolved to become both a revenue-generating industry and a vast field of research, with many subfields and connections to other NLP ...
Accordingly, automated essay grading (AEG) systems, or automated essay scoring (AES systems, are de fined as a computer-based process. of applying standardized measurements on open-ended or ...
This new volume is the first to focus entirely on automated essay scoring and evaluation. It is intended to provide a comprehensive overview of the evolution and state-of-the-art of automated essay scoring and evaluation technology across several disciplines, including education, testing and measurement, cognitive science, computer science, and ...