
Can AI really evaluate ‘ideas’ and ‘topic development’ in student essays?

Assessing With AI: Insights From the Machine Learning Minds at MetaMetrics


Unlocking the Potential of AI to Score Student Writing

AI has amazed us over the last two years. But we have all seen it make some comic (and sometimes tragic) mistakes. Can it be trusted to evaluate students' written assignments or essays? If so, how does it work? What are its benefits and limitations? Will it always be accurate?

These are the big questions that we’ll be answering in the first ten posts in this blog series.


Alistair Van Moere, Ph.D.
President
MetaMetrics



Many people wonder whether AI algorithms can score writing constructs such as creativity, ideas, and topic development. Sure they can! In fact, it's been possible for over two decades (though some people find this hard to believe), and it has improved further in recent years thanks to breakthroughs in natural language processing (NLP).

Let’s drill into this by describing some techniques we use to score ideas and content. To score content, the AI has to be able to understand the meaning of words and phrases.

Understanding the Meaning of Words

When we build models to score essays, we look at the actual words the student used, but we also look beyond the words to the meaning the student wants to convey. Consider this example:

  1. Mariners initially circumnavigated the globe in the sixteenth century.
  2. Seafarers first sailed around the world in the 1500s.

Notice two things: (i) the student who wrote #1 has a more sophisticated command of vocabulary; (ii) both sentences convey the same meaning. This would be apparent to a teacher, and it would be equally apparent to a properly developed AI scoring model.

A scoring model like this needs two kinds of algorithms:

  • A set of algorithms to evaluate the actual words: are they concrete or abstract, common or academic, acquired earlier or later in school, applied appropriately in context or not? We have numerous textual analysis techniques to answer questions like these.
  • And, another algorithm that analyzes the meaning. This can be done using a technique called embeddings. Basically, this is a way to abstract from the printed word itself and look at the underlying meaning of the word in its context. By using embeddings, for example, Google's search engine knows when you want to search for a river bank versus a financial bank. It doesn't treat bank as a printed word. Rather, it treats that word as an idea unit within a context. (See the sketch after this list.)
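
To make both ideas concrete, here is a minimal sketch using the open-source wordfreq and sentence-transformers libraries. The model name all-MiniLM-L6-v2 is just a common public choice; MetaMetrics' production models and word-level features are more sophisticated than this.

```python
# A minimal sketch: word-level sophistication plus embedding-based meaning.
# Assumes: pip install wordfreq sentence-transformers
from wordfreq import zipf_frequency
from sentence_transformers import SentenceTransformer, util

# Word level: rarer words (lower Zipf frequency) suggest more advanced vocabulary.
for word in ["circumnavigated", "sailed"]:
    print(word, zipf_frequency(word, "en"))  # "circumnavigated" scores far lower

# Meaning level: embeddings place both sentences close together in "idea space".
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "Mariners initially circumnavigated the globe in the sixteenth century.",
    "Seafarers first sailed around the world in the 1500s.",
]
embeddings = model.encode(sentences)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Cosine similarity: {similarity:.2f}")  # high, despite little word overlap
```

The similarity score stays high even though the two sentences share almost no surface vocabulary, which is exactly the behavior a content-scoring model needs.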

These techniques account for the meaning of words and phrases. But what about topic development, where ideas are discussed throughout an essay?

Evaluating Topic Development

So, we can turn printed words and phrases into idea units. Now, extend that to sentences. We can take an entire sentence and abstract it into an idea unit. We can do the same for entire paragraphs. Cool, huh?

When we do this, we can analyze the essay using a variety of techniques. For example, we can compare the proximity of idea units. Does one sentence conceptually follow the other? Are the ideas in these sentences cohesively linked within the paragraph? Is the essay tied together as a whole? We can also represent ideas as a mind map with branches and nodes, like leaves on a tree. If the nodes are close together, then the topics are well connected. One simple version of the sentence-to-sentence check is sketched below.
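
Here is a minimal sketch of that adjacent-sentence check, again assuming the public sentence-transformers library. The essay text is made up, and this is an illustration rather than MetaMetrics' actual model.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def adjacent_cohesion(sentences):
    """Cosine similarity between each sentence and the next (one value per pair)."""
    embeddings = model.encode(sentences)
    return [util.cos_sim(embeddings[i], embeddings[i + 1]).item()
            for i in range(len(sentences) - 1)]

essay = [
    "Explorers have always pushed the limits of navigation.",
    "In the 1500s, seafarers first sailed around the world.",
    "These voyages transformed trade between continents.",
    "My favorite food is pizza.",  # an off-topic sentence should score low
]
print(adjacent_cohesion(essay))
```

A sudden dip in these pairwise scores flags a sentence that does not conceptually follow its neighbor, and averaging them gives a crude whole-essay cohesion signal.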

What I am trying to convey here is that – yes! – we can see how well a student develops their topic or argument through the course of an essay.

A thoughtful modeling process – such as the one we use at MetaMetrics – includes all these micro-constructs in the AI scoring model so that we can provide good feedback to students about their writing.

Caveats and Limitations

Let’s wrap this up with a few caveats and limitations.

First, the proof is in the pudding. At the end of the day, we must check our AI model by comparing its scores with teachers' scores. Does the AI scoring closely match expert teacher judgments? It doesn't matter how fancy-schmancy the AI is – only when its scoring is well validated should we move forward and evaluate our students with it. (One common agreement statistic is sketched below.)
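
The post doesn't name a specific statistic, but quadratic weighted kappa is a widely used agreement measure in essay scoring. Here is a minimal sketch with made-up scores, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical holistic scores (1-5) that a teacher and an AI gave six essays.
teacher_scores = [3, 4, 2, 5, 3, 4]
ai_scores = [3, 4, 3, 5, 3, 4]

# Quadratic weighting penalizes large disagreements more than near-misses.
qwk = cohen_kappa_score(teacher_scores, ai_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.2f}")
```

In practice this comparison is run on a large held-out set of essays, not six, before anyone trusts the model with real students.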

Second, what about the social appropriateness of ideas? If there is an objectively right or wrong answer, this can of course be scored with an AI model. But what if students are asked for their opinion, and they provide a response that is controversial? Let's say a student writes a very well-crafted essay, with supporting ideas and arguments, in favor of cutting down every tree in the world. Hmm. The best solution is to defer to the teacher, and not let the AI make an ethical or social evaluation.

Third, the AI scoring models need to be built with care and responsibility, and customized to your tasks and scoring criteria. That’s why MetaMetrics is here to help you on your AI writing scoring journey.

Lexile® measure of this blog post: 930L

Looking for other posts in this series? 

Access All Assessing With AI Posts


Add MetaMetrics® Writing AccuRater to Your Literacy Programs

MetaMetrics has decades of experience analyzing text and is excited to announce MetaMetrics® Writing AccuRater, the state-of-the-art in AI analysis of student writing.

Are you an edtech or assessment company? We’d love to power your learning program. If you are interested in incorporating writing into your learning activities or assessment, then we are ready to listen to your needs. Contact our team to discuss.