
What is the difference between ‘explainable AI’ and ‘unexplainable AI’ for scoring student writing?

Assessing With AI: Insights From the Machine Learning Minds at MetaMetrics about Explainable and Unexplainable AI

Unlocking the Potential of AI to Score Student Writing

AI has amazed us over the last two years. But, we have all seen it make some comic (sometimes tragic) mistakes. Can it be trusted to evaluate students’ written assignments or essays? If so, how does it work? What are its benefits and limitations? Will it always be accurate?

These are the big questions that we’ll be answering in the first ten posts in this blog series.


Alistair Van Moere, Ph.D.
President
MetaMetrics


What is the difference between ‘explainable AI’ and ‘unexplainable AI’ for scoring student writing?

We often think of AI models as being either understandable and explainable or unknowable and difficult to explain. The choice between these AI models matters — for reasons I’ll explain in a minute. But first, let’s look at the two kinds of models in more detail.

Explainable AI

Explainable models tend to come from the more traditional field of statistics. Think about the home you live in or a home you want to buy, and how much it costs (ah, that never-fail topic for dinner party conversation!). You can probably boil it down to a formula that makes sense. Let’s see: how many bedrooms; how many square feet; lot size; distance from the town or city center; how desirable the schools are, etc. You can plug these numbers into a statistical model, and voila, it will estimate your home price. Fun fact: the best predictor of a home’s value is the value of the neighboring homes!

If you take this approach, you can actually see how the statistical model is working. For example, you can rank the importance of each factor: maybe ‘square feet’ is more important when you are close to the town center but less important when you are far from it. So, this kind of model is explainable because we know what features are in the model and how impactful they are.
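
To make this concrete, here is a minimal sketch of that kind of model in Python with scikit-learn. The homes, prices, and feature values are all invented for illustration; a real pricing model would be trained on thousands of sales.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data: one row per home.
# Columns: bedrooms, square feet, lot size (acres), miles from town center
X = np.array([
    [3, 1800, 0.25, 2.0],
    [4, 2400, 0.30, 5.0],
    [2, 1100, 0.15, 1.0],
    [5, 3200, 0.50, 8.0],
    [3, 1600, 0.20, 3.5],
    [4, 2000, 0.25, 2.5],
    [2, 1300, 0.10, 6.0],
])
y = np.array([352_000, 405_000, 281_000, 515_000, 330_000, 389_000, 255_000])

model = LinearRegression().fit(X, y)

# Voila: plug in a new home's numbers and out comes an estimate.
print(f"Estimated price: ${model.predict([[3, 2000, 0.25, 2.5]])[0]:,.0f}")

# And because the model is just a weighted formula, we can see exactly
# how much each factor contributes -- that is the 'explainable' part.
for name, coef in zip(["bedrooms", "square_feet", "lot_size", "miles_from_center"],
                      model.coef_):
    print(f"{name:>18}: {coef:+,.0f} dollars per unit")
```

An interaction feature (say, square feet divided by distance from the center) can capture the idea that a factor’s weight depends on context, and it stays just as inspectable.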

It’s the same for analyzing student writing. I can build a statistical model where I can see what factors are being scored. Instead of lot size and bedrooms, I would measure whether the student is using words appropriately in context, whether their ideas are well supported, and whether their message is confused or clear. (There are techniques for doing this, which we’ll cover in another blog post.) And I can see how each of these is ‘weighted’ in the model.
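
In the same spirit, here is a toy sketch of feature-based essay scoring. The three features and their weights are invented placeholders; production systems measure far richer things, such as word use in context and support for ideas, but the principle is the same: visible features, visible weights.

```python
import re

def extract_features(essay: str) -> dict:
    """Turn an essay into a handful of measurable, inspectable numbers."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "length": len(words),
        "vocab_diversity": len(set(w.lower() for w in words)) / max(len(words), 1),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }

# Invented weights -- in practice these would be estimated from
# essays that human raters have already scored.
WEIGHTS = {"length": 0.004, "vocab_diversity": 2.5, "avg_sentence_len": 0.05}
INTERCEPT = 1.0

def score_essay(essay: str) -> float:
    feats = extract_features(essay)
    raw = INTERCEPT + sum(WEIGHTS[k] * v for k, v in feats.items())
    return min(round(raw, 1), 6.0)   # cap at the top of a 1-6 scale

essay = "The school year should start later. Students learn better when rested."
print(extract_features(essay))   # every factor is visible...
print(score_essay(essay))        # ...and so is how it adds up to the score
```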

So, that’s a kind of explainable AI. And if you read our first blog post in this series, you’ll recognize this is a feature-based scoring approach. In contrast, unexplainable AI is usually associated with the more modern deep learning models.

Unexplainable AI

Deep learning models are used in a variety of situations, but most people interact with them when they use large language models such as generative AI. They are “deep” because they have many layers of processing, stacked one on top of another. In practical terms, you cannot write out the formula as you would for a traditional model. And you cannot follow your data through the layers. The data you enter becomes transformed, abstracted, and untraceable.
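
To give a feel for what “layers” means, here is a heavily simplified sketch in numpy. The weights are random, so the output is meaningless; the point is that after a transformation or two, the intermediate numbers no longer correspond to anything nameable like ‘bedrooms’ or ‘square feet’.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input: our four home features, as before.
x = np.array([3.0, 2000.0, 0.25, 2.5])

# Three stacked layers, each a matrix multiply plus a nonlinearity.
# A real deep model may have dozens of layers and billions of weights.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(1, 8))

h1 = np.tanh(W1 @ x)   # 8 numbers -- no longer 'bedrooms' or 'square feet'
h2 = np.tanh(W2 @ h1)  # 8 numbers abstracted from the abstraction
out = W3 @ h2          # a single prediction pops out at the end

print(h1)   # try to say what any one of these values *means*...
print(out)
```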

Engineers know what their models are designed to do, but they cannot say exactly why the models reached the conclusions that they did. All the engineers can say is: “Well, the AI took the data like ‘lot size’ and ‘distance from the town center’, and it used data for millions of homes, and it somehow made sense of that data, and the AI gave your home value as $350,000. Okay?”

The same is true for a model that is designed to score essays. “Well, it looked at the words you wrote, and based on many other essays it analyzed, it thinks you deserve a 4 out of 6. Okay?”
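
In code, scoring with a deep model can be as opaque as it is short. This sketch uses the Hugging Face transformers pipeline; the model name is a made-up placeholder for some fine-tuned essay scorer. A score comes back, but no formula does.

```python
from transformers import pipeline

# Hypothetical fine-tuned essay-scoring model -- the name is a placeholder.
scorer = pipeline("text-classification", model="example-org/essay-scorer")

essay = "Schools should start later because students learn better when rested."
result = scorer(essay)[0]
print(result)  # e.g. {'label': '4', 'score': 0.87} -- but *why* a 4? The model can't say.
```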

Why is it important?

For essay scoring, both explainable and unexplainable approaches attempt to simulate what a teacher would do when evaluating students’ writing. But, there are at least four reasons why it is important to choose the right scoring model.

  1. Trust. If we want people to trust our AI scoring, then it is very unsatisfying to hear “you got that score because the computer said so”. We should be able to point to the writing, point to the score, and draw a link between them with a rationale.
  2. General Data Protection Regulation (GDPR). This law gives European Union citizens the right, among other things, to know how their data is being processed and why AI-based decisions are made from it, to have the algorithms explained to them at least at a high level, and to question the results of AI-powered outcomes. This sentiment, if not the specific law, has spread to many countries outside the EU.
  3. Bias. Deep learning models can be based on large datasets that have implicit biases. These are hard to detect. Explainable models are created in a more hands-on way, which means we can check for bias throughout the modeling process. For example, in feature-based scoring, we can check each feature individually for bias.
  4. Predictability. We want our results to be consistent and predictable. A large language model, depending on how it is set up, might give different scores or different feedback for the same essay. With explainable, feature-based approaches, we have more control over the consistency of results (a contrast sketched in code after this list).
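
To illustrate the predictability point: a feature-based scorer is a pure function of the text, so repeated runs agree exactly, while a sampling-based model can land on neighboring scores from run to run. The second function below is a deliberate caricature of that sampling behavior, not a real language model.

```python
import random

def feature_based_score(essay: str) -> float:
    # Deterministic: same input, same output, every time.
    return round(1.0 + 0.004 * len(essay.split()), 1)

def llm_style_score(essay: str) -> int:
    # Caricature of sampling: the 'model' may land on neighboring scores.
    return random.choice([3, 4, 4, 4, 5])

essay = "The school year should start later."
print([feature_based_score(essay) for _ in range(3)])  # identical every run
print([llm_style_score(essay) for _ in range(3)])      # may differ run to run
```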

So, depending on your learning or assessment context, this could be very important. An unexplainable model could work just fine for a low-stakes classroom context, where the teacher just wants to give quick feedback to a student. However, a more explainable approach would be appropriate in a more rigorous context, where the feedback or scores are used for decision-making. Either way, our team at MetaMetrics is here to help you think through your solution.

(Final note: some people use the terms black box and glass box for AI algorithms. I prefer to avoid these, so as not to reinforce stereotypes).

Lexile® measure of this blog post: 960L

Looking for other posts in this series? 

Access All Assessing With AI Posts


Add MetaMetrics® Writing AccuRater to Your Literacy Programs

MetaMetrics has decades of experience analyzing text and is excited to announce the MetaMetrics® Writing AccuRater, the state-of-the-art in AI analysis of student writing.

Are you an edtech or assessment company? We’d love to power your learning program. If you are ready to incorporate writing into your learning activities or assessment, then we are ready to listen to your needs. Contact our team to discuss.