
Should we allow LLMs into the classroom?

It's not that long ago that I wanted to keep LLMs out of the classroom. What changed?

Before diving more deeply into using LLMs in education, it makes sense to think about whether it's actually desirable to put these tools in the classroom. Just over two years ago, I made the point that no, we should not:

What I think we shouldn't do is bring ChatGPT into the classrooms ourselves. I am sure there are fun and even instructive applications out there, but it is an ethically dubious technology. Even if its externalities don't bother you, it is worth considering that ChatGPT is essentially a plagiarism engine – it took content from the internet, chewed on it for a while and is now regurgitating the mash-up for a fee. I am not sure how we can argue against plagiarism if we use a tool like that ourselves.

It wasn't just plagiarism-by-example that concerned me: I included the box below to specify what I meant by 'ethically dubious'.

🧠
Why ethically dubious? Well, GPT-3 was trained on written content taken from unsuspecting humans (with all their biases) in an energy-costly process. Content is 'cleaned' by workers in low-wage countries, who are psychologically scarred by the dark recesses of the internet they have to work through. And that's just development: once deployed, LLMs will not only automate your inbox and take your meeting notes, but also generate online scams and targeted harassment, and "flood the zone with shit". This 'disruption' may not deserve immediate enthusiastic participation.

None of these concerns have gone away.

I know there are claims that LLMs are energy efficient now, but in my experience these are always based on the erroneous assumption that energy costs are driven by querying the models rather than by training them. If that were the case, the energy cost of your laptop running a local LLM would be a good proxy for its environmental impact, but unfortunately it is not. As a recent piece on Ars Technica demonstrated, we don't know much about the energetics of LLMs, but it's safe to say that the bulk of the energy expenditure comes from training neural networks in GPU clusters. Using the tools, then, creates a market incentive to keep training new models, which feeds into all the concerns listed in the box above.

But to be honest, my main concern about LLM use is more fundamental: regardless of how we develop LLMs, we are tempted to ascribe mental capacities to chatbots that they simply do not have. Their design (or rather, their designers) hijacks our social cognition to make us infer that there is a being with thoughts and feelings behind the chatbot. That is, after all, what expressive language implies. The late Daniel Dennett therefore called LLMs and related technology "counterfeit people", and argued that their development was a grave crime. Deliberately putting such machines in an educational context carries the real risk that students and teachers come to consider them person-like agents that can "reason" or "think along" – but if that is a false premise, use of the tool rests on shaky ground.

This particular concern also came up in a recent debate between computational linguistics professor Emily Bender and OpenAI's Sébastien Bubeck. They spoke on the topic of understanding in LLMs and covered the key epistemic concerns one might have about the technology.

In her opening, Professor Bender claims that "[LLM output] only makes sense because we're making sense of it." I am not sure I fully agree: at the other end of the LLM pipeline stand the people who wrote the training corpus, and they did put sense into their words. But the point remains that, between the human and the LLM, it is only the human who is concerned with meaning at all, and who will do the cognitive labour of sense-making and understanding.

This asymmetry is not salient when you're using the technology. If you're a student who wants to learn about a topic and you use an LLM to generate an overview, you will presume there is a "mind behind the text", as Bender puts it. If you're a teacher who deploys LLMs in your course, you will be similarly tempted to treat the LLM as a colleague of sorts. Again, LLM chatbots take the shape of counterfeit people, and none of us is entirely immune to that.

Following this line of reasoning, you might conclude that the only 'responsible use of LLMs in education' is not using them at all. After all, who wants to risk teachers and students believing there is an extra intelligence in the classroom, if that so-called intelligence is faking understanding? Who wants to teach students to delegate substantive tasks to a system that is ultimately about linguistic form? Who wants to risk letting a machine that lacks understanding become an automated teaching assistant or student mentor?

I'd be very much in the no-thank-you camp, were it not for the fact that, regardless of all this, people are actually throwing such tasks at LLMs. There are reports that student use reached 88% in 2024–2025. At my own workplace, I see colleagues using LLMs for substantive, scientific work, such as experimental design in neurobiology, thought experiments in physics and statistical analysis of behavioural data. The Dutch scientific grant provider NWO allows applicants to use GenAI, if done 'responsibly'. And as implied by a recent study discussed on this blog, knowledge workers across sectors are incorporating GenAI into their workflows.

Additional point

An additional point one might make is that recent years have seen progress in reliability, thanks to the introduction of retrieval-augmented generation, longer context windows, chain-of-thought reasoning and other innovations. There is indeed remarkable engineering going on, and I will not deny that today's systems are better suited for epistemic purposes than those of two years ago. However, as long as LLMs lie at the core of this technology, I believe proper reasoning is an illusion. I hope to come back to this topic at a later time.
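To make the first of those innovations concrete: retrieval-augmented generation grounds a model's answer in documents fetched at query time, rather than relying only on what the model absorbed during training. Below is a minimal sketch of just the retrieval step, using plain bag-of-words cosine similarity where production systems would use neural embeddings; all names and the toy corpus are illustrative, not any particular system's API.

```python
from collections import Counter
from math import sqrt

# Toy retrieval step behind RAG: score documents against a query and
# prepend the best matches to the prompt the LLM will see.

def vectorise(text: str) -> Counter:
    # Bag-of-words token counts; real systems use neural embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = vectorise(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorise(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # The retrieved passages are pasted into the prompt as context.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "LLMs are trained on large text corpora scraped from the web.",
    "Retrieval-augmented generation grounds answers in source documents.",
    "Photosynthesis converts light energy into chemical energy.",
]
print(build_prompt("How does retrieval-augmented generation work?", docs))
```

The design choice matters epistemically: because the model is handed the relevant source text in its prompt, its output can at least be checked against those sources – which mitigates, though does not resolve, the reliability concerns above.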

This means two things.

First of all, despite its shaky epistemic foundations, the tool is evidently not useless. People would not be using it at all if it did not deliver to some extent. That does not mean the hype is justified (there are signs we're heading into the next, more pessimistic stage of the hype cycle already), but it does suggest the technology has a purpose. As Dr Bubeck says during the debate, even LLMs that lack understanding can in principle help humans gain understanding, and that is definitely a key use in practice.

Secondly, given that they have some value in the workplace, learning about LLMs and how to use them is important for the next generation of knowledge workers. Precisely because the proper use cases are controversial, this is something you want to address in formal education – which means bringing the technology into the classroom.

My opinions on LLMs have not shifted much over the past two years, but despite my misgivings I think it is important to figure out what 'responsible use' can mean, exactly, and how students and teachers alike can make use of the technology. For me, that starts with recognising that LLMs are a fallible technology for cognitive work, precisely because they are not capable of reasoning and understanding – only of generating utterances that look like it.