In a recent LinkedIn post, Jason Bell, author of two books on machine learning, made a strong argument against calling errors in large language models (LLMs) “hallucinations.” He points out that when an LLM generates incorrect information, it’s not hallucinating at all—it’s simply following its design: performing stochastic sampling from probability distributions learned during training. This process involves mathematical operations like attention weights, layer normalizations, and softmax functions across billions of parameters. The model predicts the next token based on context, with some randomness from factors like temperature settings. There’s no “false perception” or mental slip-up; it’s pure statistics at work.
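To make the "pure statistics" point concrete, here is a minimal sketch of that sampling step in Python. Everything in it is invented for illustration (the candidate tokens, the logit values, the temperature); the point is simply that the "answer" is nothing more than a draw from a probability distribution produced by a temperature-scaled softmax.

```python
import numpy as np

# Hypothetical candidate tokens and logit scores, invented for illustration.
vocab = ["Paris", "Lyon", "Berlin", "in"]
logits = np.array([3.1, 1.2, 0.4, 2.6])

def sample_next_token(logits, temperature=1.0, seed=0):
    """Draw one next token: temperature-scaled softmax, then a random draw."""
    rng = np.random.default_rng(seed)
    scaled = logits / temperature          # lower temperature sharpens, higher flattens
    probs = np.exp(scaled - scaled.max())  # softmax (shifted for numerical stability)
    probs /= probs.sum()
    idx = rng.choice(len(probs), p=probs)  # the "answer" is just this draw
    return vocab[idx], probs

token, probs = sample_next_token(logits, temperature=0.8)
print(token, probs.round(3))
```

Lower the temperature and the most likely token dominates; raise it and the lower-probability tokens get picked more often. Randomness is a dial in this process, not a defect.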
I could not agree more with Bell. The term “hallucination” doesn’t just mislead—it fundamentally misrepresents how LLMs operate, and it’s time we ditched it for more accurate language. A common question that comes up in discussions is, “How does your software control hallucinations?” It’s an understandable query, especially for those new to the field, but it often highlights an opportunity to dive deeper into the mechanics of these models. Let’s break down why it makes no sense to frame these outputs as hallucinations and explore better ways to describe them.
Why “Hallucination” Makes No Sense: It’s Not Perception, It’s Probability!
The word “hallucination” comes from human psychology and neurology, where it describes experiencing something that’s not there—like seeing or hearing things due to a sensory or cognitive malfunction. Think of it as a breakdown in how the brain processes reality, often linked to conditions like schizophrenia or sensory deprivation. But LLMs aren’t brains; they have no senses, no consciousness, and no “reality” to misperceive. They don’t “see,” “feel,” “imagine,” or “think” about anything. Instead, they generate text by calculating probabilities based on patterns in their training data.
Here’s why this analogy falls apart:
1. No Perception Involved: Hallucinations in humans stem from faulty sensory input or processing. LLMs don’t have senses—they’re trained on text data alone. When they produce wrong info, it’s because the data had gaps, biases, or ambiguities, leading the model to fill in with the most statistically likely (but factually off) continuation. For example, if training data rarely covers a niche topic, the model defaults to broader patterns, which might be inaccurate. This isn’t a “false vision”; it’s a probabilistic guess.
2. It’s By Design, Not a Malfunction: True hallucinations signal something wrong in the system, like a bug or illness. But LLM errors are expected outcomes of their architecture. Autoregressive models (the core of most LLMs) build outputs token by token, and each step conditions only on what has already been generated, so errors compound. A slight probabilistic drift early on can lead to entirely fabricated sequences later, a side effect inherent to this sequential, probability-driven structure (see the sketch after this list). Parameters like temperature introduce deliberate randomness for creativity, increasing the chance of divergence from facts. Calling this a “hallucination” implies a fixable glitch, but it’s a natural byproduct of the architecture, not something you can fully “control” without rethinking the foundational design.
3. Anthropomorphism Creates Confusion: By using a human-like term, we attribute intent or awareness where there is none. LLMs don’t “know” they’re wrong; they output with confidence based on math, not cognition. This sets unrealistic expectations—people think AI is “mostly reliable with occasional slips,” when accuracy is always on a spectrum tied to data quality and context. It distracts from real issues, like sparse training data or the need for better uncertainty detection, turning a technical challenge into a quirky metaphor.
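To see the compounding described in point 2, here is a toy sketch: a hand-written next-token table stands in for a trained model, and every token and probability below is made up. Once an improbable token is drawn, each later step conditions on it and continues the fabrication fluently.

```python
import random

# A hand-written next-token table standing in for a trained model.
# All tokens and probabilities are invented purely to illustrate the loop.
NEXT = {
    "The":      [("capital", 1.0)],
    "capital":  [("of", 1.0)],
    "of":       [("France", 0.85), ("Atlantis", 0.15)],  # one unlikely branch
    "France":   [("is", 1.0)],
    "is":       [("Paris.", 1.0)],
    "Atlantis": [("lies", 1.0)],
    "lies":     [("beneath", 1.0)],
    "beneath":  [("the", 1.0)],
    "the":      [("sea.", 1.0)],
}

def generate(start="The", max_steps=8, seed=None):
    random.seed(seed)
    tokens = [start]
    for _ in range(max_steps):
        options = NEXT.get(tokens[-1])
        if not options:
            break
        words, weights = zip(*options)
        # Each step conditions only on what has already been sampled, so a
        # single improbable draw ("Atlantis") steers every later step.
        tokens.append(random.choices(words, weights=weights, k=1)[0])
    return " ".join(tokens)

for seed in range(3):
    print(generate(seed=seed))
```

Run it with a few different seeds and the same table yields either the expected sentence or a confidently worded invention. Nothing malfunctioned; the draw simply went down a thinner branch of the distribution.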
In short, “hallucination” anthropomorphizes a statistical process, making it sound like a human error rather than a designed limitation. It’s like calling a dice roll that doesn’t match your bet a “hallucination.” The reality is that the dice are working fine; your expectations and misunderstandings are the issue!
A Few Voices Echoing the Call for Change
Bell’s view isn’t unique. For instance, an arXiv paper titled “AI Hallucinations: A Misnomer Worth Clarifying” argues that these outputs are better seen as erroneous fabrications, not perceptual illusions, and pushes for clearer terms in research. Similarly, a piece in Psychology Today notes that AI creates “convincing falsehoods by design,” without any sensory basis, so the hallucination label unnecessarily humanizes machines.
On social platforms like X, users like lcamtuf (@lcamtuf) prefer “confabulation,” explaining that hallucinations involve sensory disorders, which don’t apply to LLMs’ “plausible bullshit.” These examples show a building consensus: the term hinders precise discussion.
Better Ways to Talk About It: Focus on the Stats
So, what should we say instead? Bell suggests “distributional sampling under uncertainty,” which captures the essence—outputs reflect probabilistic draws from data distributions, with reliability dropping in low-data areas.
Other options include:
• Confabulation: Fabricating details to fill memory gaps (borrowed from psychology but without the sensory angle).
• Fabrication or Factual Inconsistency: Straightforward labels emphasizing the output’s unreliability.
• Probabilistic Error or Ungrounded Generation: Highlights the statistical roots and lack of real-world anchoring.
In practice, this shift encourages solutions like retrieval-augmented generation (RAG), where models pull from verified sources, or improved uncertainty metrics to flag risky outputs.
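As a rough illustration of the uncertainty-metric idea (not any particular product’s method), the sketch below assumes you can obtain per-token log-probabilities from whatever inference stack you use, and flags generations whose own probabilities are low. The thresholds are placeholder values, not recommendations.

```python
import math

def flag_ungrounded(token_logprobs, min_avg_prob=0.55, min_token_prob=0.05):
    """Heuristic: flag a generation when the model's own token probabilities
    are low. Thresholds are placeholders, not tuned recommendations;
    token_logprobs is assumed to come from whatever inference stack you use."""
    probs = [math.exp(lp) for lp in token_logprobs]
    avg_prob = sum(probs) / len(probs)
    weakest = min(probs)
    return {
        "avg_prob": round(avg_prob, 3),
        "weakest_token_prob": round(weakest, 3),
        "flag_for_review": avg_prob < min_avg_prob or weakest < min_token_prob,
    }

# Made-up log-probabilities for a short answer; in practice these come back
# alongside the generated tokens.
print(flag_ungrounded([-0.05, -0.2, -1.9, -3.4]))
```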
Conclusion: Precision Builds Better AI
Jason Bell’s post is another clear signal: let’s stop using “hallucination” and adopt language that reflects the reality of LLMs as probabilistic tools. By doing so, we clarify expectations, focus on meaningful improvements, and avoid confusing metaphors. If you’re working with AI, start today—describe errors for what they are: statistical outcomes, not sensory slips. After all, accurate language is a precursor to accurate progress. What term resonates with you?