Closed-weight providers like OpenAI, Anthropic, and Google sell recently trained models with a compelling pitch: these models contain an approximation of important, timely data. They know things. They've read the internet. They have a knowledge cutoff measured in weeks, not years.

But I want to argue something that might sound paranoid until you think about it: LLM-based applications should not present anything written by an LLM as factual information unless it is directly corroborated. The "ground truth" these models supposedly contain isn't ground truth at all. It's a hall of mirrors: institutional messaging reflected back at us, contaminated by synthetic content, and fundamentally incapable of distinguishing fact from plausible fiction.

This post covers three related problems:

  1. The reliability problem: LLM outputs aren't trustworthy, and the industry knows it
  2. The institutional capture problem: Models disproportionately reflect what corporations and governments want you to know
  3. The training order problem: We might be building these systems backwards

The reliability problem

Hallucinations are a feature, not a bug

OpenAI's own research from September 2025 admits that hallucinations stem partly from training incentives. Standard benchmarks treat model responses like multiple-choice tests: an "I don't know" scores zero, while a guess has a non-zero chance of being correct. Models are optimized to be good test-takers, not honest about their uncertainty.
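To make the incentive concrete, here is a minimal sketch, assuming a simplified binary-graded benchmark (a right answer scores 1, anything else scores 0); the probabilities are illustrative, not OpenAI's:

```python
# Toy scoring model for a binary-graded benchmark: a wrong answer costs
# nothing, so guessing dominates abstaining at every confidence level.
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score on one question: abstaining scores 0; guessing scores
    1 with probability p_correct and 0 otherwise."""
    return 0.0 if abstain else p_correct

for p in (0.05, 0.25, 0.50):
    print(f"p={p:.2f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")
# Even a 5% shot at being right out-scores an honest "I don't know",
# so a score-maximizing model learns to always guess.
```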

The numbers are stark. OpenAI's comparison of their newer "thinking" models versus older ones reveals the trade-off:

Model                  Abstention rate   Accuracy rate   Error rate
gpt-5-thinking-mini    52%               22%             26%
OpenAI o4-mini         1%                24%             75%

The older model appears more accurate because it guesses constantly, and it's wrong about 75% of the time it gives an answer. The newer model is more humble: it abstains half the time, which cuts its overall error rate to 26% (though of the answers it does give, roughly half, 26 out of 48, are still wrong). This is progress, but it's also an admission: even state-of-the-art models hallucinate at rates that would be unacceptable in any other information system.

When answers are instantly testable

If you can test answers instantly and there is no penalty for being wrong (only time and cost), the right objective is the expected time or cost to the first correct answer. Let:

  c = probability that a single call returns a correct answer (the accuracy rate)
  t = latency per call
  k = price per call

Then the number of calls until the first correct answer is geometric with mean 1/c, so:

  expected time = t/c
  expected cost = k/c

Abstentions don't help in this setup; an abstention still consumes a call, and only c moves the needle. The only thing that matters is the correctness rate per unit of time or cost.

Using OpenAI's numbers (source: https://openai.com/index/why-language-models-hallucinate/):

  gpt-5-thinking-mini: c = 0.22, so about 1/0.22 ≈ 4.5 calls per correct answer
  OpenAI o4-mini: c = 0.24, so about 1/0.24 ≈ 4.2 calls per correct answer

So if both models have similar latency and price, o4-mini wins on expected time and cost despite a far worse error rate. If one model is slower or more expensive, the decision rule is simple: pick the model that maximizes c/t for speed or c/k for cost.
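Here is the same decision rule as a small sketch. Only the accuracy rates come from OpenAI's table; the latency and price values are placeholders made up for illustration:

```python
# Expected time/cost to first correct answer: calls are independent, so the
# number of calls until the first correct one is geometric with mean 1/c.
def expected_time(c: float, t: float) -> float:
    return t / c  # average seconds until a verified-correct answer

def expected_cost(c: float, k: float) -> float:
    return k / c  # average dollars until a verified-correct answer

models = {
    # name: (c from OpenAI's table, placeholder latency in s, placeholder $/call)
    "gpt-5-thinking-mini": (0.22, 2.0, 0.002),
    "o4-mini":             (0.24, 2.0, 0.002),
}

for name, (c, t, k) in models.items():
    print(f"{name}: ~{1 / c:.1f} calls, "
          f"{expected_time(c, t):.1f}s, ${expected_cost(c, k):.4f}")
```

With equal latency and price the higher-c model wins outright; change t or k and the c/t and c/k ratios decide.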

ArXiv research on hallucination frames this as statistical inevitability. Facts that appear only once in a training corpus ("singletons") are difficult for models to memorize accurately. The minimum hallucination rate correlates with the proportion of these rare facts in the training data. If you ask about something uncommon, the model is more likely to make something up.
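One way to build intuition for the singleton argument is a Good-Turing style estimate: the share of facts that appear exactly once in the corpus approximates the probability mass the model never really learned, which acts as a floor on its hallucination rate. A toy sketch, treating "facts" as plain strings (a deliberate oversimplification):

```python
from collections import Counter

# Good-Turing intuition: the fraction of items seen exactly once estimates
# how much of the world the model has effectively never learned, and that
# fraction lower-bounds how often it must guess.
def singleton_fraction(facts: list[str]) -> float:
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

corpus = (["paris is the capital of france"] * 50
          + ["result of an obscure local election"]
          + ["a one-off scientific measurement"])
print(f"estimated hallucination floor: {singleton_fraction(corpus):.1%}")  # ~3.8%
```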

Citation accuracy is abysmal

You might think retrieval-augmented generation (RAG) solves this. Give the model access to sources, let it cite them, problem solved. But DeepTRACE, a tool for auditing AI research systems, found that citation accuracy in "deep research" agents ranges from 40% to 80%. That means even when models cite sources, they're wrong 20-60% of the time. The citations look authoritative. They're formatted correctly. They often point to real documents. But the claims attributed to those documents are frequently unsupported or fabricated.

This isn't a solved problem. The OpenFactCheck framework and tools like DeepEval and RAGAS exist specifically because the industry needs automated ways to verify LLM outputs. The existence of an entire evaluation stack for fact-checking AI is itself evidence that the industry doesn't trust what these models produce.

Model collapse: the snake eating its tail

Here's where it gets worse. A Nature study demonstrated that AI models trained on the outputs of previous AI models experience "model collapse"—a degenerative process where performance, diversity, and accuracy all decline. The mechanism is simple: when a model is trained on outputs from its predecessors, it inherits and amplifies their errors, biases, and hallucinations.
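You can watch the mechanism in a toy simulation: sample from a distribution, refit a model on the samples, and repeat. This caricature is mine, not the Nature paper's setup, but it shows the same signature: the tail vanishes first, then diversity overall.

```python
import random
from collections import Counter

# Toy model collapse: each "generation" trains only on text sampled from the
# previous generation. Any rare token that misses one sampling round vanishes
# from the training data forever, so diversity can only shrink.
random.seed(0)
vocab = [f"tok{i}" for i in range(50)]
weights = [1.0 / (i + 1) for i in range(50)]  # long-tailed "real" distribution
n_samples = 200                               # finite training set per generation

for generation in range(15):
    data = random.choices(vocab, weights=weights, k=n_samples)
    counts = Counter(data)
    vocab = list(counts)                      # the next model only knows what it saw
    weights = [counts[t] for t in vocab]
    print(f"gen {generation:2d}: {len(vocab)} of 50 tokens survive")
# Tail tokens disappear first (early collapse); keep iterating and the
# distribution narrows toward a few high-frequency tokens (late collapse).
```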

IBM's explainer on model collapse describes two stages:

  1. Early collapse: the model starts losing the tails of the distribution, so rare facts and minority data are the first to disappear
  2. Late collapse: the learned distribution keeps narrowing until outputs converge on a low-variance core that bears little resemblance to the original data

The internet is now saturated with AI-generated content. Humans in the Loop calls this a 2025 crisis: the vast majority of "clean" human data has already been harvested, and what remains is increasingly polluted with synthetic text. Forbes argues that human-produced content and expert oversight are now crucial to prevent collapse.

The Harvard Journal of Law & Technology has even published on the "Right to Uncontaminated Human-Generated Data," treating pre-2022 content as a kind of pristine resource that may have legal protection implications. Companies that possess datasets collected before the generative AI boom may hold permanent competitive advantages—not because their data is better, but because it's human.

When closed-weight providers sell you "freshness"—a recent knowledge cutoff, training on current events—what exactly are they training on? The 2025 internet is not the 2020 internet. It's contaminated. The "ground truth" they're selling may be partially synthetic, and there's no way to audit it because they won't tell you what's in the training data.

The institutional capture problem

The corporatocratic babelfish

Here's my more controversial claim: closed-weight models act as corporatocratic babelfish. They know everything about every corporation and what every corporation wants everyone to know. The same goes for governments. This isn't conspiracy; it's a natural consequence of how training data is collected and weighted.

Research published on SSRN documents "The Predominant Use of High-Authority Commercial Web Publisher Content to Train Leading LLMs." Key datasets like C4, WebText2, and RefinedWeb are disproportionately composed of content owned by commercial publishers—news organizations, media conglomerates, corporate websites. This is deliberate. Developers want "high-quality" text, and institutional sources produce well-formatted, grammatically correct, professionally edited content.

But this creates a systematic bias. Models mirror the editorial standards and perspectives of professional media rather than the broader public. They reflect institutional framing because that's what they've been fed.

Think about the volume asymmetry. Corporations and governments produce enormous amounts of text: press releases, marketing copy, investor communications, official statements, policy documents, and search-optimized corporate sites.

This content is well-formatted, "high quality" by web scraping standards, and represents what institutions want you to know. It's not balanced by an equivalent volume of critical analysis, whistleblower accounts, or independent journalism. The training data is structurally biased toward the institutional voice.

Source framing creates systematic bias

Research published in Science found that LLMs exhibit systematic bias based on the perceived origin of information. Attributing a statement to a Chinese individual significantly lowered agreement scores across multiple models. The researchers call this "AI nationalism"—models are influenced by the geopolitical framing of their training data.

ArXiv analysis of political content in training data found that left-leaning documents predominate in many open-source training corpora. But the more interesting finding is about framing: left-leaning and right-leaning documents often discuss identical topics using different sources of legitimacy. Models trained on these imbalanced mixtures internalize these frames, which can lead to "hallucinated" political consensus or the reinforcement of narrow viewpoints.

Studies on LLM sentiment show models assign more positive sentiment to Western politicians' names even when the surrounding text is identical. The bias isn't learned critically—it's absorbed as background assumption.

The opacity problem

As LLMs have transitioned from research projects to valuable intellectual property, companies have stopped publishing details about their training corpora. The Future of Free Speech's 2025 AI Report documents this opacity. We can't audit the ratio of corporate-to-independent content. We can't verify claims about data quality or contamination. We're asked to trust that the "ground truth" is trustworthy.

Meanwhile, research on bias propagation shows that biases from filtering pipelines persist even when text is later rewritten. The institutional framing gets baked in at the data selection layer and survives downstream processing. The model has internalized a worldview before any safety tuning begins.

Adversarial manipulation is easy

CMU researchers demonstrated "LLM Whisperer" attacks where subtle synonym replacements in prompts can increase the likelihood of an LLM mentioning a target concept—like a specific brand or political party—by up to 78%. These altered prompts are virtually indistinguishable from original text to human users. Corporate actors with resources can shape model outputs through careful prompt engineering at scale.

There's also emerging evidence of "data grooming"—deliberate pollution of training datasets to influence the long-term behavior of future models. If you can get your content into the training pipeline, you can influence what the model "knows" and how it frames information. Corporations have every incentive to do this, and the resources to execute it.

The training order problem

We might be building these backwards

Here's where I want to propose something that sounds naive but might be important: training order matters, and we're doing it wrong.

Current approaches to AI safety follow a pattern:

  1. Train on everything at once (pretraining on massive web scrapes)
  2. Apply RLHF/safety tuning afterward
  3. Hope the model can distinguish reliable from unreliable

The Alignment Forum's curriculum and 80,000 Hours' AI safety syllabus both emphasize ordered progression—but for teaching humans about AI, not for training the models themselves. When we educate humans, we teach values and critical thinking before exposure to propaganda and institutional messaging. We don't hand children corporate press releases and expect them to develop skepticism on their own.

What if models need to learn skepticism before learning about institutions?

I'm suggesting something like:

  1. First: Train on content about human values, relationships, empathy—what actually matters to people
  2. Then: Train on critical analysis, propaganda recognition, logical fallacies—the Orwell layer
  3. Only then: Train on institutional content—corporate communications, government statements, official sources

This would mean the model learns to question before it learns what to question. The skeptical framework would be foundational, not a post-hoc patch applied through RLHF.
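To make "ordered" concrete, here is a minimal sketch of a staged data schedule. The phase names, source labels, and train_on function are all hypothetical; the only point is that the curriculum is explicit and ordered rather than one undifferentiated scrape.

```python
# Hypothetical curriculum schedule: a sketch of the ordering argument,
# not a description of how any real model is trained.
CURRICULUM = [
    ("human_values",      ["literature", "personal narratives", "ethics texts"]),
    ("critical_thinking", ["propaganda analysis", "logical fallacies", "media criticism"]),
    ("institutional",     ["press releases", "government statements", "corporate sites"]),
]

def train_on(model, phase_name: str, sources: list[str]) -> None:
    """Placeholder for a pretraining pass restricted to the given sources."""
    print(f"phase '{phase_name}': training on {', '.join(sources)}")

def curriculum_pretrain(model) -> None:
    # Earlier phases form the foundation that later phases are interpreted
    # against; institutional text arrives last, after the skeptical scaffolding.
    for phase_name, sources in CURRICULUM:
        train_on(model, phase_name, sources)

curriculum_pretrain(model=None)
```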

Why this might matter

Research on values-driven AI curriculum shows that embedding ethics into technical training—rather than treating it as a standalone module—produces better outcomes for human students. The same principle might apply to models.

Work on critical thinking in AI education emphasizes "strategic skepticism"—teaching people to recognize AI-generated manipulation and verify information rather than accepting outputs at face value. Current models lack this capacity because they were never taught it as a foundation.

The Digital Resistance curriculum proposes teaching students about deepfakes, hallucinations, and verification before they become dependent on AI tools. We recognize that humans need this scaffolding. Why would models be different?

The love-before-1984-before-Google principle

I'll put this more provocatively: a model should learn about love before learning about Orwell's 1984, and only then should it learn about Google.

The order encodes priorities. If a model's first exposure to human knowledge is corporate press releases and government statements, those sources get weighted as foundational. The institutional voice becomes the default. Critical perspectives become deviations from baseline.

But if a model first learns what humans actually care about—relationships, meaning, suffering, joy—then institutional messaging becomes something to evaluate against human values, not the standard against which human values are measured.

This isn't how current models are trained. They're trained on web scrapes weighted toward "quality" content, which means institutional content. The corporate voice isn't an input to be questioned; it's the voice the model learns to speak.

What this means for applications

If you're building applications on top of LLMs, here's what I think follows:

Never present LLM output as factual without corroboration

This should be the default for any serious application. LLM outputs are claims, not facts. They should be verified against authoritative sources before being presented to users as information. The model's confidence is not evidence of accuracy: as the benchmark incentives above show, a fluent, assured answer is often just a well-optimized guess.

Be suspicious of institutional framing

When an LLM provides information about a corporation or government, ask: where did this framing come from? The model isn't neutrally reporting facts; it's reflecting whatever framing dominated its training data. For high-stakes decisions, seek primary sources and independent analysis.

Treat "freshness" as a bug, not a feature

Recent training cutoffs mean recent contamination. A model trained on 2025 web content has ingested more synthetic text than a model trained on 2022 content. "Freshness" might mean "more polluted."

Build verification into your pipeline

Use tools like OpenFactCheck, DeepEval, or RAGAS to systematically verify outputs. Don't ship LLM claims to users without automated fact-checking. The latency cost is worth the reliability gain.
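As a sketch of the shape such a gate can take (llm_answer and check_claim are hypothetical placeholders, not the actual APIs of OpenFactCheck, DeepEval, or RAGAS):

```python
from dataclasses import dataclass

# Minimal shape of a verification gate: treat the LLM's answer as a set of
# claims and ship nothing to the user unless every claim is corroborated.
@dataclass
class Verdict:
    claim: str
    supported: bool
    source_url: str | None = None

def llm_answer(question: str) -> list[str]:
    """Hypothetical: call the model and split its answer into atomic claims."""
    raise NotImplementedError

def check_claim(claim: str) -> Verdict:
    """Hypothetical: verify one claim against trusted retrieved sources."""
    raise NotImplementedError

def answer_or_abstain(question: str) -> str:
    verdicts = [check_claim(c) for c in llm_answer(question)]
    if all(v.supported for v in verdicts):
        cited = "\n".join(f"- {v.claim} [{v.source_url}]" for v in verdicts)
        return f"Verified answer:\n{cited}"
    # Fail closed: an unverified claim stays a claim, never a fact.
    unsupported = "; ".join(v.claim for v in verdicts if not v.supported)
    return f"Could not corroborate: {unsupported}"
```

The design choice that matters is failing closed: unverified claims are surfaced as unverified, not silently shipped as answers.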

Consider the source of your model's "knowledge"

Closed-weight providers won't tell you what's in their training data. Open-weight models at least allow some audit. If institutional bias matters for your use case, you might need to fine-tune on carefully curated data or use RAG with sources you trust.

Conclusion

Ground truth isn't a safe bet. The models we're building:

  1. Hallucinate at rates their own makers document and cannot yet fix
  2. Disproportionately reflect the framing of corporate and government sources
  3. Are increasingly trained on a synthetic, contaminated internet that nobody can audit

When a closed-weight provider sells you a model with a recent knowledge cutoff, they're selling you a black box trained on a polluted internet, weighted toward corporate and government messaging, with no way to verify what it "knows" or where that knowledge came from.

Maybe we need models that learn to question before they learn what to question. Maybe training order matters. Maybe the foundational layer should be human values and critical thinking, not institutional press releases.

Until then, treat every LLM output as a claim requiring verification. The babelfish speaks fluently, but it speaks in the voice of whoever fed it.