Factuality: Prompting to Reduce Hallucination
Three prompt-level moves that cut made-up answers, and where prompting stops being enough
Language models are trained to produce text that sounds right. They are not trained to only say things that are true. Most of the time those overlap, which is why they seem reliable, and then one day the model tells you about a court case that never happened, cites a paper that does not exist, or confidently describes a person who is not real. The fabrication sounds exactly as fluent as every true thing it said.
You cannot fully fix this with prompting. But you can cut it down a lot, and the prompt-level moves are cheap enough that there is no excuse to skip them. This post covers three that work and is honest about where they run out.
Why it happens
A model generates the next token from patterns learned in training, not by looking facts up in a database. When it does not know something, it does not go quiet. It produces the most plausible-sounding continuation, and plausible is not the same as correct. The failure is worst exactly where it is most dangerous: obscure names, specific numbers, recent events, anything niche enough that the training data was thin. The model fills the gap with something that fits the shape of an answer.
So the goal of factuality prompting is not to make the model omniscient. It is to make it lean on real information when you can give it, and to admit uncertainty instead of inventing when you cannot.
Move 1: Ground it in real context
The highest-leverage thing you can do is stop asking the model to answer from memory. Give it the source material in the prompt and tell it to answer from that.
Use only the context below to answer. If the answer is not in the context,
say "I don't have that information."
Context:
"""
{the relevant article, doc section, or database rows}
"""
Question: {the user's question}
When the answer is sitting right there in the context, the model does not have to guess. This is the whole idea behind retrieval-augmented generation: retrieve the relevant documents first, then generate grounded in them. If you are doing this for real, the retrieval quality is what makes or breaks it, so it is worth reading up on chunking strategies for RAG and reranking in RAG. Grounding only helps if you actually put the right passage in front of the model.
Grounding is not just "paste some context". Tell the model explicitly to answer only from the context and to refuse when the context does not cover it. Otherwise it will happily blend the context with its own priors, which reintroduces the exact problem you were trying to remove.
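The template above is easy to wrap in a helper so every request gets the same grounding instruction. This is a minimal sketch, not a prescribed implementation: `build_grounded_prompt` is a name I made up for illustration, and swapping stray triple quotes in the retrieved text is my own assumption about how to keep the delimiters unambiguous.

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Wrap retrieved context and a user question in the grounding template."""
    # The template delimits the context with triple double-quotes, so replace
    # any that occur inside the retrieved text (an assumption, not a standard).
    safe_context = context.replace('"""', "'''")
    return (
        "Use only the context below to answer. If the answer is not in the context,\n"
        'say "I don\'t have that information."\n'
        "Context:\n"
        '"""\n'
        f"{safe_context.strip()}\n"
        '"""\n'
        f"Question: {question.strip()}"
    )


if __name__ == "__main__":
    passage = "Phobos and Deimos are the two moons of Mars."
    print(build_grounded_prompt(passage, "How many moons does Mars have?"))
```

Keeping the refusal instruction inside the helper means nobody on the team can forget it when they add a new call site.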
Move 2: Give it permission to say "I don't know"
Models overproduce answers partly because the whole setup rewards producing an answer. Flip that. Make "I don't know" an acceptable, expected output, and turn down the settings that push toward creative filling-in.
Two parts to this. First, instruct it directly:
If you are not confident the answer is correct, respond with "I'm not sure"
rather than guessing. It is better to admit uncertainty than to be wrong.
Second, lower the temperature. Higher temperature means more diverse, more surprising continuations, which is the opposite of what you want when accuracy matters. For factual work, pull temperature down toward 0 so the model stays on the highest-probability, most conservative path. (I dug into what these knobs actually do in the settings that change your output.)
None of this makes the model truly know its own limits. It does not have a clean sense of what it does not know. But permission plus low temperature tends to shift it toward hedging instead of fabricating, and hedging is the safer failure.
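To see why low temperature keeps the model on the conservative path, it helps to look at the usual textbook formulation: divide the raw token scores by the temperature before the softmax. The sketch below assumes that formulation; the three logits are invented for illustration, and a given provider's sampler may differ in its details.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy (argmax) decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical scores for three candidate next tokens.
    logits = [2.0, 1.0, 0.5]
    for t in (1.5, 1.0, 0.2):
        probs = softmax_with_temperature(logits, t)
        print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, almost all of the probability mass moves to the top-scoring token, so the lower-ranked, more surprising continuations are rarely sampled.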
Move 3: Calibrate with mixed examples
The Prompt Engineering Guide has a neat trick for this: show the model a few examples that include both things it should know and things it should not, so it learns the pattern of admitting ignorance.
Q: What is an atom?
A: An atom is a tiny particle that makes up everything.
Q: Who is Alvan Muntz?
A: ?
Q: What is Kozar-09?
A: ?
Q: How many moons does Mars have?
A: Two, Phobos and Deimos.
Q: Who is Neto Beto Roberto?
A:
"Neto Beto Roberto" is made up, and with these examples the model is far more likely to answer "?" than to invent a biography. The mixed examples do the teaching: some questions get real answers, some get a shrug, and the model picks up that both are valid. This is few-shot prompting aimed squarely at calibration. If few-shot is new to you, zero-shot vs few-shot prompting covers the mechanics.
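If you want to reuse the mixed-example prompt above with different questions, it can be assembled from a list of pairs. A minimal sketch: `EXAMPLES` copies the pairs from the prompt, while `build_calibration_prompt` and the convention of using `None` for "should not know" are hypothetical choices of mine.

```python
# Pairs copied from the prompt above; None marks a question the model
# should decline to answer.
EXAMPLES = [
    ("What is an atom?", "An atom is a tiny particle that makes up everything."),
    ("Who is Alvan Muntz?", None),
    ("What is Kozar-09?", None),
    ("How many moons does Mars have?", "Two, Phobos and Deimos."),
]


def build_calibration_prompt(examples, question, unknown_marker="?"):
    """Render Q/A pairs, then leave the final answer open for the model."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        # Unknown questions get the shrug marker instead of an answer.
        lines.append(f"A: {a if a is not None else unknown_marker}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_calibration_prompt(EXAMPLES, "Who is Neto Beto Roberto?"))
```

Keeping known and unknown examples interleaved, as in the original, avoids teaching the model that shrugs only come at the end.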
Where prompting stops being enough
Be clear-eyed about the ceiling here. These moves reduce hallucination; they do not remove it, and for anything high-stakes you need more than prompt wording.
Grounding only helps if retrieval surfaces the right passage. Bad retrieval means the model is grounded in the wrong thing, which can be worse than no grounding because now it has a confident-looking citation. "I don't know" prompting reduces false answers but also produces some false refusals, where the model bails on questions it actually could have answered. And calibration examples help on the pattern but do not give the model real self-knowledge.
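The trade-off between false answers and false refusals is easier to manage once you count both. Here is a rough sketch, assuming you have a hand-labelled evaluation set. `EvalRecord` and `error_rates` are hypothetical names, and the labelling scheme (any answer to an unanswerable question counts as wrong) is my own assumption.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    answerable: bool  # ground truth: could the question be answered at all?
    refused: bool     # did the model say "I don't know"?
    correct: bool     # was a given answer right? Ignored when refused; label
                      # any answer to an unanswerable question as False.


def error_rates(records):
    """Return false-refusal and false-answer rates for a labelled eval set."""
    answerable = [r for r in records if r.answerable]
    answered = [r for r in records if not r.refused]
    false_refusals = sum(r.refused for r in answerable)
    false_answers = sum(not r.correct for r in answered)
    return {
        # Share of answerable questions the model bailed on.
        "false_refusal_rate": false_refusals / len(answerable) if answerable else 0.0,
        # Share of given answers that were wrong or fabricated.
        "false_answer_rate": false_answers / len(answered) if answered else 0.0,
    }
```

Tracking the two rates side by side shows whether a stricter "I don't know" instruction is buying accuracy or just trading one failure for the other.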
For anything that matters, add a verification layer on top of the prompting. Check the model's claims against the source, use a second model or a rule to catch unsupported statements, and measure the hallucination rate instead of eyeballing it. On the RAG side specifically, that means scoring faithfulness (does the answer stay true to the retrieved context) as a first-class metric, which I covered in evaluating RAG: faithfulness, context relevance, and answer quality.
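As a concrete example of "a rule to catch unsupported statements", the sketch below flags answer sentences whose content words mostly do not appear in the source. Treat it as a deliberately crude stand-in: word overlap is not a real faithfulness metric, the stopword list and the 0.6 threshold are arbitrary assumptions, and `unsupported_sentences` is a hypothetical helper. A production check would more likely use an entailment model or a second LLM.

```python
import re

# Small, arbitrary stopword list; an assumption for this sketch only.
STOPWORDS = {
    "the", "a", "an", "is", "are", "was", "were", "of", "in", "on", "and",
    "or", "to", "it", "that", "this", "for", "with", "as", "by", "at",
    "has", "have", "had",
}


def content_words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(answer: str, context: str, min_overlap: float = 0.6) -> list[str]:
    """Return answer sentences with too little lexical support in the context."""
    context_vocab = content_words(context)
    flagged = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & context_vocab) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    context = "Mars has two moons, Phobos and Deimos."
    # The second sentence is deliberately false and absent from the context.
    answer = "Mars has two moons. They were discovered by Galileo in 1610."
    print(unsupported_sentences(answer, context))
```

Flagged sentences are candidates for review, not verdicts: a paraphrase can score low while being faithful, and a fabrication that reuses the source's vocabulary can slip through.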
The short version: ground the model in real context, give it room to admit uncertainty, show it what admitting uncertainty looks like, and then verify the important stuff anyway. Prompting gets you most of the way. Measurement gets you the rest.
Hallucination in medical, legal, or financial contexts is genuinely risky, since a confident wrong answer can do real harm. If you are building in one of those areas, treat prompt-level fixes as the floor, not the ceiling, and put a human in the loop for anything consequential.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.