Repeating a user’s instruction within the same prompt can dramatically increase the correctness of large language model outputs. Recent research demonstrates accuracy gains of up to 76 percentage points on fast, non‑reasoning models at essentially no extra latency or cost, making prompt repetition a simple, low‑effort optimization for developers.
What the Study Tested
Models and Benchmarks
The experiment evaluated seven popular LLMs: Gemini 2.0 Flash, Gemini Flash Lite, GPT‑4o, GPT‑4o‑mini, Claude 3 Haiku, Claude 3.7 Sonnet, and DeepSeek V3. Each model was run through ten benchmark suites covering a range of tasks, including:
- ARC
- OpenBookQA
- GSM8K
- MMLU‑Pro
- MATH
- NameIndex (custom)
- MiddleMatch (custom)
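To make the comparison concrete, an evaluation of this kind can be framed as scoring the same questions once with the plain prompt and once with the duplicated prompt, then measuring the accuracy difference. The loop below is a minimal sketch of that comparison, not the study's actual harness; the `ask_model` callable, the tiny question/answer pairs, and the substring scoring are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def accuracy(ask_model: Callable[[str], str],
             dataset: List[Tuple[str, str]],
             repeat: bool) -> float:
    """Fraction of questions answered correctly, with or without
    duplicating the prompt before it is sent to the model."""
    correct = 0
    for question, gold in dataset:
        prompt = f"{question}\n\n{question}" if repeat else question
        if gold.lower() in ask_model(prompt).lower():
            correct += 1
    return correct / len(dataset)


# Usage: score the same questions under both conditions.
# baseline = accuracy(ask_model, dataset, repeat=False)
# repeated = accuracy(ask_model, dataset, repeat=True)
# print(f"gain: {(repeated - baseline) * 100:.1f} percentage points")
```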
How Repeating Prompts Improves Accuracy
Transformer‑based causal models generate tokens left to right, and each token can attend only to the tokens that precede it. With a single copy of the prompt, the earlier parts of the question are processed before the model has seen the rest of it. Copying the entire prompt a second time places the question later in the token stream, where every token of the second copy can attend to the complete first copy, giving the model a second chance to align the instruction with the query. This reinforcement effect improves answer quality without any change to the model itself.
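In practice the duplication is plain string concatenation. The helper below is a minimal sketch of that idea; the function name, separator, and example question are illustrative and not taken from the study.

```python
def repeat_prompt(prompt: str, copies: int = 2, separator: str = "\n\n") -> str:
    """Join `copies` identical copies of the prompt so the question
    reappears later in the token stream."""
    return separator.join([prompt] * copies)


question = "Which element has the atomic number 26? Answer with the element name only."
print(repeat_prompt(question))
# Which element has the atomic number 26? Answer with the element name only.
#
# Which element has the atomic number 26? Answer with the element name only.
```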
Why Non‑Reasoning Models Benefit Most
Non‑reasoning models prioritize speed and lack internal chain‑of‑thought processing. They do not naturally restate the problem, so external repetition provides the missing rehearsal step. In contrast, reasoning‑oriented models already generate internal rephrases, making additional prompt copies largely redundant.
Practical Implications for Developers
Implementing the technique is straightforward: duplicate the user’s request in the prompt payload. Because only the input tokens are doubled, while the generated output stays the same length, API usage fees and inference latency remain essentially unchanged. This makes the method well suited to high‑throughput applications such as customer support, data extraction, and real‑time recommendations.
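As one possible implementation, the sketch below duplicates the question inside a single user message and sends it with the OpenAI Python SDK; the model name, helper function, and example question are placeholders rather than the study's code.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_repetition(question: str, model: str = "gpt-4o-mini") -> str:
    # Send the same question twice in one user message: only the input
    # tokens grow, while the generated answer stays the same length.
    doubled = f"{question}\n\n{question}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": doubled}],
    )
    return response.choices[0].message.content


print(ask_with_repetition("List the prime numbers between 10 and 30."))
```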
Limitations and Future Directions
The observed gains apply primarily to non‑reasoning models and tasks that can be answered directly from the prompt. Complex multi‑turn dialogues or deep reasoning challenges may not see the same improvement. Ongoing research will explore how other prompt‑structuring tricks—such as delimiters or JSON formatting—interact with repetition.
Bottom Line
In a landscape where model upgrades often require substantial compute, a simple copy‑paste of the prompt can lift accuracy by up to 76 percentage points. Including the instruction twice in the prompt offers a cost‑free, low‑effort boost that developers can adopt immediately to extract more performance from existing LLM deployments.
