Repeating a user’s instruction within the same prompt can dramatically increase the correctness of large language model outputs. Recent research demonstrates accuracy gains of up to 76 percentage points on fast, non‑reasoning models at essentially no extra latency or cost, making prompt repetition a simple, low‑effort optimization for developers.
What the Study Tested
Models and Benchmarks
The experiment evaluated seven popular LLMs: Gemini 2.0 Flash, Gemini Flash Lite, GPT‑4o, GPT‑4o‑mini, Claude 3 Haiku, Claude 3.7 Sonnet, and DeepSeek V3. Each model was run through ten benchmark suites covering a range of tasks, including:
- ARC
- OpenBookQA
- GSM8K
- MMLU‑Pro
- MATH
- NameIndex (custom)
- MiddleMatch (custom)
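To make the comparison concrete, an evaluation of this kind can be framed as scoring the same questions once with the plain prompt and once with the duplicated prompt, then measuring the accuracy difference. The loop below is a minimal sketch of that comparison, not the study's actual harness; the `ask_model` callable, the tiny question/answer pairs, and the substring scoring are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def accuracy(ask_model: Callable[[str], str],
             dataset: List[Tuple[str, str]],
             repeat: bool) -> float:
    """Fraction of questions answered correctly, with or without
    duplicating the prompt before it is sent to the model."""
    correct = 0
    for question, gold in dataset:
        prompt = f"{question}\n\n{question}" if repeat else question
        if gold.lower() in ask_model(prompt).lower():
            correct += 1
    return correct / len(dataset)


# Usage: score the same questions under both conditions.
# baseline = accuracy(ask_model, dataset, repeat=False)
# repeated = accuracy(ask_model, dataset, repeat=True)
# print(f"gain: {(repeated - baseline) * 100:.1f} percentage points")
```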
How Repeating Prompts Improves Accuracy
Transformer‑based causal models generate tokens left to right, and each token can attend only to the tokens that precede it. With a single copy of the prompt, the earlier parts of the question are processed before the model has seen the rest of it. Copying the entire prompt a second time places the question later in the token stream, where every token of the second copy can attend to the complete first copy, giving the model a second chance to align the instruction with the query. This reinforcement effect improves answer quality without any change to the model itself.
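In practice the duplication is plain string concatenation. The helper below is a minimal sketch of that idea; the function name, separator, and example question are illustrative and not taken from the study.

```python
def repeat_prompt(prompt: str, copies: int = 2, separator: str = "\n\n") -> str:
    """Join `copies` identical copies of the prompt so the question
    reappears later in the token stream."""
    return separator.join([prompt] * copies)


question = "Which element has the atomic number 26? Answer with the element name only."
print(repeat_prompt(question))
# Which element has the atomic number 26? Answer with the element name only.
#
# Which element has the atomic number 26? Answer with the element name only.
```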
Why Non‑Reasoning Models Benefit Most
Non‑reasoning models prioritize speed and lack internal chain‑of‑thought processing. They do not naturally restate the problem, so external repetition provides the missing rehearsal step. In contrast, reasoning‑oriented models already generate internal rephrases, making additional prompt copies largely redundant.
Practical Implications for Developers
Implementing the technique is straightforward: duplicate the user’s request in the prompt payload. Because only the input tokens are doubled, while the generated output stays the same length, API usage fees and inference latency remain essentially unchanged. This makes the method well suited to high‑throughput applications such as customer support, data extraction, and real‑time recommendations.
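As one possible implementation, the sketch below duplicates the question inside a single user message and sends it with the OpenAI Python SDK; the model name, helper function, and example question are placeholders rather than the study's code.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_repetition(question: str, model: str = "gpt-4o-mini") -> str:
    # Send the same question twice in one user message: only the input
    # tokens grow, while the generated answer stays the same length.
    doubled = f"{question}\n\n{question}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": doubled}],
    )
    return response.choices[0].message.content


print(ask_with_repetition("List the prime numbers between 10 and 30."))
```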
Limitations and Future Directions
The observed gains apply primarily to non‑reasoning models and tasks that can be answered directly from the prompt. Complex multi‑turn dialogues or deep reasoning challenges may not see the same improvement. Ongoing research will explore how other prompt‑structuring tricks—such as delimiters or JSON formatting—interact with repetition.
Bottom Line
In a landscape where model upgrades often require substantial compute, a simple copy‑paste of the prompt can lift accuracy by up to 76 percentage points. Including the instruction twice in the prompt offers a cost‑free, low‑effort boost that developers can adopt immediately to extract more performance from existing LLM deployments.
