Google Shows Prompt Duplication Boosts LLM Accuracy to 97%

Prompt duplication, in which the same prompt is included twice back-to-back in a single input, has been shown to dramatically raise large language model accuracy, with some tasks climbing from the low-20 percent range to over 90 percent and peak scores reaching 97 percent. The technique adds no latency and does not change output length, offering a simple, cost-effective boost for many non-reasoning LLM applications.

Study Design and Tested Models

The research evaluated four major LLM families across roughly 70 benchmark tasks, running each task twice: once with the prompt given a single time and once with the prompt duplicated back-to-back. The models tested were:

  • Gemini 2.0 Flash and Flash Lite
  • GPT‑4o and GPT‑4o‑mini
  • Claude 3 Haiku and Claude 3.7 Sonnet
  • DeepSeek V3

Variants such as three repetitions (“x3”) and explicit cues (“Let me repeat that:”) were also explored.
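As a rough illustration, such variants could be built along the following lines; the separator and cue placement are assumptions for illustration, and the study's exact formatting is not reproduced here.

    from typing import Optional

    def repeat_prompt(prompt: str, times: int = 2, cue: Optional[str] = None) -> str:
        """Build a repeated-prompt variant: x2 by default, x3, or an explicit-cue version."""
        if cue:
            # Explicit-cue variant: the cue sits between the two copies.
            return f"{prompt}\n{cue}\n{prompt}"
        return "\n".join([prompt] * times)

    baseline = "Which city is the capital of Australia? Answer with the city name only."
    x2 = repeat_prompt(baseline)                               # plain duplication
    x3 = repeat_prompt(baseline, times=3)                      # the "x3" variant
    cued = repeat_prompt(baseline, cue="Let me repeat that:")  # explicit-cue variant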

Why Prompt Duplication Improves Accuracy

Most LLMs use causal attention and process text strictly left to right, so tokens early in the prompt are encoded before the model has seen the full question. Repeating the prompt gives the model a second pass in which the entire question appears after a complete first copy of the input, so every token in the second copy can attend to all of the first. This effectively provides a bidirectional-like view without altering the model's architecture.
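A toy causal attention mask makes this concrete. The split into four "context" tokens and three "question" tokens below is arbitrary and purely illustrative:

    import numpy as np

    n_ctx, n_q = 4, 3            # pretend: 4 context tokens followed by 3 question tokens
    single = n_ctx + n_q         # single prompt:      [context | question]
    doubled = 2 * single         # duplicated prompt:  [context | question | context | question]

    def causal_mask(n: int) -> np.ndarray:
        """1 where a query position may attend to a key position, 0 otherwise."""
        return np.tril(np.ones((n, n), dtype=int))

    # Single prompt: the first context token never attends to any question token.
    m1 = causal_mask(single)
    print(m1[0, n_ctx:])         # [0 0 0]

    # Duplicated prompt: the first token of the second copy already attends to the
    # entire first copy, question included.
    m2 = causal_mask(doubled)
    print(m2[single, :single])   # [1 1 1 1 1 1 1]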

Best Use Cases for Prompt Duplication

The technique shines on low- or non-reasoning tasks such as information extraction, single-choice classification, and constraint matching. Tasks that already use chain-of-thought prompting (e.g., multi-step math) see smaller gains because the model already revisits the prompt as it generates its reasoning steps.

Practical Trade‑offs and Cost Considerations

Duplicating a prompt doubles the input token count, leading to higher API fees on token-priced services. Inference latency, however, is essentially unchanged: the longer input is still processed in a single prefill pass and the output length does not grow, unlike multi-call prompting strategies that add extra round trips.
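The cost impact is easy to estimate for token-priced APIs; the per-token prices below are placeholders, not real rates:

    # Back-of-the-envelope cost of duplication. Prices are illustrative
    # placeholders in USD per million tokens, not real rates.
    INPUT_PRICE_PER_M = 0.15
    OUTPUT_PRICE_PER_M = 0.60

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

    single = request_cost(input_tokens=500, output_tokens=50)     # normal prompt
    doubled = request_cost(input_tokens=1_000, output_tokens=50)  # duplicated: input doubles, output does not
    print(f"${single:.6f} -> ${doubled:.6f} per request")
    # Only the input side doubles, so the total increase stays below 2x and
    # shrinks further as output tokens take a larger share of the bill.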

Impact on Developers and Enterprises

For applications like document classification, FAQ answering, or simple data extraction, a single line of code that repeats the prompt can deliver measurable quality improvements. Aligning prompt structure with the model's causal attention pattern yields more reliable outputs without waiting for next-generation model releases.
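As a sketch, the change can be a single expression that repeats the prompt before it is sent. The call_llm function below is a hypothetical stand-in for whatever client an application already uses, not a real API:

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for the application's existing API client."""
        raise NotImplementedError  # replace with the real client call

    def classify_document(document: str) -> str:
        prompt = (
            "Classify the following document as 'invoice', 'contract', or 'other'. "
            "Answer with one word only.\n\n" + document
        )
        # The one-line change: send the prompt twice back-to-back instead of once.
        return call_llm(prompt + "\n" + prompt)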

Future Directions

Potential next steps include adaptive repetition—dynamically deciding whether to repeat a prompt based on task difficulty—and combining duplication with other low‑cost tricks such as soft prompting or token‑level weighting. The core message remains clear: a minimal tweak can yield maximal payoff.
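Such an adaptive scheme is speculative; one simple version might gate duplication on a cheap difficulty proxy. The heuristic below (prompt length plus a few reasoning keywords) is an assumption for illustration, not something proposed in the study:

    REASONING_HINTS = ("step by step", "prove", "derive", "explain your reasoning")

    def maybe_duplicate(prompt: str, max_len: int = 2_000) -> str:
        """Duplicate only prompts that look like short, non-reasoning tasks.

        Purely illustrative heuristic: very long prompts are skipped to cap token
        cost, and prompts that already ask for chain-of-thought are left alone.
        """
        lowered = prompt.lower()
        if len(prompt) > max_len or any(hint in lowered for hint in REASONING_HINTS):
            return prompt
        return prompt + "\n" + prompt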