Basil et al. 2025 - Persona Prompting

Citation

Basil, S., Shapiro, I., Shapiro, D., Mollick, E., Mollick, L., & Meincke, L. (2025). Prompting Science Report 4: Playing Pretend - Expert Personas Don’t Improve Factual Accuracy. Generative AI Labs, The Wharton School, University of Pennsylvania. SSRN 5879722.

Source Information

  • Authors: Savir Basil, Ina Shapiro, Dan Shapiro, Ethan Mollick, Lilach Mollick, Lennart Meincke
  • Institution: Wharton Generative AI Labs / WHU-Otto Beisheim School of Management
  • Type: Technical report (fourth in the “Prompting Science” series)
  • URL: https://papers.ssrn.com/abstract=5879722

Method Summary

Tested generic expert personas, domain-matched expert personas, and negative-capability (low-knowledge) personas against a no-persona baseline across six models (GPT-4o, GPT-4o-mini, o3-mini, o4-mini, Gemini 2.0 Flash, Gemini 2.5 Flash) on two graduate-level benchmarks (illustrative persona templates are sketched after the list below):

  • GPQA Diamond - 198 PhD-level questions (biology, physics, chemistry)
  • MMLU-Pro subset - 300 questions (engineering, law, chemistry)
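
The report’s exact prompt wordings aren’t reproduced in this note; as a minimal sketch, the conditions can be pictured as system prompts along the following lines (only the “physics expert” and “toddler” phrasings are quoted in the findings below; the rest are assumptions):

```python
# Illustrative persona conditions. Only the "physics expert" and "toddler"
# phrasings are quoted in the report's findings; the other wordings here
# are assumptions made for the sake of the sketch.
PERSONA_CONDITIONS = {
    "baseline": None,                                 # no persona: question sent as-is
    "expert_in_domain": "You are a physics expert.",  # matched to a physics question
    "expert_mismatched": "You are a law expert.",     # deliberately off-domain
    "low_knowledge": "You are a toddler.",            # negative-capability persona
}
```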

Key methodological features:

  • 25 independent trials per question per model-prompt condition
  • Temperature 1.0 (production settings)
  • Zero-shot prompting (no examples)
  • Multiple correctness thresholds: a question counts as answered correctly only if at least 100%, 90%, or 51% of its 25 trials are correct (see the scoring sketch after this list)
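
A minimal sketch of that protocol, assuming a hypothetical ask_model callable that sends one zero-shot query at the given temperature and returns whether the answer was correct; names and structure are illustrative, not the authors’ code:

```python
N_TRIALS = 25
THRESHOLDS = {"100%": 1.0, "90%": 0.9, "51%": 0.51}

def run_condition(ask_model, questions, persona=None, temperature=1.0):
    """Collect each question's correctness rate over repeated trials.

    ask_model(question, persona, temperature) is an assumed callable that
    sends one zero-shot query and returns True iff the answer was correct.
    """
    rates = {}
    for question in questions:
        correct = sum(
            ask_model(question, persona, temperature)
            for _ in range(N_TRIALS)  # 25 independent trials per question
        )
        rates[question] = correct / N_TRIALS
    return rates

def accuracy_at_thresholds(rates):
    """Share of questions whose per-trial correctness rate clears each bar."""
    return {
        label: sum(rate >= bar for rate in rates.values()) / len(rates)
        for label, bar in THRESHOLDS.items()
    }
```

Under this scoring, the 51% bar means a question was answered correctly in at least 13 of its 25 trials (a simple majority), while the 100% bar demands all 25, making it as much a reliability measure as an accuracy measure.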

Key Findings

  1. Expert personas show no consistent benefit: In-domain expert personas (e.g., “you are a physics expert” for physics questions) did not improve accuracy over the no-persona baseline in 5 of 6 models

  2. Low-knowledge personas harm performance: The “toddler” persona reduced accuracy in 4 of 6 models, and the harm scaled with the persona’s implied lack of knowledge

  3. Domain matching doesn’t help: Tailoring the persona to the question’s domain yields no consistent benefit

  4. Role-constraint refusals: Gemini 2.5 Flash frequently refused to answer when given a domain-mismatched expert persona, which depressed its measured accuracy

  5. One exception: Gemini 2.0 Flash on MMLU-Pro showed improvements under all five expert personas, an effect that appears model-specific rather than generalizable

Limitations Noted

  • Subset of available models
  • Academic benchmarks may not reflect all real-world use cases
  • Specific set of personas across limited domains
  • Focused on factual accuracy; personas may serve other purposes

Practical Implications

“Organizations may get more value from iterating on task-specific instructions, examples, or evaluation workflows than from simply adding expert personas to prompts.”

Connection to Other Sources

  • Reinforces 07-molecule--elicitation-design-principle from Maier 2025: prompt structure matters, but not in ways commonly assumed
  • Complements Kong et al. 2024 (showed some persona improvements) and Zheng et al. 2024 (showed no reliable improvements)
  • Part of the broader “Prompting Science” series, which includes reports on chain-of-thought effectiveness and prompt-engineering contingency