Basil et al. 2025 - Persona Prompting
Citation
Basil, S., Shapiro, I., Shapiro, D., Mollick, E., Mollick, L., & Meincke, L. (2025). Prompting Science Report 4: Playing Pretend - Expert Personas Don’t Improve Factual Accuracy. Generative AI Labs, The Wharton School, University of Pennsylvania. SSRN 5879722.
Source Information
- Authors: Savir Basil, Ina Shapiro, Dan Shapiro, Ethan Mollick, Lilach Mollick, Lennart Meincke
- Institution: Wharton Generative AI Labs / WHU – Otto Beisheim School of Management
- Type: Technical report (fourth in “Prompting Science” series)
- URL: https://papers.ssrn.com/abstract=5879722
Method Summary
Tested expert personas, domain-matched expert personas, and low-knowledge (negative-capability) personas across six models (GPT-4o, GPT-4o-mini, o3-mini, o4-mini, Gemini 2.0 Flash, Gemini 2.5 Flash) on two graduate-level benchmarks:
- GPQA Diamond - 198 PhD-level questions (biology, physics, chemistry)
- MMLU-Pro subset - 300 questions (engineering, law, chemistry)
Key methodological features (sketched in code after this list):
- 25 independent trials per question per model-prompt condition
- Temperature 1.0 (production settings)
- Zero-shot prompting (no examples)
- Multiple correctness thresholds (100%, 90%, 51% of a question's trials correct)
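To make the protocol concrete, here is a minimal sketch of the evaluation loop under the settings above. `ask_model`, the persona wordings, and the placeholder grading are hypothetical stand-ins, not the report's actual harness; only the 25-trial, temperature-1.0, zero-shot, thresholded-scoring structure comes from the paper.

```python
import random

# Hypothetical persona prefixes for illustration; the report's exact
# wordings are not reproduced here.
PERSONAS = {
    "baseline": "",                           # no persona (control)
    "expert": "You are a physics expert. ",   # in-domain expert persona
    "toddler": "You are a toddler. ",         # low-knowledge persona
}

def ask_model(prompt: str, temperature: float = 1.0) -> bool:
    """Hypothetical stand-in: run one zero-shot query and return whether
    the graded answer was correct. Swap in a real API call and grader."""
    return random.random() < 0.6  # placeholder, not a real model's accuracy

def run_condition(questions, persona_prefix, n_trials=25):
    """25 independent trials per question for one model-prompt condition."""
    return [[ask_model(persona_prefix + q) for _ in range(n_trials)]
            for q in questions]

def accuracy_at_thresholds(trials_per_question, thresholds=(1.0, 0.9, 0.51)):
    """A question counts as solved at threshold t only if at least a
    fraction t of its trials were correct; report the share of questions
    solved at each bar."""
    n = len(trials_per_question)
    return {t: sum(sum(ts) / len(ts) >= t for ts in trials_per_question) / n
            for t in thresholds}

if __name__ == "__main__":
    questions = ["Which quark has the largest mass?"]  # toy GPQA-style item
    for name, prefix in PERSONAS.items():
        print(name, accuracy_at_thresholds(run_condition(questions, prefix)))
```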
Key Findings
- Expert personas show no consistent benefit: in-domain expert personas (“you are a physics expert” for physics questions) did not improve accuracy over baseline (no persona) for 5 of 6 models
- Low-knowledge personas harm performance: a “toddler” persona reduced accuracy in 4 of 6 models; negative effects scale with the implied lack of knowledge
- Domain matching doesn’t help: tailoring personas to match the question domain shows no consistent benefit
- Role-constraint refusal: Gemini 2.5 Flash frequently refused to answer when given mismatched expert personas, depressing measured accuracy
- One exception: Gemini 2.0 Flash on MMLU-Pro showed improvements for all five expert personas (appears model-specific, not generalizable)
Limitations Noted
- Subset of available models
- Academic benchmarks may not reflect all real-world use cases
- Specific set of personas across limited domains
- Focused on factual accuracy; personas may serve other purposes
Practical Implications
“Organizations may get more value from iterating on task-specific instructions, examples, or evaluation workflows than from simply adding expert personas to prompts.”
Extracted Content
- 05-atom—persona-capability-dissociation
- 05-atom—asymmetric-persona-effects
- 05-atom—role-constraint-refusal
- 05-molecule—persona-utility-principle
Connection to Other Sources
- Reinforces 07-molecule—elicitation-design-principle from Maier 2025: prompt structure matters, but not in ways commonly assumed
- Complements Kong et al. 2024 (showed some persona improvements) and Zheng et al. 2024 (showed no reliable improvements)
- Part of broader “Prompting Science” series including reports on chain-of-thought effectiveness and prompt engineering contingency