Demographic Persona Conditioning Is Necessary for Signal
LLMs prompted without demographic personas produce responses that are distributionally closer to human survey data but carry dramatically less predictive signal.
Without persona details, correlation attainment drops from ~90% to ~50%. The distributions look right (high Kolmogorov-Smirnov, or KS, similarity), but the rankings are wrong: models rate everything more positively and fail to differentiate between products.
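The gap between the two metrics can be made concrete with a small sketch. It assumes KS similarity means 1 minus the largest gap between the pooled empirical CDFs, and it uses a plain Pearson correlation of per-product mean ratings as a stand-in for ranking signal (this note does not say how correlation attainment is normalized, so no normalization is applied). The ratings are invented toy values, deliberately extreme, and all function names are hypothetical.

```python
import math

LIKERT = (1, 2, 3, 4, 5)

def ks_similarity(a, b, scale=LIKERT):
    """Return 1 minus the largest gap between the two empirical CDFs."""
    def cdf(sample):
        return [sum(x <= k for x in sample) / len(sample) for k in scale]
    return 1.0 - max(abs(p - q) for p, q in zip(cdf(a), cdf(b)))

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean(xs):
    return sum(xs) / len(xs)

# Toy ratings, invented for illustration only (not results from any study).
# The synthetic panel uses the same ratings overall but assigns them to the
# wrong products.
human = {"A": [1, 2, 2, 3], "B": [3, 3, 4, 4], "C": [4, 5, 5, 5]}
synthetic = {"A": [4, 5, 5, 5], "B": [3, 3, 4, 4], "C": [1, 2, 2, 3]}

products = sorted(human)
pooled_human = [r for p in products for r in human[p]]
pooled_synthetic = [r for p in products for r in synthetic[p]]

# Pooled distributions are identical, so KS similarity is 1.0 ...
similarity = ks_similarity(pooled_human, pooled_synthetic)
# ... yet the product ordering is reversed, so the correlation is negative.
ranking_r = pearson([mean(human[p]) for p in products],
                    [mean(synthetic[p]) for p in products])

print(f"KS similarity: {similarity:.2f}, correlation of product means: {ranking_r:.2f}")
```

In this toy case a perfect distributional match coexists with a reversed product ranking, which is why distributional similarity alone says little about whether the responses are useful for concept ranking.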
The persona information gives the model something to condition on. Without it, responses converge toward a generic positive sentiment that matches aggregate human positivity bias but loses the discriminative power needed for concept ranking.
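As a rough illustration of what conditioning on a persona might look like in practice, the sketch below builds the same survey prompt with and without a demographic profile. The prompt wording, the purchase-likelihood question, the persona fields, and the function name `build_prompt` are all assumptions made for this example, not the setup behind the figures above.

```python
def build_prompt(concept, persona=None):
    """Build a survey prompt; persona is an optional dict of demographic attributes."""
    lines = []
    if persona:
        # Hypothetical persona preamble: one line listing each attribute.
        traits = ", ".join(f"{key}: {value}" for key, value in persona.items())
        lines.append(f"You are a survey respondent with this profile: {traits}.")
    lines.append(f"Product concept: {concept}")
    lines.append("How likely are you to purchase this product? Answer on a scale of 1 to 5.")
    return "\n".join(lines)

# Illustrative concept and persona values, invented for this sketch.
concept = "A refillable deodorant stick"
unconditioned = build_prompt(concept)
conditioned = build_prompt(concept, {"age": "35-44", "household income": "$50k-$75k"})

print(conditioned)
```

The only difference between the two prompts is the leading profile line, so any change in how well the responses rank products can be attributed to the persona details.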
This has implications beyond survey simulation: when LLMs need to differentiate between options rather than just generate plausible-sounding output, they need context to anchor their reasoning.
Age and income conditioning replicate human patterns well. Gender, region, and ethnicity show inconsistent effects, which suggests that the training data covers these demographic-behavior relationships unevenly.
Related: 07-molecule—elicitation-design-principle