Semantic Similarity Rating (SSR)
A method for converting unstructured text responses into structured Likert-scale distributions using embedding similarity.
How it works:
- Elicit a free-text response from an LLM (or human)
- Embed the response using a text-embedding model
- Embed a set of predefined reference anchor statements (one per scale point) with the same model
- Compute cosine similarity between the response and each anchor
- Normalize the similarities into a probability distribution over scale points, so each point's probability is proportional to its similarity score
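
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `cosine_similarities` and `ssr_distribution` are hypothetical, the embedding step is left to the caller (any text-embedding model that returns fixed-length vectors would do), and the toy 2-D vectors stand in for real embeddings. Clipping negative similarities to zero before normalizing is an assumption made here so that the result is a valid probability distribution; the note itself only says the distribution is proportional to similarity.

```python
import numpy as np


def cosine_similarities(response_vec, anchor_vecs):
    """Cosine similarity between one response vector and each anchor row.

    Assumes all vectors are non-zero.
    """
    response_vec = np.asarray(response_vec, dtype=float)
    anchor_vecs = np.asarray(anchor_vecs, dtype=float)
    dots = anchor_vecs @ response_vec
    norms = np.linalg.norm(anchor_vecs, axis=1) * np.linalg.norm(response_vec)
    return dots / norms


def ssr_distribution(response_vec, anchor_vecs):
    """Probability over scale points, proportional to cosine similarity."""
    sims = cosine_similarities(response_vec, anchor_vecs)
    # Assumption: negative similarities carry no weight.
    sims = np.clip(sims, 0.0, None)
    total = sims.sum()
    if total == 0.0:
        # No anchor is similar at all: fall back to a uniform distribution.
        return np.full(len(sims), 1.0 / len(sims))
    return sims / total


# Toy stand-ins for embeddings: five anchors (ratings 1..5) spread over a
# quarter circle, and a response that sits between anchors 4 and 5.
angles = np.deg2rad([0.0, 22.5, 45.0, 67.5, 90.0])
anchors = np.column_stack([np.cos(angles), np.sin(angles)])
theta = np.deg2rad(75.0)
response = np.array([np.cos(theta), np.sin(theta)])

pmf = ssr_distribution(response, anchors)
for rating, p in enumerate(pmf, start=1):
    print(f"rating {rating}: {p:.3f}")
```

In this toy setup the most probable rating is 4, with 5 close behind, which mirrors the kind of ambiguity the method is meant to preserve. With real embeddings the similarities tend to be much closer together, so the raw proportional distribution may be flatter than this example suggests.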
The key insight: a textual response rarely maps to exactly one rating. “I’d probably buy it, the price isn’t too bad” could be a 4 or a 5 depending on interpretation. SSR captures this ambiguity as a distribution rather than forcing a single value.
In effect, SSR uses the embedding space as a semantic bridge between natural language and structured measurement.
Related: 05-atom—direct-likert-regression-problem, 07-molecule—vectors-vs-graphs