Semantic Similarity Rating (SSR)

A method for converting unstructured text responses into structured Likert-scale distributions using embedding similarity.

How it works:

  1. Elicit a free-text response from an LLM (or human)
  2. Embed the response using a text-embedding model
  3. Embed a set of pre-defined reference anchor statements (one per scale point) with the same model
  4. Compute cosine similarity between the response and each anchor
  5. Normalize the similarity scores into a probability distribution over scale points, so each point's probability is proportional to its similarity
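
A minimal sketch of steps 2 to 5, assuming the embeddings are already available as NumPy vectors. The toy 3-dimensional vectors stand in for real embedding-model output, and the names `cosine_similarity` and `ssr_distribution` are hypothetical. Clipping negative similarities to zero before normalizing is an assumption made here so the result is a valid distribution; the note itself only says "proportional to similarity".

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def ssr_distribution(response_vec, anchor_vecs):
    """Map a response embedding to a distribution over scale points.

    Assumption: negative similarities are clipped to zero so that the
    normalized scores are non-negative and sum to 1.
    """
    sims = np.array([cosine_similarity(response_vec, a) for a in anchor_vecs])
    sims = np.clip(sims, 0.0, None)
    total = sims.sum()
    if total == 0.0:
        # No anchor is similar at all: fall back to a uniform distribution.
        return np.full(len(anchor_vecs), 1.0 / len(anchor_vecs))
    return sims / total


# Toy vectors standing in for anchor embeddings, one per scale point (1..5).
anchors = np.array([
    [1.0, 0.0, 0.0],  # scale point 1
    [0.8, 0.2, 0.0],  # scale point 2
    [0.5, 0.5, 0.0],  # scale point 3
    [0.2, 0.8, 0.0],  # scale point 4
    [0.0, 1.0, 0.0],  # scale point 5
])

# Toy vector standing in for the embedded free-text response.
response = np.array([0.1, 0.9, 0.0])

probs = ssr_distribution(response, anchors)
print(probs)
```

With these toy vectors the response sits between anchors 4 and 5, so most of the probability mass lands on those two points while the lower points still receive a smaller share.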

The key insight: a textual response rarely maps to exactly one rating. “I’d probably buy it, the price isn’t too bad” could be a 4 or a 5 depending on interpretation. SSR captures this ambiguity as a distribution rather than forcing a single value.
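
To make the difference concrete, the sketch below compares a forced single rating with a summary that uses the whole distribution. The probabilities are made-up illustrative numbers for a 5-point scale, not output from any real model.

```python
import numpy as np

scale = np.array([1, 2, 3, 4, 5])

# Hypothetical SSR output for an ambiguous response (illustrative values only).
probs = np.array([0.02, 0.05, 0.13, 0.42, 0.38])

# Forcing a single value keeps only the most likely point.
hard_rating = int(scale[np.argmax(probs)])

# The distribution also supports a probability-weighted mean rating.
expected_rating = float(np.dot(scale, probs))

print(hard_rating, round(expected_rating, 2))
```

The forced rating reports a 4 and discards the fact that a 5 is nearly as likely, whereas the distribution keeps both readings available for downstream analysis.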

In effect, SSR uses the embedding space as a semantic bridge between natural language and structured measurement.

Related: 05-atom—direct-likert-regression-problem, 07-molecule—vectors-vs-graphs