The Thought-Action-Observation Pattern

Context

LLM agents need to both reason about tasks and take actions in environments. Pure reasoning hallucinates. Pure action-taking lacks strategic coordination.

Problem

How do you structure agent behavior so that reasoning informs action and action informs reasoning, without the overhead of explicit reasoning at every step?

Solution

Interleave three types of content in the agent’s trajectory:

Thought: Internal reasoning that doesn’t affect the environment. Used for decomposing goals, tracking progress, handling exceptions, synthesizing observations, and determining next steps. Thoughts update the agent’s working context without triggering environment responses.

Action: Task-specific operations that affect the external environment. Search queries, navigation commands, object manipulations. Actions produce observable consequences.

Observation: Environment feedback from actions. Search results, state descriptions, error messages. Observations provide grounding that prevents reasoning from drifting into hallucination.

The key insight: thoughts are “free” (they don’t cost environment interactions), but they update the context that conditions future action generation.
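
A minimal sketch of the interleaved loop, assuming a caller-supplied generate function (the LLM call) and step function (the environment); both names are illustrative stand-ins, not any particular framework’s API. It shows how a thought is appended to the working context without an environment call, while an action triggers one and its observation is fed back.

```python
# Minimal thought-action-observation loop (illustrative sketch; `generate`
# and `step` are hypothetical stand-ins for an LLM call and an environment).

from typing import Callable


def run_agent(
    task: str,
    generate: Callable[[str], str],  # LLM: prompt -> next line of the trajectory
    step: Callable[[str], str],      # environment: action -> observation
    max_steps: int = 10,
) -> list[str]:
    # The trajectory is the agent's working context: a flat list of
    # "Thought:", "Action:", and "Observation:" lines.
    trajectory = [f"Task: {task}"]
    for _ in range(max_steps):
        # Thought: internal reasoning only; no environment call is made,
        # but the thought becomes part of the context for what follows.
        thought = generate("\n".join(trajectory) + "\nThought:")
        trajectory.append(f"Thought: {thought}")

        # Action: generated conditioned on the full context, thought included.
        action = generate("\n".join(trajectory) + "\nAction:")
        trajectory.append(f"Action: {action}")
        if action.strip().lower().startswith("finish"):
            break

        # Observation: environment feedback that grounds the next thought.
        observation = step(action)
        trajectory.append(f"Observation: {observation}")
    return trajectory
```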

When to Use

  • Dense pattern (thought-action-observation every cycle): Knowledge tasks requiring verification, multi-hop reasoning, fact synthesis
  • Sparse pattern (thoughts only at decision points): Sequential decision tasks with many routine actions, where strategic reasoning matters at inflection points
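
One way to tune reasoning density is to gate the thought step with a scheduling policy. A sketch under the same assumptions as the loop above; dense_policy and sparse_policy are hypothetical names, and the inflection-point heuristics are examples only.

```python
# Thought-scheduling policies for the loop above (hypothetical names).

def dense_policy(trajectory: list[str]) -> bool:
    """Dense pattern: emit a thought before every action."""
    return True


def sparse_policy(trajectory: list[str]) -> bool:
    """Sparse pattern: emit a thought only at inflection points,
    e.g. the start of the task or right after a surprising observation."""
    if len(trajectory) <= 1:  # task start
        return True
    last = trajectory[-1]
    surprising = "error" in last.lower() or "nothing happens" in last.lower()
    return last.startswith("Observation:") and surprising


# In run_agent, guard the thought step with the chosen policy:
#     if should_think(trajectory):
#         trajectory.append("Thought: " + generate(...))
```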

Consequences

Positive:

  • Interpretable trajectories: humans can inspect the reasoning and distinguish internal knowledge from external evidence
  • Reduced hallucination through external grounding
  • Flexible reasoning density based on task structure
  • Human-editable behavior through thought modification
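
Because thoughts are plain text in the trajectory, a human can steer the agent by editing or injecting one before the next action is generated. A small sketch in the same trajectory format; the task and contents are invented for illustration.

```python
# Human-in-the-loop steering by thought injection (illustrative only).

def inject_thought(trajectory: list[str], correction: str) -> list[str]:
    """Append a human-written thought; it conditions the next generated
    action but never touches the environment."""
    return trajectory + [f"Thought: {correction}"]


trajectory = [
    "Task: find the year the suspension bridge opened",
    "Thought: I should search for the bridge.",
    "Action: search[the bridge]",
    "Observation: No results found.",
]

# Redirect the agent before its next action is generated.
trajectory = inject_thought(
    trajectory,
    "The query was too vague; search using the bridge's full name and city.",
)
```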

Negative:

  • Structural constraints can cause repetitive reasoning loops (reported as 47% of failures)
  • Requires well-designed action spaces
  • Prompt engineering for thought placement is task-specific

Related: 07-molecule—act-to-reason-reason-to-act, 05-atom—reasoning-grounding-tradeoff, 07-molecule—ui-as-ultimate-guardrail