LLMs Excel at Decomposition but Struggle with Computation

Large language models are remarkably good at breaking complex problems into sequential reasoning steps but often produce incorrect final answers due to arithmetic errors.

This pattern is well-documented: a model might correctly identify that solving a problem requires subtracting expenses from income, multiplying quantities, and then comparing results, yet still produce the wrong number when executing those calculations. The reasoning chain can be flawless while the computation fails.

The insight: treat decomposition and computation as separable tasks. The LLM handles the first; external tools (code interpreters, calculators) handle the second. This is the core principle behind Program-of-Thoughts (PoT) and Program-Aided Language Models (PAL).
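
A minimal sketch of the pattern in Python, assuming a hypothetical `call_llm` helper (stubbed here so the snippet runs end to end). The model is asked to express its decomposition as a program with named intermediate quantities; the interpreter, not the model, produces the final number:

```python
# PAL-style delegation sketch: the model writes the reasoning as Python,
# the interpreter executes it. `call_llm` is a hypothetical stand-in for
# any chat-completion API; its return value is hardcoded for illustration.

def call_llm(prompt: str) -> str:
    # Illustrative model output; a real API call would go here.
    return (
        "monthly_income = 4200\n"
        "monthly_expenses = 3150\n"
        "savings_per_month = monthly_income - monthly_expenses\n"
        "months = 8\n"
        "answer = savings_per_month * months\n"
    )

PROMPT_TEMPLATE = (
    "Solve the problem by writing Python. Store the final result in a "
    "variable named `answer`. Do not compute numbers in your head.\n\n"
    "Problem: {question}\n"
)

def solve(question: str) -> float:
    program = call_llm(PROMPT_TEMPLATE.format(question=question))
    namespace: dict = {}
    exec(program, namespace)  # delegate all arithmetic to the interpreter
    return namespace["answer"]

if __name__ == "__main__":
    q = ("Ana earns $4,200 a month and spends $3,150. "
         "How much does she save in 8 months?")
    print(solve(q))  # 8400, computed by Python rather than the model
```

One design note: `exec` on model-generated code is shown for brevity; since the model's output is untrusted, a real system would run the program in a sandboxed interpreter.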

Related: 05-molecule—tool-delegation-pattern, 05-molecule—chain-of-thought-prompting