Corpus Poisoning Vulnerability

Poisoning just 0.04% of a RAG corpus can achieve a 98.2% attack success rate and 74.6% system failure rate.

This finding from BadRAG research demonstrates that adversarially manipulated passages can serve as semantic backdoors, triggering specific behaviors in LLM outputs even when the base model remains unmodified. These attacks rely on stealthy corpus manipulations that are difficult to detect.

TrojanRAG extends this by showing that backdoors can be embedded in retrieval embeddings themselves, bypassing traditional content sanitization entirely.

The practical implication: any RAG system with an open or modifiable corpus is potentially vulnerable. Defenses like cryptographic document signing or adversarial filtering offer only partial protection. Security must be built into the architecture, not bolted on afterward.

>heyMHK

Corpus Poisoning Vulnerability

Corpus Poisoning Vulnerability

Properties

Graph view

Backlinks