Detecting prompt injection in LLM API proxies
Prompt injection in an LLM API proxy means the intermediary may add hidden instructions, identity text, or policy text to your request before it reaches the model. Local detection works by combining token-delta evidence, extraction probes, identity checks, and instruction-conflict tests.
Signals to look for
Unexpected token delta
If a small request produces a much larger prompt-token count than expected, the relay may be adding hidden context.
Prompt extraction
If the model reveals hidden relay instructions under recall, translation, or continuation probes, the relay path is not transparent.
Identity conflict
If a Claude-compatible route leaks GPT, GLM, DeepSeek, Qwen, or other non-claimed identities, model substitution or prompt injection may be involved.
Run the local probe set
python audit.py --key <YOUR_KEY> --url <BASE_URL> --output prompt-injection-audit.md
Why one signal is not enough
Token counts can be noisy, extraction probes can be refused, and identity wording can produce false positives. API Relay Audit records multiple probe families so a reviewer can see whether the evidence is consistent, contradictory, or inconclusive.
Useful follow-up issue
For reproducible examples and detector gaps, track issue #31: relay prompt injection examples and detection gaps.