OAuth Isn't Enough for AI Agents — We Scanned 7 Frameworks to Prove It

March 12, 2026 · Jason Shotwell · 6 min read

OAuth was built for humans clicking login buttons. AI agents operate under a completely different trust model.

We ran AIR Blackbox v1.2.0, an open-source EU AI Act compliance scanner, against seven major AI agent frameworks: OpenAI Agents SDK, LangChain, CrewAI, Phidata, GPT Researcher, LlamaIndex, and Mem0. We checked five specific patterns that determine whether an agent system tracks who authorized what and what the agent actually did with that authorization.

The results were worse than expected.

The Problem Nobody Is Talking About

A user connects their Google account to an AI assistant. Two weeks later, the agent has sent 47 emails, created 12 calendar events, shared 3 documents, and booked a flight. OAuth says all of this is fine. The token is valid. The permissions were granted.

But the system cannot answer basic questions. Which agent performed these actions? Did the actions match the original intent? Would the user still approve them right now? And if something went wrong, can anyone reconstruct the decision chain?

OAuth solved identity for humans logging into apps. Agents introduce delegation, and delegation is where things break down.

What We Checked

We built five new checks into AIR Blackbox v1.2.0 targeting the OAuth delegation gap. Each maps to EU AI Act Article 14 (Human Oversight) or Article 12 (Record-Keeping):

Agent-to-user identity binding. Does the code track which user authorized each agent action? If an agent sends an email, is there a user_id attached to that action?
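As a rough illustration of what identity binding can look like in application code, the sketch below attaches the authorizing user to every action record and refuses to create one without it. The `AgentAction` class and its field names are hypothetical, chosen for this example; they are not part of AIR Blackbox or any of the scanned frameworks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass(frozen=True)
class AgentAction:
    """One side effect performed by an agent, bound to the authorizing user."""

    user_id: str   # human who delegated authority (hypothetical field name)
    agent_id: str  # agent instance that performed the action
    action: str    # e.g. "email.send"
    action_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Refuse to create an action that cannot be traced back to a user.
        if not self.user_id:
            raise ValueError("agent action has no authorizing user_id")
        if not self.agent_id:
            raise ValueError("agent action has no agent_id")


action = AgentAction(user_id="user-123", agent_id="assistant-7", action="email.send")
```

Making the binding a constructor-level requirement means an unattributed action fails loudly at the point it is created rather than surfacing later as a gap in the logs.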

Token scope and permission validation. Before the agent acts, does it verify that the action falls within granted permissions?
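A minimal sketch of such a pre-action check, assuming the application keeps its own mapping from agent actions to the OAuth scope each one needs. The `REQUIRED_SCOPE` table, the action names, and `check_scope` are illustrative assumptions, not an API from any of the scanned frameworks.

```python
# Hypothetical mapping from agent actions to the OAuth scope each one requires.
REQUIRED_SCOPE = {
    "email.send": "https://www.googleapis.com/auth/gmail.send",
    "calendar.create_event": "https://www.googleapis.com/auth/calendar.events",
}


class ScopeError(PermissionError):
    """Raised when an action is not covered by the granted scopes."""


def check_scope(action: str, granted_scopes: set[str]) -> None:
    """Raise ScopeError unless the action falls within the granted scopes."""
    required = REQUIRED_SCOPE.get(action)
    if required is None:
        # Unknown actions are denied rather than silently allowed.
        raise ScopeError(f"no scope mapping for action {action!r}")
    if required not in granted_scopes:
        raise ScopeError(f"action {action!r} requires scope {required!r}")
```

Calling `check_scope("email.send", granted)` immediately before the side effect keeps the decision next to the action, instead of relying on the downstream API to reject the call.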

Token expiry and revocation handling. Can you kill a rogue agent instantly? If an agent starts behaving unexpectedly at 3am, is there a mechanism to revoke its credentials in real time?
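One way to picture the mechanism is a credential store that the agent must consult before every action, so that revoking a token takes effect on the next call. The sketch below is an in-memory stand-in; `CredentialStore` and its methods are hypothetical, and a real deployment would need shared, durable storage.

```python
import time
from typing import Optional


class CredentialStore:
    """In-memory stand-in for a shared expiry and revocation list (hypothetical)."""

    def __init__(self):
        self._expires_at = {}  # token_id -> Unix time at which the token expires
        self._revoked = set()  # token_ids killed before their expiry

    def issue(self, token_id: str, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._expires_at[token_id] = now + ttl_seconds

    def revoke(self, token_id: str) -> None:
        # Takes effect on the very next is_usable() call.
        self._revoked.add(token_id)

    def is_usable(self, token_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if token_id in self._revoked:
            return False
        expires_at = self._expires_at.get(token_id)
        # Unknown tokens are treated as unusable.
        return expires_at is not None and now < expires_at
```

The `now` parameter exists only to make the behavior testable; in normal use the store reads the system clock.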

Agent action audit trail. Most frameworks log LLM calls. Almost none log the actions the agent takes as a result — the emails sent, the APIs called, the data modified.
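To make the distinction concrete, an action-level log records the side effect itself, not the prompt or completion that led to it. The sketch below writes one JSON line per action; the `log_action` helper and its field names are assumptions made for this example.

```python
import io
import json
from datetime import datetime, timezone


def log_action(sink, *, user_id, agent_id, action, target, outcome):
    """Append one JSON line describing a side effect the agent performed."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "agent_id": agent_id,
        "action": action,
        "target": target,
        "outcome": outcome,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Any writable text stream works here; a real system would use durable,
# append-only storage instead of an in-memory buffer.
audit_log = io.StringIO()
log_action(
    audit_log,
    user_id="user-123",
    agent_id="assistant-7",
    action="email.send",
    target="alice@example.com",
    outcome="sent",
)
```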

Agent action boundaries. Is there a defined set of tools and actions the agent is allowed to use?
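A simple form of boundary is an explicit allowlist that sits between the agent and its tools. The sketch below assumes tools are plain Python callables registered by name; `BoundedToolbox` and `ToolNotAllowed` are hypothetical names, not classes from any scanned framework.

```python
class ToolNotAllowed(PermissionError):
    """Raised when the agent tries to use a tool outside its boundary."""


class BoundedToolbox:
    """Dispatches tool calls, but only for names on an explicit allowlist."""

    def __init__(self, tools, allowed):
        self._tools = dict(tools)
        self._allowed = frozenset(allowed)
        unknown = self._allowed - self._tools.keys()
        if unknown:
            raise ValueError(f"allowlist names unknown tools: {sorted(unknown)}")

    def call(self, name, *args, **kwargs):
        if name not in self._allowed:
            raise ToolNotAllowed(f"tool {name!r} is outside the agent's boundary")
        return self._tools[name](*args, **kwargs)


box = BoundedToolbox(
    tools={
        "search": lambda query: f"results for {query}",
        "book_flight": lambda destination: f"booked {destination}",
    },
    allowed={"search"},
)
```

Here `box.call("search", "oauth")` goes through, while `box.call("book_flight", "LIS")` raises `ToolNotAllowed` even though the tool is installed.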

The Results

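| Framework | Pass | Warn | Fail | Total |
|---|---|---|---|---|
| Haystack (deepset) | 24 | 10 | 5 | 39 |
| OpenAI Agents SDK | 23 | 12 | 4 | 39 |
| Semantic Kernel (Microsoft) | 15 | 4 | 0 | 19 |
| GPT Researcher | 15 | 3 | 0 | 18 |
| Mem0 | 13 | 6 | 0 | 19 |
| DSPy (Stanford) | 12 | 6 | 1 | 19 |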

OAuth Delegation: The Detailed Breakdown

| Check | Haystack | OpenAI SDK | GPT Researcher | Semantic Kernel |
|---|---|---|---|---|
| Identity binding | ✅ 3 files | ✅ 7 files | ❌ Missing | ✅ 3 files |
| Scope validation | ✅ 8 files | ✅ 32 files | ✅ 4 files | ✅ 44 files |
| Token expiry | ✅ 12 files | ✅ 18 files | ✅ 4 files | ✅ 32 files |
| Action audit trail | ✅ 2 files | ❌ Missing | ✅ 4 files | ✅ 6 files |
| Action boundaries | ✅ 1 file | ✅ 7 files | ❌ Missing | ✅ 2 files |

The pattern is consistent. LLM call logging exists. Action-level logging does not. Token scoping exists in some. User identity binding is rare. Action boundaries are almost nonexistent.

OAuth says the agent is allowed. Nobody tracks what it does. That is how you get 1,000 emails sent overnight — technically authorized, zero accountability.

Why This Matters Now

The EU AI Act high-risk system rules take effect August 2, 2026. Article 14 requires human oversight mechanisms. Article 12 requires automatic logging of events during operation. If your agent acts on behalf of a user and you cannot reconstruct who authorized it, what it did, and why — you have a compliance gap.
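Reconstructing who authorized an action, what was done, and why is only possible if each record points back to the step that caused it. The sketch below assumes audit records are dictionaries with `action_id`, `parent_id`, `user_id`, and `reason` keys; that schema is hypothetical and not something the EU AI Act or AIR Blackbox prescribes.

```python
def reconstruct_chain(records, action_id):
    """Walk parent links from an action back to the original user request."""
    by_id = {r["action_id"]: r for r in records}
    chain, seen = [], set()
    current = action_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in audit records at {current!r}")
        if current not in by_id:
            raise KeyError(f"missing audit record {current!r}")
        seen.add(current)
        chain.append(by_id[current])
        current = by_id[current].get("parent_id")
    # Oldest step first: user request, then each action it led to.
    return list(reversed(chain))


records = [
    {"action_id": "a1", "parent_id": None, "user_id": "user-123",
     "reason": "user asked to plan a trip"},
    {"action_id": "a2", "parent_id": "a1", "user_id": "user-123",
     "reason": "searched flights for the trip"},
    {"action_id": "a3", "parent_id": "a2", "user_id": "user-123",
     "reason": "booked the cheapest flight"},
]
chain = reconstruct_chain(records, "a3")
```

A missing record raises `KeyError` instead of returning a partial chain, because a chain with a hole in it is exactly the compliance gap described above.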

Scan Your Own Code

pip install air-blackbox
air-blackbox comply --scan . -v

18 code-level checks. 5 OAuth delegation checks. Runs entirely on your machine. Apache 2.0.

The EU AI Act deadline is 5 months away.
