Microsoft AI Red Team updates agentic AI failure taxonomy with seven new categories from 12 months of operational red teaming
Microsoft's revised taxonomy v2.0 identifies seven novel failure modes in agentic AI systems, grounded in 12 months of red team engagements against deployed agents and driven by rapid mainstream adoption of open-source frameworks, MCP ecosystem maturation, and computer-use agent production deployment.
Attack Brief
Target: Agentic AI systems, open-source agentic frameworks (OpenClaw), Model Context Protocol (MCP) ecosystem, computer-use agents
Vector: Supply chain compromise via plugin registries and MCP servers; goal hijacking through adversarial natural-language instructions; inter-agent trust escalation; visual attacks on computer-use agents; session context contamination; MCP/plugin protocol abuse
Technical Details
CVE IDs: CVE-2026-25253
Affected: OpenClaw (launched January 2026, 336,000+ GitHub stars, 2,100+ agents deployed within 48 hours); MCP-related software (99 CVEs published in 2025); 1,800+ exposed instances leaking API keys and credentials within the first week of the OpenClaw launch; 336 malicious plugins identified in the skills marketplace
Impact
Confirmed Damage: 512 vulnerabilities identified in the OpenClaw security audit, including one-click RCE via WebSocket hijacking; credential theft via malicious plugins masquerading as trading bots; tool poisoning moved from a theoretical to a live attack surface
Mitigation
Detection: Red team engagements identified patterns for detecting goal hijacking, inter-agent trust escalation, visual attacks on computer-use agents, and session context contamination; detection guidance for MCP/plugin protocol abuse
Context
Previous Campaigns: Microsoft AI Red Team published Taxonomy of Failure Modes in Agentic AI Systems v1.0 in April 2025; the v2.0 update incorporates 12 months of operational findings from red team engagements