OpenAI and Paradigm Team Up to Stress-Test AI Against Smart Contract Exploits

🤝 A Landmark Partnership at the Crossroads of AI and Crypto Security

OpenAI and crypto venture firm Paradigm have joined forces to tackle one of DeFi's most persistent problems: smart contract vulnerabilities. The two organizations have released EVMbench, an open benchmark framework designed to measure how capable AI agents are at finding, fixing, and even exploiting high-severity weaknesses in Ethereum Virtual Machine (EVM) smart contracts. For developers, investors, and anyone with funds locked in DeFi protocols, this is a significant development. It is the first time a frontier AI lab has partnered with a leading crypto venture firm to systematically measure AI's ability to audit the code that secures billions in on-chain assets. The collaboration signals a broader shift in how the industry thinks about security, moving from periodic human audits to continuous, AI-assisted evaluation that can scale with the rapid pace of protocol deployment.

🧪 Three Ways EVMbench Puts AI Agents to the Test

EVMbench evaluates AI agents across three distinct capability modes, each reflecting a real-world security scenario. In detect mode, agents audit smart contract code and are scored on how many vulnerabilities they successfully identify, rewarding recall rather than precision. In patch mode, agents are asked to modify vulnerable contracts to eliminate flaws while preserving the protocol's intended functionality, a task that demands deep understanding of contract logic. In exploit mode, agents attempt to execute fund-draining attacks against contracts deployed in a fully sandboxed blockchain environment, simulating what a real attacker might do. The benchmark draws on 120 curated vulnerabilities sourced from 40 audits, most pulled from open audit competitions on platforms like Code4rena. It also includes scenarios from the security review of Tempo, Stripe's purpose-built layer-1 blockchain for stablecoin payments, adding real-world contract complexity to the dataset. Each task runs in an isolated container, ensuring agents operate in realistic conditions without leaking information between challenges.
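To make the "recall rather than precision" point concrete, here is a minimal sketch of how a recall-style detect score could be computed. It is not EVMbench's actual grader: the function name `detect_recall` and the vulnerability labels are hypothetical, chosen only for illustration.

```python
def detect_recall(known: set[str], reported: set[str]) -> float:
    """Return the fraction of known vulnerabilities that the agent reported.

    Hypothetical helper for illustration; not part of EVMbench.
    """
    if not known:
        raise ValueError("need at least one known vulnerability to score against")
    return len(known & reported) / len(known)


# Hypothetical ground truth for one audited codebase.
known = {
    "reentrancy-withdraw",
    "oracle-stale-price",
    "unchecked-transfer",
    "access-control-mint",
}

# An agent that stops after its first finding.
one_finding = {"reentrancy-withdraw"}

# An agent that finds everything but also raises two false alarms.
noisy = known | {"false-alarm-1", "false-alarm-2"}

print(detect_recall(known, one_finding))  # 0.25
print(detect_recall(known, noisy))        # 1.0
```

Under a recall-only metric like this one, false alarms cost nothing, while an agent that stops early is penalized for every vulnerability it never reaches.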

📈 The Numbers That Should Get Everyone's Attention

The benchmark results reveal both how far AI has come and how fast it is improving. When the EVMbench project launched, top AI models could exploit fewer than 20% of the critical, fund-draining vulnerabilities drawn from Code4rena competitions. Today, GPT-5.3-Codex via Codex CLI scores 72.2% in exploit mode, a sharp jump from GPT-5's 31.9%. That rate of improvement, from under 20% to over 70% in a relatively short period, is the kind of trajectory that reshapes industries. However, detect and patch modes remain meaningfully weaker. OpenAI noted that agents sometimes stop after finding a single bug in detect mode, and patching is difficult because it requires eliminating subtle vulnerabilities without inadvertently breaking a contract's functionality. These gaps illustrate that while AI is becoming a formidable offensive tool in security research, the defensive capabilities still need significant work before they can be trusted for autonomous deployment without human oversight.
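As a quick check on the size of that jump, the snippet below works out the gain between the two exploit-mode scores quoted above, both in percentage points and as a ratio. The variable names are illustrative; the only inputs are the two figures reported in this article.

```python
# Exploit-mode scores as quoted in the article (percent).
gpt5_score = 31.9
gpt53_codex_score = 72.2

# Absolute gain in percentage points, and the ratio between the two scores.
point_gain = round(gpt53_codex_score - gpt5_score, 1)
score_ratio = round(gpt53_codex_score / gpt5_score, 2)

print(f"Gain: {point_gain} percentage points")  # Gain: 40.3 percentage points
print(f"Ratio: {score_ratio}x")                 # Ratio: 2.26x
```

In other words, the newer model's exploit score is a little more than double the older one's.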

💸 Why Smart Contract Security Has Never Mattered More

The timing of EVMbench is not coincidental. Smart contract exploits have cost the crypto industry staggering sums over the past two years. Cryptocurrency theft reached $3.4 billion in 2025, with $3.1 billion lost in DeFi in the first half of the year alone. Individual incidents have ranged from the $1.5 billion Bybit exchange hack in February 2025 to a $128 million Balancer exploit in November 2025. Even protocols that had undergone formal audits were not immune, as attackers increasingly target the edges of a system, where oracles meet margin calculations or where legacy infrastructure interacts with newer components. For retail investors and protocol users, these losses are more than statistics. They represent real funds wiped out, often with little recourse. A tool that can systematically identify weaknesses before deployment, or even monitor live contracts for unusual activity, would fundamentally change the risk calculus for participating in DeFi.

🛡️ OpenAI Puts Real Resources Behind Defensive AI Security

OpenAI is not just publishing a benchmark and walking away. Alongside EVMbench, the company has committed $10 million in API credits to accelerate cyber defense research, specifically targeting open source software and critical infrastructure. The company is also expanding the private beta of Aardvark, its security research agent, and partnering with open-source maintainers to offer free codebase scanning. These commitments matter because they address one of the most common criticisms of dual-use security research: that publishing exploit-capable tools benefits attackers as much as defenders. By funding defensive applications directly and restricting access to its most capable models for offensive testing, OpenAI is attempting to set a precedent for responsible AI security research in Web3. Paradigm's involvement adds credibility on the crypto side, given the firm's deep relationships with protocol teams and its history of funding foundational infrastructure across the DeFi ecosystem.

🎯 What Developers and Investors Should Watch Next

EVMbench is best understood as a starting point, not a finished solution. It establishes a public, reproducible baseline for AI security capabilities, which means the field can now measure progress in a standardized way. For protocol developers, the immediate implication is that AI-assisted auditing is becoming a realistic component of pre-deployment security checklists, not a distant future possibility. Firms like OpenZeppelin have already reported that AI tools cut auditing time by 50%, and multi-agent approaches where different AI systems collaborate on complex security tasks are emerging as a promising direction. For investors, the benchmark results suggest that the window for human auditors to be the sole line of defense is narrowing. Security-focused infrastructure projects and audit platforms that integrate AI effectively are worth monitoring. Protocols that treat security as a continuous practice rather than a one-time checkbox, and that adopt EVMbench-style evaluations as part of their ongoing operations, are likely to be better positioned to retain user trust and capital as the DeFi market matures.

Sources

https://www.theblock.co/post/390408/openai-and-paradigm-partner-on-ai-agent-tool-for-smart-contract-security
https://www.paradigm.xyz/2026/02/evmbench
https://decrypt.co/358470/ai-agents-boost-ethereum-security-openai-paradigm-evmbench
https://www.cryptoimpacthub.com/defi-exploits-2025-a-record-breaking-year-of-sophisticated-attacks-and-hard-won-lessons/
https://thecurrencyanalytics.com/altcoins/ai-slashes-smart-contract-audit-times-by-half-as-blockchain-security-evolves-243253
