An open-source MCP-native AI agent that chains exploits, not just lists them.
The Gap Nobody Talks About
Every developer in 2025 has an AI pair programmer. Claude Code writes your functions, Copilot catches your typos, Cursor helps you navigate a codebase you inherited at 9am on a Monday. The tooling for writing software has been completely reinvented.
Security hasn't.
Sure, there are LLM wrappers that will tell you to "check for SQL injection" or generate a generic OWASP checklist. But that's not penetration testing; that's a textbook with a chat interface. Real pentesting is about chaining: finding the leaked API key in a JavaScript bundle, using it to trigger an SSRF, pivoting to cloud metadata, and landing account takeover. It's adversarial reasoning, not search.
numasec is the first open-source project I've seen that's actually built for that adversarial loop, not bolted onto it.
What numasec Actually Does
The pitch is blunt: "Like Claude Code, but for pentesting." That framing is either incredibly confident or a recipe for disappointment. After digging through the repository, I'd say it earns more of that confidence than you'd expect from a 33-star project.
Here's the concrete setup: you clone the repo, install the Python tooling via pip install numasec, build the TypeScript agent layer with Bun, and launch an interactive TUI. You pick your LLM (DeepSeek, Claude, GPT, Ollama, or any OpenAI-compatible endpoint), type pentest https://yourapp.com, and the agent takes over.
Under the hood, numasec ships with 33 security tools and 34 attack templates, coordinated by a deterministic planner based on the CHECKMATE paper from late 2024. This is the architectural detail that separates numasec from "I asked GPT-4 to hack this site." The CHECKMATE approach pins the methodology down deterministically: the planner fixes the attack sequence, and the AI handles analysis and adaptation. That's a meaningful distinction. It means the agent isn't hallucinating a pentest methodology on the fly; it's executing a structured plan with LLM-powered reasoning filling the gaps.
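To make that split concrete, here is a minimal sketch of the idea in Python. It is not numasec's code: the phase names, the State class, and the analyze callback are all hypothetical stand-ins I made up for illustration. The point is only that the order of phases lives in plain data, while the model-shaped function gets to comment on each phase without being able to reorder anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical phase names; the article does not show numasec's real plan.
PLAN: List[str] = ["recon", "injection", "auth", "report"]


@dataclass
class State:
    target: str
    notes: Dict[str, str] = field(default_factory=dict)


def run_plan(target: str, analyze: Callable[[str, State], str]) -> State:
    """Walk the fixed plan in order; `analyze` stands in for the LLM call."""
    state = State(target=target)
    for phase in PLAN:  # the sequence comes from PLAN, never from the model
        state.notes[phase] = analyze(phase, state)
    return state


def stub_analyze(phase: str, state: State) -> str:
    # Placeholder for model reasoning: it can read earlier notes,
    # but it has no way to skip or reorder phases.
    return f"{phase}: reviewed {len(state.notes)} earlier phase(s) for {state.target}"


result = run_plan("https://yourapp.com", stub_analyze)
for note in result.notes.values():
    print(note)
```

Swapping stub_analyze for a real model call changes what gets written into the notes, not which phases run or in what order. That is the property the deterministic planner is buying.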
The tool coverage is genuinely broad. On the injection side: SQL (blind, time-based, union, error-based), NoSQL, OS command injection, SSTI, XXE, GraphQL introspection, and CRLF. On authentication: JWT attacks including alg:none, weak HS256, and kid path traversal; OAuth misconfiguration; credential spraying; IDOR; CSRF; privilege escalation. On the client and server side: XSS in all three flavors (reflected, stored, DOM-based), SSRF with cloud metadata detection, CORS misconfigs, path traversal, HTTP request smuggling, race conditions, file upload bypass.
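If the alg:none item is unfamiliar, here is roughly what that check boils down to, sketched with nothing but the Python standard library. This is my own illustration, not numasec's implementation, and the helper names are invented. A JWT's header is base64url-encoded JSON, so a scanner can read the declared algorithm without any key; a server that accepts a token declaring "none" is skipping signature verification entirely.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_header(token: str) -> dict:
    # The header is the first dot-separated segment of the token.
    return json.loads(b64url_decode(token.split(".")[0]))


def declares_alg_none(token: str) -> bool:
    alg = jwt_header(token).get("alg", "")
    # Case-insensitive match, since variants like "None" are sometimes accepted too.
    return isinstance(alg, str) and alg.lower() == "none"


def make_unsigned_token(claims: dict) -> str:
    # Test fixture: an unsecured JWT has an empty signature segment.
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}."


token = make_unsigned_token({"sub": "demo-user"})
print(declares_alg_none(token))
```

The header check is only half of a real test. The finding exists when the target application actually accepts such a token, which is something you confirm only against systems you are authorized to test.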
Every finding gets a CWE ID, CVSS 3.1 score, OWASP Top 10 category, and a MITRE ATT&CK technique. That's not fluff; that's the difference between a finding that gets filed and a finding that gets fixed.
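Here is one way a finding with those four labels could be modeled, as a sketch. The Finding class and its field names are my assumptions, not numasec's schema, and the example values are illustrative rather than taken from a real scan. The severity bands are the standard CVSS 3.1 qualitative ratings.

```python
from dataclasses import dataclass


def cvss31_severity(score: float) -> str:
    """Map a CVSS 3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS 3.1 base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass(frozen=True)
class Finding:
    # Hypothetical field names, chosen for this illustration only.
    title: str
    cwe_id: str
    cvss_score: float
    owasp_category: str
    attack_technique: str

    @property
    def severity(self) -> str:
        return cvss31_severity(self.cvss_score)


# Illustrative values, not output from an actual numasec run.
finding = Finding(
    title="SQL injection in login form",
    cwe_id="CWE-89",
    cvss_score=9.8,
    owasp_category="A03:2021 Injection",
    attack_technique="T1190",
)
print(finding.severity)
```

With the labels attached as structured fields, a triage queue can sort by severity and route by CWE instead of relying on someone reading free-text descriptions.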
The MCP Architecture Is the Real Story
Here's what I think most people will miss in a first pass: numasec isn't just an AI that runs security tools. It's MCP-native.
Model Context Protocol is the same extensibility layer that Claude Code and Cursor use. numasec ships its 33 built-in tools over MCP and lets you connect any external MCP server. This means if you've built custom tooling for your internal attack surface (say, a proprietary scanner for your API gateway), you can wire it in without forking the project. Same protocol, same interface.
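For a sense of what "same protocol, same interface" means on the wire: MCP messages are JSON-RPC 2.0, and tools are discovered and invoked with the tools/list and tools/call methods. The sketch below only builds those request messages. The tool name gateway_scan and its base_url argument are hypothetical, and I'm leaving out the transport and the initialization handshake, as well as how numasec itself registers external servers, since the article's source material doesn't show that.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need an id unique within the session


def jsonrpc_request(method: str, params: dict) -> dict:
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def list_tools_request() -> dict:
    # Asks an MCP server which tools it exposes.
    return jsonrpc_request("tools/list", {})


def call_tool_request(name: str, arguments: dict) -> dict:
    # Invokes one tool by name with JSON arguments.
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# "gateway_scan" is a made-up tool standing in for a proprietary scanner.
req = call_tool_request("gateway_scan", {"base_url": "https://yourapp.com"})
print(json.dumps(req, indent=2))
```

Because a built-in tool and an external one are invoked with the same message shape, the agent does not need to know which is which. That is what makes wiring in a custom scanner a configuration change rather than a fork.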
This is genuinely forward-thinking architecture. Most security automation tools are monolithic and extension-hostile. numasec is betting that MCP becomes the standard for agentic tool composition, and that bet looks increasingly reasonable in 2025.
The stack is a hybrid: Python for the security tooling layer, TypeScript/Bun for the agent runtime. You can install via pip install numasec, pull a Docker image (docker run -it francescosta/numasec), or build from source. The CI is live on GitHub Actions and the release tagging looks active: the latest push was April 2026, so this isn't an abandoned resear