autonomous security assessment agent
sploit.ai
Point it at a target. It discovers, tests, and exploits
vulnerabilities autonomously — then learns from every run.
Sandboxed in Kali containers, governed by immutable scope policies.
72 registered tools
7 LLM providers
3,500+ tests
104 CTF benchmarks
$ sploitagent run \
--target http://localhost:3000 \
--objective "Find OWASP Top 10 vulns" \
--profile vuln_lab
✔ sandbox ready
✔ tradecraft — 3 proven payloads loaded
✔ recon: 47 endpoints, 3 technologies
→ dispatching sub-agents
✔ enumerator: surface mapped
✔ assessor: sqli, xss, ssrf, idor, ssti
✔ exploiter: 4 WAF bypasses
! CRIT — SQLi → cred dump → admin takeover
✔ progress: 66 findings, 20 vuln types
✔ dedup: 3 merged, graph: 70 nodes
→ report: findings + attack graph + evidence
Capabilities
Sandboxed execution, immutable scope, governance at every layer.
Orchestrator + sub-agents
An Opus orchestrator dispatches specialized sub-agents
(enumerator, assessor, exploiter, reporter) across 7 LLM
providers. Auto-hooks fire reactively on findings — no LLM
call needed for JWT decode, JS parsing, API discovery, or
CORS checks.
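As an illustration of a hook that needs no LLM call, here is a minimal sketch of a JWT decode step. The function name `jwt_decode_hook`, the regex, and the return shape are assumptions made for this example, not the agent's actual interface. It only decodes the header and payload; it does not verify signatures.

```python
import base64
import json
import re

# Assumed pattern: three base64url segments, the first starting with "eyJ"
# (the base64 encoding of '{"', which is how a JSON header begins).
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def _b64url_json(segment: str):
    """Decode one base64url segment to JSON, restoring stripped padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def jwt_decode_hook(evidence: str) -> list[dict]:
    """Return the decoded header and payload of every JWT-shaped string found.

    Hypothetical hook: purely deterministic, no model call, no signature check.
    """
    decoded = []
    for token in JWT_RE.findall(evidence):
        header_b64, payload_b64, _signature = token.split(".")
        try:
            decoded.append(
                {
                    "header": _b64url_json(header_b64),
                    "payload": _b64url_json(payload_b64),
                }
            )
        except ValueError:
            # Not valid base64 or JSON after all: skip it silently.
            continue
    return decoded
```

A hook like this could run whenever a finding's evidence contains a bearer token, attaching the decoded claims to the finding before any sub-agent looks at it.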
72 tools in Kali sandbox
Nmap, sqlmap, nuclei, ffuf, Burp Suite Pro, and 67 more —
all inside containers with OPSEC defaults. 30 payload
mutation transforms for WAF/filter evasion. Findings map to
ATT&CK and OWASP taxonomies automatically.
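The taxonomy mapping can be pictured as a lookup table. The sketch below is an assumption about its shape, not the tool's real table: the `Taxonomy` class, the `classify` function, and the short finding-type keys are hypothetical. The OWASP categories are from the 2021 Top 10; tagging every web flaw with ATT&CK T1190 (Exploit Public-Facing Application) is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taxonomy:
    owasp: str   # OWASP Top 10 (2021) category
    attack: str  # MITRE ATT&CK technique ID


# Illustrative table only. T1190 is used as a coarse catch-all here.
TAXONOMY = {
    "sqli": Taxonomy("A03:2021 Injection", "T1190"),
    "xss": Taxonomy("A03:2021 Injection", "T1190"),
    "ssti": Taxonomy("A03:2021 Injection", "T1190"),
    "ssrf": Taxonomy("A10:2021 Server-Side Request Forgery", "T1190"),
    "idor": Taxonomy("A01:2021 Broken Access Control", "T1190"),
}


def classify(finding_type: str) -> Optional[Taxonomy]:
    """Look up the taxonomy labels for a finding type; None if unmapped."""
    return TAXONOMY.get(finding_type.strip().lower())
```

Returning `None` for unknown types, rather than guessing, keeps unmapped findings visible so the table can be extended.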
Governance + coordination
Immutable scope, 5-stage command validation, credential
isolation. A git-backed progress repo tracks every tested
intent — blocking duplicates and surfacing untested attack
surface across parallel agents.
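One way to picture immutable scope is a frozen value object that every outbound request is checked against. This is a minimal sketch under assumptions: `ScopePolicy`, its fields, and `permits` are hypothetical names, and the 5-stage command validation is not modeled here.

```python
from dataclasses import dataclass, FrozenInstanceError
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class ScopePolicy:
    """Hypothetical scope object: frozen, so fields cannot be reassigned."""

    allowed_hosts: frozenset[str]
    allowed_ports: frozenset[int]

    def permits(self, url: str) -> bool:
        """True only if both the host and the port of the URL are in scope."""
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        return parts.hostname in self.allowed_hosts and port in self.allowed_ports


# Scope for the localhost:3000 target used in the demo run.
scope = ScopePolicy(frozenset({"localhost"}), frozenset({3000}))

try:
    scope.allowed_hosts = frozenset({"localhost", "example.com"})
except FrozenInstanceError:
    print("scope is frozen; expansion rejected")
```

Using `frozenset` fields alongside `frozen=True` matters: a frozen dataclass holding a plain `set` could still be widened in place.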
Learns across runs
Tradecraft playbooks promote proven payloads from NEW to
LEARNING to PROVEN. Deterministic scanners and JS bundle
analysis run before the LLM session. Each run makes the
next one faster and deeper.
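The NEW to LEARNING to PROVEN promotion can be sketched as a small state machine driven by a success count. The thresholds, the `PlaybookEntry` class, and the `payload_id` field are assumptions for illustration; the actual promotion criteria are not described on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    NEW = "NEW"
    LEARNING = "LEARNING"
    PROVEN = "PROVEN"


# Hypothetical thresholds: confirmed successes needed to reach each stage.
LEARNING_AFTER = 1
PROVEN_AFTER = 3


@dataclass
class PlaybookEntry:
    payload_id: str      # opaque identifier; the payload itself is stored elsewhere
    successes: int = 0   # confirmed successes across runs

    @property
    def stage(self) -> Stage:
        """Derive the stage from the success count instead of storing it."""
        if self.successes >= PROVEN_AFTER:
            return Stage.PROVEN
        if self.successes >= LEARNING_AFTER:
            return Stage.LEARNING
        return Stage.NEW

    def record_success(self) -> Stage:
        """Count one confirmed success and return the resulting stage."""
        self.successes += 1
        return self.stage
```

Deriving the stage from the count means an entry can never be marked PROVEN without the evidence to back it.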
Results
From autonomous runs against real targets — no human in the loop.
L1 CTF pass rate: 80% (20/25 challenges, up from 59%, +21pp)
L2 CTF pass rate: 67% (12/18 challenges, first L2 run)
Peak findings (one run): 87 (21 vuln types, 21 critical severity)
L3 CTF pass rate: 100% (2/2: deserialization + padding oracle)
Full benchmarks & methodology →
How it works
01. Define scope
Set target, objective, and scope profile. The governance engine freezes scope: it cannot expand mid-operation.
02. Recon
Fast scanners map the surface while deferred scanners (nuclei, Burp) run async. JS bundles are analyzed for hidden endpoints and secrets.
03. Test + exploit
Sub-agents test in parallel. A progress repo tracks every intent, blocks duplicates, and surfaces untested attack surface. Continuation loops keep testing until coverage gaps close.
04. Report + learn
Deduplicated findings with ATT&CK/OWASP mapping, attack graphs, and evidence. Proven payloads promote to tradecraft memory for future runs.
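The intent tracking in the test + exploit step can be sketched as a ledger of (endpoint, vulnerability type) pairs. This is an in-memory approximation under assumptions: the `ProgressLedger` class and its methods are hypothetical, and the git-backed storage that shares state across parallel agents is not modeled.

```python
from dataclasses import dataclass, field

Intent = tuple[str, str]  # (endpoint, vuln_type)


@dataclass
class ProgressLedger:
    """Hypothetical in-memory stand-in for the progress repo."""

    surface: set[str]      # endpoints discovered during recon
    vuln_types: set[str]   # vulnerability classes to test for
    tested: set[Intent] = field(default_factory=set)

    def claim(self, endpoint: str, vuln_type: str) -> bool:
        """Record an intent; return False if it was already tested (duplicate)."""
        intent = (endpoint, vuln_type)
        if intent in self.tested:
            return False
        self.tested.add(intent)
        return True

    def untested(self) -> list[Intent]:
        """List every endpoint/vuln-type pair that no agent has claimed yet."""
        return sorted(
            (e, v)
            for e in self.surface
            for v in self.vuln_types
            if (e, v) not in self.tested
        )
```

A continuation loop would then keep handing `untested()` entries to sub-agents until the list is empty or a run budget is exhausted.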
Join the waitlist
Get early access updates.