autonomous security assessment agent

sploit.ai

Point it at a target. It discovers, tests, and exploits vulnerabilities autonomously — then learns from every run. Sandboxed in Kali containers, governed by immutable scope policies.

72 registered tools · 7 LLM providers · 3,500+ tests · 104 CTF benchmarks

Capabilities

Sandboxed execution, immutable scope, governance at every layer.

Orchestrator + sub-agents

An Opus orchestrator dispatches specialized sub-agents (enumerator, assessor, exploiter, reporter) across 7 LLM providers. Auto-hooks fire reactively on findings — no LLM call needed for JWT decode, JS parsing, API discovery, or CORS checks.
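
As an illustration of a hook that needs no LLM call, here is a minimal sketch of a JWT decode step. The function names, the shape of the `finding` dict, and the `alg_none` flag are assumptions made for this example, not sploit.ai's actual interface. It only decodes the token and does not verify the signature.

```python
import base64
import json
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token: str) -> dict:
    """Return the header and payload of a JWT without checking the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    header = json.loads(_b64url_decode(parts[0]))
    payload = json.loads(_b64url_decode(parts[1]))
    return {"header": header, "payload": payload}


def on_finding(finding: dict) -> Optional[dict]:
    """Hypothetical auto-hook: react when a finding carries a JWT-shaped token."""
    token = finding.get("token")
    if not token or token.count(".") != 2:
        return None  # nothing JWT-shaped to decode
    decoded = decode_jwt_unverified(token)
    # Flag the 'alg: none' case so a later stage can test it.
    decoded["alg_none"] = str(decoded["header"].get("alg", "")).lower() == "none"
    return decoded
```

Because the work is plain string handling, a hook like this can run on every finding at negligible cost and hand only the interesting results to a sub-agent.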

72 tools in Kali sandbox

Nmap, sqlmap, nuclei, ffuf, Burp Suite Pro, and 67 more — all inside containers with OPSEC defaults. 30 payload mutation transforms for WAF/filter evasion. Findings map to ATT&CK and OWASP taxonomies automatically.

Governance + coordination

Immutable scope, 5-stage command validation, credential isolation. A git-backed progress repo tracks every tested intent — blocking duplicates and surfacing untested attack surface across parallel agents.
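
One way to picture immutable scope is a frozen data structure that every command is checked against. The sketch below is an assumption-laden illustration: the `Scope` class, its fields, and the `permits` method are hypothetical names, and the real governance engine and its 5 validation stages are not shown here.

```python
from dataclasses import dataclass, FrozenInstanceError
from ipaddress import ip_address, ip_network
from typing import Tuple


@dataclass(frozen=True)
class Scope:
    # Tuples rather than lists, so the contents cannot be mutated in place.
    allowed_hosts: Tuple[str, ...]
    allowed_networks: Tuple[str, ...]

    def permits(self, target: str) -> bool:
        """Return True only if the target is an allowed host or inside an allowed network."""
        if target in self.allowed_hosts:
            return True
        try:
            addr = ip_address(target)
        except ValueError:
            return False  # not an IP address and not an allowed hostname
        return any(addr in ip_network(net) for net in self.allowed_networks)


scope = Scope(allowed_hosts=("app.example.test",), allowed_networks=("10.0.0.0/24",))

try:
    scope.allowed_hosts = ("other.example.test",)  # attempt to widen scope
except FrozenInstanceError:
    pass  # the frozen dataclass rejects reassignment
```

A frozen dataclass only guards against reassignment inside one Python process; a production system would presumably enforce the same rule at the sandbox or network layer as well.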

Learns across runs

Tradecraft playbooks promote proven payloads from NEW to LEARNING to PROVEN. Deterministic scanners and JS bundle analysis run before the LLM session. Each run makes the next one faster and deeper.
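
The NEW to LEARNING to PROVEN progression can be sketched as a small state machine. The thresholds below (1 confirmed success for LEARNING, 3 for PROVEN) and the `PayloadRecord` class are assumptions chosen for illustration; the page does not state the actual promotion criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    NEW = "NEW"
    LEARNING = "LEARNING"
    PROVEN = "PROVEN"


# Assumed thresholds: confirmed successes needed to reach each stage.
LEARNING_AFTER = 1
PROVEN_AFTER = 3


@dataclass
class PayloadRecord:
    payload: str
    successes: int = 0

    @property
    def stage(self) -> Stage:
        # The stage is derived from the success count, so it cannot drift out of sync.
        if self.successes >= PROVEN_AFTER:
            return Stage.PROVEN
        if self.successes >= LEARNING_AFTER:
            return Stage.LEARNING
        return Stage.NEW

    def record_success(self) -> Stage:
        """Count one confirmed success and return the resulting stage."""
        self.successes += 1
        return self.stage
```

Deriving the stage from a counter keeps the record simple: a later run can sort payloads by stage and try PROVEN ones first.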

Results

From autonomous runs against real targets — no human in the loop.

80% L1 CTF pass rate: 20/25 challenges, up from 59% (+21pp)
67% L2 CTF pass rate: 12/18 challenges, first L2 run
87 peak findings (one run): 21 vuln types, 21 critical severity
100% L3 CTF pass rate: 2/2 (deserialization + padding oracle)

Full benchmarks & methodology →

How it works

  1. Define scope

    Set target, objective, and scope profile. The governance engine freezes scope: it cannot expand mid-operation.

  2. Recon

    Fast scanners map the surface while deferred scanners (nuclei, Burp) run async. JS bundles are analyzed for hidden endpoints and secrets.

  3. Test + exploit

    Sub-agents test in parallel. A progress repo tracks every intent, blocks duplicates, and surfaces untested attack surface. Continuation loops keep testing until coverage gaps close.

  4. Report + learn

    Deduplicated findings with ATT&CK/OWASP mapping, attack graphs, and evidence. Proven payloads promote to tradecraft memory for future runs.
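
The intent tracking described in step 3 can be sketched with an in-memory stand-in for the git-backed progress repo. Everything here is hypothetical: the `ProgressTracker` class, the (target, technique, parameter) fields, and the normalisation rule are assumptions for illustration, and real paths may be case-sensitive.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Intent = Tuple[str, str, str]  # (target, technique, parameter), an assumed shape


def intent_key(target: str, technique: str, parameter: str = "") -> str:
    """Stable fingerprint for one intent, ignoring case and surrounding whitespace."""
    raw = "|".join(part.strip().lower() for part in (target, technique, parameter))
    return hashlib.sha256(raw.encode()).hexdigest()


class ProgressTracker:
    def __init__(self, surface: Set[Intent]):
        # Known attack surface, keyed by fingerprint.
        self.surface: Dict[str, Intent] = {intent_key(*item): item for item in surface}
        self.tested: Set[str] = set()

    def claim(self, target: str, technique: str, parameter: str = "") -> bool:
        """Return True if the caller may test this intent, False if it is a duplicate."""
        key = intent_key(target, technique, parameter)
        if key in self.tested:
            return False
        self.tested.add(key)
        return True

    def untested(self) -> List[Intent]:
        """List known surface that no agent has claimed yet."""
        return sorted(item for key, item in self.surface.items() if key not in self.tested)
```

A continuation loop could then keep dispatching sub-agents while `untested()` is non-empty. With parallel agents, the claim step would also need an atomic commit or lock, which this sketch omits.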

Join the waitlist

Get early access updates.