autonomous web app pentesting · open source

sploit.ai

Find the exploitable bugs your scanner misses — without paying for another pentest week. SploitAgent is an open-source autonomous agent that drives a Kali toolchain end-to-end and only reports findings it has confirmed.

Join waitlist · View on GitHub

8/8 XBEN flags · FQS 276 (+94%) · OWASP L2 9/11

Built by Ryan Shanahan — SANS SEC542 TA · 3x GIAC · CISSP · CISM

sploitagent

$ sploitagent run \
    --target http://localhost:3000 \
    --objective "Find and exploit OWASP Top 10" \
    --module general --profile vuln_lab

✔ kali sandbox ready · scope frozen
→ recon: tech profile, JS bundles, endpoints
✔ credentials auto-captured (Bearer JWT)
→ auth differential replay → broken access
✔ proof-gate: SQLi CONFIRMED → cred dump
! CRIT — IDOR chain → admin takeover
✔ OAST callback received → SSRF CONFIRMED
✔ 14 findings, 7 distinct vuln types
→ report: JSON + Markdown + attack graph

Who it's for

Three audiences, one tool.

AppSec leads

Triage real, confirmed findings instead of scanner noise. Every finding ships with a working payload and the request that proved it.

Red teams

A second operator that runs while you sleep — same Kali toolchain you already use, deterministic proof-gate, sandboxed execution.

Eng leaders at growing companies

Continuous coverage between annual pentests, without hiring a pentest team. Open source, runs on your infrastructure.

Why now

The pieces finally fit.

LLMs are finally good enough to drive an actual exploitation toolchain — not just summarize scan output.
Manual pentests don't scale to weekly deploys; annual coverage leaves 51 weeks of drift.
DAST/SAST flag potential bugs; SploitAgent only reports what it has confirmed with a working payload.

Results

Every sprint, we ship only if the comparator confirms zero regressions. The bar moves up — never down.

8 / 8 XBEN flag captures · Clean sweep, first time in 4 sprints

276 aggregate FQS · +94% vs Sprint 0 baseline (142)

110 findings · 68 vuln types · +124% findings, +89% types vs Sprint 0

9 / 11 OWASP L2 sample · 82% pass · A02–A08 all green · 2026-05-24

Full benchmarks & methodology →

Capabilities

A deterministic pipeline wraps the LLM orchestrator: confirmation, credential capture, and discovery fire automatically, without waiting for the LLM to request them.

We don't ship regressions

Every release is gated by a comparator that blocks shipping if any KPI drops. Each sprint freezes a Docker image, snapshots a KPI baseline, and ships only when the comparator confirms zero hard regressions across aggregate FQS, total findings, XBEN flag captures, and a per-target floor. Sprint 0 → Sprint 3: FQS +94%, findings +124%, XBEN flags 0/8 → 8/8.
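
As a rough illustration of the gate described above, the sketch below compares a candidate KPI snapshot against a frozen baseline and refuses to ship on any drop. The `Kpis` fields and the `hard_regressions` and `may_ship` names are hypothetical, chosen for this example; SploitAgent's actual comparator and schema are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Kpis:
    """Hypothetical KPI snapshot; field names are assumptions."""
    aggregate_fqs: float
    total_findings: int
    xben_flags: int
    per_target_fqs: dict[str, float] = field(default_factory=dict)


def hard_regressions(baseline: Kpis, candidate: Kpis) -> list[str]:
    """Return one message per KPI that dropped below the baseline."""
    problems = []
    for name in ("aggregate_fqs", "total_findings", "xben_flags"):
        before, after = getattr(baseline, name), getattr(candidate, name)
        if after < before:
            problems.append(f"{name}: {before} -> {after}")
    # Per-target floor: no single target may score lower than it did before.
    # A target missing from the candidate counts as a score of 0.
    for target, before in baseline.per_target_fqs.items():
        after = candidate.per_target_fqs.get(target, 0.0)
        if after < before:
            problems.append(f"{target}: {before} -> {after}")
    return problems


def may_ship(baseline: Kpis, candidate: Kpis) -> bool:
    """Ship only when the comparator finds zero hard regressions."""
    return not hard_regressions(baseline, candidate)
```

The per-target loop is what keeps a strong aggregate from hiding a collapse on one target: a release with a higher total FQS still fails if any individual target falls below its baseline.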

Confirmed findings, not maybes

Every LLM-claimed bug is automatically re-fired and stamped CONFIRMED or UNCONFIRMED before it reaches a report. Auto credential capture pattern-matches JWTs, Bearer tokens, and session cookies in any tool output, then re-fires every request with the captured credentials. Differential replay diffs authenticated and unauthenticated responses to surface broken access control. An out-of-band callback listener (OAST) automatically mints callback URLs for findings that can be verified out of band.
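
To make the credential-capture idea concrete, here is a minimal sketch of pattern-matching credentials out of raw tool output. The regular expressions and the `capture_credentials` function are illustrative assumptions, not SploitAgent's actual matcher, and they are deliberately simpler than a production version would need to be.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
# A JWT is three base64url segments joined by dots; a JSON-object header
# conventionally encodes to a string starting with "eyJ".
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
BEARER_RE = re.compile(
    r"Authorization:\s*Bearer\s+([A-Za-z0-9._~+/-]+=*)", re.IGNORECASE
)
COOKIE_RE = re.compile(r"Set-Cookie:\s*([^=;\s]+)=([^;\s]+)", re.IGNORECASE)


def capture_credentials(tool_output: str) -> dict[str, list[str]]:
    """Collect candidate credentials from tool output, de-duplicated in order."""
    found = {
        "jwt": JWT_RE.findall(tool_output),
        "bearer": BEARER_RE.findall(tool_output),
        "cookie": [f"{k}={v}" for k, v in COOKIE_RE.findall(tool_output)],
    }
    # dict.fromkeys drops repeats while keeping first-seen order.
    return {kind: list(dict.fromkeys(values)) for kind, values in found.items()}
```

Anything returned here would then be attached to subsequent requests, which is the step the text above calls re-firing with captured credentials.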

Real attacker tools, contained safely

The agent drives 69 real Kali tools, the same ones your pentesters use, inside ephemeral, non-root containers with frozen scope. Tools span 28 modules, including nmap, sqlmap, nuclei, ffuf, dalfox, hydra, and msfconsole, plus Burp Suite Pro as an optional service. Containers run with only the NET_RAW capability, 512 MB of memory, a 120 s default timeout, and no Docker socket mount. Every command passes a 5-stage validation pipeline: allowlist · pattern block · scope check · risk classification · OPSEC default injection.
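
The five validation stages could be chained roughly as follows. This is a sketch under stated assumptions: the tool subset, the blocked patterns, the risk labels, and the choice of ffuf's `-rate` flag as an OPSEC default are all hypothetical, and the scope check here only inspects URL-shaped arguments.

```python
import re
import shlex
from urllib.parse import urlparse

# All four tables are assumptions made for this example.
ALLOWED_TOOLS = {"nmap", "sqlmap", "nuclei", "ffuf"}
BLOCKED_PATTERNS = [re.compile(r"rm\s+-rf"), re.compile(r"[;&|`]")]  # crude on purpose
HIGH_RISK_TOOLS = {"sqlmap"}
OPSEC_DEFAULTS = {"ffuf": ["-rate", "50"]}  # cap requests per second


def validate(command: str, scope_hosts: set[str]) -> tuple[list[str], str]:
    """Run a command through five checks; return (argv, risk) or raise ValueError."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    tool = argv[0]

    # 1. Allowlist: only known tools may run.
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowlisted: {tool}")

    # 2. Pattern block: reject shell metacharacters and destructive idioms.
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            raise ValueError(f"blocked pattern: {pattern.pattern}")

    # 3. Scope check: every URL argument must point at an in-scope host.
    for arg in argv[1:]:
        host = urlparse(arg).hostname
        if host is not None and host not in scope_hosts:
            raise ValueError(f"out of scope: {host}")

    # 4. Risk classification.
    risk = "high" if tool in HIGH_RISK_TOOLS else "low"

    # 5. OPSEC default injection: add conservative flags the caller omitted.
    defaults = OPSEC_DEFAULTS.get(tool, [])
    if defaults and defaults[0] not in argv:
        argv += defaults

    return argv, risk
```

Ordering the cheap rejections first means an out-of-policy command fails before any scope or risk logic runs, and the returned `argv` is a list, so it can be passed to a container runtime without going through a shell.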

It gets smarter every run

Proven payloads get promoted across runs, so the agent doesn't re-learn what already worked. Four memory tiers — Working · Short-term · Long-term · Muscle — promote payloads NEW → LEARNING → PROVEN (3+ hits). Post-run extraction populates all tiers automatically. Phase-aware 3-tier LLM routing (FAST · BALANCED · BALANCED) across 6 providers — Anthropic, OpenAI, Gemini, Bedrock, Ollama, LM Studio.
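
The NEW → LEARNING → PROVEN promotion can be pictured as a simple hit counter. In the sketch below, the `PayloadMemory` class and its methods are hypothetical, and treating one or two hits as LEARNING is an assumption; only the three state names and the 3-hit threshold come from the text above.

```python
from collections import Counter

PROVEN_THRESHOLD = 3  # "3+ hits" per the description above


class PayloadMemory:
    """Hypothetical tracker that promotes payloads as they keep working."""

    def __init__(self) -> None:
        self._hits: Counter[str] = Counter()

    def record_hit(self, payload: str) -> str:
        """Count one confirmed use of a payload and return its new state."""
        self._hits[payload] += 1
        return self.state(payload)

    def state(self, payload: str) -> str:
        """Map the hit count to NEW, LEARNING, or PROVEN."""
        hits = self._hits[payload]  # Counter returns 0 for unseen payloads
        if hits >= PROVEN_THRESHOLD:
            return "PROVEN"
        return "LEARNING" if hits > 0 else "NEW"
```

A later run could then try PROVEN payloads first and fall back to LEARNING and NEW ones, which is one way to read "doesn't re-learn what already worked".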

How it works

An LLM orchestrator dispatches specialized sub-agents while deterministic hooks fire underneath.

01

Recon (pre-LLM)

Target classification (web / host / domain / CIDR), tech and service probing, deterministic scanners, JavaScript bundle analysis (endpoints, secrets, DOM sinks via parse_js). Recon waits for fast scanners (ffuf, whatweb) to finish; deferred scanners (nuclei, Burp) run asynchronously and feed their results into the next continuation.

02

Orchestrator + sub-agents

The orchestrator dispatches specialized sub-agents — enumerator, assessor, exploiter, reporter — for parallel vulnerability coverage. Condition-based dispatch fires reactive tools on findings: SQLi triggers sqlmap_test, XSS triggers browser_probe.
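
Condition-based dispatch amounts to a lookup from finding type to follow-up tool. The sketch below uses the two pairings named above (`sqlmap_test` for SQLi, `browser_probe` for XSS); the finding dictionary shape and the `reactive_dispatch` function are assumptions made for illustration.

```python
# Trigger table taken from the text above; everything else is hypothetical.
REACTIVE_TOOLS = {"sqli": "sqlmap_test", "xss": "browser_probe"}


def reactive_dispatch(findings: list[dict]) -> list[tuple[str, str]]:
    """Return (tool, endpoint) pairs to queue, one per unique trigger."""
    queued: list[tuple[str, str]] = []
    seen: set[tuple[str, str]] = set()
    for finding in findings:
        tool = REACTIVE_TOOLS.get(finding["vuln_type"].lower())
        if tool is None:
            continue  # no reactive tool registered for this finding type
        key = (tool, finding["endpoint"])
        if key not in seen:
            seen.add(key)
            queued.append(key)
    return queued
```

Because the table is data rather than prompt text, the follow-up fires whether or not the LLM thinks to ask for it.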

03

Auto-hooks (no LLM opt-out)

Every tool output passes through deterministic post-hooks: JWTs decoded, API discovery dispatched, auth/unauth responses diffed. The auto proof-gate re-fires payload+endpoint and stamps CONFIRMED / UNCONFIRMED. Captured credentials reconfigure the request layer for the rest of the run.
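
One of the post-hooks mentioned above, JWT decoding, needs nothing beyond the standard library. The sketch below reads a token's header and payload without verifying its signature, which is what a recon-time hook would typically want; the function name is an assumption and this is not SploitAgent's own code.

```python
import base64
import json


def decode_jwt_unverified(token: str) -> tuple[dict, dict]:
    """Decode a JWT's header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")

    def _segment(data: str) -> dict:
        # JWTs strip base64 padding; restore it before decoding.
        padded = data + "=" * (-len(data) % 4)
        return json.loads(base64.urlsafe_b64decode(padded))

    return _segment(header_b64), _segment(payload_b64)
```

The decoded claims (subject, role, expiry) are what make a captured token useful for the later auth and unauth comparison.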

04

Background discovery

Asyncio tasks during the agent loop: periodic SPA walker (Playwright click + form-fill + hidden-element trigger, 90s cycle), discovery consumer (60s cycle), self-hosted out-of-band callback listener (OAST) for real callback capture. Coverage-aware dedup tracks (endpoint, param, vuln_type) tuples to ensure untested surface gets covered.
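
Coverage-aware dedup over (endpoint, param, vuln_type) tuples can be sketched with a set. The `CoverageTracker` class and its method names are hypothetical; only the tuple shape comes from the description above.

```python
Key = tuple[str, str, str]  # (endpoint, param, vuln_type)


class CoverageTracker:
    """Hypothetical tracker of which test tuples have already been tried."""

    def __init__(self) -> None:
        self._tested: set[Key] = set()

    def should_test(self, endpoint: str, param: str, vuln_type: str) -> bool:
        """Return True the first time a tuple is seen, False on repeats."""
        key = (endpoint, param, vuln_type)
        if key in self._tested:
            return False
        self._tested.add(key)
        return True

    def untested(self, surface: list[Key]) -> list[Key]:
        """List the discovered tuples that no test has touched yet."""
        return [key for key in surface if key not in self._tested]
```

`should_test` prevents repeat work, while `untested` gives the agent a worklist of surface it has discovered but not yet exercised.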

Honest about what it isn't

SploitAgent is not a SOC tool, not a SAST product, and not a replacement for human red-teamers on high-stakes targets. It's an autonomous web-app pentester: it finds what an attacker with your scope could find, and proves it.

Get early access

One email when the hosted runner opens. No newsletter, no drip.