autonomous web app pentesting · open source
Find the exploitable bugs your scanner misses — without paying for another pentest week. SploitAgent is an open-source autonomous agent that drives a Kali toolchain end-to-end and only reports findings it has confirmed.
Built by Ryan Shanahan — SANS SEC542 TA · 3x GIAC · CISSP · CISM
Three audiences, one tool.
Triage real, confirmed findings instead of scanner noise. Every finding ships with a working payload and the request that proved it.
A second operator that runs while you sleep — same Kali toolchain you already use, deterministic proof-gate, sandboxed execution.
Continuous coverage between annual pentests, without hiring a pentest team. Open source, runs on your infrastructure.
The pieces finally fit.
Every sprint, we ship only if the comparator confirms zero regressions. The bar moves up — never down.
Deterministic pipeline wraps the LLM orchestrator — confirmation, capture, and discovery fire without LLM opt-in.
Every release is gated by a comparator that blocks the release if any KPI drops. Each sprint freezes a Docker image, snapshots a KPI baseline, and ships only when the comparator confirms zero hard regressions across four measures: aggregate FQS, total findings, XBEN flag captures, and a per-target floor. Sprint 0 → Sprint 3: FQS +94%, findings +124%, XBEN flags 0/8 → 8/8.
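As a rough illustration of the release gate described above, here is a minimal sketch of a comparator over the four named KPIs. The `Kpis` shape, the function names, and the reading of "per-target floor" as "no target may fall below its baseline finding count" are assumptions made for the example, not SploitAgent's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpis:
    # Hypothetical container for one sprint's KPI snapshot.
    fqs: float                  # aggregate FQS
    findings: int               # total findings
    xben_flags: int             # XBEN flag captures
    per_target: dict[str, int]  # findings per target (assumed shape)


def regressions(baseline: Kpis, candidate: Kpis) -> list[str]:
    """Describe every KPI where the candidate fell below the baseline."""
    problems = []
    if candidate.fqs < baseline.fqs:
        problems.append("aggregate FQS dropped")
    if candidate.findings < baseline.findings:
        problems.append("total findings dropped")
    if candidate.xben_flags < baseline.xben_flags:
        problems.append("XBEN flag captures dropped")
    # Assumed meaning of the per-target floor: each baseline target
    # must keep at least its baseline finding count.
    for target, floor in baseline.per_target.items():
        if candidate.per_target.get(target, 0) < floor:
            problems.append(f"target {target} fell below its floor")
    return problems


def may_ship(baseline: Kpis, candidate: Kpis) -> bool:
    """The gate: ship only when there are zero regressions."""
    return not regressions(baseline, candidate)
```

Because the check is a plain comparison against a frozen baseline, a candidate can only match or raise each number, which is what "the bar moves up" amounts to in this sketch.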
Every LLM-claimed bug is automatically re-fired and stamped CONFIRMED or UNCONFIRMED before it reaches a report. Auto credential capture pattern-matches JWTs, Bearer tokens, and session cookies in any tool output, then replays every request with the captured credentials. Differential replay emits broken access control findings when authenticated and unauthenticated responses differ. An out-of-band callback listener (OAST) auto-mints callback URLs for OOB-capable findings.
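The credential capture step can be pictured as a handful of regular expressions run over raw tool output. The patterns and the `capture_credentials` name below are illustrative assumptions; real-world matching would likely need more cookie names and stricter validation.

```python
import re

# Hypothetical patterns: three base64url segments starting with "eyJ",
# an "Authorization: Bearer ..." style token, and a Set-Cookie header
# whose cookie name contains "sess".
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
BEARER_RE = re.compile(r"(?i)\bBearer\s+([A-Za-z0-9._~+/-]+=*)")
COOKIE_RE = re.compile(r"(?im)^Set-Cookie:\s*([^=;\s]*sess[^=;\s]*)=([^;\s]+)")


def capture_credentials(tool_output: str) -> dict[str, list[str]]:
    """Pull candidate credentials out of arbitrary tool output."""
    return {
        "jwt": JWT_RE.findall(tool_output),
        "bearer": BEARER_RE.findall(tool_output),
        "cookie": [f"{name}={value}"
                   for name, value in COOKIE_RE.findall(tool_output)],
    }
```

Anything captured this way could then be attached to later requests, which is the "replay authenticated" half of the feature and is not shown here.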
The agent drives 69 real Kali tools, the same ones your pentesters use, inside ephemeral, non-root containers with frozen scope. Tools span 28 modules, including nmap, sqlmap, nuclei, ffuf, dalfox, hydra, and msfconsole, plus Burp Suite Pro as an optional service. Containers run with NET_RAW as the only capability, a 512MB memory limit, a 120s default timeout, and no docker socket mount. A 5-stage validation pipeline checks every command: allowlist · pattern block · scope check · risk classification · OPSEC default injection.
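To make the five validation stages concrete, here is a minimal sketch that runs them in order over a command line. The allowlist contents, blocked patterns, risk labels, and OPSEC defaults are all made up for the example (the only real flag used is nmap's `-T2` timing template); the actual rules are not stated in the text.

```python
import re
from dataclasses import dataclass, field

# All of these tables are hypothetical stand-ins.
ALLOWLIST = {"nmap", "sqlmap", "ffuf"}
BLOCKED = [re.compile(r"rm\s+-rf"), re.compile(r"[;&|`]")]
HIGH_RISK = {"sqlmap"}
OPSEC_DEFAULTS = {"nmap": ["-T2"]}


@dataclass
class Verdict:
    allowed: bool
    reason: str
    risk: str = "low"
    argv: list[str] = field(default_factory=list)


def validate(argv: list[str], target: str, scope: set[str]) -> Verdict:
    if not argv or argv[0] not in ALLOWLIST:            # 1. allowlist
        return Verdict(False, "tool not on allowlist")
    joined = " ".join(argv)
    if any(p.search(joined) for p in BLOCKED):          # 2. pattern block
        return Verdict(False, "blocked pattern")
    if target not in scope:                             # 3. scope check
        return Verdict(False, "target out of scope")
    tool = argv[0]
    risk = "high" if tool in HIGH_RISK else "low"       # 4. risk classification
    defaults = [f for f in OPSEC_DEFAULTS.get(tool, []) if f not in argv]
    return Verdict(True, "ok", risk, [tool, *defaults, *argv[1:]])  # 5. OPSEC defaults
```

Ordering the cheap rejections first means an out-of-scope or disallowed command never reaches the stage that rewrites its arguments.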
Proven payloads get promoted across runs, so the agent doesn't re-learn what already worked. Four memory tiers — Working · Short-term · Long-term · Muscle — promote payloads NEW → LEARNING → PROVEN (3+ hits). Post-run extraction populates all tiers automatically. Phase-aware 3-tier LLM routing (FAST · BALANCED · BALANCED) across 6 providers — Anthropic, OpenAI, Gemini, Bedrock, Ollama, LM Studio.
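The NEW → LEARNING → PROVEN promotion can be sketched as a hit counter with thresholds. Only the "3+ hits" rule for PROVEN comes from the text; treating a single hit as enough for LEARNING, and the `PayloadMemory` class itself, are assumptions.

```python
from collections import Counter

PROVEN_HITS = 3    # from the text: PROVEN at 3+ hits
LEARNING_HITS = 1  # assumption: any confirmed hit moves a payload out of NEW


class PayloadMemory:
    """Hypothetical tracker for how often each payload has worked."""

    def __init__(self) -> None:
        self._hits: Counter[str] = Counter()

    def record_hit(self, payload: str) -> str:
        """Count one confirmed use of the payload and return its new state."""
        self._hits[payload] += 1
        return self.state(payload)

    def state(self, payload: str) -> str:
        hits = self._hits[payload]  # Counter returns 0 for unseen payloads
        if hits >= PROVEN_HITS:
            return "PROVEN"
        if hits >= LEARNING_HITS:
            return "LEARNING"
        return "NEW"
```

In this sketch a later run would try PROVEN payloads first, which is one way to read "doesn't re-learn what already worked".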
An LLM orchestrator dispatches specialized sub-agents while deterministic hooks fire underneath.
Target classification (web / host / domain / CIDR), tech and service probing, deterministic scanners, JavaScript bundle analysis (endpoints, secrets, DOM sinks via parse_js). Fast scanners (ffuf, whatweb) block recon; deferred scanners (nuclei, Burp) run async and drain into the next continuation.
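The split between blocking and deferred scanners maps naturally onto asyncio tasks. The sketch below assumes each scanner is an async callable returning a string; `recon` and `drain` are hypothetical names, and a production version would also need to handle scanner exceptions.

```python
import asyncio
from typing import Awaitable, Callable

Scanner = Callable[[], Awaitable[str]]  # assumed scanner interface


async def recon(fast: list[Scanner], deferred: list[Scanner]):
    # Deferred scanners start right away but are not awaited here.
    background = [asyncio.create_task(scan()) for scan in deferred]
    # Fast scanners block: recon returns only after all of them finish.
    fast_results = await asyncio.gather(*(scan() for scan in fast))
    return list(fast_results), background


def drain(background: list[asyncio.Task]):
    """Collect results from finished tasks; hand back the ones still running."""
    finished = [task.result() for task in background if task.done()]
    still_running = [task for task in background if not task.done()]
    return finished, still_running
```

Calling `drain` at the start of each continuation picks up whatever the slow scanners have completed since the last call, without ever stalling the agent loop on them.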
The orchestrator dispatches specialized sub-agents — enumerator, assessor, exploiter, reporter — for parallel vulnerability coverage. Condition-based dispatch fires reactive tools on findings: SQLi triggers sqlmap_test, XSS triggers browser_probe.
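Condition-based dispatch can be as small as a lookup table from finding type to follow-up tool. The two pairings below come from the text; the finding dictionary shape and the `reactive_dispatch` name are assumptions for illustration.

```python
# Pairings taken from the text; the key spellings are assumed.
REACTIVE_TOOLS = {
    "sqli": "sqlmap_test",
    "xss": "browser_probe",
}


def reactive_dispatch(findings: list[dict]) -> list[tuple[str, str]]:
    """Return (tool, endpoint) jobs for every finding with a reactive tool."""
    jobs = []
    for finding in findings:
        tool = REACTIVE_TOOLS.get(finding["vuln_type"])
        if tool is not None:
            jobs.append((tool, finding["endpoint"]))
    return jobs
```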
Every tool output passes through deterministic post-hooks: JWTs decoded, API discovery dispatched, auth/unauth responses diffed. The auto proof-gate re-fires payload+endpoint and stamps CONFIRMED / UNCONFIRMED. Captured credentials reconfigure the request layer for the rest of the run.
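A proof-gate of this kind boils down to "send the same payload again and check for the evidence". In the sketch below the `Finding` fields, the substring check, and the injected `send` callable are assumptions; how SploitAgent actually decides that a re-fired payload succeeded is not described in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    endpoint: str
    payload: str
    evidence: str               # hypothetical: marker expected in a vulnerable response
    status: str = "UNCONFIRMED"


def proof_gate(finding: Finding, send: Callable[[str, str], str]) -> Finding:
    """Re-fire payload+endpoint and stamp the finding with the outcome."""
    response = send(finding.endpoint, finding.payload)
    finding.status = "CONFIRMED" if finding.evidence in response else "UNCONFIRMED"
    return finding
```

Passing `send` in as a parameter keeps the gate independent of the HTTP client and lets captured credentials be applied in one place.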
Asyncio tasks run alongside the agent loop: a periodic SPA walker (Playwright click + form-fill + hidden-element trigger, 90s cycle), a discovery consumer (60s cycle), and a self-hosted out-of-band callback listener (OAST) for real callback capture. Coverage-aware dedup tracks (endpoint, param, vuln_type) tuples so that untested surface gets covered.
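Coverage-aware dedup over (endpoint, param, vuln_type) tuples can be sketched with a set. The `CoverageTracker` class and its method names are hypothetical.

```python
Key = tuple[str, str, str]  # (endpoint, param, vuln_type)


class CoverageTracker:
    """Hypothetical record of which test combinations have already run."""

    def __init__(self) -> None:
        self._seen: set[Key] = set()

    def should_test(self, endpoint: str, param: str, vuln_type: str) -> bool:
        """True the first time a combination is offered, False afterwards."""
        key = (endpoint, param, vuln_type)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def untested(self, surface: list[Key]) -> list[Key]:
        """Filter known attack surface down to combinations not yet tried."""
        return [key for key in surface if key not in self._seen]
```

The same set answers both questions: whether a proposed test is a duplicate, and which parts of the discovered surface still need attention.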
SploitAgent is not a replacement for human red-teamers on high-stakes targets, a SOC tool, or a SAST product. It's an autonomous web-app pentester — it finds what an attacker with your scope could find, and proves it.
One email when the hosted runner opens. No newsletter, no drip.