We use AI in the triage layer of our pipeline. We do not use it to make final severity calls, write reports without human review, or replace manual exploitation. The reasons for those boundaries are worth spelling out — they are most of what determines whether AI in security is a force multiplier or a liability.
# Where it earns its keep
AI does three things well in our pipeline: deduplication across heterogeneous tool outputs, severity-band scoring against historical labeled data, and natural-language summarization of long evidence trails into a working hypothesis a human can verify.
- Deduplication: collapse 14 scanner outputs claiming the same root cause into a single record.
- Scoring: rank candidates by likely exploitability given asset context. Feature inputs include CVSS, asset criticality, blast radius, and historical patterns from labeled past findings.
- Summarization: produce a one-paragraph hypothesis a human tester can confirm or refute in minutes.
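To make the first two steps concrete, here is a minimal sketch of fingerprint-based deduplication followed by a weighted ranking. Everything in it is an assumption for illustration: the `ScannerFinding` fields, the `root_cause_key` fingerprint, the 0-to-1 scales for criticality and blast radius, and the hand-picked weights. A real scorer would be fit against labeled past findings, which this sketch leaves out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannerFinding:
    """Hypothetical normalized record; field names are illustrative."""
    scanner: str              # tool that reported the finding
    asset: str                # host or service affected
    weakness: str             # normalized weakness class, e.g. "CWE-89"
    location: str             # endpoint, parameter, or file path
    cvss: float               # base score, 0.0 to 10.0
    asset_criticality: float  # assumed 0.0 to 1.0 scale
    blast_radius: float       # assumed 0.0 to 1.0 scale


def root_cause_key(f):
    # Assumed fingerprint: same asset, weakness class, and location
    # are treated as the same root cause.
    return (f.asset.lower(), f.weakness.upper(), f.location.lower().rstrip("/"))


def deduplicate(findings):
    # Group raw tool outputs by fingerprint so each root cause is one record.
    groups = defaultdict(list)
    for f in findings:
        groups[root_cause_key(f)].append(f)
    return dict(groups)


def triage_score(group):
    # Illustrative fixed weights standing in for a trained model.
    cvss = max(f.cvss for f in group) / 10.0
    crit = max(f.asset_criticality for f in group)
    blast = max(f.blast_radius for f in group)
    return round(0.5 * cvss + 0.3 * crit + 0.2 * blast, 3)


def rank(findings):
    # Returns (score, fingerprint, reporting scanners), highest score first.
    scored = [
        (triage_score(group), key, sorted({f.scanner for f in group}))
        for key, group in deduplicate(findings).items()
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


findings = [
    ScannerFinding("zap", "app.example.test", "cwe-89", "/login/", 9.8, 0.9, 0.8),
    ScannerFinding("burp", "APP.example.test", "CWE-89", "/login", 8.6, 0.9, 0.8),
    ScannerFinding("nuclei", "app.example.test", "CWE-79", "/search", 6.1, 0.4, 0.2),
]
ranked = rank(findings)
for score, key, scanners in ranked:
    print(score, key, scanners)
```

The output is a ranked queue for a tester to work through, not a severity verdict. The score only decides what a human looks at first.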
# Where the human still decides
Final severity, PoC validity, and whether something gets a CVE filed are all human-decided. Anything that escalates outside the customer's account, anything that requires interpretation of business impact, anything where a wrong answer creates reputational risk — those stay human.
AI proposes. Humans dispose. We have not yet seen a model that handles severity calls under adversarial framing as reliably as a senior tester.
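One way to encode that boundary in a data model is to keep the model's output and the human decision in separate fields, and to treat a record as reportable only once a named reviewer has signed off. The sketch below is an assumption about how such a gate could look, not a description of our production schema; `TriageRecord`, `human_decide`, and the severity labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity bands, for illustration only.
SEVERITIES = ("info", "low", "medium", "high", "critical")


@dataclass
class TriageRecord:
    finding_id: str
    proposed_severity: str                 # model output, advisory only
    hypothesis: str                        # model-written summary for the tester
    final_severity: Optional[str] = None   # set only through human_decide
    decided_by: Optional[str] = None

    def human_decide(self, reviewer: str, severity: str) -> None:
        # The final call needs a named reviewer and a known severity band.
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity!r}")
        self.final_severity = severity
        self.decided_by = reviewer

    @property
    def reportable(self) -> bool:
        # Nothing leaves triage on the model's proposal alone.
        return self.final_severity is not None


record = TriageRecord("F-1", "high", "Likely SQL injection on the login form.")
before = record.reportable
record.human_decide("alice", "medium")
print(before, record.reportable, record.proposed_severity, record.final_severity)
```

Keeping `proposed_severity` after the decision, rather than overwriting it, preserves the disagreement between model and tester as data that can be measured later.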
# What goes wrong
Two failure modes are worth flagging. First, models confidently mislabel novel-shape findings as known-class duplicates: the syntactic similarity is high, but the semantic difference matters. Second, models occasionally hallucinate PoC code that looks valid but does not actually exploit the vulnerability. The human verification layer catches both, and both are why we run that layer in the first place.
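The first failure mode can be illustrated with a small guard that refuses to auto-merge when two findings read almost identically but differ in a structured field. This is a sketch under stated assumptions: the dictionary keys, the `0.85` threshold, and the choice of `semantic_fields` are hypothetical, and text similarity from `difflib.SequenceMatcher` stands in for whatever similarity measure a real pipeline uses.

```python
from difflib import SequenceMatcher


def duplicate_verdict(new, known, text_threshold=0.85,
                      semantic_fields=("weakness", "endpoint", "parameter")):
    # Syntactic similarity of the finding titles, between 0.0 and 1.0.
    similarity = SequenceMatcher(None, new["title"], known["title"]).ratio()
    if similarity < text_threshold:
        return "distinct"
    # High textual similarity is not enough: compare the fields that
    # determine whether the root cause is actually the same.
    differing = [f for f in semantic_fields if new.get(f) != known.get(f)]
    return "needs_human" if differing else "likely_duplicate"


known = {
    "title": "SQL injection in /api/orders parameter id",
    "weakness": "CWE-89", "endpoint": "/api/orders", "parameter": "id",
}
new = {
    "title": "SQL injection in /api/orders parameter sort",
    "weakness": "CWE-89", "endpoint": "/api/orders", "parameter": "sort",
}
verdict = duplicate_verdict(new, known)
print(verdict)
```

The two titles differ only in their last word, so a purely textual comparison would merge them, while the differing `parameter` field routes the pair to a tester instead.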
# Measuring impact
# Where this is going
The honest answer is: we don't know exactly. The capability curve is steep. What we do know: the boundary between 'AI proposes' and 'human disposes' will keep moving — and the discipline is to move it deliberately, with measurement, rather than reactively, in response to vendor pressure or hype.