/ buyer's guide

How to buy AI pentesting.

A practical guide for security leaders evaluating continuous AI pentesting vendors. What to look for, what to push back on, and how to scope a pilot that actually proves value.

What this guide covers

AI pentesting is a category still defining itself. The word "AI" means very different things at different vendors, and the gap between demo and production value is larger than most buyers realize. This guide walks through the evaluation questions that separate vendors who've built something real from ones who've wrapped a scanner in a marketing layer.

The seven questions every CISO should ask

Question 01

Is there a human in the loop?

If not, false positives will drown your team. If yes, ask how often that human actually reviews findings (daily, weekly, or per-report) and who they are.

Question 02

What happens to false positives?

A vendor that can't describe their FP-suppression process is a vendor that ships noise. Look for named people, documented review steps, and a published FP rate.
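
If a vendor quotes an FP rate, make sure you can reproduce the arithmetic from your own triage counts rather than taking theirs. A minimal sketch, using hypothetical pilot numbers:

```python
# Minimal FP-rate check -- all counts here are hypothetical pilot results.
confirmed = 42        # findings your team reproduced and accepted
false_positives = 6   # findings that didn't reproduce or were out of scope

fp_rate = false_positives / (confirmed + false_positives)
print(f"FP rate: {fp_rate:.1%}")  # FP rate: 12.5%
```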

Question 03

How is it safe in production?

"Trust us" is not an answer. Ask about scope enforcement, write simulation, exfil boundaries, and the specific actions the tool will and won't take without human approval.

Question 04

What does a real report look like?

Ask for a redacted sample. Look for named reviewers, CVSS breakdowns, repro steps, compliance mapping, and fix guidance. Not just a CVE list.
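
As a rough yardstick, a single finding in a credible report carries at least the fields below. This is an illustrative sketch of what to look for, not any vendor's real export format.

```python
# Illustrative finding record -- every field name is an assumption about
# what a good report contains, not a real schema.
finding = {
    "id": "FIND-0042",
    "title": "IDOR on invoice download endpoint",
    "cvss_v3": "6.5 (AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N)",
    "reviewed_by": "a named senior pentester, not 'the model'",
    "repro_steps": [
        "Authenticate as user A",
        "Request /invoices/{id} for an invoice owned by user B",
    ],
    "compliance": ["OWASP A01:2021", "PCI DSS 6.5.8"],
    "fix_guidance": "Enforce object-level authorization on the invoice endpoint.",
}
```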

Question 05

How do retests work?

A retest should be fast, free, and triggered automatically when you ship a fix. If retests cost extra or require a new SoW, you're paying for the same bug twice.

Question 06

What about business logic?

Scanners can't find logic bugs. Good AI can find some; great AI paired with human testers finds most. Ask for three concrete examples of logic flaws the vendor found on similar apps.

Question 07

Will it train on our data?

A firm "no" is table stakes. A vague "we may use aggregated, anonymized data for improving the model" means your findings are now their training set.

Bonus · How to pilot

Scope small, verify everything

Run a 4-week pilot on a single app. Compare findings with your last traditional pentest. Count false positives. Time the retest cycle. If the vendor can't handle scrutiny at that scope, they can't handle production.
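
One way to keep the pilot honest is to score it yourself with simple arithmetic. A sketch of that scorecard, assuming hypothetical finding names, dates, and counts:

```python
# Pilot scorecard sketch -- all dates, names, and counts are hypothetical.
from datetime import date
from statistics import median

# (fix shipped, retest confirmed) per retested finding
retests = [
    (date(2025, 5, 1), date(2025, 5, 2)),
    (date(2025, 5, 6), date(2025, 5, 6)),
    (date(2025, 5, 9), date(2025, 5, 12)),
]
turnarounds = [(verified - shipped).days for shipped, verified in retests]
print(f"median retest turnaround: {median(turnarounds)} day(s)")  # 1 day(s)

# Overlap with your last traditional pentest
pilot_findings = {"IDOR-invoices", "SSRF-webhooks", "weak-jwt-secret"}
last_pentest   = {"IDOR-invoices", "weak-jwt-secret", "open-redirect"}
print(f"found by both: {len(pilot_findings & last_pentest)}")  # 2
print(f"new in pilot:  {len(pilot_findings - last_pentest)}")  # 1
print(f"missed:        {len(last_pentest - pilot_findings)}")  # 1
```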

Red flags to watch for

A few specific warning signs we've seen from vendors in this space: unverifiable benchmark claims ("we found 10x more bugs"); no named humans on engagements; a demo that only shows pre-recorded output; pricing that scales per-finding rather than per-engagement; inability to produce a sample report under NDA.

How CredShields answers each of these

We've written up our answers to each of the seven questions. Ask us for the document directly at [email protected], or read the platform tour and FAQ, which cover most of the same ground publicly.

/ invite-only

Ready to scope a pilot?

We'll help you run a 4-week evaluation against a real surface. You score it. If it works, we keep going.

Request access