James Stanley - Bot Forensics

Most threat intelligence bots are easy to fingerprint, and trying to be stealthy often makes it worse, because imperfect anti-detection methods add fingerprint surface area of their own. We run an instrumented honeypot site that collects data on what these bots do, and we've just released an Instant Bot Test so you can see whether we flag your bot without even having to talk to us first.

You may want to see my previous post on this topic for more context on what we're doing.

Since that post we've sold a handful of reports, including to a couple of big names. And we now have a website at botforensics.com to advertise our services.

Anti-detection detection

One of the most interesting things we've learnt is that anti-detection techniques very rarely prevent a bot from being detected. Fewer than 0.1% of the sessions on our collector site could plausibly be real human users. Far from hiding a bot, anti-detection measures more often give away which bot it is, based on which measures are in use. Some of these measures take us from "we think this is probably a bot" to "this is bot XYZ operated by Foocorp", which is kind of an own goal.

If you're going to run a bot with anti-detection measures in place (and you should, otherwise you'll trivially look like Headless Chrome), then you should definitely get a Bot Audit to make sure you aren't leaking any extra signals.

The Puppeteer stealth evasions are a great example of this. Lots of bots are browsing with these evasions applied (we even see bots using them outside Puppeteer), but we can detect the evasions themselves, and they often leak more signal than we would expect to see without them.

We do take a canvas fingerprint, because why not, but it turns out to be quite hard to say definitively that a given canvas fingerprint belongs to a bot unless you have enough data on real user sessions to rule out a real user. While some people are very worried about canvas fingerprinting, a much stronger bot signal than the fingerprint itself is when we read the pixel data back out and find random pixels in the wrong colour where it should be the same colour all over. Worse still is when we do the same thing twice in a row and get a different answer each time!
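The two pixel checks described above can be sketched offline. This is a minimal Python illustration rather than browser code, and it assumes the page has already filled a canvas with one solid colour and sent back the raw RGBA bytes from two consecutive reads. The function names and sample buffers are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]


def pixels(rgba: bytes) -> List[Pixel]:
    """Split a raw RGBA buffer into (r, g, b, a) tuples."""
    if len(rgba) % 4 != 0:
        raise ValueError("RGBA buffer length must be a multiple of 4")
    return [tuple(rgba[i:i + 4]) for i in range(0, len(rgba), 4)]


def off_colour_count(rgba: bytes, expected: Pixel) -> int:
    """Count pixels that differ from the colour the canvas was filled with."""
    return sum(1 for p in pixels(rgba) if p != expected)


def canvas_noise_signals(first: bytes, second: bytes, expected: Pixel) -> dict:
    """Report both signals: stray pixels, and reads that disagree."""
    stray = (off_colour_count(first, expected) > 0
             or off_colour_count(second, expected) > 0)
    return {"stray_pixels": stray, "unstable_reads": first != second}


# Hypothetical 2x2 canvas filled with opaque red (assumed test data).
RED: Pixel = (255, 0, 0, 255)
clean = bytes(RED) * 4
noisy_a = bytes(RED) * 3 + bytes((254, 0, 1, 255))
noisy_b = bytes((255, 1, 0, 255)) + bytes(RED) * 3

print(canvas_noise_signals(clean, clean, RED))
print(canvas_noise_signals(noisy_a, noisy_b, RED))
```

Neither check needs a database of real-user fingerprints, which is what makes them easier to act on than the fingerprint itself.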

We noticed a bot operated by Microsoft that had some very specific identifying features, including references to some of their developers' real names. Microsoft have a fairly reputable bug bounty programme, so I tested the waters by reporting it on MSRC. But after sitting on it for 2 weeks they classified it as "not important" and declined to pay a bounty, so I won't make this mistake again. To Microsoft's credit, they have still not fixed it, which is consistent with considering it not important.

We are in some cases able to detect when bots are running on Kubernetes (thanks Feroz for the idea), and this also reveals some fingerprints that are unique to each Kubernetes cluster. This is a great signal because a.) hardly any real human users are browsing from inside Kubernetes, and b.) if 2 bots are running on the same Kubernetes cluster then it's a fair bet that they're operated by the same company. So far we have seen bots from 3 distinct Kubernetes clusters.
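The "same cluster, probably same operator" inference is just a grouping step. Here is a minimal sketch, assuming each session has already been reduced to a bot label and an opaque cluster fingerprint string (or None when there is no Kubernetes signal). How the fingerprint is derived is not shown, and all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_cluster(
    sessions: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Map each cluster fingerprint to the distinct bots seen from it."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for bot, fingerprint in sessions:
        if fingerprint is None:  # no Kubernetes signal for this session
            continue
        if bot not in clusters[fingerprint]:
            clusters[fingerprint].append(bot)
    return dict(clusters)


# Hypothetical sessions: (bot label, cluster fingerprint or None).
sessions = [
    ("bot-a", "cluster-1"),
    ("bot-b", "cluster-1"),
    ("bot-c", "cluster-2"),
    ("bot-d", None),
    ("bot-a", "cluster-1"),
]

clusters = group_by_cluster(sessions)
# Clusters hosting more than one bot suggest a shared operator.
shared = {fp: bots for fp, bots in clusters.items() if len(bots) > 1}
print(shared)  # {'cluster-1': ['bot-a', 'bot-b']}
```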

VirusTotal

We've been surprised by how few threat intelligence vendors are running their own fetching. There are 94 vendors listed on VirusTotal, but fewer than 50 genuinely distinct bots fetch our collector pages, so at most a bit over half of those vendors (50 of 94 is about 53%) are actually fetching the sites themselves. The others may outsource their fetching to a common third party, or else they are simply consulting other threat intelligence vendors and not even doing classification themselves.

If you looked at enough VirusTotal results pages you could probably work out which vendors always share the same classification. Maybe we should do that.
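As a rough idea of what that analysis could look like, here is a minimal sketch. It assumes the results pages have already been collected into a plain mapping of URL to per-vendor verdicts. The vendor names, URLs, and verdict strings are made up for illustration, and nothing here calls VirusTotal itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def identical_verdict_groups(results: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group vendors whose verdict is the same on every URL in `results`.

    `results` maps URL -> {vendor: verdict}. A vendor missing from a URL
    is recorded as "absent", so gaps have to match as well.
    """
    urls = sorted(results)
    vendors = sorted({v for verdicts in results.values() for v in verdicts})
    by_signature: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for vendor in vendors:
        signature = tuple(results[url].get(vendor, "absent") for url in urls)
        by_signature[signature].append(vendor)
    # Only groups of two or more vendors are interesting.
    return sorted(group for group in by_signature.values() if len(group) > 1)


# Hypothetical data standing in for scraped results pages.
results = {
    "site-1.example": {"VendorA": "phishing", "VendorB": "phishing", "VendorC": "clean"},
    "site-2.example": {"VendorA": "clean", "VendorB": "clean", "VendorC": "clean"},
    "site-3.example": {"VendorA": "malicious", "VendorB": "malicious", "VendorC": "phishing"},
}

print(identical_verdict_groups(results))  # [['VendorA', 'VendorB']]
```

With only a few URLs, identical verdicts can be a coincidence, which is why "enough" results pages matters: each extra URL where two vendors agree makes a shared upstream source more plausible.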

One of our domains is now blocked on VirusTotal by 7 different vendors.

This is kind of a poor show. You can't classify a site as phishing just because it has "bank" in the domain and the page has a login form.

The litmus test for whether a site is phishing is whether you can name the site it is impersonating, and our collector site doesn't impersonate any real site.
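To make the difference concrete, here is a small sketch contrasting the two rules. The "naive" rule is a caricature of the heuristic objected to above, not any vendor's known logic. The `Page` fields and the example domain are hypothetical, and working out `impersonated_site` is the hard part that is deliberately left as an input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    domain: str
    has_login_form: bool
    # Name of the real site being imitated, or None if none can be named.
    impersonated_site: Optional[str]


def naive_is_phishing(page: Page) -> bool:
    """Keyword rule: 'bank' in the domain plus a login form."""
    return "bank" in page.domain and page.has_login_form


def litmus_is_phishing(page: Page) -> bool:
    """Litmus test: can you name the site it is impersonating?"""
    return page.impersonated_site is not None


# Hypothetical stand-in for a collector page: a login form on a domain
# containing "bank", with branding that imitates no real site.
collector = Page("some-bank-login.example", True, None)

print(naive_is_phishing(collector))   # True
print(litmus_is_phishing(collector))  # False
```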

Vexatious takedowns

We received our first takedown notices last week. To be honest, I expected this to happen sooner. The whole project is running on "disposable" infrastructure so that if it gets taken down it won't impact any of our other projects. But it would still be very inconvenient to have it taken down.

The takedown notices were sent to our hosting provider, who forwarded them to us. It's possible they were also sent to our domain registrar, who did not forward them to us but also did not act on them.

Here's the text from the first one:

Hello, We have discovered a Phishing attack on your network. URL: hxxps[:]//REDACTED/ IP's: REDACTED Threat Type: Phishing Threat Description: Banking credential harvesting page detected at REDACTED . The page presents a fake bank login form with a header that references BotForensics Collector Page and botforensics .com, which indicates branding inconsistent with any legitimate bank . The site is hosted on REDACTED infrastructure (IP REDACTED) and registered recently on 2026-02-17 via REDACTED, with privacy-protected WHOIS data . The HTML shows a typical login card for username and password, a Sign In” [sic] button, and scripted UI enhancements, including external scripts and images, plus a dynamic header bar . This combination is characteristic of a phishing attempt intended to harvest user credentials . The domain age is only about 0 .01 years, and the presence of a login form on a brand-tampering page hosted on a known hosting provider strongly suggests malicious intent . Registrar abuse contact is abuse[@]REDACTED and hosting provider abuse contact is abuse[@]REDACTED . Because high confidence phishing has been detected, the page should be reported to abuse contacts and blocked; while there can be legitimate educational use of such content, the page as presented is designed to harvest credentials rather than serve legitimate banking functionality . Domain Registrar: REDACTED ASN: REDACTED This email was sent automatically by QuariShield Automated Analysis. Reports are sometimes verified using AI, while this means reports are mostly valid, there may be some false positives. For more info: REDACTED We are well aware that you may not be able to take abuse reports sent to this email address, therefore if you could forward this email to the correct team who can handle abuse reports, it would be much appreciated. Please note, replies to this email are logged, but aren't always seen, we don't usually monitor this email for replies. 
To contact us if you have any questions or concerns, please email support@quarishield.com stating your Issue ID REDACTED Kind regards, QuariShield Cyber Security.

(Redactions mine, but yes, the text really is all run together like that with no line breaks.)

A few highlights stand out:

"The page presents a fake bank login form with a header that references BotForensics Collector Page and botforensics .com, which indicates branding inconsistent with any legitimate bank ."

One would think that having branding "inconsistent with any legitimate bank" is evidence that you're not phishing? A phishing site would copy the bank's branding.

"The HTML shows a typical login card for username and password, a Sign In” button, and scripted UI enhancements, including external scripts and images, plus a dynamic header bar . This combination is characteristic of a phishing attempt intended to harvest user credentials"

Is it really?

"hosted on a known hosting provider"

What are the chances?

"This email was sent automatically by QuariShield Automated Analysis. Reports are sometimes verified using AI"

Very interesting. The takedown notices were sent by QuariShield. I emailed the QuariShield contact address and got a reply from the person operating it. He seems friendly and has whitelisted my collector page, which is helpful but, in my opinion, only part of the solution. How many other false-positive takedown notices is he going to send for other websites?

From what I have been able to gather, QuariShield grabs URLs from public sources, and uses an LLM agent to classify them and automatically send takedowns.

On the one hand, yeah, it's not working very well yet and has a lot of false positives. On the other hand, just look at how far we've come. If you're running a traditional takedown provider: this is what's coming for you. People are spinning up (presumed) vibe-coded projects that now do fully-automated takedowns for sites that aren't even paying customers.

Conclusion

Your anti-detection techniques may not be as effective as you think. Try our Instant Bot Test to see if we flag your bot (and please let us know how we did). And the lesson from QuariShield is: AI is coming for you.