CAPTCHAs are software scripts that are used to test the incoming traffic on digital platforms to distinguish between human and non-human traffic.
The common CAPTCHA tests require users to either tick a checkbox, type correct letters from a distorted text image, or identify a specific type of picture from a set of images displayed.
What is a CAPTCHA?
In recent times, the number of users accessing digital platforms – websites, apps, consoles, and mobile – has increased exponentially. This large influx of users is good news for digital businesses looking to expand their consumer base.
However, bad actors take advantage of this increase by trying to blend in with good users and gain access to business networks for numerous criminal activities. To achieve scale, attackers use bad bots and scripts that can execute attack commands in no time.
CAPTCHA, an acronym for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’, is a type of Human Interactive Proof (HIP). CAPTCHA tests assign tasks that humans can complete easily but computers struggle with. These tests try to identify bot-driven traffic on the assumption that bots cannot interact with the tests as a human would, especially in activities such as filling in online forms, writing comments, reviewing products, or participating in polls.
Over the years, CAPTCHA has undergone various iterations but advancements in bot technology have rendered CAPTCHAs nearly ineffective in fighting intelligent bots that can mimic human behavior.
Uses of CAPTCHA
Globally, organizations lose millions of dollars every year to cyberattacks. It is estimated that cybercrime costs¹ will reach $8 trillion in 2023 and $10.5 trillion by 2025. To protect their businesses and consumers from bots exploiting their digital platforms, organizations are deploying CAPTCHAs. CAPTCHAs present tests to incoming users to ascertain whether the user is a human or a bot.
Some of the common uses of CAPTCHAs include:
- Maintain poll accuracy: CAPTCHAs prevent bots from voting repeatedly in a poll, which would skew the results.
- Filter out noise: Websites that allow users to post comments use CAPTCHAs to prevent bots from bombarding the comment section with spam and noise.
- Fight new account fraud: At the account creation stage, CAPTCHAs can be used to prevent bots from generating fake accounts.
- Fight ticket scalping: CAPTCHA tests are used to stop scalper bots from hoarding tickets to big events. Often, attackers use bots to purchase many tickets for big events within a few seconds and then sell them at a premium.
Types of CAPTCHAs
There are several types of CAPTCHAs that are used to engage users on digital platforms. The most common form of CAPTCHA is a distorted image of letters and numbers, which a user must correctly identify and type into a text box.
However, based on the manner in which users interact with CAPTCHA tests, they can be broadly classified as:
- Text-based: Users must identify the letters and digits in a distorted image and enter them correctly into a text box on their device screens. If the user is unable to decipher the image, they can request an alternate one.
- Image-based: Here users are presented with a grid of multiple pictures, based on a specific theme, and users must identify the related pictures. These visual challenges are easier than the text-based challenges.
- Audio-based: Since visually impaired users found it difficult to engage with text- and image-based CAPTCHAs, audio-based codes were developed. These CAPTCHAs are used along with text-based codes.
- Math-based: Basic tests based on mathematical calculations—addition, subtraction, multiplication, and division—are presented to the users. They must quickly do the math and enter the correct answer.
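As an illustration of the math-based type, the following is a minimal sketch, not a production implementation. The function names and the operator table are assumptions made for this example, and division is left out only to keep the answers as whole numbers.

```python
import operator
import random

# Hypothetical operator table; division is omitted so answers stay whole numbers.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_math_challenge(rng=None):
    """Return (question, answer) for a small arithmetic challenge."""
    if rng is None:
        rng = random.Random()
    symbol = rng.choice(sorted(OPERATIONS))
    left, right = rng.randint(1, 9), rng.randint(1, 9)
    if symbol == "-" and right > left:
        left, right = right, left  # keep the result non-negative
    return f"What is {left} {symbol} {right}?", OPERATIONS[symbol](left, right)


def check_math_answer(expected, submitted):
    """Compare the user's typed answer with the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False  # non-numeric input never passes
```

A challenge this simple is easy for a script to parse, so in practice the question would likely be rendered as a distorted image rather than sent as plain text.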
How is CAPTCHA generated?
A CAPTCHA test usually has two parts: the first is a text, image, audio, or math question, and the second is a text box where the user types the answer.
While building a CAPTCHA test, programmers use distorted forms of letters and numbers. These characters may be distorted in many ways, such as stretching them, breaking their shapes, using different colors, or adding dots. This is done to make it hard for bots to solve the CAPTCHA, as bots can be trained to scan images and recognize the regular shapes of letters and digits.
Some CAPTCHA challenges are created from random strings of letters and numbers. This randomization prevents users from getting the same series of tests twice and makes brute forcing difficult. Programmers also create CAPTCHAs that involve pattern recognition, such as presenting a series of shapes and challenging the user to identify the next shape in the pattern.
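The randomization step can be sketched as follows. This is a minimal example under stated assumptions: the alphabet (uppercase letters and digits with the look-alike characters O, 0, I, and 1 removed), the default length of six, and the function names are all choices made for illustration. Rendering the text as a distorted image is a separate step that is not shown.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus easily confused characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def random_challenge_text(length=6):
    """Pick each character independently from a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def blind_guess_probability(length=6):
    """Chance that a single blind guess matches a challenge of the given length."""
    return 1 / len(ALPHABET) ** length
```

With this 32-character alphabet and six characters per challenge, there are 32**6 = 1,073,741,824 possible strings, which is why blind guessing is impractical as long as each challenge can be attempted only a few times.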
Techniques used to create text-based CAPTCHAs
While creating CAPTCHA tests, the aim is to ensure humans find these tests easy to solve – with a success rate of at least 80%. On the other hand, bots must have a success rate of 0.01% or less. Further, metadata (information that machines can read but not humans) is excluded from CAPTCHA tests as it can result in bots breaking the CAPTCHA rather easily.
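The two targets above can be expressed as a simple check on measured solve rates. This is only a sketch: the function name and the idea of counting attempts from a test run are assumptions made for this example.

```python
HUMAN_MIN_SUCCESS = 0.80   # at least 80% of human attempts should succeed
BOT_MAX_SUCCESS = 0.0001   # 0.01% or less of bot attempts should succeed


def meets_design_targets(human_solved, human_attempts, bot_solved, bot_attempts):
    """Return True when both measured solve rates fall within the targets."""
    human_rate = human_solved / human_attempts
    bot_rate = bot_solved / bot_attempts
    return human_rate >= HUMAN_MIN_SUCCESS and bot_rate <= BOT_MAX_SUCCESS
```

For example, 850 human solves out of 1,000 attempts passes the first target, but 5 bot solves out of 10,000 attempts (0.05%) would fail the second.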
To create text-based CAPTCHAs, several techniques are used. Some of the commonly used techniques include:
- Gimpy: a random number of words are chosen and presented in a distorted manner
- EZ-Gimpy: only one word is used
- Gimpy-r: background noise is added to the selected random letters that have been distorted
- Simard’s HIP: selected letters and numbers are distorted with the use of arcs and colors
How do CAPTCHAs work?
At the most basic level, CAPTCHAs require users to identify characters from a distorted image to prove they are humans. If the user clears the challenge, they are allowed to further interact with the platform, else they are intercepted for further investigation.
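The pass-or-intercept flow described above might look like the following on the server side. This is a hedged sketch rather than how any particular CAPTCHA product works: the in-memory store, the token format, and the function names are assumptions made for illustration.

```python
import hmac
import secrets

# In-memory store for illustration only; a real deployment would need
# shared storage with expiry.
_pending = {}


def issue_challenge(expected_answer):
    """Register a challenge and return an opaque token for the client to send back."""
    token = secrets.token_urlsafe(16)
    _pending[token] = expected_answer
    return token


def verify_response(token, submitted):
    """Return True to let the user proceed, False to intercept them."""
    expected = _pending.pop(token, None)  # single use: a token cannot be replayed
    if expected is None:
        return False
    # Case-insensitive, constant-time comparison of the typed answer.
    return hmac.compare_digest(
        expected.lower().encode(), submitted.strip().lower().encode()
    )
```

Removing the token on the first attempt means a wrong answer forces a fresh challenge, which limits how many guesses a script gets per challenge.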
CAPTCHAs are simple challenges that are used to prevent bots from interacting with a digital platform. These tests present information that the user must process in order to provide the correct answer. This information may be in the form of distorted text, an image, an audio clip, or a simple mathematical calculation.
The basic premise around which CAPTCHAs work is the difference in the manner in which humans and bots process the information presented to them. For instance, unlike humans, who can make sense of unfamiliar information, bots are programmed scripts that can only process the information they have been trained to interact with. Therefore, when they encounter a piece of information they have not ‘seen’ before, they fail.
That said, in recent times, bots have been trained using advanced technologies which makes them capable enough to solve traditional CAPTCHAs with ease. As a result, CAPTCHAs underwent a series of iterations (the reCAPTCHA versions) to make it more difficult for bots to break them. However, continuous developments in bot technology have resulted in intelligent bots that can even mimic human behavior and interact with defense mechanisms in much the same way as humans would do.
Despite several iterations, CAPTCHAs have failed to keep pace with the intelligent bot revolution. As a result, bots are able to bypass most CAPTCHAs rather easily and continue to power several types of bot-driven attacks, namely: credential stuffing, account takeover, fake new account registration, web scraping, token cracking, IRSF fraud, and so forth.
What is reCAPTCHA?
reCAPTCHA is the name given to the successive iterations of CAPTCHA challenges offered by Google. These revised versions include reCAPTCHA v1, reCAPTCHA v2, and invisible reCAPTCHA. These reCAPTCHA versions usually have longer and repeated challenges and also use cookies to ascertain whether a user is a human or a bot.
Some of the ways these versions assess incoming users are as described below:
- Image recognition: These reCAPTCHAs use a directory of images that are labeled category-wise. Users must identify similar pictures or those that are different from a grid of pictures according to the instruction in the challenge
- Single checkbox: Users must click a checkbox on the web page to prove they are not bots. Depending on their interactions, users may be allowed to proceed without a further challenge (No CAPTCHA); otherwise, they must solve a challenge.
- General user behavior: Users face no challenge up front. Instead, depending on their online interactions, they are either allowed to proceed or asked to solve a challenge for authentication.
When does a CAPTCHA come into play?
Digital businesses are obliged to protect their genuine users from bad actors. Therefore, many digital platforms automatically present CAPTCHAs as a proactive measure to protect their digital assets from bot-driven attacks.
In other instances, a CAPTCHA is triggered when an IP address that a hacker previously used is detected. A CAPTCHA may also come into play when a user is found using a proxy on top of a regular internet connection, or when a user’s device appears to be infected and is being used for attacks on websites. The most common trigger, however, is when a user behaves in a suspicious manner similar to that of a bot, such as clicking too many hyperlinks too quickly. Other triggers may include a user’s:
- browsing pattern
- key strokes
- mouse clicks
- time spent on a webpage
- time between mouse clicks
- time taken to complete a task (say filling out a form on the website)
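Two of the timing triggers listed above, the time between mouse clicks and the time taken to complete a form, can be sketched as a simple heuristic. The thresholds, the `SessionSignals` type, and the function name are hypothetical and chosen only for illustration; real systems would combine many more signals.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SessionSignals:
    click_times: list         # timestamps of mouse clicks, in seconds
    form_fill_seconds: float  # time taken to complete the form


# Hypothetical thresholds, not taken from any real product.
MIN_AVG_CLICK_GAP = 0.15
MIN_FORM_FILL_SECONDS = 2.0


def should_challenge(signals):
    """Return True when the timing signals look more like a script than a person."""
    gaps = [b - a for a, b in zip(signals.click_times, signals.click_times[1:])]
    too_fast_clicks = bool(gaps) and mean(gaps) < MIN_AVG_CLICK_GAP
    too_fast_form = signals.form_fill_seconds < MIN_FORM_FILL_SECONDS
    return too_fast_clicks or too_fast_form
```

A session with clicks 50 milliseconds apart and a form completed in under half a second would be challenged, while a session with clicks more than a second apart and a 24-second form fill would not.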
Can CAPTCHAs effectively stop malicious bots?
There are several free and nearly free automatic solvers that can beat CAPTCHAs in no time. Using optical character recognition, bots can quickly clear undistorted text CAPTCHAs.
Further, the rise of intelligent bots has rendered traditional CAPTCHAs ineffective. CAPTCHAs are no match for today’s advanced bots that can mimic human behavior and pass on the attack baton to human attackers when confronted with superior anti-bot defenses, such as Arkose Labs. Attackers use a combination of bots and human fraud farms to scale up their attacks by circumventing defenses.
Drawbacks of CAPTCHA
Although CAPTCHAs were created with the aim of preventing bots from abusing digital platforms, they have become ineffective in their goal due to bots acquiring advanced capabilities.
Today, CAPTCHAs are no longer a fool-proof defense mechanism as they have failed to keep pace with the advancements in bot technology. They can be bypassed easily and may even add to the overall security costs. Businesses can no longer rely on them for the level of protection needed against bots.
Some of the major drawbacks of CAPTCHAs are:
- Limits spam, but is unable to prevent it completely: With bots becoming smarter and acquiring human-like skills, CAPTCHAs at best can only limit basic bots. This means most of the bots can bypass these challenges easily.
- Bad user experience: CAPTCHAs introduce unnecessary friction for genuine users that causes frustration and degrades the user experience.
- Hard for people; easy for bots: The solve rates for bots are increasing, whereas more and more people find it hard to solve CAPTCHAs, which can lead them to abandon the interaction altogether.
- Most of the CAPTCHAs are visual-based: This makes it difficult for visually impaired people to attempt them. Although audio CAPTCHAs were introduced as a workaround, most of the audio CAPTCHAs are difficult to understand.
- Blocks good traffic, companies lose money: With bots easily solving CAPTCHAs, they are allowed further access whereas good users are blocked or intercepted. Blocking good traffic can result in companies losing customers and money.
- Limited accessibility: Not all CAPTCHAs are accessible to all users, especially those with visual disabilities or using screen readers.
Even with the upgraded version of CAPTCHA – reCAPTCHA v2 – once the risk associated with an incoming user is assessed, security teams need to choose one of the following four actions:
- Challenge a user to test if they are human
- Hard block a user
- Enforce multi-factor authentication (MFA)
- Allow access to the requested resource
Challenging the user with a reCAPTCHA is of no use, as bots can easily defeat these same old challenges. Hard blocking a user or using MFA introduces unnecessary friction and MFA is an expensive proposition, while granting access may allow attackers to sneak in.
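The choice among these four actions can be pictured as a mapping from an assessed risk level to a response. The sketch below assumes a risk score between 0.0 and 1.0 and uses made-up cut-off values; neither the score scale nor the thresholds come from reCAPTCHA or any other product.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow access to the requested resource"
    CHALLENGE = "challenge the user"
    MFA = "enforce multi-factor authentication"
    BLOCK = "hard block the user"


def choose_action(risk_score):
    """Map a risk score in [0.0, 1.0] to one of the four actions.

    The cut-off values are hypothetical and would need tuning per platform.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if risk_score < 0.3:
        return Action.ALLOW
    if risk_score < 0.6:
        return Action.CHALLENGE
    if risk_score < 0.85:
        return Action.MFA
    return Action.BLOCK
```

Wherever the cut-offs are placed, the trade-off described above remains: lower thresholds add friction for good users, while higher ones let more attackers through.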
CAPTCHAs are no match for Arkose Labs
Today’s modern businesses cannot afford disruption to their operations and user experience due to bot-driven attacks. They need a robust bot detection solution that can deter bots for good and ensure long-term protection.
Arkose Labs deters all bot-driven attacks, reduces false positives and false negatives, without damaging user experience. Using Arkose Matchkey, a repository of thousands of context-based 3D challenges combined with Email Intelligence for bot mitigation, Arkose Labs accurately identifies bots and deters them permanently.
While good users find the challenges easy and fun to solve, bots and automated scripts fail instantly. This is because the challenges are resilient to automated solvers and, because they are context-based, no amount of training can help machines clear them at scale. Even fraud farms, click farms, and human sweatshops cannot clear these challenges at scale because the feedback loop between risk assessment and enforcement mechanism prevents them from doing so.
Arkose Matchkey challenges are accessible to all user demographics, including, but not limited to, users with visual, auditory, motor, and cognitive accessibility needs. They are compliant with Web Content Accessibility Guidelines (WCAG), making them the best bot-deterrent for businesses across geographical locations.
Unlike CAPTCHAs and their revised versions, which provide no visibility beyond a limited set of signals, Arkose Labs shares actionable insights. Combined with risk scores and other digital intelligence, these insights enable security teams to make data-backed, risk-based decisions and permanently stop bot attacks while maintaining a great user experience.