Saturday, October 27, 2012

The Spam Zombies and the Captcha Apocalypse

Spam zombies

Most captcha solutions rely on text made hard to read (or to hear) by adding some noise to it. Most of the time, this text is randomly generated and serves only one purpose: stopping spam. Too many people all over the world are solving captchas at any given moment for us to let their brains process useless data.

What can we do to solve that underutilization of brain capacity?

Spam is no ham

Not too salty spam
A captcha is a test used to discriminate between people and computers (programs, bots and spambots alike). The word actually stands for Completely Automated Public Turing test to tell Computers and Humans Apart. To live up to that name, many captcha systems have been developed, using scrambled text, unsorted pictures, mathematical questions and other niceties.

Captcha advertising

Captcha type-in by Solve Media
Yes, they can! No human brain should be left deprived of good propaganda; only the bots get to surf free of ads. Web surfers are more and more ad-blind, so marketers keep creating new banner formats and new ways of blending their ads into the actual content. The captcha process comes in handy for them: users who usually skim the text they are offered on the web are now brought to a halt and need to focus on a task. What better time and place to show them an ad? The result is message recall 12 times better than with a banner ad.


We need brains
Well, as you may know, about 8% of the male population is color deficient (whereas only 0.5% of women are affected), and nearly one in seven people has dyslexia, which means trouble reading or spelling. Captchas affect the whole spectrum of human disabilities, from learning disabilities (dyslexia, dyscalculia, ageometria…) to physical ones (mobility, visual or hearing impairment…).
So captchas should offer several ways to interact with them and let the actual user choose among them, whether that means recognizing written text or an audio snippet, solving a simple mathematical equation, or picking out a shape or a picture…

Captchas with a meaning

Some free captcha solutions put the solving effort to work on a useful task instead of having users solve puzzles for the sake of it.

reCaptcha is one of them. It helps Google digitize books, newspapers and radio shows. Some numbers to give an idea of the scale: reCaptcha means 200 million captchas solved each day, or 100 million digitized words per day, or 2.5 million books per year.
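To see how those three figures relate to each other, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above; the derived ratios (solutions per word, words per book) are implied by those figures, not stated by reCaptcha, and the 365-day year is an assumption.

```python
# Figures quoted in the post
CAPTCHAS_PER_DAY = 200_000_000
WORDS_PER_DAY = 100_000_000
BOOKS_PER_YEAR = 2_500_000

# Assumption: the service runs every day of a 365-day year
DAYS_PER_YEAR = 365

# How many solved captchas it takes to digitize one word
solves_per_word = CAPTCHAS_PER_DAY / WORDS_PER_DAY

# How many words are digitized in a year
words_per_year = WORDS_PER_DAY * DAYS_PER_YEAR

# Average book length implied by the quoted figures
words_per_book = words_per_year / BOOKS_PER_YEAR

print(f"Solved captchas per digitized word: {solves_per_word:.1f}")
print(f"Digitized words per year: {words_per_year:,}")
print(f"Implied words per book: {words_per_book:,.0f}")
```

The quoted figures are obviously rounded, so the derived values are orders of magnitude at best.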

Asirra helps tag pictures of cats and dogs. Ain't it cool? Well, at least Animal Species Image Recognition for Restricting Access (yes, it's an acronym) classifies pictures with accuracy, but it goes a bit further than that. These pictures are provided by PetFinder, a site dedicated to finding new homes for homeless pets, so the tagging actually helps kittens and puppies find a new home and gives the site more visibility.
To learn more, see the research article Asirra: a Captcha that exploits interest-aligned manual image categorization (PDF).

Civil Rights Captchas ask the user to take a stand on facts about human rights. A situation is described (for instance, “In Kosovo people are tortured in detention”) and users are asked how they feel about it (truly low, light or positive), with the feeling word slightly scrambled.
Available in English and Swedish, this service raises a deep question about where to draw the line: can we ask users to think a certain way in order to perform a specific task (creating an account for a web service or leaving a comment on a site)?

Consent to your Captcha master

As stated by Jonathan Lung in his series of posts, excerpted from his paper Ethical and Legal Considerations of reCAPTCHA, presented at the tenth annual conference on Privacy, Security and Trust (Paris, 2012):
Akin to Marx' notion of class consciousness, unless agents realize that they are being exploited, they cannot act rationally in their own best interests. Thus, if people are not informed how their solutions to reCAPTCHAs are being used, they are not in a position to give consent.
Your Captcha master
There is no freedom in solving a captcha from the user's perspective. Either they solve it and access the service behind that gate, or they don't, and their interaction is lost forever.
Likewise, if the users do not consent to the task performed by the captcha proposed to them, there is no way for them to bypass it.
Furthermore, crowdsourcing tasks through captchas raises questions about monopoly, labour law and tax: the true cost of the business is borne by the users performing captcha tasks (which are also marketable); some of those users may be children; the need to solve the captcha can be seen as coercion, or even slavery; and no tax is paid on that work (whereas Amazon Mechanical Turk tasks are taxable, being considered contract work). To learn more, see Part V (Legal) of Jonathan Lung's blog entries.

I am not Franz Otto Spamer!

Franz Otto Spamer
(He is!)

What is the point of putting all the burden on the very users we want to interact with? If spambots and other black hats want to automate some of the features you offer online, you should be able to handle it on the server side. This is where you can apply any heuristic you want to separate the tares from the wheat. It is a matter of both ergonomics and respect for your users.
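As an illustration of what such a server-side heuristic could look like, here is a minimal sketch. Everything in it is an assumption made for the example and not something prescribed by the post: the `Submission` fields, the hidden "honeypot" form field, the weights and the thresholds. A real site would tune or replace these rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical shape of a comment form submission."""
    body: str                 # the comment text
    honeypot: str             # hidden form field that humans never see or fill
    seconds_to_submit: float  # time between page load and form submission


LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def spam_score(sub: Submission) -> int:
    """Add up simple signals; the weights are arbitrary illustrative values."""
    score = 0
    if sub.honeypot.strip():
        score += 3  # only a bot fills in a field hidden from humans
    if sub.seconds_to_submit < 3.0:
        score += 2  # submitted faster than a person can type a comment
    if len(LINK_RE.findall(sub.body)) > 2:
        score += 2  # link-stuffed comment
    return score


def looks_like_spam(sub: Submission, threshold: int = 3) -> bool:
    """Flag the submission when its score reaches the (assumed) threshold."""
    return spam_score(sub) >= threshold


human = Submission("Nice article, thanks!", honeypot="", seconds_to_submit=42.0)
bot = Submission(
    "buy http://a.example http://b.example http://c.example",
    honeypot="x",
    seconds_to_submit=0.5,
)
print(looks_like_spam(human), looks_like_spam(bot))
```

None of these checks asks anything of the visitor: the filtering happens entirely after the form is submitted.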

Les Zombies du Spam et l'Apocalypse des Captchas (in French)
Os zumbis do Spam e o Apocalipse dos Captchas (in Portuguese)
Los Zombis del Spam y la Apocalipsis de los Captchas (in Spanish)


  1. I really enjoyed this intelligent CAPTCHA article. I was so pleased that someone has highlighted accessibility, which is constantly overlooked when it comes to CAPTCHAs. I am dyslexic and it takes me so long to decipher those wriggling CAPTCHAs that I do not comment on blogs that use them unless I feel it is something REALLY worth saying!
    I have started using a CAPTCHA bypass browser extension called Rumola, which fills them in for me. It seems to work perfectly!

    1. Rumola is quite an interesting take on captchas, as it crowdsources the captcha solving to humans in quite an invisible way. And as it's installed as a browser extension, you can easily forget the modern sweatshop behind it.

  2. Bitcoins for captchas, 2/11/13 1:06 PM

    Captchas are useless now, as crowdtasking them instead of using AI algorithms works pretty well. Amazon Turk, Crowdflower and other companies are paying microcents and even bitcoins to use our brains on them.

    1. This comment has been removed by the author.

    2. So captcha tasks are marketable, and companies are making a profit by giving microtasks to whoever's ready to perform them for tenths or hundredths of cents, and now even in virtual currencies. At least, these (micro)workers do give their consent.