Duolingo, reCAPTCHA, and a magnificent piece of crowdsourcing

Luis von Ahn is a computer science professor at Carnegie Mellon University, but he is perhaps best known as the creator of the free language-learning app Duolingo, Apple’s 2013 iPhone App of the Year.

Interestingly, von Ahn was also part of the team that created CAPTCHA:

In the early years of his Ph.D. study, von Ahn had helped his advisor, CMU computer science professor Manuel Blum, develop a handy identity verification device known as a CAPTCHA. Think of those distorted words you’re asked to translate after attempting to log into your email too many times to verify that you’re human. Those are CAPTCHAs. Initially invented to help keep spambots out of chat rooms, these tests are effective because computers have a difficult time reading distorted text, while people are rather good at it.

What Von Ahn did next was a real stroke of genius:

Von Ahn watched the work on CAPTCHA and decided it had potential beyond distinguishing humans from robots — the extra 10 seconds people were taking to access their email and other accounts could be put to use. In 2006, von Ahn launched reCAPTCHA. Unlike its predecessor, reCAPTCHA challenged users with two distorted words to decode.

You’ve certainly run into a CAPTCHA that featured two words instead of one. That’s a reCAPTCHA, and here’s why:

The brilliant twist is that this test isn’t just verifying your humanity; it’s also putting you to work on decoding a word that a computer can’t. The first word in a reCAPTCHA is an automated test generated by the system, but the second usually comes from an old book or newspaper article that a computer scanner is trying (and failing) to digitize. If the person answering the reCAPTCHA gets the first word correct (which the computer knows the answer to), then the system assumes the second word has been translated accurately as well.

So the whole purpose of that second word is not to keep you out, but to have you help confirm that a word scanned by the system is, indeed, what the system thinks it is. A gigantic, free crowdsourcing effort.
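For the curious, the two-word trick can be sketched in a few lines of Python. This is only a toy illustration of the idea as the article describes it, not reCAPTCHA's actual code: the class names, the `scan-001` identifier, and especially the `AGREEMENT_THRESHOLD` (waiting for several matching answers before trusting a transcription) are my own assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical: how many matching answers we want before trusting a transcription.
AGREEMENT_THRESHOLD = 3


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


class TwoWordChallenge:
    """One control word (answer known) paired with one scanned word (answer unknown)."""

    def __init__(self, control_answer, unknown_word_id):
        self.control_answer = control_answer
        self.unknown_word_id = unknown_word_id


class TranscriptionPool:
    """Collects human guesses for scanned words the system could not read."""

    def __init__(self):
        # unknown_word_id -> Counter of normalized guesses
        self.votes = defaultdict(Counter)

    def submit(self, challenge, control_guess, unknown_guess):
        """Return True if the user passes the human check.

        The guess for the unknown word is only recorded when the
        control word was typed correctly.
        """
        if normalize(control_guess) != normalize(challenge.control_answer):
            return False
        self.votes[challenge.unknown_word_id][normalize(unknown_guess)] += 1
        return True

    def accepted(self, unknown_word_id):
        """Return the agreed transcription, or None if there is no consensus yet."""
        counts = self.votes.get(unknown_word_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= AGREEMENT_THRESHOLD else None


pool = TranscriptionPool()
challenge = TwoWordChallenge(control_answer="morning", unknown_word_id="scan-001")

pool.submit(challenge, "morning", "overlooks")   # passes, vote recorded
pool.submit(challenge, "mourning", "overbooks")  # fails the control word, vote ignored
pool.submit(challenge, "Morning", "overlooks")   # passes, vote recorded
pool.submit(challenge, "morning", "Overlooks")   # passes, vote recorded
print(pool.accepted("scan-001"))
```

The point to notice is that the user is only ever graded on the control word; the answer for the scanned word is simply collected, and it is trusted because the same person just proved they can read distorted text.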

In 2009, Google acquired reCAPTCHA for an undisclosed amount (von Ahn says the sum was somewhere between $10 million and $100 million) and put the program to work on a tremendous scale, digitizing material for Google Books and the New York Times archives. In 2012, it was translating about 150 million distorted words a day.

“The CAPTCHA was really my idea,” says Blum. “Getting humans involved and getting them to help do this stuff was Luis’s idea. He was the one that pointed out, ‘Look how many hours have gone into building the Panama Canal or the Pyramids — and with all the people that are on the Web now, you can get a lot more hours.'”

Genius. Great article.