Tuesday, April 08, 2008

reCAPTCHA: Stop Spam, Read Books

I just discovered this site, http://recaptcha.net/. They do the usual "image verification" you see on web sites to prevent spam, except the images are scans from real books. They present two words: one whose correct reading is already known, and a second that the OCR software read with low confidence. If you type the known word correctly, they assume you're probably right about the unknown word, too (and they verify it multiple times to improve confidence). I think it's a great idea: as a side effect of preventing spam, they're digitizing old books. Cool.
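
To make the two-word trick concrete, here's a rough Python sketch of how I imagine the check could work. This is my guess at the logic, not reCAPTCHA's actual code or API: the function names, the `VOTES_NEEDED` threshold, and the case-insensitive comparison are all assumptions of mine.

```python
from collections import Counter, defaultdict

# Assumed threshold: how many matching answers we want before trusting a reading.
# The real service's number (and its exact rules) are not something I know.
VOTES_NEEDED = 3

# Hypothetical store: unknown-word id -> tally of readings typed by users
# who got the known (control) word right.
votes = defaultdict(Counter)


def check_answer(known_word, unknown_id, typed_known, typed_unknown):
    """Return True if the user passes the challenge.

    Only the known word decides pass/fail. If the user passes, their
    reading of the unknown word is recorded as one vote.
    """
    if typed_known.strip().lower() != known_word.lower():
        return False  # failed the control word, so ignore the other answer
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True


def accepted_reading(unknown_id):
    """Return the most common reading once it has enough votes, else None."""
    tally = votes[unknown_id]
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

Calling `check_answer("morning", "scan-17", "morning", "overlooks")` would let the user through and count one vote for "overlooks"; once enough different users agree, `accepted_reading("scan-17")` returns that word as the digitized text.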
