If you have ever filled out a form online, you have probably encountered a CAPTCHA. They come in many shapes and sizes. I am going to detail the methods I’ve used to block bots from taking advantage of forms, along with their pros and cons. If you use a technique that’s not listed below, leave a comment; the more options the better. Validating humanness should not be hard, so why do some people make it that way?
reCAPTCHA
Solve a CAPTCHA and help digitize books. reCAPTCHA is the human extension of a project designed to digitize old textbooks. While scanning, there are often words that the OCR scanner cannot recognize. These words are pumped into a central repository, from which they are randomly displayed on screen for a human to decipher. Each reCAPTCHA has two words, one known and one unknown. If you get the known word right, it is assumed you got the unknown word right as well. If at least two people agree on the unknown word, it is marked as deciphered. A simple concept, and for a great cause.
The biggest complaint about reCAPTCHA, or CAPTCHAs in general, is the complexity of the images produced, which are sometimes almost unreadable by humans (especially if you have used ticketmaster.com). At least reCAPTCHA lets you refresh to grab a new pair of words or even have the words read aloud. If you are forced to use a CAPTCHA, reCAPTCHA is your best option in my opinion.
The biggest issue in my book with reCAPTCHA is that JavaScript is required to show the image. Most bots don’t use JavaScript, but neither do some legitimate users: not only people with disabilities but also people on mobile devices.
Simple Questions
What is 3+2? If you can answer a simple math problem you must be human. This method relies more on someone comprehending a sentence and placing the right string in the form field. It can range from simple math to just repeating a word.
It’s great because the only programming needed is to validate that a field was submitted with specific information. The downside is when someone cannot answer the question. Making questions too difficult or obscure can result in frustrated users.
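To show how little validation this takes, here is a minimal sketch in Python. The question bank, the accepted answers, and the function name are all made up for illustration; any server-side language would do the same job.

```python
# Hypothetical question bank: each prompt maps to the answers we accept.
QUESTIONS = {
    "What is 3+2?": {"5", "five"},
    "Type the word 'human'": {"human"},
}


def is_correct_answer(question: str, submitted: str) -> bool:
    """Return True if the submitted text matches an accepted answer."""
    accepted = QUESTIONS.get(question, set())
    # Be forgiving about case and stray spaces so real people are not rejected.
    return submitted.strip().lower() in accepted
```

Accepting both "5" and "five" is a small nod to the frustration problem: the fewer ways a real person can get it wrong, the better.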
Hidden Field
Bots rarely render forms in a browser, so hiding a field with CSS (or JavaScript) allows you to determine whether the form was submitted without being looked at. Only someone in a browser would not see that field, so if it has content, it’s likely filled in by a bot and you can ignore the submission.
Bots often fill fields with junk or things that look like text, and they try to match the "name" attributes with almost valid data, so "first_name" might get an actual name. But they cannot resist a good type="text" field, since it gives them a great deal of space to inject HTML or BB code. Hidden fields work great when they are type="text". The downside of this approach is that someone viewing your page with CSS turned off might be confused to see a field with no label just sitting there.
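The server-side half of the hidden field check can be sketched like this. The field name "website" is a hypothetical choice (something tempting for a bot), and the form is assumed to arrive as a plain dictionary of submitted values.

```python
# Hypothetical name of the field that is hidden from humans with CSS.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict) -> bool:
    """A human never sees the hidden field, so any content in it is suspicious."""
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())
```

A missing or empty field counts as human here, so a browser that leaves the hidden field blank passes through untouched.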
Check Referrer
Did the form submission come from your site? If not, do you care about its contents? After a form is submitted, checking where it was submitted from is as easy as $_SERVER['HTTP_REFERER'] with PHP and Request.ServerVariables("HTTP_REFERER") with ASP (I think; I despise ASP).
Bots will often pull down your form and submit it remotely, which creates a blank referrer. No referrer often means the form was not submitted through your site. The plus side of this method is that the user does not have to do anything extra; the downside is that you have to be careful when throwing away data, because it might be legitimate.
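For anyone not on PHP or ASP, the same check might look roughly like this in Python. The host name is a placeholder, and I am assuming your framework hands you the raw Referer header as a string (or nothing at all when it is missing).

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the host that serves the form. Replace with your own.
SITE_HOST = "example.com"


def came_from_our_site(referer: Optional[str]) -> bool:
    """Return True only if the Referer header points at our own host."""
    if not referer:
        # Blank referrer: possibly a bot, possibly a privacy-minded human.
        return False
    return urlparse(referer).hostname == SITE_HOST
```

Since a blank referrer is not proof of a bot, it is probably safer to flag these submissions for review than to delete them outright.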
Confirmation Page
Is this information correct? A confirmation page is probably one of the best ways to supply a form to a user and verify that they are human while still making them feel comfortable with the information they submitted. Bots almost certainly don’t analyze the “Thank You” page, so if your form goes to a confirm page where the user has to click another button to confirm and submit their information, it will more than likely not be compromised by a bot. The only downside to this method is that the user has to click a Submit button and then a Confirm button, so there is a chance you lose someone in the middle.
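One way to make sure the Confirm click really followed a visit to the confirm page is to put a signed token on that page and check it on the second submit. This is my own assumption about how to wire it up, not the only way; the secret key and function names below are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a secret known only to the server. Use a real random value.
SECRET_KEY = b"change-me"


def make_confirm_token(form: dict) -> str:
    """Sign the submitted data; the token is embedded in the confirm page."""
    payload = json.dumps(form, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def is_confirmed(form: dict, token: str) -> bool:
    """Accept the final submit only if the token matches the data."""
    expected = make_confirm_token(form)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

A bot that posts straight to the final step never received a token, so its submission fails the check. A bot that actually reads the confirm page would still get through, which is why this is a speed bump and not a wall.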
Just Deal With It
You could also do what most people do and just deal with it. Make sure your data has to go through an approval process before being posted publicly or sent via email. This ensures a human verifies that the information on the form was submitted by a human. Too bad the most effective option is also the most time-consuming; there is no better way to spot a human than with a human (currently).
Some forms, like logins, don’t interrupt your daily routine when they are hit by bots, but a full inbox all the time can really get on your nerves. None of the methods above are foolproof, so some spam may still get through, but any one of them is better than nothing. Don’t leave an open form for bots to clog the internet, inboxes and databases with.
Let’s not forget about accessibility
While all these options are fine and dandy, the first two require extra work on your user’s end. That should raise a red flag to make sure they are accessible. Opting for a method that does not require any extra work on the user’s end is best but can lead to a higher spam rate. Weigh your options and determine which method is best for you; each form is unique and one solution cannot work for all situations.
Main image from XKCD.
The content of this post is licensed: ©2009 All Rights Reserved















