It’s a dark and dangerous world for the innocent web form. With spam outnumbering legitimate e-mail by roughly nine to one, and legitimate web comments by something like 20 to one, an unprotected comment form doesn’t stand a chance of going unabused.

Automated spambots troll the web, filling out forms and spewing pablum about penny stocks, Viagra and the personal proclivities of Paris Hilton.

Habit #1: Leave No Trace

I’ve spent some time observing the behavior of spambots in their natural habitat (it’s a dank place that smells of old beans). A frequent strategy appears to be to have a human visitor go to the site, submit a form, and then script automated hits against that form. Spambots are not terribly bright, but they’re relentless. They submit again and again to the same URL, lobbing the fields and values for which they’ve been programmed.

Big Medium counters this by covering its tracks, never using the same field names twice. Every time you visit the page, all of the field names change. The field names are MD5 hashes of the page’s slug name, its database creation date and a server secret. A semi-obfuscated timestamp is mashed with this field name, creating a 50-digit field name that changes every second.

If the correct combination of field names is not received, the form submission is discarded.
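To make the idea concrete, here’s a minimal Python sketch of hashed, time-stamped field names. It is not Big Medium’s actual code: the function names, the "|" separator, the per-field label and the XOR-based timestamp masking are all assumptions made for illustration, and the result doesn’t try to reproduce the exact 50-digit layout.

```python
import hashlib


def base_field_name(slug: str, created: str, secret: str, label: str) -> str:
    """MD5 hex digest (32 chars) of slug, creation date, secret and a field label."""
    material = "|".join([slug, created, secret, label])
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def _mask(secret: str) -> int:
    # Hypothetical obfuscation: a 32-bit mask derived from the server secret.
    return int(hashlib.md5(secret.encode("utf-8")).hexdigest()[:8], 16)


def obfuscate_timestamp(ts: int, secret: str) -> str:
    """Hide a Unix timestamp by XOR-ing it with the mask; 10 hex chars."""
    return format(ts ^ _mask(secret), "010x")


def field_name(slug: str, created: str, secret: str, label: str, ts: int) -> str:
    """Hash plus obfuscated timestamp, so the name changes every second."""
    return base_field_name(slug, created, secret, label) + obfuscate_timestamp(ts, secret)


def parse_field_name(name: str, slug: str, created: str, secret: str, label: str):
    """Return the embedded timestamp if the name is valid for this field, else None."""
    digest, token = name[:32], name[32:]
    if digest != base_field_name(slug, created, secret, label):
        return None
    try:
        return int(token, 16) ^ _mask(secret)
    except ValueError:
        return None
```

A submission would pass this check only if every expected field parses to a timestamp; anything else gets dropped.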

What’s more, these field names are tied to that timestamp I mentioned. Even if once-valid fields are received, the form submission is discarded if the timestamp is more than 12 hours old. For forms older than 30 minutes, you’re prompted to re-submit. So the spambots’ visit-me-once strategy of programming for field names no longer works, at least not outside of the first 30 minutes.
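The age rules boil down to a three-way decision. Here’s a small sketch of it in Python; the names are hypothetical, and rejecting timestamps from the future is my own assumption rather than something described above.

```python
from enum import Enum

MAX_AGE_SECONDS = 12 * 60 * 60        # older than 12 hours: discard
RESUBMIT_AFTER_SECONDS = 30 * 60      # older than 30 minutes: ask to re-submit


class Verdict(Enum):
    ACCEPT = "accept"
    RESUBMIT = "resubmit"
    DISCARD = "discard"


def classify_by_age(form_ts: int, now: int) -> Verdict:
    """Decide what to do with a submission based on the form's timestamp."""
    age = now - form_ts
    # Assumption: a timestamp from the future is treated as forged.
    if age < 0 or age > MAX_AGE_SECONDS:
        return Verdict.DISCARD
    if age > RESUBMIT_AFTER_SECONDS:
        return Verdict.RESUBMIT
    return Verdict.ACCEPT
```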

It’s not super-difficult to defeat this technique by analyzing how the form changes from moment to moment, but it does require some custom coding, and as I’ll explain later, the incentives for doing so aren’t very high.

Alas, there are other spambots that don’t follow this strategy at all. These ’bots actually visit the page live and fill out all of the fields on the page, paying no particular heed to the field names. They require a different countermeasure…

Habit #2: Set Traps

These form-filling spambots typically fill out all form fields following simple rules based on field name and field type. These rules are similar, I assume, to the rules used by many browsers’ auto-fill features to automatically enter your name, email address, phone, etc.

Because of our coded 50-digit field names, these spambots can no longer use field names to detect which field is which. They have to go on field type alone. This gives us an opportunity to set traps by adding dummy fields to the form. These fields are hidden via CSS from web site visitors, but the typical ’bot is not CSS-aware, so it doesn’t know that these dummy fields are not meant to be filled.

If a value is submitted in any of these honey-pot fields, Big Medium discards the submission.

To make things a bit more challenging, Big Medium also mixes up the order of these dummy fields with each page build, which means that spambots can’t rely on a fixed order of real fields and dummy fields.
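Here’s a rough sketch of the trap logic in Python. The function names and the dict-shaped submission are assumptions for illustration; the point is just that the real fields keep their visible order, the traps land at random positions, and any non-blank trap value sinks the submission.

```python
import random


def interleave_traps(real_fields, trap_fields, rng):
    """Insert each trap field at a random position among the real fields.

    The real fields keep their relative order, so the visible form
    looks the same to a human visitor on every page build.
    """
    layout = list(real_fields)
    for trap in trap_fields:
        layout.insert(rng.randint(0, len(layout)), trap)
    return layout


def is_trapped(submission, trap_fields):
    """True if any honey-pot field came back with a non-blank value."""
    return any(submission.get(name, "").strip() for name in trap_fields)
```

At build time you might call interleave_traps(real, traps, random.Random()); on submission, is_trapped(form_values, traps) decides whether to discard.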

The main drawback here is a potential accessibility issue. For people using alternative browsers that do not support the CSS display:none rule, the hidden fields will not be hidden. Big Medium marks them as “Leave this field blank,” but of course, that’s not a great interface. Should our hapless visitor inadvertently enter a value in one of these fields, we’ve mistakenly bagged a real live human, not a spambot. Oops.

I’m a bit behind on my knowledge of screen readers and other alternative browsers, so I’m not sure whether or not this is actually a significant issue. Comments are welcome.

Habit #3: Embrace Indirection

Big Medium’s forms all have submission URLs that point at the content page itself. At the last moment, when the form is submitted, Big Medium swoops in via JavaScript and sets the form to point to the real submission URL. Likewise, the form’s submission method is changed from “GET” to “POST.”

Most spambots don’t grok JavaScript, which means that their submissions will go to the wrong page.

Here again, we have a potential accessibility issue, however. If a visitor has JavaScript disabled in their browser, they can’t submit the form. Big Medium adds a note to JavaScript-disabled pages saying that JavaScript is required to submit the form. That’s not ideal, but in the end, I think requiring JavaScript is an acceptable price of participation if it helps to keep the environment spam-free.

Habit #4: Find the Prey’s Lair

Many spambots never actually visit the page with the form on it. They just submit directly to the submission URL, using their programmed fields. So, another test is to check the source of the form submission. Call it the spambot’s lair.

Spambots typically fib about the page where they’re coming from (the “referrer” URL). These fake referrer URLs are sometimes the site’s homepage URL or some other value unrelated to the page where the form actually lives. Big Medium checks these referrer URLs and discards the form if it smells bad.

Trouble is, even legitimate browsers often provide unreliable referrers, sometimes providing none at all. If no referrer is provided, Big Medium lets the submission slide, so this isn’t a very strong test, but hey, every little bit helps.
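A lenient referrer test along these lines might look like the following Python sketch. The function name and the exact matching rule (same host and same path, ignoring scheme and trailing slash) are my assumptions; the only behavior taken from the description above is that a missing referrer gets a pass.

```python
from urllib.parse import urlsplit


def referrer_ok(referrer, page_url):
    """Accept a missing referrer; otherwise require it to match the form's page."""
    if not referrer:
        # Legitimate browsers often send no referrer at all, so let it slide.
        return True
    ref, page = urlsplit(referrer), urlsplit(page_url)
    same_host = ref.netloc.lower() == page.netloc.lower()
    same_path = ref.path.rstrip("/") == page.path.rstrip("/")
    return same_host and same_path
```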

Habit #5: Ask Good Questions

Every safari gentleman should be a master of conversation. Big Medium does its best by posing a challenge question in its hunt for spambots. This is a very simple CAPTCHA test to see if the form submitter is human. The challenge question and its expected response can be customized to any site managed by Big Medium. As of this writing, this blog’s challenge question is, “What color is fresh grass?” If spambots are custom-programmed to answer, “green,” I can just change the question, and the spammer will have to come back to the site to adjust.
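Checking the answer can be as simple as a forgiving string comparison. This Python sketch is an assumption about how such a check might work, not a description of Big Medium’s internals; it ignores case and stray whitespace so that a human who types " Green " still gets through.

```python
def normalize(answer):
    """Lower-case the answer and collapse runs of whitespace."""
    return " ".join(answer.casefold().split())


def passes_challenge(answer, expected):
    """Compare the visitor's answer with the site's configured response."""
    return normalize(answer) == normalize(expected)
```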

Half the battle, after all, is just making it hard to spam. Increasing the cost and difficulty of maintaining spambots for an individual site means that the spammer is more likely to move on to easier prey.

The logical flip side is that you should reduce incentives for doing that maintenance. If there’s not much to be gained by doing custom coding for your site, then the spammer will likely move on. Which brings us to…

Habit #6: Don’t Look Tasty

When dodging the slavering jaws of the spambots, it’s best if your site looks like a lousy supper. That means reducing the rewards of getting the spam message onto the site.

Even if spammers make the necessary code changes to help their spambots navigate the obstacles I’ve described above, they may simply decide that it’s not worth the extra work and maintenance if the incentives are too low.

My assumption is that spammers are in the hunt for a good Google PageRank score; it’s about boosting their results in search engines. Big Medium automatically adds the rel="nofollow" attribute to all links. That tells Google and other search engines not to give the link any ranking credit, so links posted in Big Medium comments get no Google juice.
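For the curious, here’s one way to stamp rel="nofollow" onto every link in a chunk of comment HTML, sketched in Python. It’s a regex-based illustration with a hypothetical function name, not a full HTML parser: it assumes reasonably tidy markup and would trip over attribute values that contain ">" or the text " rel=".

```python
import re

# An opening <a ...> tag (but not <abbr>, <article>, etc.).
A_TAG = re.compile(r"<a(?=[\s>])([^>]*)>", re.IGNORECASE)
# An existing rel attribute, quoted or unquoted, with its leading whitespace.
REL_ATTR = re.compile(r"""\s+rel\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(html):
    """Replace any rel attribute on <a> tags with rel="nofollow"."""
    def rewrite(match):
        attrs = REL_ATTR.sub("", match.group(1)).rstrip()
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(rewrite, html)
```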

Admittedly the pace of spam has only increased since the rel="nofollow" attribute was introduced by Google and a group of blog software makers a couple of years ago. My optimistic hunch is that new apps that support this attribute from the get-go (read: Big Medium) won’t be as big a target. We’ll see.

Habit(?) #7: The Most Dangerous Prey of All: Man

The strategies listed above are primarily aimed at thwarting automated spambots. They won’t do much to slow down a real person determined to share their recommendations for ringtones and rock-bottom mortgage rates. So it’s important to provide some additional protection against human spammers, as well as the handful of clever spambots that might manage to evade all of the other traps and tests.

Analyzing the actual content of the message is the only answer. That’s where Akismet comes in.

Big Medium has built-in support for the Akismet online anti-spam service. The brainchild of WordPress phenom Matt Mullenweg, Akismet puts comments through hundreds of tests and then tells Big Medium whether or not it’s spam. If a comment reeks of processed ham, Big Medium chucks it into the spam bin instead of posting it to the site. Spam comments are deleted automatically every 15 days.
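If you’re wondering what talking to Akismet involves, here’s a hedged Python sketch of a comment-check call. It’s based on my understanding of Akismet’s REST API (a form-encoded POST to a key-specific host that answers "true" for spam or "false" for ham), so check the current Akismet documentation before relying on it. The function names are hypothetical, and this is not Big Medium’s own client code.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check(api_key, blog_url, user_ip, user_agent, author, content):
    """Build the POST request for Akismet's comment-check endpoint."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    fields = {
        "blog": blog_url,            # the site the comment was posted to
        "user_ip": user_ip,          # the commenter's IP address
        "user_agent": user_agent,    # the commenter's browser string
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    return Request(url, data=urlencode(fields).encode("utf-8"))


def is_spam(response_body):
    """Akismet answers with the literal text "true" when it thinks it's spam."""
    return response_body.strip().lower() == "true"


def check_comment(request):
    """Send the request and interpret the answer (needs network access)."""
    with urlopen(request, timeout=10) as response:
        return is_spam(response.read().decode("utf-8"))
```

A comment for which check_comment returns True would go to the spam bin rather than onto the site.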

On top of all of that, of course, you can also choose to review all comments that actually make it through the gauntlet before posting them to the site. That lets you eyeball and approve all of the remaining messages personally.