
Obfuscation

It is relatively well known that any email address which appears on a website is likely to attract spam. Spammers spider the web looking for strings that look like email addresses, and plug them into the vile flow. I tested this by using multiple addresses on one of my domains; spam arrives almost exclusively at the one I had posted on my website. Many message-board type websites mangle (or "munge") the addresses of those who comment in order to keep the addresses from being machine-recognizable; that's where you get spelled-out things like the addresses on comments at the PHP site, "user at domain dot tld" and the like.
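For comparison, that spelled-out munging can be sketched in a few lines of Perl. munge_address is a hypothetical helper of my own naming, not code from the PHP site or any message board, and real sites vary in the exact wording they use:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an address into the spelled-out
# "user at domain dot tld" form some message boards use.
sub munge_address {
    my ($addr) = @_;
    my ($user, $host) = split /\@/, $addr, 2;
    return $addr unless defined $host;    # no "@": leave it alone
    $host =~ s/\./ dot /g;                # each dot in the host part becomes " dot "
    return "$user at $host";
}

print munge_address('address@domain.tld'), "\n";
```

Only the host part is rewritten here; a dot in the user part (first.last@...) is left as-is.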

On the other hand, it is considered good form to let people know how to reach you by email, and it is user-friendly to provide a clickable link in the mailto:address@domain.tld format, so visitors can start a message with a single click.

There's a balance to be struck, and one way to strike it is to turn the spammers' own techniques against them. Spammers frequently dodge content filters by sending HTML in an "encoded" form: the mail reader decodes it for display, but the plain text the filter scans never contains the magic trigger strings. I've taken to doing the same thing with email addresses on websites.

I encode email addresses as HTML entities. There are named entities for certain characters, like the ampersand (&amp;) or the em dash (&mdash;), but numeric entities (&# followed by a character's ASCII code and a semicolon) can encode any character in the standard ASCII set, including letters, digits, and the @ symbol. So address@domain.tld becomes &#97;&#100;&#100;&#114;&#101;&#115;&#115;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#116;&#108;&#100;. A spider combing the page source for addresses doesn't see an email address there, but a browser renders it exactly as though it had been typed plainly.
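The same trick extends to the clickable mailto: link, because browsers decode character references inside attribute values too, so both the href and the visible text can be encoded. A minimal sketch, assuming ASCII-only input; encode_entities_all and obfuscated_mailto are hypothetical names for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: turn every character of an ASCII string into a
# decimal numeric entity, e.g. "@" -> "&#64;".
sub encode_entities_all {
    my ($text) = @_;
    return join '', map { '&#' . ord($_) . ';' } split //, $text;
}

# Hypothetical helper: build an <a> element whose href (including the
# "mailto:" scheme) and link text are both entity-encoded.
sub obfuscated_mailto {
    my ($addr) = @_;
    my $href = encode_entities_all("mailto:$addr");
    my $text = encode_entities_all($addr);
    return qq{<a href="$href">$text</a>};
}

print obfuscated_mailto('address@domain.tld'), "\n";
```

Pasting the printed element into a page should give a normal, clickable mailto link in the browser, while the raw HTML contains no literal @ or address.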

This method only works as long as the spammers' address scrapers stay relatively dumb. If they start decoding entities, we're in trouble (again).
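To see how thin this protection is, here is roughly what an entity-aware scraper would need: one decoding pass before the usual address match. This is an illustrative sketch, not any real harvester's code; decode_numeric_entities is a hypothetical name, and the address pattern is deliberately simplified rather than a complete address grammar:

```perl
use strict;
use warnings;

# Hypothetical scraper step: expand decimal (&#64;) and hex (&#x40;)
# numeric entities back into the characters they stand for.
sub decode_numeric_entities {
    my ($html) = @_;
    $html =~ s/&#(\d+);/chr($1)/ge;
    $html =~ s/&#x([0-9a-fA-F]+);/chr(hex($1))/ge;
    return $html;
}

# Deliberately simplified address pattern, for illustration only.
my $addr_re = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;

my $page = 'Mail me: &#97;&#64;&#98;&#46;&#99;&#111;&#109;';
my ($found) = decode_numeric_entities($page) =~ /($addr_re)/;
print defined $found ? "found $found\n" : "nothing found\n";
```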

I wrote a little Perl filter to encode these for me. If you're using BBEdit (and if you're not, why not?), put this in a file in your /Applications/BBEdit/BBEdit Support/Unix Support/Unix Filters/ folder. To encode a chunk of text, highlight it, go to the #! menu, look under "Unix Filters", and select whatever you named the file. I'm still using BBEdit 6.5, but I'm relatively certain it will still work with BBEdit 8. (Anyone care to send me a copy to test with?)

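#!/usr/bin/perl -w
use strict;

# Read the selected text (or standard input) line by line and print
# each character, newline included, as a numeric entity such as &#64;.
# Input is treated as raw bytes, so this is only right for ASCII text.
while (my $line = <>) {
    for my $i (0 .. length($line) - 1) {
        my $code = ord(substr($line, $i, 1));
        print "&#", $code, ";";
    }
}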

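As an added bonus, if you run the script from the command line (echo "address@domain.tld" | ./entity_conv.pl), it will print the entities on standard output, unfortunately with a trailing line feed entity, &#10;, because echo appends a newline and the script encodes it too. Using printf '%s' "address@domain.tld" instead of echo avoids this, since it prints no trailing newline.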
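If the trailing &#10; bothers you, another option is to leave line endings alone and encode only the text around them; a newline carries no address information, so nothing is lost. A sketch of that variant, where encode_line is a hypothetical name:

```perl
use strict;
use warnings;

# Encode every character of a line except its line ending (\n or \r\n),
# which is passed through unencoded.
sub encode_line {
    my ($line) = @_;
    my $eol = $line =~ s/(\r?\n)\z// ? $1 : '';
    return join('', map { '&#' . ord($_) . ';' } split //, $line) . $eol;
}

print encode_line('address@domain.tld' . "\n");
```

To use it as the BBEdit filter itself, the whole loop reduces to print encode_line($_) while <>;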

I haven’t a clue how this would work in Windows, and I’m not sure I want one. But it’s Perl, after all; it should be workable somehow.

Now Playing: Capsized from You Were Here by Sarah Harmer

Comments

Check out my org's website. The tech guys at our funder gave us a cute little JavaScript that also nicely obscures email addresses while still allowing users to reach us. Spiders haven't figured it out… yet.
