Keyword filtering considered harmful
There are three main ways of reducing spam load at the server level. One is the sort of process-based approach that greylisting takes, where the mail administrator takes advantage of spammers’ methods to distinguish their mail from legitimate mail. Another is host-based: the mail admin either identifies previous spam sources (by IP address) or subscribes to a list which attempts to centralize such identification, and rejects any mail originating from (or relayed through) those addresses.
The third method is content-based. It surveys the message it has been asked to deliver, and (at its most elegant) evaluates the likelihood of that message being spam, based on its makeup, or (at its crudest) simply rejects a message based on the presence of particular keywords in the headers or body of the message.
The problem with this cruder kind of filtering is that it often fails to account for the chance that a keyword which one person might consider an infallible sign of spam could be part of a legitimate email message for someone else.
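To make the difference concrete, here is a minimal sketch in Python of the two ends of content-based filtering. Everything in it is an assumption made up for illustration: the function names, the keyword list, the weights and the threshold do not come from any real mail server or filter, and a real scoring filter would learn its weights from actual mail rather than hard-code them.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_KEYWORDS = {"viagra", "cialis"}

# Hypothetical hand-picked weights; a real scoring filter would learn these.
SPAM_WEIGHTS = {"viagra": 2.0, "cheap": 1.5, "order": 1.0, "now": 0.5}
HAM_WEIGHTS = {"neuroscience": 2.0, "receptor": 1.5, "study": 1.0, "newsletter": 1.0}


def words_in(subject, body):
    """Return the set of lowercase words found in the subject and body."""
    return set(re.findall(r"[a-z]+", (subject + " " + body).lower()))


def crude_filter(subject, body):
    """Reject (return True) if any blocked keyword appears anywhere."""
    return not BLOCKED_KEYWORDS.isdisjoint(words_in(subject, body))


def score_filter(subject, body, threshold=2.0):
    """Reject (return True) only if spam evidence outweighs legitimate evidence."""
    words = words_in(subject, body)
    spam = sum(SPAM_WEIGHTS.get(w, 0.0) for w in words)
    ham = sum(HAM_WEIGHTS.get(w, 0.0) for w in words)
    return spam - ham >= threshold


# Two made-up messages: a legitimate newsletter and an obvious spam.
newsletter = (
    "Newsletter: new study of Viagra and dopamine receptor binding",
    "This month in neuroscience research.",
)
spam = ("Cheap Viagra", "Order now")

newsletter_crude = crude_filter(*newsletter)   # keyword present, so rejected
newsletter_scored = score_filter(*newsletter)  # context outweighs the keyword
spam_crude = crude_filter(*spam)
spam_scored = score_filter(*spam)
```

Under these made-up weights, both filters reject the spam, but only the crude one rejects the newsletter: a single keyword decides the outcome no matter what else the message says.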
One example is an email newsletter run by one of our authors for others in the field of biological psychology. You know, when your field overlaps with a certain number of psychoactive drugs, odds are good you’re going to get some email which uses keywords often found in pharmacology spam.
Which explains why I’ve got a dozen or so bounces in my admin mailbox: not enough university mail admins have the imagination to predict that a neuroscience professor might get legitimate email with “Viagra” in the subject line. I’m debating whether it’s worthwhile to compose a nice message to the various admins pointing out the problem.
Now Playing: Radio Free Europe from Eponymous by R.E.M.