Gibberish Last Names Clogging up Subscriptions

I had been annoyed recently by an increasing number of SPAM subscriptions on my Thaidye.com web site. The script behind the form for these subscriptions and rebate requests immediately sends out a confirmation email to the address entered (as well as to the site admin) and adds the visitor to a database.

Initially there were only a few of these spam entries, and I could easily go into the database and remove them by hand. But as they became more and more numerous, my first step was to add a link to the admin's notification email that made it easy to remove the spam entry.

Eventually it got so annoying and time-consuming that I looked for a more automated way to handle the spam entries. What they all had in common was:

  • a good first name
  • a gibberish last name like NLMAkPJpIVyqCkCeuEh, YijhgzswktJTVWqXhmA or MPSVPkfXInMzFYhEOpp
  • and a good email

The email addresses were probably real, because the automatic confirmation emails hardly ever bounced. That bothered me too, because those poor recipients were getting SPAM that appeared to come from me.

Now I needed to find something all these spam entries had in common so that I could filter them. Unfortunately PHP has no 'gibberish' function, so I had to come up with my own. After studying these entries for a while, I finally noticed that the spam names often contain longer runs of consonants than would occur in real names.

With a little help from my friends at Google I came up with the following. In the hope that it helps somebody bothered by the same spammers, here is the code snippet that filters those entries:

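// Read the submitted names; fall back to '' if a field is missing
$first = $_REQUEST['first'] ?? '';
$last  = $_REQUEST['last'] ?? '';

// Four or more consonants in a row (case-insensitive) marks a name as gibberish
$gibberish = preg_match('/[bcdfghjklmnpqrstvwxz]{4,}/i', $first)
          || preg_match('/[bcdfghjklmnpqrstvwxz]{4,}/i', $last);

if (!$gibberish) {
    // do the regular processing
} else {
    // pretend everything went fine for the spammer,
    // but don't actually do anything
}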

We'll see how many still slip through: gibberish with fewer than 4 consonants in a row.
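To get a feel for where the 4-consonant threshold falls, here is a small sketch that wraps the same regular expression in a function and runs it against the three sample last names above plus two ordinary names. The function name is_gibberish_name and its $minRun parameter are my own hypothetical additions for experimenting; the pattern itself is the one from the snippet above.

```php
<?php
// Hypothetical helper: same consonant-run test as the form filter,
// with the minimum run length exposed as a parameter.
function is_gibberish_name(string $name, int $minRun = 4): bool
{
    // A run of $minRun or more consonants, case-insensitive ('y' is not in the class)
    $pattern = '/[bcdfghjklmnpqrstvwxz]{' . $minRun . ',}/i';
    return preg_match($pattern, $name) === 1;
}

$samples = [
    'NLMAkPJpIVyqCkCeuEh', // longest run is "kPJp" (4)
    'YijhgzswktJTVWqXhmA',
    'MPSVPkfXInMzFYhEOpp',
    'Miller',
    'Schmidt',             // "Schm" is also a run of 4
];

foreach ($samples as $name) {
    printf("%-22s %s\n", $name, is_gibberish_name($name) ? 'gibberish' : 'ok');
}
```

All three spam samples are flagged, but so is Schmidt, whose opening S-c-h-m is a run of four consonants, so a real subscriber with that name would be silently dropped. Raising $minRun to 5 lets Schmidt through, but then NLMAkPJpIVyqCkCeuEh slips through as well, since its longest run is only four.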

What I am still curious about is 'WHY'. What does the spammer intend with these entries? I don't see how they could benefit him or her. Discrediting me because my site sends out spam? But then why use gibberish in the last name? I am really curious.