Filtering base64 encoded spam

I hate spam, though I get an awful lot of it. About 1/3 of my email is spam, though on a bad day the ratio can be reversed. I used to have a nice graph showing just how much spam I get. To get rid of it I use a lot of filters. One of these is the Postfix Body Checks feature. This feature allows you to match lines in the body of an email and reject the email at the server. I use Perl Compatible Regular Expression (PCRE) matching for the lines.

Recently though, I noticed a lot of spam, usually about Viagra, that was passing through my spam traps. The emails all pointed to a small set of webservers, so I thought I would just filter on the URLs. It didn’t work.
SpamAssassin in the end gave me the hint.

X-Spam-Status: No, hits=0.0 required=5.0
       tests=BASE64_ENC_TEXT,EMAIL_ATTRIBUTION,HTML_60_70,
           HTML_IMAGE_ONLY_04,MIME_HTML_ONLY,PENIS_ENLARGE,REMOVE_PAGE
            version=2.53

It was base64 encoded email! That’s why my simple PCRE text matches would not work. So I needed to use something else.
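
To see why the plain text match fails, here is a small Python sketch. The message body is made up for illustration (only the domain is the one discussed on this page), and Python's re module stands in for PCRE. Base64 output never contains a dot, so a pattern for the literal domain cannot match the encoded body.

```python
import base64
import re

# Hypothetical message body; only the domain comes from this page.
body = 'Buy now: <a href="http://www.sellthrunet.net/pek/m2b.php">click</a>'
encoded = base64.b64encode(body.encode("ascii")).decode("ascii")

# A plain-text filter on the domain, like a simple body check.
plain_pattern = re.compile(re.escape("sellthrunet.net"))

found_in_plain = plain_pattern.search(body) is not None
found_in_encoded = plain_pattern.search(encoded) is not None

print("plain body matches: ", found_in_plain)
print("base64 body matches:", found_in_encoded)
```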

This page is about how to filter on base64 text that appears in emails. I have used examples of PCRE and postfix but you can use this anywhere else, with the appropriate adjustments of where the files go and their syntax.

How to filter

A standard filter line in a postfix body_check file looks something like this:

//   REJECT

This is the old iframe hack that some spammers use to sneak URLs into your email. They are nice and clear and we just reject them. All we have to do now is change the stuff between the “toothpicks” (the two slashes) to what we want.

Here’s an example spam I got today; it’s offering the usual garbage these shonks usually offer. Remember, if they don’t advertise ethically, it is often a sign of how their entire operation is run.


In this case, I’ve decided I cannot be bothered getting any emails that advertise stuff on www.sellthrunet.net. I get enough junk already and it’s probably a front for spammers anyway, so I’ll filter on that domain. You need to make the string reasonably long, because the steps below effectively cut a few characters off each end of it.

Debian systems have a program called mimencode (some of you might have mmencode instead), which is part of the Metamail package. This does the base64 encoding for you.

So all you need to do is take the string you want to filter on, put it into mimencode and then put the resulting string into the Postfix configuration. You need to do this three times, deleting a character at the front each time, because base64 works by cutting the text up into groups of three characters each and you don’t know in advance if your string is going to start at position 1, 2 or 3 of a group.

gonzo$ echo -n "http://www.sellthrunet.net/" | mimencode
aHR0cDovL3d3dy5zZWxsdGhydW5ldC5uZXQv
gonzo$ echo -n "ttp://www.sellthrunet.net/" | mimencode
dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC8=
gonzo$ echo -n "tp://www.sellthrunet.net/" | mimencode
dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0Lw==
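
If you don't have mimencode handy, the same three encodings can be produced with Python's standard base64 module. This is only a sketch of the step above; the function name offset_encodings is my own invention.

```python
import base64


def offset_encodings(text):
    """Base64-encode text with 0, 1 and 2 leading characters removed."""
    return [
        base64.b64encode(text[skip:].encode("ascii")).decode("ascii")
        for skip in range(3)
    ]


for encoded in offset_encodings("http://www.sellthrunet.net/"):
    print(encoded)
```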

Next you need to remove part of the encoded string at the end. Remember that 3 characters are encoded into 4 symbols. So character one contributes to symbols 1 and 2, character two to symbols 2 and 3, and character three to symbols 3 and 4. The = means the string was not a multiple of 3 characters long and needed padding. If the encoded string has no =, you can use it as-is. Otherwise remove all the = signs plus one more symbol at the end of the string, because that last symbol is partly made of padding bits and will be different when real text follows. Remember that you are cutting off up to two characters of your original string at each end, so be careful it is still meaningful. The last string, for example, now only fully matches “tp://www.sellthrunet.net”, with the trailing slash only partly pinned down, which still looks ok.
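
The trimming rule can be sketched in Python as below. The helper names trim_fragment and fully_covered_text are made up for this example. The second helper decodes only the complete 4-symbol groups, so it gives a conservative view of what the trimmed fragment still pins down; any leftover symbols constrain a few more bits of the following characters.

```python
import base64


def trim_fragment(encoded):
    """Drop '=' padding plus one more symbol; leave unpadded strings alone."""
    stripped = encoded.rstrip("=")
    if stripped == encoded:
        return encoded  # no padding, safe to use as-is
    return stripped[:-1]  # last symbol carried padding bits, drop it too


def fully_covered_text(fragment):
    """Decode only the complete 4-symbol groups of a trimmed fragment."""
    whole = len(fragment) - len(fragment) % 4
    return base64.b64decode(fragment[:whole]).decode("ascii")


for encoded in ("dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC8=",
                "dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0Lw=="):
    fragment = trim_fragment(encoded)
    print(fragment, "covers at least", fully_covered_text(fragment))
```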

Finally, you can join the strings using the regular expression “or” symbol (|). Also be careful to escape any special regular expression characters in the strings. Base64 can contain plus ‘+’ and slash ‘/’, which both need escaping with a backslash (\).

(aHR0cDovL3d3dy5zZWxsdGhydW5ldC5uZXQv|dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC|dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0L)
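
Here is a rough Python sketch of the join-and-escape step. The function names and the two sample fragments are made up, chosen only because they contain a plus and a slash. I escape the two characters by hand rather than use re.escape, since current Python versions leave ‘/’ alone and the slash is the delimiter in the body_checks line.

```python
import re


def escape_for_postfix(fragment):
    """Backslash-escape the two base64 symbols that need it here."""
    return fragment.replace("+", r"\+").replace("/", r"\/")


def join_fragments(fragments):
    """Join escaped fragments with the regular expression 'or' symbol."""
    return "(" + "|".join(escape_for_postfix(f) for f in fragments) + ")"


# Made-up fragments, picked only to show the escaping.
pattern = join_fragments(["ab+c", "de/f"])
print(pattern)
```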

I have a bypass line in my setup so that lines that are base64 encoded are usually skipped. If you have the same thing, make sure the new line goes before your bypass line or it will never match. We also need to tell Postfix to use case-sensitive matching, because we are matching the base64 encoding and not the real string itself. Postfix matches case-insensitively by default and the i flag toggles that, so we put the i flag after the last slash. The relevant lines in the body_checks file are now:

#
# sellthrunet.net
/(aHR0cDovL3d3dy5zZWxsdGhydW5ldC5uZXQv|dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC|dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0L)/i REJECT Spamvertised website
# don't bother checking each line of attachments
/^[0-9a-z+\/=]{60,}\s*$/                OK
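
Why the order matters can be shown with a small Python simulation. It assumes the first-match-wins behaviour of Postfix lookup tables and uses Python's re module as a stand-in for PCRE; the function first_action and the sample HTML line are made up for the example.

```python
import base64
import re

# Case sensitive, like the REJECT line with the i flag in Postfix.
REJECT_RULE = (re.compile(
    "(aHR0cDovL3d3dy5zZWxsdGhydW5ldC5uZXQv"
    "|dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC"
    "|dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0L)"), "REJECT")
# Case insensitive, like the bypass line without the flag.
BYPASS_RULE = (re.compile(r"^[0-9a-z+/=]{60,}\s*$", re.IGNORECASE), "OK")


def first_action(line, rules):
    """Return the action of the first rule that matches, or None."""
    for pattern, action in rules:
        if pattern.search(line):
            return action
    return None


# Made-up HTML line, base64 encoded as it would be in the mail body.
html = '<a href="http://www.sellthrunet.net/pek/m2b.php?man=ki921">'
line = base64.b64encode(html.encode("ascii")).decode("ascii")

print(first_action(line, [REJECT_RULE, BYPASS_RULE]))  # reject line first
print(first_action(line, [BYPASS_RULE, REJECT_RULE]))  # bypass line first
```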

To test it, I use pcregrep and mimencode again on the mail file. This shows the spamming line in clear text and gives you some confidence that the filter should work.

$ pcregrep 'dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC' /var/mail/csmall  | mimencode -u
http://www.sellthrunet.net/pek/m2b.php?man=ki921">&ltl;im//www.sellthrunet.net/pek/m2b.php?man=ki
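
Another way to gain some confidence, without a mail file, is to check that the pattern matches whichever of the three positions the URL lands on. The sketch below pads a made-up line with 0, 1 or 2 filler characters before encoding it; the helper name encoded_line and the filler are my own, and Python's re module again stands in for PCRE.

```python
import base64
import re

PATTERN = re.compile(
    "(aHR0cDovL3d3dy5zZWxsdGhydW5ldC5uZXQv"
    "|dHRwOi8vd3d3LnNlbGx0aHJ1bmV0Lm5ldC"
    "|dHA6Ly93d3cuc2VsbHRocnVuZXQubmV0L)")


def encoded_line(offset):
    """Encode the URL after `offset` filler characters to shift its position."""
    body = "x" * offset + "http://www.sellthrunet.net/pek/m2b.php?man=ki921"
    return base64.b64encode(body.encode("ascii")).decode("ascii")


for offset in range(3):
    line = encoded_line(offset)
    hit = PATTERN.search(line)
    # Decode the line again, much like piping it through mimencode -u.
    decoded = base64.b64decode(line).decode("ascii")
    print(offset, hit is not None, decoded)
```

Each of the three alternatives in the pattern is the one that matches for exactly one of the offsets, which is why all three are needed.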
