2.7 Unnecessary Encoding
MIME (Multipurpose Internet Mail Extensions), a protocol that defines how various objects may be embedded in email, is the standard used to create attachments. MIME also provides the standards for encoding attachments for mail transport.
When an attachment is binary (such as a photograph or a word processor document), it must be base64-encoded before email transport. When text contains overly long lines (as much HTML code does), it must be quoted-printable-encoded before email transmission. Unfortunately, spammers have adopted these two valuable forms of encoding to help disguise spam email content.
The kind of MIME encoding used in an email message (if any) is specified by the Content-Transfer-Encoding: header:
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable
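As a rough illustration of reading that header, the following Python sketch uses the standard library's email package. The function name transfer_encoding and the sample message (including its short base64 body) are made up for this example and are not part of any particular screening program.

```python
from email import message_from_string

# Hypothetical sample message; the body is an arbitrary short base64 string.
raw = (
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+aGk8L2I+\n"
)

def transfer_encoding(raw_message):
    """Return the declared transfer encoding in lowercase.

    When the header is absent, MIME treats the body as 7bit,
    so that value is used as the default here.
    """
    msg = message_from_string(raw_message)
    return msg.get("Content-Transfer-Encoding", "7bit").strip().lower()

print(transfer_encoding(raw))
```

Lowercasing the value is a deliberate choice in this sketch, because the encoding name is not case-sensitive and a message may spell it Base64 or BASE64.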
We will show how to decode base64 in section 11.2, and how to decode quoted-printable in section 11.3. Here we illustrate how they are used to disguise spam email. Consider this example:
Content-Type: text/html
Content-Transfer-Encoding: base64

PEZPTlQgZmFjZT0iVmVyZGFuYSIgc2l6ZT0zLjU+DQpDaW5kZXJlbGxhISAgUGV0ZXIgUGFuISAg
dWxsIG9mIGl0OyBlYXQgaXQgZXZlcnkgZGF5IGZvciBicmVha2Zhc3Q=
Here, the message content is of type HTML (the text/html in the Content-Type: header), but rather than allow the HTML to be transmitted as is, the spammer has base64-encoded it to make it difficult to recognize. Clearly, you must first decode this base64-encoded text before you can screen it for spam.
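To make the disguise concrete, here is a minimal Python sketch that decodes the two base64 lines from the example above with the standard base64 module. The helper name decode_base64_body is an assumption made for this illustration. The two lines are only the fragments shown in the example, so the decoded text is correspondingly fragmentary.

```python
import base64

# The two base64 lines from the example message body.
payload = (
    "PEZPTlQgZmFjZT0iVmVyZGFuYSIgc2l6ZT0zLjU+DQpDaW5kZXJlbGxhISAgUGV0ZXIgUGFuISAg\n"
    "dWxsIG9mIGl0OyBlYXQgaXQgZXZlcnkgZGF5IGZvciBicmVha2Zhc3Q=\n"
)

def decode_base64_body(body):
    """Strip line breaks and other whitespace, then base64-decode to bytes."""
    return base64.b64decode("".join(body.split()))

# latin-1 maps every byte to a character, so decoding never fails
# even when the sender lied about or omitted the character set.
text = decode_base64_body(payload).decode("latin-1")
print(text)
```

Once decoded, the body begins with an ordinary HTML FONT tag followed by filler words, and that plain text is what a screening routine would need to examine.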
In addition to its primary use (encoding long lines for transport), quoted-printable can be used to obscure text so that it is difficult to parse:
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<br>
<a href=3D"http://bob.example.com/py/">
<img src=3D"http://amy.example.com/2/"></a>
Here, the equal signs inside the HTML tags have been turned into =3D expressions using quoted-printable encoding (3D is the hexadecimal ASCII code for the equal sign). You need to decode them back into equal signs before screening the message for spam content.
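The same step can be sketched for quoted-printable with Python's standard quopri module. The helper name decode_quoted_printable is hypothetical, and the input is the body from the example above.

```python
import quopri

# The quoted-printable body from the example message.
encoded = (
    "<br>\n"
    '<a href=3D"http://bob.example.com/py/">\n'
    '<img src=3D"http://amy.example.com/2/"></a>\n'
)

def decode_quoted_printable(body):
    """Decode =XX escapes and remove soft line breaks (a trailing '=')."""
    raw = quopri.decodestring(body.encode("ascii"))
    # latin-1 maps every byte to a character, so this cannot fail.
    return raw.decode("latin-1")

decoded = decode_quoted_printable(encoded)
print(decoded)
```

After decoding, the href and src attributes contain ordinary equal signs again, so a pattern that looks for URLs in HTML attributes can match them.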