DKIM: Using Cryptography to Protect Your Brand From Fraud

By: June 14, 2018

Have you ever received an email that appeared to come from your bank, a colleague, or a company you've done business with, only to discover that it was spam? Even to the most discerning eyes, some of these fraudulent emails can look nearly identical to their genuine counterparts. For cybercriminals, masquerading as recognizable individuals and organizations is a tactic that makes profiteering from email scams a very lucrative venture.

In fact, these endeavors are so successful that the FBI's most recent Internet Crime Report lists Business Email Compromise – a scam dubbed "CEO fraud" in which employees are tricked into wiring funds to fraudsters under the guise of an email from their CEO – as the most costly Internet crime of 2016, with losses totaling over $360 million in the U.S. alone:

Source: Internet Crime Complaint Center (IC3)


One oft-used method of deception employed in Internet crimes like CEO fraud and others listed above is email spoofing, a technique commonly used for spam and phishing in which parts of an email (most commonly, the sender address) are falsified to impersonate a well-known or trusted entity. Spammers often disguise themselves as trusted senders because it's relatively easy to do and leads to higher open and click rates.

Unfortunately, having your domain used in spoofed emails can significantly impact your ability to deliver email. This is because recipients' mail servers may simply choose to block all email from a domain if they're unable to distinguish the genuine messages from the counterfeits.

So how can individuals and organizations protect their domain, brand reputation, and deliverability from being sullied by imposter emails? As it turns out, one of the most effective methods for safeguarding against this type of activity is to implement DKIM.

What is DKIM?

The purpose of DKIM – an acronym for DomainKeys Identified Mail – is to prevent a domain name from being used in email spoofing. In short, DKIM allows your domain to disavow unauthorized email by ensuring it's only held accountable for messages that meet the following criteria:

  1. The email was sent by your domain's mail servers
  2. The email was not altered in transit to its intended recipient(s)

For would-be email scammers whose success hinges on making fraudulent emails appear authentic, the inability to use their own servers or modify messages sent by yours are two major obstacles that substantially lower their chances of success. But how exactly does DKIM manage to achieve these results?

How DKIM works

At a high level, DKIM authentication can be summarized as a two-step process. The first step, performed by the sender's mail server, is to generate and apply a digital signature to the message. The second step is the receiving mail server's responsibility, and involves validating the digital signature to confirm the email's authenticity. The concept is similar to detecting check fraud, where handwriting analysis catches forgery and chemical analysis catches tampering by revealing substances such as acetone that are often used to alter checks. To fully grasp DKIM's authentication process, however, we need to take a peek at what's going on under the hood.

DKIM Signing

Before we talk about how signatures are generated and applied by the sender's mail servers, it's helpful to know what a signature looks like and where to find it. For this purpose, a DKIM-Signature email header is included in every email signed by the sender. This header contains the digital signature and a variety of additional information required for the signing and verification processes, and formats them as a list of key/value pairs known as tags:

A DKIM-Signature header from an email sent by AWeber

There’s a lot of information in that DKIM-Signature header, so let's break it down and discuss the role each tag plays in the DKIM signing and verification process.
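To make the key/value structure concrete, here is a minimal Python sketch that splits a DKIM-Signature value into its tags. The header value is a made-up placeholder, not the AWeber header discussed here: example.com, the selector sel1, and the b= string are hypothetical, and the bh= string is simply the SHA-256 body hash of an empty body. The helper name parse_tag_list is likewise illustrative and not part of any library.

```python
# Tags whose values may contain folding whitespace that carries no meaning.
WHITESPACE_INSENSITIVE = {"b", "bh", "h"}


def parse_tag_list(value: str) -> dict:
    """Split a DKIM tag list ("k1=v1; k2=v2; ...") into a dict of tags."""
    tags = {}
    for item in value.split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only, so base64 padding stays in the value.
        name, _, val = item.partition("=")
        name, val = name.strip(), val.strip()
        if name in WHITESPACE_INSENSITIVE:
            val = "".join(val.split())  # drop line folds inside the value
        tags[name] = val
    return tags


# Hypothetical header value, for illustration only.
EXAMPLE_VALUE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=sel1;\r\n"
    "\th=Date:To:From:Subject;\r\n"
    "\tbh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;\r\n"
    "\tb=AAAA\r\n"
    "\tBBBB=="
)

tags = parse_tag_list(EXAMPLE_VALUE)
print(tags["a"], tags["c"], tags["d"], tags["s"])
print(tags["h"].split(":"))
```

Each tag discussed in the rest of this article (a=, bh=, h=, b=, c=, d=, s=) ends up as one entry in the resulting dictionary.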

For senders, the first step in the signing process is assigning a value to the body hash tag (bh=) in the DKIM-Signature header. As the name suggests, this tag's value is obtained by hashing the body of the email. Hashing is a process that feeds an arbitrary (usually larger) amount of data into a mathematical function, known as a cryptographic hash algorithm, to produce a short, fixed-length string that is, for all practical purposes, unique to that data. If that sounds a little confusing, think of it this way: hashes for data are analogous to fingerprints for human beings. To demonstrate how hashing works, consider the following email message body:

Hashing this message body with SHA-256 (the hash function used by the rsa-sha256 algorithm) will always produce the following string:

You can verify this using the printf and openssl utilities (tested on Linux and Mac OS X):
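For readers who prefer Python, a minimal sketch of the same idea follows. The message bodies are hypothetical stand-ins rather than the sample used in this article, and fingerprint is just an illustrative helper name. It shows the two properties that make a hash behave like a fingerprint: the same input always gives the same output, and changing a single byte gives a different one.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return the SHA-256 digest of body as 64 hexadecimal characters."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical message bodies that differ in exactly one byte.
original = b"Hello from a hypothetical sender!\r\n"
tampered = b"Hello from a hypothetical sender?\r\n"

print(fingerprint(original))
print(fingerprint(original) == fingerprint(original))  # True: same input, same hash
print(fingerprint(original) == fingerprint(tampered))  # False: one byte changed
```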

At the time of writing, the DKIM RFC defines only two signing algorithms: rsa-sha1 and rsa-sha256, which pair an RSA signature with the SHA-1 and SHA-256 hash functions, respectively. Of these, rsa-sha256 is preferred because it offers better security, and RFC 8301 goes further by stating that rsa-sha1 must not be used (incidentally, rsa-sha256 is the algorithm used by AWeber's mail servers). Regardless of which algorithm a sender chooses, it must be declared in the algorithm tag (a=) of the DKIM-Signature header because receiving mail servers need this information to perform DKIM signature validation. Referring again to the DKIM-Signature header above, you can see that the rsa-sha256 algorithm was used by AWeber's mail server to generate the body hash:

The hash generated above for the sample message body from AJ was printed as a hexadecimal string, simply because that's easier for humans to read. Your mail server works with the binary representation of the hash (which can contain unprintable characters), so the binary hash is base64 encoded before the resulting value is placed in the bh= tag. In the DKIM-Signature header above, that value is:
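The difference between the two representations can be sketched in a few lines of Python. This example assumes an empty input purely so the result is easy to check: the base64 value it prints is the SHA-256 hash that RFC 6376 lists for an empty body under relaxed canonicalization.

```python
import base64
import hashlib

body = b""  # empty input, chosen so the output matches a value listed in RFC 6376

digest = hashlib.sha256(body).digest()               # 32 raw bytes, not printable
hex_form = digest.hex()                              # 64 characters, human-friendly
b64_form = base64.b64encode(digest).decode("ascii")  # 44 characters, used in bh=

print(hex_form)
print(b64_form)  # 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
```

Both strings encode the same 32 bytes; base64 is simply the more compact of the two.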

Next, senders must decide which email headers they want to use to generate the digital signature. Each of these headers will be specified in the h= tag of the DKIM-Signature header. In the header above, you can see that AWeber uses the Date, Mime-Version, Content-Type, From, To, List-Unsubscribe, Subject, and Sender email headers to sign messages sent from our mail servers:

When choosing the list of headers to sign in your emails, it's best to follow the recommendations in section 5.4 of the DKIM Signatures RFC (RFC 6376). In addition to helping you determine which headers may or must be signed, they'll also steer you away from headers that are not recommended or strictly prohibited (such as Return-Path and Received, respectively).

As with the message body, these headers are fed to the hash algorithm specified in the a= tag to create a hash. Unlike the body hash, however, the header hash is then signed, that is, encrypted using the private half of the sender's DKIM cryptographic key (we'll explore public/private keypairs and their role in DKIM signing/verification in a few moments). The resulting signature is base64 encoded and placed in the b= tag of the DKIM-Signature header:

DKIM Verification

Once the recipient's mail server receives the message, it runs several tests against the email to determine if it passes or fails DKIM validation. The first such test determines whether the message body has been tampered with in transit (after it left the sender’s mail server and before it arrived at the recipient’s mail server). In short, the receiver must generate a hash of the email body and ensure it matches the body hash calculated by the sender, which is found in the body hash tag (bh=). Receivers need two specific pieces of information to generate this hash, both of which are provided by the sender as tags in the DKIM-Signature header. The first piece of information is the cryptographic algorithm used to create the hash value, which can be found in the hashing algorithm tag (a=):

The second piece of information is the canonicalization, which thankfully is easier to explain than it is to pronounce. In computer science, canonicalization (also called normalization or standardization) is the process of converting data with several possible representations into a standard form (known as its canonical form). Dates, for example, are frequently canonicalized by computer programs into a format the code expects because they can be expressed in many different forms, e.g.:

  • March 2nd, 2018
  • 03 / 02 / 2018 (United States - month/day/year)
  • 02 / 03 / 2018 (Europe - day/month/year)
  • 2 Mar 2018
  • Friday, March 2, 2018
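As a small illustration of canonicalization in general (not of DKIM itself), the Python sketch below converts three of the representations above into a single ISO 8601 form. It assumes the format of each input is already known, and parsing the month and weekday names assumes an English locale.

```python
from datetime import datetime

# Each sample is paired with the format it is written in (assumed known).
SAMPLES = [
    ("03 / 02 / 2018", "%m / %d / %Y"),          # United States: month/day/year
    ("02 / 03 / 2018", "%d / %m / %Y"),          # Europe: day/month/year
    ("Friday, March 2, 2018", "%A, %B %d, %Y"),  # long form, English names
]


def canonical_date(text: str, fmt: str) -> str:
    """Parse text using fmt and return the date in ISO 8601 form."""
    return datetime.strptime(text, fmt).date().isoformat()


canonical = {canonical_date(text, fmt) for text, fmt in SAMPLES}
print(canonical)  # {'2018-03-02'}
```

Three different spellings collapse to one canonical value, which is what lets two programs compare them reliably.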

Senders use the aptly-named canonicalization tag (c=) to dictate how the headers and body should be formatted before they are hashed. This helps receiving mail servers ensure they're following the same process as the sending mail server. At the time of writing, section 3.4 of RFC 6376 (the authoritative document for DKIM signature standards) lists simple and relaxed as valid choices. In the canonicalization tag, the value to the left of the forward slash specifies how headers should be canonicalized, while the value to the right indicates how the body should be canonicalized. AWeber uses the relaxed canonicalization for both the headers and body of the email:
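As a rough sketch of how a receiver might read the c= tag, the Python function below splits the value at the slash. It also applies the defaults described in RFC 6376: if the tag is absent, both algorithms are simple, and if only one algorithm is named, it applies to the headers while the body uses simple. The function name parse_canonicalization is an illustrative choice.

```python
VALID_ALGORITHMS = ("simple", "relaxed")


def parse_canonicalization(c_tag: str = "simple/simple") -> tuple:
    """Return (header_algorithm, body_algorithm) for a c= tag value."""
    header_alg, _, body_alg = c_tag.strip().partition("/")
    body_alg = body_alg or "simple"  # "c=relaxed" means "relaxed/simple"
    for alg in (header_alg, body_alg):
        if alg not in VALID_ALGORITHMS:
            raise ValueError(f"unknown canonicalization algorithm: {alg!r}")
    return header_alg, body_alg


print(parse_canonicalization("relaxed/relaxed"))  # ('relaxed', 'relaxed')
print(parse_canonicalization("relaxed"))          # ('relaxed', 'simple')
print(parse_canonicalization())                   # ('simple', 'simple')
```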

Why does AWeber use the relaxed canonicalization algorithm?

The simple canonicalization algorithm indicates that a sender will tolerate almost no modification to the signed body and/or headers as the email travels from one mail server to the next on its journey across the Internet to the recipients. The relaxed algorithm, by contrast, allows for minor but common alterations (such as changes in whitespace and re-wrapping of long header lines) that are considered immaterial to the verification process. The simple algorithm is also computationally less expensive, because the mail server does less work to normalize the body/headers as part of the signing process.

In principle, the simple algorithm's near byte-for-byte strictness and reduced computational overhead might seem like the safest, most cost-effective, and therefore best choice. For bulk senders in particular, that reduction in overhead may even translate into significant cost savings. The reality, however, is that the wide variety of email servers running on the Internet makes it exceedingly difficult to guarantee that some won't wrap long lines or collapse whitespace slightly differently. So even though the simple algorithm may reduce your server/datacenter power bill, that benefit is likely to be accompanied by increased DKIM failures that harm deliverability and inbox placement.

At AWeber, inbox placement is of the utmost importance, so selecting the relaxed algorithm was an easy choice. We're simply unwilling to sacrifice deliverability just to save a few bucks on the power bill.

With this information, the receiving mail server can proceed with hashing and base64 encoding the message body. If the value produced does not match the value in the body hash tag (bh=), DKIM validation will fail. If it passes, the receiver can be confident that the message body has not been tampered with in transit and move on to validating the email's DKIM signature.

The DKIM signature for the email, which is found in the signature tag (b=), helps receivers determine whether the message truly originated from the sender’s mail servers and whether any of the signed headers were modified in transit. To achieve this, the receiving mail server must be able to verify the signature using the public half of the sender’s DKIM public/private keypair, known as the public key. Section 3.6 of RFC 6376 indicates that the public key must be available in DNS via a TXT record whose domain follows this specific format: selector._domainkey.domain

Consequently, the receiver needs to plug the values of the DKIM-Signature header’s domain (d=) and selector (s=) tags into that prescribed format to obtain the DNS TXT record name used to obtain the sender's public key. Let’s take another look at those tags from the AWeber DKIM-Signature provided above:

Substituting the selector and domain tags into the aforementioned DNS record format yields the following domain name:
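The substitution itself is mechanical. A minimal Python sketch, using a hypothetical selector and domain rather than the values from the header above:

```python
def dkim_query_name(selector: str, domain: str) -> str:
    """Build the DNS name that holds a DKIM public key (s= and d= tags)."""
    return f"{selector}._domainkey.{domain}"


# Hypothetical tag values, for illustration only.
print(dkim_query_name("sel1", "example.com"))  # sel1._domainkey.example.com
```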

Performing a DNS TXT query for that domain will return the public key required for signature verification. You can observe what that query returns by using the dig command (the public key is the value of the p= tag):

If you're running Windows and don't have the dig utility installed, you can use nslookup instead:
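Whichever tool performs the lookup, the TXT answer still has to be turned into key bytes. The Python sketch below assumes the answer is available as text in the quoted form that dig prints, possibly as several quoted strings that must be joined without spaces. The "key" here is placeholder bytes rather than a real RSA key, and public_key_bytes is an illustrative helper name.

```python
import base64
import re


def public_key_bytes(txt_rdata: str) -> bytes:
    """Extract and base64 decode the p= tag from a DKIM TXT answer."""
    # Join every quoted character-string with nothing in between.
    record = "".join(re.findall(r'"([^"]*)"', txt_rdata))
    tags = {}
    for item in record.split(";"):
        name, sep, value = item.partition("=")
        if sep:
            tags[name.strip()] = "".join(value.split())
    return base64.b64decode(tags["p"])


# Placeholder bytes standing in for a DER-encoded public key.
PLACEHOLDER_KEY = b"not a real DER-encoded key, only placeholder bytes"
encoded = base64.b64encode(PLACEHOLDER_KEY).decode("ascii")

# Simulate a record whose p= value is split across two quoted strings.
answer = f'"v=DKIM1; k=rsa; p={encoded[:20]}" "{encoded[20:]}"'
print(public_key_bytes(answer) == PLACEHOLDER_KEY)  # True
```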

With the public key in hand, receiving mail servers have the information needed to verify the cryptographic signature (b=) in the email’s DKIM-Signature header. Because the public key in the sender's DNS can only verify signatures generated by the sender's private key, this verification proves that the message was signed by the sender's mail servers. Furthermore, it confirms that the signature was generated from the exact same header names and header values, giving receivers a way to determine whether the signed headers were tampered with in transit.

Remember: Only the sender should have access to the private key! If the corresponding public key retrieved from DNS fails to verify the DKIM signature, it's likely that the sender's email servers or DNS records are misconfigured, that the signed content was altered in transit, or that the email is a forgery signed using a private key that is not affiliated with the sender's domain/servers!
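The relationship between the two halves of the keypair can be illustrated with a deliberately simplified Python sketch. This is textbook RSA with a toy key built from two well-known Mersenne primes and no padding. Real DKIM signatures use randomly generated keys and the padded RSA scheme required by the RFC, so treat this only as a model of "sign with the private key, verify with the public key". All names and header values are illustrative.

```python
import hashlib

# Toy key: NOT secure, for illustration only.
P, Q = 2**89 - 1, 2**127 - 1        # two known Mersenne primes
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    # Reduced mod N only because this toy modulus is shorter than 256 bits.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender side: transform the hash with the private exponent."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Receiver side: undo the transform with the public exponent."""
    return pow(signature, E, N) == digest_as_int(data)


headers = b"from:sender@example.com\r\nsubject:Hello\r\n"
forged = b"from:ceo@example.com\r\nsubject:Hello\r\n"

signature = sign(headers)
print(verify(headers, signature))  # True: headers match the signature
print(verify(forged, signature))   # False: headers were changed
```

Anyone can run verify, because it needs only N and E, but producing a signature that passes requires D, which never leaves the sender.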

Putting It All Together

Building on the information discussed in this article, I thought it would be fun to get some hands-on experience with DKIM validation! The exercise below demonstrates the process followed by receiving mail servers when they're performing DKIM validation on messages sent by AWeber.

Requirements

  • Operating System
    • The commands below were tested on systems running Mac OS X 10.13.3 (High Sierra) and CentOS 6.8, but should work with little to no modification on most modern Mac and Linux systems.
  • Software
    • The openssl binary is required to run several commands in this demonstration.

Exercise

Consider the following email (including headers and body), which I delivered to myself from AWeber's email servers:

In order to verify that the message is authentic, we need to prove that it was both sent by AWeber's mail servers and not modified in transit.

Step 1: Message Body Verification

The first thing we need to do as a receiver is verify the authenticity of the message body. To achieve this, we must follow the same process that the sender did to generate the body hash (bh=) and verify that the value we produce matches the body hash value that the sender included in the message's DKIM-Signature header. The process for generating the body hash is described in section 3.7 of RFC 6376:

Given those instructions, we'll need to extract the following information from our email in order to perform DKIM verification on the message body:

  1. The cryptographic hash algorithm (a= tag) in the DKIM-Signature header:
  2. The message body canonicalization algorithm (c= tag) in the DKIM-Signature header (remember, the body canonicalization algorithm is the value to the right of the slash):
    The body canonicalization algorithm is described in section 3.4.4 of RFC 6376:

  3. The body length limit (l= tag) in the DKIM-Signature header.

    If this tag is omitted from the DKIM-Signature header, the entire message body is subjected to DKIM verification.

    Use of this tag poses a security risk, as it instructs receivers to only verify the message body up to a certain length. Including it in the DKIM-Signature header means evil-doers could potentially inject fraudulent content into latter portions of the message body while an email is in transit without triggering a DKIM verification failure. This is why AWeber does not make use of the l= tag.
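The risk posed by the l= tag can be sketched in Python. The example assumes the bodies are already canonicalized and uses hypothetical message text. When a length limit is in force, only the first l bytes are hashed, so content appended after that point does not change the body hash.

```python
import base64
import hashlib


def body_hash(canonical_body: bytes, length=None) -> str:
    """Hash the whole body, or only its first `length` bytes if l= is set."""
    signed_part = canonical_body if length is None else canonical_body[:length]
    return base64.b64encode(hashlib.sha256(signed_part).digest()).decode("ascii")


# Hypothetical bodies: the second has extra text appended in transit.
original = b"Your invoice is attached.\r\n"
injected = original + b"Please wire payment to a new account.\r\n"

limit = len(original)  # the value a sender would have placed in l=
print(body_hash(original, limit) == body_hash(injected, limit))  # True: injection unnoticed
print(body_hash(original) == body_hash(injected))                # False: injection detected
```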

With this data in hand, we can proceed with the message body verification process laid out in the RFC. First, we must apply the relaxed body canonicalization algorithm to the email body, then hash the canonicalized body using SHA-256 (the hash function named by rsa-sha256), and finally base64 encode the hash we've generated. This can be achieved by executing the following commands:

Since the result produced matches the value in the DKIM-Signature header's bh= tag, we know that the message body is authentic and hasn't been tampered with in transit!
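For comparison, the same three steps (canonicalize, hash, base64 encode) can be sketched in Python. This is a simplified reading of the body canonicalization rules in RFC 6376, not the commands used in this article. It assumes the body already uses CRLF line endings, and the sample body and function names are hypothetical.

```python
import base64
import hashlib
import re


def simple_body(body: bytes) -> bytes:
    """'simple': only trailing empty lines are ignored."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def relaxed_body(body: bytes) -> bytes:
    """'relaxed': also collapse runs of spaces/tabs and trim line ends."""
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ")
             for line in body.split(b"\r\n")]
    while lines and lines[-1] == b"":
        lines.pop()  # ignore empty lines at the end of the body
    return b"".join(line + b"\r\n" for line in lines)


def compute_bh(body: bytes, body_canon: str = "relaxed") -> str:
    canonical = relaxed_body(body) if body_canon == "relaxed" else simple_body(body)
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


# A relay added spaces and blank lines to this hypothetical body.
received_body = b"Hello   world \r\n\r\n\r\n"
bh_from_header = compute_bh(b"Hello world\r\n")  # stands in for the sender's bh=

print(compute_bh(received_body) == bh_from_header)            # True
print(compute_bh(received_body, "simple") == bh_from_header)  # False
```

The last two lines also show why the choice of canonicalization matters: the whitespace changes are invisible to relaxed but fatal under simple.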

Step 2: Signature Verification

Next, we must verify that the message signature (b=) is valid. As with the body hash, the process for verifying the signature is described in section 3.7 of RFC 6376:

Given this description, we'll need to extract the following information from our email in order to perform DKIM verification on the message signature:

  1. The cryptographic hash algorithm (a= tag) in the DKIM-Signature header:
  2. The header canonicalization algorithm (c= tag) in the DKIM-Signature header (the header canonicalization algorithm is to the left of the slash):

    The relaxed header canonicalization algorithm is described in section 3.4.2 of RFC 6376:

  3. The DKIM domain (d= tag) in the DKIM-Signature header:

  4. The DKIM selector (s= tag) in the DKIM-Signature header:

  5. Based on section 3.6.2.1 of RFC 6376, we know that the public key used in signature verification is obtained by issuing a DNS TXT query to a domain name that incorporates the selector and domain tag values, formatted as selector._domainkey.domain:



    Additionally, section 3.6.1 of RFC 6376 tells us how the public key is stored and formatted in DNS:

    1. The public key itself can be found in the public key data tag (p=) of the DNS query response:
    2. By default, the public key is a base64 encoded, DER-formatted RSA key

Armed with this knowledge, we can proceed with verifying the DKIM signature:

  1. We must first issue a DNS TXT query for 20171212-2048-0c00cfd4._domainkey.aweber.com to obtain the DKIM public key, which is stored in the p= tag of the DNS response:

    Did you notice that the public key (p= tag) in the DNS record returned by our query was split into two double-quoted strings?

    As described in section 3.3.3 of RFC 6376, verifiers must be able to validate signatures made with keys ranging from 512 bits (weaker) to 2048 bits (stronger). Fewer bits means it's easier for bad guys to crack the key. If they manage to do that, they'll be able to forge messages from your domain and sign the emails with your private key. If this sounds bad, it's because it is! In fact, the weakness of 512-bit keys was demonstrated publicly in 2012, when a researcher factored the 512-bit DKIM key then in use for google.com and used it to send Google's founders a spoofed message, complete with a valid DKIM signature!

    In order to safeguard our DKIM keys and maintain the highest standards of security, AWeber uses a 2048-bit key. That additional strength, however, means the base64 encoded key is longer than the 255 characters allowed in a single string of a DNS TXT record. To account for this, we leverage the ability to split up long records into multiple strings, as described in the following provision from RFC 7208 (the SPF specification):

    3.3. Multiple Strings in a Single DNS Record

    As defined in [RFC1035], Sections 3.3 and 3.3.14, a single text DNS record can be composed of more than one string. If a published record contains multiple character-strings, then the record MUST be treated as if those strings are concatenated together without adding spaces. For example:

      IN TXT "v=spf1 .... first" "second string..."

    is equivalent to:

      IN TXT "v=spf1 .... firstsecond string..."

    TXT records containing multiple strings are useful in constructing records that would exceed the 255-octet maximum length of a character-string within a single TXT record.

  2. Since AWeber uses a 2048-bit DKIM key, the public key in the p= value is represented as two separate strings. We'll need to join these together, then use the openssl utility to base64 decode it, thus producing a DER-encoded public key. The key is then saved in binary form to a file called pubkey:

  3. Next, we'll base64 decode the signature tag (b=) from the DKIM-Signature header and save the result (also in binary form) to a file called signature:

  4. With the public key and signature formatted and saved, it's time to canonicalize the headers listed in the header tag (h=Date:To:From:Subject) using the relaxed header canonicalization algorithm. This means all header names will be lower-cased, and all header values will be unfolded (newlines and tab-indentations replaced with a single space) and stripped of leading/trailing whitespace (except for the ending CRLF):

    Curious as to why the trailing \r\n needed to be removed from the dkim-signature header above? I was too, until I scrutinized section 3.4.2 of RFC 6376 a bit more closely! The operative words are in the first sentence:

    The "relaxed" header canonicalization algorithm MUST apply the following steps in order

    During my first attempt at signature validation while writing this article, I'd glossed over that small detail and failed to follow the steps in order. Since the second step instructed me not to strip the CRLF at the end of unfolded headers, I left the trailing \r\n characters at the end of the dkim-signature header, and validation was failing as a result. Once I followed the steps in order, however, the fourth step led me to delete all whitespace from the end of lines with unfolded headers. Strictly speaking, WSP covers only spaces and tabs; the rule that drops the trailing CRLF from the DKIM-Signature header in particular is stated in section 3.7, which requires that header to be hashed "without a trailing CRLF". For reference, the steps are:

    • Convert all header field names (not the header field values) to lowercase. For example, convert "SUBJect: AbC" to "subject: AbC".
    • Unfold all header field continuation lines as described in [RFC5322]; in particular, lines with terminators embedded in continued header field values (that is, CRLF sequences followed by WSP) MUST be interpreted without the CRLF. Implementations MUST NOT remove the CRLF at the end of the header field value.
    • Convert all sequences of one or more WSP characters to a single SP character. WSP characters here include those before and after a line folding boundary.
    • Delete all WSP characters at the end of each unfolded header field value.
    • Delete any WSP characters remaining before and after the colon separating the header field name from the header field value. The colon separator MUST be retained.
  5. With our headers canonicalized, we'll use the signature and public key we formatted and saved earlier to verify the signature in the b= tag of the DKIM-Signature header:

    This successful verification is proof that the message was signed with AWeber's private key. It also tells us that the signed header names and values in the message we received are identical to those that were present when AWeber's mail servers generated the cryptographic signature. Combined with the body hash check from step 1, this gives us strong assurance that the signed portions of the message were not modified in transit!
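The relaxed header canonicalization steps quoted in step 4 can also be sketched in Python. This is a simplified illustration rather than the commands used in the exercise: it assumes each header field is passed in as raw text with CRLF line endings, and the function names are hypothetical. The second function assembles the data that is hashed for the signature. Per section 3.7 of RFC 6376, the DKIM-Signature field goes last, with the value of its b= tag emptied and without a trailing CRLF.

```python
import re


def relaxed_header(field: str) -> str:
    """Apply 'relaxed' header canonicalization to one raw header field."""
    name, _, value = field.partition(":")
    if value.endswith("\r\n"):
        value = value[:-2]               # set the terminating CRLF aside
    value = value.replace("\r\n", "")    # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value).strip(" ")
    return f"{name.strip().lower()}:{value}\r\n"


def signed_data(signed_fields: list, dkim_signature_field: str) -> str:
    """Concatenate the canonicalized fields in the order given by h=."""
    # Empty the b= value (it cannot sign itself) before canonicalizing.
    emptied = re.sub(r"((?:^|;)\s*b=)[^;]*", r"\1", dkim_signature_field)
    canonical_sig = relaxed_header(emptied)[:-2]  # no trailing CRLF
    return "".join(relaxed_header(f) for f in signed_fields) + canonical_sig


print(repr(relaxed_header("SUBJect: AbC\r\n")))       # 'subject:AbC\r\n'
print(repr(relaxed_header("B : Y\t\r\n\tZ  \r\n")))   # 'b:Y Z\r\n'
```

The bytes returned by signed_data (encoded as ASCII) are what the signature in b= is verified against.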

Conclusion

DKIM is an incredibly deep subject that spans topics ranging from DNS to cryptography. While this article isn’t intended as a comprehensive resource on all of DKIM's myriad facets, I hope it has shed some light on why it's important and how it works. At AWeber, DKIM is only one of the many ways we strive to maximize your inbox placement and thwart ne’er-do-wells.

For even more in-depth reading on the topic, http://dkim.org/ is the authoritative resource for all things DKIM. I would also encourage you to familiarize yourself with SPF and DMARC, a pair of technologies closely related to and frequently implemented alongside DKIM (we use all three at AWeber) to offer even stronger protection against illegitimate use of your domains and brands by spammers.