Email analysis

Electronic mail (e-mail) is ubiquitous. Regular communication takes place through mail, spam is sent through mail, phishing attacks are carried out through mail, and scammers send letters.

In addition to the information displayed (the text of the letter, the To and From fields), emails also contain headers with technical and meta information. E-mails may pass through several mail nodes before being delivered to the addressee. As a rule, each mail node adds its own headers, so mail analysis can give interesting information – up to the sender's IP address.

How to find out an IP address from an email

Each mail node records the IP address from which it received the letter, so the sender's IP address is recorded as well. This works well if the sender uses a desktop email client (e.g. Thunderbird, The Bat!). But many users now prefer a web interface for accessing mail (for example, mail.google.com, mail.yahoo.com, mail.yandex.ru). With a web interface, the browser passes the data (for example, the letter being sent) to the mail service, and the mail service sends the letter to the addressee or to an intermediate mail host. The addressee then sees the IP address of the service's server (for example, a mail.google.com server), and the real IP address of the sender is not included in the headers. The bad news is that web interfaces have become quite popular.

The good news is that attackers (spammers and scammers) do not always use the web interface – they need to send a lot of letters and the web interface is simply inconvenient in this case.

More good news: depending on the structure of the sender's network, even an IP address from the sender's local network can end up in the headers. An email may also include the sender's computer name and information about the sender's mail client.

Fake sender's email address

You need to know that anything can be specified as the sender address – absolutely anything. Including your own email address. That is, the sender's email address can be easily spoofed. Some phishing attacks are based on this.

Spoofing the sender's IP address

The IP address and hostname of the previous mail node are recorded by each successive node. Therefore, if you trust a mail node (for example, the letter has already been delivered to a Google server), then its information about the IP of the previous node can be considered reliable. I've seen emails from scammers (one of which we'll tinker with a bit later) that went through one or two untrustworthy hosts before being delivered to a trusted host. For example:

Untrusted node 1 → Untrusted node 2 → Trusted node 1 → Trusted node 2

Information about each node is contained in the headers. In this case, the only untrusted host whose IP address you can trust is the one designated as “Untrusted node 2”, because its address was recorded by “Trusted node 1”. The IP address of “Untrusted node 1” is also in the headers, but it is no longer possible to say whether it is true, or whether any mail servers forwarded the letter to the second untrusted node at all. So only IP addresses recorded by trusted nodes can be considered reliable. All data about nodes earlier in the chain can be spoofed (forged).

Email structure

The structure of an email is not rigid, and variations are possible. In any case, headers come first. In the simplest case, an email consists of headers and a plain text message.
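A minimal sketch of this simplest case, using Python's standard email package. The addresses and the text are made up for illustration. The headers come first, then an empty line, then the body.

```python
from email import message_from_string

# A made-up minimal message: headers, one empty line, then the body.
RAW_MESSAGE = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: Hello\n"
    "\n"
    "This is the plain text body.\n"
)

msg = message_from_string(RAW_MESSAGE)

# Everything before the first empty line is parsed as headers.
header_names = [name for name, _ in msg.items()]
for name, value in msg.items():
    print(f"{name}: {value}")

# Everything after the first empty line is the body.
body = msg.get_payload()
print(body)
```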

By the way, as with the HTTP protocol, these headers should not be confused with the <head> tag in HTML code or with headings on a page (for example, those marked up with <h1>, <h2> tags, and so on). Here, headers are meta information that is part of the message itself and is transmitted together with it over the Simple Mail Transfer Protocol (SMTP).

Modern letters usually have a more complex structure. Headers are used to separate parts of a letter:

Content-Type: ...; boundary="..."

The value of boundary is a unique string that is the delimiter for parts of the letter. This separator is usually followed by headers:

Content-Type: ...; charset=...
Content-Transfer-Encoding: ...

They set:

  • content type
  • character encoding (charset)
  • transfer encoding

Content-Type: …; boundary="…" can be used more than once by setting different delimiters. It turns out something like nested structural elements.

Those parts that are separated by a unique delimiter string are sometimes plain text. Sometimes this is HTML code, which in fact is also text. Files (for example, a document, a photo) can be attached to the letter. In this case, an encoding is used that allows you to translate binary data into text. Below I will show you how to extract files from an email.
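The nested structure can be explored with Python's standard email package. The sketch below first builds a made-up message (plain text, an HTML alternative and a small binary attachment, all hypothetical) and then walks through its parts. For a real letter you would read the saved source instead of calling build_sample().

```python
from email import message_from_bytes
from email.message import EmailMessage


def build_sample():
    """Build a hypothetical multipart message to have something to inspect."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Report"
    msg.set_content("Plain text version")
    msg.add_alternative("<p>HTML version</p>", subtype="html")
    msg.add_attachment(b"\x00\x01binary", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_bytes()


def list_parts(raw):
    """Return (content type, file name or None) for every part, containers included."""
    msg = message_from_bytes(raw)
    return [(part.get_content_type(), part.get_filename()) for part in msg.walk()]


parts = list_parts(build_sample())
for content_type, filename in parts:
    print(content_type, filename)
```

The multipart/* entries in the output are the containers that declare a boundary; the other entries are the actual parts separated by it.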

Plain text or HTML code can also be encoded this way. It all depends on the sender and their email client.

Since even binary files (if they are attached to the letter) are converted into a “text” form, the source file of the electronic message is a text file that can be viewed in any text editor.
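As a small illustration of how binary data becomes text, here is a sketch using base64 from Python's standard library. The eight bytes are the signature that every PNG file starts with; any other binary data would work the same way.

```python
import base64

# The first eight bytes of any PNG file (its signature).
binary_data = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Encoding turns arbitrary bytes into plain ASCII text.
encoded = base64.b64encode(binary_data).decode("ascii")
print(encoded)

# Decoding restores exactly the original bytes.
decoded = base64.b64decode(encoded)
```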

Email headers

Every email message must contain headers. The addresses of the recipient and the sender are recorded in the headers. Depending on the mail program, a letter may contain a different set of headers. Some headers are mandatory, others are used very rarely.

Email clients usually do not display headers. To see them in Gmail, for example, select the “Show original” menu item.

You can also click the “Download message” button to save the original email to your computer.

In Thunderbird, select the letter you are interested in, click the “More” button (located in the area separating the list of letters from the contents of the selected letter), and then select “View Source”.

It is important to know that almost everything in the headers can be forged. All of the data can be fictitious except for the Received: lines added by your own computer or by computers that you absolutely trust.

Some popular email headers will be discussed in more detail at the very end. Now we will dwell on the Received header in detail:

“Received” in email

As already mentioned, you should trust only records added by reliable nodes, and this applies specifically to the Received: header. That is, you need to decide each time whether a given record deserves trust.

The most recent headers are at the top of the email. The lower a Received: header is, the older it is, that is, the earlier it was added.

The typical structure of this header is:

Received: from ... by ... for ...

That is, “from” is followed by the IP address and host name from which the message was received, “by” is followed by information about the host that received the message, and “for” is followed by the email address for which the message is ultimately intended.

Full header example:

Received: from ns39859.ip-91-121-26.eu (prestashopitaliano.it [91.121.26.53]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sendmail8.hostland.ru (Postfix) with ESMTPS id 28E667A00BD for <al@mi-al.ru>; Tue,
  8 Jan 2019 01:26:51 +0300 (MSK)

It follows that the message was received from the host ns39859.ip-91-121-26.eu. In parentheses are the host name obtained by reverse resolution of the IP address and, in square brackets, the IP address itself. The message was received by the host sendmail8.hostland.ru, which runs the Postfix program. Ultimately, this message is intended for the address al@mi-al.ru.
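The same reading can be sketched in Python with a few regular expressions. This is only a heuristic: each mail system formats Received: in its own way, so the patterns below (and the helper name parse_received) are illustrative assumptions and will not fit every header.

```python
import re

# The value of the Received: header from the example above, unfolded into one string.
RECEIVED = (
    "from ns39859.ip-91-121-26.eu (prestashopitaliano.it [91.121.26.53]) "
    "(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) "
    "(No client certificate requested) by sendmail8.hostland.ru (Postfix) "
    "with ESMTPS id 28E667A00BD for <al@mi-al.ru>; "
    "Tue, 8 Jan 2019 01:26:51 +0300 (MSK)"
)


def parse_received(value):
    """Pull the from / by / for parts and the first bracketed IPv4 address out of a Received value."""
    def first(pattern):
        match = re.search(pattern, value)
        return match.group(1) if match else None

    return {
        "from": first(r"\bfrom\s+(\S+)"),
        "by": first(r"\bby\s+(\S+)"),
        "for": first(r"\bfor\s+<?([^\s<>;]+)"),
        "ip": first(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]"),
    }


info = parse_received(RECEIVED)
for key, value in info.items():
    print(f"{key}: {value}")
```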

Finding the source IP address of fraudulent emails

We all have emails in our SPAM folder that we can practice on. I decided to choose the most interesting, a letter with a subject:

This account has been hacked! Change your password right now!

The essence of the letter is that I was hacked and all my stuff was stolen. I didn't read much, but according to this letter, I definitely need to send Bitcoins to the provided address. Oh yeah, the funny thing is, to prove that I really was hacked, this letter “was sent from my address”…

It would be very funny if, judging by recent transactions, almost 13 thousand dollars had not come to this wallet… And there are a lot of different wallet addresses in such mailing lists. One user reported that they sent him his email and router password in an email. Some people take such threats (to send damaging captured data) seriously.

We should keep in mind from the start that the IP addresses of hacked computers, devices and servers were almost certainly used. For example, quite a few people have reported receiving this letter with the same wallet address. Some of them cite information from the headers, and it is different every time. It is different for me too.

Many found the inscription “Detective” with numbers in the headers – apparently, this is a feature of the software used:

  • Detective26583 (unknown [123.24.206.222]), loft9675.serverprofi24.eu
  • Detective82182 IP address 113.160.165.88
  • Detective12523 (unknown [123.21.91.137]) (Authenticated sender: poohmike@com2com.ru)
  • Detective51686, via mail.inspectraperu.com
  • from [171.224.231.144] (helo=Detective75617) by mail.uos.net.ua with esmtpsa
  • Received: from Detective54508 ([85.41.190.130])
  • (mail.suspicious.org [104.131.63.74]) Detective70638 ( [85.96.174.183]) (Auth. sender: tidepool)
  • Received: from Detective31682 (unknown [41.242.143.42])
  • Detective06782 ([88.202.121.136]) by smtpcmd06.ad.aruba.it
  • X-Source-Sender: (Detective71582) [196.1.125.114]:37299 X-Source-Auth: mark@nmclippers.com
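A quick way to spot this marker in saved headers is a regular expression. The sketch below assumes the marker is always the word “Detective” followed by five digits, which holds for the samples above but is not a documented rule. It also collects IPv4 addresses written in square brackets.

```python
import re

# A few of the header fragments quoted above.
SNIPPETS = [
    "Detective26583 (unknown [123.24.206.222]), loft9675.serverprofi24.eu",
    "from [171.224.231.144] (helo=Detective75617) by mail.uos.net.ua with esmtpsa",
    "Received: from Detective54508 ([85.41.190.130])",
    "X-Source-Sender: (Detective71582) [196.1.125.114]:37299 X-Source-Auth: mark@nmclippers.com",
]

MARKER = re.compile(r"\bDetective\d{5}\b")          # assumed pattern: five digits
IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")  # IPv4 address in square brackets


def scan(line):
    """Return the marker found in the line (or None) and all bracketed IPv4 addresses."""
    marker = MARKER.search(line)
    return (marker.group(0) if marker else None, IPV4.findall(line))


results = [scan(snippet) for snippet in SNIPPETS]
for marker, addresses in results:
    print(marker, addresses)
```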

We are only learning to parse headers here. The IP addresses found in this case almost certainly do not belong to the scammer.

Look at the email headers:

As we remember, the earliest headers are at the bottom. The lowest Received: header is:

Received: from [82.193.112.236] (helo=Detective20033) by ns39859.ip-91-121-26.eu with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.84) (envelope-from <al@mi-al.ru>) id 1gge8Y-00013H-39 for al@mi-al.ru; Tue, 08 Jan 2019 00:16:50 +0100

Let’s clean up the excess:

Received: from [82.193.112.236] (helo=Detective20033) by ns39859.ip-91-121-26.eu for al@mi-al.ru

That is, the letter was initially sent from the IP address 82.193.112.236. This is not surprising: I looked at that router, and its menu has no option to change the factory password at all! The router also supports VPN, including in server mode, so it could easily have been used as an intermediate node. The router could have been hacked from the Internet, or its Wi-Fi password could have been cracked.

Someone is constantly connecting to the router – I saved the connection log for about the time when the letter was sent.

Look at the following header:

Received: from ns39859.ip-91-121-26.eu (prestashopitaliano.it [91.121.26.53]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sendmail8.hostland.ru (Postfix) with ESMTPS id 28E667A00BD for <al@mi-al.ru>; Tue,
  8 Jan 2019 01:26:51 +0300 (MSK)

Clean up the excess:

Received: from ns39859.ip-91-121-26.eu (prestashopitaliano.it [91.121.26.53]) by sendmail8.hostland.ru (Postfix) for <al@mi-al.ru>

Here sendmail8.hostland.ru is a server that I trust, so this “Received:” line can be considered valid. You can be sure that this letter actually passed through the mail node ns39859.ip-91-121-26.eu with the IP address 91.121.26.53. As for the very first “Received:” line discussed above, it can be either real or spoofed (fake).

There are further headers:

Received: from sendmail8.hostland.ru (sendmail8.hostland.ru. [185.26.123.238])
        by mx.google.com with ESMTPS id 9-v6si51001889ljo.136.2019.01.07.14.26.50
        for <proghoster@gmail.com>
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 07 Jan 2019 14:26:50 -0800 (PST)

Then:

X-Received: by 2002:a2e:58b:: with SMTP id 133-v6mr6235012ljf.127.1546900010472;
        Mon, 07 Jan 2019 14:26:50 -0800 (PST)

And finally, the letter is delivered:

Received: by 2002:a19:6750:0:0:0:0:0 with SMTP id e16csp4171758lfj;
        Mon, 7 Jan 2019 14:26:50 -0800 (PST)

But they only show the path of the letter through trusted mail nodes – that is, there is nothing interesting there.

Conclusions: the source of the letter was the IP address 82.193.112.236 or the host ns39859.ip-91-121-26.eu (91.121.26.53). Although the letter was intended for an email address on the *@mi-al.ru domain, it eventually arrived at the Gmail mail server (mx.google.com), and for the address proghoster@gmail.com, that is, the mail was forwarded.
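The reasoning of this section can be sketched in Python. The Received: headers below are shortened versions of the ones analyzed above, and TRUSTED_HOSTS is an assumption: it lists the servers that the reader has decided to trust. A record counts as reliable only while every newer record was written by a trusted host. This is a simplification, since a forged header that names a trusted host in its “by” part would fool it.

```python
import re
from email import message_from_string

# Shortened copies of the three Received: headers discussed above, newest first.
RAW = """\
Received: from sendmail8.hostland.ru (sendmail8.hostland.ru. [185.26.123.238])
        by mx.google.com with ESMTPS for <proghoster@gmail.com>;
        Mon, 07 Jan 2019 14:26:50 -0800 (PST)
Received: from ns39859.ip-91-121-26.eu (prestashopitaliano.it [91.121.26.53])
        by sendmail8.hostland.ru (Postfix) with ESMTPS for <al@mi-al.ru>;
        Tue, 8 Jan 2019 01:26:51 +0300 (MSK)
Received: from [82.193.112.236] (helo=Detective20033)
        by ns39859.ip-91-121-26.eu with esmtpsa for al@mi-al.ru;
        Tue, 08 Jan 2019 00:16:50 +0100
Subject: test

body
"""

# Assumption: the hosts that the person doing the analysis trusts.
TRUSTED_HOSTS = {"mx.google.com", "sendmail8.hostland.ru"}


def classify_hops(raw):
    """Walk the Received headers from newest to oldest and flag the reliable ones."""
    msg = message_from_string(raw)
    hops = []
    chain_ok = True  # stays True only while every newer hop was written by a trusted host
    for value in msg.get_all("Received", []):
        receiver = re.search(r"\bby\s+(\S+)", value)
        ip = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", value)
        by_host = receiver.group(1) if receiver else None
        chain_ok = chain_ok and by_host in TRUSTED_HOSTS
        hops.append({
            "by": by_host,
            "from_ip": ip.group(1) if ip else None,
            "reliable": chain_ok,
        })
    return hops


hops = classify_hops(RAW)
for hop in hops:
    status = "reliable" if hop["reliable"] else "may be forged"
    print(f'{hop["from_ip"]} -> {hop["by"]}: {status}')
```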

How to extract files from email

Sometimes a file attached to a letter cannot be downloaded. For example, if Gmail recognizes a file as malicious, it does not allow you to download it from the web interface. To get such files (for example, for analysis), you can download the full mail with headers and extract the attachments from it.

By default, when displaying headers, Gmail truncates long emails, so to be able to extract attachments, click on the “Download original” link:

The easiest way to extract the files and view the email is to save it with the .eml extension and open it with any email client (e.g. Thunderbird, The Bat!).

You can explore and extract attachments manually. An example of headers that describe an attachment:

Content-Type: text/plain; name="Список-Физика.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="Список-Физика.txt"

The attachment is named “Список-Физика.txt” (“List-Physics.txt”) and is encoded in base64. To decode it, you can use the Linux utility of the same name, base64. Save the lines that encode the attachment into a separate file, for example FILE_FROM_ATTACHMENT. Then run a command like the following (the output is redirected straight to a file, because passing binary data through echo would corrupt it):

base64 --decode FILE_FROM_ATTACHMENT > EXTRACTED_FILE

But base64 encoding is not always used and the described method requires manual analysis and actions, so it's easier to use the mu program.

To install mu on Kali Linux, Debian, Ubuntu, Linux Mint and their derivatives (the package is called maildir-utils there):

sudo apt install maildir-utils

To install mu on Arch Linux, BlackArch, Manjaro and their derivatives:

sudo pacman -S mu

To display email attachments without extracting them, run a command like this:

mu extract LETTER

Replace LETTER with the path to the email file.

For example, the file is called original_msg.txt, then the command is:

mu extract original_msg.txt

Sample output:

MIME-parts in this message:
  1 <none> text/plain [< none >] (0,4 kB)
  2 <none> text/html [< none >] (0,6 kB)
  3 Допустимые налоговые схемы. Налоговые проверки 2019.doc application/msword [attach] (75,0 kB)

Lines with [attach] show email attachments. In this case, the file is called “Допустимые налоговые схемы. Налоговые проверки 2019.doc”, its type is application/msword, that is, it is an MS Word document and its size is 75.0 kB.

The other two lines talk about the elements of the email, the type of which is text/plain (plain text) and text/html (HTML code) – this is the text that is displayed as a message, the email itself.

To extract the files attached to the letter (those marked with [attach]), use the command:

mu extract original_msg.txt -a

Replace original_msg.txt with the path to your saved email.

If you want to save everything: both the files attached to the email and the text of the email itself, then use the --save-all option:

mu extract original_msg.txt --save-all
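If mu is not at hand, attachments can also be extracted with Python's standard email package. This is a sketch, not a replacement for mu: the function name extract_attachments and the sample message are made up for illustration. The library decodes base64 and quoted-printable itself, so no manual decoding is needed.

```python
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def extract_attachments(raw, out_dir):
    """Save every part that has a file name into out_dir and return the saved paths."""
    msg = message_from_bytes(raw, policy=policy.default)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts, not file data
        name = part.get_filename()
        if not name:
            continue  # the text of the letter itself has no file name
        # Keep only the last path component so a crafted name cannot escape out_dir.
        target = out / Path(name).name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved


# A hypothetical message with one binary attachment, used only to try the function.
sample = EmailMessage()
sample["Subject"] = "Sample"
sample.set_content("See the attached file.")
sample.add_attachment(b"\x00\x01binary", maintype="application",
                      subtype="octet-stream", filename="data.bin")

with tempfile.TemporaryDirectory() as tmp:
    saved = extract_attachments(sample.as_bytes(), tmp)
    saved_names = [path.name for path in saved]
    saved_data = [path.read_bytes() for path in saved]

print(saved_names)
```

For a saved letter, the raw bytes would come from the file instead, for example Path("original_msg.txt").read_bytes(). Files extracted from suspicious letters should be handled as potentially malicious.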

Description of email headers

Let's look at some common email headers. Again, all of them can be spoofed, so only Received: lines generated by a service on your computer or by a trusted server can be fully trusted.

From

Shows who sent the message. It is easily faked and is the least reliable header.

Subject

Here's what the sender put in as the subject of the email.

Date

Shows the date and time the email was created.

To

Shows who this message is addressed to, but may not include the recipient's address.

Return-Path

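The address to which delivery failure notifications (bounces) are sent. It is taken from the sender address given in the SMTP session. It is not the same as “Reply-To:”, which sets the address for replies.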

Envelope-To and Envelope-From

These headers, like To and From, indicate the recipient and sender of the email. But the To and From headers are for the person reading the email or for the mailer. The SMTP protocol, which is used to send messages, passes the sender and recipient addresses in its own commands (MAIL FROM and RCPT TO), and the information from these commands can be added to the Envelope-To and Envelope-From headers.
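One simple check that follows from this is to compare the domain shown in From with the domain of the envelope sender. The sketch below looks for the envelope sender in Envelope-From and falls back to Return-Path, which also carries the SMTP sender address. The message and all addresses are made up. A mismatch is only a hint, because legitimate mailing services often use different domains too.

```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message in which the visible sender and the envelope sender differ.
RAW = (
    "Return-Path: <bounce@mailer.example.net>\n"
    "From: Support <support@example.com>\n"
    "To: user@example.org\n"
    "Subject: Test\n"
    "\n"
    "Body\n"
)


def domain_of(header_value):
    """Return the lower-cased domain of the address in a header value, or '' if none."""
    _, address = parseaddr(header_value or "")
    return address.rpartition("@")[2].lower()


msg = message_from_string(RAW)
from_domain = domain_of(msg["From"])
envelope_domain = domain_of(msg["Envelope-From"] or msg["Return-Path"])

mismatch = bool(from_domain and envelope_domain and from_domain != envelope_domain)
print(from_domain, envelope_domain, mismatch)
```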

Delivery Date

Shows the date and time when the email was received by the mail service or mail client.

Received

Received is the most important part of an email header, and usually the most reliable. These lines form a list of all servers and computers that the message passed through to reach you. The Received lines are best read from bottom to top: the first “Received:” line was added by your own system or mail server, and the last “Received:” line shows where the email came from. Each mail system has its own style of “Received:” line. Each individual “Received:” line usually indicates the machine from which the mail was received and the machine that received it.

DKIM-Signature and DomainKey-Signature

The information in these headers is used for email authentication and spam control. Quotes from Wikipedia:

DomainKeys Identified Mail is an E-mail authentication method designed to detect forged messages sent by email. The method allows the recipient to verify that the message was indeed sent from the declared domain. DKIM makes it easy to fight fake sender addresses that are often used in phishing emails and email spam.

DomainKeys Identified Mail (DKIM) combines several existing anti-phishing and anti-spam techniques to improve the classification and identification of legitimate email. Instead of a traditional IP address, DKIM adds a digital signature associated with the organization's domain name to determine the sender of a message. The signature is automatically verified on the recipient's side, after which “white lists” and “black lists” are applied to determine the sender's reputation.

DomainKeys technology uses domain names to authenticate senders. DomainKeys uses the existing Domain Name System (DNS) to communicate public encryption keys.

DomainKeys is an e-mail authentication system designed to verify the sender's domain name and the validity of e-mail. The DomainKeys specification inherits aspects of Identified Internet Mail to create an extended protocol called DomainKeys Identified Mail (DKIM). These combined specifications served as the basis for the IETF working group to develop the standard. DomainKeys is now deprecated.
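A DKIM-Signature value is a list of tag=value pairs separated by semicolons. The sketch below splits such a value into a dictionary and builds the DNS name under which the public key is published (selector, then “_domainkey”, then the domain). The sample value is made up and its bh and b fields are placeholders, so it does not verify anything. Checking the signature itself requires a dedicated library.

```python
def parse_dkim_signature(value):
    """Split a DKIM-Signature value into a dictionary of tags."""
    tags = {}
    for item in value.split(";"):
        if "=" not in item:
            continue
        # Split at the first "=" only: base64 values may end with "=" padding.
        name, _, tag_value = item.partition("=")
        # Folded headers may carry whitespace inside values, so remove all of it.
        tags[name.strip()] = "".join(tag_value.split())
    return tags


# A made-up signature; the bh and b values are placeholders.
SAMPLE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector1;\n"
    " h=from:to:subject:date; bh=AAAA; b=BBBB"
)

tags = parse_dkim_signature(SAMPLE)

# The public key is published in a DNS TXT record under this name.
dns_name = f'{tags["s"]}._domainkey.{tags["d"]}'
print(dns_name)
```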

Message-id

A unique string assigned by the mail system when the message is first created. Can be easily faked.

Mime-Version

MIME (Multipurpose Internet Mail Extensions) is a standard that describes the transmission of various types of data via e-mail, as well as, in general, a specification for encoding information and formatting messages so that they can be sent over the Internet.

Content-Type

It will usually tell you the format of the message, such as html or plaintext.

X-Spam-Status

Displays the spam rating generated by your service or email client.

X-Spam-Level

Displays the spam score typically generated by your service or email client.

Message Body

This is the content of the letter that is displayed to the recipient.

X-Received

X-Received is a non-standard header (as opposed to Received) added by some mail user agents or mail transfer agents, such as Google's SMTP servers.

X-PHP-Originating-Script

If the email is sent by a PHP script, then this header may contain the name of this PHP script. Example:

X-PHP-Originating-Script: 6603:class-phpmailer.php

X-Mailer

Contains information about the program that sent the email. It can be added by a PHP script, among others. Example:

X-Mailer: PHPMailer 5.2.22 (https://github.com/PHPMailer/PHPMailer)

X-Originating-IP

Contains the IP address of the computer that sent this email. If you can't find the X-Originating-IP header, then navigate through the Received headers to find the source IP address as shown above.
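This lookup order can be sketched in Python: take X-Originating-IP if it is present, otherwise take the first bracketed IPv4 address found when walking the Received headers from the bottom. The function name and the sample with X-Originating-IP are made up for illustration. As with everything in the headers, the result is only a candidate and may be forged.

```python
import re
from email import message_from_string

IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def candidate_source_ip(raw):
    """Return (ip, header name) for the most likely source address, or (None, None)."""
    msg = message_from_string(raw)
    explicit = msg.get("X-Originating-IP")
    if explicit:
        return explicit.strip().strip("[]"), "X-Originating-IP"
    # Received headers are stacked newest first, so start from the bottom.
    for value in reversed(msg.get_all("Received", [])):
        match = IPV4_IN_BRACKETS.search(value)
        if match:
            return match.group(1), "Received"
    return None, None


# The lowest Received: header from the letter analyzed above.
RAW_RECEIVED_ONLY = (
    "Received: from [82.193.112.236] (helo=Detective20033) "
    "by ns39859.ip-91-121-26.eu for al@mi-al.ru\n"
    "Subject: x\n"
    "\n"
    "body\n"
)

# A made-up message that carries X-Originating-IP.
RAW_WITH_HEADER = "X-Originating-IP: [203.0.113.7]\n" + RAW_RECEIVED_ONLY

from_received = candidate_source_ip(RAW_RECEIVED_ONLY)
from_header = candidate_source_ip(RAW_WITH_HEADER)
print(from_received, from_header)
```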

Online service for email analysis

This is a fairly simple email analysis service. Paste the full mail source with headers into it, and it will show the chain of mail nodes through which the email has passed, as well as a list of attachments.

Service address: https://suip.biz/ru/?act=email

If you have suggestions for the service, for example, what other important and interesting headers should be added to this short report, then write here in the comments.

Conclusion

So, in this article, we learned to understand email headers. Although many people now use the web interface, which usually does not contain such interesting information as the sender's IP address, in a corporate environment it is quite common to use mail programs installed on a computer. Analysis of such letters may reveal more interesting information. Usually, emails from spam mailing lists and various malicious mailing lists are also sent without using the web interface, which also makes the analysis of such emails more interesting.

In this article, we also learned how to extract malicious files from emails if web-based email services do not allow them to be downloaded.
