TLS fingerprinting: methods for identifying client and server software

Contents

1. TLS fingerprinting: methods for identifying client and server software

1.1 What is TLS fingerprinting and what is it used for

1.2 What else is interesting about TLS fingerprinting?

1.3 How TLS fingerprinting works

1.4 How TLS fingerprinting hashes are calculated

1.5 Distinguish between TLS fingerprinting of clients and servers

1.6 Weaknesses of TLS fingerprinting

1.7 Perhaps, Encrypted Client Hello / ECH will be able to (partially) prevent TLS fingerprinting

2. TLS fingerprinting of clients: hash types, utilities for displaying TLS fingerprints of clients

3. TLS fingerprinting of servers: hash types, utilities for displaying TLS fingerprints of servers

4. How to change TLS fingerprints and impersonate other applications. How to bypass filtering based on TLS fingerprinting


1.1 What is TLS fingerprinting and what is it used for

TLS fingerprints make it possible to determine the type of client or server software with a high degree of accuracy, and the first packet of the connection is enough to do so.

TLS fingerprinting is a method for identifying individual differences in software that uses Transport Layer Security (TLS). Thanks to these differences, you can distinguish a web browser from the cURL utility with a high degree of probability, determine the vendor and version of a web browser, identify a Tor user, malware, or a server controlling a botnet, and so on. None of this requires decrypting the traffic or any special interaction with the client or server: the data as transmitted is enough.

Currently, most clients and servers use HTTPS (TLS), which means that TLS fingerprinting can be applied to all of us.

TLS fingerprinting is characterized by the following features:

  • TLS fingerprints are difficult to hide or spoof (without specialized software)
  • all software that uses TLS (the cryptographic protocols that secure communications over a computer network) is subject to TLS fingerprinting
  • TLS fingerprinting is applicable to both clients and servers
  • TLS fingerprinting can be performed by passively listening to traffic (or by analyzing files with captured traffic); active scanning or other interaction with clients and servers is usually not required
  • different software groups usually have different TLS fingerprints. Moreover, these fingerprints can be unique to certain software versions. That is, with the help of TLS fingerprinting, it is possible to determine not only that a request was made by the Firefox web browser, but also its version
  • although TLS cryptographic protocols provide encrypted transmission, decryption of traffic is not required to collect TLS fingerprints. TLS fingerprinting uses the initial traffic of TLS protocols, which is transmitted in unencrypted form

Of course, TLS fingerprinting has its weaknesses, but at the same time this is a very interesting topic, which we will study in this series of articles.

Typical applications of TLS fingerprinting:

  • protection of sites from bots (for example, Cloudflare uses TLS fingerprinting to filter out obvious bots)
  • protection from DDoS attacks with the ability to quickly filter out clients generating malicious traffic
  • detection of Command and Control infrastructure (also known as C2 or C&C) – similar servers that manage malware will most likely have the same TLS fingerprint hashes, which are different from other software
  • the appearance (or surge) of clients with atypical TLS signatures in a corporate environment may indicate a compromise

1.2 What else is interesting about TLS fingerprinting?

The principles of identifying network clients and servers using encryption, which are used in TLS fingerprinting, can be used to identify traffic and software in other areas. For example, similar methods of identifying clients and servers are implemented for SSH. Apparently, to one degree or another, this is also applicable to VPN traffic.

And, as if TLS fingerprinting were not enough, even for web traffic there is another way to identify clients – HTTP/2 fingerprinting.

And all this gives reason to think – what other (unknown to us) identification methods exist?

Note: HTTP/2 fingerprinting will also be discussed in one of the following article series on Miloserdov.org – stay in touch!

TLS fingerprinting has already been implemented by many companies whose business involves network services. As for when it appeared: in 2009 there was already a publicly available Apache module, mod_sslhaf, that performs passive SSL fingerprinting of clients: https://github.com/ssllabs/sslhaf. In other words, at a time when few people used HTTPS, TLS fingerprinting had already been invented and even published as public code.

JA3 became open source in 2017 (using ideas publicly discussed since 2015).

If you are interested in the history of TLS fingerprinting, you can find a little more here: https://blog.hqcodeshop.fi/archives/473-JA3-TLS-fingerprinting-with-Wireshark.html

1.3 How TLS fingerprinting works

Source of this section: https://lwthiker.com/networks/2022/06/17/tls-fingerprinting.html

TLS is an evolution of SSL, the protocol that was previously responsible for handling encrypted connections between web clients and servers. SSL is no longer widely used, but its name is still incorrectly used to refer to TLS.

Whenever a web client – a browser, script, or command line tool – accesses a site encrypted with TLS (https://…), it first performs a TLS handshake with the server. Here is a schematic diagram from Wikipedia:

The first message is the TLS Client Hello, sent from the client to the server. In this message, the client advertises to the server which parts of the TLS protocol it supports. Here are some examples of the parameters sent by the client:

  • The TLS protocol versions supported by the client (TLS 1.0 through TLS 1.3).
  • The cryptographic algorithms the client supports for encrypting data, known as cipher suites.
  • The cryptographic algorithms the client supports for digital signatures.

As is often the case, each client uses a different TLS library: Firefox uses NSS, Chrome uses BoringSSL, Safari uses Secure Transport, and Python uses OpenSSL. As a result, the above parameters vary significantly between clients. Here is an example of the list of cipher suites advertised by Chrome in the TLS client hello, as captured by Wireshark:

This list – its contents and the order of the ciphers – varies depending on the TLS client being used. In addition, TLS is such a complex protocol that it has many extensions, each with its own set of additional parameters. Here are some examples:

  • Some clients support compression of exchanged certificates using a dedicated TLS extension.
  • Some clients support negotiation of parameters for the underlying protocol (e.g. HTTP/2) using a dedicated TLS extension called ALPS.
  • Some clients add a fake TLS extension called GREASE.

Here is what the list of Chrome TLS extensions looks like in Wireshark (note that if you analyze the captured traffic in Wireshark and look at the TLS extensions in Chrome, the order of the extensions will be different!):

The list of extensions above is different for each browser, and the order of the extensions may also be different.

Below is a comparison table showing the notable differences in TLS signatures of common clients:

Parameter                      | Chrome | Safari | Firefox | Python
No. of cipher suites           | 16     | 27     | 17      | 43
No. of signature algorithms    | 8      | 11     | 11      | 20
ALPS extension                 | Yes    | No     | No      | No
Certificate compression method | Brotli | Zlib   | None    | None
GREASE extension               | Yes    | Yes    | No      | No

Note: Chrome 101, Firefox 100, Safari 15.4, Python 3.8.10 with OpenSSL 1.1.1f and requests library. Source: https://lwthiker.com/networks/2022/06/17/tls-fingerprinting.html

With this in mind, it is clear that web clients can be easily distinguished by their TLS fingerprints. What is remarkable is that all this information is available to the server in the very first packet of the session. This way, the server can determine which client is connecting even before responding with any data. Moreover, this data is transmitted unencrypted, so any third-party eavesdropper on the network can do the same.
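The claim that the advertised parameters come from the TLS library can be checked locally. The following is a minimal sketch, assuming a standard CPython build linked against OpenSSL; it lists the cipher suites that Python's default client context has enabled. The function name enabled_cipher_ids is made up for this example, and the enabled list is only an approximation of what actually appears in the Client Hello (the library may add or reorder entries when building the message).

```python
import ssl


def enabled_cipher_ids():
    """Return (code, name) pairs for ciphers enabled in a default client context."""
    context = ssl.create_default_context()
    pairs = []
    for cipher in context.get_ciphers():
        # OpenSSL reports a 32-bit identifier; the low 16 bits are assumed here
        # to be the two-byte cipher suite code that is sent on the wire.
        pairs.append((cipher["id"] & 0xFFFF, cipher["name"]))
    return pairs


if __name__ == "__main__":
    for code, name in enabled_cipher_ids():
        print(code, name)
```

Running this on machines with different Python or OpenSSL versions will usually print different lists, which is exactly the kind of difference a fingerprint captures.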

1.4 How TLS fingerprinting hashes are calculated

There are already several hashes for TLS fingerprinting.

The principles of obtaining data for TLS fingerprinting and calculating various hashes have common features:

1) Some part of the data is taken from the TLS handshake – the data is selected according to the following principles:

  • for different groups of applications, the data is different in some way
  • stable (the same) for different connections, that is, the data does not depend on a specific session
  • unencrypted data is used

2) This data is presented as numbers that are combined into strings

3) These strings (fully or partially) are hashed, for example, into an MD5 hash

Let's consider how the JA3 fingerprint is calculated.

JA3 is a popular method used to formalize the concept of a TLS fingerprint. It takes a Client Hello packet and creates a hash that identifies the client. JA3 can be considered obsolete these days, as some applications (like Google Chrome) actively resist fingerprinting, and Google Chrome will return different JA3 values for each connection. However, for understanding how TLS fingerprints are calculated, JA3 is good enough.

JA3 is formed by concatenating multiple Client Hello fields and then hashing them. The fields that are added to the JA3 hash are:

SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat

To be more precise, this is the original formula. The Elliptic Curve field has since been renamed to Supported Groups (the corresponding TLS extension is now called supported_groups). So the formula is now written as:

SSLVersion,Cipher,SSLExtension,SupportedGroupsRegistry,EllipticCurvePointFormat

JA3 uses the decimal values of the following fields in the Client Hello packet: Version, Accepted Ciphers, List of Extensions, Supported Groups, and Elliptic Curve Point Formats. If there are several values in one group, they are separated by a hyphen. Groups are separated by commas.

In JA3, the values are listed in the same order as they appear in the handshake. On the one hand, this is an additional identifying feature; on the other hand, it allows clients to shuffle the order of some values without changing the functionality. Because the string is subsequently hashed, changing the order of any values (even the most insignificant ones) produces a completely different hash. This is used, for example, by the Google Chrome web browser, for which the JA3 value is different for each request (looking ahead, we note that this does not save it from TLS fingerprinting).
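The rules just described can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the function names ja3_string and ja3_hash are invented for the example, the input values in the usage lines are arbitrary, and the GREASE filtering follows the published JA3 description rather than anything stated in this article.

```python
import hashlib

# GREASE placeholder values: 0x0a0a, 0x1a1a, ..., 0xfafa.
# The JA3 description says these are ignored (assumption not taken from this article).
GREASE_VALUES = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, groups, point_formats):
    """Build the comma/hyphen separated JA3 string, preserving handshake order."""

    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE_VALUES)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(groups), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, groups, point_formats):
    """MD5 of the JA3 string, as a lowercase hexadecimal digest."""
    text = ja3_string(version, ciphers, extensions, groups, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Same set of extensions, different order: the two hashes do not match.
first = ja3_hash(771, [4865, 4866], [0, 23, 10], [29, 23], [0])
second = ja3_hash(771, [4865, 4866], [23, 0, 10], [29, 23], [0])
print(first != second)
```

The last lines show why shuffling works as a countermeasure: nothing about the client's capabilities changed, yet the hash did.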

This screenshot highlights the parts of the TLS Client Hello used for JA3:

For example, for the Firefox web browser, the string representing the above values is:

771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-34-51-43-13-45-28-27-65037,4588-29-23-24-25-256-257,0

This is then hashed using MD5 to produce the JA3 signature:

echo -n '771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-34-51-43-13-45-28-27-65037,4588-29-23-24-25-256-257,0' | md5sum

Result:

2d692a4485ca2f5f2b10ecb2d2909ad3
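The same computation can be sketched in Python, which also makes it easy to split the string back into its five groups. The helper names parse_ja3_string and md5_hex are made up for this example. If the string is copied without changes, the printed digest should match the md5sum output above.

```python
import hashlib

# The Firefox JA3 string from the example above, split across lines for readability.
FIREFOX_JA3_STRING = (
    "771,"
    "4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53,"
    "0-23-65281-10-11-35-16-5-34-51-43-13-45-28-27-65037,"
    "4588-29-23-24-25-256-257,"
    "0"
)

FIELD_NAMES = ("version", "ciphers", "extensions", "groups", "point_formats")


def parse_ja3_string(text):
    """Split a JA3 string into its five groups of decimal values."""
    fields = text.split(",")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("a JA3 string must contain exactly five comma-separated groups")
    parsed = {}
    for name, field in zip(FIELD_NAMES, fields):
        # An empty group (for example, no extensions) becomes an empty list.
        parsed[name] = [int(value) for value in field.split("-")] if field else []
    return parsed


def md5_hex(text):
    """Equivalent of `echo -n TEXT | md5sum` for ASCII input."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    parsed = parse_ja3_string(FIREFOX_JA3_STRING)
    for name in FIELD_NAMES:
        print(name, len(parsed[name]), "value(s)")
    print(md5_hex(FIREFOX_JA3_STRING))
```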

JA3 is a de facto standard in this regard and has been integrated into Wireshark, for example.

It is important to note that JA3 does not take into account all the different parameters in the Client Hello. This means that it is possible to have two different Client Hellos with the same JA3 signature.

Also, as already mentioned, Google Chrome has learned to resist JA3, so further variations were invented: JA3N and JA4. We will talk about them in the next part, dedicated to TLS fingerprinting of clients.
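To give an intuition for how a fingerprint can survive shuffling, here is a rough sketch of one possible idea: sort the extension list before hashing, so that the order no longer matters. This only illustrates the general principle; it is not claimed to be the exact JA3N or JA4 algorithm, and the function name order_insensitive_fingerprint is invented for the example.

```python
import hashlib


def order_insensitive_fingerprint(ja3_text):
    """Hash a JA3-style string after sorting the extension group numerically."""
    version, ciphers, extensions, groups, point_formats = ja3_text.split(",")
    # Only the extension group is normalized; the order of the other groups is kept.
    if extensions:
        extensions = "-".join(sorted(extensions.split("-"), key=int))
    normalized = ",".join([version, ciphers, extensions, groups, point_formats])
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Two Client Hello summaries that differ only in extension order.
shuffled_a = "771,4865-4866,0-23-10,29-23,0"
shuffled_b = "771,4865-4866,10-0-23,29-23,0"
print(order_insensitive_fingerprint(shuffled_a) == order_insensitive_fingerprint(shuffled_b))
```

With this normalization the two shuffled variants map to the same hash, while a client that offers a different set of extensions still gets a different one.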

1.5 Distinguish between TLS fingerprinting of clients and servers

TLS fingerprints can be divided into two groups:

  • TLS fingerprints of clients
  • TLS fingerprints of servers

In response to the TLS Client Hello from the client, the server sends a TLS Server Hello. This is also unencrypted data that is used to calculate the server TLS fingerprints.

The client is the application initiating the connection.

The server is the application (or device) listening on the port and responding to the request from the client.

The same application can be both a client and a server – this is usually the case with malware that controls botnets.

1.6 Weaknesses of TLS fingerprinting

1. When changing the software version, TLS fingerprints can change. On the other hand, this is also an advantage, allowing you to determine the exact major version of the software using TLS fingerprinting.

2. Different programs can have the same TLS fingerprints, although this is usually not the case.

3. When using TLS session resumption, one client becomes characterized by 2 TLS fingerprints: 1) TLS fingerprint of the initial connection; 2) TLS fingerprint of reconnection.

4. TLS fingerprints can be spoofed.

5. Server TLS fingerprints are stable for the same client, but can differ between clients, because the Server Hello depends on what the client offered in its Client Hello. For this reason, server TLS fingerprints are sometimes called “TLS session fingerprints”. This feature is taken into account and even exploited by the JARM utility (and the JARM hash of the same name), which is used for more reliable identification of servers by TLS fingerprints.
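Several of these weaknesses mean that the relationship between fingerprints and software is many-to-many: one program can have several fingerprints (new versions, session resumption), and one fingerprint can belong to several programs. A minimal sketch of a lookup structure that allows for this is shown below; the class name FingerprintCatalog and the placeholder strings such as "hash-initial" are hypothetical and stand in for real hashes and labels.

```python
from collections import defaultdict


class FingerprintCatalog:
    """Toy many-to-many mapping between fingerprint hashes and software labels."""

    def __init__(self):
        self._labels_by_hash = defaultdict(set)

    def add(self, fingerprint, label):
        """Record that `label` has been observed with `fingerprint`."""
        self._labels_by_hash[fingerprint].add(label)

    def lookup(self, fingerprint):
        """Return every label known for a fingerprint (possibly more than one)."""
        return sorted(self._labels_by_hash.get(fingerprint, set()))

    def hashes_for(self, label):
        """Return every fingerprint recorded for a label."""
        return sorted(h for h, labels in self._labels_by_hash.items() if label in labels)


catalog = FingerprintCatalog()
# One client, two fingerprints: initial connection and resumed session.
catalog.add("hash-initial", "client A")
catalog.add("hash-resumed", "client A")
# Two different programs that happen to share a fingerprint.
catalog.add("hash-shared", "client B")
catalog.add("hash-shared", "client C")

print(catalog.hashes_for("client A"))
print(catalog.lookup("hash-shared"))
print(catalog.lookup("hash-unknown"))
```

A lookup that returns several labels, or none, is a normal outcome here, which is why a fingerprint match is usually treated as evidence rather than proof.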

In the following parts, we will consider in more detail the hashes of TLS fingerprints and utilities for obtaining them, as well as ways to hide or spoof TLS fingerprints.

1.7 Perhaps, Encrypted Client Hello / ECH will be able to (partially) prevent TLS fingerprinting

The essence of Encrypted Client Hello can be guessed from the name – an encrypted client Hello message.

You can read more about this technology at the following links:

The concern in these articles is not TLS fingerprinting, but the fact that the domain name is also transmitted in cleartext in the TLS handshake (in the Server Name Indication, or SNI, extension). That is, even when DNS-over-HTTPS is in use, data about visited domains still leaks.

See also: How to enable DNS over HTTPS and what it is for

Perhaps, the widespread implementation of Encrypted Client Hello will prevent TLS fingerprinting by an outside observer. But, in any case, the web server decrypts the traffic and can do TLS fingerprinting no matter what.

Currently, as far as I understand the situation, support for Encrypted Client Hello / ECH Protocol is missing in most popular web servers (and both the client and the server need to support this technology at the same time). In short, apparently, only Cloudflare servers have implemented this, so far.

In modern web browsers Firefox and Google Chrome (in others, perhaps, too – I just haven't checked) ECH is implemented and already enabled by default.

Next part: TLS fingerprinting of clients: hash types, utilities for displaying TLS fingerprints of clients
