How to find all passwords and keys in a large number of files

Even a quick look at the following lines is revealing:


It is reasonable to assume that these are passwords: they are meaningless sets of letters and symbols, yet short enough to be remembered or typed in.

Looking at these lines:


These are most likely API keys or some other secret access keys: they are still meaningless sets of characters, but they are too long to be passwords.

Anyone who has ever seen certificates or public and private keys will immediately recognize the following lines as keys or certificates:


The following lines are hashes:


In other words, even a cursory glance at such data suggests its nature. But what if you need to analyze hundreds of gigabytes of data and/or tens of thousands of files? How do you extract all the passwords and keys from that much data without spending years reviewing it manually?

This article is dedicated to this topic. Pawel Rzepa researched leaks of keys used to access cloud platforms. After spending many hours reviewing his findings, he concluded that manually searching terabytes of assorted files for passwords and keys is the wrong approach. As a result, he found a way to automate the search for sensitive information in large amounts of data and wrote the DumpsterDiver program.

How to search for passwords on a computer

In this guide, we will learn how to search for passwords and secrets in arbitrary files, for example in program source code or in other cases that require an individual approach. If you need to solve a more typical problem, such as extracting all passwords saved in a web browser or finding typical sensitive files, see the following articles:

How to search for passwords and keys?

There are two possible scenarios:

  • we know what we need, for example, a private SSH key or Azure Shared key
  • we want to find any passwords and keys that may be present on the computer (server)

Let's start with the first case. Suppose we need to find AWS Secret keys and Azure Shared keys. Here are some examples:


What characteristics do these keys have? First, they have a fixed length: the AWS Secret Key is always 40 bytes long, and the Azure Shared Key is always 66. Second, the keys contain only Base64 characters. Finally, the characters inside a key are highly random. Is it possible to quantify randomness? Yes, it is! Randomness can be measured in bits using Shannon entropy.

Informational entropy

If you want to understand what informational entropy is and how to calculate it, a good explanation is given in this article. Using Claude Shannon's formula, let's compare the entropy of one character (the average amount of information carried by one symbol from the information source) in the following strings:

  • 404e554d243c1a11d13c96b60129504a31b0abd has 3.57 entropy.
  • ChuckNorriscountedtoinfinitytwentytwice has 3.81 entropy.
  • 2r9pAuQxUFAstrWhEy4G4WiVx5iJ74Hja5AWgHq9 has 4.67 entropy.
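For reference, the per-character Shannon entropy behind these numbers is computed from the relative frequency p_i of each distinct character in a string of length N, where character i occurs n_i times:

```latex
H = -\sum_{i} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}
```

For example, in "abcd" each of the four characters has p_i = 1/4, so H = -4 · (1/4) · log2(1/4) = 2 bits, while "aaaa" has H = 0. The more evenly a string uses many different characters, the higher its entropy.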

You can calculate the entropy yourself using this simple script, which is also provided with DumpsterDiver:


Example of calculating the informational entropy of the string BCVjd skd;bNhydfdklkg"lgbn,nbldkjgsd:

python 'BCVjd skd;bNhydfdklkg"lgbn,nbldkjgsd'
The entropy of a character in a string 'BCVjd skd;bNhydfdklkg"lgbn,nbldkjgsd' is 3.37953229264824 bits

One more example:

python 'abcdefghijklmnopqastuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
The entropy of a character in a string 'abcdefghijklmnopqastuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' is 5.661978179679557 bits
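The numbers above can be reproduced with a short Python sketch. This is not the script shipped with DumpsterDiver, only an illustration built on one assumption: as in TruffleHog, only characters from the Base64 alphabet contribute to the sum, while every character counts toward the string length. Under that assumption the space, semicolon, quote and comma in the first example are left out of the sum, which yields the 3.3795 value printed above; applying the formula to every character would give about 3.95 bits for that string.

```python
import math
import string
import sys

# Base64 alphabet: the same characters as base64_chars in DumpsterDiver's config.yaml
BASE64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="


def shannon_entropy(data: str, charset: str = BASE64_CHARS) -> float:
    """Return the average entropy of one character of `data`, in bits.

    Assumption (TruffleHog-style): only characters from `charset` contribute
    to the sum, but every character of `data` counts toward its length.
    """
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__" and len(sys.argv) > 1:
    word = sys.argv[1]
    print(f"The entropy of a character in a string '{word}' is {shannon_entropy(word)} bits")
```

Saved under a name of your choice (for instance entropy_sketch.py, a hypothetical file name), it can be run as python3 entropy_sketch.py 'abcdefghijklmnopqastuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.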

If a string satisfies all of the conditions below, you can say with high probability that we are dealing with a key:

  • The string contains only Base64 characters (in other words, only strings delimited by non-Base64 characters, e.g. by “” or ‘’, are considered).
  • The string’s length is between the MIN_LEN and MAX_LEN values (e.g. if MIN_LEN is 40 and MAX_LEN is 66 bytes, you can find both AWS Secret Keys and Azure Shared Keys).
  • The entropy of a single character of the string is higher than the ENTROPY value (all AWS Secret Keys have an entropy higher than 4.2, while the entropy of an Azure Shared Key is always higher than 5.8).
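The three conditions above translate almost directly into code. The following Python sketch is our illustration, not DumpsterDiver's implementation, and the function names are hypothetical: it extracts maximal runs of Base64 characters, keeps those whose length lies between MIN_LEN and MAX_LEN, and reports the ones whose per-character Shannon entropy exceeds ENTROPY. Because every candidate consists only of Base64 characters, the ordinary Shannon formula is sufficient here.

```python
import math
import re
from collections import Counter

# Maximal runs of Base64 characters; any other character acts as a separator
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]+")


def char_entropy(token: str) -> float:
    """Shannon entropy of one character of `token`, in bits."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def find_key_candidates(text: str, min_len: int, max_len: int, entropy: float) -> list[str]:
    """Return Base64 runs with min_len <= length <= max_len and entropy above `entropy`."""
    found = []
    for token in BASE64_RUN.findall(text):
        if min_len <= len(token) <= max_len and char_entropy(token) > entropy:
            found.append(token)
    return found


sample = '"aws_secret_access_key": "lxRV/uiC4kmZQryIZxSSlQ6xNlZMjo4kn+LnjNiF"'
print(find_key_candidates(sample, 40, 40, 4.2))
```

Here the short words "aws", "secret", "access" and "key" fail the length test, while the 40-character secret key (entropy of roughly 4.57 bits) passes all three conditions.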

These conditions became the basis for automating the process of searching for keys in any text file.

Password identification

Building on this knowledge, DumpsterDiver was created. It was largely inspired by TruffleHog (which also looks for passwords and keys based on string entropy and regular expressions, but only in git repositories).

DumpsterDiver is a tool for finding sensitive data (passwords, hashes, API keys, asymmetric encryption keys) in various types of files.

Key Features:

  • can analyze any text file
  • uses Shannon entropy to search for private keys
  • can search git logs
  • decompresses common archives (e.g. zip, tar.gz, etc.)
  • supports advanced search based on simple rules (details below)
  • searches for hardcoded passwords in source code or regular files
  • fully customizable: you can reduce the number of false positives by specifying clear criteria for what you are looking for
  • can search by regular expressions; supports wildcards
  • available as a Docker image
  • writes output in JSON format

How to install DumpsterDiver

Installing DumpsterDiver on Kali Linux

sudo apt install python3-yaml
git clone
cd DumpsterDiver/
sudo pip3 install -r requirements.txt
python3 --help

Installing DumpsterDiver on BlackArch

sudo pacman -S python-pyaml
git clone
cd DumpsterDiver/
sudo pip3 install -r requirements.txt
python3 --help

Search for strings similar to passwords and keys (with high informational entropy)

To test the program, you can use any folder on your computer or create a dedicated test folder:

mkdir ./source_folder/

In this folder, create a file with an arbitrary name and paste the following into it:

{
    "aws_auth": {
        "aws_access_key_id": "AKIAJIS5NP79GW2AYZHA",
        "aws_secret_access_key": "lxRV/uiC4kmZQryIZxSSlQ6xNlZMjo4kn+LnjNiF"
    },
    "azure_auth": {
        "account": "foobar",
        "key": "M3mmbjOlIZr11OZoULqUWyFA1EpOdZAEcmaC64E/Ft9MRfDEYE7qDJm+9ezGQY15==",
        "container": "assets"
    },
    "mail_auth": {
        "login": "",
        "password": "M5UWx/N-yjuZ"
    }
}

Suppose we are looking for any keys in this file. Then the command will look something like this:

python3 -p ./source_folder/ --level 3

If we are only hunting for AWS Secret Keys, we should use the following command:

python3 -p ./source_folder/ --min-key 40 --max-key 40

An example of running the program against a website's source code, which immediately found private API keys:

Search for data leaks by arbitrary characteristics

Now, let’s say we’d like to find files containing any email address in domain and don’t want to be notified about high-entropy findings (so we have to set the entropy threshold to a value that is practically never exceeded; in our case, 6). For this purpose, we can use the following command:

python3 -p ./source_folder/ -a --entropy 6 --grep-words '**'

Search for files with passwords on disk

Looking for high entropy is quite an effective method, but only when you’re dealing with long strings. For shorter strings of 8–12 characters, this method may generate a lot of false positives. So again, let’s analyze the characteristics of a typical complex password:

  • It is 8–12 characters long.
  • It contains upper and lower case letters, at least one digit and a special character.

To find strings that satisfy these conditions, you can use the Python library to… calculate password complexity, for example, passwordmeter. The following are a few examples of how this works in practice.

Using this approach, we can find all the passwords that follow the best practices for creating passwords. As for the trivial passwords, we leave them for brute force.
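For comparison, the two characteristics listed above can also be checked with plain string tests. The Python sketch below is not passwordmeter and does not compute its 1 to 9 complexity score; it only tests the length (8 to 12 characters) and the presence of an uppercase letter, a lowercase letter, a digit and a special character. As a further simplification, it only looks at double-quoted string literals, and the function names are hypothetical.

```python
import re

# Double-quoted string literals (escaped quotes are not handled in this sketch)
QUOTED = re.compile(r'"([^"]*)"')


def looks_like_password(s: str, min_len: int = 8, max_len: int = 12) -> bool:
    """True if s has the length and character mix of a typical complex password."""
    return (
        min_len <= len(s) <= max_len
        and any(c.isupper() for c in s)
        and any(c.islower() for c in s)
        and any(c.isdigit() for c in s)
        and any(not c.isalnum() and not c.isspace() for c in s)
    )


def find_password_candidates(text: str) -> list[str]:
    """Return the quoted strings in text that pass looks_like_password()."""
    return [s for s in QUOTED.findall(text) if looks_like_password(s)]


line = '"mail_auth": {"login": "", "password": "M5UWx/N-yjuZ"}'
print(find_password_candidates(line))
```

Only "M5UWx/N-yjuZ" passes: "mail_auth" and "password" lack uppercase letters and digits, and "login" and the empty string are too short.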

DumpsterDiver Results

DumpsterDiver scan results are not only printed in the terminal window; all findings are also written in JSON format (by default, to a file called results.json).

The JSON format is quite common and can easily be converted to other formats, for example, to .csv. Detailed output, including any errors, is written to the errors.log file.
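As an illustration of such a conversion, here is a Python sketch that uses only the standard json and csv modules. We have not checked the exact layout of results.json, so the sketch assumes it holds a JSON array of flat objects, one per finding, and uses the union of their keys as CSV columns; the function name json_to_csv is hypothetical.

```python
import csv
import json


def json_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a JSON array of flat objects to CSV and return the number of rows.

    Assumption: the JSON file holds a list of dicts with scalar values.
    """
    with open(json_path, encoding="utf-8") as f:
        findings = json.load(f)

    # The union of all keys, in the order they are first seen, becomes the header
    columns: list[str] = []
    for item in findings:
        for key in item:
            if key not in columns:
                columns.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(findings)  # keys absent from a finding become empty cells
    return len(findings)
```

Usage: json_to_csv("results.json", "results.csv").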

Search for passwords and keys on a computer by certain parameters

There are three ways to tune DumpsterDiver so that its output matches what you need:

  • search levels
  • command-line parameters
  • the config.yaml file

Setting Search Levels

By setting a level, you can limit your findings (e.g. only to long keys, like SSH private keys) and, in the same way, limit the false positives. The level can be set from the command line; below you can find a detailed description of each choice:

  • --level 0 - searches for short (20-40 bytes long) keys, e.g. AWS Access Key ID.
  • --level 1 - (default) searches for typical (40-70 bytes long) keys, e.g. AWS Secret Access Key or Azure Shared Key.
  • --level 2 - searches for long (1000-1800 bytes long) keys, e.g. SSH private keys.
  • --level 3 - searches for any key (20-1800 bytes long); be careful, as it generates lots of false positives.

Customization via command line parameters

  • --min-key MIN_KEY - specifies the minimum key length to be analyzed (default is 20).
  • --max-key MAX_KEY - specifies the maximum key length to be analyzed (default is 80).
  • --entropy ENTROPY - specifies the edge of high entropy (default is 4.3).
  • --grep-words GREP_WORDS [GREP_WORDS …] - specifies the grep words to look for. Multiple words should be separated by space. Wildcards are supported. Requires adding -a flag to the syntax.

There is also a separate script that calculates the entropy of a character in a single word. It will help you better customize DumpsterDiver to your needs. You can run it with the following command:

python3 f2441e3810794d37a34dd7f8f6995df4

You can run it on test passwords to gauge the approximate level of entropy, and then set the informational entropy threshold that the found strings should exceed. This is useful when you know what you're looking for.

The following are a few examples.

When you're looking for AWS Secret Access Key:

python3 -p [PATH_TO_FOLDER] --min-key 40 --max-key 40 --entropy 4.3

When you're looking for Azure Shared Key:

python3 -p [PATH_TO_FOLDER] --min-key 66 --max-key 66 --entropy 5.1

When you're looking for an SSH private key (by default, an RSA private key is written in 76-byte-long strings):

python3 -p [PATH_TO_FOLDER] --min-key 76 --max-key 76 --entropy 5.1

When you're looking for any occurrence of aws_access_key_id or aws_secret_access_key:

python3 -p ./test/ --grep-words '*aws_access_key_id*' '*aws_secret_access_key*' -a

Please note that the wildcards before and after a grep word are used on purpose. This way, expressions like "aws_access_key_id" or aws_access_key_id= will also be reported.
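To see why the surrounding wildcards matter, here is a Python sketch of wildcard grep-word matching built on the standard fnmatch module. It only approximates DumpsterDiver's behaviour: we assume, for illustration, that file content is split on whitespace and each token is compared with each pattern case-sensitively. The function name grep_words is hypothetical.

```python
from fnmatch import fnmatchcase


def grep_words(text: str, patterns: list[str]) -> list[str]:
    """Return whitespace-separated tokens of text that match any wildcard pattern."""
    return [tok for tok in text.split() if any(fnmatchcase(tok, p) for p in patterns)]


config = 'key = "aws_access_key_id"\nexport aws_access_key_id=AKIAJIS5NP79GW2AYZHA'
print(grep_words(config, ["*aws_access_key_id*"]))  # both tokens match
print(grep_words(config, ["aws_access_key_id"]))    # no token matches the bare word
```

Without the wildcards, the quoted name and the assignment would both be missed, because neither token is exactly equal to aws_access_key_id.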

Finding hardcoded passwords

Using entropy to find passwords isn't very effective, as it generates a lot of false positives. This is why DumpsterDiver uses a different approach to find hardcoded passwords: it verifies password complexity using passwordmeter. To customize this search, you can use the following options:

  • --min-pass MIN_PASS - specifies the minimum password length to be analyzed (default is 8). Requires adding -s flag to the syntax.
  • --max-pass MAX_PASS - specifies the maximum password length to be analyzed (default is 12). Requires adding -s flag to the syntax.
  • --pass-complex {1,2,3,4,5,6,7,8,9} - specifies the edge of password complexity, from 1 (trivial passwords) to 9 (very complex passwords) (default is 8). Requires adding -s flag to the syntax.

For example, if you want to find complex passwords (containing uppercase and lowercase letters, a special character and a digit, and 10 to 15 characters long), you can do it using the following command:

python3 -p [PATH_TO_FOLDER] --min-pass 10 --max-pass 15 --pass-complex 8 -s

Skipping specific files

You may want to skip scanning certain files. For that purpose you can use the following parameters:

  • --exclude-files - specifies file names or extensions that shouldn't be analyzed. File extensions must include the . character (e.g. .pdf). Multiple file names and extensions should be separated by spaces.
  • --bad-expressions - specifies bad expressions. If DumpsterDiver finds such an expression in a file, that file won't be analyzed. Multiple bad expressions should be separated by spaces.

If you want to specify multiple file names, bad expressions or grep words using a separate file, you can do it with the following bash trick:

python3 -p ./test/ --exclude-files `while read -r line; do echo $line; done < blacklisted_files.txt`

Configuration via rules.yaml and config.yaml files

Instead of using multiple command line parameters you can specify values for all the above-mentioned parameters at once in config.yaml file.

Advanced search:

DumpsterDiver also supports advanced search. Beyond simple grepping with wildcards, this tool allows you to create conditions. Let's assume you're searching for a leak of corporate emails and are only interested in big leaks that contain at least 100 email addresses. For this purpose, you should edit the rules.yaml file in the following way:

filetype: [".*"]
filetype_weight: 0
grep_words: ["*"]
grep_words_weight: 10
grep_word_occurrence: 100

By the way, the contents of the default rules.yaml file are as follows:

#Rule 1
filetype: [".*"]
filetype_weight: 0
grep_words: ["*pass*", "*secret*"]
grep_word_occurrence: 1
grep_words_weight: 10

Let's assume a different scenario: you're looking for the terms "pass", "password", "haslo", "hasło" (if you're analyzing a Polish company's repository) in a .db or .sql file. You can achieve this by modifying the rules.yaml file in the following way:

filetype: [".db", ".sql"]
filetype_weight: 5
grep_words: ["*pass*", "*haslo*", "*hasło*"]
grep_words_weight: 5
grep_word_occurrence: 1

Note that the rule will be triggered only when the total weight (filetype_weight + grep_words_weight) is >=10.
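The weighting logic can be sketched in Python as follows. This is our reading of the description above, not DumpsterDiver's code: we assume filetype_weight is added when the file extension matches one of the filetype wildcards, grep_words_weight is added when grep words occur at least grep_word_occurrence times (counting whitespace-separated tokens), and the rule fires at a total of 10 or more. The rule is a plain dict mirroring rules.yaml, and rule_triggered is a hypothetical name.

```python
import os
from fnmatch import fnmatchcase

THRESHOLD = 10  # a rule fires when filetype_weight + grep_words_weight >= 10


def rule_triggered(rule: dict, filename: str, text: str) -> bool:
    """Evaluate one rules.yaml-style rule against a file name and its content."""
    score = 0
    ext = os.path.splitext(filename)[1]  # e.g. ".sql"
    if any(fnmatchcase(ext, pattern) for pattern in rule["filetype"]):
        score += rule["filetype_weight"]

    # Count tokens that match any grep word (assumed whitespace tokenization)
    hits = sum(
        1 for token in text.split()
        if any(fnmatchcase(token, word) for word in rule["grep_words"])
    )
    if hits >= rule["grep_word_occurrence"]:
        score += rule["grep_words_weight"]
    return score >= THRESHOLD


polish_rule = {
    "filetype": [".db", ".sql"], "filetype_weight": 5,
    "grep_words": ["*pass*", "*haslo*", "*hasło*"], "grep_words_weight": 5,
    "grep_word_occurrence": 1,
}
print(rule_triggered(polish_rule, "dump.sql", "INSERT INTO users (login, haslo) VALUES"))  # True: 5 + 5
print(rule_triggered(polish_rule, "notes.txt", "haslo: admin1"))  # False: only 5
```

With the Polish-company rule, a grep-word hit alone reaches only 5, so a match in a .txt file is ignored, while the same word in a .sql file reaches the required 10.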

By the way, the default contents of the config.yaml file are as follows:

logfile: './errors.log'
base64_chars: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
archive_types: ['.zip', '.tar.gz', '.tgz', '.tar.bz2', '.tbz']
excluded_files: [
        '.jpg', '.jpeg', '.png', '.gif', '.svg',
        '.mp4', '.mp3', '.webm',
        '.ttf', '.woff', '.eot',
        '.css', '.DS_Store',
]
bad_expressions: []
min_key_length: 40
max_key_length: 66
high_entropy_edge: 4.3
min_pass_length: 8
max_pass_length: 12
password_complexity: 8

Using Docker

A docker image is available for DumpsterDiver. Run it using:

docker run -v /path/to/my/files:/files --rm rzepsky/dumpsterdiver -p /files

If you want to override one of the configuration files (config.yaml or rules.yaml):

docker run -v /path/to/my/config/config.yaml:/config.yaml -v /path/to/my/config/rules.yaml:/rules.yaml -v /path/to/my/files:/files --rm rzepsky/dumpsterdiver -p /files
