Advanced wordlist generation techniques

This article covers dictionary-generation scenarios that you may encounter in practice but that have not yet been described in other articles. Some examples come from questions in the comments or on the forum; others are problems I ran into myself.

We will use not only familiar tools but also a couple of new ones. For some tasks we will go beyond specialized tools: some actions are easier to do with standard Linux utilities or our own scripts.

Since the basics of dictionary generation are not covered here, we start with a list of sources where you can read about them. It is recommended that you read them if you have not already done so.

The basics of generating dictionaries/wordlists

How to create dictionaries by masks using Crunch, Hashcat, maskprocessor and John the Ripper: Programs for generating wordlists

There are many more examples of composing masks in the article: Hashcat manual: how to use the program for cracking passwords

Rule Based Attack

A rule-based attack modifies an existing dictionary according to a specified set of rules. If you want to change the behavior of a mask with a rule-based attack, you first need to generate a dictionary from the mask and then apply the rules to it.

The easiest way is to use Mentalist, a program with a graphical interface. Instructions: Generation and modification of dictionaries according to the specified rules

The rule-based attack in John the Ripper is much more powerful than in Hashcat, so of these two programs I recommend John for this attack: Comprehensive Guide to John the Ripper. Part 5: Rule-based attack

More examples of John's mask attacks: How to create dictionaries that comply with specific password strength policies

Hashcat rule-based attack: https://hashcat.net/wiki/doku.php?id=rule_based_attack
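To make the notion of a rule concrete, here is a minimal Python sketch that applies a handful of the simplest rule functions described on the hashcat wiki page above: : (do nothing), l (lowercase), u (uppercase), c (capitalize), r (reverse), d (duplicate), $X (append X) and ^X (prepend X). The function names are my own, and the sketch is only an illustration: the real hashcat and John the Ripper rule engines support far more functions and may differ in edge cases.

```python
def apply_rule(word: str, rule: str) -> str:
    """Apply a small subset of hashcat-style rule functions to one word."""
    out = word
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in "$^":
            # append ($X) or prepend (^X) the character that follows the function
            if i + 1 >= len(rule):
                raise ValueError(f"rule {rule!r} ends without a character for {op!r}")
            ch = rule[i + 1]
            out = out + ch if op == "$" else ch + out
            i += 2
            continue
        if op == "l":
            out = out.lower()
        elif op == "u":
            out = out.upper()
        elif op == "c":
            out = out[:1].upper() + out[1:].lower()
        elif op == "r":
            out = out[::-1]
        elif op == "d":
            out = out + out
        elif op not in ": ":
            # ':' leaves the word unchanged; a space only separates functions
            raise ValueError(f"unsupported rule function {op!r} in this sketch")
        i += 1
    return out


def apply_rules(words, rules):
    """Yield every word transformed by every rule, word by word."""
    for word in words:
        for rule in rules:
            yield apply_rule(word, rule)
```

For example, the rule "c $1 $2 $3" turns password into Password123, and the rule d turns it into passwordpassword.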

Generation of dictionaries based on information about a person

If a password is based on personal data, for example a combination of first name, last name, date of birth, children's names, phone number, or the same data of next of kin, it can be considered weak. The tools discussed above are not well suited to compiling dictionaries from such information; the closest is Hashcat's Combinator attack, but it accepts only two dictionaries at a time.

This is exactly the problem that the CUPP utility solves.
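CUPP asks its own questions and applies its own, more elaborate transformations, so the following is not its algorithm. It is a minimal Python sketch of the underlying idea only: join personal tokens (names, dates and so on) in every order, in lowercase and capitalized form, and optionally append suffixes. The function personal_candidates and its parameters are hypothetical.

```python
from itertools import permutations, product


def personal_candidates(tokens, suffixes=("",), max_parts=2):
    """Hypothetical CUPP-style generator: join 1..max_parts distinct tokens in every order,
    try the lowercase and Capitalized form of each token, then append each suffix."""
    seen = set()
    for n in range(1, max_parts + 1):
        for combo in permutations(tokens, n):             # distinct tokens, every order
            cased = [(t.lower(), t.capitalize()) for t in combo]
            for parts in product(*cased):                 # every case choice per token
                base = "".join(parts)
                for suffix in suffixes:
                    candidate = base + suffix
                    if candidate not in seen:             # skip duplicates, keep first-seen order
                        seen.add(candidate)
                        yield candidate
```

With the tokens alexey and 1990 and the suffixes "" and "!", this yields 14 candidates, among them Alexey1990! and 1990Alexey.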

Installing CUPP on Kali Linux

sudo apt install cupp

Installing CUPP in BlackArch

sudo pacman -S cupp

Run the program interactively and enter the known user data:

cupp -i

An example of generated passwords:

Compiling wordlists and lists of usernames based on website content

Let's get acquainted with one more tool: CeWL. This program crawls the specified site (the crawl depth is configurable) and sorts all words found on its pages by frequency of use. Why would you need such a dictionary? The author suggests using it for brute force. In addition, the program can collect e-mail addresses and extract the names of the authors of office documents (Word and PDF files are supported). This data can be used to compile a list of usernames.

The program also comes with the FAB utility, which extracts author names from already downloaded Word and PDF documents; these names can also be used as usernames for brute force.

Installing CeWL on Kali Linux

The program is preinstalled in Kali Linux.

In minimal installations of Kali Linux, install the program as follows:

sudo apt install cewl

Installing CeWL in BlackArch

sudo pacman -S cewl
gem install mime mime-types mini_exiftool nokogiri rubyzip spider

Additional installation details are described on the program's card page; it is recommended that you read it.

CeWL launch examples

Collect words from the site https://site.ru, following only the links found at the specified address (-d 1), and save the resulting dictionary to the dic.txt file (-w dic.txt):

cewl https://site.ru -d 1 -w dic.txt

Collect words from the site https://site.ru, following the links found at the specified address as well as on the downloaded pages (-d 2), and save the dictionary to the specified file (-w words.txt). In addition, show how often each word occurs (-c), compile a list of found e-mail addresses (-e) and save it to the specified file (--email_file emails.txt), and create a list from the information found in document metadata (-a), saved to the specified file (--meta_file meta.txt):

cewl https://site.ru -d 2 -w words.txt -a --meta_file meta.txt -e --email_file emails.txt -c
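Crawling is beyond a short example, but the counting step behind the -c output can be sketched for a single page that has already been downloaded. This is an assumption about the general approach, not CeWL's code: it uses Python's standard html.parser to collect visible text (skipping script and style blocks), treats runs of letters as words, and counts them with collections.Counter. The minimum word length of 3 and the function names are arbitrary choices for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def word_frequencies(html: str, min_length: int = 3):
    """Return (word, count) pairs, most frequent first; case is preserved."""
    parser = _TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return Counter(w for w in words if len(w) >= min_length).most_common()
```

Note that this sketch counts Router and router as different words; lowercase the text first if you want them merged.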

Run FAB to check all *.doc documents in the /home/mial/Downloads/ directory, extract the document author field from their metadata, and display it:

ruby /usr/share/cewl/fab.rb /home/mial/Downloads/*.doc

How to create a variable-length masked dictionary

Let's look at generating wordlists of varying lengths with Hashcat and maskprocessor.

Incrementing passwords in Hashcat

In order to generate passwords of different lengths, the following options are available:

 -i, --increment                |      | Enable mask increment mode                           |
     --increment-min            | Num  | Start mask incrementing at X                         | --increment-min=4
     --increment-max            | Num  | Stop mask incrementing at X                          | --increment-max=8

The -i option is optional. When it is used, the length of password candidates is not fixed but grows one character at a time.

The --increment-min option is also optional. It sets the minimum length of password candidates. If the -i option is used, the default value of --increment-min is 1.

The --increment-max option is optional as well. It sets the maximum length of password candidates. If the -i option is specified but --increment-max is omitted, the default is the length of the mask.

Rules for using mask increment options:

the -i option must be specified whenever --increment-min or --increment-max is used
the value of --increment-min must be less than or equal to the value of --increment-max
the mask may be longer than or equal to the --increment-max value, BUT it cannot be shorter than --increment-max.

So, here is the command to generate passwords six to ten characters long:

hashcat -a 3 -i --increment-min=6 --increment-max=10 --stdout ?l?l?l?l?l?l?l?l?l?l
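To see exactly which candidates increment mode produces, here is a minimal Python sketch that expands a mask over increasing lengths, using the first N positions of the mask for length N. It understands only ?l, ?u, ?d, ?? and custom charsets passed in as already expanded strings; the function names are hypothetical, and the candidate order may differ from hashcat's.

```python
from itertools import product

BUILTIN = {
    "l": "abcdefghijklmnopqrstuvwxyz",
    "u": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "d": "0123456789",
    "?": "?",
}


def parse_mask(mask, custom=None):
    """Turn a mask such as '?l?d' into a list of per-position character sets."""
    custom = custom or {}
    positions = []
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            key = mask[i + 1 : i + 2]
            charset = BUILTIN.get(key) or custom.get(key)
            if not charset:
                raise ValueError(f"unknown placeholder ?{key}")
            positions.append(charset)
            i += 2
        else:
            positions.append(mask[i])  # a literal character matches only itself
            i += 1
    return positions


def increment(mask, inc_min, inc_max, custom=None):
    """Yield candidates of length inc_min..inc_max, like -i with --increment-min/max."""
    positions = parse_mask(mask, custom)
    if not 1 <= inc_min <= inc_max <= len(positions):
        raise ValueError("expected 1 <= inc_min <= inc_max <= mask length")
    for length in range(inc_min, inc_max + 1):
        for chars in product(*positions[:length]):  # first `length` positions only
            yield "".join(chars)
```

For example, increment("?1?1", 1, 2, {"1": "ab"}) yields a, b, aa, ab, ba, bb, and increment("?d?d?d?d", 1, 4) yields the same 11110 strings as the maskprocessor example in the next section.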

Incrementing passwords in maskprocessor

The maskprocessor has the following increment option:

  -i,  --increment=NUM:NUM   Enable increment mode. 1st NUM=start, 2nd NUM=stop
                             Example: -i 4:8 searches lengths 4-8 (inclusive)

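The following command will compose a dictionary of all digit strings from one to four characters long, that is, 0 to 9, then 00 to 99, and so on up to 0000 to 9999 (note the leading zeros):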

maskprocessor -i 1:4 ?d?d?d?d

How to crack a hash when nothing is known about the password (all characters)

I have already written about the nuances, so here are just the commands.

If you need to run a brute-force attack where the password can contain uppercase and lowercase Latin letters as well as digits, and is 1 to 12 characters long, use the following options and mask:

-i --increment-min=1 --increment-max=12 -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1?1?1

To list all password candidates or store them in a dictionary:

hashcat --stdout -a 3 -i --increment-min=1 --increment-max=12 -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1?1?1

If you need to run a brute-force attack where the password can contain uppercase and lowercase Latin letters, digits, the space character and the symbols !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, and is 1 to 12 characters long, use the following options and mask:

-i --increment-min=1 --increment-max=12 ?a?a?a?a?a?a?a?a?a?a?a?a

To list all password candidates or store them in a dictionary:

hashcat --stdout -a 3 -i --increment-min=1 --increment-max=12 ?a?a?a?a?a?a?a?a?a?a?a?a
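Before running either of these commands, it is worth estimating the keyspace, because it grows exponentially with length. The sketch below simply sums charset_size^n over the lengths in the increment range. The rate of 10^10 candidates per second is a hypothetical figure for illustration, not a measured speed of any particular hardware.

```python
def keyspace(charset_size: int, min_len: int, max_len: int) -> int:
    """Candidates for a mask that repeats one charset, in increment mode min_len..max_len."""
    return sum(charset_size ** length for length in range(min_len, max_len + 1))


LUD = 26 + 26 + 10       # -1 ?l?u?d
ALL = 26 + 26 + 10 + 33  # ?a = ?l?u?d?s; ?s has 33 symbols including the space

for name, size in (("?l?u?d", LUD), ("?a", ALL)):
    total = keyspace(size, 1, 12)
    years = total / 10**10 / (365.25 * 24 * 3600)  # at a hypothetical 10^10 candidates/s
    print(f"{name}: {total:.3e} candidates, about {years:,.0f} years")
```

By my arithmetic, the ?l?u?d case alone is roughly 3.3 × 10^21 candidates, which at the assumed rate is on the order of ten thousand years.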

How to generate wordlists that necessarily use certain characters or strings

In the comments to articles about generating passwords by masks, I am sometimes asked how to create a dictionary in which certain characters or words must appear, possibly at any position. Masks are actually poorly suited for this. The problem can be solved with a rule-based attack, especially when it comes to individual characters or groups of characters; links covering such cases are given above. But when it comes to strings, a rule-based attack becomes either too complex and confusing, because a large number of rules is needed, or simply impossible.

Let's look at a few examples.

Suppose it is known that a password made of letters (upper and lower case) and digits necessarily contains the word "Alexey", which can appear anywhere in the password and in any case. Instead of creating an insane number of rules, you can generate every possible string and keep only those that contain the desired word, for example (here for 10-character passwords):

maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i Alexey

In my opinion, this is the optimal solution. It is also suitable if you do not want to create a dictionary, but want to use a mask attack – many brute-force programs are capable of accepting password candidates from standard input.
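The same generate-and-filter approach can be reproduced in Python, which may help when the condition is more complex than a grep pattern. In this sketch, mask_candidates plays the role of maskprocessor with a single charset and keep_matching plays the role of grep -E (case-insensitive by default, like grep -i); both names are hypothetical.

```python
import re
from itertools import product


def mask_candidates(charset: str, length: int):
    """Every string of the given length over charset, like maskprocessor with one charset."""
    for chars in product(charset, repeat=length):
        yield "".join(chars)


def keep_matching(candidates, pattern: str, ignore_case: bool = True):
    """Keep only candidates in which the regex matches anywhere (like grep -i -E)."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return (c for c in candidates if rx.search(c))
```

For example, over the tiny alphabet ab1 with length 4, keep_matching(mask_candidates("ab1", 4), "AB1") keeps exactly six strings: aab1, ab1a, ab1b, ab11, bab1 and 1ab1. To require several characters, as with the chained grep commands, nest the calls: keep_matching(keep_matching(candidates, "e", False), "g", False).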

Another variant: the word can be in any case, but it is always located at the very beginning of the password:

maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '^Alexey'

By the way, the last example is not particularly efficient: since we know that the first character can only be “A” or “a”, it is better to use a custom character set containing just these two characters. The same applies to the following characters, at least for four of them (the number of custom character sets available).
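Following the remark about custom character sets, the candidates can also be built directly instead of being filtered. The Python sketch below (function names hypothetical) generates every upper/lower-case spelling of the fixed word and appends every possible tail, so it is not limited by maskprocessor's four custom charsets. For a 10-character password starting with Alexey and a 4-character tail from ?l?u?d, that is 64 × 62^4, or about 9.5 × 10^8 candidates, rather than all 62^10 strings passed through grep.

```python
import string
from itertools import product

ALNUM = string.ascii_lowercase + string.ascii_uppercase + string.digits  # ?l?u?d


def case_variants(word):
    """Yield every upper/lower-case spelling of word (2**number_of_letters variants)."""
    options = [(ch.lower(), ch.upper()) if ch.isalpha() else (ch,) for ch in word]
    for chars in product(*options):
        yield "".join(chars)


def prefixed(word, tail_charset, tail_len):
    """Yield candidates that start with word in any case, followed by tail_len characters."""
    for head in case_variants(word):
        for tail in product(tail_charset, repeat=tail_len):
            yield head + "".join(tail)
```

Usage for the case above: prefixed("Alexey", ALNUM, 4).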

How to create a dictionary that necessarily contains the characters “e”, “g”, “D” and “t”? To do this, use a command of the form:

maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep e | grep g | grep D | grep t

You can add more grep stages to this pipeline to require any number of characters.

How to create a dictionary in which passwords in any place and in any case contain the word “Alexey” or the word “MiAl”? Use a command like:

maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '(Alexey)|(MiAl)'

You can search for any number of strings:

maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '(Alexey)|(MiAl)|(OneMoreString)|(AnotherString)|(EvenMore)'

An example of a command that creates a dictionary in which password candidates consist only of numbers, but the password must contain the sequence “12345” located anywhere:

maskprocessor ?d?d?d?d?d?d?d?d?d?d | grep 12345

I think the idea is clear: instead of trying to create an impossible mask, we generate every possible string and keep only what we need.

How to create combined dictionaries

The term “combined dictionaries” usually refers to dictionaries that contain both a username and a password separated by a specific character (usually a colon or a tab). Here, however, I mean dictionaries built by combining words from different wordlists. We will return to “normal” combined dictionaries later.

This is called a Combinator attack; a detailed description is available here: https://hashcat.net/wiki/doku.php?id=combinator_attack

The idea is that every word from the second dictionary is appended to every word from the first dictionary.

Dictionary 1 (dict1.txt):

yellow
green
black
blue

Dictionary 2 (dict2.txt):

car
bike

Launching a Combinator attack (-a 1):

hashcat -a 1 --stdout dict1.txt dict2.txt

Output:

yellowcar
yellowbike
greencar
greenbike
blackcar
blackbike
bluecar
bluebike

I expected the words to be combined in the reverse order as well (that is, with the word from the second dictionary first), but as you can see, this does not happen. To get that effect, run the attack again with the dictionaries swapped:

hashcat -a 1 --stdout dict2.txt dict1.txt
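The Combinator attack itself is just two nested loops. A minimal Python sketch (the function name is my own) that reproduces the --stdout output above, and the swapped order when the arguments are exchanged:

```python
def combinator(left_words, right_words):
    """Append every word of the right list to every word of the left list (-a 1 style)."""
    for left in left_words:
        for right in right_words:
            yield left + right


dict1 = ["yellow", "green", "black", "blue"]
dict2 = ["car", "bike"]

forward = list(combinator(dict1, dict2))   # yellowcar, yellowbike, greencar, ...
backward = list(combinator(dict2, dict1))  # caryellow, cargreen, carblack, ...
```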

How to combine more than two dictionaries

Below is an example of combining three dictionaries, where each resulting word consists of one word from each of the three dictionaries:

# create intermediate dictionaries by combining each pair of dictionaries in both orders
hashcat -a 1 --stdout dict1.txt dict2.txt > tmp12.txt
hashcat -a 1 --stdout dict2.txt dict1.txt > tmp21.txt
hashcat -a 1 --stdout dict1.txt dict3.txt > tmp13.txt
hashcat -a 1 --stdout dict2.txt dict3.txt > tmp23.txt
hashcat -a 1 --stdout dict3.txt dict2.txt > tmp32.txt
hashcat -a 1 --stdout dict3.txt dict1.txt > tmp31.txt

# combine each intermediate dictionary with the remaining source dictionary;
# every resulting word contains one word from each of the three dictionaries
hashcat -a 1 --stdout tmp12.txt dict3.txt > tmp123.txt
hashcat -a 1 --stdout tmp21.txt dict3.txt > tmp213.txt
hashcat -a 1 --stdout tmp13.txt dict2.txt > tmp132.txt
hashcat -a 1 --stdout tmp31.txt dict2.txt > tmp312.txt
hashcat -a 1 --stdout tmp23.txt dict1.txt > tmp231.txt
hashcat -a 1 --stdout tmp32.txt dict1.txt > tmp321.txt

# collect all generated words into one dictionary, removing duplicates
cat tmp123.txt tmp213.txt tmp132.txt tmp312.txt tmp231.txt tmp321.txt | sort | uniq > full.txt

How do you combine four or more dictionaries in the same way? It is hard for me to imagine this being useful in a real situation, but you would most likely have to write your own script that automates the algorithm shown above. If you know programs that can do this, please write in the comments.

And at this point I remembered the combinator3 program, which comes in the hashcat-utils package. It combines three dictionaries (use combinator to combine two).
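Such a script can be short. Here is a minimal Python sketch (the function name is hypothetical) that generalizes the shell algorithm above to any number of dictionaries: it goes through every order of the dictionaries with itertools.permutations, takes one word from each with itertools.product, and drops duplicates (without sorting). Keep in mind that the output size is n! times the product of the dictionary sizes, and that the set of already seen words is held in memory, so this only suits small dictionaries.

```python
from itertools import permutations, product


def combine_all_orders(dictionaries):
    """One word from each dictionary, in every order of the dictionaries, without duplicates."""
    seen = set()
    for order in permutations(dictionaries):  # every order of the dictionaries
        for words in product(*order):         # one word from each, in that order
            candidate = "".join(words)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate
```

To use it with files, read each dictionary into a list of lines first and pass the lists, for example combine_all_orders([words1, words2, words3, words4]).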

Usage:

combinator3 file1 file2 file3

This program combines the three specified dictionaries, but again, words from the dictionary given third will always be at the end.

To get all possible combinations of three words in any order, you need to use the following commands:

combinator3 file1 file2 file3 > full.txt
combinator3 file1 file3 file2 >> full.txt
combinator3 file2 file1 file3 >> full.txt
combinator3 file2 file3 file1 >> full.txt
combinator3 file3 file1 file2 >> full.txt
combinator3 file3 file2 file1 >> full.txt

How to create all possible combinations for a short list of strings

The combipow utility creates all “unique combinations” from a short input list. This program is also included in hashcat-utils.

Usage:

combipow DICTIONARY

An example of the contents of a dictionary named wordlist:

a
b
c
XYZ
123

Running combipow with this dictionary:

combipow wordlist

will give the following results:

a
b
ab
c
ac
bc
abc
XYZ
aXYZ
bXYZ
abXYZ
cXYZ
acXYZ
bcXYZ
abcXYZ
123
a123
b123
ab123
c123
ac123
bc123
abc123
XYZ123
aXYZ123
bXYZ123
abXYZ123
cXYZ123
acXYZ123
bcXYZ123
abcXYZ123
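The listing follows a simple pattern: it corresponds to counting from 1 to 2^n - 1, where bit i of the counter decides whether line i of the input is included, and the selected lines are concatenated in their original order. A short Python sketch that reproduces the listing above for this input (I have not compared it with combipow on other inputs, so treat it as an approximation):

```python
def combipow(words):
    """Every non-empty subset of words, concatenated in input order (bit i selects words[i])."""
    n = len(words)
    for mask in range(1, 1 << n):
        yield "".join(words[i] for i in range(n) if mask >> i & 1)
```

Since a list of n lines produces 2^n - 1 results, the input list really has to be short.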

PRINCE algorithm combination

The princeprocessor program implements the PRINCE algorithm. You can find out more about this algorithm on the program's card page, which also describes how the program works and its options.

Examples of using princeprocessor.

To create combinations from the contents of the dict1.txt file:

princeprocessor dict1.txt

Combine words from the specified dictionary (dict1.txt) with the PRINCE algorithm, building chains of at least 2 elements (--elem-cnt-min=2) and at most 2 elements (--elem-cnt-max=2), so that each chain contains exactly 2 words:

princeprocessor --elem-cnt-min=2 --elem-cnt-max=2 dict1.txt
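The core idea of the chains can be illustrated with a strongly simplified Python sketch: every sequence of k words from the same dictionary (a word may repeat), for k from elem_min to elem_max, concatenated and filtered by total length. The real princeprocessor also groups words by length and orders its output in its own way, which this sketch does not attempt. The parameter names only loosely mirror the --elem-cnt-min and --elem-cnt-max options shown above, and the default values here are my own choice.

```python
from itertools import product


def prince_like(words, elem_min=1, elem_max=2, pw_min=1, pw_max=16):
    """Simplified PRINCE-style chains: k words (repeats allowed), elem_min <= k <= elem_max."""
    for k in range(elem_min, elem_max + 1):
        for chain in product(words, repeat=k):
            candidate = "".join(chain)
            if pw_min <= len(candidate) <= pw_max:  # keep only candidates of acceptable length
                yield candidate
```

For example, prince_like(["yellow", "blue"], 2, 2) yields yellowyellow, yellowblue, blueyellow and blueblue.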

Hybrid attack – combining Combinator attack and Mask attack

This attack combines a dictionary attack and a mask attack: it takes a dictionary and a mask as input and produces hybrid password candidates.

Hybrid attack description: https://hashcat.net/wiki/doku.php?id=hybrid_attack

If your example.dict contains:

password
hello

Options:

... -a 6 example.dict ?d?d?d?d

generate the following password candidates:

password0000
password0001
password0002
.
.
.
password9999
hello0000
hello0001
hello0002
.
.
.
hello9999

It works the other way too!

Options:

... -a 7 ?d?d?d?d example.dict

generate the following password candidates:

0000password
0001password
0002password
.
.
.
9999password
0000hello
0001hello
0002hello
.
.
.
9999hello
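Both modes can be emulated with a few lines of Python. The sketch below (names hypothetical) supports only a mask that repeats one charset, which is enough to reproduce the two listings above.

```python
from itertools import product

DIGITS = "0123456789"


def hybrid(words, mask_len, charset=DIGITS, mask_first=False):
    """-a 6 style (word + mask) or, with mask_first=True, -a 7 style (mask + word)."""
    for word in words:
        for chars in product(charset, repeat=mask_len):
            part = "".join(chars)
            yield part + word if mask_first else word + part
```

hybrid(["password", "hello"], 4) corresponds to -a 6 with ?d?d?d?d, and adding mask_first=True corresponds to -a 7.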

Everything a hybrid attack does can also be implemented with a rule-based attack, so use that if you prefer it.

How to create a combined dictionary containing username and password separated by a character

Now we go back to the combined dictionaries containing both the username and the password.

As an example, look at a fragment of the dictionary (auth_basic.txt file) from Router Scan by Stas'M; it contains tab-delimited credentials:

admin	<empty>
admin	admin
admin	1234
admin	password
Admin	Admin
<empty>	admin
root	<empty>
root	admin
root	root
root	toor
root	public

And this is an example of a combined dictionary where the username and password are separated by a colon:

admin:admin
admin:1234
admin:password
root:root
root:toor

To create a combined dictionary, use a command like this:

hashcat -a 1 --stdout -j '$SEPARATOR' users.txt passwords.txt

In this command:

users.txt and passwords.txt – dictionaries from which usernames and passwords will be taken and all possible combinations will be composed.
SEPARATOR – a symbol that will separate the login and password

For example, in the following command, the separator is the colon:

hashcat -a 1 --stdout -j '$:' users.txt passwords.txt

To insert a tab character as the separator on the command line, press Ctrl+V and then Tab.

Note that this hashcat command combines the Combinator attack with a rule from the rule-based attack: the -j option applies the rule $SEPARATOR to every word from the left dictionary, that is, it appends the separator to each username before the password is added.

Let's consider a special case: how to create a paired dictionary of logins and passwords in which the login is always the same, followed by a tab and then the password:

superadmin    Zte531zTE@fn18131
superadmin    Zte531zTE@fn18132
And so on.

Of course, you can use a file containing a single line, the login, as the first dictionary. But there is another option that uses the powerful sed command:

sed -e 's/^/superadmin\t/' pass.txt > login_pass.txt

In this command:

superadmin is a line to be inserted before each password
\t is a tab character that will separate username and password
pass.txt is a file from where to read passwords
login_pass.txt is the new file where the login/password pairs will be saved

If you don't want to create a new file, but want to change the existing one, then remove the redirection and add the -i option:

sed -i -e 's/^/superadmin\t/' pass.txt

How to extract usernames and passwords from a combined dictionary into regular dictionaries

Suppose we only need to extract the usernames and/or the passwords from a combined dictionary. For this we will use the (also powerful) awk program.

To retrieve usernames:

awk -F 'SEPARATOR' '{print $1}' DICTIONARY.txt | sort | uniq

To extract passwords:

awk -F 'SEPARATOR' '{print $2}' DICTIONARY.txt | sort | uniq

In these commands:

SEPARATOR is a symbol that separates logins and passwords. If you need to specify a tab character there, then write “\t”.
DICTIONARY.txt is a combined dictionary from which we extract lists of words

The two commands differ only in $1 (the first field, before the separator) and $2 (the second field, after the separator).
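One caveat: with -F and {print $2}, awk prints only the second field, so a password that itself contains the separator, for example admin:pa:ss, is cut down to pa. The Python sketch below (the function name is hypothetical) splits each line on the first separator only and returns sorted, de-duplicated lists, like the sort | uniq pipelines above.

```python
def split_pairs(lines, separator=":"):
    """Split 'user<sep>password' lines on the FIRST separator only."""
    users, passwords = set(), set()
    for line in lines:
        line = line.rstrip("\r\n")
        if separator not in line:
            continue  # skip lines that are not user/password pairs
        user, password = line.split(separator, 1)
        users.add(user)
        passwords.add(password)
    return sorted(users), sorted(passwords)
```

In the shell, cut -s -d ':' -f 2- gives the same result for the password column: it prints everything from the second field onward and, with -s, skips lines without the separator.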

How to generate a dictionary of MD5 hashes of all six-digit numbers from 000000 to 999999

Hashcat can do rainbow tables, but only for Wi-Fi.

But with the help of PHP, this task can be solved in several lines:

<?php

// Print the MD5 hash of every six-digit string from 000000 to 999999, one per line
for ($i = 0; $i <= 999999; $i++) {
	echo md5(str_pad((string) $i, 6, "0", STR_PAD_LEFT)) . PHP_EOL;
}

Execution takes 1-4 seconds, during which the MD5 hashes of all strings from 000000 to 999999 are generated.

Save the above code to md5-rb-gen.php file, run like this:

php md5-rb-gen.php

To save the received hashes to a file:

php md5-rb-gen.php > md5.txt
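For comparison, the same dictionary can be produced in Python with the standard hashlib module; like the PHP script, it runs in a single process. A minimal sketch (function names hypothetical):

```python
import hashlib


def md5_six_digit():
    """Yield the MD5 hex digest of every string from 000000 to 999999, in order."""
    for i in range(1_000_000):
        yield hashlib.md5(f"{i:06d}".encode("ascii")).hexdigest()


def write_hashes(path):
    """Write one digest per line to path, like php md5-rb-gen.php > md5.txt."""
    with open(path, "w", encoding="ascii") as out:
        for digest in md5_six_digit():
            out.write(digest + "\n")
```

Call write_hashes("md5.txt") to create the file.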

By the way, see also “How to run PHP script without a web server”.

An interesting observation about the speed of different approaches.

The following two commands do exactly the same thing:

maskprocessor ?d?d?d?d?d?d > 6.txt
cat 6.txt | while read -r line ; do echo -n $line | md5sum | awk '{print $1}'; done > md5.txt

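But on an average computer these commands can take up to an hour to run. The reason is not that PHP is inherently faster than Linux utilities: the shell loop starts new md5sum and awk processes for each of the million lines, while the PHP script does all the work in a single process.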

Doubling words

How do you create a dictionary of 12-character words consisting only of decimal digits (?d) in the abcdefabcdef format, that is, with a six-digit number written twice?

You can use a rule-based attack, or you can write a small Bash loop (it doubles every word in the user.txt file):

cat user.txt | while read -r line; do echo $line$line; done

For our task of doubling six-digit numbers, use the following command, which generates all six-digit numbers and writes each one twice:

maskprocessor ?d?d?d?d?d?d | while read -r line; do echo $line$line; done

How to create a dictionary with a list of dates

How do you create a list of dates in the DD-MM-YYYY pattern, that is, matching the mask ?d?d-?d?d-?d?d?d?d, but with the values running not over 00-99 but over 01-31, 01-12 and 1900-2021 respectively?

Such dictionaries can be created by the pydictor program.

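But it is even easier to create such a dictionary as follows (it will be saved to the dates.txt file; this command uses a dot as the separator, so replace the dots with hyphens if you need the DD-MM-YYYY form):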

echo {01..31}.{01..12}.{1900..2021} | tr " " "\n" > dates.txt

If you do not want to create a dictionary file, pipe the output of the previous command straight to hashcat's standard input:

echo {01..31}.{01..12}.{1900..2021} | tr " " "\n" | hashcat ……..
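Note that brace expansion produces every combination, including impossible dates such as 31.02.2000 or 29.02.1900. If you want only real calendar dates, a minimal Python sketch with the standard datetime module (the function name is hypothetical) can generate them day by day. The fmt parameter takes strftime codes, so "%d-%m-%Y" gives the DD-MM-YYYY form and "%d%m%Y" a form without separators.

```python
from datetime import date, timedelta


def dates(start_year=1900, end_year=2021, fmt="%d.%m.%Y"):
    """Yield every real calendar date from Jan 1 of start_year to Dec 31 of end_year."""
    day = date(start_year, 1, 1)
    last = date(end_year, 12, 31)
    while day <= last:
        yield day.strftime(fmt)
        day += timedelta(days=1)
```

For 1900-2021 this yields 44,560 dates instead of the 45,384 strings produced by the brace expansion.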

How to split generated dictionaries into parts

Is it possible to split the dictionary generated by maskprocessor into several parts, for example into 1 GB portions?

Yes, you can split both the output of maskprocessor and ready-made dictionaries into parts. On Linux, the split utility is convenient for this; with -C it puts at most the given number of bytes into each part without breaking lines, for example:

maskprocessor ?l?l?l?l?l?l?l?l?l?l | split -C 1G

Conclusion

If I have missed something, or if there are utilities that make the techniques shown here easier or make possible what I described as impossible, please write in the comments: it will be interesting to learn about them and to add them to the article.

You can also ask your questions related to the generation of dictionaries that take into account certain conditions.
