Advanced wordlist generating techniques
Table of contents
1. The basics of generating dictionaries/wordlists
2. Rule Based Attack
3. Generation of dictionaries based on information about a person
4. Compiling wordlists and lists of usernames based on the website content
5. How to create a variable-length masked dictionary
6. How to crack a hash when nothing is known about the password (all characters)
7. How to generate wordlists that necessarily use certain characters or strings
8. How to create combined dictionaries
9. How to combine more than two dictionaries
10. How to create all possible combinations for a short list of strings
11. PRINCE algorithm combination
12. Hybrid attack – combining Combinator attack and Mask attack
13. How to create a combined dictionary containing username and password separated by a character
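14. How to extract usernames and passwords from combination dictionary to regular dictionaries
15. How Hashcat can generate a dictionary of MD5 hashes of all six-digit numbers from 000000 to 999999
16. Doubling words
17. How to create a dictionary with a list of dates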
18. How to split generated dictionaries into parts
This article covers dictionary-generation situations that you may encounter in practice but that have not yet been described in other articles. Some examples are taken from questions in the comments or on the forum; others are problems I ran into myself.
We will consider not only the already familiar tools, but also a couple of new ones. For some tasks, we will use not only specialized tools – some actions are easier to do using standard Linux utilities or our own scripts.
Since the basics of generating dictionaries will not be described here, we will start with a list of sources where you can read these very basics. It is recommended that you read them if you have not already done so.
The basics of generating dictionaries/wordlists
How to create dictionaries by masks using Crunch, Hashcat, maskprocessor and John the Ripper: Programs for generating wordlists
There are many more examples of composing masks in the article: Hashcat manual: how to use the program for cracking passwords
Rule Based Attack
Rule-based attack consists in modifying an existing dictionary based on a specified set of rules. If you want to change the behavior of a mask using a rule-based attack, you first need to generate a dictionary by mask, and then work with it.
The easiest way is to use Mentalist, a program with a graphical interface. Instructions: Generation and modification of dictionaries according to the specified rules
The rule-based attack in John the Ripper is much more powerful than in hashcat, so of these two programs I recommend John for this attack: Comprehensive Guide to John the Ripper. Part 5: Rule-based attack
More examples of John's mask attacks: How to create dictionaries that comply with specific password strength policies
Hashcat rule-based attack: https://hashcat.net/wiki/doku.php?id=rule_based_attack
Generation of dictionaries based on information about a person
If a password is based on user data, for example a combination of first name, last name, date of birth, names of children, phone number, or the same data of the next of kin, then such a password can be considered weak. The tools discussed above are not very suitable for compiling dictionaries from such information, with the possible exception of the Combinator attack in Hashcat, but it accepts only 2 dictionaries at a time.
This is exactly the problem that the CUPP utility solves.
Installing CUPP on Kali Linux
sudo apt install cupp
Installing CUPP in BlackArch
sudo pacman -S cupp
Run the program interactively and enter the known user data:
cupp -i
Compiling wordlists and lists of usernames based on the website content
Let's get acquainted with one more tool – CeWL. This program crawls the specified site (you can specify the crawl depth) and sorts all words found on the site pages in the order of their frequency of use. Why do you need such a dictionary? The author suggests using it for brute force. In addition, the program can search for e-mail addresses, as well as extract the names of the creators of office documents – Word and PDF files are supported. This data can be used to compile a list of usernames.
Also included with the program is the FAB utility, which extracts author names from already downloaded Word and PDF documents – they can also be used as usernames for brute-force.
Installing CeWL on Kali Linux
The program is preinstalled in Kali Linux.
In minimal Kali Linux installations, the program is installed as follows:
sudo apt install cewl
Installing CeWL in BlackArch
sudo pacman -S cewl
gem install mime mime-types mini_exiftool nokogiri rubyzip spider
Additional installation nuances are described in the program card – it is recommended to familiarize yourself with it.
CeWL launch examples
Collecting words from the pages of the site https://site.ru, using only the pages linked from the specified address (-d 1), to compose a dictionary that will be saved to the dic.txt file (-w dic.txt):
cewl https://site.ru -d 1 -w dic.txt
Collecting words from the pages of the site https://site.ru, using the pages linked from the specified address as well as from the downloaded pages (-d 2), to compile a dictionary that will be saved to the specified file (-w words.txt). For each word, the frequency with which it occurs will be shown (-c). A list of the email addresses found will also be compiled (-e) and saved to the specified file (--email_file emails.txt), and a list based on the information found in document metadata will be created (-a) and saved to the specified file (--meta_file meta.txt):
cewl https://site.ru -d 2 -w words.txt -a --meta_file meta.txt -e --email_file emails.txt -c
Launching FAB, which checks all *.doc documents in the /home/mial/Downloads/ directory, extracts the document author field from their metadata, and displays the result:
ruby /usr/share/cewl/fab.rb /home/mial/Downloads/*.doc
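To make the frequency idea concrete, here is a minimal sketch of that kind of counting using only standard utilities. It is not how CeWL is implemented: the function name word_freq is made up for this illustration, the text is read from standard input instead of being crawled from a site, and folding everything to lower case is a simplification chosen for the sketch.

```shell
# Hypothetical helper (not part of CeWL): count how often each word occurs
# in the text on standard input and print "word count", most frequent first.
word_freq() {
    tr '[:upper:]' '[:lower:]' |      # fold case (a choice made for this sketch)
    tr -cs '[:alpha:]' '\n' |         # put every run of letters on its own line
    sort | uniq -c | sort -rn |       # count duplicates, highest count first
    awk 'NF == 2 { print $2, $1 }'    # reorder to "word count", skip blank lines
}

# Example input standing in for the text of a downloaded page
printf 'Admin panel\nadmin login, admin password\n' | word_freq
```

Dropping the counts with awk '{ print $1 }' turns the result into a plain wordlist ordered by frequency.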
How to create a variable-length masked dictionary
Let's consider the generation of wordlists of various lengths using the example of Hashcat and maskprocessor.
Incrementing passwords in Hashcat
In order to generate passwords of different lengths, the following options are available:
-i, --increment     |     | Enable mask increment mode   |
    --increment-min | Num | Start mask incrementing at X | --increment-min=4
    --increment-max | Num | Stop mask incrementing at X  | --increment-max=8
The -i option is not mandatory. If it is used, it means that the length of the password candidates should not be fixed, it should increment in terms of the number of characters.
The --increment-min option is also not mandatory. It defines the minimum length of password candidates. If the -i option is used, the default --increment-min is 1.
And the --increment-max option is not mandatory as well. It defines the maximum length of password candidates. If the -i option is specified, but the --increment-max option is omitted, then its default value is the mask length.
Rules for using mask increment options:
- the -i option must be specified before using --increment-min and --increment-max
- the value of the --increment-min option can be less than or equal to the value of the --increment-max option, but cannot exceed it
- the mask must contain at least as many positions as the value of --increment-max: it can be longer, but it cannot be shorter.
So, here's the run command to generate passwords that are six to ten characters long:
hashcat -a 3 -i --increment-min=6 --increment-max=10 --stdout ?l?l?l?l?l?l?l?l?l?l
Incrementing passwords in maskprocessor
The maskprocessor has the following increment option:
-i, --increment=NUM:NUM    Enable increment mode. 1st NUM=start, 2nd NUM=stop
                           Example: -i 4:8 searches lengths 4-8 (inclusive)
The following command will compose a dictionary of all digit strings from 1 to 4 characters long, that is, from 0 to 9999, including zero-padded variants such as 007:
maskprocessor -i 1:4 ?d?d?d?d
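To see what the increment range produces without installing anything, the following sketch emulates it for a mask made only of ?d. The function name digit_strings is hypothetical and the code only imitates the result for this one kind of mask; it says nothing about how maskprocessor works internally.

```shell
# Hypothetical helper: print every digit string whose length lies between
# MIN ($1) and MAX ($2) inclusive, shortest strings first.
digit_strings() {
    awk -v min="$1" -v max="$2" 'BEGIN {
        for (len = min; len <= max; len++)        # one pass per length
            for (n = 0; n < 10 ^ len; n++)        # all values of that length
                printf "%0" len "d\n", n          # zero-padded to len digits
    }'
}

# Lengths 1 and 2: the ten single digits followed by 00 ... 99
digit_strings 1 2 | wc -l
```

For lengths 1 to 4 the total is 10 + 100 + 1000 + 10000 = 11110 lines, which is why the resulting list is larger than a plain count from 1 to 9999.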
How to crack a hash when nothing is known about the password (all characters)
I have already written about the nuances elsewhere, so here are only examples of commands.
If you need to run a brute-force attack, when the password can contain uppercase and lowercase Latin letters, as well as numbers and a password length from 1 to 12, then you need to use the following options and a mask:
-i --increment-min=1 --increment-max=12 -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1?1?1
To list all password candidates or store them in a dictionary:
hashcat --stdout -a 3 -i --increment-min=1 --increment-max=12 -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1?1?1
If you need to run a brute-force attack when the password can contain uppercase and lowercase Latin letters, numbers, the space, and the symbols !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ and the password length is from 1 to 12, then you need to use the following options and mask:
-i --increment-min=1 --increment-max=12 ?a?a?a?a?a?a?a?a?a?a?a?a
To list all password candidates or store them in a dictionary:
hashcat --stdout -a 3 -i --increment-min=1 --increment-max=12 ?a?a?a?a?a?a?a?a?a?a?a?a
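Before storing such a list in a file, it helps to estimate how many candidates the mask describes. The sketch below sums the character set size raised to each length in the increment range; the function name keyspace is made up for this illustration, and the result is printed in scientific notation because the numbers exceed ordinary integer ranges.

```shell
# Hypothetical helper: number of candidates for a character set of SIZE ($1)
# symbols and password lengths from MIN ($2) to MAX ($3), i.e. the sum of
# SIZE^len over all lengths in the range.
keyspace() {
    awk -v size="$1" -v min="$2" -v max="$3" 'BEGIN {
        total = 0
        for (len = min; len <= max; len++)
            total += size ^ len
        printf "%.3e\n", total
    }'
}

keyspace 62 1 12    # ?l?u?d: 26 + 26 + 10 symbols
keyspace 95 1 12    # ?a: 95 printable ASCII symbols
```

The two results are on the order of 10^21 and 10^23 candidates, so in practice these masks are only usable with much smaller maximum lengths or when the candidates are piped straight into a cracker rather than saved.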
How to generate wordlists that necessarily use certain characters or strings
In the comments to articles about generating passwords by masks, I am sometimes asked how to create a dictionary in which certain characters or words must be present but can be located anywhere. In fact, masks are poorly suited for this. The problem can be solved using a Rule-based attack, especially when it comes to individual characters or groups of characters – links to solutions for such cases are given above. But when it comes to strings, a Rule-based attack becomes either too complex and confusing due to the large number of rules required, or simply impossible.
Let's look at a few examples.
Suppose it is known that a password consisting of upper and lower case letters and numbers necessarily contains the word "Alexey", which can appear anywhere in the password and in any letter case. To solve this problem, instead of creating an insane number of rules, you can generate every possible string and keep only those that contain the desired string, for example:
maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i Alexey
In my opinion, this is the optimal solution. It is also suitable if you do not want to create a dictionary, but want to use a mask attack – many brute-force programs are capable of accepting password candidates from standard input.
Another variant – the search word can be in any case, but it is located exactly at the beginning of the password:
maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '^Alexey'
By the way, the last example is not particularly good: since we know that the first character can only be “A” or “a”, it is better to use a custom character set that includes just these two characters. The same can be done for the following letters, for up to four known characters in total (the number of custom character sets available).
How to create a dictionary that necessarily contains the characters “e”, “g”, “D” and “t”? To do this, use a command of the form:
maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep e | grep g | grep D | grep t
You can append further grep commands to the pipeline to require any number of characters.
How to create a dictionary in which passwords in any place and in any case contain the word “Alexey” or the word “MiAl”? Use a command like:
maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '(Alexey)|(MiAl)'
Any number of strings can be searched for:
maskprocessor -1 ?l?u?d ?1?1?1?1?1?1?1?1?1?1 | grep -i -E '(Alexey)|(MiAl)|(OneMoreString)|(AnotherString)|(EvenMore)'
An example of a command that creates a dictionary in which password candidates consist only of numbers, but the password must contain the sequence “12345” located anywhere:
maskprocessor ?d?d?d?d?d?d?d?d?d?d | grep 12345
I think the idea is clear – instead of trying to create an impossible mask, we generate every possible string and keep only what we need.
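When the list of required characters or strings changes often, the chain of grep commands can be wrapped in a small function. This is only a sketch: the name require_all is hypothetical, matching is case-sensitive, and each argument is treated as a fixed string rather than a regular expression.

```shell
# Hypothetical helper: keep only the lines of standard input that contain
# every string given as an argument (equivalent to chaining one grep per string).
require_all() {
    if [ "$#" -eq 0 ]; then
        cat                                   # nothing left to require
    else
        needle=$1
        shift
        grep -F -e "$needle" | require_all "$@"
    fi
}

# Only the first two candidates contain all of e, g, D and t
printf '%s\n' Detge12 egDt tge Dxxx | require_all e g D t
```

A generator such as maskprocessor can be piped into the function in the same way as into grep.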
How to create combined dictionaries
The term “combined dictionaries” usually refers to dictionaries that include both a username and a password, separated by a specific character (usually a colon or a tab character). Here, however, I mean dictionaries composed by joining words from different wordlists. We will return to the “usual” combined dictionaries later.
This is called a Combinator attack; its detailed description: https://hashcat.net/wiki/doku.php?id=combinator_attack
The bottom line is that each word from the second dictionary is appended to each word from the first dictionary.
Dictionary 1 (dict1.txt):
yellow
green
black
blue
Dictionary 2 (dict2.txt):
car
bike
Launching a combinatorial attack (-a 1):
hashcat -a 1 --stdout dict1.txt dict2.txt
Output:
yellowcar
yellowbike
greencar
greenbike
blackcar
blackbike
bluecar
bluebike
I expected the words to be combined in the reverse order as well (that is, with the word from the second dictionary first), but as you can see, this does not happen. To get those combinations too, launch the attack again with the dictionaries swapped:
hashcat -a 1 --stdout dict2.txt dict1.txt
How to combine more than two dictionaries
The following is an example of a combination of three dictionaries – the point is that each new received word consists of one word from each of the three dictionaries:
# create intermediate dictionaries by combining each ordered pair of dictionaries
hashcat -a 1 --stdout dict1.txt dict2.txt > tmp12.txt
hashcat -a 1 --stdout dict2.txt dict1.txt > tmp21.txt
hashcat -a 1 --stdout dict1.txt dict3.txt > tmp13.txt
hashcat -a 1 --stdout dict2.txt dict3.txt > tmp23.txt
hashcat -a 1 --stdout dict3.txt dict2.txt > tmp32.txt
hashcat -a 1 --stdout dict3.txt dict1.txt > tmp31.txt
# combine each intermediate dictionary with the remaining source dictionary,
# so that every result contains one word from each of the three dictionaries
hashcat -a 1 --stdout tmp12.txt dict3.txt > tmp123.txt
hashcat -a 1 --stdout tmp21.txt dict3.txt > tmp213.txt
hashcat -a 1 --stdout tmp13.txt dict2.txt > tmp132.txt
hashcat -a 1 --stdout tmp31.txt dict2.txt > tmp312.txt
hashcat -a 1 --stdout tmp23.txt dict1.txt > tmp231.txt
hashcat -a 1 --stdout tmp32.txt dict1.txt > tmp321.txt
# collect all generated words into one dictionary
cat tmp123.txt tmp213.txt tmp132.txt tmp312.txt tmp231.txt tmp321.txt | sort | uniq > full.txt
How to combine 4 or more dictionaries in the same way? It is hard for me to imagine that this can be useful in a real situation, but for this you most likely have to write your own script to automate the algorithm shown above. If you know programs that can do this, then write in the comments.
And… at this point I remembered the combinator3 program. It comes in the hashcat-utils package. This command is used to combine three dictionaries (use combinator to combine two dictionaries).
Usage:
combinator3 file1 file2 file3
This program can combine 3 specified dictionaries, but again – if the dictionary comes third, then the words from it will always be at the end.
To get all possible combinations of three words in any order, you need to use the following commands:
combinator3 file1 file2 file3 > full.txt
combinator3 file1 file3 file2 >> full.txt
combinator3 file2 file1 file3 >> full.txt
combinator3 file2 file3 file1 >> full.txt
combinator3 file3 file1 file2 >> full.txt
combinator3 file3 file2 file1 >> full.txt
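For four or more dictionaries, the same idea can be sketched with standard tools. The function below is an illustration rather than a replacement for hashcat-utils: the name combine and the demo file names are made up, the dictionaries are joined only in the order given (for every order, run it once per permutation, as with combinator3 above), and the output is grouped differently from hashcat's, so pipe it through sort if the order matters.

```shell
# Hypothetical helper: print every concatenation of one word from each file,
# keeping the files in the order in which they are given.
combine() {
    if [ "$#" -eq 1 ]; then
        cat "$1"                      # a single dictionary is its own result
        return
    fi
    first=$1
    shift
    # Build all endings from the remaining files, then put every word of the
    # first file in front of each ending. Paths must not contain backslashes,
    # because awk -v interprets escape sequences.
    combine "$@" | awk -v prefixes="$first" '
        BEGIN { while ((getline w < prefixes) > 0) words[++n] = w }
        { for (i = 1; i <= n; i++) print words[i] $0 }'
}

# Demo with throwaway files (names and contents are arbitrary)
demo=$(mktemp -d)
printf '%s\n' yellow green > "$demo/colours.txt"
printf '%s\n' car bike     > "$demo/things.txt"
printf '%s\n' 1 2          > "$demo/digits.txt"
combine "$demo/colours.txt" "$demo/things.txt" "$demo/digits.txt"
```

Only the first dictionary of each step is held in memory; everything else is streamed, so long dictionaries are better placed towards the end of the argument list.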
How to create all possible combinations for a short list of strings
The combipow utility creates all “unique combinations” from a short input list. This program is also included in hashcat-utils.
Usage:
combipow DICTIONARY
An example of the contents of a dictionary named wordlist:
a
b
c
XYZ
123
Running combipow with this dictionary:
combipow wordlist
will give the following results:
a
b
ab
c
ac
bc
abc
XYZ
aXYZ
bXYZ
abXYZ
cXYZ
acXYZ
bcXYZ
abcXYZ
123
a123
b123
ab123
c123
ac123
bc123
abc123
XYZ123
aXYZ123
bXYZ123
abXYZ123
cXYZ123
acXYZ123
bcXYZ123
abcXYZ123
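The pattern in this output is a binary counter: for every number from 1 to 2^N - 1, the words whose bit is set are joined in their original order. The sketch below reproduces that ordering with awk; the function name combos is hypothetical, and the code only imitates the output shown above without claiming to mirror combipow's implementation.

```shell
# Hypothetical helper: print every non-empty subset of the input lines,
# joined in their original order, enumerated like a binary counter.
combos() {
    awk '
        { word[NR] = $0 }                      # remember every input line
        END {
            total = 2 ^ NR
            for (mask = 1; mask < total; mask++) {
                out = ""
                rest = mask
                for (i = 1; i <= NR; i++) {    # bit i-1 decides about word i
                    if (rest % 2 == 1) out = out word[i]
                    rest = int(rest / 2)
                }
                print out
            }
        }' "$@"
}

printf '%s\n' a b c | combos
```

The number of lines doubles with every added word, which is why this only makes sense for short lists.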
PRINCE algorithm combination
The princeprocessor program implements the PRINCE algorithm. You can find out more about this algorithm on the program card page. It also describes the essence of the program and its options.
Examples of using princeprocessor.
To create combinations from the contents of the dict1.txt file:
princeprocessor dict1.txt
To run the PRINCE algorithm on the words from the specified dictionary (dict1.txt), composing chains with a minimum length of 2 elements (--elem-cnt-min=2) and a maximum length of 2 elements (--elem-cnt-max=2), that is, chains of exactly 2 words each:
princeprocessor --elem-cnt-min=2 --elem-cnt-max=2 dict1.txt
Hybrid attack – combining Combinator attack and Mask attack
This attack combines a dictionary attack and a mask attack – it takes a dictionary and a mask as input and produces a hybrid password.
Hybrid attack description: https://hashcat.net/wiki/doku.php?id=hybrid_attack
If your example.dict contains:
password
hello
Options:
... -a 6 example.dict ?d?d?d?d
generates the following password candidates:
password0000
password0001
password0002
.
.
.
password9999
hello0000
hello0001
hello0002
.
.
.
hello9999
It works the other way too!
Options:
... -a 7 ?d?d?d?d example.dict
generates the following password candidates:
0000password
0001password
0002password
.
.
.
9999password
0000hello
0001hello
0002hello
.
.
.
9999hello
All the features of a hybrid attack can be implemented using Rule Based Attack – so if you like it better then use it.
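For a digits-only mask, the effect of the -a 6 example can also be imitated with a few lines of awk, which is handy for checking what a hybrid run would produce. This is a sketch under assumptions: the function name append_digits is made up, and it covers only masks that consist of ?d repeated LEN times.

```shell
# Hypothetical helper: append every LEN-digit number ($1) to each word read
# from standard input, like a hybrid "wordlist + ?d...?d" run.
append_digits() {
    awk -v len="$1" '{
        for (n = 0; n < 10 ^ len; n++)
            printf "%s%0" len "d\n", $0, n     # word followed by a padded number
    }'
}

# Two words and a two-digit suffix give 200 candidates; show the first three
printf '%s\n' password hello | append_digits 2 | head -n 3
```

Swapping the two parts of the printf format gives the prepending variant that corresponds to -a 7.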
How to create a combined dictionary containing username and password separated by a character
Now we go back to the combined dictionaries containing both the username and the password.
As an example, look at the fragment of the dictionary (auth_basic.txt file) of Router Scan by Stas'M – it contains tab-delimited credentials:
admin	<empty>
admin	admin
admin	1234
admin	password
Admin	Admin
<empty>	admin
root	<empty>
root	admin
root	root
root	toor
root	public
And this is an example of a combined dictionary where the username and password are separated by a colon:
admin:admin
admin:1234
admin:password
root:root
root:toor
To create a combined dictionary, use a command like this:
hashcat -a 1 --stdout -j '$SEPARATOR' users.txt passwords.txt
In this command:
- users.txt and passwords.txt – dictionaries from which usernames and passwords will be taken and all possible combinations will be composed.
- SEPARATOR – a symbol that will separate the login and password
For example, in the following command, the separator is the colon:
hashcat -a 1 --stdout -j '$:' users.txt passwords.txt
If you need to insert a tab character as a separator, press Ctrl-v and then Tab when typing the command.
By the way, if you look closely at the hashcat command above, you will see that it is a Combinator attack with a rule from the Rule-based attack added to it.
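If typing a literal tab is inconvenient, the same pairing can be sketched with awk, where the separator is passed as an ordinary argument and '\t' is understood as a tab. The function name pair_up and the demo file names are hypothetical; the password list is held in memory, so this suits moderate list sizes.

```shell
# Hypothetical helper: print every "user SEP password" pair.
# $1 = file with usernames, $2 = file with passwords, $3 = separator
pair_up() {
    awk -v sep="$3" '
        NR == FNR { pass[++n] = $0; next }                # first file read: passwords
        { for (i = 1; i <= n; i++) print $0 sep pass[i] } # second file read: usernames
    ' "$2" "$1"
}

# Demo with throwaway files (names and contents are arbitrary)
lists=$(mktemp -d)
printf '%s\n' admin root       > "$lists/users.txt"
printf '%s\n' admin 1234 toor  > "$lists/passwords.txt"
pair_up "$lists/users.txt" "$lists/passwords.txt" ':'
pair_up "$lists/users.txt" "$lists/passwords.txt" '\t'
```

The output is grouped by username, in the same order as the hashcat command above produces it.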
Let's consider a special case: how to create a paired dictionary of logins and passwords in which the login is always the same and is followed by a tab character and the password:
superadmin	Zte531zTE@fn18131
superadmin	Zte531zTE@fn18132
etc.
Of course, as the first dictionary, you can create a file with one text line – login. But there is another option with the powerful sed command:
sed -e 's/^/superadmin\t/' pass.txt > login_pass.txt
In this command:
- superadmin is the string to be inserted before each password
- \t is a tab character that will separate username and password
- pass.txt is the file to read passwords from
- login_pass.txt is a new file where the login and password pairs will be saved
If you don't want to create a new file, but want to change the existing one, then remove the redirection and add the -i option:
sed -i -e 's/^/superadmin\t/' pass.txt
How to extract usernames and passwords from combination dictionary to regular dictionaries
Suppose we need to extract only the usernames and/or only the passwords from a combined dictionary. For this we will use the (also powerful) awk program.
To retrieve usernames:
awk -F 'SEPARATOR' '{print $1}' DICTIONARY.txt | sort | uniq
To extract passwords:
awk -F 'SEPARATOR' '{print $2}' DICTIONARY.txt | sort | uniq
In these commands:
- SEPARATOR is a symbol that separates logins and passwords. If you need to specify a tab character there, then write “\t”.
- DICTIONARY.txt is a combined dictionary from which we extract lists of words
Basically, the commands only differ in $1 (first field before separator) and $2 (second field after separator).
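One caveat with the awk commands above: if a password itself contains the separator, $2 holds only the part before the next separator and the rest of the password is lost. A minimal sketch of an alternative uses cut, whose field list 2- means "the second field and everything after it". The function names logins_of and passwords_of are made up for the illustration.

```shell
# Hypothetical helpers: split a combined dictionary read from standard input.
# $1 = single-character separator
logins_of()    { cut -d "$1" -f 1  | sort -u; }   # text before the first separator
passwords_of() { cut -d "$1" -f 2- | sort -u; }   # everything after it, separators included

# The second password contains a colon and is kept whole
printf 'admin:1234\nroot:pa:ss\n' | passwords_of ':'
```

For tab-separated dictionaries, pass a literal tab, for example "$(printf '\t')".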
How Hashcat can generate a dictionary of MD5 hashes of all six-digit numbers from 000000 to 999999
Hashcat supports precomputed tables (often loosely called rainbow tables), but only for Wi-Fi, so it will not help with this task.
But with the help of PHP, this task can be solved in several lines:
<?php
for ($i = 0; $i <= 999999; $i++) {
    echo md5(str_pad("$i", 6, "0", STR_PAD_LEFT)) . PHP_EOL;
}
Execution time is 1-4 seconds. During this time, all md5 hashes for lines 000000…999999 will be generated.
Save the above code to md5-rb-gen.php file, run like this:
php md5-rb-gen.php
To save the received hashes to a file:
php md5-rb-gen.php > md5.txt
See “How to run PHP script without a web server” by the way.
An interesting observation about the speed of achieving the task.
The following two commands produce exactly the same result as the PHP script:
maskprocessor ?d?d?d?d?d?d > 6.txt
cat 6.txt | while read -r line; do echo -n "$line" | md5sum | awk '{print $1}'; done > md5.txt
But on an average computer they can take up to an hour to run, because the loop starts new md5sum and awk processes for each of the million lines. Here the PHP script is far faster than the chain of standard Linux commands.
Doubling words
How to create a dictionary of 12 character words, consisting only of decimal digits (?d) of the abcdefabcdef format, i.e. the six-digit number is written twice?
You can use a Rule-based Attack, or you can write a small Bash script (all words in the user.txt file are doubled):
cat user.txt | while read -r line; do echo $line$line; done
In relation to our task – doubling six-digit numbers, you can use the following command, which will generate six-digit numbers and write each number twice:
maskprocessor ?d?d?d?d?d?d | while read -r line; do echo $line$line; done
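For large inputs, the shell loop can be replaced by a single sed process, which avoids running echo once per line. A minimal sketch, with the hypothetical function name double_words:

```shell
# Hypothetical helper: write every line of standard input twice in a row.
# In the sed replacement, & stands for the whole matched line.
double_words() {
    sed 's/.*/&&/'
}

printf '%s\n' 123456 000001 | double_words
```

Any generator can be piped into the function in the same way as into the loop above.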
How to create a dictionary with a list of dates
How to create a list of dates using the DD-MM-YYYY pattern, that is, matching the mask ?d?d-?d?d-?d?d?d?d but so that the brute-force is not in the range 00-99, but 01-31, 01-12 and 1900-2021 respectively?
Such dictionaries can be created by the pydictor program.
But it's even easier to create such a dictionary with Bash brace expansion (it will be saved to the dates.txt file):
echo {01..31}-{01..12}-{1900..2021} | tr " " "\n" > dates.txt
The list also contains impossible dates such as 31-02-2000; for brute force these extra candidates are harmless.
If you want to do this without creating a dictionary, pipe the output of the same command to the hashcat standard input:
echo {01..31}-{01..12}-{1900..2021} | tr " " "\n" | hashcat ……..
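The brace expansion combines every day with every month, so it also emits dates that do not exist in the calendar. If only real dates are wanted, the day limit can be computed per month, as in this sketch. The function name valid_dates is hypothetical, the DD-MM-YYYY layout follows the pattern from the question, and the output is ordered by year, then month, then day.

```shell
# Hypothetical helper: print every real calendar date from year FIRST ($1)
# to year LAST ($2) in DD-MM-YYYY form, honouring leap years.
valid_dates() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        split("31 28 31 30 31 30 31 31 30 31 30 31", days, " ")
        for (y = first; y <= last; y++) {
            leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
            for (m = 1; m <= 12; m++) {
                limit = days[m] + (m == 2 && leap)    # 29 days in a leap February
                for (d = 1; d <= limit; d++)
                    printf "%02d-%02d-%04d\n", d, m, y
            }
        }
    }'
}

valid_dates 1900 2021 | wc -l
```

Changing the printf format string is enough to switch to another separator or field order.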
How to split generated dictionaries into parts
Is it possible somehow in maskprocessor to split the output generated dictionary into several parts? For example, in 1GB portions.
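Yes, you can split the output of maskprocessor, as well as ready-made dictionaries, into parts. On Linux, it is convenient to use the “split” utility for this. Its -C option limits the size of each part without breaking lines in the middle, and the parts are saved as xaa, xab and so on. Use -C 1G for 1 GB parts; the following example creates 10 GB parts: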
maskprocessor ?l?l?l?l?l?l?l?l?l?l | split -C 10G
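When the parts should hold a fixed number of candidates rather than a fixed number of bytes, split can cut by line count with -l. The sketch below uses a tiny stand-in for a generator; the temporary directory and the part_ prefix are assumptions made for the demo.

```shell
# Assumption for the demo: a throwaway directory and the file prefix "part_"
workdir=$(mktemp -d)

# Ten candidate lines stand in for a generator's output; "-" tells split to
# read standard input, and -l 4 puts at most four lines into each part.
awk 'BEGIN { for (n = 0; n < 10; n++) print n }' | split -l 4 - "$workdir/part_"

ls "$workdir"
```

This produces part_aa, part_ab and part_ac, the last one holding the remaining two lines.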
Conclusion
If I missed something, or if there are utilities that make the tasks shown here easier or make possible what I described as impossible, write in the comments – it will be interesting to learn about them and supplement the article.
You can also ask your questions related to the generation of dictionaries that take into account certain conditions.