How to use Hashcat to crack passwords containing non-Latin characters
If a password contains Russian, Chinese, Arabic, or any other non-Latin (non-English) characters, Hashcat can still crack it, but you need to understand how computers handle strings and encodings, especially when launching a mask attack.
We will consider two cases when the password contains non-Latin (in my example, Russian) letters:
1) dictionary attacks
2) mask attacks
I will crack the MD5 hash of the very short Russian word “нет”:
echo -n 'нет' | md5sum
df28b6f9df132e3be4db5b102433d3b1  -
If you use echo to calculate the hash of a string, it is extremely important to specify the -n option, which suppresses the trailing newline character. Otherwise the hash is computed for the string plus a newline and will be wrong! See also articles “Hash-generation software” and “How to identify hash types”.
So, I have the MD5 hash df28b6f9df132e3be4db5b102433d3b1. In Hashcat, MD5 is hash mode 0 (option -m 0). I will crack it in two ways: with a dictionary attack (option -a 0) and with a mask attack (option -a 3).
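To see the effect of the trailing newline, here is a minimal sketch that hashes the same word twice. It uses printf '%s' as a portable stand-in for echo -n and assumes the script is saved in UTF-8 and that md5sum is available. The variable names are only illustrative.

```shell
# Hash the same word with and without a trailing newline.
# printf '%s' never appends a newline, so it behaves like `echo -n`.
hash_without_newline=$(printf '%s' 'нет' | md5sum | cut -d ' ' -f 1)
hash_with_newline=$(printf '%s\n' 'нет' | md5sum | cut -d ' ' -f 1)

echo "without newline: $hash_without_newline"
echo "with newline:    $hash_with_newline"
```

The first value should be the hash used throughout this article; the second is a hash of a different byte string (the word plus 0x0A), so it cannot match.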
Hashcat dictionary attack against password with non-Latin symbols
This attack is simpler than a mask attack: just make sure the dictionary is in the same encoding that the password was in when its hash was computed.
How can you find out which encoding the password was in when the hash was computed? From the hash alone it is impossible; you have to take into account the circumstances in which the hash was made. If it is a hash of a website password, then most likely the password was in the same encoding as the website pages.
I got my hash in a terminal whose encoding is set to UTF-8, so my string was in UTF-8 encoding.
If the terminal were in a different encoding, for example, in Cyrillic WINDOWS-1251, then I would get a completely different hash.
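The following sketch illustrates this by converting the same word to Windows-1251 before hashing. It assumes the script is saved in UTF-8 and that the local iconv knows the CP1251 codec; the variable names are made up for the example.

```shell
# The same three letters as bytes in two encodings.
# Assumption: iconv on this system supports CP1251 (Windows-1251).
utf8_bytes=$(printf '%s' 'нет' | od -A n -t x1 | tr -d ' \n')
cp1251_bytes=$(printf '%s' 'нет' | iconv -f UTF-8 -t CP1251 | od -A n -t x1 | tr -d ' \n')

utf8_md5=$(printf '%s' 'нет' | md5sum | cut -d ' ' -f 1)
cp1251_md5=$(printf '%s' 'нет' | iconv -f UTF-8 -t CP1251 | md5sum | cut -d ' ' -f 1)

echo "UTF-8  bytes: $utf8_bytes  md5: $utf8_md5"
echo "CP1251 bytes: $cp1251_bytes  md5: $cp1251_md5"
```

In UTF-8 the word is six bytes, in Windows-1251 only three, so the two MD5 values have nothing in common.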
Since the hashed string was encoded in UTF-8, I create a dictionary file named dic.txt encoded in UTF-8. My file contents:
да
нет
возможно
позднее
вряд ли
совсем нет
Now I run a dictionary attack:
hashcat -m 0 -a 0 df28b6f9df132e3be4db5b102433d3b1 ./dic.txt
The password has been successfully cracked, as indicated by these lines:
df28b6f9df132e3be4db5b102433d3b1:нет
Status...........: Cracked
Recovered........: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
In short, a dictionary attack is straightforward: the main point is that the dictionary must be in the same encoding the password had when it was hashed.
Since I am going to continue my experiments with the same hash, I delete its line from the file ~/.hashcat/hashcat.potfile.
Otherwise, instead of launching a new brute force, Hashcat will see that the hash has already been cracked and will not start enumerating passwords.
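Removing the entry can also be scripted. The sketch below works on a temporary file so that it is safe to run as is; POTFILE is a hypothetical stand-in for ~/.hashcat/hashcat.potfile, the second entry is a placeholder, and the hash:password line layout is assumed from the cracked line shown earlier.

```shell
# Remove one cracked entry from a potfile-style file (lines of the form hash:password).
# POTFILE is a temporary stand-in, not the real ~/.hashcat/hashcat.potfile.
TARGET_HASH='df28b6f9df132e3be4db5b102433d3b1'
POTFILE=$(mktemp)
printf '%s\n' \
  "$TARGET_HASH:нет" \
  '00000000000000000000000000000000:placeholder' > "$POTFILE"

# Keep every line that does not start with "<hash>:", then replace the file.
grep -v "^$TARGET_HASH:" "$POTFILE" > "$POTFILE.tmp"
mv "$POTFILE.tmp" "$POTFILE"

remaining_lines=$(wc -l < "$POTFILE" | tr -d ' ')
echo "entries left: $remaining_lines"
```

If you would rather not touch the file at all, hashcat also has a --potfile-disable option that makes it ignore the potfile for a run.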
Hashcat mask attack against password with non-Latin symbols
Now everything is a little more complicated. Based on previous experience with mask attacks, where the password contains only Latin letters, digits, and other characters from a single-byte encoding (in other words, ASCII characters), it is intuitive to compose the following command:
hashcat -m 0 -a 3 -1 тне df28b6f9df132e3be4db5b102433d3b1 ?1?1?1
- -a 3 - means mask attack
- -1 тне - a custom character set number one – I deliberately arranged the letters in a different order.
- ?1?1?1 - a mask that means a string of three characters from the first custom character set (these are the digit one, not the lowercase letter L)
The password is NOT cracked. Pay attention to the Progress line in the output:
In total, 125 password candidates were checked. That is too many. With a mask of 3 characters and only 3 letters in the character set, there can be only 3^3 = 27 password candidates. Where did 125 come from?
To make things even more mysterious, let's run the same command, but instead of a mask of three characters we'll specify a mask of six characters (although we remember very well that we are cracking the hash of the word “нет”, “no”, which has exactly three characters):
hashcat -m 0 -a 3 -1 тне df28b6f9df132e3be4db5b102433d3b1 ?1?1?1?1?1?1
Suddenly, the hash is cracked!
Once again, clean up the ~/.hashcat/hashcat.potfile file as we continue our experiments.
The point is that password cracking programs (in fact, like hashing programs and many others) do not care about the used encoding – they work with bytes, with a sequence of bytes.
Each Cyrillic letter in UTF-8 encoding consists of two bytes (characters of other non-Latin scripts take from two to four bytes). Therefore, when we specify a custom character set as -1 тне, hashcat sees not three Russian letters but six bytes:
D1 82 D0 BD D0 B5
As you can see, one byte, D0, occurs twice. That is, there are five unique bytes and a mask of 3 positions, so the number of all combinations is 5^3 = 125, exactly the number we saw in the output above.
When we specified a mask of three custom-charset positions, the generated password candidates were only three bytes long, although the word “нет” is six bytes long. That is why the six-position mask worked.
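The arithmetic can be checked without hashcat. This is a rough sketch that counts the distinct bytes in the character set and raises that count to the mask length; it assumes the script is saved in UTF-8, and the variable names are illustrative.

```shell
# Count the distinct bytes in the custom charset 'тне' and the resulting keyspace.
charset='тне'
mask_positions=3

# One byte per line in hex, then keep only the distinct values.
unique_bytes=$(printf '%s' "$charset" | od -A n -v -t x1 | tr -s ' ' '\n' | grep . | LC_ALL=C sort -u)
unique_count=$(printf '%s\n' "$unique_bytes" | wc -l | tr -d ' ')

# keyspace = unique_count ^ mask_positions, computed with a plain loop.
keyspace=1
i=0
while [ "$i" -lt "$mask_positions" ]; do
  keyspace=$((keyspace * unique_count))
  i=$((i + 1))
done

echo "distinct bytes:" $unique_bytes
echo "candidates for a $mask_positions-position mask: $keyspace"
```

Setting mask_positions to 6 gives the size of the keyspace for the six-position mask that did find the password.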
So, to launch a mask attack against a password containing non-Latin characters that take two bytes each in UTF-8, you need to double the length of the mask. As for custom character sets, you need to take the encoding of the console into account, or save the characters of the custom sets in files with the desired encoding and pass these files to options -1, -2, -3 and -4.
This method works, but there is a serious catch: the number of password candidates increases many times over, and it includes candidates that obviously cannot match, because their bytes form characters that are not in the intended alphabet. Even with three letters in a three-letter word, we get 125 candidates instead of 27, that is, almost 5 times more. Moreover, roughly 80% of them are garbage made of characters that were not specified for enumeration and in general do not even exist in the Russian alphabet. As the word length and the number of letters grow, there will be even more such garbage candidates, which means the brute-force time increases many times over.
Therefore, let's think about how to optimize the process. Let's look at a line that contains all the characters of the Russian alphabet:
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя
This line consists of the following bytes:
D090 D091 D092 D093 D094 D095 D081 D096 D097 D098 D099 D09A D09B D09C D09D D09E D09F D0A0 D0A1 D0A2 D0A3 D0A4 D0A5 D0A6 D0A7 D0A8 D0A9 D0AA D0AB D0AC D0AD D0AE D0AF D0B0 D0B1 D0B2 D0B3 D0B4 D0B5 D191 D0B6 D0B7 D0B8 D0B9 D0BA D0BB D0BC D0BD D0BE D0BF D180 D181 D182 D183 D184 D185 D186 D187 D188 D189 D18A D18B D18C D18D D18E D18F
Please note that the first byte of every letter has only two possible values: D0 or D1.
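This observation is easy to verify. The sketch below dumps the bytes of the full Russian alphabet (33 uppercase and 33 lowercase letters) and keeps only the odd-numbered ones, which are the first bytes of each letter because every letter here takes exactly two bytes. It assumes the script is saved in UTF-8.

```shell
# List the distinct first bytes of all Russian letters in UTF-8.
alphabet='АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя'

# One byte per line in hex.
all_bytes=$(printf '%s' "$alphabet" | od -A n -v -t x1 | tr -s ' ' '\n' | grep .)
byte_count=$(printf '%s\n' "$all_bytes" | wc -l | tr -d ' ')

# Every letter is two bytes long, so odd-numbered bytes are the lead bytes.
lead_bytes=$(printf '%s\n' "$all_bytes" | awk 'NR % 2 == 1' | LC_ALL=C sort -u | paste -s -d ' ' -)

echo "total bytes: $byte_count"
echo "distinct lead bytes: $lead_bytes"
```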
Let us return to our string “тне”. The following bytes are present in it:
D182 D0BD D0B5
That is, the first byte of each letter can only be D0 or D1, and the second byte can be 82, BD or B5. Create two custom character sets and explicitly specify the bytes in each of them. For hashcat to understand that we specified bytes rather than English letters and digits, you need to use the --hex-charset option. As a result, we get the following command:
hashcat -m 0 -a 3 -1 D0D1 -2 82BDB5 --hex-charset df28b6f9df132e3be4db5b102433d3b1 ?1?2?1?2?1?2
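For a longer list of letters, the two hex character sets can be derived instead of typed by hand. The following is a sketch under the assumption that every letter in the list takes exactly two bytes in UTF-8 (true for Russian letters) and that the script is saved in UTF-8. The variable names are illustrative, and the bytes come out sorted, which does not matter for a character set.

```shell
# Derive the values for -1 (first bytes) and -2 (second bytes) from a list of letters.
letters='тне'

# One byte per line in hex.
hex_bytes=$(printf '%s' "$letters" | od -A n -v -t x1 | tr -s ' ' '\n' | grep .)

# Odd-numbered bytes are first bytes, even-numbered bytes are second bytes.
first_set=$(printf '%s\n' "$hex_bytes" | awk 'NR % 2 == 1' | LC_ALL=C sort -u | tr -d '\n' | tr 'a-f' 'A-F')
second_set=$(printf '%s\n' "$hex_bytes" | awk 'NR % 2 == 0' | LC_ALL=C sort -u | tr -d '\n' | tr 'a-f' 'A-F')

echo "--hex-charset -1 $first_set -2 $second_set"
```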
Let me remind you that in the previous run, when we specified a mask of six characters, 15625 password candidates (5^6 = 15625) were tried, and this is for just a three-letter word in which all three letters are known. That is far too many.
Let's look at the line in the last run command:
Progress.........: 216/216 (100.00%)
Thanks to such a simple optimization, the number of password candidates was reduced from 15625 to 216, that is, about 72 times!
I think the essence of the idea is clear. Now let's consider a few related issues.
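To make the count of 216 concrete, here is a small simulation of the ?1?2?1?2?1?2 mask in plain shell. It only imitates the enumeration and is not how hashcat works internally. The byte lists mirror -1 D0D1 and -2 82BDB5, and the script is assumed to be saved in UTF-8.

```shell
# Enumerate every candidate of the mask ?1?2?1?2?1?2 and compare its MD5 with the target.
target='df28b6f9df132e3be4db5b102433d3b1'
first_set='d0 d1'        # same bytes as -1 D0D1
second_set='82 bd b5'    # same bytes as -2 82BDB5

set -f  # the unquoted lists below must only be word-split, never glob-expanded

# Build the six possible two-byte "letters" as printf octal escapes (d0 -> \320).
pairs=''
for lead in $first_set; do
  for trail in $second_set; do
    pairs="$pairs $(printf '\\%03o\\%03o' "0x$lead" "0x$trail")"
  done
done

tried=0
found=''
for p1 in $pairs; do
  for p2 in $pairs; do
    for p3 in $pairs; do
      tried=$((tried + 1))
      # The escapes are deliberately used as the printf format to emit raw bytes.
      digest=$(printf "$p1$p2$p3" | md5sum | cut -d ' ' -f 1)
      if [ "$digest" = "$target" ]; then
        found=$(printf "$p1$p2$p3")
      fi
    done
  done
done

echo "candidates tried: $tried"
echo "match: $found"
```

Two first bytes times three second bytes give six possible "letters" per position, hence 6^3 = 216 candidates. Only three of those six pairs are letters from the original set, so there is still room to narrow the search further.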
How to use hashcat files with alphabets in different encodings
Hashcat ships with files that have the .hcchr extension: they contain the characters of particular alphabets in various encodings.
Examples of such files with Russian characters in various encodings:
/usr/share/doc/hashcat/charsets/combined/Russian.hcchr
/usr/share/doc/hashcat/charsets/special/Russian/ru_ISO-8859-5-special.hcchr
/usr/share/doc/hashcat/charsets/special/Russian/ru_cp1251-special.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_ISO-8859-5.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_KOI8-R.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_cp1251.hcchr
They can be used; keep in mind that you must double the length of your masks for two-byte encodings. But it is better NOT to use them, since in this case the brute force runs without the optimization described above, and most of the password candidates will obviously be wrong, which significantly increases the cracking time.
How do I know which bytes a character is encoded with?
The following command (replace the word "нет" with the desired string or character) will show which bytes the string is composed of:
echo -n "нет" | od -A n -t x1
 d0 bd d0 b5 d1 82
Using the iconv command, you can convert strings and files to the desired encoding:
iconv -f utf-8 -t iso-8859-1 < rockyou.txt | sponge rockyou.txt.iso
If sponge is not found on your system, install the moreutils package.
You can also use an online service that shows the bytes a character or string is composed of: https://w-e-b.site/?act=encoding-converter