How to use Hashcat to crack passwords containing non-Latin characters

If a password contains Russian, Chinese, Arabic or any other non-Latin characters, Hashcat can still crack it, but you need to understand how computers work with strings and encodings, especially when launching a mask attack.

We will consider two cases in which the password contains non-Latin (in my example, Russian) letters:

1) dictionary attacks

2) mask attacks

I will crack the MD5 hash of the very short Russian word “нет”:

echo -n 'нет' | md5sum
df28b6f9df132e3be4db5b102433d3b1 -

If you use echo to compute the hash of a string, it is essential to pass the -n option, which suppresses the trailing newline. Otherwise the newline is hashed together with the string and every hash will be wrong! See also the articles “Hash-generation software” and “How to identify hash types”.
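The effect of the trailing newline can be reproduced outside the shell. The following is a minimal Python sketch, assuming the password is encoded as UTF-8; the helper name md5_hex is made up for this illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as lowercase hex."""
    return hashlib.md5(data).hexdigest()

password = "нет"
without_newline = md5_hex(password.encode("utf-8"))         # like: echo -n 'нет' | md5sum
with_newline = md5_hex((password + "\n").encode("utf-8"))   # like: echo 'нет' | md5sum

print(without_newline)
print(with_newline)
print(without_newline == with_newline)  # False: one extra byte changes the whole digest
```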

So, I have the MD5 hash df28b6f9df132e3be4db5b102433d3b1. In Hashcat, MD5 is hash mode 0 (option -m 0). I will crack it in two ways: with a dictionary attack (option -a 0) and with a mask attack (option -a 3).

Hashcat dictionary attack against password with non-Latin symbols

This option is simpler than a mask attack: just make sure that the dictionary is in the same encoding that the password was in when the hash was computed.

How do you find out which encoding the password was in when the hash was computed? From the hash alone it is impossible, so you need to take into account the circumstances in which the hash was made. If this is a hash of a website password, then most likely the password was in the same encoding as the website's pages.

I got my hash in a terminal whose encoding is set to UTF-8, so my string was in UTF-8 encoding.

If the terminal were in a different encoding, for example, in Cyrillic WINDOWS-1251, then I would get a completely different hash.
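To see why the encoding matters, here is a small Python sketch that encodes the same word in several encodings and hashes the resulting bytes. The choice of cp1251 and koi8-r as the alternative encodings is only an example.

```python
import hashlib

password = "нет"
raw_bytes = {}
digests = {}
for encoding in ("utf-8", "cp1251", "koi8-r"):
    raw = password.encode(encoding)
    raw_bytes[encoding] = raw
    digests[encoding] = hashlib.md5(raw).hexdigest()
    # Same three letters, different bytes, therefore a different MD5 digest.
    print(f"{encoding:7} {raw.hex(' '):18} {digests[encoding]}")
```

The UTF-8 form is six bytes long, while the single-byte Cyrillic encodings give three bytes, so all three digests differ.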

Since the hashed string was encoded in UTF-8, I create a dictionary file named dic.txt encoded in UTF-8. My file contents:

да
нет
возможно
позднее
вряд ли
совсем нет

Now I run a dictionary attack:

hashcat -m 0 -a 0 df28b6f9df132e3be4db5b102433d3b1 ./dic.txt

The password has been successfully cracked, as indicated by these lines:

df28b6f9df132e3be4db5b102433d3b1:нет

Status...........: Cracked
Recovered........: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts

That is, a dictionary attack is fairly straightforward: the main point is that the dictionary must be in the same encoding as the password was at the time of hashing.

Since I am going to continue my experiments with the same hash, I delete the following line from the file ~/.hashcat/hashcat.potfile:
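The idea can be illustrated with a toy dictionary attack in Python. This is only a sketch of the principle, not how Hashcat is implemented; the function dictionary_attack is hypothetical, and the target hash is computed in the script from the UTF-8 bytes of the word.

```python
import hashlib

def dictionary_attack(target_hex, words, encoding):
    """Return the first word whose MD5 (after encoding) equals target_hex, else None."""
    for word in words:
        if hashlib.md5(word.encode(encoding)).hexdigest() == target_hex:
            return word
    return None

words = ["да", "нет", "возможно", "позднее", "вряд ли", "совсем нет"]
target = hashlib.md5("нет".encode("utf-8")).hexdigest()  # hash made from UTF-8 bytes

found_utf8 = dictionary_attack(target, words, "utf-8")
found_cp1251 = dictionary_attack(target, words, "cp1251")
print("utf-8 dictionary:", found_utf8)     # the word is found
print("cp1251 dictionary:", found_cp1251)  # None: same words, wrong bytes
```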

df28b6f9df132e3be4db5b102433d3b1:нет

Otherwise, instead of launching a new brute force, Hashcat will see that the hash has already been cracked and will not start enumerating passwords.
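If you repeat such experiments often, removing the line by hand gets tedious. Below is a hedged Python sketch that drops the entries for one hash from a potfile. It assumes the simple hash:password line format shown above; the function name forget_hash and the second demo entry are invented for the illustration, and the demo runs on a temporary file rather than on the real potfile.

```python
import tempfile
from pathlib import Path

def forget_hash(potfile: Path, hash_hex: str) -> int:
    """Drop potfile lines starting with 'hash_hex:'; return how many were removed."""
    lines = potfile.read_bytes().splitlines(keepends=True)
    prefix = hash_hex.encode("ascii") + b":"
    kept = [line for line in lines if not line.startswith(prefix)]
    potfile.write_bytes(b"".join(kept))
    return len(lines) - len(kept)

# Demo on a throwaway file; the real file in this article is ~/.hashcat/hashcat.potfile.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "hashcat.potfile"
    demo.write_bytes(
        "df28b6f9df132e3be4db5b102433d3b1:нет\n"
        "00000000000000000000000000000000:example\n".encode("utf-8")  # placeholder entry
    )
    removed = forget_hash(demo, "df28b6f9df132e3be4db5b102433d3b1")
    remaining = demo.read_bytes()

print(removed)    # 1
print(remaining)  # b'00000000000000000000000000000000:example\n'
```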

Hashcat mask attack against password with non-Latin symbols

Now everything is a little more complicated. Based on previous experience with mask attacks on passwords that contain only Latin letters, digits and other ASCII characters (one byte each), it seems intuitive to compose the following command:

hashcat -m 0 -a 3 -1 тне df28b6f9df132e3be4db5b102433d3b1 ?1?1?1

New here:

  • -a 3 - mask attack
  • -1 тне - custom character set number one; I deliberately arranged the letters in a different order
  • ?1?1?1 - a mask that means a string of three characters from the first custom character set (these are the digit one, not the lowercase letter L)

Result:

The password is NOT cracked. Note one interesting line:

Progress.........: 125/125

In total, 125 password candidates were checked. That is too many. With a mask of 3 characters and only 3 letters to choose from, there can be only 3^3 = 27 password candidates. Where did 125 come from?

To add even more mystery, let's run the same command, but with a mask of six characters instead of three (although we know perfectly well that we are cracking the hash of the word “нет”, which has exactly three characters):

hashcat -m 0 -a 3 -1 тне df28b6f9df132e3be4db5b102433d3b1 ?1?1?1?1?1?1

Suddenly, the hash is cracked!

Once again, clean up the ~/.hashcat/hashcat.potfile file as we continue our experiments.

The point is that password cracking programs (like hashing programs and many others, in fact) do not care about the encoding in use: they work with bytes, with a sequence of bytes.

Each Russian letter in UTF-8 encoding consists of two bytes (characters of other scripts may take two, three or four bytes). Therefore, when we specify a custom character set as -1 тне, hashcat sees not three Russian letters but six bytes:

  • D1
  • 82
  • D0
  • BD
  • D0
  • B5

As you can see, one byte, D0, occurs twice. That is, there are five unique bytes, and with a mask length of 3 the number of all combinations is 5^3 = 125, exactly the number we saw in the Progress line above.

When we specified a mask of three custom characters, the generated password candidates were only three bytes long, whereas the word “нет” is six bytes long. That is why the six-character mask worked.
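The arithmetic above can be checked with a few lines of Python. This is a small sketch that only counts bytes and combinations; it does not run Hashcat.

```python
charset = "тне"
raw = charset.encode("utf-8")
unique_bytes = sorted(set(raw))
unique_hex = [f"{b:02X}" for b in unique_bytes]

print(raw.hex(" "))            # d1 82 d0 bd d0 b5
print(unique_hex)              # ['82', 'B5', 'BD', 'D0', 'D1']
print(len(unique_bytes) ** 3)  # 125 candidates for the mask ?1?1?1
print(len(unique_bytes) ** 6)  # 15625 candidates for the mask ?1?1?1?1?1?1

# The password is three characters long, but six bytes long in UTF-8.
password_bytes = "нет".encode("utf-8")
print(len("нет"), len(password_bytes))  # 3 6
```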

So, to launch a mask attack against a password containing non-Latin characters that take two bytes each in UTF-8, you need to double the length of the mask (for characters that take three bytes each, triple it). As for custom character sets, you need to take the encoding of the console into account, or save the characters of the custom sets in files with the desired encoding and specify these files with options -1, -2, -3 and -4.

This method works, but there is a serious “but”: the number of password candidates grows many times over, and most of the extra candidates obviously cannot match, because the bytes are combined into characters that are not in the intended alphabet, or into sequences that are not valid characters at all. Even for a three-letter word built from three known letters, the six-byte mask produces 5^6 = 15625 candidates instead of 3^3 = 27, that is, almost 580 times more. More than 99% of them are rubbish: they contain characters that were not requested and that, in many cases, do not even exist in the Russian alphabet. As the word length and the number of letters grow, there will be even more such garbage candidates, which means a many-fold increase in brute-force time.
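To get a feel for how much of the six-byte keyspace is wasted, the candidates can simply be enumerated. The sketch below, written for illustration only, builds every six-byte string from the five unique bytes of “тне” and counts how many are valid UTF-8 and how many consist solely of the three requested letters.

```python
from itertools import product

alphabet = "тне"
byte_set = sorted(set(alphabet.encode("utf-8")))  # the 5 distinct bytes

total = valid_utf8 = wanted = 0
for combo in product(byte_set, repeat=6):
    total += 1
    try:
        text = bytes(combo).decode("utf-8")
    except UnicodeDecodeError:
        continue  # not even a valid UTF-8 string
    valid_utf8 += 1
    if all(ch in alphabet for ch in text):
        wanted += 1  # three letters, all from the requested set

print(total, valid_utf8, wanted)  # 15625 216 27
```

Only 27 of the 15625 candidates are three-letter strings made of the requested letters; the rest are wasted work.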

Therefore, let's think about how to optimize the process. Let's look at a line that contains all the characters of the Russian alphabet:

АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя

This line consists of the following bytes:

D090
D091
D092
D093
D094
D095
D081
D096
D097
D098
D099
D09A
D09B
D09C
D09D
D09E
D09F
D0A0
D0A1
D0A2
D0A3
D0A4
D0A5
D0A6
D0A7
D0A8
D0A9
D0AA
D0AB
D0AC
D0AD
D0AE
D0AF
D0B0
D0B1
D0B2
D0B3
D0B4
D0B5
D191
D0B6
D0B7
D0B8
D0B9
D0BA
D0BB
D0BC
D0BD
D0BE
D0BF
D180
D181
D182
D183
D184
D185
D186
D187
D188
D189
D18A
D18B
D18C
D18D
D18E
D18F

Please note that in the first position of each pair there are only two options: D0 or D1.
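The byte list above does not have to be typed by hand. Here is a short Python sketch that derives it from the alphabet string and confirms the observation about the first byte.

```python
alphabet = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя"

encoded = [ch.encode("utf-8") for ch in alphabet]
lead_bytes = sorted({e[0] for e in encoded})     # first byte of every letter
second_bytes = sorted({e[1] for e in encoded})   # second byte of every letter

print(all(len(e) == 2 for e in encoded))   # True: every letter takes two bytes
print([f"{b:02X}" for b in lead_bytes])    # ['D0', 'D1']
print(len(second_bytes))                   # 64 distinct second bytes (80 to BF)
print(encoded[0].hex().upper(), encoded[-1].hex().upper())  # D090 D18F
```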

Let us return to our string “тне”. The following bytes are present in it:

D182
D0BD
D0B5

That is, the first byte of each letter can only be D0 or D1, and the second byte can be 82, BD or B5. Let's create two custom character sets and explicitly specify the bytes in each of them. For hashcat to understand that we specified bytes in hexadecimal, not English letters and digits, you need to use the --hex-charset option. As a result, we get the following command:

hashcat -m 0 -a 3 -1 D0D1 -2 82BDB5 --hex-charset df28b6f9df132e3be4db5b102433d3b1 ?1?2?1?2?1?2

Let me remind you that in the previous cracking run, when we specified a mask of six characters, 15625 password candidates (5^6 = 15625) were tried, and this is only for a three-letter word when all three letters are known… That is far too many.

Let's look at the Progress line in the output of the last command:

Progress.........: 216/216 (100.00%)

Thanks to such a simple optimization, the number of password candidates was reduced from 15625 to 216 (2 × 3 = 6 possible byte pairs per letter, and 6^3 = 216), that is, about 72 times!

I think the essence of the idea is clear, so let's consider a few related issues.
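For longer alphabets it is convenient to generate the two hex character sets automatically. The following Python sketch does that for letters that take exactly two bytes in UTF-8; the function hex_charsets is a hypothetical helper, and it sorts the bytes, so the order inside each set may differ from the command above (the order within a character set does not change which candidates are generated).

```python
def hex_charsets(letters: str):
    """Split two-byte UTF-8 letters into hex strings of first and second bytes."""
    encoded = [ch.encode("utf-8") for ch in letters]
    if any(len(e) != 2 for e in encoded):
        raise ValueError("this sketch only handles characters that are two bytes in UTF-8")
    leads = sorted({e[0] for e in encoded})
    seconds = sorted({e[1] for e in encoded})
    return bytes(leads).hex().upper(), bytes(seconds).hex().upper()

leads, seconds = hex_charsets("тне")
length = 3                    # password length in letters
mask = "?1?2" * length        # one first byte and one second byte per letter
# Each hex pair is one byte, so the number of bytes is half the string length.
candidates = ((len(leads) // 2) * (len(seconds) // 2)) ** length

print(f"-1 {leads} -2 {seconds} --hex-charset {mask}")
print(candidates)  # 216
```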

How to use hashcat files with alphabets in different encodings

Hashcat comes with files with the .hcchr extension: these are files with the characters of certain alphabets in various encodings:

locate .hcchr

An example of Russian characters in various encodings:

/usr/share/doc/hashcat/charsets/combined/Russian.hcchr
/usr/share/doc/hashcat/charsets/special/Russian/ru_ISO-8859-5-special.hcchr
/usr/share/doc/hashcat/charsets/special/Russian/ru_cp1251-special.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_ISO-8859-5.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_KOI8-R.hcchr
/usr/share/doc/hashcat/charsets/standard/Russian/ru_cp1251.hcchr

They can be used; just remember to double the length of your masks for encodings in which a character takes two bytes. But you should NOT rely on them, since in this case the brute force runs without the optimization described above, and most of the password candidates will obviously be inappropriate, which significantly increases the cracking time.

How do I know which bytes a character is encoded with?

The following command (replace the word "нет" with the desired string or character) will show which bytes the string is composed of:

echo -n "нет" | od -A n -t x1
 d0 bd d0 b5 d1 82
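The same check can be done in Python, which also makes it easy to look at other encodings. A minimal sketch (cp1251 is used here only as an example of a single-byte Cyrillic encoding):

```python
text = "нет"
utf8_hex = text.encode("utf-8").hex(" ")
cp1251_hex = text.encode("cp1251").hex(" ")

print(utf8_hex)    # d0 bd d0 b5 d1 82
print(cp1251_hex)  # ed e5 f2
```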

Using the iconv command, you can convert strings and files to the desired encoding:

iconv -f utf-8 -t iso-8859-1 < rockyou.txt | sponge rockyou.txt.iso

If sponge is not found on your system, install the moreutils package.
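If iconv is not available, a dictionary can be re-encoded with a short script. The sketch below is one possible approach, not a replacement for iconv: the function convert_wordlist is made up for this example, it works line by line, and it silently skips lines that cannot be represented in the target encoding. The demo uses a temporary two-word file.

```python
import tempfile
from pathlib import Path

def convert_wordlist(src: Path, dst: Path, src_enc: str, dst_enc: str) -> int:
    """Re-encode src line by line into dst; return the number of skipped lines."""
    skipped = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        for raw in fin:
            try:
                fout.write(raw.decode(src_enc).encode(dst_enc))
            except (UnicodeDecodeError, UnicodeEncodeError):
                skipped += 1  # the line cannot be represented; drop it
    return skipped

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dic.txt"
    src.write_bytes("password\nнет\n".encode("utf-8"))

    skipped_cp1251 = convert_wordlist(src, Path(tmp) / "dic.cp1251", "utf-8", "cp1251")
    out_cp1251 = (Path(tmp) / "dic.cp1251").read_bytes()

    skipped_latin1 = convert_wordlist(src, Path(tmp) / "dic.iso", "utf-8", "iso-8859-1")
    out_latin1 = (Path(tmp) / "dic.iso").read_bytes()

print(skipped_cp1251, out_cp1251)  # 0 b'password\n\xed\xe5\xf2\n'
print(skipped_latin1, out_latin1)  # 1 b'password\n'
```

Note that ISO-8859-1 has no Cyrillic letters, so the Russian word is dropped in the second conversion.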

You can also use an online service that shows the bytes from which a character or string is composed: https://w-e-b.site/?act=encoding-converter
