Quick way to decode unknown encoding
Sometimes a web browser shows something like this instead of normal text:
â ÐÑÐ¿Ð¾Ð»Ð½Ð¸ÑÐµ Ð²Ñ Ð¾Ð´ Ð¸Ð»Ð¸ Ð·Ð°ÑÐµÐ³Ð¸ÑÑÑÐ¸ÑÑÐ¹ÑÐµÑÑ
that is, completely unreadable characters.
Another case: English characters are displayed normally, but every other character is replaced by a percent sign followed by letters and digits.
There are also strings made up of uppercase and lowercase letters and digits, sometimes ending in one or two equal signs.
And sometimes you come across text in which a backslash and an x, followed by letters and digits, keeps recurring.
To decode a string quickly, even when you do not know how it is encoded, use the free online service for detecting and converting encodings. The service is a copy of http://0xcc.net/jsescape/.
The principle is very simple: you paste the unreadable string into the window, and the service tries to convert it using each of the encodings it supports. If readable text appears in the Plain Text field, your string has been decoded successfully. Let's try to work out the meaning of â ÐÑÐ¿Ð¾Ð»Ð½Ð¸ÑÐµ Ð²Ñ Ð¾Ð´ Ð¸Ð»Ð¸ Ð·Ð°ÑÐµÐ³Ð¸ÑÑÑÐ¸ÑÑÐ¹ÑÐµÑÑ:
Got it! This line means:
— Выполните вход или зарегистрируйтесь
Now let's figure out the line:
Its meaning turned out to be:
mat2: новая версия программы для удаления метаданных
Now look at the message from the scam email:
How to determine the encoding
Some frequently encountered encodings can be recognized by eye. This greatly speeds up decoding a string, or helps you quickly understand why the text is displayed the way it is.
Let's start with an encoding everyone has seen. In the browser address bar or on websites you may have come across addresses like this one: https://kali.org.ru/%d0%b4%d1%80%d1%83%d0%b3%d0%b8%d0%b5-it-%d1%82%d0%b5%d0%bc%d1%8b/%d0%ba%d0%b0%d0%ba-%d0%bd%d0%b0%d1%87%d0%b0%d1%82%d1%8c-%d0%b7%d0%bd%d0%b0%d0%ba%d0%be%d0%bc%d1%81%d1%82%d0%b2%d0%be-%d1%81-%d0%ba%d0%be%d0%bc%d0%b0%d0%bd%d0%b4%d0%b0%d0%bc%d0%b8-linux-cygwin
The URL standard uses the US-ASCII character set. This is a serious limitation, since only Latin letters, digits and a few punctuation marks are allowed. All other characters must be re-encoded: for example Cyrillic letters, letters with diacritics, ligatures and ideographic characters. Each byte of such a character is written as a percent sign followed by two hexadecimal digits. The encoding is described in RFC 3986 and is called URL encoding or percent-encoding.
Data from web forms when Content-Type is specified as application/x-www-form-urlencoded is also passed in URL encoding.
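As a rough illustration, percent-encoding can be reversed with Python's standard library. The sketch below decodes the first two path segments of the address shown above; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# First two path segments of the example address (percent-encoded UTF-8).
encoded = "%d0%b4%d1%80%d1%83%d0%b3%d0%b8%d0%b5-it-%d1%82%d0%b5%d0%bc%d1%8b"

# unquote() turns every %XX into a byte and decodes the bytes as UTF-8.
decoded = unquote(encoded)
print(decoded)

# quote() goes the other way; it writes the hex digits in upper case.
reencoded = quote(decoded)
print(reencoded)
```

Note that quote() produces uppercase hex digits while the address above uses lowercase ones; both forms decode to the same text.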
You have almost certainly seen messages in the next encoding: they consist of uppercase and lowercase Latin letters and digits, and may end with one or two equal signs.
In any case, you almost certainly use this encoding nearly every day without knowing it, because Base64 is very often used in e-mail messages, especially those with attached files (photos, documents, and so on).
Base64 is a standard for encoding binary data using only 64 ASCII characters. The alphabet consists of the Latin letters A-Z and a-z and the digits 0-9 (62 characters), plus 2 additional characters that depend on the implementation. Every 3 source bytes are encoded as 4 characters, so the encoded data is one third larger than the source.
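The 3-bytes-to-4-characters rule and the trailing equal signs can be checked with a few lines of Python. This is only a sketch using the standard base64 module; encoded_length is a hypothetical helper introduced here for illustration.

```python
import base64

# Exactly 3 source bytes become exactly 4 Base64 characters.
print(base64.b64encode(b"Man"))  # b'TWFu'

# A final group of 2 bytes or 1 byte is padded with "=" up to 4 characters.
print(base64.b64encode(b"Ma"))   # b'TWE='
print(base64.b64encode(b"M"))    # b'TQ=='


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * ((n_bytes + 2) // 3)


print(encoded_length(300))  # 400
```

This is why a Base64 string ends with no equal sign, one, or two, depending on the length of the source data.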
It is widely used in e-mail to represent binary files in the text of a message (content transfer encoding).
The service mentioned above can also decode from and encode to Base64, but there is one peculiarity. Quite often a long Base64 string in an e-mail is split into lines of equal length for convenience. Before pasting such a string into the service, remove the line breaks so that the whole input is on one line; otherwise everything after the first newline character will be decoded incorrectly.
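If you would rather not join the lines by hand, the same clean-up can be sketched in Python. The function name decode_wrapped_base64 is hypothetical, and the sample is the word "Привет" encoded as UTF-8 and then as Base64, split over two lines.

```python
import base64


def decode_wrapped_base64(text: str) -> bytes:
    """Decode Base64 text that may be split over several lines."""
    # str.split() with no arguments splits on any whitespace, newlines included,
    # so joining the pieces leaves one continuous Base64 string.
    one_line = "".join(text.split())
    return base64.b64decode(one_line)


# A Base64 string wrapped over two lines, as it might appear in an e-mail.
wrapped = "0J/RgNC4\n0LLQtdGC\n"
print(decode_wrapped_base64(wrapped).decode("utf-8"))
```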
UTF-8 text that is displayed in the wrong encoding looks like capital letters N and D with extra marks (Ñ, Ð), with fractions such as ¾ mixed in.
â ÐÑÐ¿Ð¾Ð»Ð½Ð¸ÑÐµ Ð²Ñ Ð¾Ð´ Ð¸Ð»Ð¸ Ð·Ð°ÑÐµÐ³Ð¸ÑÑÑÐ¸ÑÑÐ¹ÑÐµÑÑ
In this case, the UTF-8 text has been processed as ISO-8859-1 or Windows-1252. Using the service mentioned above, such strings can be decoded by copying them into the Quoted-printable or URL field.
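The repair amounts to undoing the wrong step: turn the garbled characters back into the original bytes, then decode those bytes as UTF-8. A minimal Python sketch, assuming the text was misread as ISO-8859-1 and no characters were lost along the way (the function name is hypothetical):

```python
def fix_utf8_read_as_latin1(garbled: str) -> str:
    """Undo 'UTF-8 bytes shown as ISO-8859-1' (assumes nothing was lost)."""
    # ISO-8859-1 maps every byte 0x00-0xFF to one character, so encoding
    # the garbled text with it gives back the original UTF-8 bytes.
    return garbled.encode("latin-1").decode("utf-8")


# Reproduce the problem: UTF-8 bytes deliberately decoded as ISO-8859-1.
original = "Привет"
garbled = original.encode("utf-8").decode("latin-1")

print(repr(garbled))                     # starts with 'Ð', then invisible control characters
print(fix_utf8_read_as_latin1(garbled))
```

Text copied from a web page has often lost the invisible control characters, in which case the round trip fails with a decoding error and the string can only be restored in part.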
UTF-8 text processed as ANSI (here, the Windows-1251 code page) looks like strings of capital letters Р and С mixed with small letters and other symbols:
РґРѕР±Р°РІРёС‚СЊ С‡С‘СЂРЅС‹Р№ СЃРїРёСЃРѕРє
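The same reversal works here with a different code page. The sketch below assumes the garbled text came from reading UTF-8 as Windows-1251 (cp1251 in Python); the function name is hypothetical, and the sample is generated in the code rather than copied from a page.

```python
def fix_utf8_read_as_cp1251(garbled: str) -> str:
    """Undo 'UTF-8 bytes shown as Windows-1251'."""
    # Encoding with cp1251 restores the original bytes; they are really UTF-8.
    return garbled.encode("cp1251").decode("utf-8")


# Reproduce the problem, then repair it.
original = "добавить чёрный список"
garbled = original.encode("utf-8").decode("cp1251")

print(garbled)
print(fix_utf8_read_as_cp1251(garbled))
```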
Escape sequences are especially common in program source code. To find out what a string written this way means, copy it into the field that matches its format:
- \uXXXX - backslash and small u followed by four hexadecimal digits
- \UXXXXXXXX - backslash and capital U followed by eight hexadecimal digits
- &#DDDD; - ampersand and number sign followed by a decimal number
- &#xXXXX; - ampersand, number sign and x followed by a hexadecimal number
- \xXX - backslash and x followed by two hexadecimal digits
- \OOO - backslash followed by a number in octal
Such strings are used where there is a risk that text written in a national alphabet will be corrupted (for example, when the browser misinterprets the encoding of the web page):
<script> alert ("\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82") </script>
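For a quick check without the online service, these notations can also be decoded in Python. This is a sketch with the standard library only; it assumes, as in the example above, that the \xXX and octal sequences stand for the bytes of a UTF-8 string.

```python
import html

# \xXX sequences taken from the script example above.
raw = r"\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"

# unicode_escape turns each \xXX into the character with that code;
# latin-1 then maps those characters back to bytes, which are UTF-8.
as_bytes = raw.encode("ascii").decode("unicode_escape").encode("latin-1")
hex_result = as_bytes.decode("utf-8")
print(hex_result)

# Octal byte escapes are handled the same way.
octal_result = (
    r"\320\237".encode("ascii").decode("unicode_escape")
    .encode("latin-1").decode("utf-8")
)
print(octal_result)

# \uXXXX sequences name code points directly, so one step is enough.
u_result = r"\u041f\u0440\u0438".encode("ascii").decode("unicode_escape")
print(u_result)

# HTML numeric character references, decimal and hexadecimal.
entity_result = html.unescape("&#1055;&#x440;")
print(entity_result)
```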
How to convert to escaped sequences
As you might have guessed, the same page can also convert plain text into escape sequences.
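The reverse conversion is easy to sketch in Python as well. The three helper functions below are hypothetical names introduced for illustration, and the \xXX form assumes the text is encoded as UTF-8.

```python
def to_hex_escapes(text: str) -> str:
    """\\xXX for every byte of the UTF-8 encoded text."""
    return "".join(f"\\x{b:02X}" for b in text.encode("utf-8"))


def to_unicode_escapes(text: str) -> str:
    """\\uXXXX per character, \\UXXXXXXXX for code points above U+FFFF."""
    parts = []
    for ch in text:
        cp = ord(ch)
        parts.append(f"\\U{cp:08X}" if cp > 0xFFFF else f"\\u{cp:04X}")
    return "".join(parts)


def to_html_entities(text: str) -> str:
    """Decimal numeric character reference (&#DDDD;) per character."""
    return "".join(f"&#{ord(ch)};" for ch in text)


print(to_hex_escapes("Привет"))
print(to_unicode_escapes("Привет"))
print(to_html_entities("Привет"))
```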
How to change the encoding of a string or document without third-party services
In Double Commander, when viewing a text file (select the file and press F3) or editing it (F4), you can change the encoding after opening the file, and also save it in a different encoding.
Another option for Linux users is the command line. With it you can detect the encoding of an unreadable string and convert it to the correct one.
To detect the encoding of a file:
To detect the encoding of a string with chardet:
echo $'\xed\xe5 \xed\xe0 \xe9\xe4\xe5\xed\xf3\xea \xe0\xe7\xe0\xed\xed\xfb\xe9\xec\xee\xe4\xf3\xeb\xfc' | chardet
<stdin>: windows-1251 with confidence 0.970067019236
To detect the encoding of a string with enca:
echo $'\xed\xe5 \xed\xe0 \xe9\xe4\xe5\xed\xf3\xea \xe0\xe7\xe0\xed\xed\xfb\xe9\xec\xee\xe4\xf3\xeb\xfc' | enca -L ru
MS-Windows code page 1251
  LF line terminators
To detect the encoding of a string with uchardet:
echo $'\xed\xe5 \xed\xe0 \xe9\xe4\xe5\xed\xf3\xea \xe0\xe7\xe0\xed\xed\xfb\xe9\xec\xee\xe4\xf3\xeb\xfc' | uchardet
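To get a feel for how such a guess can be made, here is a deliberately naive Python sketch. It is not the algorithm used by chardet, enca or uchardet: it simply decodes the bytes with a few candidate Cyrillic encodings and keeps the one that yields the most lowercase Russian letters. The candidate list and the function name are assumptions made for this example.

```python
# Candidate single-byte Cyrillic encodings (an assumption for this sketch).
CANDIDATES = ("cp1251", "koi8_r", "cp866")


def guess_cyrillic_encoding(data: bytes) -> str:
    """Pick the candidate whose decoding contains the most letters а-я."""
    def score(encoding: str) -> int:
        text = data.decode(encoding, errors="replace")
        return sum(1 for ch in text if "а" <= ch <= "я")

    # On a tie, max() returns the first candidate in the list.
    return max(CANDIDATES, key=score)


# The same bytes that are piped into the command-line tools above.
sample = (b"\xed\xe5 \xed\xe0 \xe9\xe4\xe5\xed\xf3\xea "
          b"\xe0\xe7\xe0\xed\xed\xfb\xe9\xec\xee\xe4\xf3\xeb\xfc")
print(guess_cyrillic_encoding(sample))
```

A heuristic this simple is easily fooled by short strings or text in capital letters, which is why the dedicated tools use statistical models instead.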
To change the encoding of a file with iconv:
enca -i mypoem_draft.txt
cat mypoem_draft.txt
iconv -f CP1251 -t UTF-8//TRANSLIT mypoem_draft.txt -o poem.txt
cat poem.txt
enca -i poem.txt
To change the encoding of a file in place with enca:
enca -x UTF-8 mypoem_draft.txt
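Where neither iconv nor enca is available, the same conversion can be sketched in Python. The function name convert_file is hypothetical, and the source encoding must already be known, as with iconv's -f option.

```python
from pathlib import Path


def convert_file(src: str, dst: str,
                 from_enc: str = "cp1251", to_enc: str = "utf-8") -> None:
    """Re-encode a text file, roughly like: iconv -f FROM -t TO src -o dst."""
    # Working on bytes keeps the line endings of the source file unchanged.
    text = Path(src).read_bytes().decode(from_enc)
    Path(dst).write_bytes(text.encode(to_enc))


# Usage with the file names from the iconv example:
# convert_file("mypoem_draft.txt", "poem.txt")
```

Unlike iconv with //TRANSLIT, this sketch raises an error on a character that the target encoding cannot represent instead of substituting a similar one.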
To convert a string to the correct encoding with iconv:
echo $'\xed\xe5 \xed\xe0\xe9\xe4\xe5\xed \xf3\xea\xe0\xe7\xe0\xed\xed\xfb\xe9 \xec\xee\xe4\xf3\xeb\xfc' | iconv -f 'Windows-1251'
не найден указанный модуль
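For comparison, the same string can be converted in Python by decoding the bytes with the known source encoding. This is a sketch only; cp1251 is Python's name for Windows-1251.

```python
# The bytes from the iconv example above, written as a bytes literal.
raw = (b"\xed\xe5 \xed\xe0\xe9\xe4\xe5\xed "
       b"\xf3\xea\xe0\xe7\xe0\xed\xed\xfb\xe9 \xec\xee\xe4\xf3\xeb\xfc")

# Decoding with the right code page gives readable text.
text = raw.decode("cp1251")
print(text)

# Encoding again produces the UTF-8 bytes you would store or send.
utf8_bytes = text.encode("utf-8")
print(len(raw), len(utf8_bytes))
```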