|
Print out the raw bytes of the string in hexadecimal. Then try decoding those bytes by hand, following a description of how UTF-8 works.
Code points (Unicode’s generic word for “character”, given that some languages don’t use “characters” in the English sense) from 00 to 7F (hexadecimal) are each encoded as a single byte with the same value, so every ASCII string is also a valid UTF-8 string. From 80 upwards, UTF-8 uses two or more
bytes to encode each code point, even though the numbers 80 to FF would themselves fit into one byte. The high bit of a byte is reserved to mark it as part of a multi-byte sequence.
If you manually encode and decode the characters r, é, s, u and m you will get the hang of it.
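
Here is a rough Python sketch of that exercise, in case it helps. The sample word "résumé" is my assumption, based on the letters listed above, and decode_utf8 is a made-up helper name, not a library function. It is deliberately simplified: it does not reject overlong encodings or surrogates as a real decoder must.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 by hand (simplified: no overlong or surrogate checks)."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: one byte, plain ASCII
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:        # 110xxxxx: two-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:      # 11110xxx: four-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        if i + 1 + extra > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + 1 + extra]:
            if b >> 6 != 0b10:          # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte {b:#04x}")
            cp = (cp << 6) | (b & 0x3F) # append the 6 payload bits
        chars.append(chr(cp))
        i += 1 + extra
    return "".join(chars)


text = "résumé"                # assumed sample string
raw = text.encode("utf-8")
print(raw.hex(" "))            # 72 c3 a9 73 75 6d c3 a9
print(decode_utf8(raw))        # résumé
```

Note how é (code point E9) comes out as the two bytes C3 A9: the lead byte carries the top bits after the 110 prefix, and the continuation byte carries the low six bits after the 10 prefix. bytes.hex() with a separator argument needs Python 3.8 or later.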
It is a clever encoding system, designed one evening in a diner in New Jersey, if memory serves.
On 20 Jun 2022, at 08:56, Budi <budikusasi@gmail.com> wrote:
|