What is UTF-8?

UTF stands for Unicode Transformation Format. The 8 refers to the 8-bit units (bytes) the encoding is built from: each character is stored in as few as one or as many as four bytes. UTF-8 is an encoding scheme for Unicode characters, and its encoding method is compatible with standard ASCII. It is now the most widely used character encoding on the web. UTF-8 stays compatible with ASCII by encoding values less than 128 in a single byte, using the 7 lowest-order bits and leaving the highest-order bit set to 0. If the value is 128 or higher, the highest-order bits of the first byte are set to indicate how many bytes are used to store the character.
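As an illustration of how the leading bits signal the length, here is a minimal Python sketch. The function name utf8_sequence_length is hypothetical and chosen only for this example; it inspects the first byte of a character and reports how many bytes the character occupies. It does not validate the bytes that follow.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the number of bytes in a UTF-8 sequence, judged by its first byte."""
    if (lead_byte & 0b1000_0000) == 0b0000_0000:
        return 1  # 0xxxxxxx: plain ASCII, a single byte
    if (lead_byte & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx: first byte of a two-byte character
    if (lead_byte & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx: first byte of a three-byte character
    if (lead_byte & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx: first byte of a four-byte character
    # 10xxxxxx is a continuation byte; 11111xxx never appears in UTF-8
    raise ValueError(f"0x{lead_byte:02X} is not a valid UTF-8 first byte")


# Check the result against Python's built-in encoder.
for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), utf8_sequence_length(encoded[0]))
```

For each of the four sample characters, the length reported from the first byte should match the length of the encoded byte string.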

Example: The first byte of a two-byte UTF-8 encoded character has 110 as its three highest-order bits. The two 1 bits indicate that two bytes are used. The 0 marks the end of the signaling bits. That leaves 5 bits in the first byte for the character value. The second byte starts with a 1 and a 0, and its remaining 6 bits hold the rest of the character value. The second, third, and fourth bytes of any multibyte UTF-8 encoded character each start with a 1 and a 0 and carry 6 bits of the character value.
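The two-byte layout described above can be sketched in Python as follows. The function name encode_two_byte is hypothetical and used only for illustration; it handles only values from 128 (0x80) to 2047 (0x7FF), the range that fits in the 5 + 6 = 11 available bits.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range 0x80..0x7FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point does not fit the two-byte form")
    # First byte: signaling bits 110, then the upper 5 bits of the value.
    first = 0b1100_0000 | (code_point >> 6)
    # Second byte: marker bits 10, then the lower 6 bits of the value.
    second = 0b1000_0000 | (code_point & 0b0011_1111)
    return bytes([first, second])


encoded = encode_two_byte(0xE9)  # U+00E9, the letter é
print([format(b, "08b") for b in encoded])  # ['11000011', '10101001']
print(encoded == "é".encode("utf-8"))       # True
```

Reading the output bit by bit: 110 plus 00011 in the first byte and 10 plus 101001 in the second byte give 00011101001, which is 233 (0xE9).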