


The name is derived from: Universal Coded Character Set + Transformation Format—8-bit.


UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters.
8 bits = 1 byte, so each byte holds a value from 0x00 to 0xFF.
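To make the "one byte for ASCII, up to four bytes for everything else" rule concrete, here is a minimal hand-rolled encoder sketch in Python. The helper name `utf8_encode` is hypothetical and exists only for illustration. It follows the standard UTF-8 bit layouts (0xxxxxxx, 110xxxxx 10xxxxxx, and so on) and rejects surrogates and values above U+10FFFF. The loop at the end cross-checks it against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value as UTF-8 bytes."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, identical to the ASCII byte value
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in [0x41, 0xE9, 0x4E2D, 0x1F600]:
    encoded = utf8_encode(cp)
    # Cross-check against Python's built-in UTF-8 codec
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", len(encoded), encoded.hex(" "))
```

Every ASCII value (below 0x80) comes out as the same single byte, which is why plain ASCII text is already valid UTF-8.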


The encoding is variable-length and uses 8-bit code units.
All code points in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) are accessed as a single code unit in UTF-16 encoding and can be encoded in one, two or three bytes in UTF-8.
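A quick way to see the BMP claim is to count UTF-16 code units and UTF-8 bytes for a few sample characters. This is a small sketch using only Python's built-in codecs; the helper names `utf16_code_units` and `utf8_length` are hypothetical. UTF-16 is encoded as `utf-16-be` so that no byte order mark inflates the count, and each code unit is 2 bytes.

```python
def utf16_code_units(ch: str) -> int:
    # Hypothetical helper: UTF-16 code units are 2 bytes; big-endian avoids a BOM
    return len(ch.encode("utf-16-be")) // 2


def utf8_length(ch: str) -> int:
    # Hypothetical helper: UTF-8 code units are single bytes
    return len(ch.encode("utf-8"))


samples = ["A", "\u00e9", "\u4e2d", "\ufffd", "\U0001F600"]  # last one is outside the BMP
for ch in samples:
    where = "BMP" if ord(ch) <= 0xFFFF else "non-BMP"
    print(f"U+{ord(ch):04X}", where,
          "UTF-16 units:", utf16_code_units(ch),
          "UTF-8 bytes:", utf8_length(ch))
```

Every BMP sample needs exactly one UTF-16 code unit but one to three UTF-8 bytes; U+1F600 falls outside the BMP and needs two UTF-16 code units (a surrogate pair) and four UTF-8 bytes.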


UTF-8 is the dominant character encoding for the World Wide Web.
The Internet Mail Consortium (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8,[5] and the W3C recommends UTF-8 as the default encoding in XML and HTML.

posted on 2016-08-02 21:44  zno2
