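Open http://inamidst.com/stuff/unidata/

to browse Unicode code points and the characters they map to.

Clicking a character takes you to http://www.fileformat.info, which shows detailed information about that character, including its Unicode Data, its Encodings, and how to write it in HTML, C, C++, Java, and Python source code.
For example, here is the information for the dollar sign: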
| Unicode Data | |
|---|---|
| Name | DOLLAR SIGN |
| Block | Basic Latin |
| Category | Symbol, Currency [Sc] |
| Combine | 0 |
| BIDI | European Number Terminator [ET] |
| Mirror | N |
| Index entries | milreis; DOLLAR SIGN; escudo |
| Comments | milreis, escudo; glyph may have one or two vertical bars; other currency symbol characters: U+20A0-U+20B8 |
| See Also | currency sign U+00A4; heavy dollar sign U+1F4B2 |
| Version | Unicode 1.1.0 (June, 1993) |
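
Most of the fields in this table can also be read locally. The following is a minimal sketch using Python 3's standard `unicodedata` module; the dictionary keys simply mirror the table's row labels and are chosen for illustration, not taken from any API.

```python
import unicodedata

ch = "$"

# Keys mirror the row labels of the table above (illustrative only).
info = {
    "Name": unicodedata.name(ch),
    "Category": unicodedata.category(ch),    # two-letter general category
    "Combine": unicodedata.combining(ch),    # canonical combining class
    "BIDI": unicodedata.bidirectional(ch),   # bidirectional class
    "Mirror": "Y" if unicodedata.mirrored(ch) else "N",
}

for key, value in info.items():
    print("{}: {}".format(key, value))
```

The block, index entries, and comments shown on the site are not exposed by `unicodedata`, so they are left out here.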
| Encodings | |
|---|---|
| HTML Entity (decimal) | `&#36;` |
| HTML Entity (hex) | `&#x24;` |
| How to type in Microsoft Windows | Alt +0024; Alt 036; Alt 36 |
| UTF-8 (hex) | 0x24 (24) |
| UTF-8 (binary) | 00100100 |
| UTF-16 (hex) | 0x0024 (0024) |
| UTF-16 (decimal) | 36 |
| UTF-32 (hex) | 0x00000024 (0024) |
| UTF-32 (decimal) | 36 |
| C/C++/Java source code | "\u0024" |
| Python source code | u"\u0024" |
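
The encoding rows can be reproduced with a few lines of Python 3. This is a sketch rather than how the site computes its values; the big-endian codecs (`utf-16-be`, `utf-32-be`) are chosen here so that no byte order mark appears in the output, and the variable names are illustrative.

```python
ch = "\u0024"            # the dollar sign, written as an escape
code_point = ord(ch)     # integer value of the code point

# Hex dumps of the encoded bytes (big-endian variants avoid a BOM).
utf8_hex = ch.encode("utf-8").hex()
utf16_hex = ch.encode("utf-16-be").hex()
utf32_hex = ch.encode("utf-32-be").hex()

# Numeric character references as used in HTML.
html_decimal = "&#{};".format(code_point)
html_hex = "&#x{:x};".format(code_point)

print(code_point, utf8_hex, utf16_hex, utf32_hex)
print(html_decimal, html_hex)
```

For a character in the Basic Latin block, all three encodings carry the same value, 0x24, padded to 1, 2, and 4 bytes respectively.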
| Java Data | |
|---|---|
| string.toUpperCase() | $ |
| string.toLowerCase() | $ |
| Character.UnicodeBlock | BASIC_LATIN |
| Character.charCount() | 1 |
| Character.getDirectionality() | DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR [5] |
| Character.getNumericValue() | -1 |
| Character.getType() | 26 |
| Character.isDefined() | Yes |
| Character.isDigit() | No |
| Character.isIdentifierIgnorable() | No |
| Character.isISOControl() | No |
| Character.isJavaIdentifierPart() | Yes |
| Character.isJavaIdentifierStart() | Yes |
| Character.isLetter() | No |
| Character.isLetterOrDigit() | No |
| Character.isLowerCase() | No |
| Character.isMirrored() | No |
| Character.isSpaceChar() | No |
| Character.isSupplementaryCodePoint() | No |
| Character.isTitleCase() | No |
| Character.isUnicodeIdentifierPart() | No |
| Character.isUnicodeIdentifierStart() | No |
| Character.isUpperCase() | No |
| Character.isValidCodePoint() | Yes |
| Character.isWhitespace() | No |
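
Python's `str` type has rough counterparts for a few of these checks. The sketch below only suggests a loose correspondence: the Python methods are not defined identically to the Java ones, and the pairing in the comments is an assumption made for illustration.

```python
ch = "$"

checks = {
    "upper": ch.upper(),                  # compare string.toUpperCase()
    "lower": ch.lower(),                  # compare string.toLowerCase()
    "isdigit": ch.isdigit(),              # compare Character.isDigit()
    "isalpha": ch.isalpha(),              # compare Character.isLetter()
    "isalnum": ch.isalnum(),              # compare Character.isLetterOrDigit()
    "isspace": ch.isspace(),              # compare Character.isWhitespace()
    "isidentifier": ch.isidentifier(),    # compare Character.isJavaIdentifierStart()
}

for name, result in checks.items():
    print("{}: {!r}".format(name, result))
```

One difference is visible: Java accepts `$` at the start of an identifier, while Python's `isidentifier()` returns `False` for it.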
Wikipedia's explanation of code points:
The Unicode code space is divided into 17 planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2^16) code points.
Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
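
The arithmetic is easy to check in Python 3. In this sketch the constant names and the `plane_of` helper are hypothetical, introduced only for illustration; `sys.maxunicode` is the largest code point the interpreter supports (0x10FFFF on Python 3.3 and later).

```python
import sys

PLANES = 17                      # 1 basic + 16 supplementary planes
CODE_POINTS_PER_PLANE = 2 ** 16  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE


def plane_of(ch):
    """Return the plane number of a single character (hypothetical helper)."""
    return ord(ch) >> 16


print(total)
print(sys.maxunicode == total - 1)   # code points run from 0 to total - 1
print(plane_of("$"))                 # U+0024, Basic Multilingual Plane
print(plane_of("\U0001F4B2"))        # U+1F4B2 HEAVY DOLLAR SIGN
```

The dollar sign sits in plane 0, while the heavy dollar sign listed under "See Also" falls in plane 1, the first supplementary plane.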
In Python, you can obtain a character from its Unicode name. For example, the name 'dollar sign'
yields the dollar sign:
----------------------------------------------------------------------------------------------------------
>>> dollar = u"\N{dollar sign}"
>>> print(dollar)
$
----------------------------------------------------------------------------------------------------------
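
The `\N{...}` escape only works in string literals. When the name is known only at run time, the standard `unicodedata` module offers the same lookup as a function. A minimal Python 3 sketch follows; the `find_char` helper and its fallback behaviour are assumptions added for illustration.

```python
import unicodedata


def find_char(name, default=None):
    """Return the character with the given Unicode name, or default if unknown.

    Hypothetical helper: unicodedata.lookup raises KeyError for unknown names.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return default


dollar = find_char("DOLLAR SIGN")
print(dollar)
print(unicodedata.name(dollar))           # reverse direction: character to name
print(dollar == "\N{DOLLAR SIGN}")        # same result as the literal escape
print(find_char("NO SUCH CHARACTER NAME", default="?"))
```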