LCMapStringEx: http://msdn.microsoft.com/en-us/library/windows/desktop/dd318702(v=vs.85).aspx
For a locale specified by name, maps an input character string to another using a specified transformation, or generates a sort key for the input string.
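In other words, for a given locale, the function maps an input string to another string using a specified transformation, or generates a sort key for the input string. To see which transformations are supported, start with its second parameter, dwMapFlags: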
Flag | Meaning | Note |
LCMAP_BYTEREV | Use byte reversal. For example, if the application passes in 0x3450 0x4822, the result is 0x5034 0x2248. | Swaps bytes; possibly useful when converting between big-endian and little-endian data |
LCMAP_FULLWIDTH | Use Unicode (wide) characters where applicable. This flag and LCMAP_HALFWIDTH are mutually exclusive. | Converts to full-width characters |
LCMAP_HALFWIDTH | Use narrow characters where applicable. This flag and LCMAP_FULLWIDTH are mutually exclusive. | Converts to half-width characters |
LCMAP_HIRAGANA | Map all katakana characters to hiragana. This flag and LCMAP_KATAKANA are mutually exclusive. | Converts katakana to hiragana |
LCMAP_KATAKANA | Map all hiragana characters to katakana. This flag and LCMAP_HIRAGANA are mutually exclusive. | Converts hiragana to katakana |
LCMAP_LINGUISTIC_CASING | Use linguistic rules for casing, instead of file system rules (default). This flag is valid with LCMAP_LOWERCASE or LCMAP_UPPERCASE only. | Needed for Turkish, for example; I am not entirely clear on the details, see http://blogs.msdn.com/b/michkap/archive/2004/12/03/274288.aspx |
LCMAP_LOWERCASE | For locales and scripts capable of handling uppercase and lowercase, map all characters to lowercase. | Converts to lowercase |
LCMAP_SIMPLIFIED_CHINESE | Map traditional Chinese characters to simplified Chinese characters. This flag and LCMAP_TRADITIONAL_CHINESE are mutually exclusive. | Converts Traditional Chinese to Simplified Chinese |
LCMAP_SORTKEY | Produce a normalized sort key. If the LCMAP_SORTKEY flag is not specified, the function performs string mapping. For details of sort key generation and string mapping, see the Remarks section. | |
LCMAP_TITLECASE | Windows 7: Map all characters to title case, in which the first letter of each major word is capitalized. | Capitalizes the first letter of each word |
LCMAP_TRADITIONAL_CHINESE | Map simplified Chinese characters to traditional Chinese characters. This flag and LCMAP_SIMPLIFIED_CHINESE are mutually exclusive. | Converts Simplified Chinese to Traditional Chinese |
LCMAP_UPPERCASE | For locales and scripts capable of handling uppercase and lowercase, map all characters to uppercase. | Converts to uppercase |
The following flags can be used alone, with one another, or with LCMAP_SORTKEY and/or LCMAP_BYTEREV, but they cannot be combined with the other flags listed above.
Flag | Meaning | Note |
NORM_IGNORENONSPACE | Ignore nonspacing characters. For many scripts (notably Latin scripts), NORM_IGNORENONSPACE coincides with LINGUISTIC_IGNOREDIACRITIC. Note: NORM_IGNORENONSPACE ignores any secondary distinction, whether it is a diacritic or not. Scripts for Korean, Japanese, Chinese, and Indic languages, among others, use this distinction for purposes other than diacritics. LINGUISTIC_IGNOREDIACRITIC causes the function to ignore only actual diacritics, instead of ignoring the second sorting weight. | |
NORM_IGNORESYMBOLS | Ignore symbols and punctuation. | |
The following flags can be used only together with LCMAP_SORTKEY:
Flag | Meaning | Note |
LINGUISTIC_IGNORECASE | Ignore case, as linguistically appropriate. | Ignores case where the language calls for it |
LINGUISTIC_IGNOREDIACRITIC | Ignore nonspacing characters, as linguistically appropriate. Note: This flag does not always produce predictable results when used with decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. | Ignores nonspacing characters (diacritics) where the language calls for it |
NORM_IGNORECASE | Ignore case. For many scripts (notably Latin scripts), NORM_IGNORECASE coincides with LINGUISTIC_IGNORECASE. Note: NORM_IGNORECASE ignores any tertiary distinction, whether it is actually linguistic case or not. For example, in Arabic and Indic scripts, this flag distinguishes alternate forms of a character, but the differences do not correspond to linguistic case. LINGUISTIC_IGNORECASE causes the function to ignore only actual linguistic casing, instead of ignoring the third sorting weight. Note: For double-byte character set (DBCS) locales, NORM_IGNORECASE has an effect on all Unicode characters as well as narrow (one-byte) characters, including Greek and Cyrillic characters. | Ignores case |
NORM_IGNOREKANATYPE | Do not differentiate between hiragana and katakana characters. Corresponding hiragana and katakana characters compare as equal. | Does not distinguish hiragana from katakana |
NORM_IGNOREWIDTH | Ignore the difference between half-width and full-width characters, for example, Ｃａｔ == cat. The full-width form is a formatting distinction used in Chinese and Japanese scripts. | Does not distinguish half-width from full-width |
NORM_LINGUISTIC_CASING | Use linguistic rules for casing, instead of file system rules (default). | |
SORT_DIGITSASNUMBERS | Windows 7: Treat digits as numbers during sorting, for example, sort "2" before "10". | Treats digit characters as numeric values; the English description explains it better |
SORT_STRINGSORT | Treat punctuation the same as symbols. | Treats punctuation as symbols |
For sort key generation, also see http://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx. In summary:
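A sort key is a binary representation generated for a particular string in a specified locale. This binary form encodes how the string sorts in that locale, so to compare two strings you can generate a sort key for each and then compare the keys with memcmp.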
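#include <cstdio>     // printf
#include <Windows.h>

// Print the reason for a failed LCMapStringEx call.
static void ReportError()
{
    DWORD lasterror = GetLastError();
    if (lasterror == ERROR_INVALID_FLAGS)
        printf("ERROR_INVALID_FLAGS\n");
    else if (lasterror == ERROR_INSUFFICIENT_BUFFER)
        printf("ERROR_INSUFFICIENT_BUFFER\n");
    else if (lasterror == ERROR_INVALID_PARAMETER)
        printf("ERROR_INVALID_PARAMETER\n");
}

int main(int argc, char **argv)
{
    LPCWSTR pSrc = L"我爱北京天安门";
    LPCWSTR pSrc2 = L"我爱北京天安门金山";
    // With LCMAP_SORTKEY the output is a byte array and cchDest is a byte count.
    // Each key gets its own buffer so the second call does not overwrite the first.
    BYTE pDest[200] = {0};
    BYTE pDest2[200] = {0};

    // Use the same flags for both calls so that the two keys are comparable.
    int result = LCMapStringEx(L"zh-CN", LCMAP_SORTKEY, pSrc, 7, (LPWSTR)pDest, (int)sizeof(pDest), NULL, NULL, 0);
    //int result = LCMapString(0x041d, LCMAP_SORTKEY, pSrc, 7, pDest, 11);
    if (result == 0)
    {
        ReportError();
        return 1;
    }

    int result2 = LCMapStringEx(L"zh-CN", LCMAP_SORTKEY, pSrc2, 9, (LPWSTR)pDest2, (int)sizeof(pDest2), NULL, NULL, 0);
    if (result2 == 0)
    {
        ReportError();
        return 1;
    }
    return 0;
}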
The sort key generated for “我爱北京天安门” is ce c6 13 c0 11 34 c0 83 0d c6 19 25 cd e6 0d c0 15 10 c8 aa 0d 01 01 01 01 00, and the sort key generated for “我爱北京天安门金山” is ce c6 13 c0 11 34 c0 83 0d c6 19 25 cd e6 0d c0 15 10 c8 aa 0d c6 0e 2e cc 82 0d 01 01 01 01 00