WideCharToMultiByte Function Explained in Detail

Posted on 2008-03-07 23:46 by 少林

WideCharToMultiByte

The WideCharToMultiByte function maps a wide-character string to a new character string. The new character string is not necessarily from a multibyte character set.

int WideCharToMultiByte(
  UINT CodePage,            // code page
  DWORD dwFlags,            // performance and mapping flags
  LPCWSTR lpWideCharStr,    // wide-character string
  int cchWideChar,          // number of WCHARs in string
  LPSTR lpMultiByteStr,     // buffer for new string
  int cbMultiByte,          // size of buffer, in bytes
  LPCSTR lpDefaultChar,     // default for unmappable chars
  LPBOOL lpUsedDefaultChar  // set when default char used
);

Parameters

CodePage
[in] Specifies the code page used to perform the conversion. This can be any code page that is installed or available on the system. For a list of code pages, see Code Page Identifiers. You can also specify one of the following values.

Value            Meaning
CP_ACP           ANSI code page
CP_MACCP         Macintosh code page
CP_OEMCP         OEM code page
CP_SYMBOL        Windows 2000/XP: Symbol code page (42)
CP_THREAD_ACP    Windows 2000/XP: Current thread's ANSI code page
CP_UTF7          Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-7. When this is set, lpDefaultChar and lpUsedDefaultChar must be NULL.
CP_UTF8          Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-8. When this is set, dwFlags must be zero and both lpDefaultChar and lpUsedDefaultChar must be NULL.

Windows 95: Under the Microsoft Layer for Unicode, WideCharToMultiByte also supports CP_UTF7 and CP_UTF8.
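Because CP_UTF8 encodes each Unicode code point as one to four bytes, the output byte count usually differs from the number of WCHARs. The portable C sketch below shows that mapping for a single code point. `encode_utf8` is a hypothetical helper written for illustration; it is not part of the Windows API, and the real function must also combine UTF-16 surrogate pairs into one code point before this step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns how many were written,
   or returns 0 for surrogates and values above U+10FFFF, which are
   not valid Unicode scalar values. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogate code points */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+00E8 (è) encodes as the two bytes C3 A8, and U+4E2D (中) as the three bytes E4 B8 AD.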

dwFlags
[in] Specifies the handling of unmapped characters. The function performs more quickly when none of these flags is set. The following flag constants are defined.
Value                   Meaning
WC_NO_BEST_FIT_CHARS    Windows 98/Me and Windows 2000/XP: Any Unicode characters that do not translate directly to multibyte equivalents are translated to the default character (see the lpDefaultChar parameter). In other words, if translating from Unicode to multibyte and back to Unicode again does not yield the exact same Unicode character, the default character is used.

                        This flag can be used by itself or in combination with the other dwFlags options.

WC_COMPOSITECHECK       Convert composite characters to precomposed characters.
WC_DISCARDNS            Discard nonspacing characters during conversion.
WC_SEPCHARS             Generate separate characters during conversion. This is the default conversion behavior.
WC_DEFAULTCHAR          Replace exceptions with the default character during conversion.

When WC_COMPOSITECHECK is specified, the function converts composite characters to precomposed characters. A composite character consists of a base character and a nonspacing character, each having a different character value. A precomposed character has a single character value for a base/nonspacing character combination. In the character è, the e is the base character, and the grave accent is the nonspacing character.

When an application specifies WC_COMPOSITECHECK, it can use the last three flags in this list (WC_DISCARDNS, WC_SEPCHARS, and WC_DEFAULTCHAR) to customize the conversion to precomposed characters. These flags determine the function's behavior when there is no precomposed mapping for a base/nonspacing character combination in a wide-character string. These three flags can be used only if the WC_COMPOSITECHECK flag is set.

The function's default behavior is to generate separate characters (WC_SEPCHARS) for unmapped composite characters.
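To make the WC_COMPOSITECHECK rules concrete, here is a small portable C model of the composition step only. The names `compose`, `compose_table`, and the `FB_*` constants are hypothetical, the three-entry table is a toy (the real function uses system tables), and treating the whole unmapped base/mark pair as one "exception" that WC_DEFAULTCHAR replaces with a single default character is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallbacks mirroring WC_SEPCHARS, WC_DISCARDNS, WC_DEFAULTCHAR. */
enum fallback { FB_SEPCHARS, FB_DISCARDNS, FB_DEFAULTCHAR };

/* Toy precomposition table; the real function uses system tables. */
static const struct { uint16_t base, mark, precomposed; } compose_table[] = {
    { 'a', 0x0300, 0x00E0 },   /* a + combining grave accent -> à */
    { 'e', 0x0300, 0x00E8 },   /* e + combining grave accent -> è */
    { 'e', 0x0301, 0x00E9 },   /* e + combining acute accent -> é */
};

/* Treat the Combining Diacritical Marks block (U+0300 to U+036F) as nonspacing. */
static int is_nonspacing(uint16_t c) { return c >= 0x0300 && c <= 0x036F; }

/* Models only the composition step on UTF-16 code units. out must have
   room for n units. Returns the number of units written. */
size_t compose(const uint16_t *in, size_t n, uint16_t *out,
               enum fallback fb, uint16_t default_char)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && is_nonspacing(in[i + 1])) {
            uint16_t base = in[i], mark = in[i + 1];
            int found = 0;
            for (size_t k = 0; k < sizeof compose_table / sizeof compose_table[0]; k++) {
                if (compose_table[k].base == base && compose_table[k].mark == mark) {
                    out[o++] = compose_table[k].precomposed;  /* precomposed form */
                    found = 1;
                    break;
                }
            }
            if (!found) {                  /* no precomposed mapping: apply fallback */
                switch (fb) {
                case FB_SEPCHARS:    out[o++] = base; out[o++] = mark; break;
                case FB_DISCARDNS:   out[o++] = base; break;          /* drop the mark */
                case FB_DEFAULTCHAR: out[o++] = default_char; break;  /* assumption */
                }
            }
            i++;   /* the mark has been consumed */
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

With any fallback, e followed by U+0300 becomes the single unit U+00E8; only pairs missing from the table are affected by the fallback choice.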

For the following code pages, dwFlags must be zero; otherwise the function fails with ERROR_INVALID_FLAGS.

50220, 50221, 50222, 50225, 50227, 50229
52936, 54936
57002 through 57011
65000 (UTF-7)
65001 (UTF-8)
42 (Symbol)

lpWideCharStr
[in] Points to the wide-character string to be converted.
cchWideChar
[in] Specifies the number of wide characters in the string pointed to by the lpWideCharStr parameter. If this value is -1, the string is assumed to be null-terminated and the length is calculated automatically; that length includes the null terminator.

If cchWideChar is zero, the function fails.
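The cchWideChar rules can be summarized by a small portable C function. `effective_input_length` is a hypothetical name used only for illustration, and treating negative values other than -1 as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the cchWideChar rules described above.
   Returns the number of WCHARs the conversion would process, or -1 to
   signal the failure case. */
int effective_input_length(const uint16_t *s, int cchWideChar)
{
    if (cchWideChar == 0 || cchWideChar < -1)
        return -1;              /* 0 fails; other negatives: an assumption */
    if (cchWideChar > 0)
        return cchWideChar;     /* used as given; a terminator counts only if inside it */
    int n = 0;                  /* cchWideChar == -1: scan to the terminator */
    while (s[n] != 0)
        n++;
    return n + 1;               /* the length includes the null terminator */
}
```

Passing 2 for the string "hi" processes only the two letters, so the converted output is not null-terminated either; passing -1 processes three units, including the terminator.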

lpMultiByteStr
[out] Points to the buffer to receive the translated string.
cbMultiByte
[in] Specifies the size, in bytes, of the buffer pointed to by the lpMultiByteStr parameter. If this value is zero, the function returns the number of bytes required for the buffer. (In this case, the lpMultiByteStr buffer is not used.)
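The size query enables the usual two-call pattern: call once with cbMultiByte set to zero to learn the required size, allocate that many bytes, then call again. So that it runs on any platform, the sketch below uses a portable stand-in, `utf16_to_utf8_model`, in place of `WideCharToMultiByte(CP_UTF8, 0, src, -1, NULL, 0, NULL, NULL)`. The stand-in and `to_utf8_alloc` are hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption about behavior rather than something this documentation states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in with the same size-query contract as
   WideCharToMultiByte(CP_UTF8, 0, src, cch, dst, cb, NULL, NULL).
   Hypothetical; it is NOT the Windows API. Returns the bytes required
   (cb == 0), the bytes written (cb > 0), or 0 on failure. */
int utf16_to_utf8_model(const uint16_t *src, int cch, char *dst, int cb)
{
    if (cch == 0 || cch < -1 || cb < 0)
        return 0;                          /* invalid arguments */
    if (cch == -1) {                       /* null-terminated: count the terminator too */
        cch = 0;
        while (src[cch] != 0)
            cch++;
        cch++;
    }
    int total = 0;
    for (int i = 0; i < cch; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < cch &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* combine a high/low surrogate pair into one code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                   /* assumption: unpaired surrogate -> U+FFFD */
        }
        unsigned char b[4];
        int len;
        if (cp < 0x80) {
            b[0] = (unsigned char)cp; len = 1;
        } else if (cp < 0x800) {
            b[0] = (unsigned char)(0xC0 | (cp >> 6));
            b[1] = (unsigned char)(0x80 | (cp & 0x3F)); len = 2;
        } else if (cp < 0x10000) {
            b[0] = (unsigned char)(0xE0 | (cp >> 12));
            b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (cp & 0x3F)); len = 3;
        } else {
            b[0] = (unsigned char)(0xF0 | (cp >> 18));
            b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (cp & 0x3F)); len = 4;
        }
        if (cb > 0) {                      /* cb == 0 means "size query only" */
            if (total + len > cb)
                return 0;                  /* the real API reports ERROR_INSUFFICIENT_BUFFER */
            memcpy(dst + total, b, (size_t)len);
        }
        total += len;
    }
    return total;
}

/* Two-call pattern: query the size, allocate, convert. Caller frees. */
char *to_utf8_alloc(const uint16_t *src)
{
    int size = utf16_to_utf8_model(src, -1, NULL, 0);      /* 1st call: size only */
    if (size == 0)
        return NULL;
    char *buf = malloc((size_t)size);
    if (buf == NULL)
        return NULL;
    if (utf16_to_utf8_model(src, -1, buf, size) != size) { /* 2nd call: convert */
        free(buf);
        return NULL;
    }
    return buf;   /* null-terminated, because cch == -1 counted the terminator */
}
```

Sizing the buffer from the first call, rather than from the WCHAR count, is what keeps the second call from overrunning it.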
lpDefaultChar
[in] Points to the character used if a wide character cannot be represented in the specified code page. If this parameter is NULL, a system default value is used. To obtain the system default character, use the GetCPInfo or GetCPInfoEx function. The function is faster when both lpDefaultChar and lpUsedDefaultChar are NULL.

For the code pages listed under dwFlags, lpDefaultChar must be NULL; otherwise the function fails with ERROR_INVALID_PARAMETER.

lpUsedDefaultChar
[out] Points to a flag that indicates whether a default character was used. The flag is set to TRUE if one or more wide characters in the source string cannot be represented in the specified code page; otherwise, it is set to FALSE. This parameter may be NULL. The function is faster when both lpDefaultChar and lpUsedDefaultChar are NULL.

For the code pages listed under dwFlags, lpUsedDefaultChar must be NULL; otherwise the function fails with ERROR_INVALID_PARAMETER.

Return Values

If the function succeeds and cbMultiByte is nonzero, the return value is the number of bytes written to the buffer pointed to by lpMultiByteStr. This count includes the byte for the null terminator when the input included one, for example when cchWideChar is -1.

If the function succeeds, and cbMultiByte is zero, the return value is the required size, in bytes, for a buffer that can receive the translated string.

If the function fails, the return value is zero. To get extended error information, call GetLastError. GetLastError may return one of the following error codes:

ERROR_INSUFFICIENT_BUFFER
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER

Remarks

Security Alert: Using the WideCharToMultiByte function incorrectly can compromise the security of your application. Calling WideCharToMultiByte can easily cause a buffer overrun because the input size (cchWideChar) is a count of WCHARs, while the output size (cbMultiByte) is a count of bytes. To avoid a buffer overrun, be sure to specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.
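One way to keep the units straight for CP_UTF8 output is to derive a byte capacity from the WCHAR count: each UTF-16 code unit produces at most 3 UTF-8 bytes (a surrogate pair produces 4 bytes for 2 units). The helper below is a hypothetical sketch that applies to CP_UTF8 only; other code pages have different worst cases, and a size query with cbMultiByte set to zero gives the exact figure.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: an upper bound, in BYTES, for a CP_UTF8 output
   buffer, derived from an input length in WCHARs. Each UTF-16 code unit
   becomes at most 3 UTF-8 bytes (a surrogate pair is 2 units -> 4 bytes).
   Returns 0 for non-positive counts or if the result would overflow. */
int utf8_capacity_for(int cchWideChar)
{
    if (cchWideChar <= 0 || cchWideChar > INT_MAX / 3)
        return 0;
    return cchWideChar * 3;
}
```

For example, the two-WCHAR string 中文 needs 6 bytes in UTF-8, so a buffer sized by the WCHAR count (2) would be too small.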

For strings that require validation, such as file, resource, and user names, always use the WC_NO_BEST_FIT_CHARS flag with WideCharToMultiByte. This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases the semantic change can be extreme; for example, the infinity symbol (∞) maps to 8 (eight) in some code pages.

WC_NO_BEST_FIT_CHARS is not available on Windows 95 and Windows NT 4.0. If your code must run on these platforms, you can achieve the same effect by round-tripping the string through MultiByteToWideChar. Any code point that does not round-trip is a best-fit character.
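The round-trip check can be sketched in portable C with a toy code page. `toy_encode`, `toy_decode`, and `first_non_round_trip` are hypothetical, and the per-character loop is a simplification: in practice you would convert the whole string with WideCharToMultiByte, convert the result back with MultiByteToWideChar, and compare. This check also flags characters that were replaced by the default character, not only best-fit ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy single-byte code page (hypothetical, for illustration only):
   ASCII maps to itself; U+221E (infinity) has a best-fit mapping to '8',
   as described above; anything else becomes the default char '?'. */
static unsigned char toy_encode(uint16_t wc)
{
    if (wc < 0x80)
        return (unsigned char)wc;
    if (wc == 0x221E)
        return '8';                  /* best fit: looks similar, means something else */
    return '?';                      /* default character */
}

static uint16_t toy_decode(unsigned char c)
{
    return c;                        /* the toy code page only produces ASCII bytes */
}

/* Returns the index of the first character that does not survive
   encode -> decode, or -1 if every character round-trips. */
int first_non_round_trip(const uint16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (toy_decode(toy_encode(s[i])) != s[i])
            return (int)i;
    return -1;
}
```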

The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns ERROR_INVALID_PARAMETER.

If CodePage is CP_SYMBOL and cbMultiByte is less than cchWideChar, no characters are written to lpMultiByteStr. Otherwise, if cbMultiByte is less than cchWideChar, cbMultiByte characters are copied to the buffer pointed to by lpMultiByteStr.

An application can use the lpDefaultChar parameter to change the default character used for the conversion.

As noted earlier, the WideCharToMultiByte function operates most efficiently when both lpDefaultChar and lpUsedDefaultChar are NULL. The following table shows the behavior of WideCharToMultiByte for the four combinations of lpDefaultChar and lpUsedDefaultChar.

lpDefaultChar   lpUsedDefaultChar   Result
NULL            NULL                No default checking. This is the most efficient way to use this function.
non-NULL        NULL                Uses the specified default character, but does not set lpUsedDefaultChar.
NULL            non-NULL            Uses the system default character and sets lpUsedDefaultChar if necessary.
non-NULL        non-NULL            Uses the specified default character and sets lpUsedDefaultChar if necessary.
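The four rows of this table can be modeled with a toy conversion to 7-bit ASCII. `toy_to_ascii` and `TOY_SYSTEM_DEFAULT` are hypothetical names; the real system default character comes from GetCPInfo or GetCPInfoEx, not from a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SYSTEM_DEFAULT '?'   /* assumption: the toy code page's default char */

/* Toy conversion to 7-bit ASCII (hypothetical, not the Windows API) that
   mimics the four lpDefaultChar / lpUsedDefaultChar combinations.
   Writes exactly n bytes to out. */
void toy_to_ascii(const uint16_t *in, size_t n, char *out,
                  const char *lpDefaultChar, int *lpUsedDefaultChar)
{
    /* NULL lpDefaultChar: fall back to the system default character */
    char dflt = lpDefaultChar ? *lpDefaultChar : TOY_SYSTEM_DEFAULT;
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[i] = (char)in[i];
        } else {
            out[i] = dflt;           /* unmappable: substitute the default char */
            used = 1;
        }
    }
    if (lpUsedDefaultChar != NULL)   /* reported only when the caller asks */
        *lpUsedDefaultChar = used;
}
```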

Windows 95/98/Me: WideCharToMultiByte is supported by the Microsoft Layer for Unicode. To use this, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.

Example Code

For an example, see Looking Up a User's Full Name.

Copyright © 2024 少林