[C] Wide characters in C and how to use them
In C, a plain char is a single byte (8 bits on virtually every platform), which is too small to hold characters outside ASCII, such as the em dash (—) or emoji, in one unit. To handle these, you use either wide characters (wchar_t) or multibyte strings encoded as UTF-8.
Here is how you implement them.
1. The wchar_t Approach (Fixed Width)
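If you want to treat the em dash as a single "unit" in C, use the wchar_t type and the L prefix for literals. The em dash (U+2014) fits in one wchar_t on every common platform. Where wchar_t is only 16 bits, as on Windows, characters outside the Basic Multilingual Plane (most emoji, for example) still need two units.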

    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int main(void) {
        // 1. Critical: set the locale to the user's default (usually UTF-8)
        setlocale(LC_ALL, "");

        // 2. Define a wide character using the L prefix
        //    (L'\u2014' names the same character without depending on the source file's encoding)
        wchar_t em_dash = L'—';

        // 3. Print using wprintf and the %lc (wide char) specifier
        wprintf(L"The wide dash looks like this: %lc\n", em_dash);
        return 0;
    }

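Once text is stored as wchar_t, each element can be compared directly against a wide literal. The following is a minimal sketch of that idea; `count_wide_char` is a hypothetical helper name, not a standard function, and it assumes the character of interest fits in one wchar_t (true for the em dash, not for emoji where wchar_t is 16 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts how many elements of the wide string s
 * are equal to target. */
size_t count_wide_char(const wchar_t *s, wchar_t target) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != L'\0'; s++) {
        if (*s == target) {
            count++;
        }
    }
    return count;
}
```

For example, `count_wide_char(L"a\u2014b\u2014c", L'\u2014')` returns 2, and `wcslen(L"a\u2014b")` reports 3 elements, because the dash occupies exactly one of them.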
2. The UTF-8 Approach (Modern Standard)
In modern development, we usually keep strings as standard char arrays but encode them as UTF-8. In this case, the em dash actually takes up 3 bytes (0xE2, 0x80, 0x94).

    #include <stdio.h>

    int main(void) {
        // A standard string can hold UTF-8, but it treats the dash as 3 "chars".
        // String literals must not be modified, so the pointer is const.
        const char *text = "Interrupting—like this.";
        printf("%s\n", text);
        return 0;
    }

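Because the dash occupies 3 chars, strlen on the string above reports 25 bytes even though a reader sees 23 characters. One common way to count characters instead is to skip UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. The sketch below assumes the input is already valid UTF-8 and performs no validation; `utf8_code_points` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a UTF-8 string by counting
 * every byte that is NOT a continuation byte (10xxxxxx).
 * Assumes s is valid UTF-8; malformed input gives a meaningless count. */
size_t utf8_code_points(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

Writing the dash as the escape sequence `"\xE2\x80\x94"` keeps the example independent of the source file's encoding: `utf8_code_points("Interrupting" "\xE2\x80\x94" "like this.")` returns 23. Note that this counts code points, which is not always the same as what a user perceives as one character (combining marks, for instance).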
Key Differences to Remember
| Feature | char (UTF-8) | wchar_t |
| --- | --- | --- |
| Size | 1 byte | 2 or 4 bytes (platform dependent) |
| Storage | Uses 3 bytes for an em dash | Uses 1 "slot" for an em dash |
| Best For | Saving to files, web/network APIs | Internal text processing/indexing |
| Print Function | printf("%s") | wprintf(L"%ls") |
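To make the "3 bytes versus 1 slot" row concrete, here is a rough sketch of how one UTF-8 sequence collapses into a single code point. `utf8_decode_one` is a hypothetical, hand-rolled helper for illustration only: it checks the lead byte and continuation bytes but does not reject overlong encodings or surrogates. In real code the standard mbrtowc function (with a UTF-8 locale set) or a dedicated Unicode library is usually the better choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at s.
 * Stores the code point in *out and returns the number of bytes consumed,
 * or 0 if the lead byte or a continuation byte is malformed.
 * A terminating '\0' decodes as code point 0 with length 1. */
size_t utf8_decode_one(const char *s, uint32_t *out) {
    assert(s != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    uint32_t cp;

    if (p[0] < 0x80)                { len = 1; cp = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            return 0;  /* not a continuation byte (also catches an early '\0') */
        }
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *out = cp;
    return len;
}
```

Decoding the three bytes `"\xE2\x80\x94"` consumes 3 bytes and yields the single value 0x2014, which is exactly what a wchar_t holding the em dash contains.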
Why setlocale is Mandatory
By default, a C program starts in the "C" locale, which is only guaranteed to handle basic ASCII. If you don't call setlocale(LC_ALL, ""), the wide-character output functions cannot convert a wchar_t such as the em dash into the byte sequence your terminal expects. The usual result is a ? on screen, or no output at all because wprintf fails and returns a negative value.
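A program can detect this situation instead of printing blindly. The sketch below uses the standard wcrtomb function, which returns (size_t)-1 when a wide character has no representation in the current locale's encoding. `use_user_locale` and `can_encode` are hypothetical helper names, and the exact behavior of the "C" locale for non-ASCII input varies somewhat between C libraries, so treat this as an illustration rather than a guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switches to the user's default locale.
 * Returns 1 on success, 0 if the environment names an unavailable locale. */
int use_user_locale(void) {
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: returns 1 if wc can be converted to a multibyte
 * sequence in the current locale, 0 otherwise. */
int can_encode(wchar_t wc) {
    char buf[MB_LEN_MAX];
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* start from the initial conversion state */
    return wcrtomb(buf, wc, &state) != (size_t)-1;
}
```

A typical pattern would be to call `use_user_locale()` once at startup and then check `can_encode(L'\u2014')` before printing, falling back to an ASCII substitute such as "--" when it returns 0.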
Are you planning to process these characters (like counting how many dashes are in a file), or do you just need to print them to the screen?
