[C] wide char in C, how to use it

In C, a plain `char` is one byte (8 bits on virtually every platform), which is too small to hold a character such as the em dash (—, U+2014) or an emoji in a single unit. To handle these, you use either wide characters (`wchar_t`) or multibyte strings (typically UTF-8 stored in ordinary `char` arrays).

Here is how you implement them.

1. The `wchar_t` Approach (Fixed Width)

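If you want to treat the em dash as a single "unit" in C, use the `wchar_t` type and the `L` prefix for literals. Note that the width is fixed only per platform: `wchar_t` is typically 32 bits on Linux and macOS but 16 bits on Windows, where the em dash still fits in one unit while most emoji need two (a UTF-16 surrogate pair).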

C 

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    // 1. Critical: switch from the startup "C" locale to the user's locale (usually UTF-8)
    setlocale(LC_ALL, "");

    // 2. Define a wide character using the L prefix.
    //    \u2014 is the em dash; writing L'—' directly also works
    //    if the compiler reads the source file as UTF-8.
    wchar_t em_dash = L'\u2014';

    // 3. Print using wprintf and the %lc (wide char) specifier
    wprintf(L"The wide dash looks like this: %lc\n", em_dash);

    return 0;
}

2. The UTF-8 Approach (Modern Standard)

In modern development, we usually keep strings as standard `char` arrays but encode them as UTF-8. In this case, the em dash (U+2014) takes up 3 bytes (`0xE2 0x80 0x94`), so byte-oriented functions such as `strlen` count it as three `char`s.

C 

#include <stdio.h>

int main(void) {
    // A standard string can hold UTF-8, but it treats the dash as 3 "chars".
    // This assumes the source file itself is saved as UTF-8.
    const char *text = "Interrupting—like this.";

    printf("%s\n", text);
    return 0;
}

Key Differences to Remember

| Feature | `char` (UTF-8) | `wchar_t` |
| --- | --- | --- |
| Size | 1 byte per code unit | 2 or 4 bytes (platform dependent) |
| Storage | Uses 3 bytes for an em dash | Uses 1 "slot" for an em dash |
| Best for | Saving to files, web/network APIs | Internal text processing/indexing |
| Print function | `printf("%s")` | `wprintf(L"%ls")` |

Why `setlocale` is Mandatory

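By default, a C program starts in the "C" locale, which only guarantees the basic ASCII-range character set. If you don't call `setlocale(LC_ALL, "");`, the wide-character output functions don't know how to convert the wide characters in memory into the multibyte encoding your terminal expects, which usually results in a `?` or nothing at all appearing on screen. This applies to the `wchar_t` route; printing a UTF-8 `char` string with `printf` passes the bytes through unchanged and works without it.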

Are you planning to process these characters (like counting how many dashes are in a file), or do you just need to print them to the screen?

posted @ 2026-01-29 03:05 profesor
