Locales in libc
1 Locales
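Internationalizing software means making it follow the user's conventions. ISO C does this through locales.
A machine can support several locales, and the user chooses which one a program uses through environment variables.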
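1.1 What Locales Do
Each locale consists of a set of specifications defined for different purposes. They cover:
- Which multibyte character sequences are valid, and how they are interpreted.
- How characters are classified.
- The collating sequence for the local language and character set.
- How numbers are formatted for display.
- Which language is used for output and error messages.
- Which language is used for answers to yes-or-no questions.
- Which language is used for more complex user input.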
1.2 Choosing a Locale
The simplest way to choose (set) a locale is to set the environment variable LANG, which selects every specification of that locale. For example:
[yyc@localhost ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
We can also set one specification of a locale on its own. For example, early versions of fcitx (a Chinese input method for Linux) required LC_CTYPE to use GB2312, which can be set like this:
[yyc@localhost ~]$ export LC_CTYPE="zh_CN.GB2312"
[yyc@localhost ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE=zh_CN.GB2312
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
A system need not support every locale, but every system must support one standard locale: "C", also called "POSIX".
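1.3 Categories of Activities that Locales Affect
The specifications a locale defines fall into the categories listed below. Each category name can be used both as an environment variable and as a macro passed to the setlocale function.
- LC_COLLATE
Affects string collation (sort order).
- LC_CTYPE
Affects character classification, and the conversion of multibyte and wide characters.
- LC_MONETARY
Affects the formatting of monetary values.
- LC_NUMERIC
Affects the formatting of numbers.
- LC_TIME
Affects the formatting of dates and times.
- LC_MESSAGES
Affects the language of messages in the user interface and the regular expressions used to match answers to yes-or-no questions.
- LC_ALL
Not a category of its own. As a macro passed to setlocale(), it sets all of the categories above; as an environment variable, it overrides LANG and the other LC_* variables.
- LANG
If this environment variable is set, its value applies to every category above, unless the user has explicitly set the variable for one of those categories.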
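1.4 Setting the Locale
A C program automatically inherits the locale environment variables when it starts. Those variables do not, however, automatically control the locale used by library functions: ISO C specifies that every program starts in the standard "C" locale.
To tell the library functions to use the locale named by the environment variables, call setlocale():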
setlocale(LC_ALL, "");
setlocale() can also set a single category of the locale:
char * setlocale (int CATEGORY, const char *LOCALE);
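This function sets the category CATEGORY of the current locale to LOCALE.
- If LOCALE is NULL, it returns the name of the locale currently in use for CATEGORY and changes nothing;
- If LOCALE is not NULL and is valid, it returns the name of the locale in use after the change;
- If LOCALE is not NULL and is not valid, the current locale is left unchanged and the function returns NULL.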
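1.5 Standard Locales
As noted earlier, not every system supports every locale, but every system must support a few standard ones:
- C:
The locale specified by standard C; its attributes and behavior follow the ISO C standard.
- POSIX:
The POSIX locale. On Linux it is currently identical to C.
- "":
The empty name. A program that uses it automatically picks up the locale specified by the environment variables.
Locales are usually defined and installed by the system administrator.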
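1.6 Getting Locale Information
There are several ways to get locale information. The simplest is to let the C library fetch it itself, which many library functions do. Take strftime() as an example: the same code produces different output depending on the locale.
Sometimes, though, the library cannot do this for us and we have to fetch the information ourselves. Two functions serve this purpose: localeconv() and nl_langinfo(). The former is part of standard C and is portable, but its interface is awful. The latter is a Unix interface, available on any system that conforms to the Unix standard.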
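1.6.1 The Clumsy localeconv
Like setlocale(), localeconv() is provided by standard C and is portable, but it is expensive to use and hard to extend. It also gives access only to the LC_MONETARY and LC_NUMERIC categories of the locale, so it is not very general.
The prototype of localeconv() is: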
struct lconv * localeconv (void);
The function returns a pointer to a struct lconv whose members describe how to format numbers and monetary values in the current locale. In glibc it is defined as follows:
/* Structure giving information about numeric and monetary notation.  */
struct lconv
{
  /* Numeric (non-monetary) information.  */

  char *decimal_point;        /* Decimal point character.  */
  char *thousands_sep;        /* Thousands separator.  */
  /* Each element is the number of digits in each group;
     elements with higher indices are farther left.
     An element with value CHAR_MAX means that no further grouping is done.
     An element with value 0 means that the previous element is used
     for all groups farther left.  */
  char *grouping;

  /* Monetary information.  */

  /* First three chars are a currency symbol from ISO 4217.
     Fourth char is the separator.  Fifth char is '\0'.  */
  char *int_curr_symbol;
  char *currency_symbol;      /* Local currency symbol.  */
  char *mon_decimal_point;    /* Decimal point character.  */
  char *mon_thousands_sep;    /* Thousands separator.  */
  char *mon_grouping;         /* Like `grouping' element (above).  */
  char *positive_sign;        /* Sign for positive values.  */
  char *negative_sign;        /* Sign for negative values.  */
  char int_frac_digits;       /* Int'l fractional digits.  */
  char frac_digits;           /* Local fractional digits.  */
  /* 1 if currency_symbol precedes a positive value, 0 if succeeds.  */
  char p_cs_precedes;
  /* 1 iff a space separates currency_symbol from a positive value.  */
  char p_sep_by_space;
  /* 1 if currency_symbol precedes a negative value, 0 if succeeds.  */
  char n_cs_precedes;
  /* 1 iff a space separates currency_symbol from a negative value.  */
  char n_sep_by_space;
  /* Positive and negative sign positions:
     0 Parentheses surround the quantity and currency_symbol.
     1 The sign string precedes the quantity and currency_symbol.
     2 The sign string follows the quantity and currency_symbol.
     3 The sign string immediately precedes the currency_symbol.
     4 The sign string immediately follows the currency_symbol.  */
  char p_sign_posn;
  char n_sign_posn;
#ifdef __USE_ISOC99
  /* 1 if int_curr_symbol precedes a positive value, 0 if succeeds.  */
  char int_p_cs_precedes;
  /* 1 iff a space separates int_curr_symbol from a positive value.  */
  char int_p_sep_by_space;
  /* 1 if int_curr_symbol precedes a negative value, 0 if succeeds.  */
  char int_n_cs_precedes;
  /* 1 iff a space separates int_curr_symbol from a negative value.  */
  char int_n_sep_by_space;
  /* Positive and negative sign positions:
     0 Parentheses surround the quantity and int_curr_symbol.
     1 The sign string precedes the quantity and int_curr_symbol.
     2 The sign string follows the quantity and int_curr_symbol.
     3 The sign string immediately precedes the int_curr_symbol.
     4 The sign string immediately follows the int_curr_symbol.  */
  char int_p_sign_posn;
  char int_n_sign_posn;
#else
  char __int_p_cs_precedes;
  char __int_p_sep_by_space;
  char __int_n_cs_precedes;
  char __int_n_sep_by_space;
  char __int_p_sign_posn;
  char __int_n_sign_posn;
#endif
};
For the meaning of each member, see the comments in the definition.
1.6.2 The Elegant, Fast nl_langinfo
char *nl_langinfo (nl_item ITEM);
nl_langinfo() gives fine-grained, fast access to the details of a locale. The possible values of ITEM are defined in the header langinfo.h and are explained below:
`CODESET'
`nl_langinfo' returns a string with the name of the coded
character set used in the selected locale.
`ABDAY_1'
`ABDAY_2'
`ABDAY_3'
`ABDAY_4'
`ABDAY_5'
`ABDAY_6'
`ABDAY_7'
`nl_langinfo' returns the abbreviated weekday name. `ABDAY_1'
corresponds to Sunday.
`DAY_1'
`DAY_2'
`DAY_3'
`DAY_4'
`DAY_5'
`DAY_6'
`DAY_7'
Similar to `ABDAY_1' etc., but here the return value is the
unabbreviated weekday name.
`ABMON_1'
`ABMON_2'
`ABMON_3'
`ABMON_4'
`ABMON_5'
`ABMON_6'
`ABMON_7'
`ABMON_8'
`ABMON_9'
`ABMON_10'
`ABMON_11'
`ABMON_12'
The return value is abbreviated name of the month. `ABMON_1'
corresponds to January.
`MON_1'
`MON_2'
`MON_3'
`MON_4'
`MON_5'
`MON_6'
`MON_7'
`MON_8'
`MON_9'
`MON_10'
`MON_11'
`MON_12'
Similar to `ABMON_1' etc., but here the month names are not
abbreviated. Here the first value `MON_1' also corresponds
to January.
`AM_STR'
`PM_STR'
The return values are strings which can be used in the
representation of time as an hour from 1 to 12 plus an am/pm
specifier.
Note that in locales which do not use this time representation
these strings might be empty, in which case the am/pm format
cannot be used at all.
`D_T_FMT'
The return value can be used as a format string for
`strftime' to represent time and date in a locale-specific
way.
`D_FMT'
The return value can be used as a format string for
`strftime' to represent a date in a locale-specific way.
`T_FMT'
The return value can be used as a format string for
`strftime' to represent time in a locale-specific way.
`T_FMT_AMPM'
The return value can be used as a format string for
`strftime' to represent time in the am/pm format.
Note that if the am/pm format does not make any sense for the
selected locale, the return value might be the same as the
one for `T_FMT'.
`ERA'
The return value represents the era used in the current
locale.
Most locales do not define this value. An example of a
locale which does define this value is the Japanese one. In
Japan, the traditional representation of dates includes the
name of the era corresponding to the then-emperor's reign.
Normally it should not be necessary to use this value
directly. Specifying the `E' modifier in their format
strings causes the `strftime' functions to use this
information. The format of the returned string is not
specified, and therefore you should not assume knowledge of
it on different systems.
`ERA_YEAR'
The return value gives the year in the relevant era of the
locale. As for `ERA' it should not be necessary to use this
value directly.
`ERA_D_T_FMT'
This return value can be used as a format string for
`strftime' to represent dates and times in a locale-specific
era-based way.
`ERA_D_FMT'
This return value can be used as a format string for
`strftime' to represent a date in a locale-specific era-based
way.
`ERA_T_FMT'
This return value can be used as a format string for
`strftime' to represent time in a locale-specific era-based
way.
`ALT_DIGITS'
The return value is a representation of up to 100 values used
to represent the values 0 to 99. As for `ERA' this value is
not intended to be used directly, but instead indirectly
through the `strftime' function. When the modifier `O' is
used in a format which would otherwise use numerals to
represent hours, minutes, seconds, weekdays, months, or
weeks, the appropriate value for the locale is used instead.
`INT_CURR_SYMBOL'
The same as the value returned by `localeconv' in the
`int_curr_symbol' element of the `struct lconv'.
`CURRENCY_SYMBOL'
`CRNCYSTR'
The same as the value returned by `localeconv' in the
`currency_symbol' element of the `struct lconv'.
`CRNCYSTR' is a deprecated alias still required by Unix98.
`MON_DECIMAL_POINT'
The same as the value returned by `localeconv' in the
`mon_decimal_point' element of the `struct lconv'.
`MON_THOUSANDS_SEP'
The same as the value returned by `localeconv' in the
`mon_thousands_sep' element of the `struct lconv'.
`MON_GROUPING'
The same as the value returned by `localeconv' in the
`mon_grouping' element of the `struct lconv'.
`POSITIVE_SIGN'
The same as the value returned by `localeconv' in the
`positive_sign' element of the `struct lconv'.
`NEGATIVE_SIGN'
The same as the value returned by `localeconv' in the
`negative_sign' element of the `struct lconv'.
`INT_FRAC_DIGITS'
The same as the value returned by `localeconv' in the
`int_frac_digits' element of the `struct lconv'.
`FRAC_DIGITS'
The same as the value returned by `localeconv' in the
`frac_digits' element of the `struct lconv'.
`P_CS_PRECEDES'
The same as the value returned by `localeconv' in the
`p_cs_precedes' element of the `struct lconv'.
`P_SEP_BY_SPACE'
The same as the value returned by `localeconv' in the
`p_sep_by_space' element of the `struct lconv'.
`N_CS_PRECEDES'
The same as the value returned by `localeconv' in the
`n_cs_precedes' element of the `struct lconv'.
`N_SEP_BY_SPACE'
The same as the value returned by `localeconv' in the
`n_sep_by_space' element of the `struct lconv'.
`P_SIGN_POSN'
The same as the value returned by `localeconv' in the
`p_sign_posn' element of the `struct lconv'.
`N_SIGN_POSN'
The same as the value returned by `localeconv' in the
`n_sign_posn' element of the `struct lconv'.
`INT_P_CS_PRECEDES'
The same as the value returned by `localeconv' in the
`int_p_cs_precedes' element of the `struct lconv'.
`INT_P_SEP_BY_SPACE'
The same as the value returned by `localeconv' in the
`int_p_sep_by_space' element of the `struct lconv'.
`INT_N_CS_PRECEDES'
The same as the value returned by `localeconv' in the
`int_n_cs_precedes' element of the `struct lconv'.
`INT_N_SEP_BY_SPACE'
The same as the value returned by `localeconv' in the
`int_n_sep_by_space' element of the `struct lconv'.
`INT_P_SIGN_POSN'
The same as the value returned by `localeconv' in the
`int_p_sign_posn' element of the `struct lconv'.
`INT_N_SIGN_POSN'
The same as the value returned by `localeconv' in the
`int_n_sign_posn' element of the `struct lconv'.
`DECIMAL_POINT'
`RADIXCHAR'
The same as the value returned by `localeconv' in the
`decimal_point' element of the `struct lconv'.
The name `RADIXCHAR' is a deprecated alias still used in
Unix98.
`THOUSANDS_SEP'
`THOUSEP'
The same as the value returned by `localeconv' in the
`thousands_sep' element of the `struct lconv'.
The name `THOUSEP' is a deprecated alias still used in Unix98.
`GROUPING'
The same as the value returned by `localeconv' in the
`grouping' element of the `struct lconv'.
`YESEXPR'
The return value is a regular expression which can be used
with the `regcomp' and `regexec' functions to recognize a
positive response to a yes/no question. The GNU C library
provides the `rpmatch' function for easier handling in applications.
`NOEXPR'
The return value is a regular expression which can be used
with the `regcomp' and `regexec' functions to recognize a
negative response to a yes/no question.
`YESSTR'
The return value is a locale-specific translation of the
positive response to a yes/no question.
Using this value is deprecated since it is a very special
case of message translation, and is better handled by the
message translation functions.
`NOSTR'
The return value is a locale-specific translation of the
negative response to a yes/no question. What is said for
`YESSTR' is also true here: the symbol is deprecated, and
message translation should be used instead.
