Attention, veteran programmers
I found that getwc(fp) is very slow, so I checked the documentation:
The getwc() function or macro functions identically to fgetwc(). It may be implemented as a macro, and may evaluate its argument more than once. There is no reason ever to use it.
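Fair enough: character conversion is fairly involved, so it is understandable that getwc cannot be implemented as a macro.
But didn't the older books say that getc is a macro?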
getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.
I tested it with the following program:
~$ cat t.cpp
#include <stdio.h>
int xxx = getc(stdin);
int* ppp = NULL;
$ gcc -E t.cpp
# 2 "t.cpp"
int xxx = getc(
# 2 "t.cpp" 3 4
stdin
# 2 "t.cpp"
);
int* ppp =
# 3 "t.cpp" 3 4
__null
# 3 "t.cpp"
;
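The preprocessor left getc(stdin) unexpanded, so getc is not a macro here either (at least with this compiler and C library).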
So processing a few gigabytes of corpus means calling getc billions of times?!
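One way to avoid a library call per character is to read large blocks with fread() and scan them in memory. The following is only a minimal sketch: the helper name count_byte_chunked and the 64 KiB buffer size are made up for illustration, and it assumes the per-call overhead of getc (for example stream locking) is what dominates, which has not been measured here. On POSIX systems getc_unlocked() is another option worth trying.

```c
#include <stdio.h>

/* Hypothetical helper: count how many times `target` occurs in the stream,
 * reading 64 KiB blocks instead of calling getc() once per byte. */
size_t count_byte_chunked(FILE *fp, unsigned char target)
{
    unsigned char buf[1 << 16];   /* block size is an arbitrary choice */
    size_t n, count = 0;

    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == target)
                count++;
        }
    }
    return count;
}
```

For example, count_byte_chunked(fp, '\n') counts the lines of a corpus file with one library call per 64 KiB rather than one per byte.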
The fgetws() function is the wide-character equivalent of the fgets(3) function.
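For wide-character input, reading a line at a time with fgetws() likewise cuts the number of library calls compared with getwc(). This is a rough sketch only: the helper name count_wchars_by_line and the buffer length are assumptions, and whether it is actually faster on a given C library would need to be measured.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters in a stream, fetching up to
 * one line (or one buffer-full of a long line) per fgetws() call. */
size_t count_wchars_by_line(FILE *fp)
{
    wchar_t buf[4096];            /* buffer length is an arbitrary choice */
    size_t total = 0;

    /* fgetws stores at most n-1 wide characters plus a terminating L'\0'. */
    while (fgetws(buf, (int)(sizeof buf / sizeof buf[0]), fp) != NULL)
        total += wcslen(buf);

    return total;
}
```

For non-ASCII corpora the caller needs to call setlocale(LC_ALL, "") (or select a suitable locale) before reading, so that the stream is decoded with the right multibyte encoding.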
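I think people working on NLP should basically convert all of their corpora to UTF-16: the effort is paid once and the benefit comes every time.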
However, the rare characters in GB18030 do not fit in a single 16-bit unit; in UTF-16 each of them needs a surrogate pair.
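To make the 16-bit limit concrete, here is a minimal sketch of how one Unicode code point maps to UTF-16 code units. The function name utf16_encode is made up for illustration; the arithmetic follows the standard surrogate-pair scheme.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if `cp`
 * is not a valid scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                 /* out of range, or a lone surrogate */

    if (cp < 0x10000) {           /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A common character such as U+4E2D takes one unit, while a rare CJK Extension B character such as U+20000 takes two (0xD840, 0xDC00), so code that assumes one 16-bit unit per character mishandles it.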
