A note for veteran programmers

I noticed that getwc(fp) is quite slow. Here is what the documentation says:

The getwc() function or macro functions identically to fgetwc(). It may be implemented as a macro, and may evaluate its argument more than once. There is no reason ever to use it.

Fair enough. Character conversion is fairly involved, so it's easy to see why it can't be done as a macro.

But didn't the older books say getc is a macro?

getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.

I tried it with the following program:

$ cat t.cpp
#include <stdio.h>
int xxx = getc(stdin);
int*	ppp = NULL;

$ gcc -E t.cpp
# 2 "t.cpp"
int xxx = getc(
# 2 "t.cpp" 3 4
              stdin
# 2 "t.cpp"
                   );
int* ppp = 
# 3 "t.cpp" 3 4
          __null
# 3 "t.cpp"
              ;

So getc isn't a macro either, at least not in this toolchain's headers.

Processing a few GB of corpus text means calling getc billions of times?!

The fgetws() function is the wide-character equivalent of the fgets(3) function.

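I think anyone doing NLP should basically convert their corpora to UTF-16 up front: pay the conversion cost once, and benefit every time after.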

But the rare characters in GB18030 can't be represented in a single 16-bit unit.

posted @ 2025-11-11 20:18  华容道专家