
The Nature of Lisp [4]

Posted on 2011-08-18 09:36 by 雪庭

C Macros Reloaded

By now you must be tired of all the XML talk. I'm tired of it as well. It's time
to take a break from all the trees, s-expressions, and Ant business. Instead,
let's go back to every programmer's roots. It's time to talk about the C
preprocessor. What's C got to do with anything, I hear you ask? Well, we now
know enough to get into metaprogramming and discuss code that writes other
code. Understanding this tends to be hard since all tutorials discuss it in
terms of languages that you don't know. But there is nothing hard about the
concept. I believe that a metaprogramming discussion based on C will make the
whole thing much easier to understand. So, let's see (pun intended).

Why would anyone want to write a program that writes programs? How can we use
something like this in the real world? What on Earth is metaprogramming, anyway?
You already know all the answers, you just don't know it yet. In order to unlock
the hidden vault of divine knowledge let's consider a rather mundane task of
simple database access from code. We've all been there. Writing SQL queries all
over the code to modify data within tables turns into repetitive hell soon
enough. Even with the new C# 3.0 LINQ stuff this is a huge pain. Writing a full
SQL query (albeit with a nice built-in syntax) to get someone's name or to
modify someone's address isn't exactly a programmer's idea of comfort. What do
we do to solve these problems? Enter data access layers.
 
The idea is simple enough. You abstract database access (at least trivial
queries, anyway) by creating a set of classes that mirror the tables in the
database and use accessor methods to execute actual queries. This simplifies
development tremendously - instead of writing SQL queries we make simple method
calls (or property assignments, depending on your language of choice). Anyone
who has ever used even the simplest of data access layers knows how much time it
can save. Of course anyone who has ever written one knows how much time it can
kill - writing a set of classes that mirror tables and convert accessors to SQL
queries takes a considerable chunk of time. This seems especially silly since
most of the work is manual: once you figure out the design and develop a
template for your typical data access class you don't need to do any
thinking. You just write code based on the same template over and over and over
and over again. Many people figured out that there is a better way - there are
plenty of tools that connect to the database, grab the schema, and write code
for you based on a predefined (or a custom) template.
 
Anyone who has ever used such a tool knows what an amazing time saver it can
be. In a few clicks you connect the tool to the database, get it to generate the
data access layer source code, add the files to your project and voilà - ten
minutes' worth of work does a better job than the hundreds of man-hours that were
required previously. What happens if your database schema changes? Well, you
just have to go through this short process again. Of course some of the best
tools let you automate this - you simply add them as a part of your build step
and every time you compile your project everything is done for you
automatically. This is perfect! You barely have to do anything at all. If the
schema ever changes your data access layer code updates automatically at compile
time and any obsolete access in your code will result in compiler errors!
 
Data access layers are one good example, but there are plenty of others. From
boilerplate GUI code, to web code, to COM and CORBA stubs, to MFC and ATL -
there are plenty of examples where the same code is written over and over
again. Since writing this code is a task that can be automated completely and a
programmer's time is far more expensive than CPU time, plenty of tools have been
created that generate this boilerplate code automatically. What are these tools,
exactly? Well, they are programs that write programs. They perform a simple task
that goes by the mysterious name of metaprogramming. That's all there is to it.
 
We could create and use such tools in millions of scenarios but more often than
not we don't. What it boils down to is a subconscious calculation - is it worth
it for me to create a separate project, write a whole tool to generate
something, and then use it, if I only have to write these very similar pieces
about seven times? Of course not. Data access layers and COM stubs are written
hundreds, thousands of times. This is why there are tools for them. For similar
pieces of code that repeat only a few times, or even a few dozen times, writing
code generation tools isn't even considered. The trouble of creating such a tool
more often than not far outweighs the benefit of using one. If only creating
such tools were much easier, we could use them more often, and perhaps save many
hours of our time. Let's see if we can accomplish this in a reasonable manner.
 
Surprisingly, the C preprocessor comes to the rescue. We've all used it in C and
C++. On occasion we all wish Java had it. We use it to execute simple
instructions at compile time to make small changes to our code (like selectively
removing debug statements). Let's look at a quick example:
 
#define triple(X) X+X+X

What does this line do? It's a simple instruction written in the preprocessor
language that instructs it to replace all instances of triple(X) with X + X +
X. For example all instances of 'triple(5)' will be replaced with '5 + 5 + 5'
and the resulting code will be compiled by the C compiler. We're really doing a
very primitive version of code generation here. If only the C preprocessor were a
little more powerful and included ways to connect to the database and a few more
simple constructs, we could use it to develop our data access layer right there,
from within our program! Consider the following example that uses an imaginary
extension of the C preprocessor:
 
#get-db-schema("127.0.0.1, un, pwd");
#iterate-through-tables
#for-each-table
    class #table-name
    {
    };
#end-for-each

We've just connected to the database schema, iterated through all the tables,
and created an empty class for each. All in a couple of lines right within our
source code! Now every time we recompile the file where the above code appears we'll
get a freshly built set of classes that automatically update based on the
schema. With a little imagination you can see how we could build a full data
access layer straight from within our program, without the use of any external
tools! Of course this has a certain disadvantage (aside from the fact that such
an advanced version of the C preprocessor doesn't exist) - we'd have to learn a
whole new "compile-time language" to do this sort of work. For complex code
generation this language would have to be very complex as well: it would have to
support many libraries and language constructs. For example, if our generated
code depended on some file located at some ftp server the preprocessor would
have to be able to connect to ftp. It's a shame to create and learn a new
language just to do this. Especially since there are so many nice languages
already out there. Of course if we add a little creativity we can easily avoid
this pitfall.
 
Let's be a little more flexible: why not use C/C++ itself as its own
preprocessing language? That way we get the full power of the language, and the
only new things to learn are a few simple directives that distinguish
compile-time code from run-time code.
 
<%
    int n;
    cout << "Enter a number: ";
    cin >> n;
%>
for(int i = 0; i < <%= n %>; i++)
{
    cout << "hello" << endl;
}

Can you see what happens here? Everything that's between <% and %> tags runs
when the program is compiled. Anything outside of these tags is normal code. In
the example above you'd start compiling your program in the development
environment. The code between the tags would be compiled and then run. You'd get
a prompt to enter a number. You'd enter one and it would be placed inside the
for loop. The for loop would then be compiled as usual and you'd be able to
execute it. For example, if you entered 5 during the compilation of your
program, the resulting code would look like this:
 
for(int i = 0; i < 5; i++)
{
    cout << "hello" << endl;
}
 
Simple and effective. No need for a special preprocessor language. We get the full
power of our host language (in this case C/C++) at compile time. We could easily
connect to a database and generate our data access layer source code at compile
time in the same way JSP or ASP generate HTML! Creating such tools would also be
tremendously quick and simple. We'd never have to create new projects with
specialized GUIs. We could inline our tools right into our programs. We wouldn't
have to worry about whether writing such tools is worth it because writing them
would be so fast - we could save tremendous amounts of time by creating simple
bits of code that do mundane code generation for us!
