XML Reloaded
重新审视XML
A thousand mile journey starts with a single step. A journey to enlightenment is no exception and our first step just happens to be XML. What more could possibly be said about XML that hasn't already been said? It turns out, quite a bit. While there's nothing particularly interesting about XML itself, its relationship to Lisp is fascinating. XML is the all too familiar concept that Lisp advocates need so much. It is our bridge to conveying understanding to regular programmers. So let's revive the dead horse, take out the stick, and venture into XML wilderness that no one dared venture into before us. It's time to see the all too familiar moon from the other side.
千里之行始于足下。让我们的第一步从XML开始。可是XML已经说得更多的了, 还能有什么新意思可说呢? 有的。XML自身虽然谈谈不上有趣, 但是XML和Lisp的关系却相当有趣。XML和Lisp的概念有着惊人的相似之处。XML是我们通向理解Lisp的桥梁。好吧, 我们且把XML当作活马医。让我们拿好手杖, 对XML的无人涉及的荒原地带作一番探险。我们要从一个全新的视角来考察这个题目。
Superficially XML is nothing more than a standardized syntax used to express arbitrary hierarchical data in human readable form. To-do lists, web pages, medical records, auto insurance claims, configuration files are all examples of potential XML use. Let's use a simple to-do list as an example (in a couple of sections you'll see it in a whole new light):
<todo name="housework">
<item priority="high">Clean the house.</item>
<item priority="medium">Wash the dishes.</item>
<item priority="medium">Buy more soap.</item>
</todo>
<item priority="high">Clean the house.</item>
<item priority="medium">Wash the dishes.</item>
<item priority="medium">Buy more soap.</item>
</todo>
表面上看, XML是一种标准化语法, 它以适合人阅读的格式来表达任意的层次化数据(hirearchical data)。象任务表(to-do list), 网页, 病历, 汽车保险单, 配置文件等等, 都是XML用武的地方。比如我们拿任务表做例子:
<todo name="housework">
<item priority="high">Clean the house.</item>
<item priority="medium">Wash the dishes.</item>
<item priority="medium">Buy more soap.</item>
</todo>
<item priority="high">Clean the house.</item>
<item priority="medium">Wash the dishes.</item>
<item priority="medium">Buy more soap.</item>
</todo>
What happens if we unleash our favorite XML parser on this to-do list? Once the data is parsed, how is it represented in memory? The most natural representation is, of course, a tree - a perfect data structure for hierarchical data. After all is said and done, XML is really just a tree serialized to a human readable form. Anything that can be represented in a tree can be represented in XML and vice versa. I hope you understand this idea. It's very important for what's coming next.
解析这段数据时会发生什么情况? 解析之后的数据在内存中怎样表示? 显然, 用树来表示这种层次化数据是很恰当的。说到底, XML这种比较容易阅读的数据格式, 就是树型结构数据经过序列化之后的结果。任何可以用树来表示的数据, 同样可以用XML来表示, 反之亦然。希望你能懂得这一点, 这对下面的内容极其重要。
Let's take this a little further. What other type of data is often represented as a tree? At this point the list is as good as infinite so I'll give you a hint at what I'm getting at - try to remember your old compiler course. If you have a vague recollection that source code is stored in a tree after it's parsed, you're on the right track. Any compiler inevitably parses the source code into an abstract syntax tree. This isn't surprising since source code is hierarchical: functions contain arguments and blocks of code. Blocks of code contain expressions and statements. Expressions contain variables and operators. And so it goes.
再进一步。还有什么类型的数据也常用树来表示? 无疑列表(list)也是一种。上过编译课吧? 还模模糊糊记得一点吧? 源代码在解析之后也是用树结构来存放的, 任何编译程序都会把源代码解析成一棵抽象语法树, 这样的表示法很恰当, 因为源代码就是层次结构的:函数包含参数和代码块, 代码快包含表达式和语句, 语句包含变量和运算符等等。
Let's apply our corollary that any tree can easily be serialized into XML to this idea. If all source code is eventually represented as a tree, and any tree can be serialized into XML, then all source code can be converted to XML, right? Let's illustrate this interesting property by a simple example. Consider the function below:
我们已经知道, 任何树结构都可以轻而易举的写成XML, 而任何代码都会解析成树, 因此,任何代码都可以转换成XML, 对不对? 我举个例子, 请看下面的函数:
int add(int arg1, int arg2)
{
return arg1+arg2;
}
{
return arg1+arg2;
}
Can you convert this function definition to its XML equivalent? Turns out, it's reasonably simple. Naturally there are many ways to do this. Here is one way the resulting XML can look like:
能把这个函数变成对等的XML格式吗? 当然可以。我们可以用很多种方式做到, 下面是其中的一种, 十分简单:
<define-function return-type="int" name="add">
<arguments>
<argument type="int">arg1</argument>
<argument type="int">arg2</argument>
</arguments>
<body>
<return>
<add value1="arg1" value2="arg2" />
</return>
</body>
</define>
<arguments>
<argument type="int">arg1</argument>
<argument type="int">arg2</argument>
</arguments>
<body>
<return>
<add value1="arg1" value2="arg2" />
</return>
</body>
</define>
We can go through this relatively simple exercise with any language. We can turn any source code into XML, and we can transform the resulting XML back to original source code. We can write a converter that turns Java into XML and a converter that turns XML back to Java. We could do the same for C++. (In case you're wondering if anyone is crazy enough to do it, take a look at GCC-XML). Furthermore, for languages that share common features but use different syntax (which to some extent is true about most mainstream languages) we could convert source code from one language to another using XML as an intermediary representation. We could use our Java2XML converter to convert a Java program to XML. We could then run an XML2CPP converter on the resulting XML and turn it into C++ code. With any luck (if we avoid using features of Java that don't exist in C++) we'll get a working C++ program. Neat, eh?
这个例子非常简单, 用哪种语言来做都不会有太大问题。我们可以把任何程序码转成XML,也可以把XML转回到原来的程序码。我们可以写一个转换器, 把Java代码转成XML, 另一个转换器把XML转回到Java。一样的道理, 这种手段也可以用来对付C++(这样做跟发疯差不多么。可是的确有人在做, 看看GCC-XML(http://www.gccxml.org)就知道了)。进一步说,凡是有相同语言特性而语法不同的语言, 都可以把XML当作中介来互相转换代码。实际上几乎所有的主流语言都在一定程度上满足这个条件。我们可以把XML作为一种中间表示法,在两种语言之间互相译码。比方说, 我们可以用Java2XML把Java代码转换成XML, 然后用XML2CPP再把XML转换成C++代码, 运气好的话, 就是说, 如果我们小心避免使用那些C++不具备的Java特性的话, 我们可以得到完好的C++程序。这办法怎么样, 漂亮吧?
All this effectively means that we can use XML for generic storage of source code. We'd be able to create a whole class of programming languages that use uniform syntax, as well as write transformers that convert existing source code to XML. If we were to actually adopt this idea, compilers for different languages wouldn't need to implement parsers for their specific grammars - they'd simply use an XML parser to turn XML directly into an abstract syntax tree.
这一切充分说明, 我们可以把XML作为源代码的通用存储方式, 其实我们能够产生一整套使用统一语法的程序语言, 也能写出转换器, 把已有代码转换成XML格式。如果真的采纳这种办法, 各种语言的编译器就用不着自己写语法解析了, 它们可以直接用XML的语法解析来直接生成抽象语法树。
By now you're probably wondering why I've embarked on the XML crusade and what it has to do with Lisp (after all, Lisp was created about thirty years before XML). I promise that everything will become clear soon enough. But before we take our second step, let's go through a small philosophical exercise. Take a good look at the XML version of our "add" function above. How would you classify it? Is it data or code? If you think about it for a moment you'll realize that there are good reasons to put this XML snippet into both categories. It's XML and it's just information encoded in a standardized format. We've already determined that it can be generated from a tree data structure in memory (that's effectively what GCC-XML does). It's lying around in a file with no apparent way to execute it. We can parse it into a tree of XML nodes and do various transformations on it. It's data. But wait a moment! When all is said and done it's the same "add" function written with a different syntax, right? Once parsed, its tree could be fed into a compiler and we could execute it. We could easily write a small interpreter for this XML code and we could execute it directly. Alternatively, we could transform it into Java or C++ code, compile it, and run it. It's code.
说到这里你该问了, 我们研究了这半天XML, 这和Lisp有什么关系呢? 毕竟XML出来之时,Lisp早已经问世三十年了。这里我可以保证, 你马上就会明白。不过在继续解释之前, 我们先做一个小小的思维练习。看一下上面这个XML版本的add函数例子, 你怎样给它分类,是代码还是数据? 不用太多考虑都能明白, 把它分到哪一类都讲得通。它是XML, 它是标准格式的数据。我们也知道, 它可以通过内存中的树结构来生成(GCC-XML做的就是这个事情)。它保存在不可执行的文件中。我们可以把它解析成树节点, 然后做任意的转换。显而易见, 它是数据。不过且慢, 虽然它语法有点陌生, 可它又确确实实是一个add函数,对吧? 一旦经过解析, 它就可以拿给编译器编译执行。我们可以轻而易举写出这个XML代码解释器, 并且直接运行它。或者我们也可以把它译成Java或C++代码, 然后再编译运行。所以说, 它也是代码。
So, where are we? Looks like we've just arrived to an interesting point. A concept that has traditionally been so hard to understand is now amazingly simple and intuitive. Code is also always data! Does it mean that data is also always code? As crazy as this sounds this very well might be the case. Remember how I promised that you'll see our to-do list in a whole new light? Let me reiterate on that promise. But we aren't ready to discuss this just yet. For now let's continue walking down our path.
我们说到那里了? 不错, 我们已经发现了一个有趣的关键之点。过去被认为很难解的概念已经非常直观非常简单的显现出来。代码也是数据, 并且从来都是如此。这听起来疯疯癫癫的, 实际上却是必然之事。我许诺过会以一种全新的方式来解释Lisp, 我要重申我的许诺。但是我们此刻还没有到预定的地方, 所以还是先继续上边的讨论。
A little earlier I mentioned that we could easily write an interpreter to execute our XML snippet of the add function. Of course this sounds like a purely theoretical exercise. Who in their right mind would want to do that for practical purposes? Well, it turns out quite a few people would disagree. You've likely encountered and used their work at least once in your career, too. Do I have you out on the edge of your seat? If so, let's move on!
刚才我说过, 我们可以非常简单地实现XML版的add函数解释器, 这听起来好像不过是说说而已。谁真的会动手做一下呢? 未必有多少人会认真对待这件事。随便说说, 并不打算真的去做, 这样的事情你在生活中恐怕也遇到吧。你明白我这样说的意思吧, 我说的有没有打动你? 有哇, 那好, 我们继续。