(ASWP excerpt) Chap 2 XML
Covers XML topics, including XML syntax, DTD, XML Schema, XPath, and XSL. The chapter explains them well.
2.1 Introduction
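1. Markup Language
A markup language uses a set of agreed-upon tags to mark up an electronic document, thereby defining its semantics, structure, and formatting.
The tags must be easy to tell apart from the content and easy to recognize. A markup language must define which tags are allowed, which are required, how markup is distinguished from the document's content, and what each tag means.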
A markup language combines text and extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text.
The best-known markup language in modern use is HTML (Hypertext Markup Language), one of the foundations of the World Wide Web. Historically, markup was (and is) used in the publishing industry in the communication of printed work between authors, editors, and printers.
In short, a marked-up document contains textual information plus descriptions of that information (the markup).
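2. Advantages of XML over HTML
Every piece of content has its own tag.
Nesting expresses the relationships between pieces of content.
Constraints on values can be defined.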
2.2 The XML Language
1. Processing Instruction (PI)
Applications use processing instructions (PIs) to learn how to process elements.
The general form is
<?target instruction ?>
For example,
<?stylesheet type="text/css" href="mystyle.css"?>
PIs offer procedural possibilities in an otherwise declarative environment.
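As a minimal sketch of how an application might read a PI, the snippet below parses the example above with Python's standard xml.dom.minidom module. The root element <doc/> and the function name read_processing_instructions are assumptions made for illustration; they are not part of the book's example.

```python
from xml.dom import minidom

# The PI from the notes, placed before a hypothetical root element <doc/>.
SOURCE = '<?stylesheet type="text/css" href="mystyle.css"?><doc/>'


def read_processing_instructions(xml_text):
    """Return (target, instruction) pairs for the PIs at the top level of a document."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


pis = read_processing_instructions(SOURCE)
print(pis)
```

The parser only hands over the target and the raw instruction text; interpreting that text is left entirely to the application.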
2. Well-Formed XML Document
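An XML document is well-formed if it is syntactically correct.
For example, there must be exactly one root element, opening and closing tags must be balanced, and element and attribute names must be legal.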
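A well-formedness check can be sketched with Python's standard xml.etree.ElementTree parser, which raises ParseError on syntactically incorrect input. The sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule mentioned in the notes.
samples = {
    "one root, balanced tags": "<email><head/><body/></email>",
    "two root elements": "<email/><email/>",
    "mismatched tags": "<email><head></email></head>",
    "illegal element name": "<1email/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", ok)
```

Note that this only tests well-formedness; it says nothing about whether the document follows a DTD or schema.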
2.3 Structuring
1. Valid XML Document
An XML document is valid if it is well-formed, uses structuring information, and respects that structuring information.
2. A Concluding Example of DTD
<!ELEMENT email (head,body)>
<!ELEMENT head (from,to+,cc*,subject)>
<!-- EMPTY: the element has no content -->
<!ELEMENT from EMPTY>
<!-- #IMPLIED: the attribute is optional; #REQUIRED: the attribute must be present -->
<!ATTLIST from
  name CDATA #IMPLIED
  address CDATA #REQUIRED>
<!ELEMENT to EMPTY>
<!ATTLIST to
  name CDATA #IMPLIED
  address CDATA #REQUIRED>
<!ELEMENT cc EMPTY>
<!ATTLIST cc
  name CDATA #IMPLIED
  address CDATA #REQUIRED>
<!-- #PCDATA: parsed character data, i.e. any text -->
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (text,attachment*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT attachment EMPTY>
<!-- encoding may be "mime" or "binhex"; the default is "mime" -->
<!ATTLIST attachment
  encoding (mime|binhex) "mime"
  file CDATA #REQUIRED>
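The content models in this DTD (sequence, +, *) behave like regular expressions over child element names. The following is a rough sketch of that idea in plain Python, not a DTD validator: it ignores attributes and text content, and the sample documents, the CONTENT_MODELS table, and the function name are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the email DTD, rewritten as regular expressions over the
# sequence of child element names (each name followed by one space).
CONTENT_MODELS = {
    "email": r"head body ",
    "head": r"from (to )+(cc )*subject ",
    "body": r"text (attachment )*",
}


def follows_content_models(element):
    """Recursively check the order and count of child elements against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(follows_content_models(child) for child in element)


# Hypothetical documents: GOOD has two recipients, BAD has none (violates to+).
GOOD = """<email><head><from address="a@example.org"/>
<to address="b@example.org"/><to address="c@example.org"/>
<subject>Hi</subject></head><body><text>Hello</text></body></email>"""
BAD = """<email><head><from address="a@example.org"/>
<subject>Hi</subject></head><body><text>Hello</text></body></email>"""

good_ok = follows_content_models(ET.fromstring(GOOD))
bad_ok = follows_content_models(ET.fromstring(BAD))
print(good_ok, bad_ok)
```

A validating parser would additionally enforce the ATTLIST declarations, such as the required address attribute.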
3. Another DTD Example to show “IDREF” and “IDREFS”
<!ELEMENT family (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!-- IDREF and IDREFS values point to other elements in the document by their ID -->
<!ATTLIST person
  id ID #REQUIRED
  mother IDREF #IMPLIED
  father IDREF #IMPLIED
  children IDREFS #IMPLIED>
<family>
  <person id="bob" mother="mary" father="peter">
    <name>Bob Marley</name>
  </person>
  <person id="bridget" mother="mary">
    <name>Bridget Jones</name>
  </person>
  <person id="mary" children="bob bridget">
    <name>Mary Poppins</name>
  </person>
  <person id="peter" children="bob">
    <name>Peter Marley</name>
  </person>
</family>
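The IDREF and IDREFS constraint (every reference must match the id of some element in the same document) can be illustrated with a small hand-written check. This is only a sketch in plain Python, not what a validating parser does internally; the function name dangling_references is an assumption.

```python
import xml.etree.ElementTree as ET

FAMILY = """<family>
  <person id="bob" mother="mary" father="peter"><name>Bob Marley</name></person>
  <person id="bridget" mother="mary"><name>Bridget Jones</name></person>
  <person id="mary" children="bob bridget"><name>Mary Poppins</name></person>
  <person id="peter" children="bob"><name>Peter Marley</name></person>
</family>"""


def dangling_references(root):
    """List (person id, attribute, reference) triples whose reference matches no id."""
    ids = {person.get("id") for person in root.iter("person")}
    problems = []
    for person in root.iter("person"):
        # mother and father are IDREF (one id); children is IDREFS (space-separated ids).
        for attribute in ("mother", "father", "children"):
            for ref in person.get(attribute, "").split():
                if ref not in ids:
                    problems.append((person.get("id"), attribute, ref))
    return problems


problems = dangling_references(ET.fromstring(FAMILY))
print(problems)
```

Changing mother="mary" to an id that no person carries would make the check report that person, which is the situation a validating parser rejects.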
4. XML Schema
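An XML Schema is itself an XML document, so existing XML tools can parse it.
XML Schema greatly improves on the weak data typing of DTDs: it allows user-defined types and restrictions on existing types.
5. A Concluding Example of XML Schema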
<element name="email" type="emailType"/>
<!-- There are two kinds of types: complexType and simpleType -->
<complexType name="emailType">
  <sequence>
    <element name="head" type="headType"/>
    <element name="body" type="bodyType"/>
  </sequence>
</complexType>
<complexType name="headType">
  <sequence>
    <element name="from" type="nameAddress"/>
    <!-- minOccurs and maxOccurs constrain the number of occurrences -->
    <element name="to" type="nameAddress" minOccurs="1" maxOccurs="unbounded"/>
    <element name="cc" type="nameAddress" minOccurs="0" maxOccurs="unbounded"/>
    <element name="subject" type="string"/>
  </sequence>
</complexType>
<complexType name="nameAddress">
  <attribute name="name" type="string" use="optional"/>
  <attribute name="address" type="string" use="required"/>
</complexType>
<complexType name="bodyType">
  <sequence>
    <element name="text" type="string"/>
    <element name="attachment" minOccurs="0" maxOccurs="unbounded">
      <!-- Anonymous complexType: defined inline and used only once -->
      <complexType>
        <attribute name="encoding" default="mime">
          <!-- simpleType: adds restrictions to an existing type -->
          <simpleType>
            <restriction base="string">
              <enumeration value="mime"/>
              <enumeration value="binhex"/>
            </restriction>
          </simpleType>
        </attribute>
        <attribute name="file" type="string" use="required"/>
      </complexType>
    </element>
  </sequence>
</complexType>
Note that some data types are defined separately and given names, while others are defined within other types and defined anonymously (the types for the attachment element and the encoding attribute).
2.4 Namespaces
2.5 Addressing and Querying XML Documents
1. XPath Example
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
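/library/author: the author elements that are children of the library element
//author: all author elements in the XML file
/library/@location: the location attribute of the library element
//author[1]: the first author element in the XML file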
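The four queries above can be tried out with Python's standard xml.etree.ElementTree module. This is a sketch under two assumptions: ElementTree implements only a subset of XPath, so paths are written relative to the root element (library) rather than from the document root, and the attribute is read with get() instead of an @location step.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

root = ET.fromstring(LIBRARY)  # root is the <library> element

# author elements that are children of library
authors_under_library = [a.get("name") for a in root.findall("./author")]
# author elements anywhere below library
all_authors = [a.get("name") for a in root.findall(".//author")]
# the location attribute of library
location = root.get("location")
# the first author element
first_author = root.find("./author[1]").get("name")

print(authors_under_library, all_authors, location, first_author)
```

In this document the first two queries return the same three authors, because every author is a direct child of library.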
2. Basic Syntax of XPath
XPath (XML Path Language) is a language for addressing parts of an XML document.
A path expression consists of a series of steps separated by /. Each step has the following parts:
axis specifier: determines the tree relationship between the nodes to be addressed and the context node. Examples are parent, ancestor, child (the default), sibling, attribute node. // is such an axis specifier; it denotes descendant or self.
node test: specifies which nodes to look for. The most common test is the element name.
predicate (also called filter expression, optional): a filter condition; for example, [1] selects the first element.
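To make the anatomy of a step concrete, here is a small sketch that combines axes, node tests, and predicates, again using the XPath subset supported by Python's xml.etree.ElementTree. The helper name names is an assumption, and the document is the library example from the previous subsection.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

root = ET.fromstring(LIBRARY)


def names(elements, attribute):
    """Collect one attribute value from each selected element."""
    return [element.get(attribute) for element in elements]


# Two steps: author (child axis, attribute predicate), then book (child axis).
books_by_henry = names(root.findall("./author[@name='Henry Wise']/book"), "title")

# Positional predicates: [2] is the second author, [last()] is the last one.
second_author = names(root.findall("./author[2]"), "name")
last_author = names(root.findall("./author[last()]"), "name")

# ".//" searches all descendants; ".." steps back up to the parent element.
authors_of_ai = sorted(
    names(root.findall(".//book[@title='Artificial Intelligence']/.."), "name")
)

print(books_by_henry, second_author, last_author, authors_of_ai)
```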
2.6 Processing
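1. XSL (Extensible Stylesheet Language)
It is both a transformation language (XSLT) and a formatting language.
XSL specifies rules for transforming an XML document into another XML document, an HTML document, or plain text.
XSLT is used when applications that rely on different DTDs or schemas need to exchange data.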
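Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It only imitates in plain Python the kind of XML-to-HTML transformation that XSLT rules describe. The input follows the library example from section 2.5; the output layout (h2 and ul elements) and the function name library_to_html are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
  </author>
</library>"""


def library_to_html(library):
    """Build an HTML tree: one heading per author, one list item per book."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for author in library.findall("author"):
        ET.SubElement(body, "h2").text = author.get("name")
        book_list = ET.SubElement(body, "ul")
        for book in author.findall("book"):
            ET.SubElement(book_list, "li").text = book.get("title")
    return html


html_text = ET.tostring(library_to_html(ET.fromstring(SOURCE)), encoding="unicode")
print(html_text)
```

In XSLT the same mapping would be written declaratively as template rules that match author and book elements, rather than as nested loops.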