Cross-Platform HTML Parsing Code
Posted on August 27, 2013 by admin
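A while ago I needed to parse HTML and went looking online for an HTML parser written in Delphi, but I couldn't find one that worked well. All of them broke on complex HTML. The most common failure was HTML embedded in JavaScript string literals, which caused parse errors.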
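To illustrate that failure mode: a parser that treats every '<' as the start of a tag will read document.write("<p>hi</p>") inside a script as real markup. The usual remedy is to treat the body of a <script> element as raw text up to the next "</script". The program below is a minimal sketch of that idea, written for this post rather than taken from the parser's source. ExtractTagNames is a hypothetical helper that only reports tag names and deliberately ignores comments, CDATA sections and quoted attribute values.

```pascal
program ScriptAwareScan;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TTagNameArray = array of string;

function IsNameChar(C: Char): Boolean;
begin
  Result := ((C >= 'a') and (C <= 'z')) or ((C >= 'A') and (C <= 'Z')) or
    ((C >= '0') and (C <= '9'));
end;

{ Hypothetical helper: returns the names of all tags in S, lower-cased, with
  closing tags prefixed by '/'. The body of a <script> element is skipped as
  raw text, so markup inside JavaScript string literals is never reported. }
function ExtractTagNames(const S: string): TTagNameArray;
var
  I, J, N, Count: Integer;
  Name, Lower: string;
  IsClose: Boolean;
begin
  SetLength(Result, 0);
  Count := 0;
  Lower := LowerCase(S); { ASCII-only lowering, so indices still match S }
  N := Length(S);
  I := 1;
  while I <= N do
  begin
    if S[I] <> '<' then
    begin
      Inc(I);
      Continue;
    end;
    J := I + 1;
    IsClose := (J <= N) and (S[J] = '/');
    if IsClose then
      Inc(J);
    Name := '';
    while (J <= N) and IsNameChar(S[J]) do
    begin
      Name := Name + Lower[J];
      Inc(J);
    end;
    if Name = '' then
    begin
      Inc(I); { a bare '<', as in "a < b", is plain text }
      Continue;
    end;
    { Naive: a '>' inside a quoted attribute value would end the tag early. }
    while (J <= N) and (S[J] <> '>') do
      Inc(J);
    Inc(Count);
    SetLength(Result, Count);
    if IsClose then
      Result[Count - 1] := '/' + Name
    else
      Result[Count - 1] := Name;
    I := J + 1;
    if (not IsClose) and (Name = 'script') then
    begin
      { Raw text: jump straight to the next "</script". }
      J := Pos('</script', Copy(Lower, I, MaxInt));
      if J = 0 then
        Break; { unterminated script: the rest of the input is raw text }
      I := I + J - 1;
    end;
  end;
end;

var
  Tags: TTagNameArray;
  K: Integer;
begin
  Tags := ExtractTagNames(
    '<div><script>document.write("<p>hi</p>");</script><span>x</span></div>');
  for K := 0 to High(Tags) do
  begin
    if K > 0 then
      Write(' ');
    Write(Tags[K]);
  end;
  Writeln;
end.
```

Run as is, it prints "div script /script span /span /div": the <p> inside the JavaScript string is never reported as a tag.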
I haven't looked at the commercial ones, so I can't say how good they are.
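So I wrote my own, and so far I haven't come across any HTML it fails to parse. I'm now making it public for everyone to use. Tough luck for the handful of foreign vendors who charge for theirs.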
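The API is exposed through interfaces, so object lifetimes are managed automatically by reference counting and you never have to free anything yourself. I recently added lookup using CSS selector syntax, so you can select nodes the same way a CSS selector does.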
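The automatic lifetime management comes from Delphi's reference-counted interfaces: an object derived from TInterfacedObject frees itself when its last interface reference goes away. The sketch below shows that mechanism with a tiny hypothetical INode interface, which is not part of the parser; the LiveNodes counter exists only to make the release visible.

```pascal
program InterfaceLifetime;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical node interface, far smaller than IHtmlElement. }
  INode = interface
    ['{3F2B7C10-5A4E-4D8B-9C61-2E7A0B1D4F55}']
    function GetName: string;
    property Name: string read GetName;
  end;

  { TInterfacedObject supplies the reference counting (_AddRef/_Release). }
  TNode = class(TInterfacedObject, INode)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    destructor Destroy; override;
    function GetName: string;
  end;

var
  LiveNodes: Integer = 0; { number of TNode instances not yet freed }

constructor TNode.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
  Inc(LiveNodes);
end;

destructor TNode.Destroy;
begin
  Dec(LiveNodes);
  inherited Destroy;
end;

function TNode.GetName: string;
begin
  Result := FName;
end;

procedure UseNode;
var
  Node: INode;
begin
  Node := TNode.Create('div'); { reference count becomes 1 }
  Writeln('Using ', Node.Name, ', live nodes: ', LiveNodes);
end; { Node goes out of scope: the count drops to 0 and Destroy runs }

begin
  UseNode;
  Writeln('After UseNode, live nodes: ', LiveNodes);
end.
```

It prints "Using div, live nodes: 1" followed by "After UseNode, live nodes: 0", without any explicit Free call. Any design built this way has to avoid reference cycles, for example a child holding a strong reference to its Owner, because two objects that reference each other never reach a count of zero.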
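It depends only on the SysUtils unit, which avoids pulling in Classes, a unit that adds considerably to executable size in newer Delphi versions. This also keeps it reasonably portable across platforms.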
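Without Classes there is no TList or TInterfaceList, but a dynamic array plus an interface with a default indexed property gives the same Count/Items[] usage that IHtmlElementList exposes. This is a hypothetical sketch (IStrList and TStrList are invented names), not the parser's own list class, and it uses no units at all.

```pascal
program NoClassesList;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical string list with the Count/Items shape of IHtmlElementList,
    backed by a dynamic array instead of TList/TStringList from Classes. }
  IStrList = interface
    ['{6D1E2A94-0B7C-4F3A-8E25-9A4C1F7B3D60}']
    function GetCount: Integer;
    function GetItems(Index: Integer): string;
    procedure Add(const S: string);
    property Count: Integer read GetCount;
    property Items[Index: Integer]: string read GetItems; default;
  end;

  TStrList = class(TInterfacedObject, IStrList)
  private
    FItems: array of string;
    FCount: Integer; { number of used slots; Length(FItems) is the capacity }
  public
    function GetCount: Integer;
    function GetItems(Index: Integer): string;
    procedure Add(const S: string);
  end;

function TStrList.GetCount: Integer;
begin
  Result := FCount;
end;

{ Callers are expected to pass 0..Count-1; there is no extra bounds check. }
function TStrList.GetItems(Index: Integer): string;
begin
  Result := FItems[Index];
end;

procedure TStrList.Add(const S: string);
begin
  { Grow geometrically so that repeated Add calls stay cheap. }
  if FCount = Length(FItems) then
    SetLength(FItems, FCount * 2 + 4);
  FItems[FCount] := S;
  Inc(FCount);
end;

var
  List: IStrList;
  I: Integer;
begin
  List := TStrList.Create;
  List.Add('div');
  List.Add('span');
  for I := 0 to List.Count - 1 do
    Writeln(List[I]); { default property: same as List.Items[I] }
end.
```

The default directive is what lets callers write List[I] instead of List.Items[I], the same way Children and Items are declared in the interfaces below.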
Supported compilers: Delphi 7 through Delphi XE4.
Because the API is interface-based, in theory it could also be used from C++ and VB if compiled into a DLL.
The interfaces are declared as follows:
```pascal
type
  IHtmlElementList = interface; // forward declaration, defined below

  IHtmlElement = interface
    ['{8C75239C-8CFA-499F-B115-7CEBEDFB421B}']
    function GetOwner: IHtmlElement; stdcall;
    function GetTagName: WideString; safecall;
    function GetContent: WideString; safecall;
    function GetOrignal: WideString; safecall;
    function GetChildrenCount: Integer; stdcall;
    function GetChildren(Index: Integer): IHtmlElement; stdcall;
    function GetCloseTag: IHtmlElement; stdcall;
    function GetInnerHtml(): WideString; safecall;
    function GetOuterHtml(): WideString; safecall;
    function GetInnerText(): WideString; safecall;
    function GetAttributes(Key: WideString): WideString; safecall;
    function GetSourceLineNum(): Integer; stdcall;
    function GetSourceColNum(): Integer; stdcall;

    // Whether the attribute exists
    function HasAttribute(AttributeName: WideString): Boolean; stdcall;

    // Find nodes, e.g.
    {
      FindElements('Link','type="application/rss+xml"')
      FindElements('','type="application/rss+xml"')
    }
    function FindElements(ATagName: WideString; AAttributes: WideString;
      AOnlyInTopElement: Boolean): IHtmlElementList; stdcall;

    // Find elements using CSS selector syntax
    function SimpleCSSSelector(const selector: WideString): IHtmlElementList; stdcall;

    // Enumerate attribute names
    // (TEnumAttributeNameCallBack is declared elsewhere in the unit, not shown here)
    procedure EnumAttributeNames(AParam: Pointer;
      ACallBack: TEnumAttributeNameCallBack); stdcall;

    property TagName: WideString read GetTagName;
    property ChildrenCount: Integer read GetChildrenCount;
    property Children[index: Integer]: IHtmlElement read GetChildren; default;
    property CloseTag: IHtmlElement read GetCloseTag;
    property Content: WideString read GetContent;
    property Orignal: WideString read GetOrignal;
    property Owner: IHtmlElement read GetOwner;
    // Position of the element in the source code
    property SourceLineNum: Integer read GetSourceLineNum;
    property SourceColNum: Integer read GetSourceColNum;

    property InnerHtml: WideString read GetInnerHtml;
    property OuterHtml: WideString read GetOuterHtml;
    property InnerText: WideString read GetInnerText;
    property Attributes[Key: WideString]: WideString read GetAttributes;
  end;

  IHtmlElementList = interface
    ['{8E1380C6-4263-4BF6-8D10-091A86D8E7D9}']
    function GetCount: Integer; stdcall;
    function GetItems(Index: Integer): IHtmlElement; stdcall;
    property Count: Integer read GetCount;
    property Items[Index: Integer]: IHtmlElement read GetItems; default;
  end;

function ParserHTML(const Source: WideString): IHtmlElement; stdcall;
```
Source code (Google Code SVN):
http://code.google.com/p/delphi-html-parser/