PHP HTML DOM / Ganon & phpQuery & Simple HTML DOM (libraries for manipulating HTML DOM objects) - 飞龙再生


PHP Simple HTML DOM Parser

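I have used this one for several years, ever since its first beta. It is lightweight and works well: a single file of 1393 lines.
Project page: http://simplehtmldom.sourceforge.net/
Manual: http://simplehtmldom.sourceforge.net/manual.htm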

An HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way!

Requires PHP 5+.

Supports invalid HTML.

Find tags on an HTML page with selectors just like jQuery.

Extract contents from HTML in a single line.

PHP Simple HTML DOM Parser usage examples
Read

// Load the single-file library (adjust the path to your setup)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';

Modify

// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

$html->find('div', 1)->class = 'bar';

$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
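
The feature list above mentions extracting contents in a single line, but neither example shows it. A minimal sketch follows, assuming simple_html_dom.php sits next to the script (the include path is an assumption, not something the original post states):

```php
<?php
// Assumed location of the single-file library; adjust to your setup.
include 'simple_html_dom.php';

$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// plaintext returns the text content with the tags stripped
$all = $html->plaintext;
echo $all, "\n";

// The same property is available on a single element returned by find()
$world = $html->find('div[id=world]', 0)->plaintext;
echo $world, "\n";

// Release the parser's internal references when done,
// which helps in long-running scripts that parse many pages
$html->clear();
unset($html);
```

Calling clear() is optional in short scripts, but it is the usual way to keep memory use down when parsing many documents in a loop.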

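Fix absolute URLs

// Requires the PEAR package Net_URL2; $html is a DOM created as in the examples above
require_once 'Net/URL2.php';
$uri = new Net_URL2('http://example.com/foo/bar'); // URI of the resource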
$baseURI = $uri;
foreach ($html->find('base[href]') as $elem) {
   $baseURI = $uri->resolve($elem->href);
}

foreach ($html->find('*[src]') as $elem) {
   $elem->src = $baseURI->resolve($elem->src)->__toString();
}
foreach ($html->find('*[href]') as $elem) {
   if (strtoupper($elem->tag) === 'BASE') continue;
   $elem->href = $baseURI->resolve($elem->href)->__toString();
}
foreach ($html->find('form[action]') as $elem) {
   $elem->action = $baseURI->resolve($elem->action)->__toString();
}
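
The loops above all rely on Net_URL2's resolve() to turn a relative reference into an absolute URL. A small standalone sketch of that single step, assuming a standard PEAR install of Net_URL2 (the include path and the sample references are assumptions for illustration):

```php
<?php
// PEAR package Net_URL2; the include path assumes a standard PEAR install.
require_once 'Net/URL2.php';

$base = new Net_URL2('http://example.com/foo/bar');

// resolve() returns a new Net_URL2 object; getURL() gives its string form
$resolved = $base->resolve('images/logo.png')->getURL();
echo $resolved, "\n";

// A reference that is already absolute keeps its own scheme and host
$absolute = $base->resolve('http://other.example/x.js')->getURL();
echo $absolute, "\n";
```

Because resolve() leaves the base object untouched, the same $base can be reused for every attribute in the document.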

Ganon

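Project page: http://code.google.com/p/ganon/
Documentation: http://code.google.com/p/ganon/w/list

This one is very powerful. I only discovered it recently and have added it to my standard toolkit. It is a single file of 2856 lines.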

The Ganon library gives access to HTML/XML documents in a very simple object oriented way. It eases modifying the DOM and makes finding elements easy with CSS3-like queries.

A universal tokenizer
A HTML/XML/RSS DOM Parser
Ability to manipulate elements and their attributes
Supports invalid HTML
Supports UTF8
Can perform advanced CSS3-like queries on elements (like jQuery -- namespaces supported)
A HTML beautifier (like HTML Tidy)
Minify CSS and Javascript
Sort attributes, change character case, correct indentation, etc.
Extensible
Parsing documents using callbacks based on current character/token
Operations separated in smaller functions for easy overriding
Fast
Easy

Ganon usage examples:

// Load the single-file library (adjust the path to your setup)
include 'ganon.php';

// Parse the google code website into a DOM
$html = file_get_dom('http://code.google.com/');

Access
Accessing elements is made easy through the CSS3-like selectors and the object model.

// Find all the paragraph tags with a class attribute and print the
// value of the class attribute
foreach($html('p[class]') as $element) {
    echo $element->class, "<br>\n";
}

// Find the first div with ID "gc-header" and print the plain text of
// the parent element (plain text means no HTML tags, just the text)
echo $html('div#gc-header', 0)->parent->getPlainText();

// Find out how many tags there are which are "ns:tag" or "div", but not
// "a" and do not have a class attribute
echo count($html('(ns|tag, div + !a)[!class]'));

Modification
Elements can be easily modified after you've found them.

// Find all paragraph tags which are nested inside a div tag, change
// their ID attribute and print the new HTML code
foreach($html('div p') as $index => $element) {
    $element->id = "id$index";
}
echo $html;

// Center all the links inside a document which start with "http://"
// and print out the new HTML
foreach($html('a[href ^= "http://"]') as $element) {
    $element->wrap('center');
}
echo $html;

// Find all odd indexed "td" elements and change the HTML to make them links
foreach($html('table td:odd') as $element) {
    $element->setInnerText('<a href="#">'.$element->getPlainText().'</a>');
}
echo $html;

Beautify
Ganon can also help you beautify your code and format it properly.

// Beautify the old HTML code and print out the new, formatted code
dom_format($html, array('attributes_case' => CASE_LOWER));
echo $html;

phpQuery

This one is heavyweight and fairly resource-hungry. It is a single file of 5702 lines.

phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on the jQuery JavaScript library.
The library is written in PHP5 and provides an additional command-line interface (CLI).

Project page: http://code.google.com/p/phpquery/
Documentation: http://code.google.com/p/phpquery/wiki/Manual

phpQuery Examples

CLI
Fetch number of downloads of all release packages

phpquery 'http://code.google.com/p/phpquery/downloads/list?can=1' \
--find '.vt.col_4 a' --contents \
--getString null array_sum

PHP
Examples from demo.php

require('phpQuery/phpQuery.php');
// for PEAR installation use this
// require('phpQuery.php');

INITIALIZE IT

// $doc = phpQuery::newDocumentHTML($markup);
// $doc = phpQuery::newDocumentXML();
// $doc = phpQuery::newDocumentFileXHTML('test.html');
// $doc = phpQuery::newDocumentFilePHP('test.php');
// $doc = phpQuery::newDocument('test.xml', 'application/rss+xml');
// this one defaults to text/html in utf8
$doc = phpQuery::newDocument('<div/>');

FILL IT

// array syntax works like ->find() here
$doc['div']->append('<ul></ul>');
// array set changes inner html
$doc['div ul'] = '<li>1</li><li>2</li><li>3</li>';

MANIPULATE IT

// almost everything can be a chain
$li = null;
$doc['ul > li']
   ->addClass('my-new-class')
   ->filter(':last')
   ->addClass('last-li')
// save it anywhere in the chain
   ->toReference($li);

SELECT DOCUMENT

// pq(); is using selected document as default
phpQuery::selectDocument($doc);
// documents are selected when created or by above method
// query all unordered lists in last selected document
pq('ul')->insertAfter('div');

ITERATE IT

// all LIs from last selected DOM
foreach(pq('li') as $li) {
   // iteration returns PLAIN dom nodes, NOT phpQuery objects
   $tagName = $li->tagName;
   $childNodes = $li->childNodes;
   // so you NEED to wrap it within phpQuery, using pq();
   pq($li)->addClass('my-second-new-class');
}
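
The comments above note that iteration yields plain DOM nodes rather than phpQuery objects. To see what such a node offers on its own, here is a sketch that uses only PHP's built-in DOM extension, with no phpQuery involved. It assumes that the nodes phpQuery returns are the same DOMElement type, which the tagName and childNodes properties used above suggest. The markup is a made-up sample.

```php
<?php
// Built-in DOM extension only; phpQuery is not needed for this sketch.
$dom = new DOMDocument();
$dom->loadHTML('<ul><li>1</li><li>2</li><li>3</li></ul>');

$rows = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    // $li is a DOMElement: it has tagName, childNodes and textContent,
    // but none of the chainable helpers such as addClass()
    $rows[] = $li->tagName . ':' . $li->childNodes->length . ':' . $li->textContent;
}

echo implode("\n", $rows), "\n";
```

This is why the loop above wraps each node in pq() before calling addClass(): the chainable methods live on the phpQuery wrapper, not on the node.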

输出 PRINT OUTPUT

// 1st way
print phpQuery::getDocument($doc->getDocumentID());
// 2nd way
print phpQuery::getDocument(pq('div')->getDocumentID());
// 3rd way
print pq('div')->getDocument();
// 4th way
print $doc->htmlOuter();
// 5th way
print $doc;
// another...
print $doc['ul'];

Tags: DOM, Ganon, HTML, PHP, phpQuery, Simple HTML DOM

Original article: http://www.21andy.com/new/20120716/2071.html

Posted on 2015-09-16 16:13 by 飞龙再生