[jsoup] Parsing HTML in Java
==============================================
1. Parsing as an XML document and handling HTML fragments
==============================================
Parsing a string as an XML document. The effect is that the output keeps the same shape as the input fragment:
String html = "<div>hello</div>";
// Empty base URI plus Parser.xmlParser(): no html/head/body wrapper is added
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
System.out.println(doc.html());
Result:
<div>hello</div>
Parsing without a parser argument (the default HTML parser wraps the fragment in html, head, and body)
String html = "<div>hello</div>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());
Result:
<html>
<head></head>
<body>
<div>
hello
</div>
</body>
</html>
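
To compare the two parsers side by side, the sketch below runs the same input through both and turns pretty-printing off so that each result fits on one line. The class name `ParserComparison` and its helper methods are illustrative names, not part of jsoup, and the example assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical helper class for illustration only.
class ParserComparison {

    // XML parser: the input is kept as-is, with no html/head/body wrapper.
    static String parseAsXml(String input) {
        Document doc = Jsoup.parse(input, "", Parser.xmlParser());
        doc.outputSettings().prettyPrint(false); // single-line output
        return doc.html();
    }

    // Default HTML parser: the fragment is wrapped in a full document.
    static String parseAsHtml(String input) {
        Document doc = Jsoup.parse(input);
        doc.outputSettings().prettyPrint(false);
        return doc.html();
    }

    public static void main(String[] args) {
        String html = "<div>hello</div>";
        System.out.println(parseAsXml(html));
        System.out.println(parseAsHtml(html));
    }
}
```

With pretty-printing disabled, the first line should print the fragment unchanged and the second should print it wrapped in html, head, and body elements.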
Parsing a string as an HTML document
String html = "<html><head><title>First html parse</title></head><body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());
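
Once the string is parsed, the document can be queried instead of only printed. The following is a minimal sketch, assuming jsoup is on the classpath; `HtmlDocumentQuery`, `titleOf`, and `firstParagraphText` are made-up names for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration only.
class HtmlDocumentQuery {

    static final String HTML =
        "<html><head><title>First html parse</title></head>"
        + "<body><p>Parsed HTML into a doc.</p></body></html>";

    // Returns the text of the <title> element (empty string if there is none).
    static String titleOf(String html) {
        return Jsoup.parse(html).title();
    }

    // Returns the text of the first <p>, or an empty string if no <p> exists.
    static String firstParagraphText(String html) {
        Document doc = Jsoup.parse(html);
        Element p = doc.select("p").first(); // first() returns null on no match
        return p == null ? "" : p.text();
    }

    public static void main(String[] args) {
        System.out.println(titleOf(HTML));
        System.out.println(firstParagraphText(HTML));
    }
}
```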
Parsing a string as a body fragment
String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
System.out.println(body.html());
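
Note that the input above never closes its div. The sketch below, which assumes jsoup is on the classpath and uses the made-up names `FragmentParsing` and `normalize`, shows that `parseBodyFragment` balances the markup; pretty-printing is turned off to keep the result on one line.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class for illustration only.
class FragmentParsing {

    // Parses a possibly unbalanced fragment and returns the repaired body markup.
    static String normalize(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        doc.outputSettings().prettyPrint(false); // single-line output
        return doc.body().html();
    }

    public static void main(String[] args) {
        // The missing </div> is added by the parser.
        System.out.println(normalize("<div><p>Lorem ipsum.</p>"));
    }
}
```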
Loading a document from a URL
Document doc = Jsoup.connect("http://www.lianhu.gov.cn/").get();
String title = doc.title();
System.out.println(title);
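
A network fetch can fail in several ways (timeout, non-2xx status, malformed URL), so real code usually wraps the call. This is one possible sketch, assuming jsoup is on the classpath; `SafeFetch`, `fetchTitle`, and the 5000 ms timeout are choices made for this example, not something the snippet above prescribes.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class for illustration only.
class SafeFetch {

    // Returns the page title, or Optional.empty() if the URL is malformed
    // or the request fails for any I/O reason.
    static Optional<String> fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url)
                .timeout(5000) // milliseconds; value chosen for this sketch
                .get();
            return Optional.of(doc.title());
        } catch (IOException | IllegalArgumentException e) {
            // IOException covers timeouts and HTTP error statuses;
            // IllegalArgumentException is thrown for a malformed URL.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchTitle("http://www.lianhu.gov.cn/").orElse("(fetch failed)"));
    }
}
```

Swallowing the exception keeps the example short; production code would more likely log it or rethrow.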
Building a custom request
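Document doc = Jsoup.connect("http://www.lianhu.gov.cn/")
    .data("query", "Java")     // request parameter, sent as form data on POST
    .userAgent("Mozilla")      // value of the User-Agent header
    .cookie("auth", "token")   // cookie sent with the request
    .timeout(3000)             // timeout in milliseconds
    .post();                   // send as POST and parse the response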
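
Because `Jsoup.connect` returns a `Connection` that is only sent when `get()`, `post()`, or `execute()` is called, the configured request can be inspected first. The sketch below assumes jsoup is on the classpath; `RequestPreview` and `buildRequest` are illustrative names, and nothing is sent over the network.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Hypothetical helper class for illustration only.
class RequestPreview {

    // Builds the same request as above but does not send it.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
            .data("query", "Java")
            .userAgent("Mozilla")
            .cookie("auth", "token")
            .timeout(3000)
            .method(Connection.Method.POST);
    }

    public static void main(String[] args) {
        Connection.Request req = buildRequest("http://www.lianhu.gov.cn/").request();
        System.out.println(req.method());             // HTTP method to be used
        System.out.println(req.timeout());            // timeout in milliseconds
        System.out.println(req.cookie("auth"));       // cookie value
        System.out.println(req.header("User-Agent")); // user agent header
    }
}
```

To actually send the request, call `execute()` on the returned `Connection` and then `parse()` on the response to obtain a `Document`.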
Loading a document from a file
File input = new File("D:/deya/vhost/zizhou/index.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
System.out.println(doc.html());
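
The third argument, the base URI, is what jsoup uses to resolve relative links found in the file. The following sketch illustrates this with a temporary file so that it does not depend on any particular path; `FileParsing`, `firstLinkAbsoluteUrl`, `demo`, and the sample markup are assumptions made for this example, and jsoup is assumed to be on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration only.
class FileParsing {

    // Parses the file as UTF-8 and returns the first link resolved against baseUri.
    static String firstLinkAbsoluteUrl(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? "" : link.absUrl("href");
    }

    // Writes a small sample page to a temp file, parses it, and cleans up.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("jsoup-demo", ".html");
            try {
                String html = "<a href=\"/news/1.html\">news</a>";
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                return firstLinkAbsoluteUrl(tmp.toFile(), "http://example.com/");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // relative href resolved against the base URI
    }
}
```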
