[jsoup] Parsing HTML in Java

==============================================

1. Parsing as an XML document to handle HTML fragments

==============================================

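1. Parsing as an XML document to handle HTML fragments

Parsing a string as an XML document: the output fragment has the same shape as the input fragment, because the XML parser does not wrap it in html, head, and body elements.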

String html = "<div>hello</div>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
System.out.println(doc.html());

Result:

<div>hello</div>

Parsing without the parser argument (the default HTML parser adds the html, head, and body wrapper)

String html = "<div>hello</div>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());

Result:

<html>
 <head></head>
 <body>
  <div>
   hello
  </div>
 </body>
</html>
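
To make the difference between the two results above easier to check, here is a minimal sketch that counts the html elements each parser produces for the same input. The class and method names (ParserComparison, htmlElementsWithXmlParser, htmlElementsWithHtmlParser) are made up for this illustration; only the jsoup calls come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical helper class, not part of jsoup.
public class ParserComparison {

    // XML parser: the fragment is kept as-is, so no <html> element is added.
    static int htmlElementsWithXmlParser(String markup) {
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        return doc.select("html").size();
    }

    // Default HTML parser: the fragment is wrapped in <html><head><body>.
    static int htmlElementsWithHtmlParser(String markup) {
        Document doc = Jsoup.parse(markup);
        return doc.select("html").size();
    }

    public static void main(String[] args) {
        String markup = "<div>hello</div>";
        System.out.println(htmlElementsWithXmlParser(markup));  // 0
        System.out.println(htmlElementsWithHtmlParser(markup)); // 1
    }
}
```

If the goal is to post-process a snippet and write it back unchanged in shape, the XML parser is probably the more convenient of the two.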

 

Parsing a string as an HTML document

String html = "<html><head><title>First html parse</title></head><body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());
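
Once the string is parsed, the document can be queried instead of just printed. The following is a small sketch under the assumption that only the title and the first paragraph are of interest; HtmlDocumentQuery and its methods are illustrative names, not jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
public class HtmlDocumentQuery {

    // Returns the text of the <title> element (empty string if there is none).
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Returns the text of the first <p> element, or "" when no paragraph exists.
    static String firstParagraphText(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return p == null ? "" : p.text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>First html parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        System.out.println(titleOf(html));            // First html parse
        System.out.println(firstParagraphText(html)); // Parsed HTML into a doc.
    }
}
```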

 

Parsing a string as a body fragment

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
System.out.println(body.html());
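
Note that the input above never closes its div. As a rough sketch of what parseBodyFragment does with it, the code below inspects the resulting body: the parser closes the div itself, so the body ends up with a single div child that contains the paragraph. FragmentSketch and parseFragmentBody are names invented for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
public class FragmentSketch {

    // Parses the fragment and returns the <body> element that holds it.
    static Element parseFragmentBody(String fragment) {
        return Jsoup.parseBodyFragment(fragment).body();
    }

    public static void main(String[] args) {
        // The <div> is left open on purpose; the parser closes it.
        Element body = parseFragmentBody("<div><p>Lorem ipsum.</p>");
        System.out.println(body.children().size()); // 1
        System.out.println(body.child(0).tagName()); // div
        System.out.println(body.select("p").text()); // Lorem ipsum.
    }
}
```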

 

Loading a document from a URL

Document doc = Jsoup.connect("http://www.lianhu.gov.cn/").get();
String title = doc.title();
System.out.println(title);

Building a custom request

Document doc = Jsoup.connect("http://www.lianhu.gov.cn/")
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(3000)
        .post();
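
Both get() and post() throw IOException, and the value passed to timeout is in milliseconds. The sketch below separates building the request from executing it and adds basic error handling. RequestSketch, buildRequest, and fetchTitle are hypothetical names, and returning an error string is just one possible way to report failures.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
public class RequestSketch {

    // Configures the request without sending anything over the network yet.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .data("query", "Java")
                .userAgent("Mozilla")
                .cookie("auth", "token")
                .timeout(3000) // milliseconds
                .method(Connection.Method.POST);
    }

    // Sends the request and returns the page title, or a short error description.
    static String fetchTitle(String url) {
        try {
            Document doc = buildRequest(url).execute().parse();
            return doc.title();
        } catch (HttpStatusException e) {
            // The server answered with an error status such as 404 or 500.
            return "HTTP error " + e.getStatusCode();
        } catch (IOException e) {
            // Covers timeouts, DNS failures, and other I/O problems.
            return "Request failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchTitle("http://www.lianhu.gov.cn/"));
    }
}
```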

 

Loading a document from a file

File input = new File("D:/deya/vhost/zizhou/index.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
System.out.println(doc.html());
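
The third argument, the base URI, is what jsoup uses to resolve relative links found in the file. Here is a minimal sketch of that behavior; it writes a temporary file only so that the example is self-contained, and FileBaseUriSketch, firstLinkAbsoluteUrl, and the link /news/1.html are all made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
public class FileBaseUriSketch {

    // Parses the file and resolves its first link against the given base URI.
    static String firstLinkAbsoluteUrl(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.selectFirst("a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for a real HTML file on disk.
        File tmp = File.createTempFile("jsoup-demo", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                "<a href=\"/news/1.html\">News</a>".getBytes(StandardCharsets.UTF_8));

        System.out.println(firstLinkAbsoluteUrl(tmp, "http://example.com/"));
        // http://example.com/news/1.html
    }
}
```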

 

posted @ 2022-03-04 14:26  谷粒-笔记