A Simple Java Crawler
Download httpcomponents-client-4.4 from the Apache website and import its jars into your project.
Main classes: HttpClient, HttpGet, HttpResponse, HttpEntity
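import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.ParseException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public static void getContentFromUrl(String url) {
    // DefaultHttpClient is deprecated since 4.3; HttpClients.createDefault() is its replacement.
    try (CloseableHttpClient client = HttpClients.createDefault()) {
        HttpGet getHttp = new HttpGet(url);
        while (true) {
            // try-with-resources closes the response and releases the connection.
            try (CloseableHttpResponse response = client.execute(getHttp)) {
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    // UTF-8 is used only when the response does not declare a charset.
                    String str = EntityUtils.toString(entity, "UTF-8");
                    // Match str against a regular expression here to extract the information you need.
                }
            } catch (IOException | ParseException e) {
                e.printStackTrace();
            }
            // Wait 2 seconds before fetching again, even after a failed request.
            Thread.sleep(2000);
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag and stop polling.
        Thread.currentThread().interrupt();
    } catch (IOException e) {
        e.printStackTrace();
    }
}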
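The comment in the method above leaves the regular-expression step open. As a minimal sketch of what that step might look like, the following pulls the href values out of anchor tags using only the JDK. The class name LinkExtractor, the method extractLinks, and the pattern itself are illustrative assumptions, not part of the original post, and a regex like this only copes with simple, well-formed markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
public class LinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'> and captures the URL.
    // An assumed pattern for simple pages; it is not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every captured href value, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }
}
```

Inside getContentFromUrl, this could be called as LinkExtractor.extractLinks(str) at the point where the comment sits. For anything beyond simple pages, a real HTML parser is likely a safer choice than a regular expression.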
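Author: 包子头斯基
Source: http://www.cnblogs.com/colleen/
Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be displayed prominently on the page; otherwise the author reserves the right to pursue legal action.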
