A Simple Java Crawler

Download httpcomponents-client-4.4 from the Apache website and import its jars into your project.

Main classes: HttpClient, HttpGet, HttpResponse, HttpEntity

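import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.ParseException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// Fetches the page at url every 2 seconds, forever (place this method inside a class).
public static void getContentFromUrl(String url) {
    HttpClient client = HttpClients.createDefault(); // DefaultHttpClient is deprecated since 4.3
    HttpGet getHttp = new HttpGet(url);
    while (true) {
        try {
            HttpResponse response = client.execute(getHttp);
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Falls back to UTF-8 when the response declares no charset.
                String str = EntityUtils.toString(entity, "UTF-8");
                // Match str against a regular expression here to extract the information you need.
            }
            Thread.sleep(2000); // wait 2 seconds between requests
        } catch (IOException | InterruptedException | ParseException e) {
            e.printStackTrace();
        }
    }
}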
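
The comment in the method above leaves the regular-expression step open. As a minimal sketch of what that step might look like, the class below pulls the values of double-quoted href attributes out of a page string such as str. The class name LinkExtractor, the method extractLinks, and the pattern itself are hypothetical and chosen only for illustration; they are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class LinkExtractor {
    // Assumed pattern: captures the value of a double-quoted href attribute.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the given HTML text, in order of appearance.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // Stand-in for the str variable fetched by the crawler.
        String str = "<a href=\"http://example.com/a\">A</a> <a HREF=\"/b\">B</a>";
        System.out.println(extractLinks(str));
    }
}
```

A regular expression like this is fine for a quick crawler, but it will miss single-quoted or unquoted attributes, so treat it as a starting point rather than a complete link parser.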

 

Posted 2015-06-27 20:31 by 包子头斯基