

httpclient: using a proxy

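When crawling web pages, some target sites have anti-crawler mechanisms: they block the IP address of clients that access the site too frequently or in a regular pattern.
This is where proxy IPs come in handy.
Types of proxies
Transparent proxy
Anonymous proxy
Distorting proxy
High-anonymity (elite) proxy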

***Transparent Proxy***

REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Your IP
A transparent proxy replaces your IP address in REMOTE_ADDR, but the server can still find out who you are from HTTP_X_FORWARDED_FOR.

***Anonymous Proxy***

REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Proxy IP
An anonymous proxy is a step up from a transparent one: others can tell that you are using a proxy, but not who you are.

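***Distorting Proxy***

REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Random IP address
As with an anonymous proxy, others can still tell that you are using a proxy, but they see a fake client IP address, which makes the disguise more convincing.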

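***High-Anonymity Proxy (Elite Proxy)***

REMOTE_ADDR = Proxy IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined
As shown, a high-anonymity proxy keeps others from detecting that you are using a proxy at all, which makes it the best choice for a crawler.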
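
The four categories above can be told apart from the values a server sees. The following is a minimal sketch of that decision, assuming the tester knows both the proxy's address and the real client address; the class `ProxyAnonymity`, its `Level` enum, and the method `classify` are hypothetical names, not part of any library, and real proxies may send other headers that this sketch ignores.

```java
import java.util.Objects;

/** Hypothetical helper: guesses a proxy's anonymity level from the headers a server receives. */
class ProxyAnonymity {
    enum Level { TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    /**
     * @param via          value of HTTP_VIA, or null if the header is absent
     * @param forwardedFor value of HTTP_X_FORWARDED_FOR, or null if the header is absent
     * @param proxyIp      the proxy's address (what REMOTE_ADDR shows)
     * @param clientIp     the real client address, known to whoever runs the check
     */
    static Level classify(String via, String forwardedFor, String proxyIp, String clientIp) {
        // No proxy-related header at all: the request looks like a direct connection.
        if (via == null && forwardedFor == null) {
            return Level.ELITE;
        }
        // The real client address leaks through X-Forwarded-For.
        if (Objects.equals(forwardedFor, clientIp)) {
            return Level.TRANSPARENT;
        }
        // The proxy is visible, but only its own address (or nothing) is forwarded.
        if (forwardedFor == null || Objects.equals(forwardedFor, proxyIp)) {
            return Level.ANONYMOUS;
        }
        // The proxy is visible and forwards some unrelated address.
        return Level.DISTORTING;
    }

    public static void main(String[] args) {
        // Addresses from the documentation ranges, used purely as placeholders.
        String proxy = "203.0.113.10";
        String client = "198.51.100.7";
        System.out.println(classify(proxy, client, proxy, client));
        System.out.println(classify(null, null, proxy, client));
    }
}
```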

Obtaining and using a proxy IP

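// Imports for Apache HttpClient 4.x and JUnit 4
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

@Test
public void testHttpProxy() throws Exception {
    HttpGet httpGet = new HttpGet("http://www.baidu.com");
    // Route this request through a proxy server
    HttpHost proxy = new HttpHost("220.194.55.160", 3128);
    RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
    httpGet.setConfig(config);
    // try-with-resources closes both the client and the response
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        // Print the page content
        System.out.println("Page content:");
        System.out.println(EntityUtils.toString(entity, "utf-8"));
    }
}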
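
The test above hard-codes a single proxy. A crawler that wants to avoid the IP blocking described earlier would more likely keep several proxies and take turns among them. Below is a minimal sketch of that idea using only the JDK, assuming proxies are supplied as `host:port` strings; the class `ProxyPool` and its methods `parse` and `next` are hypothetical names introduced for illustration.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin pool of proxy addresses given as "host:port" strings. */
class ProxyPool {
    private final List<InetSocketAddress> proxies = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyPool(List<String> entries) {
        for (String entry : entries) {
            proxies.add(parse(entry));
        }
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("no proxies given");
        }
    }

    /** Parses "host:port" without performing a DNS lookup. */
    static InetSocketAddress parse(String entry) {
        String trimmed = entry.trim();
        int colon = trimmed.lastIndexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + entry);
        }
        int port = Integer.parseInt(trimmed.substring(colon + 1));
        return InetSocketAddress.createUnresolved(trimmed.substring(0, colon), port);
    }

    /** Returns the next proxy, wrapping around at the end of the list. */
    InetSocketAddress next() {
        int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    public static void main(String[] args) {
        // The second entry is a placeholder, not a real proxy.
        ProxyPool pool = new ProxyPool(List.of("220.194.55.160:3128", "someproxy:8080"));
        for (int i = 0; i < 3; i++) {
            InetSocketAddress p = pool.next();
            System.out.println(p.getHostString() + ":" + p.getPort());
        }
    }
}
```

An address returned by `next()` could then be turned into the `HttpHost` used in the test, for example `new HttpHost(p.getHostString(), p.getPort())`.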

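httpclient proxy configuration

HttpClient supports complex routing schemes and proxy chaining, as well as direct connections and connections through a single hop.
The simplest way to use a proxy server is to set a default proxy: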

HttpHost proxy = new HttpHost("someproxy", 8080);  
DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);  
CloseableHttpClient httpclient = HttpClients.custom()  
        .setRoutePlanner(routePlanner)  
        .build();

Making HttpClient use the JRE's proxy settings

SystemDefaultRoutePlanner routePlanner = new SystemDefaultRoutePlanner(  
        ProxySelector.getDefault());  
CloseableHttpClient httpclient = HttpClients.custom()  
        .setRoutePlanner(routePlanner)  
        .build();  
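
To see what a `ProxySelector` hands to a route planner like the one above, it can help to query one directly. The sketch below uses only JDK classes (`ProxySelector.of` requires Java 9 or later); the class `ProxySelectorDemo` and the method `proxiesFor` are hypothetical names, and `someproxy` is a placeholder host.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

/** Hypothetical demo: asks a ProxySelector which proxies it proposes for a URL. */
class ProxySelectorDemo {
    static List<Proxy> proxiesFor(ProxySelector selector, String url) {
        return selector.select(URI.create(url));
    }

    public static void main(String[] args) {
        // A selector that always answers with one fixed HTTP proxy for http/https URLs.
        // createUnresolved avoids a DNS lookup for the placeholder host name.
        ProxySelector fixed =
                ProxySelector.of(InetSocketAddress.createUnresolved("someproxy", 8080));
        System.out.println(proxiesFor(fixed, "http://www.baidu.com"));

        // The JVM-wide default consults system properties such as
        // http.proxyHost and http.proxyPort.
        ProxySelector jvmDefault = ProxySelector.getDefault();
        if (jvmDefault != null) {
            System.out.println(proxiesFor(jvmDefault, "http://www.baidu.com"));
        }
    }
}
```

Starting the JVM with `-Dhttp.proxyHost=someproxy -Dhttp.proxyPort=8080` is one way to make the default selector return a proxy for `http` URLs.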

Configure a RoutePlanner by hand to take full control of how HTTP routes are computed

HttpRoutePlanner routePlanner = new HttpRoutePlanner() {
    @Override
    public HttpRoute determineRoute(
            HttpHost target,
            HttpRequest request,
            HttpContext context) throws HttpException {
        // Always go through "someproxy"; mark the route as secure for https targets
        return new HttpRoute(target, null, new HttpHost("someproxy", 8080),
                "https".equalsIgnoreCase(target.getSchemeName()));
    }
};
CloseableHttpClient httpclient = HttpClients.custom()
        .setRoutePlanner(routePlanner)
        .build();
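
A hand-written route planner is mostly useful when the decision depends on the target. The sketch below models one such decision with JDK types only: hosts in an exclusion set are reached directly, everything else goes through the proxy, and the secure flag follows the scheme as in the planner above. The classes `RoutingSketch` and `Route` are hypothetical, simplified stand-ins and not HttpClient's `HttpRoute`; the host names are placeholders.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Set;

/** Hypothetical, library-free model of a per-target routing decision. */
class RoutingSketch {
    /** Simplified stand-in for a route: target host, optional proxy, secure flag. */
    static final class Route {
        final String targetHost;
        final InetSocketAddress proxy; // null means a direct connection
        final boolean secure;

        Route(String targetHost, InetSocketAddress proxy, boolean secure) {
            this.targetHost = targetHost;
            this.proxy = proxy;
            this.secure = secure;
        }
    }

    static Route determineRoute(URI target, InetSocketAddress proxy, Set<String> directHosts) {
        // Same rule as in the planner: https targets are marked secure.
        boolean secure = "https".equalsIgnoreCase(target.getScheme());
        // Hosts listed in directHosts bypass the proxy.
        boolean direct = directHosts.contains(target.getHost());
        return new Route(target.getHost(), direct ? null : proxy, secure);
    }

    public static void main(String[] args) {
        InetSocketAddress proxy = InetSocketAddress.createUnresolved("someproxy", 8080);
        Set<String> directHosts = Set.of("intranet.example");
        Route viaProxy = determineRoute(URI.create("https://www.baidu.com"), proxy, directHosts);
        Route direct = determineRoute(URI.create("http://intranet.example/"), proxy, directHosts);
        System.out.println(viaProxy.targetHost + " proxied=" + (viaProxy.proxy != null));
        System.out.println(direct.targetHost + " proxied=" + (direct.proxy != null));
    }
}
```

In a real `HttpRoutePlanner`, the direct case would presumably return a route built without a proxy hop instead of a `null` field.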
posted on 2018-04-14 06:37  ssgao