HttpURLConnection got a 503 when accessing www.163.com; solved by configuring a proxy

I once used the following program to connect to NetEase (www.163.com), intending to fetch the HTML text of its home page:

        try {
            String urlPath = "http://www.163.com/";

            URL url = new URL(urlPath);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Make sure the output directory exists before writing
                File dir = new File("D:\\logs\\");
                if (!dir.exists()) {
                    dir.mkdirs();
                }
                File file = new File(dir, "163.txt");
                // try-with-resources closes both streams even if a read or write fails
                try (InputStream inputStream = connection.getInputStream();
                     FileOutputStream fos = new FileOutputStream(file)) {
                    byte[] buf = new byte[1024 * 8];
                    int len;
                    while ((len = inputStream.read(buf)) != -1) {
                        fos.write(buf, 0, len);
                    }
                }
            } else {
                System.out.println("download file failed because responseCode=" + responseCode);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

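However, the code that does the real work was never reached. Execution went into the else branch instead, because the response code was 503.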
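When a request fails like this, the body of the error response sometimes says who rejected it and why. The following is only a sketch of how that body could be read through `getErrorStream()`; the class name `ErrorBodyDump` and the helper `readAll` are hypothetical, and decoding the body as UTF-8 is an assumption that may not match what the server actually sends.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ErrorBodyDump {

    /** Reads a stream to the end and decodes it as UTF-8 (assumed); a null stream yields "". */
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        try (InputStream stream = in;
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8 * 1024];
            int len;
            while ((len = stream.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL("http://www.163.com/").openConnection();
        connection.setRequestMethod("GET");
        int responseCode = connection.getResponseCode();
        if (responseCode != HttpURLConnection.HTTP_OK) {
            // getErrorStream() returns null when the failed response carries no body
            System.out.println("responseCode=" + responseCode);
            System.out.println(readAll(connection.getErrorStream()));
        }
        connection.disconnect();
    }
}
```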

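503 (Service Unavailable) means the server is not ready to handle the request, yet NetEase loaded normally in my browser, so I considered the following possibilities:

1. NetEase uses an anti-crawler mechanism, and browser information (such as a User-Agent) has to be added to the request headers to get past it.

2. The 503 may not have come from NetEase at all; a node along the route could just as well have returned it.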

After testing, I found that adding browser information to the headers made no difference.
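Both guesses can be probed with a few extra lines. The sketch below is not the exact code I ran: the class name `HeaderProbe`, the helper names, and the User-Agent string are made up for illustration. It sends a browser-like User-Agent and prints the response headers, since headers such as `Server` or `Via` may hint at whether the reply came from the site or from something in between.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class HeaderProbe {

    // Hypothetical browser-like User-Agent value, for illustration only
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    /** Prepares (but does not yet send) a GET request that identifies itself like a browser. */
    static HttpURLConnection open(String urlPath) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlPath).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", USER_AGENT);
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        return connection;
    }

    /** Formats the response code and headers, one header per line. */
    static String describe(int responseCode, Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder("responseCode=" + responseCode + "\n");
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // HttpURLConnection stores the status line under a null key, printed as "null"
            sb.append(e.getKey()).append(": ")
              .append(String.join(", ", e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = open("http://www.163.com/");
        try {
            System.out.print(describe(connection.getResponseCode(), connection.getHeaderFields()));
        } finally {
            connection.disconnect();
        }
    }
}
```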

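Then it occurred to me that my browser has a proxy configured. Without a proxy, how could HttpURLConnection reach the Internet at all? So I added the proxy settings at the beginning of the code: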

            // SetProxy
            System.setProperty("http.proxyHost", "pkg.proxy.prod.jp.local");
            System.setProperty("http.proxyPort", "10080");

After that, the test passed without a hitch.
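The system properties above affect every http:// connection in the JVM (https:// URLs read the separate `https.proxyHost` and `https.proxyPort` properties). If only one connection should go through the proxy, `java.net.Proxy` can be passed to `openConnection` instead. This is an alternative I did not use in my own test, sketched here with the same proxy host and port; the class name `ProxyPerConnection` and the helper `httpProxy` are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class ProxyPerConnection {

    /** Builds an HTTP proxy descriptor for the given host and port. */
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** Prepares a connection that will be routed through the given proxy only. */
    static HttpURLConnection open(String urlPath, Proxy proxy) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(urlPath).openConnection(proxy);
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // Proxy host and port are the ones from my network; replace them with your own
        Proxy proxy = httpProxy("pkg.proxy.prod.jp.local", 10080);
        HttpURLConnection connection = open("http://www.163.com/", proxy);
        try {
            System.out.println("responseCode=" + connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }
}
```

Other connections in the same program keep using whatever the JVM defaults are, which can be handy when only some hosts sit behind the proxy.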

The complete code is below for your reference:

package urlconn;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class DownloadFileTest {
    public static void main(String[] args) {
        try {
            // SetProxy: route http:// requests through the proxy
            // (the host and port are specific to my network)
            System.setProperty("http.proxyHost", "pkg.proxy.prod.jp.local");
            System.setProperty("http.proxyPort", "10080");

            String urlPath = "http://www.163.com/";

            URL url = new URL(urlPath);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Make sure the output directory exists before writing
                File dir = new File("D:\\logs\\");
                if (!dir.exists()) {
                    dir.mkdirs();
                }
                File file = new File(dir, "163.txt");
                // try-with-resources closes both streams even if a read or write fails
                try (InputStream inputStream = connection.getInputStream();
                     FileOutputStream fos = new FileOutputStream(file)) {
                    byte[] buf = new byte[1024 * 8];
                    int len;
                    while ((len = inputStream.read(buf)) != -1) {
                        fos.write(buf, 0, len);
                    }
                }
            } else {
                System.out.println("download file failed because responseCode=" + responseCode);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

--2020-03-03--

posted @ 2020-03-03 08:44  逆火狂飙