Python Crawler Bugs (Scraping Flight Information)

Some errors encountered while crawling with Python, and how they were resolved:

must be str, not ReadTimeout

must be str, not ConnectionError

429 Too Many Requests 

Garbled text (gb2312)

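Error message:
AS1084 flight crawl error
must be str, not ProxyError (the exception was not handled)
Solution:
Wrap the request in try/except, print the error to record the failed flight, then pass so the crawl skips it and continues.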
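A minimal sketch of the skip-and-continue pattern described above. The names `crawl_all` and `fetch_flight` are hypothetical and do not come from the original script. The text "must be str, not ProxyError" matches what Python 3.6 reports when an exception object is concatenated to a string with `+`, so the sketch formats the exception with an f-string instead; that this is where the original message came from is an assumption.

```python
def crawl_all(flight_numbers, fetch_flight):
    """Crawl every flight; record failures instead of stopping.

    fetch_flight is a hypothetical callable that downloads and parses
    one flight and raises on any network or proxy error.
    """
    results = {}
    failed = []
    for flight_no in flight_numbers:
        try:
            results[flight_no] = fetch_flight(flight_no)
        except Exception as exc:
            # "text" + exc would raise TypeError; an f-string converts it safely.
            print(f"{flight_no} flight crawl error: {exc!r}")
            failed.append(flight_no)  # keep a record for a later re-run
            continue
    return results, failed
```

Returning the `failed` list makes it possible to retry only those flights once the proxy is healthy again.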

Error message:
CA3767 flight crawl error
local variable 'ok' referenced before assignment (the variable was read before it was assigned)
Solution:
Make the variable global: global ok
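Declaring `ok` as global works, but a narrower alternative is to give `ok` a value before the code path that might skip the assignment. The sketch below assumes `ok` is a success flag set inside a `try` block; the function and parameter names are hypothetical.

```python
def crawl_flight(fetch):
    """Return True when fetch() succeeds, False otherwise.

    fetch is a hypothetical zero-argument callable that crawls one flight.
    """
    ok = False  # assigned up front, so the name exists on every path
    try:
        fetch()
        ok = True
    except Exception as exc:
        print(f"flight crawl error: {exc!r}")
    return ok
```

Because `ok` stays local, two calls cannot overwrite each other's flag, which a shared global would allow.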

Error message:
MF1930 flight crawl finished!
must be str, not ReadTimeout (fetching the page timed out)
               content = requests.get(
                   'http://happiness.variflight.com/info/detail?fnum',
                   proxies=proxies, timeout=30).text
Solution:
On a timeout, catch the exception (except: pass) and request the page again.
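One way to "request the page again" is a small retry helper. This is a sketch under assumptions: `fetch_with_retry`, its parameters, and the retry count are hypothetical, and the built-in `TimeoutError` stands in for the timeout exception so the block runs without third-party packages.

```python
import time


def fetch_with_retry(fetch, retries=3, delay=2.0, retry_on=(TimeoutError,)):
    """Call fetch() until it succeeds or the attempts run out (retries >= 1).

    fetch is a hypothetical zero-argument callable returning the page text.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            print(f"attempt {attempt} timed out: {exc!r}")
            if attempt < retries:
                time.sleep(delay)  # short pause before reconnecting
    raise last_error  # every attempt failed; let the caller record the flight


# With requests this might be called as (not executed here):
# content = fetch_with_retry(
#     lambda: requests.get(url, proxies=proxies, timeout=30).text,
#     retry_on=(requests.exceptions.Timeout,),
# )
```

Capping the attempts keeps one dead proxy from stalling the whole crawl.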

Error message:
NS8185 flight crawl finished!
must be str, not ConnectionError (database connection error)
Solution:
Reconnect to the database, log the flight, and pass to skip this flight record.
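The reconnect-then-skip idea can be sketched as follows. The post does not say which database was used, so `sqlite3` from the standard library is a stand-in, and the `FlightStore` class, the `flights` table, and its columns are all hypothetical.

```python
import sqlite3


def connect():
    """Open a connection and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IF NOT EXISTS flights (flight_no TEXT, status TEXT)")
    return conn


class FlightStore:
    def __init__(self, connect_fn):
        self._connect = connect_fn
        self.conn = connect_fn()
        self.skipped = []  # flights that could not be saved

    def save(self, flight_no, status):
        """Insert one record; reconnect once on failure, then skip it."""
        for _ in range(2):
            try:
                self.conn.execute(
                    "INSERT INTO flights (flight_no, status) VALUES (?, ?)",
                    (flight_no, status),
                )
                self.conn.commit()
                return True
            except sqlite3.Error as exc:
                print(f"{flight_no}: database error: {exc!r}")
                self.conn = self._connect()  # reconnect before the next try
        self.skipped.append(flight_no)  # record it and move on
        return False
```

A real MySQL or PostgreSQL driver raises its own exception classes, so the `except` clause would need to name those instead.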
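
Error message:
429 Too Many Requests (error page)
403
502
Solution:
These responses come from requesting pages too frequently. Check that the response is a normal page before scraping it.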
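A sketch of the "only scrape a normal page" check, with an exponential back-off added as an assumption (the post only says to confirm the page is normal). `fetch_normal_page` and the `get` callable are hypothetical; `get` is expected to return a `(status_code, body)` pair.

```python
import time

BLOCKED_STATUSES = {403, 429, 502}  # the statuses listed above


def fetch_normal_page(get, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Return the body once the server answers 200, else None."""
    for attempt in range(max_attempts):
        status, body = get(url)
        if status == 200:
            return body  # a normal page: safe to parse
        if status not in BLOCKED_STATUSES:
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait longer after each refusal
    return None  # still blocked; the caller can record and skip the flight
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.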

Solution:
unc = stringa.decode("gb2312")  # Python 2: decode the gb2312 bytes first
print unc.encode("utf-8")  # then re-encode as UTF-8
Garbled HTML (this page is encoded as gb2312):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">
<TITLE>′í?ó£o?ú?ù???óμ?í??·£¨URL£??T·¨??è?</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}PRE{font-family:sans-serif}--></STYLE>
</HEAD><BODY>
<H1>′í?ó</H1>
<H2>?ú?ù???óμ?í??·£¨URL£??T·¨??è?</H2>
<HR noshade size="1px">
<P>
μ±3¢ê??áè?ò???í??·£¨URL£?ê±£o
<A HREF="http://happiness.variflight.com/info/detail?fnum=CZ3134&amp;dep=TSN&amp;arr=CAN&amp;date=2017-12-28&amp;type=1">http://happiness.variflight.com/info/detail?fnum=CZ3134&amp;dep=TSN&amp;arr=CAN&amp;date=2017-12-28&amp;type=1</A>
<P>
·¢éúá???áDμ?′í?ó£o
<UL>
<LI>
<STRONG>
Read Error
<BR>
?áè?′í?ó
</STRONG>
</UL>

<P>
?μí3??ó|£o
<PRE><I>    (104) Connection reset by peer</I></PRE>

<P>
An error condition occurred while reading data from the network.  Please
retry your request.
<BR>
?y?úí¨1yí????áè?êy?Yê±·¢éúá?′í?ó£?????D?3¢ê??£
</P>
<P>±??o′?·t???÷1üàí?±£o<A HREF="mailto:support@chinacache.com">support@chinacache.com</A>
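In Python 3 the same fix amounts to decoding the raw response bytes with the charset the page declares in its META tag. The sketch below is an illustration under assumptions: `decode_page` is a hypothetical helper, and the sample bytes are built locally rather than taken from the page above.

```python
def decode_page(raw, declared="gb2312"):
    """Decode response bytes using the charset declared by the page."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # gb18030 is a superset of gb2312; replace anything still undecodable.
        return raw.decode("gb18030", errors="replace")


raw = "读取错误".encode("gb2312")  # bytes as a gb2312 server would send them
garbled = raw.decode("latin-1")    # wrong codec: yields mojibake
readable = decode_page(raw)        # declared codec: the original text

# With requests, setting response.encoding = "gb2312" before reading
# response.text should have the same effect (not executed here).
```

Decoding with the wrong single-byte codec never fails, which is why the symptom is unreadable text rather than an exception.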

 

posted @ 2017-12-29 15:35  Abraham2017