Crawler Errors

1. Scrapy with proxy IPs errors out: Connection was refused by other side: 111: Connection refused.


No solution found yet; the error is intermittent and shows up on roughly 50% of requests.
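Since the failure is intermittent, one common mitigation is to lean on Scrapy's retry machinery (its RetryMiddleware already retries twisted's ConnectionRefusedError) and to fail over dead proxies quickly. A minimal settings.py sketch; the specific values are illustrative assumptions, not from the original post:

```python
# settings.py -- retry flaky proxy connections (values are illustrative)
RETRY_ENABLED = True
RETRY_TIMES = 5                # retry each failing request up to 5 times
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]
DOWNLOAD_TIMEOUT = 15          # give up on a dead proxy quickly
```

Combined with a proxy-rotation downloader middleware, each retry can go out through a different proxy, which usually masks a 50% per-proxy failure rate.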

 

 

2. HTTP content/connection error

For a mobile-version URL, use a mobile User-Agent, and double-check the URL format.
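As a stdlib sketch of pairing a mobile URL with a mobile User-Agent (both the URL and the UA string below are placeholders, not from the original post):

```python
from urllib.request import Request

# hypothetical mobile page: an m.* URL should be fetched with a mobile UA
mobile_ua = ('Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) '
             'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 '
             'Mobile/15E148 Safari/604.1')
req = Request('https://m.example.com/', headers={'User-Agent': mobile_ua})
```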

3. Encoding

content = response.read().decode('gbk')

UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence

Cause: a misconfigured parameter in the request headers (Headers = { ... }). Byte 0x8b at position 1 is the second byte of the gzip magic number (1f 8b), i.e. the body is still gzip-compressed: sending 'accept-encoding': 'gzip, deflate, br' without decompressing the reply produces exactly this error.

 

Example:

self.request_headers = {
    # wrong:
    # 'Host': '',
    # 'accept': 'application/json, text/javascript, */*; q=0.01',
    # 'accept-encoding': 'gzip, deflate, br',    # compression formats
    # 'accept-language': 'zh-CN,zh;q=0.8',
    # 'cache-control': 'no-cache',
    # 'content-length': '1098',
    # 'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'pragma': 'no-cache',
    # 'referer': '',
    # 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3178.0 Safari/537.36',
    # 'x-requested-with': 'XMLHttpRequest'

    # correct:
    'Host': 'login',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
    'Referer': 'https://login',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Connection': 'Keep-Alive'
}
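If you do keep an Accept-Encoding header, you can detect and unwrap a gzip body before decoding. A minimal stdlib sketch (the helper name and the 'gbk' default are assumptions):

```python
import gzip
import io


def decode_body(raw: bytes, charset: str = 'gbk') -> str:
    """Decompress a gzip HTTP body if present, then decode it."""
    # gzip streams always start with the magic bytes 1f 8b
    if raw[:2] == b'\x1f\x8b':
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw.decode(charset)
```

Pass it the raw bytes from response.read(); plain (uncompressed) bodies fall through to a normal decode.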

4. Pandas DataFrame error

result = result.T.sort(['confidence', 'support'], ascending=False)

raises the following error:

AttributeError: 'DataFrame' object has no attribute 'sort'

Solution:

DataFrame.sort() was removed from pandas; use sort_values() instead.
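A minimal sketch of the fix; the column names come from the snippet above, but the data is made up:

```python
import pandas as pd

df = pd.DataFrame({'confidence': [0.5, 0.9], 'support': [0.7, 0.3]})
# df.sort(...) was removed in pandas 0.20; sort_values() is the replacement
result = df.sort_values(['confidence', 'support'], ascending=False)
```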

 

5. etree.HTML()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 374: invalid start byte

Encoding problem: decode the response bytes with the page's actual charset before parsing.

response = requests.get(url=url10, headers=header, proxies=proxy_ip_01(), cookies=cookie01)
con = response.content.decode('gbk')
html = etree.HTML(con)

# print(con)

aa = html.xpath("//div[@class='popup-inner']/ul/li[3]/div//ul/li[1]//a/text()")
print(aa)
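The root cause here is the same as in section 3's error message: bytes in one charset decoded with another. A quick stdlib-only illustration (the sample string is made up):

```python
raw = '编码测试'.encode('gbk')    # GBK bytes, as served by many Chinese sites
try:
    text = raw.decode('utf-8')   # wrong codec -> UnicodeDecodeError
except UnicodeDecodeError:
    text = raw.decode('gbk')     # decode with the page's actual charset
```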

6. Retrieved content is garbled

To get the real page content instead of garbage, there are two options:

1. Don't set the Accept-Encoding header:

// req.Headers.Add("Accept-Encoding", "gzip,deflate");

2. Set the Accept-Encoding header and enable the matching automatic decompression mode:

req.Headers["Accept-Encoding"] = "gzip,deflate";
req.AutomaticDecompression = DecompressionMethods.GZip;
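The snippet above is C# (HttpWebRequest); the same two options apply in Python. The requests library decompresses gzip/deflate bodies automatically, while lower-level clients need the manual equivalent of option 2. A sketch with a hypothetical helper name, keyed off the Content-Encoding response header:

```python
import gzip
import zlib


def auto_decompress(body: bytes, content_encoding: str) -> bytes:
    """Mirror of option 2: send Accept-Encoding, decompress the reply manually."""
    if content_encoding == 'gzip':
        return gzip.decompress(body)
    if content_encoding == 'deflate':
        return zlib.decompress(body)
    return body  # identity / no compression
```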

 

7. MySQL error

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near

"update tmall_bijiben_goods set title='%s' where goods_id='%s'" % (title, item_id)

Each %s placeholder must be wrapped in quotes when the statement is built by string formatting.
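With string formatting, the quotes around each %s are what keep the statement valid. A safer alternative is to let the driver fill the placeholders, which also escapes any quotes inside the data. A sketch, with made-up values and a pymysql-style cursor assumed in the commented part:

```python
title, item_id = "notebook 15-inch", "123456"   # illustrative values

# string formatting: the quotes around %s are mandatory
sql = "update tmall_bijiben_goods set title='%s' where goods_id='%s'" % (title, item_id)

# preferred: parameterized query -- the driver quotes and escapes the values
# cursor.execute(
#     "update tmall_bijiben_goods set title=%s where goods_id=%s",
#     (title, item_id),
# )
```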

8. requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Cause: expired cookie.

 

9. selenium + Chrome simulated login fails

Try switching to selenium + Firefox.

10. selenium + Firefox simulated login with automated input (via pynput)

from pynput.mouse import Button, Controller
from pynput.keyboard import Controller as KeyController
from pynput.keyboard import Key

 

self.driver.get('https://login')
time.sleep(1)
m = Controller()
# move the mouse to the username input box
m.position = (1208, 266)
m.click(Button.left)
time.sleep(0.2)
m.release(Button.left)
time.sleep(1)
k = KeyController()
# clear the input box
for i in range(18):
    k.press(Key.backspace)
    k.release(Key.backspace)
time.sleep(0.5)
user = random.choice(self.user_list)
# type the username, one character at a time
for i in user['user']:
    k.type(i)
    time.sleep(random.uniform(0.5, 1.5))
k.press(Key.tab)
k.release(Key.tab)
time.sleep(1)
# type the password
for i in user['password']:
    k.type(i)
    time.sleep(random.uniform(0.5, 1.5))
# move the mouse to the login button and click
m.position = (1040, 460)
time.sleep(0.5)
m.click(Button.left)
time.sleep(2)

 

11. 403 page

Add a Referer to the request headers.
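A minimal stdlib sketch; the URLs are placeholders, and the Referer should normally be a page on the target site that links to the blocked resource:

```python
from urllib.request import Request

req = Request(
    'https://example.com/protected',               # hypothetical URL returning 403
    headers={'Referer': 'https://example.com/'},   # hypothetical referring page
)
```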

 

posted @ 2018-04-28 09:44 殇夜00