Crawler errors
1. Scrapy with a proxy IP fails with: Connection was refused by other side: 111: Connection refused.
No fix found yet. It shows up on roughly 50% of requests.
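Since the failure is intermittent, one workaround that may be worth trying is to retry the request through a different proxy. The following is only a minimal sketch in plain Python, outside Scrapy: `fetch_with_proxy_retry` and the `fetch` callable are hypothetical names, and it assumes the refusal surfaces as an `OSError` such as the built-in `ConnectionRefusedError`.

```python
import random


def fetch_with_proxy_retry(fetch, proxies, max_attempts=3):
    """Call fetch(proxy) with a randomly chosen proxy, retrying on refusal.

    `fetch` is a hypothetical callable supplied by the caller; it is assumed
    to raise ConnectionRefusedError (or another OSError) when the proxy fails.
    """
    last_error = None
    candidates = list(proxies)
    for _ in range(max_attempts):
        if not candidates:
            break
        proxy = random.choice(candidates)
        try:
            return fetch(proxy)
        except OSError as exc:  # ConnectionRefusedError is a subclass of OSError
            last_error = exc
            candidates.remove(proxy)  # do not pick the refused proxy again
    raise RuntimeError("all proxy attempts failed") from last_error
```

This does not explain why the proxy refuses the connection; it only keeps one dead proxy from failing the whole request.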
2. HttpConnetentErro
Use a mobile User-Agent when requesting a mobile URL.
Check the URL format.
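A small sketch of matching the User-Agent to the URL. The `headers_for` helper, the two example User-Agent strings, and the rule that a host starting with `m.` or `wap.` is a mobile site are all assumptions made for illustration; the real site may mark its mobile pages differently.

```python
from urllib.parse import urlsplit

# Example User-Agent strings (assumed; any current desktop/mobile pair would do).
DESKTOP_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/62.0.3178.0 Safari/537.36")
MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
             "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
             "Mobile/15A372 Safari/604.1")


def headers_for(url):
    """Return headers whose User-Agent matches the kind of URL requested."""
    host = urlsplit(url).hostname or ""
    # Assumption: mobile pages live on an "m." or "wap." subdomain.
    is_mobile = host.startswith(("m.", "wap."))
    return {"User-Agent": MOBILE_UA if is_mobile else DESKTOP_UA}
```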
3. Encoding
content = response.read().decode('gbk')
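UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence
Cause: a wrong setting in the request headers (Headers = { }). Byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the body was most likely still gzip-compressed when it was decoded, which points at the accept-encoding entry.
Example: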
self.request_headers = {
    # Wrong (commented out):
    # 'Host': ',
    # 'accept': 'application/json, text/javascript, */*; q=0.01',
    # 'accept-encoding': 'gzip, deflate, br',  # content encoding; the server then returns a compressed body
    # 'accept-language': 'zh-CN,zh;q=0.8',
    # 'cache-control': 'no-cache',
    # 'content-length': '1098',
    # 'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'pragma': 'no-cache',
    # 'referer': ',
    # 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3178.0 Safari/537.36',
    # 'x-requested-with': 'XMLHttpRequest'
    # Working:
    'Host': 'login',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
    'Referer': 'https://login',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Connection': 'Keep-Alive'
}
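If the accept-encoding header has to stay, the body can be decompressed before decoding. A minimal sketch, assuming the server answered with gzip (not deflate or Brotli) and that the page itself is GBK; `decode_body` is a hypothetical helper name.

```python
import gzip


def decode_body(raw, encoding="gbk"):
    """Decode an HTTP body, gunzipping it first if it is still compressed."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode(encoding)
```

With the snippet above this would be used as `content = decode_body(response.read())`. Brotli (`br`) is not handled here, so it is probably safer to leave `br` out of accept-encoding.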
4. Pandas DataFrame error
result = result.T.sort(['confidence','support'], ascending = False)
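raises the following error:
AttributeError: 'DataFrame' object has no attribute 'sort'
Fix:
Use sort_values() instead; DataFrame.sort() no longer exists in current pandas versions:
result = result.T.sort_values(['confidence','support'], ascending=False)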
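A short illustration of `sort_values()` on two columns. The numbers are made up for the example, and the frame is built with `confidence` and `support` already as columns, so the `.T` transpose from the original line is not needed here.

```python
import pandas as pd

# Made-up values, purely for illustration.
result = pd.DataFrame(
    {"confidence": [0.8, 0.9, 0.8], "support": [0.3, 0.1, 0.5]},
    index=["a", "b", "c"],
)

# Sort by confidence first, then by support, both descending.
ranked = result.sort_values(["confidence", "support"], ascending=False)
print(list(ranked.index))
```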
5. etree.HTML()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 374: invalid start byte
An encoding problem: the page is GBK, so decode the raw bytes as GBK before parsing.
response = requests.get(url=url10, headers=header, proxies=proxy_ip_01(), cookies=cookie01)
con = response.content.decode('gbk')
html = etree.HTML(con)
# print(con)
aa = html.xpath("//div[@class='popup-inner']/ul/li[3]/div//ul/li[1]//a/text()")
print(aa)
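When the page encoding is not known in advance, one option is to try a few candidates in order. A minimal sketch: `decode_with_fallback` is a hypothetical helper, and the candidate list (UTF-8, then GB18030, which is a superset of GBK) is an assumption that suits Chinese sites.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gb18030")):
    """Return raw bytes decoded with the first encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing fit cleanly: decode with the last candidate and mark bad bytes.
    return raw.decode(encodings[-1], errors="replace")
```

In the snippet above this would replace the hard-coded call: `con = decode_with_fallback(response.content)`.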
6. Fetched content is garbled
To get the real page content instead of garbled text, there are two options:
1. Do not set the Accept-Encoding header
//req.Headers.Add("Accept-Encoding","gzip,deflate");
2. Set the Accept-Encoding header and also enable the matching automatic decompression mode
req.Headers["Accept-Encoding"] = "gzip,deflate";
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
7. MySQL error
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near
"update tmall_bijiben_goods set title='%s' where goods_id='%s'" % (title, item_id)
Fix: each %s placeholder must be wrapped in quotes, as in the statement above.
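Quoting by hand still breaks as soon as a title contains a quote character, so passing the values as query parameters is usually more robust. The sketch below uses the standard-library `sqlite3` module as a stand-in so that it runs without a server; the table schema is an assumption. MySQL drivers such as pymysql take the same form but use `%s` (unquoted) as the placeholder instead of `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, for illustration only.
conn.execute("CREATE TABLE tmall_bijiben_goods (goods_id TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO tmall_bijiben_goods VALUES (?, ?)", ("1001", "old"))

# A title containing both quote characters would break hand-built SQL.
title, item_id = "15\" laptop, 'pro' model", "1001"
conn.execute(
    "UPDATE tmall_bijiben_goods SET title = ? WHERE goods_id = ?",
    (title, item_id),
)

stored = conn.execute(
    "SELECT title FROM tmall_bijiben_goods WHERE goods_id = ?", (item_id,)
).fetchone()[0]
```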
8. requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
The cookie had expired.
9. Selenium + Chrome simulated login fails
Try switching to Selenium + Firefox.
10. Selenium + Firefox simulated login with automatic input
import random
import time

from pynput.keyboard import Controller as KeyController
from pynput.keyboard import Key
from pynput.mouse import Button, Controller

# The lines below run inside a method of the crawler class (hence self).
self.driver.get('https://login')
time.sleep(1)
m = Controller()
# Move the mouse to the input box
m.position = (1208, 266)
m.click(Button.left)
time.sleep(0.2)
m.release(Button.left)
time.sleep(1)
k = KeyController()
# Clear the input box
for i in range(18):
    k.press(Key.backspace)
    k.release(Key.backspace)
time.sleep(0.5)
user = random.choice(self.user_list)
# Type the user name one character at a time, with random pauses
for i in user['user']:
    k.type(i)
    time.sleep(random.uniform(0.5, 1.5))
k.press(Key.tab)
k.release(Key.tab)
time.sleep(1)
# Type the password
for i in user['password']:
    k.type(i)
    time.sleep(random.uniform(0.5, 1.5))
# Move the mouse to the login button
m.position = (1040, 460)
time.sleep(0.5)
m.click(Button.left)
time.sleep(2)
11. 403 page
Add a Referer to the request headers.
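A small sketch of adding the header. It assumes the site accepts its own origin as the Referer, which is not guaranteed; `with_referer` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def with_referer(headers, url):
    """Return a copy of headers with a Referer pointing at the URL's origin."""
    parts = urlsplit(url)
    merged = dict(headers)
    # Keep an existing Referer if the caller already set one.
    merged.setdefault("Referer", f"{parts.scheme}://{parts.netloc}/")
    return merged
```

The result can be passed straight to the request, for example `requests.get(url, headers=with_referer(header, url))`.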
