Summary:
```
# Log in to GitHub using hand-crafted cookies
import requests
from lxml import etree

str = '_octo=GH1.1.518803230.1537264616; logged_in=no; _ga=GA1.2.102113046.1537264618; _gh_sess=RTIralVlQ1pHaG0vVG44b3NsV0s4Z2VZTTVi...
```
Read more
posted @ 2019-05-04 22:07
hank-li
Views (235)
Comments (0)
Recommended (0)
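The excerpt above starts from a raw cookie string copied out of the browser. One common way to use such a string is to split it into a dictionary first. The following is a minimal sketch of that step, not necessarily what the full post does; the helper name `cookie_string_to_dict` and the sample cookie values are hypothetical.

```python
def cookie_string_to_dict(raw):
    """Split a 'name=value; name2=value2' cookie string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, because values may contain '=' themselves.
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Hypothetical sample string, not a real login cookie.
example = "logged_in=no; theme=dark; token=abc=="
print(cookie_string_to_dict(example))
```

A dictionary like this can be passed to `requests.get` through its `cookies=` argument.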
Summary:
```
# Log in to Mafengwo using hand-crafted cookies
import requests
from lxml import etree

str = 'mfw_uuid=5bcfcc20-b235-fbbe-c1d6-ae01e1f68d82; _r=baidu; _rp=a%3A2%3A%7Bs%3A1%3A%22p%22%3Bs%3A19%3A%22www.baidu.com%2Fbaidu%22%3B...
```
Read more
posted @ 2019-05-04 21:32
hank-li
Views (137)
Comments (0)
Recommended (0)
Summary:
```
# Log in to Mafengwo using cookies
import requests
from lxml import etree

session = requests.Session()
phone_number = '13521093039'
password = 'pro123,./'
data = {'passport': phone_number, 'password': password}
h...
```
Read more
posted @ 2019-05-04 21:19
hank-li
Views (129)
Comments (0)
Recommended (0)
Summary:
```
# Simulated login to GitHub
import requests
from lxml import etree


class Login():
    def __init__(self):
        self.headers = {
            'Referer': 'https://github.com/',
            'User-Agent': 'Mozill...
```
Read more
posted @ 2019-05-04 18:12
hank-li
Views (123)
Comments (0)
Recommended (0)
Summary:
```
# Simulated login to Mafengwo
import requests
from lxml import etree

session = requests.Session()
phone_number = input('Phone number: ')
password = input('Password: ')
data = {'passport': phone_number, 'password': password}
header = {
...
```
Read more
posted @ 2019-05-04 18:11
hank-li
Views (151)
Comments (0)
Recommended (0)
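Summary:
What is simulated login? Some information can only be viewed after logging in. In that case the crawler has to perform a simulated login to get past the login page. Differences between cookies and sessions: cookie data is stored in the client's browser, while session data is stored on the server; cookies are not very secure, because others can analyze locally stored cookies and use them for cookie spoofing, ... Read more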
posted @ 2019-05-04 18:05
hank-li
Views (206)
Comments (0)
Recommended (0)
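The cookie/session distinction in the summary above can be illustrated with a small offline sketch. It only models the idea: the server keeps the session data in its own store, while the client holds nothing but a cookie carrying an opaque ID. `SESSIONS`, `server_login`, and `server_whoami` are hypothetical names made up for this illustration; no real site or framework works exactly like this.

```python
import secrets
from http.cookies import SimpleCookie

# Server side: session data lives here, keyed by an opaque session ID.
SESSIONS = {}


def server_login(username):
    """Hypothetical login handler: store state on the server, return a cookie string."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    # OutputString() yields 'session_id=<value>', the form a client sends back.
    return cookie["session_id"].OutputString()


def server_whoami(cookie_header):
    """Hypothetical protected page: look up the session named by the cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    session = SESSIONS.get(morsel.value) if morsel else None
    return session["user"] if session else None


# Client side: the browser (or crawler) stores only the cookie string.
client_cookie = server_login("alice")
print(server_whoami(client_cookie))        # the session is found via the cookie
print(server_whoami("session_id=forged"))  # an unknown ID maps to no session
```

Because the server trusts whoever presents the cookie, anyone who copies `client_cookie` is treated as the same user, which is the cookie-spoofing risk the summary mentions.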
Summary:
1_info.py 2_pie_chart.py 3_hist.py 4_ratio.py Read more
posted @ 2019-05-04 17:54
hank-li
Views (120)
Comments (0)
Recommended (0)
Summary:
ershoufang.py zufang_spider.py items.py middlewares.py pipelines.py settings.py Read more
posted @ 2019-05-04 17:48
hank-li
Views (147)
Comments (0)
Recommended (0)
Summary:
```
import redis
import telnetlib
import urllib.request
from bs4 import BeautifulSoup

r = redis.Redis(host='127.0.0.1', port=6379)
for d in range(1, 3...
```
Read more
posted @ 2019-05-04 16:57
hank-li
Views (331)
Comments (0)
Recommended (0)
Summary:
taobao.py items.py middlewares.py pipelines.py settings.py Read more
posted @ 2019-05-04 13:30
hank-li
Views (193)
Comments (0)
Recommended (0)
Summary:
```
# Run a Lua script from Python
import requests
from urllib.parse import quote

lua = '''
function main(splash)
    return 'hello'
end
'''

url = 'http://localhost:8050/execute?lua_source=' + quote(lua)
response...
```
Read more
posted @ 2019-05-04 11:13
hank-li
Views (97)
Comments (0)
Recommended (0)
Summary:
```
# Scrape the Douban comments for "Dying to Survive" (《我不是药神》)
import csv
import time
import requests
from lxml import etree

fw = open('douban_comments.csv', 'w')
writer = csv.writer(fw)
writer.writerow(['comment_time','comment_content'])
...
```
Read more
posted @ 2019-05-04 10:57
hank-li
Views (106)
Comments (0)
Recommended (0)
Summary:
```
# Scrape Toutiao and compare the results with and without rendering
import requests
from lxml import etree

# url = 'http://localhost:8050/render.html?url=https://www.toutiao.com&timeout=30&wait=0.5'
url = 'https://www.toutiao.com'
response...
```
Read more
posted @ 2019-05-04 10:36
hank-li
Views (117)
Comments (0)
Recommended (0)
Summary:
```
import requests
import json
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.sup...
```
Read more
posted @ 2019-05-04 10:32
hank-li
Views (159)
Comments (0)
Recommended (0)
Summary:
```
# Scrape the total view count of a Jianshu blog
# https://www.jianshu.com/u/130f76596b02
import requests
import json
import re
from lxml import etree

header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,...
```
Read more
posted @ 2019-05-04 10:05
hank-li
Views (190)
Comments (0)
Recommended (0)