JD.com Product Review Crawler

I recently needed to scrape the reviews of some JD.com products (mostly books).

  • Environment: PyCharm, Python 3.5.2

Without further ado, here is the code:

# -*- coding: utf-8 -*-

import csv
import json
import re

import requests

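s = requests.Session()
url = 'https://club.jd.com/comment/productPageComments.action'
data = {
    'callback': 'fetchJSON_comment98vv13933',
    # ID of the product whose reviews should be fetched
    'productId': '11936238',

    # Meaning of the score parameter:
    # 0  fetch all reviews (positive ones first)
    # 1  fetch all negative reviews
    # 2  fetch all neutral reviews
    # 3  fetch all follow-up reviews
    # 4  fetch all reviews with pictures
    'score': 1,

    'sortType': 5,
    'page': 0,
    'pageSize': 10,
    'isShadowSku': 0,
    'fold': 1
}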

# Target number of reviews to fetch
target_cnt = 100

# Output file name, built as <productId>_<score>.csv
target_file = str(data['productId']) + '_' + str(data['score']) + '.csv'

# Number of reviews saved so far
cnt = 0

with open(target_file, "w", encoding='utf8', newline='') as csvFile:
    writer = csv.writer(csvFile, quoting=csv.QUOTE_ALL)
    writer.writerow(["stars", "time", "comment"])
    while cnt < target_cnt:
        t = s.get(url, params=data).text
        try:
            # The response is JSONP; keep only the JSON between "callback(" and ");"
            t = re.search(r'(?<=fetchJSON_comment98vv13933\().*(?=\);)', t).group(0)
        except AttributeError:
            # No match: the response is not the expected JSONP, so stop
            break
        j = json.loads(t)
        comments = j['comments']
        if not comments:
            # An empty page means there are no more reviews
            break
        for comment in comments:
            c_content = comment['content']  # review text
            c_time = comment['referenceTime']
            c_name = comment['nickname']
            c_client = comment['userClientShow']
            score = comment['score']
            print(score)
            print('{} {} {}\n{}\n'.format(c_name, c_time, c_client, c_content))
            writer.writerow([score, c_time, c_content])
            cnt += 1
            if cnt >= target_cnt:
                break
        # Move on to the next page of results
        data['page'] += 1
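
The script above depends on the endpoint answering with JSONP, meaning the JSON is wrapped in a call to the function named by the callback parameter, and it strips that wrapper with a regex before parsing. As a small sketch of the same step in isolation, the helper below does the unwrapping for any callback name. The name parse_jsonp is my own and not part of the original script, and the sample response in the usage note is made up for illustration.

```python
import json
import re


def parse_jsonp(text, callback):
    """Return the JSON payload wrapped in callback(...), or None if absent.

    parse_jsonp is a hypothetical helper name, not part of the original script.
    """
    # Escape the callback name so it is matched literally, then capture
    # everything between the opening "(" and the final ")" of the response.
    pattern = re.escape(callback) + r'\((.*)\)\s*;?\s*$'
    match = re.search(pattern, text, re.S)
    if match is None:
        # Not a JSONP response for this callback (e.g. an HTML error page)
        return None
    # json.loads raises ValueError if the captured payload is not valid JSON
    return json.loads(match.group(1))
```

With a made-up response such as 'fetchJSON_comment98vv13933({"comments": []});', calling parse_jsonp(text, data['callback']) returns the inner dictionary, and it returns None when the wrapper is missing, which is the case the original script handles by breaking out of the loop.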

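There is probably not much else to explain. To be clear, I found this crawler elsewhere rather than writing it myself. It is also about the simplest kind of crawler there is, with no handling for anti-scraping measures. I will write about those when I get the chance.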
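
As a rough idea of what the simplest handling could look like, the sketch below retries a request a few times and waits longer between attempts, with a little random jitter. This is a generic pattern under my own assumptions, not something from the original script: the name fetch_with_retry, its parameters, and the delay values are all hypothetical, and I have not checked what JD.com actually requires.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or attempts run out.

    fetch_with_retry and its parameters are hypothetical names for this sketch.
    """
    for attempt in range(max_attempts):
        try:
            result = fetch()
            if result:
                return result
        except OSError:
            # Network-level failures; requests' exceptions derive from OSError
            pass
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so requests are not evenly spaced
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return None
```

In the script above, the request line could then become t = fetch_with_retry(lambda: s.get(url, params=data).text), followed by a check for None. Sending a browser-like User-Agent through s.headers.update(...) is another common first step, though whether either is enough here is untested.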

Source: GitHub

posted on 2018-05-23 15:24  George_Yang