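Scrapy in Practice: Crawling Shuangchuang (Innovation and Entrepreneurship) Information and Storing It in a MySQL Database (Changed to Saving as JSON Text)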

2016-06-30 10:21 LI桥IL

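Key points in the code:

Write the dict so that Chinese text is stored as readable characters rather than \uXXXX escapes: json.dump(dict_item, open(path_file, 'a'), ensure_ascii=False)
The position of the GetNowTime() call
The check if os.path.isdir(path_files)==False: os.makedirs(path_files1), which is a bug-prone spot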
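To illustrate the first point, here is a minimal sketch of what ensure_ascii changes. It is written for Python 3 and uses json.dumps so the result can be inspected as a string; the sample values in dict_item are hypothetical and are not taken from the crawler's real data.

```python
import json

# Hypothetical sample item; the field names mirror the pipeline's dict_item.
dict_item = {"name": "双创", "url": "http://example.com/"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(dict_item)

# With ensure_ascii=False the Chinese characters are written as-is.
readable = json.dumps(dict_item, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict; only the on-disk representation differs, which matters when the JSON file is meant to be read by people.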

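Bug point:

The project file structure is as follows:

Pipeline code:

Compared with the version of the Scrapy crawler that stores into a MySQL database, only the pipeline needs to change.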

# -*- coding: utf-8 -*-
# Note: this pipeline targets Python 2 (reload(sys) / sys.setdefaultencoding).

import sys
import json
import codecs
import os
import time


# Python 2 only: make UTF-8 the default encoding for implicit conversions.
reload(sys)
sys.setdefaultencoding("utf-8")


def GetNowTime():
    # Current local date as "YYYY-MM-DD".
    return time.strftime("%Y-%m-%d", time.localtime(time.time()))


class ShuangchuangGetDaybyPipeline(object):
    def process_item(self, item, spider):
        # Copy the fields as UTF-8 byte strings so json.dump writes readable Chinese.
        dict_item = {}
        dict_item['name'] = item['name'].encode("UTF-8")
        dict_item['url'] = item['url'].encode("UTF-8")
        dict_item['pubTime'] = item['pubTime'].encode("UTF-8")
        dict_item['pickTime'] = item['pickTime'].encode("UTF-8")

        # The date is taken here, per item: <YYYYMM> names the directory, <DD> the file.
        time_now = GetNowTime()
        time_list = time_now.split("-")
        filesname = time_list[0] + time_list[1]
        filename = time_list[2]
        path_files = 'C:\\youedata\\icnpp\\to' + '\\show\\' + filesname
        path_files1 = path_files + '\\'

        # Create the month directory (and any missing parents) on first use.
        if os.path.isdir(path_files)==False:
            os.makedirs(path_files1)
        path_file = path_files1 + filename + '.json'

        # Append the item, then a comma as a separator between items.
        json.dump(dict_item, open(path_file, 'a'), ensure_ascii=False)
        with codecs.open(path_file, 'ab', 'utf-8') as filein:
            filein.write(',')

        # Scrapy expects process_item to return the item (or raise DropItem).
        return item
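For readers on Python 3, the same idea might be sketched as follows. This is not the author's code: the class name DailyJsonPipeline, the get_now_time helper, and the base_dir parameter are hypothetical, and the sketch deliberately swaps the comma separator for one JSON object per line (the JSON Lines convention) so that each line can be parsed on its own. It also assumes the item behaves like a dict with the four fields used above.

```python
import json
import os
import time


def get_now_time():
    """Return the current local date as YYYY-MM-DD."""
    return time.strftime("%Y-%m-%d", time.localtime())


class DailyJsonPipeline:
    """Hypothetical Python 3 variant: one JSON object per line, one file per day."""

    def __init__(self, base_dir="output"):
        # base_dir is an assumed, configurable stand-in for the hard-coded path.
        self.base_dir = base_dir

    def process_item(self, item, spider):
        # Python 3 strings are already unicode, so no .encode() calls are needed.
        fields = ("name", "url", "pubTime", "pickTime")
        dict_item = {key: item[key] for key in fields}

        # <base_dir>/show/<YYYYMM>/<DD>.json, with the date taken per item.
        year, month, day = get_now_time().split("-")
        month_dir = os.path.join(self.base_dir, "show", year + month)
        os.makedirs(month_dir, exist_ok=True)  # no separate isdir() check needed

        path_file = os.path.join(month_dir, day + ".json")
        with open(path_file, "a", encoding="utf-8") as fileout:
            fileout.write(json.dumps(dict_item, ensure_ascii=False) + "\n")

        # Hand the item on to any later pipeline.
        return item
```

Using os.makedirs(..., exist_ok=True) checks and creates the same path in one call, and the with block closes the file before the next item is processed.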
