
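Scrapy in practice: crawling Shuangchuang (mass entrepreneurship and innovation) information and storing it in a MySQL database (changed to saving as JSON text)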

2016-06-30 10:21  LI桥IL

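Key points in the code:

  1. Writing the dict so that Chinese text is stored as readable characters rather than \uXXXX unicode escapes: json.dump(dict_item, open(path_file, 'a'), ensure_ascii=False)
  2. Where GetNowTime() is called (in the code below, inside process_item)
  3. The check if os.path.isdir(path_files)==False: os.makedirs(path_files1), which was a source of bugs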
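To make point 1 concrete, here is a minimal sketch that serializes one record both ways. The sample dict is made up for illustration and is not taken from the crawled data; only `json.dumps` and its `ensure_ascii` parameter come from the standard library.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical sample record, used only to show the difference.
sample = {"name": u"双创"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(sample)

# ensure_ascii=False: the Chinese characters are written as they are.
readable = json.dumps(sample, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict; the difference is only in how readable the file is when opened in a text editor.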

Bug point:

The project file structure is as follows:

Pipeline code:

Compared with the Scrapy crawler that stores into a MySQL database, only the pipeline needs to change.

# -*- coding: utf-8 -*-

import codecs
import json
import os
import time


def GetNowTime():
    # Current local date as "YYYY-MM-DD".
    return time.strftime("%Y-%m-%d", time.localtime(time.time()))


class ShuangchuangGetDaybyPipeline(object):
    def process_item(self, item, spider):
        # Copy the fields to persist into a plain dict. The values are kept
        # as text; the UTF-8 encoding is applied once, when writing the file.
        dict_item = {}
        dict_item['name'] = item['name']
        dict_item['url'] = item['url']
        dict_item['pubTime'] = item['pubTime']
        dict_item['pickTime'] = item['pickTime']
        # Call GetNowTime() per item so the output path follows the current date.
        time_now = GetNowTime()
        time_list = time_now.split("-")
        filesname = time_list[0] + time_list[1]  # YYYYMM: directory name
        filename = time_list[2]                  # DD: file name
        path_files = 'C:\\youedata\\icnpp\\to' + '\\show\\' + filesname
        path_files1 = path_files + '\\'
        # Create the month directory only when it does not exist yet.
        if not os.path.isdir(path_files):
            os.makedirs(path_files1)
        path_file = path_files1 + filename + '.json'
        # Open the file once, in append mode, and close it reliably.
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX.
        with codecs.open(path_file, 'a', 'utf-8') as fileout:
            json.dump(dict_item, fileout, ensure_ascii=False)
            fileout.write(',')
        # Scrapy expects process_item to return the item (or raise DropItem).
        return item
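
The pipeline appends one JSON object followed by a comma for each item, so a daily file is a comma-separated run of objects rather than a single valid JSON document. The following is a minimal sketch of the same layout (a YYYYMM directory holding a DD.json file) and of one way to read such a file back. It assumes Python 3 and uses a temporary directory in place of the original Windows path; `daily_path`, `append_record`, and `read_records` are hypothetical helper names, and the sample records are made up.

```python
import json
import os
import tempfile
import time


def daily_path(base_dir, now=None):
    """Return <base_dir>/<YYYYMM>/<DD>.json, creating the directory if needed."""
    now = time.localtime() if now is None else now
    month_dir = os.path.join(base_dir, time.strftime("%Y%m", now))
    # exist_ok=True (Python 3.2+) makes the call safe if the directory exists.
    os.makedirs(month_dir, exist_ok=True)
    return os.path.join(month_dir, time.strftime("%d", now) + ".json")


def append_record(path, record):
    """Append one JSON object plus a trailing comma, as the pipeline does."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False))
        f.write(",")


def read_records(path):
    """Parse the comma-separated objects by wrapping them in a JSON array."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return json.loads("[" + text.rstrip(",") + "]")


base = tempfile.mkdtemp()  # stand-in for the real output directory
path = daily_path(base)
append_record(path, {"name": "示例一", "url": "http://example.com/1"})
append_record(path, {"name": "示例二", "url": "http://example.com/2"})
records = read_records(path)
print(len(records))
```

Building the path with `os.path.join` avoids hand-written backslashes, and `exist_ok=True` removes the need for a separate `isdir` check, which is the spot the key points flag as bug-prone.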