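Scrapy in Practice: Crawling Shuangchuang (Innovation and Entrepreneurship) Information and Storing It in a MySQL Database (Changed to Saving as JSON Text)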
2016-06-30 10:21 LI桥IL
Key points in the code:
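- To write the dict's contents as readable Chinese characters rather than \uXXXX escapes, use json.dump(dict_item, open(path_file, 'a'), ensure_ascii=False)
- The placement of the GetNowTime() call
- if os.path.isdir(path_files)==False: os.makedirs(path_files1) is a bug-prone spot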
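The effect of ensure_ascii can be seen in a minimal sketch. It is written for Python 3 (the post's own code is Python 2), and the record is a made-up example, not data from the crawl:

```python
import json

# Hypothetical record, for illustration only
record = {"name": "双创"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict; only the on-disk representation differs, which matters when the file is meant to be read by people.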
Bug points:
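The post does not spell out what went wrong at this spot. One way to avoid a separate existence check altogether is sketched here, assuming Python 3.2 or later; the function name dated_json_path and its parameters are hypothetical and do not come from the original project:

```python
import os
import time


def dated_json_path(base_dir, now=None):
    """Return base_dir/YYYYMM/DD.json, creating the month directory if needed."""
    # Read the clock on every call so the path follows the current date.
    now = time.localtime() if now is None else now
    month_dir = os.path.join(base_dir, time.strftime("%Y%m", now))
    # exist_ok=True turns the call into a no-op when the directory already
    # exists, so no os.path.isdir() check is required beforehand.
    os.makedirs(month_dir, exist_ok=True)
    return os.path.join(month_dir, time.strftime("%d", now) + ".json")
```

Building the path with os.path.join also removes the need for hand-written '\\' separators. Calling the helper once per item, rather than once at import time, is one possible reading of the GetNowTime() placement point above: a long-running crawl then rolls over to a new file when the date changes.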
The project file structure is as follows:

Pipeline code:
Compared with the Scrapy crawler that stores items in a MySQL database, only the pipeline needs to change.
    # -*- coding: utf-8 -*-

    import sys
    import json
    import codecs
    import os
    import time

    # Python 2 only: make UTF-8 the default codec for implicit str/unicode conversion
    reload(sys)
    sys.setdefaultencoding("utf-8")


    def GetNowTime():
        # Current local date as "YYYY-MM-DD"
        return time.strftime("%Y-%m-%d", time.localtime(time.time()))


    class ShuangchuangGetDaybyPipeline(object):
        def process_item(self, item, spider):
            # Copy the fields into a plain dict of UTF-8 byte strings
            dict_item = {}
            dict_item['name'] = item['name'].encode("UTF-8")
            dict_item['url'] = item['url'].encode("UTF-8")
            dict_item['pubTime'] = item['pubTime'].encode("UTF-8")
            dict_item['pickTime'] = item['pickTime'].encode("UTF-8")

            # Directory is named YYYYMM, file is named DD.json
            time_now = GetNowTime()
            time_list = time_now.split("-")
            filesname = time_list[0] + time_list[1]
            filename = time_list[2]
            path_files = 'C:\\youedata\\icnpp\\to' + '\\show\\' + filesname
            path_files1 = path_files + '\\'
            if os.path.isdir(path_files) == False:
                os.makedirs(path_files1)
            path_file = path_files1 + filename + '.json'

            # Append the item, then a comma as the separator between items
            json.dump(dict_item, open(path_file, 'a'), ensure_ascii=False)
            with codecs.open(path_file, 'ab', 'utf-8') as filein:
                filein.write(',')

            # Scrapy expects process_item to return the item (or raise DropItem)
            return item
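Because every item is followed by a comma, the resulting file has the form {...},{...}, and is not a valid JSON document on its own. A minimal sketch of reading such a file back, written for Python 3; the helper name load_comma_separated_objects is hypothetical and not part of the original project:

```python
import json


def load_comma_separated_objects(path):
    """Parse a file of the form `{...},{...},` into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    # Drop the trailing comma and wrap the rest in brackets so that the
    # content becomes a regular JSON array.
    return json.loads("[" + text.rstrip(",") + "]")
```

An alternative that avoids this post-processing would be to write one JSON object per line and parse the file line by line, at the cost of changing the output format used in the post.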
Disclaimer: this blog is for personal notes only; please do not repost it in any form.