Python tips
1, input and raw_input
1.1, is the input treated as a string or as a meaningful object?
>>> raw_input_A = raw_input("raw_input: ")
raw_input: PythonTab.com
>>> print raw_input_A
PythonTab.com
>>> input_A = input("Input: ")
Input: PythonTab.com
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'PythonTab' is not defined
>>>
>>> input_A = input("Input: ")
Input: "PythonTab.com"
>>> print input_A
PythonTab.com
>>>
1.2, data type
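Both points in this section, string versus evaluated object and the resulting data type, can be sketched in Python 3, where raw_input no longer exists and input() always returns a string. The helper names parse_like_raw_input and parse_like_input are hypothetical. ast.literal_eval is used here as a safer stand-in for the eval() that Python 2's input() applies to the typed text, so this sketch accepts literals only and rejects a bare name with a different exception than the NameError shown in these notes.

```python
import ast


def parse_like_raw_input(text):
    """Mimic Python 2 raw_input(): the typed text comes back unchanged as a str."""
    return text


def parse_like_input(text):
    """Roughly mimic Python 2 input(), restricted to Python literals for safety."""
    return ast.literal_eval(text)


print(type(parse_like_raw_input("2015")))  # <class 'str'>
print(type(parse_like_input("2015")))      # <class 'int'>
print(parse_like_input('"PythonTab.com"'))  # PythonTab.com

try:
    # Without quotes the text is not a literal, so it is rejected.
    parse_like_input("PythonTab.com")
except (ValueError, SyntaxError) as exc:
    print("rejected:", type(exc).__name__)
```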
>>> raw_input_B = raw_input("raw_input: ")
raw_input: 2015
>>> type(raw_input_B)
<type 'str'>
>>> input_B = input("input: ")
input: 2015
>>> type(input_B)
<type 'int'>
2, spider issue1
------- issue: class DmozSpider(scrapy.Spider):
AttributeError: 'module' object has no attribute 'Spider'
-------solution
sudo apt-get install scrapy          # the packaged version is too old
sudo apt-get install python-pip
sudo pip install scrapy              # fails: header files are missing
sudo apt-get install python-dev
sudo pip install scrapy --upgrade
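Before or after reinstalling, it can help to check which Scrapy the interpreter actually imports. The sketch below is written for Python 3 and uses only the standard library. The helper name has_attribute is hypothetical, and it is an assumption (based on the notes above) that a missing Spider attribute points to an outdated install.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True or False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


status = has_attribute("scrapy", "Spider")
if status is None:
    print("scrapy is not installed for this interpreter")
elif status:
    print("scrapy.Spider is available")
else:
    print("scrapy is installed but has no Spider class; try upgrading")
```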
3, Spider issue2
2015-12-07 15:33:21 [boto] ERROR: Caught exception reading instance data
solution------http://blog.csdn.net/liyuetao680/article/details/48313313
Disabling the s3 download handler in settings.py is enough:
DOWNLOAD_HANDLERS = {'s3': None,}
The key 's3' must be lowercase, and it is followed by a colon (:).
another way is:
from scrapy import optional_features
optional_features.remove('boto')
4, Spider
http://www.myexception.cn/web/1646523.html // contains: scrapy crawl dmoz -o items.json -t json
http://doc.scrapy.org/en/latest/intro/tutorial.html
http://blog.csdn.net/pleasecallmewhy/article/details/19642329
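The scrapy crawl dmoz -o items.json -t json command quoted above writes the scraped items to items.json as a JSON array. A minimal Python 3 sketch for reading such a file back follows. The field names title and link and the sample record are assumptions for illustration only, and the sample file is written to a temporary directory so the block is self-contained.

```python
import json
import os
import tempfile


def load_items(path):
    """Load a JSON feed export (a JSON array of objects) into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical sample standing in for a real items.json export.
sample = [{"title": "Example title", "link": "http://example.com/"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "items.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    items = load_items(path)

for item in items:
    print(item["title"], item["link"])
```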
5, advanced data types (structures)
http://liuzhichao.com/p/1645.html
list, tuple, dictionary, set, file
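As a quick reminder of what each of these types looks like, here is a small Python 3 sketch. All variable names and values are made up for illustration, and the file example uses a temporary file rather than a real path.

```python
import tempfile

numbers = [3, 1, 2]            # list: ordered and mutable
numbers.append(4)

point = (10, 20)               # tuple: ordered and immutable

ages = {"alice": 30}           # dictionary: maps keys to values
ages["bob"] = 25

unique = set([1, 2, 2, 3])     # set: unordered, duplicates are dropped

# file: write some text, rewind, then read it back
with tempfile.TemporaryFile(mode="w+") as handle:
    handle.write("hello\n")
    handle.seek(0)
    content = handle.read()

print(numbers, point, ages, unique, repr(content))
```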
6, dictionary methods
>>> dict = { 1 : 2, 'a' : 'b', 'hello' : 'world' }
>>> dict.values()
['b', 2, 'world']
>>> dict.keys()
['a', 1, 'hello']
>>> dict.items()
[('a', 'b'), (1, 2), ('hello', 'world')]
>>>
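The session above is Python 2, where these methods return lists in an arbitrary order. As a hedged sketch of the Python 3 equivalent (assuming Python 3.7 or later, where dictionaries preserve insertion order), the same methods return view objects, so list() is needed to get a list. The variable is named d here, a name chosen for this example, to avoid shadowing the built-in dict.

```python
d = {1: 2, 'a': 'b', 'hello': 'world'}

# In Python 3 these methods return views; wrap them in list() to copy them.
values = list(d.values())
keys = list(d.keys())
items = list(d.items())

print(values)  # [2, 'b', 'world']
print(keys)    # [1, 'a', 'hello']
print(items)   # [(1, 2), ('a', 'b'), ('hello', 'world')]
```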