Python: pandas Series and DataFrame

1. Series: a Series is a one-dimensional labeled array

By default, pandas uses integers starting from 0 as the index of a Series.

>>> import pandas as pd
>>> test = pd.Series(['num0','num1','num2','num3'])
>>> test
0    num0
1    num1
2    num2
3    num3
dtype: object

You can also specify the index yourself.

>>> test = pd.Series(['num0','num1','num2','num3'],index=['A','B','C','D'])
>>> test
A    num0
B    num1
C    num2
D    num3
dtype: object
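
As a small sketch of what the custom index buys you, the same Series can be read either by label or by position. The variable names here are illustrative only and do not come from the session above.

```python
import pandas as pd

# Same data as above, with a custom label index.
s = pd.Series(['num0', 'num1', 'num2', 'num3'], index=['A', 'B', 'C', 'D'])

by_label = s.loc['B']      # look up by index label
by_position = s.iloc[1]    # look up by integer position
labels = list(s.index)     # the labels that were passed in
values = list(s.values)    # the underlying data

print(by_label, by_position)
print(labels, values)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup unambiguous, which matters most when the index itself contains integers.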

A Series can also be constructed from a dictionary: the keys become the index and the values become the data.

>>> cities = {'beijing':55000,'shanghai':60000,'shenzhen':20000,'guangzhou':25000,'suzhou':None}
>>> test = pd.Series(cities)
>>> test
beijing      55000.0
guangzhou    25000.0
shanghai     60000.0
shenzhen     20000.0
suzhou           NaN
dtype: float64
>>> print(type(test))
<class 'pandas.core.series.Series'>
>>> test['beijing']
55000.0
>>> test[['beijing','shanghai','shenzhen']]
beijing     55000.0
shanghai    60000.0
shenzhen    20000.0
dtype: float64
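
In the session above, the `None` value for 'suzhou' is stored as NaN, which is why the integer values are displayed as floats. A minimal sketch of how the missing entry might be detected and dropped follows; the helper variable names are assumptions made for illustration.

```python
import pandas as pd

cities = {'beijing': 55000, 'shanghai': 60000, 'shenzhen': 20000,
          'guangzhou': 25000, 'suzhou': None}
s = pd.Series(cities)

# Labels whose value is missing (None was converted to NaN).
missing_labels = s[s.isna()].index.tolist()

# Drop the missing entries before aggregating.
known = s.dropna()
total = int(known.sum())

print(missing_labels)
print(len(known), total)
```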

2. DataFrame: a DataFrame is a two-dimensional, table-like array. It can be constructed from a dictionary.

Creating a DataFrame

>>> data = {'city':['beijing','shanghai','guangzhou','shenzhen','hangzhou','chognqing'],'years':[2010,2011,2012,2013,2014,2015],'population':[2100,2300,2400,2500,2600,2600]}
>>> print(data)
{'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen', 'hangzhou', 'chognqing'], 'population': [2100, 2300, 2400, 2500, 2600, 2600], 'years': [2010, 2011, 2012, 2013, 2014, 2015]}
>>> pd.DataFrame(data)
        city  population  years
0    beijing        2100   2010
1   shanghai        2300   2011
2  guangzhou        2400   2012
3   shenzhen        2500   2013
4   hangzhou        2600   2014
5  chognqing        2600   2015
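
A short sketch of how the resulting frame could be inspected: each dictionary key becomes a column and each list supplies that column's values. The two-row data set is a shortened, illustrative version of the one above. Column order may differ from the session shown here, since recent pandas releases on Python 3.7+ keep the dictionary's insertion order, so the sketch sorts the names before comparing them.

```python
import pandas as pd

data = {'city': ['beijing', 'shanghai'],
        'years': [2010, 2011],
        'population': [2100, 2300]}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape            # (rows, columns)
column_names = sorted(df.columns)    # sorted to be independent of version
total_population = int(df['population'].sum())

print(n_rows, n_cols)
print(column_names)
print(total_population)
```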

Reordering the columns and setting the row labels

>>> pd.DataFrame(data,columns= ['years','city','population'])
   years       city  population
0   2010    beijing        2100
1   2011   shanghai        2300
2   2012  guangzhou        2400
3   2013   shenzhen        2500
4   2014   hangzhou        2600
5   2015  chognqing        2600
>>> pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
   years       city  population
A   2010    beijing        2100
B   2011   shanghai        2300
C   2012  guangzhou        2400
D   2013   shenzhen        2500
E   2014   hangzhou        2600
F   2015  chognqing        2600
>>> 
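
One detail worth knowing about the `columns` argument is sketched below: a name that is not a key of the dictionary produces a column filled with NaN rather than an error. The 'area' column is a hypothetical name used only for this illustration.

```python
import pandas as pd

data = {'city': ['beijing', 'shanghai', 'guangzhou'],
        'years': [2010, 2011, 2012],
        'population': [2100, 2300, 2400]}

# 'area' is not a key of data, so that column is created empty (all NaN).
df = pd.DataFrame(data,
                  columns=['years', 'city', 'population', 'area'],
                  index=['A', 'B', 'C'])

# An existing frame can also be reordered by selecting a list of columns.
reordered = df[['city', 'years']]

print(df)
print(reordered)
```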

Every column and every row of a DataFrame is a Series.

>>> mmap = pd.DataFrame(data,columns= ['years','city','population'],index = ['A','B','C','D','E','F'])
>>> print(mmap)
   years       city  population
A   2010    beijing        2100
B   2011   shanghai        2300
C   2012  guangzhou        2400
D   2013   shenzhen        2500
E   2014   hangzhou        2600
F   2015  chognqing        2600
>>> type(mmap)
<class 'pandas.core.frame.DataFrame'>
>>> type(mmap['city'])
<class 'pandas.core.series.Series'>
>>> 
>>> mmap.loc['C']
years              2012
city          guangzhou
population         2400
Name: C, dtype: object
>>> type(mmap.loc['C'])
<class 'pandas.core.series.Series'>
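
The following sketch gathers the common ways to pull a row, a column, or a single cell out of a frame. It assumes a smaller three-row frame built the same way as `mmap`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'years': [2010, 2011, 2012],
                   'city': ['beijing', 'shanghai', 'guangzhou'],
                   'population': [2100, 2300, 2400]},
                  columns=['years', 'city', 'population'],
                  index=['A', 'B', 'C'])

row_by_label = df.loc['C']       # row selected by index label
row_by_position = df.iloc[2]     # the same row selected by position
column = df['city']              # a column selected by name
cell = df.loc['C', 'city']       # a single value: row label, column name

print(type(row_by_label).__name__, type(column).__name__)
print(cell)
```

A row Series is indexed by the column names and carries the row label as its `name`, while a column Series is indexed by the row labels.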

Assigning values in a DataFrame

>>> mmap['population']['A']
2100
>>> mmap['population']['A'] = 2000
__main__:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
 
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>>> mmap['population']['A']
2000
>>> mmap['years'] = 2017
>>> mmap
   years       city  population
A   2017    beijing        2000
B   2017   shanghai        2300
C   2017  guangzhou        2400
D   2017   shenzhen        2500
E   2017   hangzhou        2600
F   2017  chognqing        2600
>>> 
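
The SettingWithCopyWarning in the session above comes from chained indexing: `mmap['population']['A'] = 2000` first selects a column and then assigns into that intermediate result, which may or may not be a view of the original frame. A minimal sketch of the usual alternative, a single `.loc` (or `.at`) call on the frame itself, is shown below on a small illustrative frame.

```python
import pandas as pd

df = pd.DataFrame({'city': ['beijing', 'shanghai'],
                   'population': [2100, 2300]},
                  index=['A', 'B'])

# One indexing call on the frame: row label first, then column name.
df.loc['A', 'population'] = 2000

# .at is the scalar-only counterpart of .loc.
df.at['B', 'population'] = 2200

print(df)
```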

Assigning arrays and Series to columns

>>> import numpy as np
>>> mmap.years = np.arange(6)
>>> mmap
   years       city  population
A      0    beijing        2000
B      1   shanghai        2300
C      2  guangzhou        2400
D      3   shenzhen        2500
E      4   hangzhou        2600
F      5  chognqing        2600
>>> val = pd.Series([200,300,400],index=['A','B','C'])
>>> val
A    200
B    300
C    400
dtype: int64
>>> mmap['year'] = val
>>> mmap
   years       city  population   year
A      0    beijing        2000  200.0
B      1   shanghai        2300  300.0
C      2  guangzhou        2400  400.0
D      3   shenzhen        2500    NaN
E      4   hangzhou        2600    NaN
F      5  chognqing        2600    NaN
>>> mmap['years'] = 2017
>>> mmap
   years       city  population   year
A   2017    beijing        2000  200.0
B   2017   shanghai        2300  300.0
C   2017  guangzhou        2400  400.0
D   2017   shenzhen        2500    NaN
E   2017   hangzhou        2600    NaN
F   2017  chognqing        2600    NaN
>>> mmap.columns
Index([u'years', u'city', u'population', u'year'], dtype='object')
>>> mmap.index
Index([u'A', u'B', u'C', u'D', u'E', u'F'], dtype='object')
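
The session above shows three kinds of column assignment, sketched again here on a small illustrative frame: a scalar is broadcast to every row, an array is assigned by position, and a Series is aligned on its index, so rows without a matching label become NaN. The column names 'rank' and 'const' are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'city': ['beijing', 'shanghai', 'guangzhou', 'shenzhen']},
                  index=['A', 'B', 'C', 'D'])

# Array: assigned by position, so its length must equal the number of rows.
df['rank'] = np.arange(len(df))

# Series: aligned by label, regardless of the order of the Series.
val = pd.Series([200, 300], index=['B', 'A'])
df['year'] = val

# Scalar: broadcast to every row.
df['const'] = 2017

print(df)
```

Because alignment happens on labels, the order of the entries in `val` does not matter; only the labels decide which row each value lands in.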

Posted on 2017-12-04 23:51 by `Elaine
