Python Data Analysis: Series in pandas

# 1 Series

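A Series is a linear data structure: a one-dimensional array whose elements each carry a label (the index).

By default, pandas uses 0 to n-1 as the index of a Series, but you can also specify the index yourself (the index can be thought of as the keys of a dict).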
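As a small sketch of the dict analogy (the variable names `default_s` and `labeled_s` are illustrative and not from the original post), the labels can be read from `.index` and used for lookup much like dict keys:

```python
import pandas as pd

# With no index given, pandas labels the values 0..n-1.
default_s = pd.Series([10, 20, 30])
print(list(default_s.index))   # [0, 1, 2]

# With an explicit index, the labels behave much like dict keys.
labeled_s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(list(labeled_s.index))   # ['a', 'b', 'c']
print(labeled_s["b"])          # 20
print(labeled_s.to_dict())     # {'a': 10, 'b': 20, 'c': 30}
```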

## 1.1 Creating a Series

```python
import pandas as pd

# A Series can hold values of mixed types; the dtype then becomes object.
s = pd.Series([9, 'zheng', 'beijing', 128])

print(s)
```

* Output

```text
0          9
1      zheng
2    beijing
3        128
dtype: object
```

* Accessing part of the data

```python
print(s[1:2])   # slice by position: start included, end excluded

# Output:
# 1    zheng
# dtype: object
```

## 1.2 Specifying the index

```python
import pandas as pd

# The index labels do not have to be of one type; here integers and strings are mixed.
s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])

print(s)
```

* Output

```text
1          9
2      zheng
3    beijing
e        128
f        usa
g        990
dtype: object
```

* Looking up a value by its index label

```python
print(s['f'])    # usa
```
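Looking up a label that does not exist raises a `KeyError`. The following sketch (an addition, not part of the original post) shows the explicit label accessor `.loc` and the dict-like `get()` with a fallback value; the label `'z'` and the default `'n/a'` are made up for illustration:

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])

print(s.loc['f'])            # usa  (explicit label-based lookup)
print(s.get('f'))            # usa
print(s.get('z', 'n/a'))     # n/a  (label 'z' does not exist, so the default is returned)

try:
    s.loc['z']
except KeyError:
    print("label 'z' not found")
```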

## 1.3 Building a Series from a dictionary

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "car": None}

# The dict keys become the index; name= gives the Series a name.
sa = pd.Series(s, name="age")

print(sa)
```

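* Output (recent pandas versions keep the insertion order of the dict; older versions sorted the keys alphabetically). `None` becomes `NaN`, so the dtype is float64.

```text
ton     20.0
mary    18.0
jack    19.0
car      NaN
Name: age, dtype: float64
```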

* Checking the type

```python
print(type(sa))   # <class 'pandas.core.series.Series'>
```
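Because `None` is stored as `NaN`, it is often useful to check for missing values before working with the data. A minimal sketch, assuming the same dict as above (the variable name `ages` is illustrative):

```python
import pandas as pd

ages = pd.Series({"ton": 20, "mary": 18, "jack": 19, "car": None}, name="age")

print(ages.dtype)            # float64, because NaN is a float
print(ages.isna().sum())     # 1 missing value

# Dropping the missing entry allows converting back to integers.
print(ages.dropna().astype(int).to_dict())   # {'ton': 20, 'mary': 18, 'jack': 19}
```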

## 1.4 Building a Series from a NumPy ndarray

* Generating random numbers

```python
import pandas as pd
import numpy as np

# np.random.randn(5) returns 5 samples from the standard normal distribution.
num_abc = pd.Series(np.random.randn(5), index=list('abcde'))
num = pd.Series(np.random.randn(5))

print(num)
print(num_abc)

# Example output (the values differ on every run):
# 0   -0.102860
# 1   -1.138242
# 2    1.408063
# 3   -0.893559
# 4    1.378845
# dtype: float64
# a   -0.658398
# b    1.568236
# c    0.535451
# d    0.103117
# e   -1.556231
# dtype: float64
```
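If reproducible values are needed, one option is a seeded NumPy generator. This is a sketch that is not in the original post; the seed `42` and the names `rng_a`, `rng_b`, `first`, and `second` are arbitrary choices:

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same numbers.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = pd.Series(rng_a.standard_normal(5), index=list('abcde'))
second = pd.Series(rng_b.standard_normal(5), index=list('abcde'))

print(first.equals(second))   # True: same seed, same values
print(first.shape)            # (5,)
```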

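## 1.5 Selecting data

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])

# The index mixes integers and strings, so .iloc is used to make it explicit
# that the numbers below are 0-based positions, not labels.
print(s.iloc[1:3])     # positions 1 to 3, start included, end excluded: zheng beijing
print(s.iloc[[1, 3]])  # positions 1 and 3: zheng 128
print(s.iloc[:-1])     # everything except the last element: 9 zheng beijing 128 usa
```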
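With an index that contains integers, a bare `s[...]` can be ambiguous between labels and positions. The sketch below (an addition to the post) contrasts the two explicit accessors, `.iloc` for positions and `.loc` for labels; the second Series `t` is a made-up example used only to show slice behavior:

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])

# .iloc selects by 0-based position.
print(s.iloc[1])                   # zheng
print(s.iloc[[1, 3]].tolist())     # ['zheng', 128]

# .loc selects by label.
print(s.loc[1])                    # 9
print(s.loc[[1, 'e']].tolist())    # [9, 128]

# Label slices include the end label, unlike position slices.
t = pd.Series([10, 20, 30], index=['a', 'b', 'c'])   # illustrative Series
print(t.loc['a':'b'].tolist())     # [10, 20]
print(t.iloc[0:1].tolist())        # [10]
```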

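## 1.6 Operating on data

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])

# Addition is aligned on index labels, not on positions.
# Labels present in only one operand produce NaN.
sum0 = s.iloc[1:3] + s.iloc[1:3]   # named sum0 to avoid shadowing the built-in sum()
sum1 = s.iloc[1:4] + s.iloc[1:4]
sum2 = s.iloc[1:3] + s.iloc[1:4]   # label 'e' exists only on the right side
sum3 = s.iloc[:3] + s.iloc[1:]     # only labels 2 and 3 exist on both sides

print(sum0)
print(sum1)
print(sum2)
print(sum3)
```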

* Output

```text
2        zhengzheng
3    beijingbeijing
dtype: object
2        zhengzheng
3    beijingbeijing
e               256
dtype: object
2        zhengzheng
3    beijingbeijing
e               NaN
dtype: object
1               NaN
2        zhengzheng
3    beijingbeijing
e               NaN
f               NaN
g               NaN
dtype: object
```
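The `NaN` results above come from label alignment: a label that is missing on one side has no partner to add. As a sketch with two small numeric Series (`a` and `b` are invented for illustration), `Series.add` with `fill_value` is one way to treat a missing partner as 0 instead:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# Plain + aligns on labels; 'x' has no partner in b, so the result is NaN.
print((a + b).to_dict())                  # {'x': nan, 'y': 12.0, 'z': 23.0}

# add() with fill_value treats a missing partner as 0 instead.
print(a.add(b, fill_value=0).to_dict())   # {'x': 1.0, 'y': 12.0, 'z': 23.0}
```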

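## 1.7 Lookup

* Checking whether something exists (here `s` is the Series from section 1.6)

```python
print('f' in s)                 # True: `in` checks the index labels, not the values
print(s.isin(['usa']).any())    # True: 'usa' is one of the values
```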

* Filtering by a condition

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa[sa>19])   # keep only the entries whose value is greater than 19
```

![](https://img-blog.csdn.net/20180311131749213?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQva2luZ292/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
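Conditions can also be combined. The following sketch, which goes slightly beyond the original post, selects a range of ages with `&` and with `Series.between`; the bounds are arbitrary:

```python
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
print(sa[(sa > 18) & (sa < 24)].to_dict())   # {'ton': 20.0, 'jack': 19.0, 'jim': 22.0}

# between() includes both bounds by default.
print(sa[sa.between(19, 22)].to_dict())      # {'ton': 20.0, 'jack': 19.0, 'jim': 22.0}
```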

* Median

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa.median()) # 20.0 (the NaN entry is ignored)
```
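The median above is computed over the five non-missing values only. A short sketch of how the `NaN` entry affects the statistics (the `skipna=False` call is added here for illustration and is not in the original post):

```python
import math
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

print(sa.count())                 # 5: count() ignores the NaN entry
print(sa.median())                # 20.0: NaN is skipped by default
print(sa.median(skipna=False))    # nan: with skipna=False the NaN propagates
```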

* Checking which values are greater than the median

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa>sa.median())   # a boolean Series with the same index as sa
```

![](https://img-blog.csdn.net/20180311132042901?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQva2luZ292/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

* Selecting the values greater than the median

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa[sa > sa.median()])   # the boolean Series is used as a filter
```

![](https://img-blog.csdn.net/20180311132206419?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQva2luZ292/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

* Storing the comparison result in a variable and reusing it as a filter

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

more_than_median = sa>sa.median()   # boolean mask

print(more_than_median)

print('---------------------')

print(sa[more_than_median])         # entries where the mask is True
```

![](https://img-blog.csdn.net/20180311132520743?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQva2luZ292/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
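One detail worth keeping in mind with such masks: a comparison with `NaN` is always `False`, so inverting the mask also selects the missing entry. A small sketch of this behavior (the variable name `mask` is illustrative):

```python
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

mask = sa > sa.median()

print(mask['car'])                             # False: NaN > 20.0 is False
print(sa[~mask].index.tolist())                # ['ton', 'mary', 'jack', 'car']

# Excluding missing values explicitly keeps only real "not greater" entries.
print(sa[~mask & sa.notna()].index.tolist())   # ['ton', 'mary', 'jack']
```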

## 1.8 Assigning values in a Series

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(s)          # print the original dict

print('----------------')

sa['ton'] = 99    # assign a new value by index label

print(sa)
```

![](https://img-blog.csdn.net/20180311132813516?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQva2luZ292/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
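Assignment also works through `.loc`, for several labels at once, and for labels that do not exist yet. A minimal sketch, where the label `'amy'` and all assigned numbers are hypothetical values chosen for illustration:

```python
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

sa.loc['ton'] = 99                    # same effect as sa['ton'] = 99
sa.loc[['mary', 'jack']] = [28, 29]   # assign several labels at once
sa.loc['amy'] = 30                    # 'amy' is a new (hypothetical) label, so it is appended

print(sa['ton'], sa['mary'], sa['jack'])   # 99.0 28.0 29.0
print(len(sa))                             # 7
```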

## 1.9 Assigning one value to all entries that satisfy a condition

```python
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(s) # print the original dict

print('---------------------')   # separator

sa[sa>19] = 88 # set every value greater than 19 to 88

print(sa) # print the modified data

print('---------------------')   # separator

print(sa / 2) # divide every value by 2
```

posted on 2021-07-07 16:19 BabyGo000