Read a CSV file, compute the mean, maximum, minimum, and number of missing values of its data, then use a box plot to identify outliers
1. Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plotting library

data = pd.read_csv(r'C:\Users\Administrator\Desktop\catering_sale.csv')
print(data)  # the columns are 日期 (date) and 销量 (sales)
日期 销量
0 2015-03-01 51.0
1 2015-02-28 2618.2
2 2015-02-27 2608.4
3 2015-02-26 2651.9
4 2015-02-25 3442.1
5 2015-02-24 3393.1
6 2015-02-23 3136.6
7 2015-02-22 3744.1
8 2015-02-21 6607.4
9 2015-02-20 4060.3
10 2015-02-19 3614.7
11 2015-02-18 3295.5
12 2015-02-16 2332.1
13 2015-02-15 2699.3
14 2015-02-14 NaN
15 2015-02-13 3036.8
16 2015-02-12 865.0
17 2015-02-11 3014.3
18 2015-02-10 2742.8
19 2015-02-09 2173.5
20 2015-02-08 3161.8
21 2015-02-07 3023.8
22 2015-02-06 2998.1
23 2015-02-05 2805.9
24 2015-02-04 2383.4
25 2015-02-03 2620.2
26 2015-02-02 2600.0
27 2015-02-01 2358.6
28 2015-01-31 2682.2
29 2015-01-30 2766.8
.. ... ...
171 2014-08-31 3494.7
172 2014-08-30 3691.9
173 2014-08-29 2929.5
174 2014-08-28 2760.6
175 2014-08-27 2593.7
176 2014-08-26 2884.4
177 2014-08-25 2591.3
178 2014-08-24 3022.6
179 2014-08-23 3052.1
180 2014-08-22 2789.2
181 2014-08-21 2909.8
182 2014-08-20 2326.8
183 2014-08-19 2453.1
184 2014-08-18 2351.2
185 2014-08-17 3279.1
186 2014-08-16 3381.9
187 2014-08-15 2988.1
188 2014-08-14 2577.7
189 2014-08-13 2332.3
190 2014-08-12 2518.6
191 2014-08-11 2697.5
192 2014-08-10 3244.7
193 2014-08-09 3346.7
194 2014-08-08 2900.6
195 2014-08-07 2759.1
196 2014-08-06 2915.8
197 2014-08-05 2618.1
198 2014-08-04 2993.0
199 2014-08-03 3436.4
200 2014-08-02 2261.7
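For readers who do not have the original file, the loading step can be tried on a few rows copied from the listing above. This is only a sketch: the comma-separated layout and the header line are assumptions inferred from the printed column names, and `sample_csv` and `sample` are illustrative names that do not appear in the original code. Passing `parse_dates` is optional; it makes pandas store 日期 as datetimes rather than plain strings.

```python
import io

import pandas as pd

# A handful of rows copied from the printed output above. The comma-separated
# layout and the header row are assumptions about the real file.
sample_csv = """日期,销量
2015-03-01,51.0
2015-02-28,2618.2
2015-02-27,2608.4
2015-02-14,
2015-02-13,3036.8
"""

# parse_dates converts the 日期 column to datetime64 instead of strings.
# The empty field on 2015-02-14 is read as NaN.
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["日期"])

print(sample.dtypes)
print(sample.shape)
```

With a real file, the same call takes the file path in place of the `io.StringIO` object.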
2. Maximum, mean, minimum, and number of missing values
print(data.describe())  # count, mean, std, min, quartiles, and max of the numeric column
print("Number of missing values:")
print(data.isnull().sum())  # number of missing values in each column
Output:
销量
count 200.000000
mean 2755.214700
std 751.029772
min 22.000000
25% 2451.975000
50% 2655.850000
75% 3026.125000
max 9106.440000
Number of missing values:
日期 0
销量 1
dtype: int64
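The table has 201 rows (indices 0 to 200), yet `count` is 200, because `describe()` leaves the one missing value out of its statistics. The individual figures can also be requested one at a time. The sketch below does so on a small stand-in series built from values in the listing above; `sales` and `summary` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Stand-in for the 销量 column: a few values from the listing above,
# including the one missing entry.
sales = pd.Series([51.0, 2618.2, 2608.4, np.nan, 3036.8], name="销量")

summary = {
    "mean": sales.mean(),                  # NaN is skipped by default (skipna=True)
    "max": sales.max(),
    "min": sales.min(),
    "missing": int(sales.isnull().sum()),  # number of NaN entries
    "count": int(sales.count()),           # non-missing entries only
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

On the full dataset the same calls would be applied to `data['销量']`.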
3. Identifying outliers with a box plot
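# Use a font that can render the Chinese column names, so they are not garbled
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign readable with this font
p = data.boxplot(return_type='dict')  # draw the box plot and return its artists
x = p['fliers'][0].get_xdata()  # x positions of the outlier markers
y = p['fliers'][0].get_ydata()  # outlier values
y.sort()
# Label each outlier with its value; shift a label sideways when it is close to the previous one
for i in range(len(x)):
    if i > 0:
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.05 - 0.8 / (y[i] - y[i - 1]), y[i]))
    else:
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.08, y[i]))
plt.show()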
Output:

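With matplotlib's default whisker setting, a box plot draws as fliers the points lying more than 1.5 × IQR below the first quartile or above the third. Using the quartiles reported by `describe()`, IQR = 3026.125 − 2451.975 = 574.15, so the fences are 2451.975 − 861.225 = 1590.75 and 3026.125 + 861.225 = 3887.35. The same check can be sketched numerically without a plot. The helper `iqr_outliers` and the small sample below are illustrative and not part of the original code; the sample reuses values from the listing, and its fences differ from those of the full dataset because the quartiles are computed from fewer points.

```python
import numpy as np
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the entries outside [Q1 - whis * IQR, Q3 + whis * IQR]."""
    clean = values.dropna()  # missing values take no part in the quartiles
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return clean[(clean < lower) | (clean > upper)]


# Small sample made of values that appear in the listing above.
sales = pd.Series([51.0, 2618.2, 2608.4, 2651.9, 3442.1, 3393.1,
                   3136.6, 3744.1, 6607.4, 4060.3, np.nan, 865.0])

outliers = iqr_outliers(sales)
print(outliers)
```

On the full dataset, `iqr_outliers(data['销量'])` should select the same points that the box plot marks as fliers, assuming the default whisker length of 1.5.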
CSV file used:
