[Python Data Analysis] COVID-19 Data: Acquisition

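This is the first post in the [Python Data Analysis] series. It fetches a dataset through the Python interface provided by Wind, processes it lightly, and saves the result locally.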

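ΔConventions for this series: a line beginning with # is an explanation of a step or a code comment, the code follows its explanation, and any output the code produces is shown directly below it.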

1.

#Import pandas and WindPy, Wind's Python interface.

#If you are using Jupyter and it reports that the WindPy module cannot be found, copy WindPy.pth into the Jupyter installation path.

import pandas as pd
from WindPy import w

2.

#Start WindPy and fetch one year of cumulative confirmed COVID-19 case counts, ending 2020-10-18, into Wdata. Each code starting with S identifies one series: the national total or one province-level region.

w.start()
# Adjacent string literals are joined, so the code list contains no stray spaces.
codes = (
    "S6274770,S6289292,S6274299,S6289303,S6274354,S6274437,S6289302,S6289313,S6274256,S6289297,S6289296,"
    "S6274810,S6289290,S6289294,S6289295,S6289291,S6275679,S6275447,S6275558,S6275394,S6289298,S6289301,S6289289,"
    "S6289293,S6275360,S6289300,S6275203,S6289299,S6274391,S6289316,S6274477,S6289310"
)
Wdata = w.edb(codes, "2019-10-19", "2020-10-18", "Fill=Previous")

Welcome to use Wind Quant API for Python (WindPy)!

COPYRIGHT (C) 2020 WIND INFORMATION CO., LTD. ALL RIGHTS RESERVED.
IN NO CIRCUMSTANCE SHALL WIND BE RESPONSIBLE FOR ANY DAMAGES OR LOSSES CAUSED BY USING WIND QUANT API FOR Python.

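3.

#The welcome banner from the Wind quant API appears above. Next, print Wdata to inspect the data that was fetched.

#In Wdata, .Codes holds the series codes, .Times the dates, and .Data the values.

print(Wdata)

.ErrorCode=0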
.Codes=[S6274770,S6289292,S6274299,S6289303,S6274354,S6274437,S6289302,S6289313,S6274256,S6289297,...]
.Fields=[CLOSE]
.Times=[20200116,20200117,20200118,20200119,20200120,20200121,20200122,20200123,20200124,20200125,...]
.Data=[[nan,nan,nan,nan,291.0,440.0,571.0,830.0,1287.0,1975.0,...],[nan,nan,nan,nan,nan,nan,nan,26.0,36.0,51.0,...],[nan,nan,nan,nan,nan,2.0,4.0,5.0,8.0,10.0,...],[nan,nan,nan,nan,nan,nan,1.0,2.0,8.0,13.0,...],[nan,nan,nan,nan,nan,nan,1.0,1.0,6.0,9.0,...],[nan,nan,nan,nan,nan,nan,nan,1.0,2.0,7.0,...],[nan,nan,nan,nan,nan,nan,2.0,4.0,12.0,19.0,...],[nan,nan,nan,nan,nan,nan,nan,3.0,4.0,4.0,...],[nan,nan,nan,nan,nan,nan,2.0,4.0,9.0,15.0,...],[nan,nan,nan,0.0,2.0,9.0,16.0,20.0,33.0,40.0,...],...]
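
#The printout shows .ErrorCode=0 for this successful request. A minimal sketch of guarding later steps against a failed request follows. It assumes that any non-zero ErrorCode means failure; require_success is a hypothetical helper, and fake_wdata is a stand-in object (not a real Wind result) so the snippet runs without a Wind terminal.

```python
from types import SimpleNamespace


def require_success(result):
    """Return result unchanged, or raise if its ErrorCode is non-zero.

    Assumption: a non-zero ErrorCode indicates a failed request.
    """
    if result.ErrorCode != 0:
        raise RuntimeError(f"Wind request failed with ErrorCode={result.ErrorCode}")
    return result


# Stand-in for the object returned by w.edb(); it only mimics the attributes
# visible in the printout (.ErrorCode, .Codes, .Times, .Data).
fake_wdata = SimpleNamespace(
    ErrorCode=0,
    Codes=["S6274770", "S6289297"],
    Times=["20200120", "20200121"],
    Data=[[291.0, 440.0], [2.0, 9.0]],
)

checked = require_success(fake_wdata)
# One inner list per code, one value per date.
print(len(checked.Codes), len(checked.Data), len(checked.Data[0]))
```

#With a real result, the call would presumably be require_success(Wdata) right after w.edb(...).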

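4.

#Wdata.Data holds one list per code, each with 276 values (one per date), so the DataFrame is first built with the codes as the row index and the dates as the columns. Having dates as rows and codes as columns is more conventional, so it is then transposed with .T.

covid_cases_cumsum = pd.DataFrame(Wdata.Data, index=Wdata.Codes, columns=Wdata.Times).T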
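
#To make the orientation concrete, here is a small sketch of the same construction on a two-code, three-date excerpt of the values printed in step 3. The variable names are illustrative, and representing the dates as datetime.date objects is an assumption about what Wdata.Times contains.

```python
import datetime as dt

import pandas as pd

codes = ["S6274770", "S6289297"]
times = [dt.date(2020, 1, 19), dt.date(2020, 1, 20), dt.date(2020, 1, 21)]
data = [
    [float("nan"), 291.0, 440.0],  # values for S6274770, one per date
    [0.0, 2.0, 9.0],               # values for S6289297, one per date
]

# One row per code, one column per date: the same layout as Wdata.Data.
by_code = pd.DataFrame(data, index=codes, columns=times)

# After transposing, each row is a date and each column is a code.
by_date = by_code.T

print(by_code.shape, by_date.shape)
print(by_date)
```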

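5.

#Print the first rows. The cumulative confirmed-case table was built successfully, but the regions appear as codes, which is not very readable.

print(covid_cases_cumsum.head())

            S6274770  S6289292  S6274299  S6289303  S6274354  S6274437  \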
2020-01-16       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-17       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-18       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-19       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-20     291.0       NaN       NaN       NaN       NaN       NaN   

            S6289302  S6289313  S6274256  S6289297  ...  S6289289  S6289293  \
2020-01-16       NaN       NaN       NaN       NaN  ...       NaN       NaN   
2020-01-17       NaN       NaN       NaN       NaN  ...       NaN       NaN   
2020-01-18       NaN       NaN       NaN       NaN  ...       NaN       NaN   
2020-01-19       NaN       NaN       NaN       0.0  ...       NaN       NaN   
2020-01-20       NaN       NaN       NaN       2.0  ...       NaN       NaN   

            S6275360  S6289300  S6275203  S6289299  S6274391  S6289316  \
2020-01-16       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-17       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-18       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-19       NaN       NaN       NaN       NaN       NaN       NaN   
2020-01-20       NaN       NaN       NaN       NaN       NaN       NaN   

            S6274477  S6289310  
2020-01-16       NaN       NaN  
2020-01-17       NaN       NaN  
2020-01-18       NaN       NaN  
2020-01-19       NaN       0.0  
2020-01-20       NaN       0.0  

[5 rows x 32 columns]

6.

#Build a dictionary whose keys are the region codes and whose values are the region names, then print it to check that it was built correctly.

code_list = ['S6274770', 'S6289292', 'S6274299', 'S6289303', 'S6274354', 'S6274437', 'S6289302', 'S6289313', 'S6274256', 'S6289297',
             'S6289296', 'S6274810', 'S6289290', 'S6289294', 'S6289295', 'S6289291', 'S6275679', 'S6275447', 'S6275558', 'S6275394',
             'S6289298', 'S6289301', 'S6289289', 'S6289293', 'S6275360', 'S6289300', 'S6275203', 'S6289299', 'S6274391', 'S6289316',
             'S6274477', 'S6289310']
area_list = ['全国', '北京', '天津', '河北', '山西', '内蒙古', '辽宁', '吉林', '黑龙江', '上海', '江苏', '浙江', '安徽', '福建', '江西', '山东', '河南',
             '湖北', '湖南', '广东', '广西', '海南', '重庆', '四川', '贵州', '云南', '西藏', '陕西', '甘肃', '青海', '宁夏', '新疆']
# Pair each code with the name at the same position.
code_to_area_dict = dict(zip(code_list, area_list))
print(code_to_area_dict)

{'S6274770': '全国', 'S6289292': '北京', 'S6274299': '天津', 'S6289303': '河北', 'S6274354': '山西', 'S6274437': '内蒙古', 'S6289302': '辽宁', 'S6289313': '吉林', 'S6274256': '黑龙江', 'S6289297': '上海', 'S6289296': '江苏', 'S6274810': '浙江', 'S6289290': '安徽', 'S6289294': '福建', 'S6289295': '江西', 'S6289291': '山东', 'S6275679': '河南', 'S6275447': '湖北', 'S6275558': '湖南', 'S6275394': '广东', 'S6289298': '广西', 'S6289301': '海南', 'S6289289': '重庆', 'S6289293': '四川', 'S6275360': '贵州', 'S6289300': '云南', 'S6275203': '西藏', 'S6289299': '陕西', 'S6274391': '甘肃', 'S6289316': '青海', 'S6274477': '宁夏', 'S6289310': '新疆'}
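
#Because the two lists are typed by hand, a missing or duplicated entry would silently shift or drop a region. The sketch below shows one possible guard; build_mapping is a hypothetical helper, not part of the original workflow, and the sample lists are just the first three entries from above.

```python
def build_mapping(codes, names):
    """Pair codes with names, rejecting unequal lengths and duplicate codes."""
    if len(codes) != len(names):
        raise ValueError(f"{len(codes)} codes but {len(names)} names")
    mapping = dict(zip(codes, names))
    if len(mapping) != len(codes):
        raise ValueError("duplicate codes would overwrite each other")
    return mapping


sample_codes = ['S6274770', 'S6289292', 'S6274299']
sample_names = ['全国', '北京', '天津']

sample_mapping = build_mapping(sample_codes, sample_names)
print(sample_mapping)
```

#With the full lists, the call would be build_mapping(code_list, area_list).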

 

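7.

#Use DataFrame.rename to replace the column names, passing code_to_area_dict as the columns mapping; inplace=True modifies the original DataFrame directly. info() prints its summary itself and returns None, which is why the output ends with None.

covid_cases_cumsum.rename(columns=code_to_area_dict, inplace=True)
print(covid_cases_cumsum.info())

<class 'pandas.core.frame.DataFrame'>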
Index: 276 entries, 2020-01-16 to 2020-10-17
Data columns (total 32 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   全国      272 non-null    float64
 1   北京      269 non-null    float64
 2   天津      271 non-null    float64
 3   河北      270 non-null    float64
 4   山西      270 non-null    float64
 5   内蒙古     269 non-null    float64
 6   辽宁      270 non-null    float64
 7   吉林      269 non-null    float64
 8   黑龙江     270 non-null    float64
 9   上海      273 non-null    float64
 10  江苏      270 non-null    float64
 11  浙江      269 non-null    float64
 12  安徽      270 non-null    float64
 13  福建      270 non-null    float64
 14  江西      271 non-null    float64
 15  山东      268 non-null    float64
 16  河南      269 non-null    float64
 17  湖北      276 non-null    float64
 18  湖南      271 non-null    float64
 19  广东      273 non-null    float64
 20  广西      270 non-null    float64
 21  海南      270 non-null    float64
 22  重庆      270 non-null    float64
 23  四川      270 non-null    float64
 24  贵州      270 non-null    float64
 25  云南      271 non-null    float64
 26  西藏      262 non-null    float64
 27  陕西      269 non-null    float64
 28  甘肃      269 non-null    float64
 29  青海      267 non-null    float64
 30  宁夏      270 non-null    float64
 31  新疆      273 non-null    float64
dtypes: float64(32)
memory usage: 71.2+ KB
None
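
#The final None in the output above is the return value of info() being passed to print(). A small sketch, using a two-column stand-in frame with values taken from step 3, shows that the summary goes to whatever buffer info() is given while the call itself returns None.

```python
import io

import pandas as pd

# Stand-in frame; the real covid_cases_cumsum is not needed to see the behaviour.
frame = pd.DataFrame({"全国": [291.0, 440.0], "上海": [2.0, 9.0]})

buffer = io.StringIO()
returned = frame.info(buf=buffer)  # the summary is written to the buffer
summary_text = buffer.getvalue()

print(returned)       # the return value of info()
print(summary_text)   # the summary itself
```

#Calling covid_cases_cumsum.info() without print() therefore shows the same summary without the extra line.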

8.

#Save covid_cases_cumsum locally. The data has also been shared on Baidu Cloud. Link: https://pan.baidu.com/s/100z9ZUpsI-xIcirhy3hkpA Extraction code: 4eo8

covid_cases_cumsum.to_csv(r'C:\Users\92342\Desktop\python\Python数据分析学习之路\1、疫情数据\covid_cases_cumsum.csv')
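
#A later post will presumably need to load this file again. The sketch below checks the round trip on a three-row excerpt (values from step 3), writing to an in-memory buffer instead of the path above so that it runs anywhere. The choice of index_col=0 and parse_dates=True is an assumption about how the file would be read back.

```python
import io

import pandas as pd

# Excerpt of the table: dates as the index, region names as the columns.
sample = pd.DataFrame(
    {"全国": [float("nan"), 291.0, 440.0], "上海": [0.0, 2.0, 9.0]},
    index=pd.to_datetime(["2020-01-19", "2020-01-20", "2020-01-21"]),
)

# Write the CSV text to memory, then rewind so it can be read back.
buffer = io.StringIO()
sample.to_csv(buffer)
buffer.seek(0)

# The first column holds the dates, so use it as the index and parse it as dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(restored)
print(restored.index.dtype)
```

#If the file is meant to be opened in Excel on Windows, passing encoding='utf-8-sig' to to_csv is a common way to keep the Chinese column names readable.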

 

posted @ 2020-10-19 21:19  隐岐