运用Pandas(Python3以上)连接MySQL进行读写,查询

Pandas是Python处理数据分析包。当需要处理大量数据时能比EXCEL提供更强大的运行效率。可以与读入,写出如:CSV,CST,Excel,MySQL等数据存储方式。

任务:

本次将要介绍的是如何利用Pandas以及一些辅助包来:

1. 将Python中的DataFrame数据写入MySQL

2. 将MySQL中的数据导入Pyhon,并且利用MySQL语句进行操作

准备:

利用到的模拟数据来自Stackoverflow 中的Stack Overflow Annual Developer Survey 2020 (CSV File)(https://insights.stackoverflow.com/survey)关于程序员的调查数据

在操作之前确保已经安装相应的库:Pandas,SQLAlchemy,pymysql

用到的工具:Anaconda,JupyterNotebook(IDLE 或 Pycharm也可,但Jupyter的数据界面比较清晰),MySQL Workbench(8.0版本)。

方法如下(假设已经安装Anaconda):搜索Anaconda Prompt

pip install PyMySQL

pip install Pandas

pip install SQLAlchemy



作者:Mingjie_MEL
链接:https://www.jianshu.com/p/78763cab9f40
来源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
 
 
 

操作阶段

 

 

 

1. 导入数据到Jupyter

将下载的CSV导入Jupyter,操作如下:https://insights.stackoverflow.com/survey#数据集下载地址

import pandas as pd
df = pd.read_csv(r"C:\Users\Administrator\11月\developer_survey_2019\survey_results_public.csv",nrows=100)
df.head()
RespondentMainBranchHobbyistOpenSourcerOpenSourceEmploymentCountryStudentEdLevelUndergradMajor...WelcomeChangeSONewContentAgeGenderTransSexualityEthnicityDependentsSurveyLengthSurveyEase
0 1 I am a student who is learning to code Yes Never The quality of OSS and closed source software ... Not employed, and not looking for work United Kingdom No Primary/elementary school NaN ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 14.0 Man No Straight / Heterosexual NaN No Appropriate in length Neither easy nor difficult
1 2 I am a student who is learning to code No Less than once per year The quality of OSS and closed source software ... Not employed, but looking for work Bosnia and Herzegovina Yes, full-time Secondary school (e.g. American high school, G... NaN ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 19.0 Man No Straight / Heterosexual NaN No Appropriate in length Neither easy nor difficult
2 3 I am not primarily a developer, but I write co... Yes Never The quality of OSS and closed source software ... Employed full-time Thailand No Bachelor’s degree (BA, BS, B.Eng., etc.) Web development or web design ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 28.0 Man No Straight / Heterosexual NaN Yes Appropriate in length Neither easy nor difficult
3 4 I am a developer by profession No Never The quality of OSS and closed source software ... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 22.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
4 5 I am a developer by profession Yes Once a month or more often OSS is, on average, of HIGHER quality than pro... Employed full-time Ukraine No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 30.0 Man No Straight / Heterosexual White or of European descent;Multiracial No Appropriate in length Easy

5 rows × 85 columns

 

2. 提前设置好Mysql

在Mysql中建立新的数据库(Schema)命名为sample_db

import pymysql
from sqlalchemy import create_engine


3. 创立连接

首先import sqlalchemy 和 pymysql库

create_engine 为一个方法(method),执行建立连接的操作

engine变量储存所创立的连接,以便重复利用

df.to_sql() 为写入sql的一个方法(method),其中arg1 为数据库中数据表的名称(此时新建的),engine 代表前期利用create_engine创建的连接。if_exists 检查是否之前已经存在此数据表,‘append’代表如果存在附加,‘replace’代表如果存在,覆盖原来数据。

操作完毕,登录数据库,此时在数据库(sample_db)中将能看到新创建的数据表(sample_table)
 
engine  = create_engine("mysql+pymysql://root:root@localhost/test_python_mysql")
df.to_sql("sample_table",engine,if_exists="append", index=False, index_label=False)#sample_table就是在该数据库下的新建的表
#执行完毕此行命令,自动创建该table

 

#4 接着读取数据库中的数据

#还是利用第三步创立的连接engine,在Jupyter中接着输入

datas = pd.read_sql('sample_table',engine,index_col = 'Respondent')
datas
MainBranchHobbyistOpenSourcerOpenSourceEmploymentCountryStudentEdLevelUndergradMajorEduOther...WelcomeChangeSONewContentAgeGenderTransSexualityEthnicityDependentsSurveyLengthSurveyEase
Respondent                     
1 I am a student who is learning to code Yes Never The quality of OSS and closed source software ... Not employed, and not looking for work United Kingdom No Primary/elementary school None Taught yourself a new language, framework, or ... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 14.0 Man No Straight / Heterosexual None No Appropriate in length Neither easy nor difficult
2 I am a student who is learning to code No Less than once per year The quality of OSS and closed source software ... Not employed, but looking for work Bosnia and Herzegovina Yes, full-time Secondary school (e.g. American high school, G... None Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 19.0 Man No Straight / Heterosexual None No Appropriate in length Neither easy nor difficult
3 I am not primarily a developer, but I write co... Yes Never The quality of OSS and closed source software ... Employed full-time Thailand No Bachelor’s degree (BA, BS, B.Eng., etc.) Web development or web design Taught yourself a new language, framework, or ... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 28.0 Man No Straight / Heterosexual None Yes Appropriate in length Neither easy nor difficult
4 I am a developer by profession No Never The quality of OSS and closed source software ... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 22.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
5 I am a developer by profession Yes Once a month or more often OSS is, on average, of HIGHER quality than pro... Employed full-time Ukraine No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 30.0 Man No Straight / Heterosexual White or of European descent;Multiracial No Appropriate in length Easy
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96 I used to be a developer by profession, but no... No Never The quality of OSS and closed source software ... Employed full-time Canada No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 43.0 Man No Straight / Heterosexual South Asian Yes Too long Easy
97 I am not primarily a developer, but I write co... Yes Never The quality of OSS and closed source software ... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) A social science (ex. anthropology, psychology... Taken an online course in programming or softw... ... Somewhat more welcome now than last year Tech articles written by other developers;Indu... 25.0 Non-binary, genderqueer, or gender non-conforming Yes Bisexual White or of European descent No Appropriate in length Easy
98 I code primarily as a hobby Yes Less than once a month but more than once per ... OSS is, on average, of LOWER quality than prop... Independent contractor, freelancer, or self-em... Bosnia and Herzegovina No Bachelor’s degree (BA, BS, B.Eng., etc.) Another engineering discipline (ex. civil, ele... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 30.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
99 I am a student who is learning to code Yes Less than once per year The quality of OSS and closed source software ... Employed part-time India Yes, full-time Secondary school (e.g. American high school, G... None Taken an online course in programming or softw... ... Somewhat more welcome now than last year Industry news about technologies you're intere... 21.0 Man No Straight / Heterosexual South Asian No Appropriate in length Neither easy nor difficult
100 I am a developer by profession Yes Once a month or more often The quality of OSS and closed source software ... Employed full-time Canada No Some college/university study without earning ... Computer science, computer engineering, or sof... Taught yourself a new language, framework, or ... ... Just as welcome now as I felt last year None 39.0 Woman No Bisexual White or of European descent No Appropriate in length Easy

100 rows × 84 columns

 
# 5 利用Python操作查询导入想要的数据
# pd.read_sql_query (query 代表查询),arg1 为查询的条件(SQL语法),代表从sample_table中以age>30 为条件,* 代表查询所有的数据。

# 验证sql_df_test['Age'].min() 为31

# 作者:Mingjie_MEL
# 链接:https://www.jianshu.com/p/78763cab9f40
# 来源:简书
# 著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

datas_query = pd.read_sql_query("select  * from sample_table where age=30",engine, index_col="Respondent")
datas_query
MainBranchHobbyistOpenSourcerOpenSourceEmploymentCountryStudentEdLevelUndergradMajorEduOther...WelcomeChangeSONewContentAgeGenderTransSexualityEthnicityDependentsSurveyLengthSurveyEase
Respondent                     
5 I am a developer by profession Yes Once a month or more often OSS is, on average, of HIGHER quality than pro... Employed full-time Ukraine No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 30.0 Man No Straight / Heterosexual White or of European descent;Multiracial No Appropriate in length Easy
42 I am a developer by profession Yes Less than once a month but more than once per ... OSS is, on average, of HIGHER quality than pro... Employed full-time Bangladesh No Secondary school (e.g. American high school, G... None Taken a part-time in-person course in programm... ... Somewhat more welcome now than last year Tech articles written by other developers;Tech... 30.0 Man No Straight / Heterosexual Middle Eastern None Appropriate in length Easy
95 I am a developer by profession No Less than once per year OSS is, on average, of HIGHER quality than pro... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) Mathematics or statistics Taken an online course in programming or softw... ... Just as welcome now as I felt last year None 30.0 Man No Straight / Heterosexual White or of European descent Yes Too long Easy
98 I code primarily as a hobby Yes Less than once a month but more than once per ... OSS is, on average, of LOWER quality than prop... Independent contractor, freelancer, or self-em... Bosnia and Herzegovina No Bachelor’s degree (BA, BS, B.Eng., etc.) Another engineering discipline (ex. civil, ele... Taken an online course in programming or softw... ... Just as welcome now as I felt last year Tech meetups or events in your area;Courses on... 30.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy

4 rows × 84 columns

 

 

datas_query["Age"].max()==datas_query["Age"].min()##因为这里查询的都是为30的,所以相等为True

 

6 Pandas语句查询与Mysql语句查询对比

假设想查询模拟数据中,参与调查的中国程序员的平均年龄。

1利用pandas语句

df.groupby("Country")["Age"].mean()["China"] #按照国家进行分组,然后计算  中国china组内的所有人的 年龄的平均值
#输出 29.5

df.groupby("Country")["Age"].mean()["United States"]
#输出
33.04545454545455

 

 

2下面利用mysql语句(生成DataFrame格式,方便在Pandas中做进一步操作)

 

china_mean  = pd.read_sql_query("Select  avg(Age),Country from sample_table group by Country having country = 'China'",engine)
china_mean ##DataFrame格式

avg(Age)Country029.5China

all_mean  = pd.read_sql_query("Select  avg(Age),Country from sample_table group by Country ",engine)
print(type(all_mean))
print(type(china_mean))
all_mean
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
	avg(Age)	Country
0	22.000000	Antigua and Barbuda
1	31.500000	Argentina
2	26.500000	Australia
3	35.000000	Austria
4	25.000000	Azerbaijan
5	30.000000	Bangladesh
6	24.500000	Bosnia and Herzegovina
7	33.500000	Brazil
8	29.666667	Canada
9	29.500000	China
10	22.000000	Colombia
11	34.500000	Czech Republic
12	23.000000	Denmark
13	31.000000	France
14	31.600000	Germany
15	23.700000	India
16	24.000000	Ireland
17	23.000000	Israel
18	42.000000	Italy
19	38.000000	Lithuania
20	NaN	Malaysia
21	19.000000	Netherlands
22	29.333333	New Zealand
23	26.000000	Pakistan
24	21.000000	Philippines
25	25.000000	Poland
26	20.000000	Russian Federation
27	34.000000	Serbia
28	22.000000	South Africa
29	27.000000	Spain
30	21.000000	Sri Lanka
31	37.000000	Sweden
32	23.000000	Switzerland
33	28.000000	Thailand
34	NaN	Turkey
35	36.000000	Ukraine
36	24.600000	United Kingdom
37	33.045455	United States

  

 

使用mysql语句(简单的结果输出)

china_mean_new = engine.execute("select avg(Age),country from sample_table group by Country having Country='China'").fetchall()
china_mean_new
[(29.5, 'China')]


print(type(china_mean_new))
<class 'list'>

 

 

 

 

posted @ 2020-11-11 10:34  guofen3399  阅读(1009)  评论(0)    收藏  举报