运用Pandas（Python3以上）连接MySQL进行读写，查询

Pandas是Python处理数据分析包。当需要处理大量数据时能比EXCEL提供更强大的运行效率。可以与读入，写出如：CSV，CST，Excel，MySQL等数据存储方式。

任务：

本次将要介绍的是如何利用Pandas以及一些辅助包来：

1. 将Python中的DataFrame数据写入MySQL

2. 将MySQL中的数据导入Pyhon，并且利用MySQL语句进行操作

准备：

利用到的模拟数据来自Stackoverflow 中的Stack Overflow Annual Developer Survey 2020 （CSV File）（https://insights.stackoverflow.com/survey）关于程序员的调查数据

在操作之前确保已经安装相应的库：Pandas，SQLAlchemy，pymysql

用到的工具：Anaconda，JupyterNotebook(IDLE 或 Pycharm也可，但Jupyter的数据界面比较清晰)，MySQL Workbench（8.0版本）。

方法如下（假设已经安装Anaconda）：搜索Anaconda Prompt

pip install PyMySQL

pip install Pandas

pip install SQLAlchemy

作者：Mingjie_MEL
链接：https://www.jianshu.com/p/78763cab9f40
来源：简书
著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

操作阶段

1. 导入数据到Jupyter

将下载的CSV导入Jupyter，操作如下：https://insights.stackoverflow.com/survey#数据集下载地址

import pandas as pd
df = pd.read_csv(r"C:\Users\Administrator\11月\developer_survey_2019\survey_results_public.csv",nrows=100)
df.head()

Respondent	MainBranch	Hobbyist	OpenSourcer	OpenSource	Employment	Country	Student	EdLevel	UndergradMajor	...	WelcomeChange	SONewContent	Age	Gender	Trans	Sexuality	Ethnicity	Dependents	SurveyLength	SurveyEase
0	1	I am a student who is learning to code	Yes	Never	The quality of OSS and closed source software ...	Not employed, and not looking for work	United Kingdom	No	Primary/elementary school	NaN	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	14.0	Man	No	Straight / Heterosexual	NaN	No	Appropriate in length	Neither easy nor difficult
1	2	I am a student who is learning to code	No	Less than once per year	The quality of OSS and closed source software ...	Not employed, but looking for work	Bosnia and Herzegovina	Yes, full-time	Secondary school (e.g. American high school, G...	NaN	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	19.0	Man	No	Straight / Heterosexual	NaN	No	Appropriate in length	Neither easy nor difficult
2	3	I am not primarily a developer, but I write co...	Yes	Never	The quality of OSS and closed source software ...	Employed full-time	Thailand	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Web development or web design	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	28.0	Man	No	Straight / Heterosexual	NaN	Yes	Appropriate in length	Neither easy nor difficult
3	4	I am a developer by profession	No	Never	The quality of OSS and closed source software ...	Employed full-time	United States	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	22.0	Man	No	Straight / Heterosexual	White or of European descent	No	Appropriate in length	Easy
4	5	I am a developer by profession	Yes	Once a month or more often	OSS is, on average, of HIGHER quality than pro...	Employed full-time	Ukraine	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	30.0	Man	No	Straight / Heterosexual	White or of European descent;Multiracial	No	Appropriate in length	Easy

5 rows × 85 columns

2. 提前设置好Mysql

在Mysql中建立新的数据库（Schema）命名为sample_db

import pymysql
from sqlalchemy import create_engine


3. 创立连接

首先import sqlalchemy 和 pymysql库

create_engine 为一个方法（method），执行建立连接的操作

engine变量储存所创立的连接，以便重复利用

df.to_sql() 为写入sql的一个方法（method），其中arg1 为数据库中数据表的名称（此时新建的），engine 代表前期利用create_engine创建的连接。if_exists 检查是否之前已经存在此数据表，‘append’代表如果存在附加，‘replace’代表如果存在，覆盖原来数据。

操作完毕，登录数据库，此时在数据库（sample_db）中将能看到新创建的数据表（sample_table）
 
engine  = create_engine("mysql+pymysql://root:root@localhost/test_python_mysql")

df.to_sql("sample_table",engine,if_exists="append", index=False, index_label=False)#sample_table就是在该数据库下的新建的表
#执行完毕此行命令，自动创建该table

#4 接着读取数据库中的数据

#还是利用第三步创立的连接engine，在Jupyter中接着输入

datas = pd.read_sql('sample_table',engine,index_col = 'Respondent')
datas

MainBranch	Hobbyist	OpenSourcer	OpenSource	Employment	Country	Student	EdLevel	UndergradMajor	EduOther	...	WelcomeChange	SONewContent	Age	Gender	Trans	Sexuality	Ethnicity	Dependents	SurveyLength	SurveyEase
Respondent
1	I am a student who is learning to code	Yes	Never	The quality of OSS and closed source software ...	Not employed, and not looking for work	United Kingdom	No	Primary/elementary school	None	Taught yourself a new language, framework, or ...	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	14.0	Man	No	Straight / Heterosexual	None	No	Appropriate in length	Neither easy nor difficult
2	I am a student who is learning to code	No	Less than once per year	The quality of OSS and closed source software ...	Not employed, but looking for work	Bosnia and Herzegovina	Yes, full-time	Secondary school (e.g. American high school, G...	None	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	19.0	Man	No	Straight / Heterosexual	None	No	Appropriate in length	Neither easy nor difficult
3	I am not primarily a developer, but I write co...	Yes	Never	The quality of OSS and closed source software ...	Employed full-time	Thailand	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Web development or web design	Taught yourself a new language, framework, or ...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	28.0	Man	No	Straight / Heterosexual	None	Yes	Appropriate in length	Neither easy nor difficult
4	I am a developer by profession	No	Never	The quality of OSS and closed source software ...	Employed full-time	United States	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	22.0	Man	No	Straight / Heterosexual	White or of European descent	No	Appropriate in length	Easy
5	I am a developer by profession	Yes	Once a month or more often	OSS is, on average, of HIGHER quality than pro...	Employed full-time	Ukraine	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	30.0	Man	No	Straight / Heterosexual	White or of European descent;Multiracial	No	Appropriate in length	Easy
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
96	I used to be a developer by profession, but no...	No	Never	The quality of OSS and closed source software ...	Employed full-time	Canada	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech articles written by other developers;Indu...	43.0	Man	No	Straight / Heterosexual	South Asian	Yes	Too long	Easy
97	I am not primarily a developer, but I write co...	Yes	Never	The quality of OSS and closed source software ...	Employed full-time	United States	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	A social science (ex. anthropology, psychology...	Taken an online course in programming or softw...	...	Somewhat more welcome now than last year	Tech articles written by other developers;Indu...	25.0	Non-binary, genderqueer, or gender non-conforming	Yes	Bisexual	White or of European descent	No	Appropriate in length	Easy
98	I code primarily as a hobby	Yes	Less than once a month but more than once per ...	OSS is, on average, of LOWER quality than prop...	Independent contractor, freelancer, or self-em...	Bosnia and Herzegovina	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Another engineering discipline (ex. civil, ele...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	30.0	Man	No	Straight / Heterosexual	White or of European descent	No	Appropriate in length	Easy
99	I am a student who is learning to code	Yes	Less than once per year	The quality of OSS and closed source software ...	Employed part-time	India	Yes, full-time	Secondary school (e.g. American high school, G...	None	Taken an online course in programming or softw...	...	Somewhat more welcome now than last year	Industry news about technologies you're intere...	21.0	Man	No	Straight / Heterosexual	South Asian	No	Appropriate in length	Neither easy nor difficult
100	I am a developer by profession	Yes	Once a month or more often	The quality of OSS and closed source software ...	Employed full-time	Canada	No	Some college/university study without earning ...	Computer science, computer engineering, or sof...	Taught yourself a new language, framework, or ...	...	Just as welcome now as I felt last year	None	39.0	Woman	No	Bisexual	White or of European descent	No	Appropriate in length	Easy

100 rows × 84 columns

# 5 利用Python操作查询导入想要的数据
# pd.read_sql_query （query 代表查询）,arg1 为查询的条件（SQL语法），代表从sample_table中以age>30 为条件，* 代表查询所有的数据。

# 验证sql_df_test['Age'].min() 为31

# 作者：Mingjie_MEL
# 链接：https://www.jianshu.com/p/78763cab9f40
# 来源：简书
# 著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

datas_query = pd.read_sql_query("select  * from sample_table where age=30",engine, index_col="Respondent")
datas_query

MainBranch	Hobbyist	OpenSourcer	OpenSource	Employment	Country	Student	EdLevel	UndergradMajor	EduOther	...	WelcomeChange	SONewContent	Age	Gender	Trans	Sexuality	Ethnicity	Dependents	SurveyLength	SurveyEase
Respondent
5	I am a developer by profession	Yes	Once a month or more often	OSS is, on average, of HIGHER quality than pro...	Employed full-time	Ukraine	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Computer science, computer engineering, or sof...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	30.0	Man	No	Straight / Heterosexual	White or of European descent;Multiracial	No	Appropriate in length	Easy
42	I am a developer by profession	Yes	Less than once a month but more than once per ...	OSS is, on average, of HIGHER quality than pro...	Employed full-time	Bangladesh	No	Secondary school (e.g. American high school, G...	None	Taken a part-time in-person course in programm...	...	Somewhat more welcome now than last year	Tech articles written by other developers;Tech...	30.0	Man	No	Straight / Heterosexual	Middle Eastern	None	Appropriate in length	Easy
95	I am a developer by profession	No	Less than once per year	OSS is, on average, of HIGHER quality than pro...	Employed full-time	United States	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Mathematics or statistics	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	None	30.0	Man	No	Straight / Heterosexual	White or of European descent	Yes	Too long	Easy
98	I code primarily as a hobby	Yes	Less than once a month but more than once per ...	OSS is, on average, of LOWER quality than prop...	Independent contractor, freelancer, or self-em...	Bosnia and Herzegovina	No	Bachelor’s degree (BA, BS, B.Eng., etc.)	Another engineering discipline (ex. civil, ele...	Taken an online course in programming or softw...	...	Just as welcome now as I felt last year	Tech meetups or events in your area;Courses on...	30.0	Man	No	Straight / Heterosexual	White or of European descent	No	Appropriate in length	Easy

4 rows × 84 columns

datas_query["Age"].max()==datas_query["Age"].min()##因为这里查询的都是为30的，所以相等为True

6 Pandas语句查询与Mysql语句查询对比

假设想查询模拟数据中，参与调查的中国程序员的平均年龄。

1利用pandas语句

df.groupby("Country")["Age"].mean()["China"] #按照国家进行分组，然后计算  中国china组内的所有人的 年龄的平均值
#输出 29.5

df.groupby("Country")["Age"].mean()["United States"]
#输出 33.04545454545455

2下面利用mysql语句（生成DataFrame格式，方便在Pandas中做进一步操作）

china_mean  = pd.read_sql_query("Select  avg(Age),Country from sample_table group by Country having country = 'China'",engine)
china_mean ##DataFrame格式

avg(Age)Country029.5China

all_mean  = pd.read_sql_query("Select  avg(Age),Country from sample_table group by Country ",engine)
print(type(all_mean))
print(type(china_mean))
all_mean

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

	avg(Age)	Country
0	22.000000	Antigua and Barbuda
1	31.500000	Argentina
2	26.500000	Australia
3	35.000000	Austria
4	25.000000	Azerbaijan
5	30.000000	Bangladesh
6	24.500000	Bosnia and Herzegovina
7	33.500000	Brazil
8	29.666667	Canada
9	29.500000	China
10	22.000000	Colombia
11	34.500000	Czech Republic
12	23.000000	Denmark
13	31.000000	France
14	31.600000	Germany
15	23.700000	India
16	24.000000	Ireland
17	23.000000	Israel
18	42.000000	Italy
19	38.000000	Lithuania
20	NaN	Malaysia
21	19.000000	Netherlands
22	29.333333	New Zealand
23	26.000000	Pakistan
24	21.000000	Philippines
25	25.000000	Poland
26	20.000000	Russian Federation
27	34.000000	Serbia
28	22.000000	South Africa
29	27.000000	Spain
30	21.000000	Sri Lanka
31	37.000000	Sweden
32	23.000000	Switzerland
33	28.000000	Thailand
34	NaN	Turkey
35	36.000000	Ukraine
36	24.600000	United Kingdom
37	33.045455	United States

使用mysql语句（简单的结果输出）

china_mean_new = engine.execute("select avg(Age),country from sample_table group by Country having Country='China'").fetchall()
china_mean_new

[(29.5, 'China')]

print(type(china_mean_new))

<class 'list'>

posted @ 2020-11-11 10:34 guofen3399 阅读(1009) 评论(0) 收藏举报

刷新页面返回顶部

guofen3399

运用Pandas（Python3以上）连接MySQL进行读写，查询

任务：

准备：

操作阶段

6 Pandas语句查询与Mysql语句查询对比

2下面利用mysql语句（生成DataFrame格式，方便在Pandas中做进一步操作）

使用mysql语句（简单的结果输出）

公告