运用Pandas(Python3以上)连接MySQL进行读写,查询
Pandas是Python处理数据分析包。当需要处理大量数据时能比EXCEL提供更强大的运行效率。可以与读入,写出如:CSV,CST,Excel,MySQL等数据存储方式。
任务:
本次将要介绍的是如何利用Pandas以及一些辅助包来:
1. 将Python中的DataFrame数据写入MySQL
2. 将MySQL中的数据导入Pyhon,并且利用MySQL语句进行操作
准备:
利用到的模拟数据来自Stackoverflow 中的Stack Overflow Annual Developer Survey 2020 (CSV File)(https://insights.stackoverflow.com/survey)关于程序员的调查数据
在操作之前确保已经安装相应的库:Pandas,SQLAlchemy,pymysql
用到的工具:Anaconda,JupyterNotebook(IDLE 或 Pycharm也可,但Jupyter的数据界面比较清晰),MySQL Workbench(8.0版本)。
方法如下(假设已经安装Anaconda):搜索Anaconda Prompt
pip install PyMySQL
pip install Pandas
pip install SQLAlchemy
作者:Mingjie_MEL
链接:https://www.jianshu.com/p/78763cab9f40
来源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
操作阶段
1. 导入数据到Jupyter
将下载的CSV导入Jupyter,操作如下:https://insights.stackoverflow.com/survey#数据集下载地址
import pandas as pd df = pd.read_csv(r"C:\Users\Administrator\11月\developer_survey_2019\survey_results_public.csv",nrows=100)
df.head()
| Respondent | MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | I am a student who is learning to code | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work | United Kingdom | No | Primary/elementary school | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 14.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
| 1 | 2 | I am a student who is learning to code | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | Bosnia and Herzegovina | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 19.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
| 2 | 3 | I am not primarily a developer, but I write co... | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Thailand | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Web development or web design | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 28.0 | Man | No | Straight / Heterosexual | NaN | Yes | Appropriate in length | Neither easy nor difficult |
| 3 | 4 | I am a developer by profession | No | Never | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 22.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 4 | 5 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Ukraine | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 30.0 | Man | No | Straight / Heterosexual | White or of European descent;Multiracial | No | Appropriate in length | Easy |
5 rows × 85 columns
2. 提前设置好Mysql
在Mysql中建立新的数据库(Schema)命名为sample_db
import pymysql from sqlalchemy import create_engine 3. 创立连接 首先import sqlalchemy 和 pymysql库 create_engine 为一个方法(method),执行建立连接的操作 engine变量储存所创立的连接,以便重复利用 df.to_sql() 为写入sql的一个方法(method),其中arg1 为数据库中数据表的名称(此时新建的),engine 代表前期利用create_engine创建的连接。if_exists 检查是否之前已经存在此数据表,‘append’代表如果存在附加,‘replace’代表如果存在,覆盖原来数据。 操作完毕,登录数据库,此时在数据库(sample_db)中将能看到新创建的数据表(sample_table)
engine = create_engine("mysql+pymysql://root:root@localhost/test_python_mysql")
df.to_sql("sample_table",engine,if_exists="append", index=False, index_label=False)#sample_table就是在该数据库下的新建的表 #执行完毕此行命令,自动创建该table
#4 接着读取数据库中的数据 #还是利用第三步创立的连接engine,在Jupyter中接着输入 datas = pd.read_sql('sample_table',engine,index_col = 'Respondent')
datas
# 5 利用Python操作查询导入想要的数据 # pd.read_sql_query (query 代表查询),arg1 为查询的条件(SQL语法),代表从sample_table中以age>30 为条件,* 代表查询所有的数据。 # 验证sql_df_test['Age'].min() 为31 # 作者:Mingjie_MEL # 链接:https://www.jianshu.com/p/78763cab9f40 # 来源:简书 # 著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。 datas_query = pd.read_sql_query("select * from sample_table where age=30",engine, index_col="Respondent") datas_query
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Respondent | |||||||||||||||||||||
| 5 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Ukraine | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 30.0 | Man | No | Straight / Heterosexual | White or of European descent;Multiracial | No | Appropriate in length | Easy |
| 42 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Bangladesh | No | Secondary school (e.g. American high school, G... | None | Taken a part-time in-person course in programm... | ... | Somewhat more welcome now than last year | Tech articles written by other developers;Tech... | 30.0 | Man | No | Straight / Heterosexual | Middle Eastern | None | Appropriate in length | Easy |
| 95 | I am a developer by profession | No | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Mathematics or statistics | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | None | 30.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Too long | Easy |
| 98 | I code primarily as a hobby | Yes | Less than once a month but more than once per ... | OSS is, on average, of LOWER quality than prop... | Independent contractor, freelancer, or self-em... | Bosnia and Herzegovina | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Another engineering discipline (ex. civil, ele... | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 30.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
4 rows × 84 columns
datas_query["Age"].max()==datas_query["Age"].min()##因为这里查询的都是为30的,所以相等为True
6 Pandas语句查询与Mysql语句查询对比
假设想查询模拟数据中,参与调查的中国程序员的平均年龄。
1利用pandas语句
df.groupby("Country")["Age"].mean()["China"] #按照国家进行分组,然后计算 中国china组内的所有人的 年龄的平均值 #输出 29.5
df.groupby("Country")["Age"].mean()["United States"]
#输出 33.04545454545455
2下面利用mysql语句(生成DataFrame格式,方便在Pandas中做进一步操作)
china_mean = pd.read_sql_query("Select avg(Age),Country from sample_table group by Country having country = 'China'",engine)
china_mean ##DataFrame格式
avg(Age)Country029.5China
all_mean = pd.read_sql_query("Select avg(Age),Country from sample_table group by Country ",engine) print(type(all_mean)) print(type(china_mean)) all_mean
<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.frame.DataFrame'>
avg(Age) Country 0 22.000000 Antigua and Barbuda 1 31.500000 Argentina 2 26.500000 Australia 3 35.000000 Austria 4 25.000000 Azerbaijan 5 30.000000 Bangladesh 6 24.500000 Bosnia and Herzegovina 7 33.500000 Brazil 8 29.666667 Canada 9 29.500000 China 10 22.000000 Colombia 11 34.500000 Czech Republic 12 23.000000 Denmark 13 31.000000 France 14 31.600000 Germany 15 23.700000 India 16 24.000000 Ireland 17 23.000000 Israel 18 42.000000 Italy 19 38.000000 Lithuania 20 NaN Malaysia 21 19.000000 Netherlands 22 29.333333 New Zealand 23 26.000000 Pakistan 24 21.000000 Philippines 25 25.000000 Poland 26 20.000000 Russian Federation 27 34.000000 Serbia 28 22.000000 South Africa 29 27.000000 Spain 30 21.000000 Sri Lanka 31 37.000000 Sweden 32 23.000000 Switzerland 33 28.000000 Thailand 34 NaN Turkey 35 36.000000 Ukraine 36 24.600000 United Kingdom 37 33.045455 United States
使用mysql语句(简单的结果输出)
china_mean_new = engine.execute("select avg(Age),country from sample_table group by Country having Country='China'").fetchall()
china_mean_new
[(29.5, 'China')]
print(type(china_mean_new))
<class 'list'>

浙公网安备 33010602011771号