Python&MySQL的简单使用(中国大学排名数据处理、csv导入数据库。。。)
目录:
-------------------------------------
一:使用mysql时,cmd笔记
二:如何使用python把csv文件内容导入mysql数据库
三:python作业:任务26,数据库实践
-------------------------------------
一.使用cmd弄mysql时的一些内容(部分)
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation。保留所有权利。
C:\Users\Administrator>D:\
'D:\' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\Administrator>D:/
'D:/' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\Administrator>
C:\Users\Administrator>D:
D:\>cd D:\BtSoft\mysql\MySQL5.5\bin>
The syntax of the command is incorrect.
D:\>cd D:\BtSoft\mysql\MySQL5.5\bin
D:\BtSoft\mysql\MySQL5.5\bin>mysql -hlocalhost -uroot -p
Enter password: ******
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 61
Server version: 5.5.62-log MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use mysql;
Database changed
mysql> INSERT INTO user
    -> (host, user,password,
    -> select_priv,inser_priv,update_priv)
    -> VALUES('localhost','test2',
    -> PASSWORD('123456'),'Y','Y','Y');
ERROR 1054 (42S22): Unknown column 'inser_priv' in 'field list'
mysql> INSERT INTO user
    -> (host, user,password,
    ->  select_priv,insert_priv,update_priv)
    -> VALUES('localhost','test2',
    -> PASSWORD('123456'),'Y','Y','Y');
ERROR 1062 (23000): Duplicate entry 'localhost-test2' for key 'PRIMARY'
mysql> INSERT INTO user
    -> (host, user,password,
    -> select_priv,insert_priv,update_priv)
    ->  VALUES('localhost','test3',
    -> PASSWORD('123456'),'Y','Y','Y');
Query OK, 1 row affected, 3 warnings (0.00 sec)
mysql> SELECT host, user, password FROM user WHERE user = 'test3';
+-----------+-------+-------------------------------------------+
| host      | user  | password                                  |
+-----------+-------+-------------------------------------------+
| localhost | test3 | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+-----------+-------+-------------------------------------------+
1 row in set (0.00 sec)
mysql> show grants
    -> show grants;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'show grants' at line 2
mysql> show grants for root@'localhost';
+----------------------------------------------------------------------------------------------------------------------------------------+
| Grants for root@localhost                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY PASSWORD '*6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9' WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> show grants for test2@'localhost';
+--------------------------------------------------------------------------------------------------------------+
| Grants for test2@localhost                                                                                   |
+--------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'test2'@'localhost' IDENTIFIED BY PASSWORD '*6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9' |
| GRANT ALL PRIVILEGES ON `test2`.* TO 'test2'@'localhost'                                                     |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql>
二.如何使用python把csv文件导入MySQL库

注:这种方法可在同个文件夹下放多个csv文件,这里只用一个文件来弄
csv文件来源:https://www.citibikenyc.com/system-data/

 
1.向数据库daiguoxi_test创建数据表
# -*- coding: utf-8 -*-
import pymysql
# 1.链接数据库
db = pymysql.connect(
      host='127.0.0.1',
      port=3306,
      user='test2',
      passwd='123456',
      db='daiguoxi_test',
      charset='utf8')
# 建立链接游标
cursor = db.cursor()
print ('>> 已连接数据表,处理中...')
# 2.添加数据库表头(创建的字段,不要使用空格)
sql = '''CREATE TABLE IF NOT EXISTS daiguoxi_test (
        `ID` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        `tripduration` CHAR(10),
        `starttime` CHAR(30),
        `stoptime` CHAR(30),
        `start_station_id` CHAR(10),
        `start_station_name` CHAR(60),
        `start_station_latitude` CHAR(20),
        `start_station_longitude` CHAR(20),
        `end_station_id` CHAR(10),
        `end_station_name` CHAR(60),
        `end_station_latitude` CHAR(20),
        `end_station_longitude` CHAR(20),
        `bikeid` CHAR(10),
        `usertype` CHAR(15),
        `birth_year` CHAR(10),
        `gender` CHAR(2)        
        )'''
cursor.execute(sql)
# 3.提交并关闭链接
cursor.close()
db.close()
print ('>> Done.') 
运行结果:
>> 已连接数据表,处理中... >> Done. [Finished in 2.2s]
2. 先读取指定目录的所有CSV文件,然后逐个读取并逐条写入MySQL
# -*- coding: utf-8 -*-
import pymysql,time
import glob,os
import pandas as pd
# 1.准备,指定目录
time_start = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) # 记录当前时间
print('>> 当前时间:',time_start)
print('>> 开始处理……')
filelocation = r"C:/Users/Administrator/Desktop/作业和实验/代码文件/mysql/"
# 2.链接数据库
print('>> 连接MySQL...')
db = pymysql.connect(
      host='127.0.0.1',
      port=3306,
      user='test2',
      passwd='123456',
      db='daiguoxi_test',
      charset='utf8')
# 建立链接游标
cursor = db.cursor()
print ('>> 已连接数据表。')
# 3.查看本地新文件名
filenames=[]
os.chdir(filelocation) #指定目录
for i in glob.glob("*.csv"): # 获取指定目标下所有的CSV文件名
    filenames.append(i[:-4]) # 文件名不包含“.csv”
count = len(filenames)
print('>> 本地文件:',count,'个') # 如下是以“Num.**”为序号打印出每个文件名
for i in range(0,count): # 把0-9的数字用0补齐2位,也可以用zfill函数或者format格式化实现
    if i<9:
        ii = i+1
        ij = '0'+str(ii)
    else:
        ij = i+1
    print(' - Num.', end='')
    print(ij, filenames[i])
# 4.把新文件的数据提交mysql
print('>> 读取中...')
# MySQL语句
insert_sql = 'insert into daiguoxi_test (tripduration,starttime,stoptime,start_station_id,start_station_name,start_station_latitude,start_station_longitude,end_station_id,end_station_name,end_station_latitude,end_station_longitude,bikeid,usertype,birth_year,gender) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
# 开始逐个文件处理
for file_name in filenames:
    print(" + 正在处理:", file_name,'(第',filenames.index(file_name)+1,'个)')
    time_now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())  # 记录处理每个文件的时间
    print(' - 当前时间:', time_now)
    data_csv = pd.read_csv(open(filelocation + file_name+'.csv')) # 使用Pandas读取数据文件
    # print(data_csv.head(3)) # 查看前3条数据
    # print(data_csv.info()) # 查看数据表信息
    # print(len(data_csv.index)) # 查看数据量
    # print(data_csv.loc[2].values) # 查看指定某一行的数据
    ii = 0 # 用于统计每个文件的数据量
    for i in range(0,data_csv.shape[0]): # 逐行读取
        row = data_csv.loc[i].values # 获取第i行数据
        # print(i,'>>:',data_csv.loc[i].values) # 打印第i行数据
        cursor.execute(insert_sql, (str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]), str(row[5]), str(row[6]), str(row[7]),
        str(row[8]), str(row[9]), str(row[10]), str(row[11]), str(row[12]), str(row[13]), str(row[14])))
        ii = i + 1
    print(' - 提交数量:',ii,'条')
# 5.结束
db.commit() # 提交记录
db.close() # 关闭db
cursor.close() # 关闭游标
time_finish = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) # 记录当前时间
print('>> 当前时间:',time_finish)
print('\n',end='')
print('>> Done.') #完毕
运行结果:
>> 当前时间: 2020-05-26 11:51:53 >> 开始处理…… >> 连接MySQL... >> 已连接数据表。 >> 本地文件: 1 个 - Num.01 201501-citibike-tripdata >> 读取中... + 正在处理: 201501-citibike-tripdata (第 1 个) - 当前时间: 2020-05-26 11:51:53 - 提交数量: 285552 条 >> 当前时间: 2020-05-26 11:56:12 >> Done. [Finished in 259.7s]
3.在SQLyog上查看MySQL中的daiguoxi_test数据库中的数据

三.Python作业&数据库实践
1.把中国大学排名的csv表存入MySQL数据库
向数据库daiguoxi_40创建数据表
# -*- coding: utf-8 -*-
import pymysql
 
# 1.链接数据库
db = pymysql.connect(
      host='127.0.0.1',
      port=3306,
      user='test4',
      passwd='123456',
      db='daiguoxi_40',
      charset='utf8')
# 建立链接游标
cursor = db.cursor()
print ('>> 已连接数据表,处理中...')
 
# 2.添加数据库表头(创建的字段,不要使用空格)
sql = '''CREATE TABLE IF NOT EXISTS daiguoxi_40 (
        `ID` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        `排名` CHAR(100),
        `学校名称` CHAR(100),
        `省份` CHAR(100),
        `总分` CHAR(100),
        `生源质量` CHAR(100),
        `培养结果` CHAR(100),
        `科研规模` CHAR(100),
        `科研质量` CHAR(100),
        `顶尖成果` CHAR(100),
        `顶尖人才` CHAR(100),
        `科技服务` CHAR(100),
        `产学研究合作` CHAR(100),
        `成果转化` CHAR(100),
        `学生国际化` CHAR(100)      
        )'''
cursor.execute(sql)
 
# 3.提交并关闭链接
cursor.close()
db.close()
print ('>> Done.')
先读取指定目录的所有CSV文件,然后逐个读取并逐条写入MySQL
# -*- coding: utf-8 -*-
import pymysql,time
import glob,os
import pandas as pd
 
# 1.准备,指定目录
time_start = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) # 记录当前时间
print('>> 当前时间:',time_start)
print('>> 开始处理……')
filelocation = r"C:/Users/Administrator/Desktop/作业和实验/代码文件/mysql2/"
 
# 2.链接数据库
print('>> 连接MySQL...')
db = pymysql.connect(
      host='127.0.0.1',
      port=3306,
      user='test4',
      passwd='123456',
      db='daiguoxi_40',
      charset='utf8')
# 建立链接游标
cursor = db.cursor()
print ('>> 已连接数据表。')
 
# 3.查看本地新文件名
filenames=[]
os.chdir(filelocation) #指定目录
for i in glob.glob("*.csv"): # 获取指定目标下所有的CSV文件名
    filenames.append(i[:-4]) # 文件名不包含“.csv”
count = len(filenames)
print('>> 本地文件:',count,'个') # 如下是以“Num.**”为序号打印出每个文件名
for i in range(0,count): # 把0-9的数字用0补齐2位,也可以用zfill函数或者format格式化实现
    if i<9:
        ii = i+1
        ij = '0'+str(ii)
    else:
        ij = i+1
    print(' - Num.', end='')
    print(ij, filenames[i])
 
# 4.把新文件的数据提交mysql
print('>> 读取中...')
# MySQL语句
insert_sql = 'insert into daiguoxi_40 (排名,学校名称,省份,总分,生源质量,培养结果,科研规模,科研质量,顶尖成果,顶尖人才,科技服务,产学研究合作,成果转化,学生国际化) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
# 开始逐个文件处理
for file_name in filenames:
    print(" + 正在处理:", file_name,'(第',filenames.index(file_name)+1,'个)')
    time_now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())  # 记录处理每个文件的时间
    print(' - 当前时间:', time_now)
    data_csv = pd.read_csv(open(filelocation + file_name+'.csv')) # 使用Pandas读取数据文件
    # print(data_csv.head(3)) # 查看前3条数据
    # print(data_csv.info()) # 查看数据表信息
    # print(len(data_csv.index)) # 查看数据量
    # print(data_csv.loc[2].values) # 查看指定某一行的数据
    ii = 0 # 用于统计每个文件的数据量
    for i in range(0,data_csv.shape[0]): # 逐行读取
        row = data_csv.loc[i].values # 获取第i行数据
        # print(i,'>>:',data_csv.loc[i].values) # 打印第i行数据
        cursor.execute(insert_sql, (str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]), str(row[5]), str(row[6]), str(row[7]),
        str(row[8]), str(row[9]), str(row[10]), str(row[11]), str(row[12]), str(row[13])))
        ii = i + 1
    print(' - 提交数量:',ii,'条')
 
# 5.结束
db.commit() # 提交记录
db.close() # 关闭db
cursor.close() # 关闭游标
time_finish = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) # 记录当前时间
print('>> 当前时间:',time_finish)
print('\n',end='')
print('>> Done.') #完毕
运行结果
>> 当前时间: 2020-05-27 14:32:31 >> 开始处理…… >> 连接MySQL... >> 已连接数据表。 >> 本地文件: 1 个 - Num.01 dgx40 >> 读取中... + 正在处理: dgx40 (第 1 个) - 当前时间: 2020-05-27 14:32:31 - 提交数量: 500 条 >> 当前时间: 2020-05-27 14:32:32 >> Done. [Finished in 1.3s]
在SQLyog上查看MySQL中的daiguoxi_40库的导入结果

2.查询广东技术师范大学的排名和得分
import csv
import os
import requests
import pandas
from bs4 import BeautifulSoup
allUniv = []
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except:
        return ""
def fillUnivList(soup):
    data = soup.find_all('tr')
    for tr in data:
        ltd = tr.find_all('td')
        if len(ltd)==0:
            continue
        singleUniv = []
        for td in ltd:
            singleUniv.append(td.string)
        allUniv.append(singleUniv)
def findUnivData(num):
    a="广东技术师范大学"
    print("{0:^10}\t{1:{3}^10}\t{2:^10}".format("排名","学校名称","得分",chr(12288)))
    for i in range(num):
        u=allUniv[i]
        if a in u:
            print("{0:^10}\t{1:{3}^10}\t{2:^10}".format(u[0],u[1],eval(u[4]),chr(12288)))
def main():
    url = 'http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html'
    html = getHTMLText(url)
    soup = BeautifulSoup(html, "html.parser")
    fillUnivList(soup)
    num=len(allUniv)
    findUnivData(num)
main()
运行结果
排名 学校名称 得分 287 广东技术师范大学 37.7 [Finished in 2.1s]
3.查询广东省大学的排名和得分
import csv
import os
import requests
import pandas
from bs4 import BeautifulSoup
allUniv = []
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except:
        return ""
def fillUnivList(soup):
    data = soup.find_all('tr')
    for tr in data:
        ltd = tr.find_all('td')
        if len(ltd)==0:
            continue
        singleUniv = []
        for td in ltd:
            singleUniv.append(td.string)
        allUniv.append(singleUniv)
def findUnivData(num):
    a="广东"
    print("{0:^10}\t{1:{3}^10}\t{2:^10}".format("排名","学校名称","得分",chr(12288)))
    for i in range(num):
        u=allUniv[i]
        if a in u:
            print("{0:^10}\t{1:{3}^10}\t{2:^10}".format(u[0],u[1],eval(u[4]),chr(12288)))
def main():
    url = 'http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html'
    html = getHTMLText(url)
    soup = BeautifulSoup(html, "html.parser")
    fillUnivList(soup)
    num=len(allUniv)
    findUnivData(num)
main()
运行结果
    排名    	   学校名称   	    得分    
    9     	   中山大学   	   79.7   
    20    	  华南理工大学  	   73.4   
    35    	  南方科技大学  	   77.0   
    51    	   暨南大学   	   68.1   
    58    	   深圳大学   	   59.3   
    85    	  华南师范大学  	   60.4   
    95    	   汕头大学   	   58.4   
   108    	 广东外语外贸大学 	   58.2   
   111    	  广东工业大学  	   49.9   
   122    	  华南农业大学  	   51.3   
   139    	   广州大学   	   49.1   
   250    	  东莞理工学院  	   37.2   
   284    	 佛山科学技术学院 	   34.9   
   287    	 广东技术师范大学 	   37.7   
   305    	  广东金融学院  	   40.3   
   324    	 广东石油化工学院 	   23.0   
   336    	 广东第二师范学院 	   35.5   
   336    	   五邑大学   	   31.2   
   348    	  广东海洋大学  	   34.9   
   372    	   韶关学院   	   25.8   
   391    	  广东财经大学  	   44.0   
   391    	  岭南师范学院  	   28.7   
   401    	 仲恺农业工程学院 	   32.2   
   419    	   惠州学院   	   31.7   
   422    	   肇庆学院   	   29.0   
   473    	  韩山师范学院  	   24.6   
   485    	   嘉应学院   	   25.0   
[Finished in 10.2s]



 
                
            
         
         浙公网安备 33010602011771号
浙公网安备 33010602011771号