Python Json分别存入Mysql、MongoDB数据库,使用Xlwings库转成Excel表格

前情提要:将 [第5天] Python 爬虫基础 - 小能日记 (cnblogs.com) 的电影数据 data.json 数据通过xlwings库转换成excel表格,存入mysql,mongodb数据库中。

学习网站:w3school  92python  runoob

数据下载:2020_3/data.json

xlwings文档: xlwings中文文档

MongoDB资料:2020_3/MongoDB 实战.pdf

总共用时:2小时 (代码在最后面)

学习内容:python基础语法、xlwings库、mysql库、pymongo库、mongoDB数据库复习


Excel表格

MYSQL

MongoDB


踩过的坑

1、‘gbk’ codec can’t decode byte 0xae

open(path+'/5_json/data.json',encoding = "utf-8")

2、python 打开文件,保存文件时相对路径报错

import os

# ^ 获取当前py脚本文件夹路径
path = os.path.dirname(__file__)

# path + '/test.txt'  合并为绝对路径

3、python中with...as的用法

晚点更


4、python中list与string的转换

(6条消息) python中list与string的转换_bufengzj的博客-CSDN博客_python str转list

晚点更


5、python mysql插入null数据

None if i['release'] == "" else i['release'],

将想要存储为Null的值填为None


6、python 集合里不能放列表,可以放元组


我的代码

# pip install xlwings -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install pymongo -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install mysql-connector -i https://pypi.tuna.tsinghua.edu.cn/simple
import json
import xlwings as xw
import os
import mysql.connector
import pymongo
import sys
import requests

# ^ 获取当前py脚本文件夹路径
path = os.path.dirname(__file__)

# ^ 阿里云OSS读取数据集文件
content = requests.get("https://xiaonenglife.oss-cn-hangzhou.aliyuncs.com/static/cnblogs/2020_3/films_data.json")

if content.status_code != 200:
    print("文件加载不成功")
    sys.exit()  # ^ 退出程序

f = open(path+'/data.json', 'w', encoding="utf-8")
f.write(content.text)
f.close()

# ^ 加载json文件
# WARN 'gbk' codec can't decode byte 0xae in position
file = open(path+'/data.json', 'r', encoding="utf-8")
# ^ json转换为字典数组
films = json.loads(file.read())

print(file)
filmArr = []
for i in films:
    filmArr.append([  # ^ 不用 list() 因为给定了八个参数,list只要一个
        i['chineseName'],
        i['foreignName'],
        '、'.join(list(str(s) for s in i['tags'])),
        i['address'],
        i['time'],
        i['release'],
        i['desc'],
        i['score']
    ])

# ^ excel
wb = xw.Book()
sht = wb.sheets['sheet1']
sht.range('A1').value = ['中文名', '外语名', '类别', '出版地', '时长', '发行日期', '介绍', '评分']
sht.range('A2').value = filmArr
wb.save(path+'/优秀电影.xlsx')

# ^ mysql
sqlDB = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="sql2008",
    auth_plugin='mysql_native_password' # ^ 验证方式必须要有
)

mycursor = sqlDB.cursor()

mycursor.execute("drop DATABASE if exists test")
mycursor.execute("CREATE DATABASE test")

sqlDB = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="sql2008",
    database="test",
    auth_plugin='mysql_native_password'
)

mycursor = sqlDB.cursor()
mycursor.execute("drop table if exists films")
mycursor.execute("""
                 CREATE TABLE films(
                 id INT AUTO_INCREMENT PRIMARY KEY,
                 chinesename VARCHAR(255),
                 foreignName VARCHAR(255),
                 tags json,
                 address VARCHAR(255),
                 time decimal(5,1),
                 releasetime date,
                 description mediumtext,
                 score decimal(3,1)
                 )
                 """)

sql = "INSERT INTO films (chinesename, foreignName,tags,address,time,releasetime,description,score) VALUES (%s, %s,%s,%s,%s,%s,%s,%s)"
filmArr = [] # ^ 里面放元组
for i in films:
    filmArr.append((
        i['chineseName'],
        i['foreignName'],
        json.dumps(i['tags']),
        i['address'],
        i['time'],
        None if i['release'] == "" else i['release'],
        i['desc'],
        i['score']
    ))

mycursor.executemany(sql, filmArr)  # ^ 执行多条

sqlDB.commit()  # ^ 修改数据库的时候必须加

print(mycursor.rowcount, "条数据被插入")

# ^ mongoDB

client = pymongo.MongoClient('mongodb://localhost:27017')

mongoDB = client['test']  # ^ 数据库

col = mongoDB['films']  # ^ 集合

col.drop() # ^ 删除集合

filmArr = []
for i in films:
    filmArr.append(dict(
        chineseName=i['chineseName'],
        foreignName=i['foreignName'],
        tags=i['tags'],
        address=i['address'],
        time=float(i['time']),
        release=i['release'],
        desc=i['desc'],
        score=float(i['score'])
    ))

x = col.insert_many(filmArr)
# print(x.inserted_ids)

x = col.find_one()
print(x)

for x in col.find({}, {'chineseName': 1, 'time': 1, 'score': 1}).sort('score'):
    print(x)

print('----------------------')

for x in col.find({}, {'chineseName': 1, 'time': 1, 'score': 1}).limit(5).sort('score', -1):
    print(x)

print('----------------------')

# ^ 查找评分大于等于9.0且时长小于90分钟的电影
query = {'score': {"$gte": 9.0}, 'time': {"$lt": 90}}
for x in col.find(query, {'chineseName': 1, 'time': 1, 'score': 1}).sort('score'):
    print(x)

点赞是一种积极的生活态度,喵喵喵!(疯狂暗示)

posted @ 2022-03-14 17:14  小能日记  阅读(267)  评论(0编辑  收藏  举报