20192307 2019-2020-2 "Python Programming" Lab 4 Report

Course: Python Programming
Class: 1923
Name: Chang Wanli (常万里)
Student ID: 20192307
Instructor: Wang Zhiqiang (王志强)
Lab date: June 10, 2020
Required/Elective: public elective

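1. Lab content

Comprehensive Python application: crawlers, data processing, visualization, machine learning, neural networks, games, network security, and so on.
Of these, I chose to write a crawler program that visualizes the crawled data and generates an HTML file.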

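2. Lab procedure and results

2.1 Choosing and writing the crawler program

The data for this lab was obtained by crawling DXY (丁香园).
First, import the libraries that will be used.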

"""
File name:   Python语言基础实践10
Description: Lab 4
Author:      20192307
Date:        2020/06/01
"""
import json
import matplotlib.pyplot as plt
import requests
from matplotlib import ticker
from re import search, S
from json import loads, dump
from requests import get
import datetime
from pyecharts.charts import Map
from pyecharts import options as opts

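The daily new-case history is fetched and plotted with matplotlib first (the request below goes to a Tencent News endpoint). The DXY real-time statistics are then crawled and saved as a JSON file named after the current date; the code writes this file to the current working directory.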

url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_other'
html = requests.get(url)
message = json.loads(html.text)
mes = json.loads(message['data'])
mes_dict = mes["dailyNewAddHistory"]
date = []
country = []
hubei = []
nothubei = []
n = 0
for d in mes_dict:
    date.append(d['date'])
    country.append(d['country'])
    hubei.append(d['hubei'])
    nothubei.append(d['notHubei'])
    n = n + 1
    if n > 40:
        break
x = date
y1 = country
y2 = hubei
y3 = nothubei
plt.figure(figsize=(20, 10))
plt.title(
    "Chart of the number of newly confirmed cases per day in February 2020")
plt.xlabel('Date')
plt.ylabel('Number of newly confirmed cases')
plt.bar(x, y2, facecolor='pink', edgecolor='white', label='Hubei')
plt.bar(x, y3, facecolor='#ff9999', edgecolor='white', label='notHubei')
plt.gca().xaxis.set_major_locator(ticker.MultipleLocator(10))
plt.annotate(r"$add\ clinically\ diagnosed\ cases$",
             xy=('02.12', 15153),
             xycoords='data',
             xytext=(+30, -100),
             textcoords='offset points',
             arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"))
# Label each date with the nationwide count; separate loop names avoid shadowing x
for xi, yi in zip(x, y1):
    plt.text(xi, yi + 1, str(yi), ha='left')
plt.legend()
plt.show()

today = datetime.date.today().strftime('%Y%m%d')


def crawl_dxy_data():
    # requests.get() sends the request to the target site
    response = get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
    # Print the status code
    print(response.status_code)
    try:
        url_text = response.content.decode()
        url_content = search(r'window.getAreaStat = (.*?)}]}catch', url_text,
                             S)
        texts = url_content.group()  # the whole text matched by the regex
        content = texts.replace('window.getAreaStat = ',
                                '').replace('}catch', '')  # strip the extra characters
        json_data = loads(content)
        with open(today + '.json', 'w', encoding='UTF-8') as f:
            dump(json_data, f, ensure_ascii=False)
    except Exception:
        print('<Response [%s]>' % response.status_code)


def crawl_statistics_data():

    with open(today + '.json', 'r', encoding='UTF-8') as file:
        json_array = loads(file.read())

    statistics_data = {}
    for province in json_array:
        response = get(province['statisticsData'])
        try:
            statistics_data[province['provinceShortName']] = loads(
                response.content.decode())['data']
        except Exception:
            print('<Response [%s]> for url: [%s]' %
                  (response.status_code, province['statisticsData']))
    with open("statistics_data.json", "w", encoding='UTF-8') as f:
        dump(statistics_data, f, ensure_ascii=False)
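The plotting code near the top of this listing decodes the response twice: the outer body is JSON, and its 'data' field is itself a JSON string that holds 'dailyNewAddHistory'. The sketch below shows that double decoding offline. The sample payload, its numbers, and the helper name parse_daily_history are made up for illustration; only the key names come from the code above.

```python
import json

# Made-up sample with the same shape the plotting code expects:
# the outer object is JSON, and its "data" field is a JSON *string*.
inner = {
    "dailyNewAddHistory": [
        {"date": "01.20", "country": 10, "hubei": 7, "notHubei": 3},
        {"date": "01.21", "country": 25, "hubei": 20, "notHubei": 5},
        {"date": "01.22", "country": 40, "hubei": 30, "notHubei": 10},
    ]
}
sample_response_text = json.dumps({"ret": 0, "data": json.dumps(inner)})


def parse_daily_history(response_text, limit=41):
    """Decode the outer JSON, then decode the string stored under 'data'."""
    outer = json.loads(response_text)
    payload = json.loads(outer["data"])  # second decode: 'data' is a string
    # Slicing keeps at most `limit` rows, like the `n > 40` counter in the loop
    rows = payload["dailyNewAddHistory"][:limit]
    return {
        "date": [row["date"] for row in rows],
        "country": [row["country"] for row in rows],
        "hubei": [row["hubei"] for row in rows],
        "notHubei": [row["notHubei"] for row in rows],
    }


series = parse_daily_history(sample_response_text)
print(series["date"])
print(series["country"])
```

Slicing the list replaces the manual counter; the four lists can then be passed to plt.bar in the same way as date, hubei and nothubei.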

Next comes the main program.

if __name__ == '__main__':
    crawl_dxy_data()
    crawl_statistics_data()
today = datetime.date.today().strftime('%Y%m%d')
datafile = today + '.json'
with open(datafile, 'r', encoding='UTF-8') as file:
    json_array = loads(file.read())
china_data = []
for province in json_array:
    china_data.append(
        (province['provinceShortName'], province['confirmedCount']))
china_data = sorted(china_data, key=lambda x: x[1], reverse=True)
# reverse=True sorts in descending order; without it the order is ascending
print(china_data)
pieces = [
    {
        'min': 10000,
        'color': '#540d0d'
    },
    {
        'max': 9999,
        'min': 1000,
        'color': '#9c1414'
    },
    {
        'max': 999,
        'min': 500,
        'color': '#d92727'
    },
    {
        'max': 499,
        'min': 100,
        'color': '#ed3232'
    },
    {
        'max': 99,
        'min': 10,
        'color': '#f27777'
    },
    {
        'max': 9,
        'min': 1,
        'color': '#f7adad'
    },
    {
        'max': 0,
        'color': '#f7e4e4'
    },
]
labels = [data[0] for data in china_data]
counts = [data[1] for data in china_data]
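To check the thresholds in pieces before drawing the map, one option is to look up which colour a given count would fall into. This is only a sketch: the helper color_for and the sample pairs are hypothetical, and it assumes both 'min' and 'max' are inclusive bounds.

```python
# Same thresholds as the `pieces` list above.
pieces = [
    {'min': 10000, 'color': '#540d0d'},
    {'max': 9999, 'min': 1000, 'color': '#9c1414'},
    {'max': 999, 'min': 500, 'color': '#d92727'},
    {'max': 499, 'min': 100, 'color': '#ed3232'},
    {'max': 99, 'min': 10, 'color': '#f27777'},
    {'max': 9, 'min': 1, 'color': '#f7adad'},
    {'max': 0, 'color': '#f7e4e4'},
]


def color_for(count, pieces):
    """Return the colour of the first piece whose range contains count.

    Assumption: 'min' and 'max' are both inclusive; a missing bound is open.
    """
    for piece in pieces:
        low = piece.get('min', float('-inf'))
        high = piece.get('max', float('inf'))
        if low <= count <= high:
            return piece['color']
    return None


# Made-up (name, count) pairs standing in for china_data.
sample = [("B", 0), ("A", 12000), ("C", 500), ("D", 9)]
ranked = sorted(sample, key=lambda item: item[1], reverse=True)
colored = [(name, count, color_for(count, pieces)) for name, count in ranked]
for row in colored:
    print(row)
```

Because the ranges do not overlap, the order of the entries in pieces does not affect the result here.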

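Echarts is a data visualization tool open-sourced by Baidu. Its good interactivity and carefully designed charts have won it recognition from many developers. Python is an expressive language that is well suited to data processing. The map below is drawn with pyecharts, which generates Echarts charts from Python.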

m = Map()
# Series name "累计确诊" means "cumulative confirmed cases"
m.add("累计确诊", [list(z) for z in zip(labels, counts)], 'china')
# Series options: element style, text style, label style, point and line style, etc.
m.set_series_opts(label_opts=opts.LabelOpts(font_size=12), is_show=False)
# Global options: title, animation, axes, legend, etc.
m.set_global_opts(
    title_opts=opts.TitleOpts(title='全国实时确诊数据', subtitle='数据来源：丁香园'),
    legend_opts=opts.LegendOpts(is_show=False),
    visualmap_opts=opts.VisualMapOpts(
        pieces=pieces,
        is_piecewise=True,  # use a piecewise (segmented) visual map
        is_show=True))  # show the visual-map component
# render() writes a local HTML file. By default it creates render.html in the
# current directory; a path can also be passed, e.g. m.render("mycharts.html")
m.render(path='全国实时确诊数据.html')

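2.2 Results

Screenshot of the program output:

The generated web page:

The code was committed and pushed to the Gitee repository.
The Gitee repository address is "https://gitee.com/python_programming/chang_wanli"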

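3. Problems encountered and how they were solved

Problem 1: The first attempts to download the requests and pyecharts libraries kept failing.
Solution to problem 1: Downloading through the Tsinghua mirror site made the downloads much faster and much more likely to succeed.
Problem 2: I ran into trouble when formatting the code.
Solution to problem 2: I found the relevant chapters in Python Crash Course (《Python编程：从入门到实践》), studied them again, and then rewrote the code that deals with formatting and with arrays, tuples, sequences and sets.
Problem 3: My regular expression did not work.
Solution to problem 3: I found the correct regular expression syntax on the Runoob tutorial site (菜鸟教程) and rewrote the code once I was familiar with it.
Problem 4: Crawling the website did not succeed.
Solution to problem 4: I found the correct way to write a web crawler on the Runoob tutorial site (菜鸟教程) and rewrote the code once I was familiar with it.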
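For problem 3, the sketch below shows one way to pull a JSON array out of a script tag with a non-greedy pattern and the re.S flag. It is a variation on the pattern in crawl_dxy_data, not the same code: it captures the array in a group instead of stripping the surrounding text with replace. The HTML fragment, the name extract_area_stat and the sample values are made up for illustration.

```python
import json
import re

# Made-up HTML fragment imitating the script tag that the crawler targets.
sample_html = (
    '<script id="getAreaStat">try { window.getAreaStat = '
    '[{"provinceShortName": "A", "confirmedCount": 3,\n'
    ' "cities": [{"cityName": "a1", "confirmedCount": 3}]},\n'
    ' {"provinceShortName": "B", "confirmedCount": 0, "cities": []}]'
    '}catch(e){}</script>'
)

# Non-greedy capture of the array, anchored on the literal "}catch" after it.
PATTERN = r'window\.getAreaStat = (\[.*?\])\}catch'


def extract_area_stat(html):
    """Return the embedded JSON array as Python objects, or None if absent."""
    # re.S lets "." match newlines, which the embedded JSON may contain
    match = re.search(PATTERN, html, re.S)
    if match is None:
        return None
    return json.loads(match.group(1))


provinces = extract_area_stat(sample_html)
print([p["provinceShortName"] for p in provinces])
```

Checking for None before calling group() separates "pattern not found" from other failures, which a broad except clause would otherwise hide.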

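4. Reflections

I ran into many problems during this lab. Some came from gaps in my knowledge and some from carelessness, and working through them filled in, refined and broadened my understanding of Python. While fixing these problems I consulted many sources, for example Python Crash Course (《Python编程：从入门到实践》), Learn Python 3 the Hard Way (《[笨办法]学Python3（第三版）》), cnblogs, CSDN, Gitee and the Zhihu app. Writing this crawler also made me more familiar with using and running code in VS Code, and it improved my ability to learn independently. This lays a solid foundation for my further study of Python programming, and I improved step by step through continued exploration.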

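References

《Python编程：从入门到实践》 (Python Crash Course)
《[笨办法]学Python3（第三版）》 (Learn Python 3 the Hard Way)
《Python基础教程（第3版）》 (Beginning Python, 3rd edition)
《Python核心编程（第3版）》 (Core Python Applications Programming, 3rd edition)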

Posted 2020-06-10 23:22 by 20192307 Chang Wanli (常万里)
