1030 - Entity relationship cleanup and partial data supplementation

Entity relationship cleanup

Entity relationships

 

Open issues

 

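Partial data supplementation

Poet avatar links

For the later visualization, showing each poet's avatar would make the display more vivid, but avatars were not crawled when the poet data was first collected.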

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent header sent with every request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
origin_url = 'https://www.xungushici.com'
# Placeholder avatar used for poets that have no image on the site
default_src = "http://www.huihua8.com/uploads/allimg/20190802kkk01/1531722472-EPucovIBNQ.jpg"
pom_list = []
k = 1
for i in range(1, 2010):
    url = origin_url + '/authors/p-' + str(i)
    r = requests.get(url, headers=headers)
    html = r.content.decode('utf-8')
    soup = BeautifulSoup(html, 'html.parser')

    hed = soup.find('div', class_='col col-sm-12 col-lg-9')
    cards = hed.find_all('div', class_="card mt-3")

    for it in cards:
        print(k)
        record = {}
        # Poet name: the second link inside the card title
        title = it.find('h4', class_='card-title')
        poemauthor = title.find_all('a')[1].text
        # Avatar link, falling back to the placeholder when the card has none
        avatar = it.find('a', class_='ml-2 d-none d-md-block')
        if avatar is not None:
            src = avatar.img['src']
        else:
            src = default_src
        print(src + poemauthor)
        record['author'] = poemauthor
        record['src'] = src
        pom_list.append(record)
        k = k + 1


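import xlwt

xl = xlwt.Workbook()
# Add a worksheet via the workbook's add_sheet method
sheet1 = xl.add_sheet('sheet1', cell_overwrite_ok=True)

# Header row
sheet1.write(0, 0, "author")
sheet1.write(0, 1, 'src')

# One row per poet, starting below the header
for i in range(0, len(pom_list)):
    sheet1.write(i + 1, 0, pom_list[i]['author'])
    sheet1.write(i + 1, 1, pom_list[i]['src'])

# xlwt writes the legacy .xls format, so the extension should match
xl.save("src.xls")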
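
Since xlwt produces the legacy .xls format, a plain CSV export is one possible alternative when the data only needs to be loaded again later. The following is a minimal sketch using only the standard library; the function name `write_avatar_csv` and the sample row are made up for illustration and are not part of the original script.

```python
import csv
import io


def write_avatar_csv(rows, fileobj):
    """Write author/src records as CSV, with a header line first."""
    writer = csv.DictWriter(fileobj, fieldnames=["author", "src"])
    writer.writeheader()
    writer.writerows(rows)


# Illustrative sample in the same shape as pom_list (hypothetical URL)
sample = [{"author": "李白", "src": "https://example.com/libai.jpg"}]

# An in-memory buffer keeps the example self-contained
buf = io.StringIO()
write_avatar_csv(sample, buf)
csv_text = buf.getvalue()
```

To write a real file, the same function could be called with `open("src.csv", "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding helps Excel recognize the Chinese names as UTF-8.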

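Generated data

For poets without an avatar, I substituted a placeholder image found online. The resulting data turns out to follow a fairly regular pattern.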
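
To check how many poets ended up with the placeholder, the collected records can be split by their `src` value. This is only a sketch: the helper name `split_by_avatar` and the sample records are assumptions for illustration, and the placeholder URL is the one used in the script above.

```python
# Placeholder avatar URL taken from the crawling script
DEFAULT_AVATAR = "http://www.huihua8.com/uploads/allimg/20190802kkk01/1531722472-EPucovIBNQ.jpg"


def split_by_avatar(records, placeholder=DEFAULT_AVATAR):
    """Return (records with a real avatar, records using the placeholder)."""
    with_avatar = [r for r in records if r["src"] != placeholder]
    without_avatar = [r for r in records if r["src"] == placeholder]
    return with_avatar, without_avatar


# Illustrative records in the same shape as pom_list (hypothetical data)
sample = [
    {"author": "李白", "src": "https://example.com/libai.jpg"},
    {"author": "佚名", "src": DEFAULT_AVATAR},
]

real, fallback = split_by_avatar(sample)
print(len(real), "with avatar,", len(fallback), "using the placeholder")
```

Keeping the placeholder in a single constant also makes it easy to swap in a different default image later.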

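Tasks for tomorrow

1. Finish crawling the poets' friends

2. Derive the places along each poet's trajectory from their biography

3. Update the graph database with the attribute information above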

 

posted @ 2021-10-30 20:17  清风紫雪