Python: counting the most frequent place names in the Complete Tang Poems (全唐诗)

# -*- coding: utf-8 -*-
import re

import jieba

# Read the full text of the poems and segment it into words
with open("tangshi.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)

# Read the place names
with open("places.txt", "r", encoding="utf-8") as f:
    place = f.read()

# Split on spaces and newlines and drop empty strings; a set keeps lookups fast
places = {p for p in re.split(r'[\n ]', place) if p}

# Count every segmented word that is a known place name
counts = {}
for word in words:
    if word in places:
        counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending
for word, count in items:
    print("{0:<5}{1:>5}".format(word, count))
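
The script above counts a place name only when jieba emits it as a separate token, so a name that gets split or merged into a longer word is missed. As a rough alternative, here is a minimal sketch that counts direct substring occurrences with `str.count` instead. The helper name `count_places` and the sample text are made up for illustration and are not part of the original script.

```python
from collections import Counter


def count_places(text, places):
    """Count how often each place name occurs in text as a substring."""
    counts = Counter()
    for name in places:
        n = text.count(name)  # non-overlapping occurrences of this name
        if n:
            counts[name] = n
    return counts


# Tiny made-up sample, not taken from tangshi.txt
sample = "金陵城上西楼，广陵花盛，金陵酒肆"
sample_places = {"金陵", "广陵", "姑苏"}
for name, n in count_places(sample, sample_places).most_common():
    print("{0:<5}{1:>5}".format(name, n))
```

One caveat of this approach: a short name such as 吴 also matches inside longer names such as 吴都 or 吴中, so the counts for short names are inflated unless the longer matches are subtracted.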

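I do not yet know a proper way to count the place names in the Complete Tang Poems, so I searched online and in the end fell back on plain word-frequency counting.
The list of place names I compiled is attached below.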

place.txt

建业 建康 金陵 石头城
蓟城 燕都 燕京 涿郡 幽州 南京 中都 大都 京师 顺天府 北平
西安
姑苏 吴 吴都 吴中 东吴 吴门 平江 长洲
维扬 江都 广陵
临安
荥阳 管州 登封
歙县 徽州
庐州 庐阳 合淝
松江府
南海郡 百越 羊城
豫章 洪都
榕城 三山 东越 左海
会稽

#https://m.yxlady.com/jingyan/246238.shtml
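
Each line of the list above appears to group several historical names, so it may be useful to merge the counts of all names on a line into one total. The following is a minimal sketch under that assumption: it treats the first name on each line as the key for the whole line. The helper names `build_alias_map` and `merge_counts` and the sample counts are hypothetical.

```python
def build_alias_map(place_text):
    """Map every name on a line to the first name on that line."""
    alias_to_first = {}
    for line in place_text.splitlines():
        names = line.split()
        if not names or names[0].startswith("#"):
            continue  # skip blank lines and the source-URL comment line
        for name in names:
            # Keep the first mapping if a name appears on several lines
            alias_to_first.setdefault(name, names[0])
    return alias_to_first


def merge_counts(counts, alias_to_first):
    """Add up the counts of all names that share the same first name."""
    merged = {}
    for name, n in counts.items():
        key = alias_to_first.get(name, name)
        merged[key] = merged.get(key, 0) + n
    return merged


place_text = "建业 建康 金陵 石头城\n维扬 江都 广陵\n"
counts = {"金陵": 3, "建康": 1, "广陵": 2}  # made-up counts for illustration
print(merge_counts(counts, build_alias_map(place_text)))
```

Here `counts` stands in for the dictionary produced by the word-frequency script; names that are not in the list keep their own key.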

Posted on 2021-04-25 16:57 by Zycc++
