Week 4 Assignment

# English word frequency
1.

strSuddenly = '''suddenlywe make our pacts wearing the pendant we dont borrow boyfriends and we do our hair anyway we would like. we figured out that we are attractive and we look around and now we loved to live the single life and then we tell ourselves we'll never fall in love again but then he comes around and suddenly we understand that we've never been living in love before and suddenly you know what all the love songs that they write are all about and suddenly you dont care if its right or swrong as long as he's around and suddenly the things that used to sound clishe are perfectly right in your eyes perfectly right with this sky i know its shrewd but we are connected and in some strange and crazy way i think ? that we have always been and now he's here and he says he loves me and it feels so right and i could feel so good that i cant sleep at night but i just told myself i will not fall in love again but he just came around and then he made me understand that i have never been living in love before and suddenly you know what all the love songs that they write are all about and suddenly you dont care if its right or wrong as long as he's around and suddenly the things that used to sound clishe are perfectly right in my eyes perfectly right when he's here and yes i know you might get impatient but look around he might be walking right in front of you and if he touches you and you feel your skin is burning kisses you and you feel your stomach turning he's the one He is the one and suddenly you know what all the love songs that they write are all about and suddenly you dont care if its right or wrong as long as your baby's around and suddenly the things that used to sound clishe are perfectly right in your ears perfectly right when he's there perfectly right when he's there Perfectly right with this sky there .'''

fo = open('suddenly.txt', 'r', encoding='utf-8')  # open the text file for reading
sudd = fo.read().lower()  # preprocessing: convert uppercase to lowercase
fo.close()  # close the file
print(sudd)  # print the file contents

# String preprocessing
sep = '''. , : ? ; ! ~ ` _ -'''  # punctuation to strip
for ch in sep:
    strSuddenly = strSuddenly.replace(ch, ' ')
strSuddenly = strSuddenly.lower()  # convert uppercase to lowercase and keep the result for the later steps
print(strSuddenly)

Output:
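
The character-by-character replace loop can also be written with a translation table. The following is a minimal sketch, not the method used in this assignment: the helper name `normalize` is hypothetical, and it assumes `string.punctuation` as the set of characters to drop, which (unlike the `sep` string) also removes apostrophes.

```python
import string

def normalize(text):
    """Lowercase text and replace every ASCII punctuation character with a space."""
    # Build a table that maps each punctuation character to a single space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return text.lower().translate(table)

sample = "He is the one. Perfectly right, with this sky!"
print(normalize(sample))
```

Replacing punctuation with spaces rather than deleting it keeps words that were separated only by a punctuation mark from being glued together.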

 

2.

2.1

strList = strSuddenly.split()  # split into words; split() with no argument skips the empty strings left by runs of spaces
print(len(strList),strList)

Output:
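
Splitting on a single space and splitting on runs of whitespace behave differently once punctuation has been replaced by spaces. A small sketch of the difference, using a made-up sample string:

```python
text = "he is  the one "   # note the double space and the trailing space

by_space = text.split(' ')     # cuts at every single space character
by_whitespace = text.split()   # cuts at runs of whitespace, drops leading/trailing ones

print(by_space)        # contains empty strings
print(by_whitespace)   # contains only the words
```

The empty strings produced by `split(' ')` would otherwise be counted as a "word" in the frequency table.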

2.2

strSet = set(strList)     # set of distinct words
print(len(strSet),strSet)

strDict = {}              # word-count dictionary
for word in strSet:
    strDict[word] = strList.count(word)
print(len(strDict),strDict)   # print once, after the dictionary is complete

Output:
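
Calling `list.count` inside a loop rescans the whole word list once per distinct word. As an alternative sketch, `collections.Counter` from the standard library tallies every word in a single pass; the sample sentence here is a shortened excerpt used only for illustration.

```python
from collections import Counter

words = "and suddenly you know and suddenly you dont care".split()

# Counter is a dict subclass that maps each word to its number of occurrences.
counts = Counter(words)
print(counts['suddenly'], counts['and'], counts['care'])
```

The result holds the same word-to-count pairs as the set-plus-`count` approach, so it can be passed to the sorting step unchanged via `counts.items()`.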

2.3

wcList = list(strDict.items())   # list of (word, count) pairs
print(wcList)
wcList.sort(key=lambda x:x[1],reverse=True)   # sort by count, highest first
print(wcList)

Output:
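
Because the dictionary is filled by iterating over a set of strings, words with equal counts may come out in a different order from one run to the next. A minimal sketch of a sort with an explicit tie-break (count descending, then word ascending), using made-up counts:

```python
counts = {'love': 3, 'and': 5, 'sky': 3, 'he': 5}

# Negating the count sorts it in descending order while the word
# itself stays in ascending (alphabetical) order as the tie-break.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

This makes the ranking reproducible, which helps when comparing results between runs.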

2.4

for i in range(20):      # print the top 20
    print(wcList[i])

Output:
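
Indexing with `range(20)` raises `IndexError` when a text has fewer than 20 distinct words. A small sketch of the same loop written with a slice, which simply stops at the end of the list; the `ranked` data is made up for illustration.

```python
ranked = [('and', 5), ('he', 5), ('love', 3)]

shown = ranked[:20]            # a slice never runs past the end of the list
for word, count in shown:
    print(f"{word:<10}{count}")
```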

2.5

strSet = set(strSet)     # exclude function words (pronouns, articles, conjunctions, etc.) that carry little meaning
exclude = {'a','the','and','you','oh'}
strSet = strSet-exclude
print(len(strSet),strSet)

Output:
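
Removing words from the set only changes the set; a ranking that was computed earlier still contains them. One way to make the exclusion affect the ranking is to filter the word list before counting. A minimal sketch, reusing the `exclude` set from this step with a shortened sample sentence:

```python
from collections import Counter

exclude = {'a', 'the', 'and', 'you', 'oh'}
words = "and suddenly you know what all the love songs are about".split()

# Drop the excluded words first, then count what remains.
content_words = [w for w in words if w not in exclude]
top = Counter(content_words).most_common(3)
print(top)
```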

# Chinese word frequency

3.

import jieba
strgu = '''《百年孤独》,是哥伦比亚作家加西亚·马尔克斯的代表作,也是拉丁美洲魔幻现实主义文学的代表作。被誉为“再现拉丁美洲历史社会图景的鸿篇巨著”。
  全书近30万字,内容庞杂,人物众多,情节曲折离奇,再加上神话故事、宗教典故、民间传说以及作家独创的从未来的角度来回忆过去的新颖倒叙手法等等,令人眼花缭乱。但阅毕全书,读者可以领悟, 作家是要通过布恩迪亚家族7代人充满神秘色彩的坎坷经历来反映哥伦比亚乃至拉丁美洲的历史演变和社会现实,要求读者思考造成马孔多百年孤独的原因,从而去寻找摆脱命运捉弄的正确途径。
无论走到哪里,都应该记住,过去都是假的,回忆是一条没有尽头的路,一切以往的春天都不复存在,就连那最坚韧而又狂乱的爱情归根结底也不过是一种转瞬即逝的现实。
即使以为自己的感情已经干涸得无法给予,也总会有一个时刻一样东西能拨动心灵深处的弦;我们毕竟不是生来就享受孤独的。
生命中曾经有过的所有灿烂,原来终究,都需要用寂寞来偿还。
一个幸福晚年的秘决不是别的,而是与孤寂签订一个体面的协定。
我们打了这么多年仗,一切只不过是为了别把我们的房子涂成蓝色。
What matters in life is not what happens to you but what you remember and how you remember it.
生命中真正重要的不是你遭遇了什么,而是你记住了哪些事,又是如何铭记的。
只是觉得人的内心苦楚无法言说,人的很多举措无可奈何,百年一参透,百年一孤寂。'''
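with open('将相和.txt','r',encoding='utf-8') as du:  # open the text file; it is closed automatically
    duText = du.read()   # read the contents before the file is closed
print(duText)      # print the file contents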

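# String preprocessing
sep = ''' ?  。 , ;“ ‘ ” ! ~ ` _ - 《 》 、 .'''  # punctuation to strip
for ch in sep:
    strgu = strgu.replace(ch,' ')   # assign back to strgu so the change is kept
strgu = strgu.lower()   # convert uppercase to lowercase
print(strgu)

strList = [w for w in jieba.lcut(strgu) if w.strip()]  # segment into words with jieba, since Chinese text has no spaces to split on
print(len(strList),strList)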

strSet = set(strList)     # set of distinct words
print(len(strSet),strSet)

strDict = {}              # word-count dictionary
for word in strSet:
    strDict[word] = strList.count(word)
# print(len(strDict),strDict)

wcList = list(strDict.items())   # list of (word, count) pairs
#print(wcList)
wcList.sort(key=lambda x:x[1],reverse=True)   # sort by count, highest first
#print(wcList)

for i in range(5):      # print the top 5
    print(wcList[i])

Output:
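
The Chinese pipeline follows the same steps as the English one, with only the tokenizer swapped. The sketch below packages those steps into one function; the name `top_words` and its parameters are hypothetical, and the tokenizer is passed in as an argument so the demonstration runs without jieba on text that is already segmented by spaces.

```python
from collections import Counter

def top_words(text, tokenize, strip_chars, n=5):
    """Return the n most frequent tokens after replacing strip_chars with spaces."""
    for ch in strip_chars:
        text = text.replace(ch, ' ')
    # Keep only tokens that are not pure whitespace.
    words = [w for w in tokenize(text.lower()) if w.strip()]
    return Counter(words).most_common(n)

# Demonstration with a whitespace tokenizer on pre-segmented text.
demo = "孤独 的 人 , 孤独 的 路 。"
print(top_words(demo, str.split, ",。", n=2))
```

With jieba installed, passing `jieba.lcut` as `tokenize` should segment unspaced text such as `strgu`, for example `top_words(strgu, jieba.lcut, sep)`.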

posted @ 2018-09-29 01:02  MISTanglijuan