"Hard To Get" Lyrics Analysis
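Import the lyrics file and replace every newline with a space:

```python
# Read the lyrics file and join all lines into one string, replacing newlines with spaces
sing = ""
# Raw string, so the backslashes in the Windows path are not read as escape sequences
with open(r"D:\python_fx\HardToGet.txt", "r") as f:
    for line in f:
        sing += line.replace("\n", " ")
```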
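The same reading step can be written more compactly. This is a minimal sketch: it assumes the file is UTF-8 encoded (the post does not say which encoding was used), and `load_lyrics` is a hypothetical helper name. Unlike the loop above, joining the lines with `" ".join(...)` leaves no trailing space at the end.

```python
from pathlib import Path
import tempfile


def load_lyrics(path):
    """Read a lyrics file and join all of its lines with single spaces."""
    # Assumption: the lyrics file is UTF-8 encoded
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the line breaks; join puts one space between lines
    return " ".join(text.splitlines())


# Usage with a throwaway file (the contents are made-up sample lines)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("line one\nline two\n", encoding="utf-8")
    sing = load_lyrics(sample)

print(sing)  # line one line two
```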
The lyrics turn out to contain one line in Chinese.
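```python
# First lowercase all English letters, then drop the Chinese characters by keeping
# only ASCII characters (code points below 128).
# The figure above also shows a trailing space at the end of the lyrics, so strip it.
sing1 = sing.lower()
sing2 = "".join(i for i in sing1 if ord(i) < 128)
result = sing2.strip()
```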
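The cleaning step can be wrapped in a small function and checked on a short string. This is a sketch: `clean_lyrics` is a hypothetical name, and the sample line is made up rather than taken from the song.

```python
def clean_lyrics(text):
    """Lowercase the text, keep only ASCII characters and trim outer whitespace."""
    lowered = text.lower()
    # ord(ch) < 128 keeps ASCII only; Chinese characters have far larger code points
    ascii_only = "".join(ch for ch in lowered if ord(ch) < 128)
    return ascii_only.strip()


sample = "Play It Cool 中文 Hard To Get "
print(repr(clean_lyrics(sample)))  # 'play it cool  hard to get'
```

Removing the Chinese word leaves a double space behind. That is harmless here, because `str.split()` with no arguments treats any run of whitespace as a single separator.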
The processed lyrics look like this:
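```python
# Word-frequency analysis, sorted in descending order
music = result.split()  # split the cleaned lyrics into a list of words
dic = {}
for i in set(music):
    dic[i] = music.count(i)
# Sort the (word, count) pairs by count; sorting dic itself would only sort its keys
freq = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(freq[:5])
```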
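The counting can also be done with `collections.Counter` from the standard library. This is a different technique from the manual dictionary above: it counts every word in a single pass instead of calling `list.count` once per distinct word, and `most_common()` returns the pairs already sorted by count. `word_frequencies` and the sample sentence are made up for illustration.

```python
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = text.split()
    # most_common() with no argument returns every pair, highest count first;
    # words with equal counts keep the order in which they first appeared
    return Counter(words).most_common()


sample = "you play hard to get you play to win"
freq = word_frequencies(sample)
print(freq[:3])  # [('you', 2), ('play', 2), ('to', 2)]
print(sum(count for _, count in freq))  # 9
```

Summing the counts gives the total number of words, while `len(freq)` gives the number of distinct words.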
The five most frequent words in the lyrics are "you", "i", "to", "play" and "get", and the lyrics contain 288 English words in total.