Assignment 3: Compound Data Types and English/Chinese Word Frequency Counting

Summary

Differences between tuples, lists, dictionaries, and sets
Tuple: very similar to a list, but it cannot be changed once initialized.
t = ('1','hh','j3j')
List: a data structure that holds an ordered sequence of items; unlike a tuple, it can be modified.
lst = ['45','gdg','dd']
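Dictionary: associates keys (like names) with values (the details). Keys must be unique: if two people happened to share a name, you could not find the right person's information. Values, on the other hand, may repeat.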
d3 = ['david','jack','mike']
score = [90,34,78]
d1 = dict(zip(d3,score))
Result: {'david': 90, 'jack': 34, 'mike': 78}
Set: an unordered collection of unique elements, like the keys of a dictionary, written with a pair of braces {}.
s = {'1','3d','4d'}
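To see these differences in practice, the sketch below (all variable names are illustrative) modifies each type: a tuple rejects item assignment, a list accepts changes, a repeated dictionary key keeps only its last value while values may repeat, and a set silently drops duplicates. It also shows that an empty {} is a dictionary, so an empty set has to be created with set().

```python
# Tuples are immutable: assigning to an item raises TypeError
t = ('1', 'hh', 'j3j')
try:
    t[0] = 'x'
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# Lists are mutable: items can be replaced and appended
lst = ['45', 'gdg', 'dd']
lst[0] = '46'
lst.append('new')

# A repeated key keeps only its last value; values themselves may repeat
d = dict(zip(['david', 'jack', 'david'], [90, 34, 34]))

# Sets keep one copy of each element; iteration order is not guaranteed
s = {'1', '3d', '4d', '1'}

# {} creates an empty dict, not an empty set
empty_braces = {}
empty_set = set()

print(tuple_is_immutable)  # True
print(lst)                 # ['46', 'gdg', 'dd', 'new']
print(d)                   # {'david': 34, 'jack': 34}
print(len(s))              # 3
print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set
```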

Iteration

1. List
a = list('This is list')
for i in a:
    print(i)

2. Dictionary
b = {'A':'33','B':'45','C':'34'}
print(b)
# Iterating over a dict yields its keys
for i in b:
    print(i,b[i])

3. Set
# set() keeps one copy of each character; the order is not guaranteed
c = set('This is a jihe')
print(c)
for i in c:
    print(i)

4. Tuple
d = tuple('1234556')
print(d)
for i in d:
    print(i)
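The loops above visit one element at a time. For dictionaries, iterating over items() yields key/value pairs directly, and enumerate() adds a running index to any sequence. A short sketch reusing the dictionary and tuple from this section:

```python
b = {'A': '33', 'B': '45', 'C': '34'}

# items() yields (key, value) pairs, so no separate b[key] lookup is needed
pairs = []
for key, value in b.items():
    pairs.append(f'{key}={value}')

# enumerate() pairs each element of the tuple with its position
d = tuple('1234556')
indexed = [(i, ch) for i, ch in enumerate(d)]

print(pairs)        # ['A=33', 'B=45', 'C=34']
print(indexed[:3])  # [(0, '1'), (1, '2'), (2, '3')]
```

Dictionaries keep insertion order in Python 3.7 and later, which is why the pairs come out in the order they were written.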

English word frequency

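- Download the lyrics of an English song or an article (str)
- Split the text into individual words (list)
- Count how many times each word appears (dict)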
str ='''Lost in your mind
I wanna know
Am I losing my mind
Never let me go
If this night is not forever
At least we are together
I know I'm not alone
I know I'm not alone
Anywhere whenever
Apart but still together
I know I'm not alone
I know I'm not alone
I know I'm not alone
I know I'm not alone
Unconscious mind
I'm wide awake
Wanna feel one last time
Take my pain away
If this night is not forever
At least we are together
I know I'm not alone
I know I'm not alone
Anywhere whenever
Apart but still together
I know I'm not alone
I know I'm not alone
I know I'm not alone
I know I'm not alone
I'm not alone
I'm not alone
I'm not alone
I know I'm not alone
I'm not alone
I'm not alone
I'm not alone
I know I'm not alone'''

Convert uppercase to lowercase

str = str.lower()
print(str)

Convert lowercase to uppercase

print(str.upper())

Remove punctuation

sep = ''',.:!?'"'''
for c in sep:
    str = str.replace(c,' ')
# split() returns a new list of words; keep it instead of discarding it
words = str.split()
print(words)
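As an alternative to calling replace() once per character, str.maketrans() and str.translate() can map all of the punctuation in a single pass. The sketch below uses a one-line sample and names its variable text rather than str, since reusing the built-in name str as a variable hides the str type for the rest of the program.

```python
text = "I know I'm not alone, never let me go!"

# Build a table that maps each punctuation character to a space
sep = ''',.:!?'"'''
table = str.maketrans(sep, ' ' * len(sep))

# translate() applies the whole table in one pass over the string
cleaned = text.translate(table)

# split() returns a new list; the string itself is unchanged
words = cleaned.split()
print(words)  # ['I', 'know', 'I', 'm', 'not', 'alone', 'never', 'let', 'me', 'go']
```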

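Count occurrences of "alone" and "know"

# str.count() counts substrings, so "know" inside a longer word would also match
print(str.count('alone'), str.count('know'))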
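Step 3 of the plan (count every word in a dict) is not covered by str.count(), which only counts one substring at a time. A minimal sketch of that step, run on a short excerpt of the lyrics above: it lowercases the text, replaces the same punctuation with spaces, and counts words with dict.get(); collections.Counter produces the same counts and adds most_common(). Note that replacing the apostrophe splits "I'm" into "i" and "m".

```python
from collections import Counter

# A short excerpt of the lyrics above
text = """If this night is not forever
At least we are together
I know I'm not alone
I know I'm not alone"""

# Normalize case and replace the same punctuation as above with spaces
text = text.lower()
for c in ''',.:!?'"''':
    text = text.replace(c, ' ')
words = text.split()

# Step 3: count each word in a dict
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1

print(freq['alone'], freq['know'], freq['not'])  # 2 2 3

# Counter performs the same counting and adds most_common()
print(Counter(words).most_common(2))  # [('i', 4), ('not', 3)]
```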


Chinese word frequency

Install jieba and load the file

!pip install jieba

!python -m pip install --upgrade jieba

import jieba

# Read the text file as GBK; the with block closes the file automatically
with open('女上司.txt','r',encoding='GBK') as f:
    b = f.read()
print(b)
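The code above assumes the file is GBK-encoded; opening a UTF-8 file with encoding='GBK' usually raises UnicodeDecodeError or produces garbled text. The hypothetical helper read_text below tries a list of encodings in order and returns the first successful decode. UTF-8 is tried first because it is stricter: GBK bytes rarely form valid UTF-8, whereas UTF-8 bytes can sometimes be decoded as GBK without an error. The demo file name is made up.

```python
import os
import tempfile

def read_text(path, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: return the file contents decoded with the first encoding that works."""
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue  # try the next encoding
    raise ValueError(f'could not decode {path} with {encodings}')

# Demo: write a GBK-encoded file, then read it back without naming the encoding
tmp = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(tmp, 'w', encoding='gbk') as f:
    f.write('女上司')
print(read_text(tmp))
```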

Remove punctuation

sep = ',。?!;:“”`‘-——<_/>'
for en in sep:
    b = b.replace(en, '')
print(b)
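The sep string above lists punctuation marks by hand, so any mark it omits (for example the closing quote ’ or the enumeration comma 、) stays in the text. A more general sketch uses the standard unicodedata module to drop every character whose Unicode category starts with P (punctuation). Characters classed as symbols rather than punctuation, such as < and >, are kept, so this complements the hand-written list rather than replacing it exactly.

```python
import unicodedata

def strip_punctuation(text):
    """Remove every character whose Unicode general category starts with 'P'."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

sample = '“你好,世界!”她说《测试》。'
print(strip_punctuation(sample))  # 你好世界她说测试

# '_' and '/' are punctuation, but '<' and '>' are math symbols (category Sm)
print(strip_punctuation('<_/>'))  # <>
```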

Preprocessing: segment the text with jieba and count words

# Segment the text into words with jieba
bList = list(jieba.cut(b))
bdict = {}
for word in bList:
    # Skip single-character tokens
    if len(word) == 1:
        continue
    bdict[word] = bdict.get(word, 0) + 1
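The same counting can be written with collections.Counter, which also makes it easy to filter out common words that carry little meaning. In this sketch the token list stands in for list(jieba.cut(b)) and the stop-word set is hypothetical; in practice a stop-word list would usually be loaded from a file.

```python
from collections import Counter

# Hypothetical tokens standing in for list(jieba.cut(b))
bList = ['我', '今天', '见到', '了', '上司', '上司', '今天', '很', '忙']

# Hypothetical stop-word set
stopwords = {'今天'}

# Same rule as the loop above (skip one-character tokens), plus stop-word filtering
bdict = Counter(w for w in bList if len(w) > 1 and w not in stopwords)
print(bdict.most_common())  # [('上司', 2), ('见到', 1)]
```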

Sort by frequency with a lambda

wordlist = list(bdict.items())
# Sort the (word, count) pairs by count, highest first
wordlist.sort(key=lambda x:x[1],reverse=True)
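Sorting by count alone leaves the order of tied words up to their insertion order. A sketch with hypothetical counts: a key of (-count, word) sorts by count descending and then by word, and heapq.nlargest returns just the top k entries without sorting the whole list. Ties are broken by Unicode code point, not by pinyin.

```python
import heapq

# Hypothetical word counts standing in for bdict
bdict = {'上司': 5, '公司': 3, '会议': 5, '工作': 2}

# sorted() returns a new list; negating the count lets one ascending sort handle both keys
wordlist = sorted(bdict.items(), key=lambda x: (-x[1], x[0]))

# When only the top k entries are needed, heapq.nlargest avoids sorting everything
top2 = heapq.nlargest(2, bdict.items(), key=lambda x: x[1])

print(wordlist)  # [('上司', 5), ('会议', 5), ('公司', 3), ('工作', 2)]
print(top2)
```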

Print the 10 most frequent words

# Slicing avoids an IndexError when there are fewer than 10 distinct words
for item in wordlist[:10]:
    print(item)

Formatted occurrence counts

# Word counts from bdict, sorted by frequency
items = list(bdict.items())
items.sort(key=lambda x:x[1],reverse=True)
for word, count in items[:30]:
    print("{:<10}{:>5}".format(word,count))
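In the format string, {:<10} left-aligns the word in a 10-character field and {:>5} right-aligns the count in a 5-character field. Chinese characters usually render two columns wide in a terminal, so padding with ASCII spaces can make the columns look uneven. One common workaround, sketched below with hypothetical data, pads with the full-width space U+3000 (chr(12288)) through a nested fill-character field.

```python
items = [('上司', 12), ('会议室', 7), ('工作', 3)]  # hypothetical counts

for word, count in items:
    # <10 left-aligns in 10 characters; >5 right-aligns in 5
    print("{:<10}{:>5}".format(word, count))

# {2} supplies the fill character, so each padding character is as wide as a Chinese character
FULL_WIDTH_SPACE = chr(12288)
rows = ["{0:{2}<6}{1:>5}".format(word, count, FULL_WIDTH_SPACE) for word, count in items]
for row in rows:
    print(row)
```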

Posted on 2018-10-08 14:02 by 16信管黄梓航
