Comprehensive exercise: English word frequency count

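  1. Preprocessing for the word frequency count
  2. Download the lyrics of an English song or an English article
  3. Replace all separators such as , . ? ! ’ : with spaces
  4. Convert all uppercase letters to lowercase
  5. Generate the word list
  6. Generate the word frequency counts
  7. Sort
  8. Exclude function words: pronouns, articles, conjunctions
  9. Print the 10 most frequent words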
    # The text is stored in `text` rather than `str`, so the built-in str type is not shadowed
    text='''Earthquake early warning detection is more effective for minor quakes than major ones.
    This is according to a new study from the United States Geological Survey.
    Seismologists modelled ground shaking along California's San Andreas Fault, where an earthquake of magnitude 6.5 or more is expected within 30 years.
    They found that warning time could be increased for residents if they were willing to tolerate a number of "false alarms" for smaller events.
    This would mean issuing alerts early in an earthquake's lifespan, before its full magnitude is determined. Those living far from the epicentre would occasionally receive warnings for ground shaking they could not feel.
    "We can get [greater] warning times for weak ground motion levels, but we can't get long warning times for strong shaking," Sarah Minson, lead author of the study, told BBC News.
    "Alternatively, we could warn you every time there was an earthquake that might produce weak ground shaking at your location... A lot of baby earthquakes don't grow up to become big earthquakes," she added.
    The physics of earthquakes is one of the reasons why a single, universal warning system hasn't been rolled out across all quake prone countries.
    California and Japan have populations living directly alongside fault lines, and cannot waste precious seconds before warning their citizens.
    In both countries, the p-waves and some very rapid algorithms determine the potential magnitude and dispatch an alert.
    But in Mexico, the capital city is about 300km from the nearest tectonic plate boundary.
    This allows geologists to use a system that can take some more time to issue a warning. They wait to detect the s-waves.
    Sirens blare in the streets of Mexico City whenever ground shaking above M5 is detected.'''

    # Replace the separators with spaces
    symbols=[".",",","?","!",":",";","'",'"',"[","]"]
    for symbol in symbols:
        text=text.replace(symbol," ")

    # Convert all uppercase letters to lowercase
    text=text.lower()

    # Generate the word list
    words=text.split()

    # Generate the word frequency counts
    counts={}
    for word in words:
        counts[word]=counts.get(word,0)+1

    # Exclude function words (pronouns, articles, conjunctions) and the
    # fragments 's' and 't' left over from splitting on apostrophes
    stop_words=['a','an','more','for','is','of','to','from','or','that','if','the','were','in','s','not','can','get','could','might','up','and','this','t']
    for word in stop_words:
        counts.pop(word,None)  # no KeyError if the word is not in the text

    # Sort by frequency, highest first
    ranked=sorted(counts.items(),key=lambda e:e[1],reverse=True)

    # Print the 10 most frequent words
    for item in ranked[:10]:
        print(item)
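
The same steps can also be written more compactly with the standard library. The following is only a minimal sketch, not the approach used above: the function name `top_words`, the `STOP_WORDS` set, and the `sample` sentence are made up for illustration, and the regular expression is one possible choice that keeps words such as "can't" and "p-waves" in one piece instead of splitting them at the apostrophe or hyphen.

```python
import re
from collections import Counter

# Hypothetical stop-word set, for illustration only; it is not a complete list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for", "is", "this", "that"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase first, then keep runs of letters; an apostrophe or hyphen is
    # allowed only between letters, so "can't" and "p-waves" stay whole.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    # Count every token that is not a stop word.
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


sample = "The quake hit. The quake can't be felt far away, but the p-waves can."
for word, count in top_words(sample, n=3):
    print(word, count)
```

With this tokenizer no stray "s" or "t" fragments appear, so they do not need to be listed as stop words. The trade-off is that a possessive such as "earthquake's" is counted separately from "earthquake", and digits are dropped, so "300km" contributes only "km".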

     

 

posted @ 2018-03-26 17:54  191钟菲菲