Software Engineering Assignment 4: Word Frequency Counting (Basic Features)

I. Basic Information

1. Development environment: Python 3.7

2. Members: 1613072028 陈志华

1613072029 徐东

3. Project repository:

https://gitee.com/ntucs/PairProg/tree/SE028_029

4. Assignment page:

https://edu.cnblogs.com/campus/ntu/Embedded_Application/homework/2088

II. Project Analysis

1.1 Reading the file into a buffer

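def process_file(dst):  # read the file into a buffer
    try:  # open the file
        doc = open(dst, 'r')
    except IOError as s:
        print(s)
        return None
    try:  # read the whole file into the buffer
        bvffer = doc.read()
    except (IOError, UnicodeDecodeError):
        print("Read File Error!")
        return None
    finally:
        doc.close()  # close the file even if reading fails
    return bvffer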
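For comparison, the same step can be written with a `with` block, which closes the file automatically even when an exception is raised. This is a minimal sketch: the function name `read_text` and the `encoding="utf-8"` default are assumptions, since the post does not say how Gone_with_the_wind.txt is encoded.

```python
def read_text(path, encoding="utf-8"):
    """Read the whole file into one string; return None if it cannot be read.

    Hypothetical helper: the utf-8 default is an assumption about the input file.
    """
    try:
        with open(path, "r", encoding=encoding) as doc:  # closed automatically on exit
            return doc.read()
    except (OSError, UnicodeDecodeError) as err:
        print(err)
        return None
```

Usage: `bvffer = read_text("Gone_with_the_wind.txt")`.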

1.2 Counting lines

count = 0  # number of lines
for i in bvffer:
    if i == '\n':  # each newline character ends one line
        count += 1
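Note that the loop above counts newline characters, so a last line without a trailing `\n` is not counted. A minimal sketch that handles that case with `str.count`; the helper name `count_lines` is hypothetical:

```python
def count_lines(text):
    """Count lines in text, including a final line with no trailing newline."""
    if not text:
        return 0
    count = text.count("\n")  # one per line terminator
    if not text.endswith("\n"):
        count += 1  # the last line has content but no terminator
    return count
```

For example, `count_lines("a\nb")` gives 2, while the character loop gives 1.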

1.3 Counting words and phrases

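last_words = []
choice = int(input())  # 1 = single words, 2 = two-word phrases, 3 = three-word phrases
while choice not in (1, 2, 3):  # keep asking until the input is valid
    print("Invalid input, please enter 1, 2 or 3:")
    choice = int(input())
if choice == 1:  # single words
    last_words = remain_words
elif choice == 2:  # two-word phrases
    # stop one word early so remain_words[i + 1] stays in range (no wrap-around)
    for i in range(len(remain_words) - 1):
        phrase = "%s %s" % (remain_words[i], remain_words[i + 1])
        last_words.append(phrase)
else:  # three-word phrases
    for i in range(len(remain_words) - 2):
        phrase = "%s %s %s" % (remain_words[i], remain_words[i + 1], remain_words[i + 2])
        last_words.append(phrase)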
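The three branches above are the same operation with a different window size n. A minimal sketch that builds every run of n consecutive words with `zip`; the name `make_phrases` is hypothetical:

```python
def make_phrases(words, n):
    """Return every run of n consecutive words, joined by single spaces."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # zip(words[0:], words[1:], ..., words[n-1:]) stops at the shortest slice,
    # so no window wraps around to the start of the list
    windows = zip(*(words[i:] for i in range(n)))
    return [" ".join(w) for w in windows]
```

With this helper the whole `if`/`elif`/`else` reduces to `last_words = make_phrases(remain_words, choice)`.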

1.4 Outputting the top words

def output_result(word_freq):
    if word_freq:
        # sort (word, count) pairs by count, highest first
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        with open("result.txt", 'a') as f:  # open the result file once, not once per word
            for item in sorted_word_freq[:10]:  # output the top 10 words
                print("<%s>:%d " % (item[0], item[1]))
                print("<%s>:%d " % (item[0], item[1]), file=f)
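If the counts are kept in a `collections.Counter`, `most_common(n)` returns the n highest-count items already sorted, which replaces the explicit `sorted(...)[:10]`. A sketch under that assumption; `format_top` is a hypothetical helper that only builds the output lines, so printing and appending to result.txt stay separate:

```python
from collections import Counter


def format_top(word_freq, n=10):
    """Return '<word>:count' lines for the n most frequent entries."""
    counts = Counter(word_freq)  # also accepts a plain dict of counts
    return ["<%s>:%d" % (word, freq) for word, freq in counts.most_common(n)]
```

`most_common(n)` keeps equal counts in insertion order, the same as the stable `sorted(..., reverse=True)` used above.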

1.5 Main function

def main():
    dst = "Gone_with_the_wind.txt"
    bvffer = process_file(dst)
    if bvffer is None:  # the file could not be opened or read
        return
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
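`process_buffer()` itself is not listed in this post. As a clearly hypothetical stand-in, a minimal version could lowercase the text, extract runs of letters with a regular expression and count them with `Counter`. The name `count_words` and the tokenizing rule `[a-z]+` are assumptions, not the authors' code:

```python
import re
from collections import Counter


def count_words(bvffer):
    """Hypothetical stand-in for process_buffer(): map each word to its count."""
    if not bvffer:
        return Counter()  # nothing to count (e.g. the file could not be read)
    # assumption: a word is a run of ASCII letters; case is ignored
    words = re.findall(r"[a-z]+", bvffer.lower())
    return Counter(words)
```

Because `Counter` is a `dict` subclass, its result can be passed straight to `output_result()`.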

1.6 Performance profiling

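if __name__ == "__main__":
    import cProfile
    import pstats

    # run main() under the profiler and save the raw results to the file result_out
    cProfile.run("main()", filename="result_out")
    # create a Stats object from the saved results
    p = pstats.Stats('result_out')
    # print the 10 functions with the most calls
    # sort_stats(): sort the entries by the given key
    # print_stats(): print the results, limited to the given number of lines
    p.sort_stats('calls').print_stats(10)
    # print the 10 functions with the highest cumulative time (ties broken by name)
    # strip_dirs(): remove leading path information from file names
    p.strip_dirs().sort_stats("cumulative", "name").print_stats(10)

    # the output above showed that process_buffer() takes the most time,
    # so list the functions that process_buffer() calls
    p.print_callees("process_buffer")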
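The script above saves the profile to result_out and reads it back. A minimal sketch of the same measurement kept in memory; `profile_call` and its `top` parameter are hypothetical names. It uses `Profile.enable()`/`disable()` rather than `with cProfile.Profile()`, because the context-manager form only exists from Python 3.8 and this project targets 3.7.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Profile one call of func(*args); return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()  # stop collecting even if func raises
    stream = io.StringIO()  # collect the report in a string instead of stdout
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Usage: `_, report = profile_call(main)` followed by `print(report)`.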

2.1 Time and space complexity
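A rough estimate for the code shown above, assuming `process_buffer()` (not listed in this post) counts words with a dict, whose operations take average constant time. Let n be the number of characters in the file, m the number of words or phrases, and k the number of distinct ones; reading, counting lines, splitting and counting are linear, and `output_result()` sorts all k entries before taking the top 10:

```latex
\begin{aligned}
T &= \underbrace{O(n)}_{\text{read}} + \underbrace{O(n)}_{\text{count lines}}
   + \underbrace{O(n)}_{\text{split, build phrases, count}}
   + \underbrace{O(k \log k)}_{\text{sort}} = O(n + k \log k) \\
S &= \underbrace{O(n)}_{\text{buffer}} + \underbrace{O(m)}_{\text{word list}}
   + \underbrace{O(k)}_{\text{dict}} = O(n), \qquad k \le m \le n
\end{aligned}
```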

3.1 Screenshots of program runs

1. Phrases (word groups)