Counting how often each word occurs in a text file

Usage

[root@dy1 tmp]# ls
1.txt  count_word.sh

[root@dy1 tmp]# cat 1.txt 
Kali Linux is a Debian-based Linux distribution aimed at advanced Penetration Testing and Security Auditing. Kali contains several hundred tools which are geared towards various information security tasks, such as Penetration Testing, Security research, Computer Forensics and Reverse Engineering. Kali Linux is developed, funded and maintained by Offensive Security, a leading information security training company.

Kali Linux was released on the 13th March, 2013 as a complete, top-to-bottom rebuild of BackTrack Linux, adhering completely to Debian development standards.

More than 600 penetration testing tools included: After reviewing every tool that was included in BackTrack, we eliminated a great number of tools that either simply did not work or which duplicated other tools that provided the same or similar functionality. Details on what’s included are on the Kali Tools site.
Free (as in beer) and always will be: Kali Linux, like BackTrack, is completely free of charge and always will be. You will never, ever have to pay for Kali Linux.
Open source Git tree: We are committed to the open source development model and our development tree is available for all to see. All of the source code which goes into Kali Linux is available for anyone who wants to tweak or rebuild packages to suit their specific needs.
FHS compliant: Kali adheres to the Filesystem Hierarchy Standard, allowing Linux users to easily locate binaries, support files, libraries, etc.
Wide-ranging wireless device support: A regular sticking point with Linux distributions has been supported for wireless interfaces. We have built Kali Linux to support as many wireless devices as we possibly can, allowing it to run properly on a wide variety of hardware and making it compatible with numerous USB and other wireless devices.
Custom kernel, patched for injection: As penetration testers, the development team often needs to do wireless assessments, so our kernel has the latest injection patches included.
Developed in a secure environment: The Kali Linux team is made up of a small group of individuals who are the only ones trusted to commit packages and interact with the repositories, all of which is done using multiple secure protocols.
GPG signed packages and repositories: Every package in Kali Linux is signed by each individual developer who built and committed it, and the repositories subsequently sign the packages as well.
Multi-language support: Although penetration tools tend to be written in English, we have ensured that Kali includes true multilingual support, allowing more users to operate in their native language and locate the tools they need for the job.
Completely customizable: We thoroughly understand that not everyone will agree with our design decisions, so we have made it as easy as possible for our more adventurous users to customize Kali Linux to their liking, all the way down to the kernel.
ARMEL and ARMHF support: Since ARM-based single-board systems like the Raspberry Pi and BeagleBone Black, among others, are becoming more and more prevalent and inexpensive, we knew that Kali’s ARM support would need to be as robust as we could manage, with fully working installations for both ARMEL and ARMHF systems. Kali Linux is available on a wide range of ARM devices and has ARM repositories integrated with the mainline distribution so tools for ARM are updated in conjunction with the rest of the distribution.
Kali Linux is specifically tailored to the needs of penetration testing professionals, and therefore all documentation on this site assumes prior knowledge of, and familiarity with, the Linux operating system in general. Please see Should I Use Kali Linux? for more details on what makes Kali unique.


[root@dy1 tmp]# /bin/bash count_word.sh 1.txt 
Word with, occurrences:8
Word as, occurrences:9
Word for, occurrences:10
Word is, occurrences:10
Word of, occurrences:12
Word Kali, occurrences:18
Word Linux, occurrences:18
Word to, occurrences:19
Word and, occurrences:21
Word the, occurrences:22
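
As a quick cross-check of the counts above, the same top ten can be sketched with standard tools alone. This is not how count_word.sh works, just an independent way to arrive at the numbers. The function name `top_words` is hypothetical, and it assumes the same treatment of punctuation as the script: only `.`, `,`, `:` and `?` are deleted, and matching is case-sensitive.

```shell
#!/bin/bash
# top_words FILE [N]: hypothetical helper, not part of count_word.sh.
# Prints the N most frequent words in FILE (default 10) as "count word",
# least frequent first.
top_words() {
    local file=$1 n=${2:-10}
    tr -d '.,:?' < "$file" |      # delete the same four punctuation marks
        tr -s '[:space:]' '\n' |  # put one word on each line
        grep -v '^$' |            # drop empty lines
        sort |                    # bring identical words together
        uniq -c |                 # prefix each distinct word with its count
        sort -n |                 # order by count, lowest first
        tail -n "$n" |            # keep the N highest counts
        awk '{print $1, $2}'      # strip the padding uniq adds
}
```

After sourcing the function, `top_words 1.txt` should list the same ten words as the run above. Words with equal counts may come out in a different order depending on the locale.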

Script

[root@dy1 tmp]# cat count_word.sh 
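#!/bin/bash
# Author: 张贺贺呀
# Purpose: count how often each word occurs in a text file
# Website: www.zhanghehe.cn  Phone: 15564028007
# Concepts covered: while loop, associative array, for loop

# 1. Check that the user passed exactly one argument and that it is a file
if [ $# -ne 1 ] || [ ! -f "$1" ]
then
	echo "Usage: $0 filename" >&2
	exit 1
fi

# 2. Delete the punctuation marks . , : ? so that only words remain,
#    then write one word per line to a temporary file.
#    mktemp avoids overwriting files in the current directory, and the
#    trap removes the temporary file when the script exits.
words=$(mktemp) || exit 1
trap 'rm -f "$words"' EXIT
tr -d '.,:?' < "$1" | tr -s '[:space:]' '\n' > "$words"

# 3. Declare an associative array (requires bash 4 or later)
declare -A info

# 4. Read the temporary file one line, that is one word, at a time and use
#    the word as a key of the associative array info. Each occurrence of a
#    word adds 1 to the value stored under that key; a missing key counts as 0.
while read -r word
do
        [ -n "$word" ] || continue      # skip empty lines
        info[$word]=$(( ${info[$word]:-0} + 1 ))
done < "$words"

# 5. Loop over the keys (the words) and print each word with its count.
#    The output is sorted numerically on the field after the colon,
#    and only the ten most frequent words are shown.
for i in "${!info[@]}"
do
        echo "Word $i, occurrences:${info[$i]}"
done | sort -t: -k2 -n | tail -10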
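
The heart of the script is the associative-array counter from steps 3 to 5. Here is a minimal sketch of that idea in isolation, reading words from standard input rather than from a temporary file. The function name `count_words` and the `word count` output format are made up for this illustration, and bash 4 or later is assumed because associative arrays need it.

```shell
#!/bin/bash
# count_words: hypothetical helper that isolates the counting technique.
# Reads one word per line from stdin and prints "word count" lines,
# sorted by word in byte order.
count_words() {
    local -A counts=()   # key = word, value = number of occurrences
    local word
    while read -r word; do
        [ -n "$word" ] || continue                    # ignore empty lines
        counts[$word]=$(( ${counts[$word]:-0} + 1 ))  # unset key counts as 0
    done
    for word in "${!counts[@]}"; do                   # iterate over the keys
        printf '%s %d\n' "$word" "${counts[$word]}"
    done | LC_ALL=C sort
}
```

For example, `printf 'b\na\nb\n' | count_words` prints `a 1` and then `b 2`. Writing the increment as `${counts[$word]:-0} + 1` keeps the word out of arithmetic evaluation, which is a little more robust than `let` when a word contains characters such as `-` or `*`.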

posted @ 2020-04-18 20:46  张贺贺呀