Reservoir Sampling
```python
#!/usr/bin/env python3
# coding=utf-8

'''
Reservoir sampling: randomly draw k items from a stream whose total size
is unknown, reading the stream only once.
'''

import sys
import random

if __name__ == '__main__':
    # Command-line arguments: input file, number of lines to sample, output file.
    # If no output file is given, the sample is printed to the console.
    if len(sys.argv) < 3:
        print("input [infile] [lines] [outfile=console]")
        sys.exit(1)
    infile = sys.argv[1]
    k = int(sys.argv[2])
    if len(sys.argv) >= 4:
        f_handler = open(sys.argv[3], 'w')
        sys.stdout = f_handler
    rect = []   # the reservoir
    lineNo = 0  # number of lines read so far
    with open(infile, 'r') as f_in:
        # Fill the reservoir with the first k lines.
        for i in range(k):
            line = f_in.readline()
            if line:
                lineNo += 1
                rect.append(line.strip('\n'))
            else:
                break
        # Every later line replaces a random slot with probability k / lineNo.
        while True:
            line = f_in.readline()
            if not line:
                break
            lineNo += 1
            rnd = random.randint(0, lineNo - 1)  # randint is inclusive on both ends
            if rnd < k:
                rect[rnd] = line.strip('\n')
    for ele in rect:
        print(ele)
```
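The same one-pass idea can be sketched as a reusable function that works on any iterable rather than only on a file. This is a minimal sketch, not part of the original script: the names `reservoir_sample`, `inclusion_counts`, and the `rng` parameter are hypothetical and introduced here for illustration. The helper runs the sampler many times to check empirically that every item is kept with a frequency close to k/n.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=random):
    """Return up to k items drawn uniformly from an iterable in a single pass.

    Hypothetical helper: `rng` may be the `random` module or a
    `random.Random` instance (useful for reproducible runs).
    """
    reservoir = []
    for seen, item in enumerate(stream, start=1):
        if seen <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Pick a slot uniformly in [0, seen); the item is kept only
            # when the slot falls inside the reservoir (probability k / seen).
            slot = rng.randrange(seen)
            if slot < k:
                reservoir[slot] = item
    return reservoir


def inclusion_counts(n, k, trials, seed=0):
    """Count how often each of range(n) ends up in the sample over many trials."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n), k, rng))
    return counts


if __name__ == '__main__':
    print(reservoir_sample(range(1000), 5))
    counts = inclusion_counts(n=10, k=3, trials=20000)
    for item in sorted(counts):
        # Each frequency should be close to k / n = 0.3.
        print(item, counts[item] / 20000)
```

If the stream holds fewer than k items, the function simply returns all of them in their original order, which mirrors the early `break` in the fill loop of the script.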
This article is from Cnblogs (博客园), written by Zhang Chaoyang (张朝阳). When reposting, please cite the original link: https://www.cnblogs.com/zhangchaoyang/articles/4928038.html