Python recipe (10): Processing a file paragraph by paragraph

Where is the code?

Example Source Code [http://www.cnblogs.com/tomsheep/]
'''
Created on 2010-5-22

@author: lk
'''
class Paragraphs:
    def __init__(self, fileobj, seperator = '\n'):
        self.seq = fileobj.readlines()
        self.line_num = 0
        self.para_num = 0
        
        # normalize the separator so that it compares equal to a whole line
        if not seperator.endswith('\n'):
            seperator += '\n'
        self.seperator = seperator
    
    def __getitem__(self, index):
        if index != self.para_num:
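            raise TypeError('only sequential access supported')
        # skip leading separator lines and get the first line of the current paragraph;
        # an IndexError raised here at end of input is what ends the iteration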
        self.para_num += 1
        while 1:
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.seperator:
                break
        result = [line]
        
        #get the rest
        while 1:
            # tag1: originally a bare "line = self.seq[self.line_num]" with no try/except
            try:
                line = self.seq[self.line_num]
            except IndexError:
                break
            self.line_num += 1
            if line == self.seperator:
                break
            result.append(line)
        return ''.join(result)

          
if __name__ == '__main__':
    # all lines are read in __init__, so the file can be closed right away
    with open("test.txt") as f:
        text = Paragraphs(f)
    for para in text:
        print(para)
        

        

The code above is adapted from Python Cookbook, recipe 4-9.

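Overview:

    Process a file paragraph by paragraph. A custom Paragraphs class implements __getitem__, the method that gives a type container behavior.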
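
For comparison, here is a minimal sketch of the same idea written as a generator rather than a class with __getitem__. The function name `paragraphs` and the sample text are made up for illustration and are not part of the original recipe. It assumes, like the class above, that a paragraph is a run of lines that are not equal to the separator line.

```python
import io


def paragraphs(fileobj, separator='\n'):
    """Yield one string per paragraph (hypothetical helper, for illustration)."""
    if not separator.endswith('\n'):
        separator += '\n'
    current = []
    for line in fileobj:
        if line == separator:
            if current:  # a separator line closes the paragraph collected so far
                yield ''.join(current)
                current = []
        else:
            current.append(line)
    if current:  # the last paragraph may not be followed by a separator
        yield ''.join(current)


# Sample input assumed for the demo: two paragraphs split by two blank lines.
sample = io.StringIO("first line\nsecond line\n\n\nthird line\n")
result = list(paragraphs(sample))
print(result)
```

A generator reads the file lazily and does not depend on the caller asking for indexes in order, which may be preferable for large files.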

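Notes on the code:

1. Defining __getitem__ gives a custom type container behavior, so it can be accessed as x[key].

2. When I first wrote the code I did not use try…except at tag1. Oddly, no IndexError was raised at run time; the output was simply one paragraph short. After thinking about it for a while, I suspected that the IndexError was being caught somewhere outside __getitem__. I tried raising IndexError by hand inside __getitem__, and indeed it did not propagate (whereas a ValueError did). So the caller of __getitem__ catches IndexError: when a class defines __getitem__ but not __iter__, a for loop calls __getitem__ with 0, 1, 2, ... and treats IndexError as the end of the sequence. That also explains the missing paragraph: the IndexError raised at tag1 while reading the final paragraph ended the loop before that paragraph was returned.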
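
Both notes can be checked with a small experiment. The sketch below uses two made-up classes, `Letters` and `Broken`, which are not from the original code: the first defines only __getitem__ and relies on list indexing to raise IndexError, while the second raises ValueError instead.

```python
class Letters:
    """Hypothetical container that defines only __getitem__ (no __iter__)."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        # list indexing raises IndexError once index runs past the end
        return self.items[index]


class Broken:
    """Hypothetical container whose __getitem__ raises ValueError."""

    def __getitem__(self, index):
        raise ValueError('not an IndexError')


letters = Letters(['a', 'b', 'c'])
first = letters[0]                       # x[key] goes through __getitem__
collected = [item for item in letters]   # iteration stops silently on IndexError

try:
    for item in Broken():
        pass
    propagated = False
except ValueError:
    propagated = True                    # any other exception reaches the caller

print(first)
print(collected)
print(propagated)
```

The loop over `Letters` ends without any visible exception, while the ValueError from `Broken` reaches the caller, which matches what the post observed.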

posted on 2010-05-22 22:48  tomsheep
