ch4 - Persistent Storage

1. Processing the data and printing it

man = []
other = []

try:
    data = open('sketch.txt')

    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()  # strip leading/trailing whitespace (including '\n', '\r', '\t', ' ') and assign the result back to the same name
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
        except ValueError:
            pass

    data.close()
except IOError:
    print('The datafile is missing!')

print(man)
print(other)

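s.strip(rm): removes from the start and end of string s every character that appears in the removal sequence rm. When rm is omitted, whitespace is removed by default (including '\n', '\r', '\t', ' ').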
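A minimal sketch of the two forms of strip() described above. The sample strings are made up for illustration and are not taken from sketch.txt.

```python
# Default form: strip() removes leading and trailing whitespace only.
raw = '  \tIs this the right room?\r\n'
cleaned = raw.strip()
print(repr(cleaned))

# With an argument, strip() removes any of the listed characters from both ends.
padded = 'xxyHello, world!yx'
trimmed = padded.strip('xy')
print(repr(trimmed))

# Characters in the middle of the string are never touched.
inner = ' a b '.strip()
print(repr(inner))
```

Note that the argument is treated as a set of characters, not as a prefix or suffix to match as a whole.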

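2. Opening a file in write mode

When you open a disk file with the open() BIF, you can specify which access mode to use. By default open() uses mode r (read), so there is no need to specify r explicitly.

r mode: read-only.

w mode: write-only. Python opens the named file for writing; if the file already exists, its existing contents are cleared.

w+ mode: write and read. Like w, it clears the existing contents of a file that already exists. To read and write without clearing, use r+ (the file must already exist) or a+ (writes go to the end of the file).

a mode: append. New content is added to the end of the file.

Note: opening a file that does not exist in w, w+ or a mode creates the file first and then opens it for writing.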
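The differences between the modes can be checked with a short experiment. This is only a sketch: the temporary directory and the file name demo.txt are assumptions made for the example, so that nothing in your working folder is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')   # hypothetical file name

    with open(path, 'w') as f:      # the file does not exist yet, so it is created
        f.write('first\n')
    with open(path, 'w') as f:      # opening with w again clears the old contents
        f.write('second\n')
    with open(path) as f:           # default mode is r
        after_w = f.read()

    with open(path, 'a') as f:      # a keeps the contents and writes at the end
        f.write('third\n')
    with open(path) as f:
        after_a = f.read()

    with open(path, 'w+') as f:     # w+ allows reading too, but still clears the file
        seen_on_open = f.read()
        f.write('fourth\n')
        f.seek(0)                   # go back to the start before reading
        after_w_plus = f.read()

print(repr(after_w))
print(repr(after_a))
print(repr(seen_on_open))
print(repr(after_w_plus))
```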

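out = open('data.out', 'w')              # out: the data file object; data.out: the name of the file being written
print('today is 2016-4-20.', file=out)   # to write the text to a file, pass the data file object through the file argument
out.close()                              # very important: always close the file when done, so that all data reaches the disk (this is called flushing)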

man = []
other = []

try:
    data = open('sketch.txt')

    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
        except ValueError:
            pass

    data.close()
except IOError:
    print('The datafile is missing!')

try:
    man_file = open('man_data.txt', 'w')
    other_file = open('other_data.txt', 'w')

    print(man, file=man_file)
    print(other, file=other_file)

    man_file.close()
    other_file.close()
except IOError:
    print('File error.')

Press F5 to run:

>>> 
===== RESTART: D:\workspace\headfirstpython\chapter4\page112\page112.py =====
>>>

After the program runs, two new files appear in the folder. They contain the data from the man and other lists respectively.

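3. Extending try with finally

If an IOError occurs while writing, before the file has been closed, the data being written may be corrupted, and there is no way of knowing this until it actually happens.

For example:

print(man, file=man_file) # OK
print(other, file=other_file) # problem: an IOError is raised here

The data may be corrupted, so what can be done? We want to make sure the files are closed no matter what happens.

Put the file-closing code in the finally suite. The code in the finally suite always runs, whatever happens, so the files are closed properly even if a write error occurs.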

try:
     man_file = open('man_data.txt', 'w')
     other_file = open('other_data.txt', 'w')

     print(man, file=man_file)
     print(other, file=other_file)

except IOError:
     print('File error.')

finally:   # put the file-closing code in the finally suite; it always runs, whatever happens
     man_file.close()
     other_file.close()
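To see that the finally suite really runs on both paths, here is a small sketch that records the order of events. The function write_with_cleanup() and the simulated failure are hypothetical; no real file is opened.

```python
events = []

def write_with_cleanup(fail):
    """Record what happens with and without a (simulated) write error."""
    events.append('open')
    try:
        if fail:
            raise IOError('simulated write failure')
        events.append('write ok')
    except IOError as err:
        events.append('error: ' + str(err))
    finally:
        # Runs whether or not the try suite raised an exception.
        events.append('close')

write_with_cleanup(fail=False)
write_with_cleanup(fail=True)
print(events)
```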

4. Finding out the specifics of an error

1). Trying to open a file that does not exist

try:
    data=open('missing.txt')
    print(data.readline(), end=' ')
except IOError:
    print('File error')
finally:
    data.close()

>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py =
File error
Traceback (most recent call last):
  File "D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py", line 7, in <module>
    data.close()
NameError: name 'data' is not defined
>>>

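When missing.txt does not exist, the data file object data is never created, so close() cannot be called on it, and the result is a NameError.

A simple fix:

Add a small test to the finally suite: before calling close(), check whether the name data exists.

The locals() BIF returns a dictionary of all the names defined in the current scope.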

try:
    data=open('missing.txt')
    print(data.readline(), end=' ')
except IOError:
    print('File error')
finally:
    if 'data' in locals():
        data.close()

>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py =
File error
>>>

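The result: there is no further exception, only your own error message. When an error occurs, the exception-handling code catches it and displays the "File error" message, and the finally suite then closes any file that was actually opened.

However, it is still not clear what caused the error.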
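As an aside on the locals() test used above, the following sketch shows that a name only appears in locals() once it has actually been bound. The helper data_is_bound() and the temporary file names are assumptions made for this illustration.

```python
import os
import tempfile

def data_is_bound(path):
    """Return True if the name data was bound, i.e. open() succeeded."""
    try:
        data = open(path)
    except IOError:
        pass
    bound = 'data' in locals()   # locals() is a dict keyed by name
    if bound:
        data.close()
    return bound

with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, 'missing.txt')   # never created
    present = os.path.join(tmp_dir, 'present.txt')
    with open(present, 'w') as f:
        f.write('hello\n')

    bound_when_missing = data_is_bound(missing)
    bound_when_present = data_is_bound(present)

print(bound_when_missing, bound_when_present)
```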

2). Printing specific error information

When an exception is raised and handled by an except suite, the Python interpreter passes the exception object to that suite. You only need to give the exception object a name (with as), and you can then use it in your code as an identifier.

try:
    data=open('missing.txt')
    print(data.readline(),end=' ')
except IOError as err:
    print ('File error: '+err)
finally:
    if 'data' in locals():
        data.close()
>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py =
Traceback (most recent call last):
  File "D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py", line 2, in <module>
    data=open('missing.txt')
FileNotFoundError: [Errno 2] No such file or directory: 'missing.txt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py", line 5, in <module>
    print ('File error: '+err)
TypeError: Can't convert 'FileNotFoundError' object to str implicitly
>>>

A new exception appears: TypeError: Can't convert 'FileNotFoundError' object to str implicitly

An exception object is not a string, so concatenating it with a string fails. Use the str() BIF to convert (cast) the exception object to a string.

try:
    data=open('missing.txt')
    print(data.readline(), end=' ')
except IOError as err:
    print('File error: ' + str(err))
finally:
    if 'data' in locals():
        data.close()

>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py =
File error: [Errno 2] No such file or directory: 'missing.txt'
>>>
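Besides converting it with str(), the named exception object can be inspected directly. The sketch below assumes Python 3, where IOError is an alias of OSError and carries the errno, strerror and filename attributes. The path is a hypothetical file inside a temporary directory, so it is guaranteed to be missing.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, 'missing.txt')   # never created
    try:
        open(missing_path)
    except IOError as err:
        # err is only available inside the except suite, so copy what is needed.
        error_number = err.errno
        error_text = err.strerror
        error_file = err.filename
        message = 'File error: ' + str(err)
        is_not_found = isinstance(err, FileNotFoundError)

print(error_number, error_text)
print(message)
```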

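The problem is solved, but the code keeps growing, and the extra logic obscures what the code is really meant to do.

5. Using with to work with files

Using the with statement with files greatly reduces the amount of code that the try/except/finally pattern requires. With with, there is no need for a finally suite to close the file: the file is closed properly when the with block is left, even if an exception was raised inside it, so you no longer have to worry about closing it yourself.

The with statement uses a Python technique called the context management protocol.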

"""
try:
    data=open('missing.txt')
    print(data.readline(), end=' ')
except IOError as err:
    print('File error: ' + str(err))
finally:
    if 'data' in locals():
        data.close()  
"""

try:
    with open('its.txt','w') as data:   # with with, the finally suite is no longer needed
        print('it is ...',file=data)
except IOError as err:
    print('File error: ' + str(err))

>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page118-119\page118-119.py =
>>>
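A quick way to convince yourself that with closes the file on both the normal and the error path is to check the closed attribute of the file object afterwards. This is a sketch only: the temporary directory and the simulated IOError are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'its.txt')

    # Normal exit from the with block.
    with open(path, 'w') as data:
        print('it is ...', file=data)
    closed_after_normal_exit = data.closed

    # Exit caused by an exception raised inside the with block.
    try:
        with open(path, 'w') as data:
            raise IOError('simulated failure inside the with block')
    except IOError as err:
        caught = str(err)
    closed_after_error = data.closed

# File objects take part in the context management protocol through these two methods.
supports_protocol = hasattr(data, '__enter__') and hasattr(data, '__exit__')

print(closed_after_normal_exit, closed_after_error, supports_protocol)
```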

Rewriting the earlier try/except/finally code:

man = []
other = []

try:
    data = open('sketch.txt')

    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
            else:
                pass
        except ValueError:
            pass

    data.close()
except IOError:
    print('The datafile is missing!')

try:
    with open('man_data.txt', 'w') as man_file, open('other_data.txt', 'w') as other_file:
        print(man, file=man_file)
        print(other, file=other_file)
except IOError as err:
    print('File error: ' + str(err))

>>> 
===== RESTART: D:\workspace\headfirstpython\chapter4\page122\page122.py =====
>>>

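Check the folder: the two data files have been generated again, and the data from the two lists has been saved to them.

At this point the code handles any IOError that Python or the operating system raises while the files are being written.

Notice that when a list is saved, print() converts it into one huge string. This file format is not suitable!

By default, print() displays your data in the format the Python interpreter uses to represent a list. The output is not processed any further; its main purpose is to show you what the data looks like in memory.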
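The effect is easy to reproduce in memory. In this sketch an io.StringIO buffer stands in for the data file (an assumption made so that no file is written), and the two-item list is a shortened sample.

```python
import io

man = ["Is this the right room for an argument?", "No you haven't!"]

# print() on the whole list writes its string representation on a single line.
buffer = io.StringIO()
print(man, file=buffer)
as_written = buffer.getvalue()

# Printing item by item gives one spoken line per line of output instead.
line_buffer = io.StringIO()
for spoken in man:
    print(spoken, file=line_buffer)
per_line = line_buffer.getvalue()

print(as_written, end='')
print(per_line, end='')
```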

How can the data be saved in a more usable format? By modifying the print_lol() function created in chapter 2.

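'''This is a module that prints lists, which may contain nested lists.'''
import sys

def print_lol(the_list, indent=False, level=0, fh=sys.stdout):
    """Take a positional argument the_list, which can be any list. Each data item in the list is (recursively) printed on its own line to fh, which defaults to the screen (sys.stdout).
    The level argument inserts tab-stops when a nested list is encountered, giving indented output.
    The indent argument switches the indentation code on or off; it defaults to False, meaning no indentation."""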

    for each_item in the_list:
        if isinstance(each_item, list):
            print_lol(each_item, indent, level+1, fh)
        else:
            if indent:
                for tab_stop in range(level):
                    print("\t", end='', file=fh)
            print(each_item, file=fh)

from page128 import print_lol  # import the print_lol function from the page128 module

man = []
other = []

try:
    data = open('sketch.txt')

    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
            else:
                pass
        except ValueError:
            pass

    data.close()
except IOError:
    print('The datafile is missing!')

try:
    with open('man_data.txt', 'w') as man_file, open('other_data.txt', 'w') as other_file:
        print_lol(man, fh=man_file)   # note the call: the keyword must be fh, matching the parameter name in print_lol(); any other name raises a TypeError
        print_lol(other, fh=other_file)
except IOError as err:
    print('File error: ' + str(err))

>>> 
===== RESTART: D:\workspace\headfirstpython\chapter4\page128\page122.py =====
>>>

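After the program runs, the list data is saved in an easy-to-read format, one data item per line.

The data is now easy to read back, but the code written to do so is specific to the format created for this particular problem.

This approach is fragile: if the data format changes, your custom code stops working.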
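The fragility can be illustrated with a small round trip. The helpers save_lines() and load_lines() below are hypothetical stand-ins for the custom save and load code, and an io.StringIO buffer replaces the data file. The point is that nesting and data types are lost on the way back.

```python
import io

def save_lines(items, fh):
    """Write each item on its own line, flattening nested lists."""
    for item in items:
        if isinstance(item, list):
            save_lines(item, fh)
        else:
            print(item, file=fh)

def load_lines(fh):
    """Read the one-item-per-line format back into a flat list of strings."""
    return [line.rstrip('\n') for line in fh]

original = [1, 'two', ['three', 4.0]]

buffer = io.StringIO()
save_lines(original, buffer)
buffer.seek(0)
restored = load_lines(buffer)

print(restored)
print(restored == original)
```

Every reader of this format has to know its rules in advance, which is the problem the next section sets out to avoid.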

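6. Pickling the data with pickle

The standard library's pickle module can save and load almost any Python data object, including lists.

Once data has been pickled to a file, it is stored persistently and can be read into another program later.

Pickling: data in Python memory -> pickle engine -> pickled data

Unpickling: pickled data -> the same pickle engine -> Python version of the pickled data -> the data recreated in Python memory, identical to the original

Save with dump, restore with load

Using pickle: import the module, use dump() to save the data, and later use load() to restore it.

The only requirement when working with pickled data: the files must be opened in binary access mode, b.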

import pickle   # import the pickle module

with open('mydata.pickle','wb') as mysavedata:
    pickle.dump([1,2,'three'], mysavedata)  # use dump() to save the data
    ...
with open('mydata.pickle', 'rb') as myrestoredata:
    a_list=pickle.load(myrestoredata)   # use load() to restore the data from the file and assign it to an identifier

print(a_list)

import pickle   # import the pickle module

man = []
other = []

try:
    data = open('sketch.txt')

    for each_line in data:
        try:
            (role, line_spoken) = each_line.split(':', 1)
            line_spoken = line_spoken.strip()
            if role == 'Man':
                man.append(line_spoken)
            elif role == 'Other Man':
                other.append(line_spoken)
            else:
                pass
        except ValueError:
            pass

    data.close()
except IOError:
    print('The datafile is missing!')

try:
    with open('man_data.txt', 'wb') as man_file, open('other_data.txt', 'wb') as other_file:  # 'b' opens the files in binary mode
        pickle.dump(man, file=man_file)    # print_lol() is replaced by pickle.dump(), which saves the list data to the file
        pickle.dump(other, file=other_file)
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as perr:    # handle the pickle module's PickleError exceptions
    print('Pickling error: ' + str(perr))

>>> 
===== RESTART: D:\workspace\headfirstpython\chapter4\page134\page134.py =====
>>>

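Opening the two files shows that the data has been pickled.

The pickle module uses a custom binary format (known as its protocol). Don't worry if the file contents look strange; that is how they are supposed to look!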
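The binary nature of the format can be seen without touching the disk. This sketch uses pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load(); the sample data is made up. It also shows what happens when the stream is a text stream rather than a binary one, which is why the b mode is required.

```python
import io
import pickle

original = ['Man', ['nested', 1, 2.5], {'key': None}]

blob = pickle.dumps(original)      # pickled form, held in memory
restored = pickle.loads(blob)      # rebuilt from the pickled form

is_bytes = isinstance(blob, bytes)
same_value = restored == original      # equal contents
same_object = restored is original     # but a new object

# Writing pickled bytes to a text stream fails, hence the 'wb' and 'rb' modes.
try:
    pickle.dump(original, io.StringIO())
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

print(is_bytes, same_value, same_object, text_mode_failed)
print(pickle.DEFAULT_PROTOCOL <= pickle.HIGHEST_PROTOCOL)
```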

import pickle
import page128
new_man=[]

try:
    with open('man_data.txt', 'rb') as man_file:
        new_man = pickle.load(man_file)  # the data is unpickled and assigned to the new_man list, which the rest of the program can use directly
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as perr:    # handle the pickle module's PickleError exceptions
    print('Pickling error: ' + str(perr))

print('Contents of the unpickled new_man list:')
page128.print_lol(new_man)
print('First line of the unpickled new_man list:')
print(new_man[0])  # first line
print('Last line of the unpickled new_man list:')
print(new_man[-1])  # last line

>>> 
= RESTART: D:\workspace\headfirstpython\chapter4\page134\test-pickle-load.py =
Contents of the unpickled new_man list:
Is this the right room for an argument?
No you haven't!
When?
No you didn't!
You didn't!
You did not!
Ah! (taking out his wallet and paying) Just the five minutes.
You most certainly did not!
Oh no you didn't!
Oh no you didn't!
Oh look, this isn't an argument!
No it isn't!
It's just contradiction!
It IS!
You just contradicted me!
You DID!
You did just then!
(exasperated) Oh, this is futile!!
Yes it is!
First line of the unpickled new_man list:
Is this the right room for an argument?
Last line of the unpickled new_man list:
Yes it is!
>>>

Python takes care of the file I/O details, so you can concentrate on what your code needs to do.

Generic file I/O with pickle is the way to go!

posted @ 2016-04-20 17:49 垄上行
