Working with HDF5 files in Python

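In deep learning tasks, it is often more efficient to put the whole dataset into a single file and process it from there. Several data models and libraries support this, HDF5 being one of them.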

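HDF5 is a mechanism for storing large arrays of homogeneous numerical data. It suits data models that can be organized hierarchically and whose datasets need to be tagged with metadata.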
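As a rough illustration of the single-file idea, the sketch below stores a toy training set in one HDF5 file, tags it with metadata, and reads back one mini-batch. The file name, dataset names, attribute names, and the random data are all made up for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy data: 100 "images" of 8x8 pixels with integer labels.
images = np.random.default_rng(0).random((100, 8, 8)).astype("float32")
labels = np.arange(100) % 10

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "train_example.hdf5")

# Write both arrays into one file and tag the image dataset with metadata.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("images", data=images)
    f.create_dataset("labels", data=labels)
    dset.attrs["description"] = "toy 8x8 images"
    dset.attrs["num_classes"] = 10

# Read back a single mini-batch by slicing the datasets.
with h5py.File(path, "r") as f:
    batch_x = f["images"][0:32]
    batch_y = f["labels"][0:32]
    num_classes = int(f["images"].attrs["num_classes"])

print(batch_x.shape, batch_y.shape, num_classes)
```

Slicing a dataset reads only the requested part of the array, which is what makes this layout convenient for batch-wise training loops.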

 

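hdf5 files: containers that hold two kinds of data objects, datasets and groups, and are operated on much like standard Python files. A File instance is itself a group, named /, and serves as the entry point for traversing the file.
dataset (array-like): comparable to a NumPy array. Every dataset has a name, a shape, and a dtype, and supports slicing.
group (folder-like): comparable to a dictionary; it is a folder-like container. A group can hold datasets or other groups. The keys are the names of the group's members, and the values are the member objects themselves (groups or datasets).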
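The three concepts above can be checked directly in a few lines. This is a minimal sketch; the file name and the names "grp" and "data" are placeholders chosen for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "concepts_example.hdf5")

with h5py.File(path, "w") as f:
    # The File object is itself the root group, named "/".
    root_name = f.name

    grp = f.create_group("grp")    # hypothetical group name
    grp.create_dataset("data", data=np.arange(12).reshape(3, 4))

    # Groups behave like dictionaries: membership test, keys, indexing.
    has_grp = "grp" in f
    member_names = list(grp.keys())

    # Datasets behave like NumPy arrays: name, shape, dtype, slicing.
    d = grp["data"]
    d_name, d_shape = d.name, d.shape
    first_row = d[0, :]      # reading a slice returns a NumPy array
    d[0, :] = 0              # slices can also be assigned to
    after_write = d[0, :]

print(root_name, has_grp, member_names, d_name, d_shape)
```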

 

import h5py
import numpy as np

def main():
    #===========================================================================
    # Create an HDF5 file.
    f = h5py.File("h5py_example.hdf5", "w")    # mode = {'w', 'r', 'a'}

    # Create two groups under root '/'.
    g1 = f.create_group("bar1")
    g2 = f.create_group("bar2")

    # Create a dataset under root '/'.
    d = f.create_dataset("dset", data=np.arange(16).reshape([4, 4]))

    # Add two attributes to dataset 'dset'
    d.attrs["myAttr1"] = [100, 200]
    d.attrs["myAttr2"] = "Hello, world!"

    # Create a group and a dataset under group "bar1".
    c1 = g1.create_group("car1")
    d1 = g1.create_dataset("dset1", data=np.arange(10))

    # Create a group and a dataset under group "bar2".
    c2 = g2.create_group("car2")
    d2 = g2.create_dataset("dset2", data=np.arange(10))

    # Save and exit the file.
    f.close()

    ''' h5py_example.hdf5 file structure
    +-- '/'
    |   +--    group "bar1"
    |   |   +-- group "car1"
    |   |   |   +-- None
    |   |   |   
    |   |   +-- dataset "dset1"
    |   |
    |   +-- group "bar2"
    |   |   +-- group "car2"
    |   |   |   +-- None
    |   |   |
    |   |   +-- dataset "dset2"
    |   |   
    |   +-- dataset "dset"
    |   |   +-- attribute "myAttr1"
    |   |   +-- attribute "myAttr2"
    |   |   
    |   
    '''

    #===========================================================================
    # Read HDF5 file.
    f = h5py.File("h5py_example.hdf5", "r")    # mode = {'w', 'r', 'a'}

    # Print the keys of groups and datasets under '/'.
    print(f.filename, ":")
    print([key for key in f.keys()], "\n")  

    #===================================================
    # Read dataset 'dset' under '/'.
    d = f["dset"]

    # Print the data of 'dset'.
    print(d.name, ":")
    print(d[:])

    # Print the attributes of dataset 'dset'.
    for key in d.attrs.keys():
        print(key, ":", d.attrs[key])

    print()

    #===================================================
    # Read group 'bar1'.
    g = f["bar1"]

    # Print the keys of groups and datasets under group 'bar1'.
    print([key for key in g.keys()])

    # Three methods to print the data of 'dset1'.
    print(f["/bar1/dset1"][:])        # 1. absolute path

    print(f["bar1"]["dset1"][:])    # 2. relative path: file[][]

    print(g['dset1'][:])        # 3. relative path: group[]



    # Delete a dataset.
    # Note: deletion requires the file to be opened in a writable mode ('a' or 'r+'), not 'r'.
    '''
    del g["dset1"]
    '''

    # Close the file.
    f.close()

if __name__ == "__main__":
    main()
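The deletion step is commented out in the listing because the file is opened read-only there. The sketch below shows one way it could be done: reopen the file in append mode, walk the hierarchy with `visititems`, and then delete a dataset. It builds its own small file with a hypothetical name rather than reusing h5py_example.hdf5.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "h5py_walk_example.hdf5")

# Build a small file shaped like part of the structure above.
with h5py.File(path, "w") as f:
    f.create_group("bar1/car1")    # intermediate group "bar1" is created automatically
    f["bar1"].create_dataset("dset1", data=np.arange(10))
    f.create_dataset("dset", data=np.arange(16).reshape(4, 4))

# Reopen in append mode so that the file can be modified.
with h5py.File(path, "a") as f:
    found = []

    def record(name, obj):
        # Called once for every group and dataset below the root;
        # name is the path relative to the root group.
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        found.append((name, kind))

    f.visititems(record)

    del f["bar1/dset1"]            # unlink the dataset from its group
    remaining = sorted(f["bar1"].keys())

for name, kind in found:
    print(kind, name)
print("left in bar1:", remaining)
```

Using `with` closes the file automatically, even if an exception is raised, so no explicit `f.close()` is needed.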

 

end

 

posted @ 2022-04-04 20:15  一笑任逍遥