Parsing PASCAL annotation files and extracting their key information

Problem description

A folder contains a number of annotation files in PASCAL format. Each file must be read in turn and its key information extracted.

A PASCAL annotation file looks like this:

# PASCAL Annotation Version 1.00

Image filename : "Train/pos/crop001001.png"
Image size (X x Y x C) : 818 x 976 x 3
Database : "The INRIA Rhône-Alpes Annotated Person Database"
Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }

# Note that there might be other objects in the image
# for which ground truth data has not been provided.

# Top left pixel co-ordinates : (0, 0)

# Details for object 1 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 1 "PASperson" : "UprightPerson"
Center point on object 1 "PASperson" (X, Y) : (396, 185)
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)

# Details for object 2 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 2 "PASperson" : "UprightPerson"
Center point on object 2 "PASperson" (X, Y) : (119, 385)
Bounding box for object 2 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (31, 326) - (209, 712)

# Details for object 3 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 3 "PASperson" : "UprightPerson"
Center point on object 3 "PASperson" (X, Y) : (219, 235)
Bounding box for object 3 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (148, 179) - (290, 641)
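Apart from blank lines and "#" comment lines, every line in the sample has the form "key : value". As a minimal sketch, assuming " : " occurs only once per line as it does in the sample, the format can be read into a dictionary. The function name parse_fields is hypothetical, not part of the original code:

```python
def parse_fields(text):
    """Map each 'key : value' line of a PASCAL annotation to a dict entry.

    Blank lines and '#' comment lines are skipped. Assumes ' : ' appears
    at most once per line, as in the sample file.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition(' : ')
        if sep:  # keep only lines that actually contain the separator
            fields[key] = value
    return fields


sample = '''# PASCAL Annotation Version 1.00

Image filename : "Train/pos/crop001001.png"
Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
'''

fields = parse_fields(sample)
print(fields['Image filename'])  # "Train/pos/crop001001.png" (quotes included)
```

Values are kept as raw strings; the quotes, braces and parentheses still have to be stripped when a specific field is needed.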

The key information to extract is:

1. Image filename : "Train/pos/crop001001.png"
  Extract the image file name: crop001001.png
2. Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }
  Extract the number of bounding boxes: 3
3. Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
  Extract the top-left and bottom-right corner coordinates of each box: 261 109 511 705
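Each of the three items above can also be matched with a regular expression instead of a substring test followed by re.findall. The patterns below are one possible choice (FILENAME_RE, COUNT_RE and BOX_RE are illustrative names, not from the original) and have only been checked against the sample lines:

```python
import os
import re

# Illustrative patterns, one per key field in the list above.
FILENAME_RE = re.compile(r'Image filename\s*:\s*"([^"]*)"')
COUNT_RE = re.compile(r'Objects with ground truth\s*:\s*(\d+)')
BOX_RE = re.compile(
    r'Bounding box for object (\d+) .*:\s*'
    r'\((\d+),\s*(\d+)\)\s*-\s*\((\d+),\s*(\d+)\)'
)

line1 = 'Image filename : "Train/pos/crop001001.png"'
line2 = 'Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }'
line3 = ('Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : '
         '(261, 109) - (511, 705)')

# Keep only the last path component of the quoted file name.
name = os.path.basename(FILENAME_RE.search(line1).group(1))  # 'crop001001.png'
count = int(COUNT_RE.search(line2).group(1))                  # 3
# groups() is (index, Xmin, Ymin, Xmax, Ymax); drop the object index.
box = BOX_RE.search(line3).groups()[1:]                       # ('261', '109', '511', '705')
print(name, count, *box)
```

Because the coordinate groups require digits inside the parentheses, the "(Xmin, Ymin) - (Xmax, Ymax)" header part of the line cannot be mistaken for coordinates.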
The key information is printed as one line per file. For this sample the output is:
crop001001.png 3 261 109 511 705 31 326 209 712 148 179 290 641
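Once the fields are parsed, building that line is a plain join. A minimal sketch, assuming the boxes are already available as (Xmin, Ymin, Xmax, Ymax) tuples and that the declared box count equals the number of boxes (format_record is a hypothetical helper):

```python
def format_record(name, boxes):
    """Build 'name count x1 y1 x2 y2 ...' from (xmin, ymin, xmax, ymax) tuples.

    The count is taken as len(boxes), assuming it matches the
    'Objects with ground truth' value in the file.
    """
    parts = [name, str(len(boxes))]
    for box in boxes:
        parts.extend(str(v) for v in box)
    return ' '.join(parts)


boxes = [(261, 109, 511, 705), (31, 326, 209, 712), (148, 179, 290, 641)]
print(format_record('crop001001.png', boxes))
# crop001001.png 3 261 109 511 705 31 326 209 712 148 179 290 641
```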

Test code

import os
import re


def main():
    pascal_path = '/home/maxin/Desktop/pascal_list/'
    pascal_list = sorted(os.listdir(pascal_path))  # sort for a deterministic order
    print(len(pascal_list))  # number of annotation files found

    for pascal_file in pascal_list:
        # The header contains a non-ASCII character ("Rhône"); latin-1 decodes any
        # byte sequence without errors, and every field extracted here is ASCII.
        with open(os.path.join(pascal_path, pascal_file), encoding='latin-1') as f:
            line_list = f.readlines()

        str_line = ''
        for line in line_list:
            if 'Image filename' in line:
                # 'Image filename : "Train/pos/crop001001.png"' -> 'crop001001.png'
                path = line.split(':', 1)[1].strip().strip('"')
                str_line = os.path.basename(path)
                break

        num_objects = 0  # reset per file so a missing line cannot reuse the previous count
        for line in line_list:
            if 'Objects with ground truth' in line:
                num_objects = int(re.findall(r'\d+', line)[0])
                str_line = str_line + ' ' + str(num_objects)
                break

        for index in range(1, num_objects + 1):
            # The trailing space stops "object 1" from also matching "object 10", "object 11", ...
            prefix = 'Bounding box for object ' + str(index) + ' '
            for line in line_list:
                if prefix in line:
                    # Digits on the line: [index, Xmin, Ymin, Xmax, Ymax]
                    coordinate = re.findall(r'\d+', line)
                    str_line = str_line + ' ' + ' '.join(coordinate[1:5])
                    break

        print(str_line)


if __name__ == "__main__":
    main()
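The test code scans line_list once for every object. A single-pass alternative is sketched below. It assumes the bounding-box lines appear in object order, as in the sample, and it writes one result line per file to an output file instead of printing. parse_annotation and convert_folder are hypothetical names, and latin-1 is an assumed encoding chosen because it decodes any byte sequence:

```python
import os
import re
from pathlib import Path

# Matches the coordinate part "(261, 109) - (511, 705)" of a bounding-box line.
BOX_RE = re.compile(r'\((\d+),\s*(\d+)\)\s*-\s*\((\d+),\s*(\d+)\)')


def parse_annotation(text):
    """Return 'name count x1 y1 x2 y2 ...' for one PASCAL annotation, in one pass.

    Assumes the bounding-box lines are listed in object order.
    """
    name, count, coords = '', 0, []
    for line in text.splitlines():
        if line.startswith('Image filename'):
            name = os.path.basename(line.split(':', 1)[1].strip().strip('"'))
        elif line.startswith('Objects with ground truth'):
            count = int(re.search(r':\s*(\d+)', line).group(1))
        elif line.startswith('Bounding box for object'):
            coords.extend(BOX_RE.search(line).groups())
    return ' '.join([name, str(count)] + coords)


def convert_folder(folder, out_file):
    """Write one output line per annotation file in folder, sorted by file name."""
    with open(out_file, 'w', encoding='utf-8') as out:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                text = path.read_text(encoding='latin-1')  # assumed encoding
                out.write(parse_annotation(text) + '\n')
```

For example, convert_folder('/home/maxin/Desktop/pascal_list/', 'result.txt') would process the folder used above. Keep the output file outside the input folder, otherwise it is created before the listing and would be read back as an annotation.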

posted @ 2019-06-26 14:18  新生代黑马