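Parsing PASCAL annotation files and extracting their key information
Problem description
A folder contains a number of PASCAL-format annotation files. The task is to read each file in turn and extract its key information.
The content of a PASCAL file looks like this: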
# PASCAL Annotation Version 1.00

Image filename : "Train/pos/crop001001.png"
Image size (X x Y x C) : 818 x 976 x 3
Database : "The INRIA Rhône-Alpes Annotated Person Database"
Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }

# Note that there might be other objects in the image
# for which ground truth data has not been provided.

# Top left pixel co-ordinates : (0, 0)

# Details for object 1 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 1 "PASperson" : "UprightPerson"
Center point on object 1 "PASperson" (X, Y) : (396, 185)
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)

# Details for object 2 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 2 "PASperson" : "UprightPerson"
Center point on object 2 "PASperson" (X, Y) : (119, 385)
Bounding box for object 2 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (31, 326) - (209, 712)

# Details for object 3 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 3 "PASperson" : "UprightPerson"
Center point on object 3 "PASperson" (X, Y) : (219, 235)
Bounding box for object 3 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (148, 179) - (290, 641)
The key information to extract is listed below:
1. Image filename : "Train/pos/crop001001.png"
Extract the image file name: crop001001.png
2. Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }
Extract the number of bounding boxes: 3
3. Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
Extract the top-left and bottom-right coordinates of each bounding box: 261 109 511 705
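The three extraction rules above can also be sketched with regular expressions applied to the whole file text. This is only a rough sketch, not the test code of this post: the function name `parse_annotation` and the three patterns are assumptions of mine, written against the layout of the sample file shown earlier, and they may need adjusting for files that deviate from it.

```python
import re

# Hypothetical patterns, written against the sample annotation layout only.
FILENAME_RE = re.compile(r'Image filename\s*:\s*"([^"]+)"')
COUNT_RE = re.compile(r'Objects with ground truth\s*:\s*(\d+)')
BOX_RE = re.compile(
    r'Bounding box for object\s+\d+\s+"[^"]*"[^:]*:\s*'
    r'\((\d+),\s*(\d+)\)\s*-\s*\((\d+),\s*(\d+)\)'
)


def parse_annotation(text):
    """Return (image name, object count, list of (xmin, ymin, xmax, ymax))."""
    name_match = FILENAME_RE.search(text)
    count_match = COUNT_RE.search(text)
    if name_match is None or count_match is None:
        raise ValueError('text does not follow the expected annotation layout')
    # The annotation path uses forward slashes; keep only the last component.
    image_name = name_match.group(1).rsplit('/', 1)[-1]
    boxes = [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(text)]
    return image_name, int(count_match.group(1)), boxes


sample = '''Image filename : "Train/pos/crop001001.png"
Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
Bounding box for object 2 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (31, 326) - (209, 712)
Bounding box for object 3 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (148, 179) - (290, 641)
'''
print(parse_annotation(sample))
```

Matching the full `(Xmin, Ymin) - (Xmax, Ymax)` shape, rather than collecting every number on the line, avoids having to skip the object index by position.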
Output the key information. For this sample, the output format is:
crop001001.png 3 261 109 511 705 31 326 209 712 148 179 290 641
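To make the layout of that output line explicit (file name, box count, then four integers per box), here is a small sketch that builds such a line and reads it back. The helper names `format_record` and `parse_record` are hypothetical, and the sketch assumes the file name contains no spaces.

```python
def format_record(image_name, boxes):
    """Build one output line: name, box count, then four integers per box."""
    fields = [image_name, str(len(boxes))]
    for box in boxes:
        fields.extend(str(v) for v in box)
    return ' '.join(fields)


def parse_record(line):
    """Split an output line back into (name, list of 4-tuples)."""
    tokens = line.split()
    name, count = tokens[0], int(tokens[1])
    values = [int(t) for t in tokens[2:]]
    # Each box contributes exactly four coordinates.
    if len(values) != 4 * count:
        raise ValueError('expected 4 coordinates per box')
    boxes = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
    return name, boxes


boxes = [(261, 109, 511, 705), (31, 326, 209, 712), (148, 179, 290, 641)]
line = format_record('crop001001.png', boxes)
print(line)
print(parse_record(line))
```

Storing the count before the coordinates lets a reader of the line check that no box was truncated.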
Test code
import os
import re


def main():
    pascal_path = '/home/maxin/Desktop/pascal_list/'
    pascal_list = os.listdir(pascal_path)
    print(len(pascal_list))
    for pascal_file in pascal_list:
        # 'gbk' is the encoding used here; change it if your files differ
        with open(os.path.join(pascal_path, pascal_file), encoding='gbk') as f:
            line_list = f.readlines()
        str_line = ''
        nums = ['0']  # fallback if the object-count line is missing
        for line in line_list:
            if 'Image filename' in line:
                # take the quoted path, drop the quotes, keep only the base name
                image_path = line.split(':', 1)[1].strip().strip('"')
                str_line = os.path.basename(image_path)
                break
        for line in line_list:
            if 'Objects with ground truth' in line:
                nums = re.findall(r'\d+', line)
                str_line = str_line + ' ' + nums[0]
                break
        for index in range(1, int(nums[0]) + 1):
            # the trailing ' "' keeps "object 1" from also matching "object 10"
            prefix = 'Bounding box for object ' + str(index) + ' "'
            for line in line_list:
                if prefix in line:
                    # coordinate[0] is the object index; the next four are the box
                    coordinate = re.findall(r'\d+', line)
                    str_line = str_line + ' ' + ' '.join(coordinate[1:5])
        print(str_line)


if __name__ == "__main__":
    main()