A Python Batch Tool for Thinning Video Frame Images: Remove Invalid Frames + Downsample by Frame Rate

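When working with an image dataset extracted from video, two problems come up again and again. First, the set contains many blank or invalid images, such as pure-black or pure-white frames and corrupted files that are only a few bytes in size. Second, the source frame rate is high (30 fps, for example), so most of the images are redundant. The tool shared here handles both in one pass: it deletes invalid frames and then keeps only the frames needed for a chosen target frame rate. In the sample run shown later, going from 30 fps to 2 fps removes about 93% of the images while still keeping evenly spaced frames.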

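I. What the Tool Does

Scans every immediate subdirectory of a given root directory that contains frame images and processes them in batch;
Removes invalid images: deletes files that are too small (configurable threshold) or whose content is nearly a single color (judged by grayscale variance);
Thins frames by frame rate: for example, reduces a 30 fps extraction to 2 fps (keeping 1 frame in every 15), with both the source and target frame rates adjustable;
Supports a dry-run mode: only counts what would be deleted or kept without deleting any file, so a strategy can be checked safely first;
Prints a detailed report: original / kept / deleted counts per directory, overall reduction rate, and elapsed time.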
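
The dry-run idea from the list above can be sketched in a few lines. This is a minimal illustration rather than the tool's own code: `remove_file` is a hypothetical helper that reports a deletion in both modes but only touches the disk when `dry_run` is false.

```python
import tempfile
from pathlib import Path


def remove_file(path: Path, dry_run: bool) -> bool:
    """Hypothetical helper: report a deletion, performing it only when dry_run is False."""
    if dry_run:
        return True          # counted as "would delete"; the file is left in place
    try:
        path.unlink()
        return True
    except OSError:
        return False         # deletion failed; the caller can count it as an error


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "frame_001.jpg"
    target.write_bytes(b"\x00" * 16)

    dry_result = remove_file(target, dry_run=True)
    exists_after_dry = target.exists()      # the file is still there

    real_result = remove_file(target, dry_run=False)
    exists_after_real = target.exists()     # the file is gone
```

Because both modes go through the same counting path, the numbers reported by a dry run match what a real run would delete, provided no deletion fails.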

II. How It Works

The tool is written in Python. Its logic falls into four parts, and the overall flow is easy to follow:

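1. Setup: directory scan and dependency loading

Scans the root directory, skips unrelated directories (such as annotation-tool directories), and keeps only the subdirectories that contain images;
Uses Pillow to read images and numpy to compute image variance; if either is missing, the script prints the install command and exits.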
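
A minimal sketch of the directory scan described above, using only the standard library. The names `IMAGE_SUFFIXES`, `SKIP_DIRS`, and `find_frame_dirs` are illustrative assumptions, and the skip list is a placeholder rather than the tool's actual list.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}   # assumed set of frame formats
SKIP_DIRS = {"labelImg", "__pycache__"}      # placeholder skip list


def find_frame_dirs(root: Path) -> list:
    """Return immediate subdirectories of root that hold at least one image file."""
    found = []
    for d in sorted(root.iterdir()):
        if not d.is_dir() or d.name in SKIP_DIRS:
            continue
        if any(f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES for f in d.iterdir()):
            found.append(d)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scene_01").mkdir()
    (root / "scene_01" / "frame_000.JPG").write_bytes(b"x")   # suffix match is case-insensitive
    (root / "notes").mkdir()
    (root / "notes" / "readme.txt").write_text("no images here")
    (root / "labelImg").mkdir()
    (root / "labelImg" / "icon.png").write_bytes(b"x")        # skipped by name

    names = [d.name for d in find_frame_dirs(root)]
    print(names)
```

Note that the scan in this sketch is one level deep: images in nested subdirectories, or directly in the root, are not picked up.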

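2. Invalid-image detection (two filters)

File-size filter: reads the file size, and anything below the byte threshold (5 KB by default) is treated as invalid;
Content filter: converts the image to grayscale and computes the variance of its pixel values. A lower variance means more uniform pixels (closer to a single color); below the threshold, the frame is treated as blank.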
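
To make the variance criterion concrete, here is a small worked example on hand-written grayscale values. It uses `statistics.pvariance` from the standard library as a stand-in for `np.var` (both compute the population variance); the pixel lists and the `is_blank` helper are made up for illustration.

```python
from statistics import pvariance

VAR_THRESHOLD = 50.0  # same value as the tool's default --var-threshold


def is_blank(pixels, threshold=VAR_THRESHOLD):
    """Treat a flat list of 0-255 grayscale values as blank when its variance is below threshold."""
    return pvariance(pixels) < threshold


# Nearly uniform dark frame: values differ by at most 2 gray levels
dark_frame = [0, 1, 0, 2, 1, 0, 1, 1]
# Frame with real content: values spread over most of the 0-255 range
textured_frame = [0, 255, 30, 200, 90, 160, 10, 240]

dark_var = pvariance(dark_frame)
textured_var = pvariance(textured_frame)
print(dark_var, is_blank(dark_frame))
print(textured_var, is_blank(textured_frame))
```

A genuinely low-contrast scene (fog, a night shot) may also fall below the threshold, so it is worth checking the `[DEL-BLANK]` lines of a dry run before deleting anything.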

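3. Frame-rate thinning

Extracts the frame number from the file name (the first run of digits);
Computes the sampling step (step = source frame rate // target frame rate, using integer division), keeps only images whose frame number is a multiple of the step, and deletes the rest;
Images with no extractable frame number are kept by default (to avoid deleting the wrong files).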
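
The thinning rule above can be tried on file names alone, without touching any files. In this sketch `parse_frame_number` follows the same regular-expression approach as the full script, while `frames_to_keep` and the generated file names are hypothetical.

```python
import re
from pathlib import Path


def parse_frame_number(filename):
    """Return the first run of digits in the file stem as an int, or None."""
    m = re.search(r"(\d+)", Path(filename).stem)
    return int(m.group(1)) if m else None


def frames_to_keep(filenames, fps_orig, fps_keep):
    """Keep files whose frame number is a multiple of the integer step."""
    step = max(1, fps_orig // fps_keep)   # clamp so the step is never 0
    kept = []
    for name in filenames:
        n = parse_frame_number(name)
        if n is None or n % step == 0:    # files without a number are kept
            kept.append(name)
    return step, kept


names = ["frame_%03d.jpg" % i for i in range(1, 46)] + ["cover.jpg"]
step, kept = frames_to_keep(names, fps_orig=30, fps_keep=2)
print(step, kept)

# A non-integer ratio is rounded down: 25 fps -> 2 fps uses step 12,
# so the effective rate is 25 / 12 (about 2.08 fps), not exactly 2.
step_25, _ = frames_to_keep(names, fps_orig=25, fps_keep=2)
effective_fps = 25 / step_25
```

One thing to watch: "first run of digits" means a name such as `cam2_frame_0015.jpg` parses as frame 2, not 15, so names with a numeric prefix need a stricter pattern.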

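4. Flow control

Processes one directory at a time and prints each action (reason for deletion, errors on failure);
After all directories are done, aggregates the results and computes the totals: kept, deleted, and reduction rate.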

III. Installing Dependencies

The tool depends on two third-party Python libraries, Pillow (image processing) and numpy (numerical computation). Install both with:

pip install Pillow numpy

Note: installing into a virtual environment is recommended to avoid dependency conflicts; Python 3.7 or later is recommended.
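
To check for the two libraries before running the tool, one option is to look them up without importing them. This is a sketch under the assumption that a lookup by import name is enough; `REQUIRED` and `missing_packages` are illustrative names, not part of the tool.

```python
import importlib.util

# Map import name -> pip package name (Pillow is imported as "PIL")
REQUIRED = {"PIL": "Pillow", "numpy": "numpy"}


def missing_packages(required):
    """Return pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing dependencies. Run: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```

The mapping matters because the package name passed to pip and the name used in `import` differ for Pillow.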

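IV. Usage

1. Basic usage

Run the script directly (defaults: process the directory containing the script, 30 fps → 2 fps, 5 KB minimum file size, variance threshold 50):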

python streamline_frames.py

2. Custom parameters

# Set the root directory, source frame rate 25, target frame rate 1, minimum file size 10 KB, variance threshold 30
python streamline_frames.py --root /path/to/frames --fps-orig 25 --fps-keep 1 --min-size 10240 --var-threshold 30

# Dry run (count only, delete nothing)
python streamline_frames.py --root /path/to/frames --dry-run

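3. Parameter reference

Parameter	Description	Default
--root	Root directory of the images	Directory containing the script
--fps-orig	Source frame rate	30
--fps-keep	Target frame rate	2
--min-size	Minimum file size in bytes (smaller files are deleted)	5120 (5 KB)
--var-threshold	Grayscale variance threshold (below it, the image is treated as blank)	50.0
--dry-run	Dry-run mode (count only)	Off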
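
The table maps directly onto `argparse` options. The sketch below rebuilds the parser from the documented defaults and parses the custom-parameter command from the previous subsection as a list. `build_parser` is an illustrative name, and the `--root` default here is the current working directory, a stand-in for the script's directory.

```python
import argparse
from pathlib import Path


def build_parser():
    """Rebuild the options from the parameter table (defaults as documented)."""
    p = argparse.ArgumentParser()
    # Assumption: cwd stands in for "directory containing the script"
    p.add_argument("--root", type=str, default=str(Path.cwd()))
    p.add_argument("--fps-orig", type=int, default=30)
    p.add_argument("--fps-keep", type=int, default=2)
    p.add_argument("--min-size", type=int, default=5120)
    p.add_argument("--var-threshold", type=float, default=50.0)
    p.add_argument("--dry-run", action="store_true")
    return p


defaults = build_parser().parse_args([])
args = build_parser().parse_args(
    ["--root", "/path/to/frames", "--fps-orig", "25", "--fps-keep", "1",
     "--min-size", "10240", "--var-threshold", "30"])

# Dashes in option names become underscores in attribute names
print(args.fps_orig, args.fps_keep, args.min_size, args.var_threshold, args.dry_run)
```

Since `--dry-run` is a `store_true` flag, it takes no value and stays `False` unless it appears on the command line.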

V. Sample Output

############################################################
  Video frame thinning & invalid-image cleanup tool
  Root directory: /data/frames
  Subdirectories: 2
  Source frame rate: 30 fps  ->  target frame rate: 2 fps
  Invalid if: file < 5120 B or grayscale variance < 50.0
############################################################

============================================================
Directory: scene_01  (300 images)
Strategy: keep 1 of every 15 frames (30fps -> 2fps) and remove invalid images
============================================================
  [DEL-SMALL]  frame_001.jpg  (1024 B)
  [DEL-BLANK]  frame_005.jpg  (variance too low, likely no content)
...
  [OK]  Kept: 20
  [DEL] Invalid images deleted: 5
  [DEL] Excess frames deleted:  275

############################################################
  Done - summary report
############################################################
  Directory                            Total   Kept Invalid  Excess
  -----------------------------------------------------------------
  scene_01                               300     20       5     275
  scene_02                               285     19       3     263
  -----------------------------------------------------------------
  Total                                  585     39       8     538

  Original total: 585  ->  after thinning: 39
  Deleted: 546 (invalid 8 + excess 538)
  Reduction: 93.3%  Elapsed: 2.5s
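
The totals in the report can be checked by hand. The short calculation below copies the per-directory numbers from the sample output and recomputes the summary; the dictionary keys follow the statistics returned by the script's `process_directory`.

```python
# Per-directory statistics copied from the sample report
stats = [
    {"dir": "scene_01", "total": 300, "kept": 20, "deleted_blank": 5, "deleted_sparse": 275},
    {"dir": "scene_02", "total": 285, "kept": 19, "deleted_blank": 3, "deleted_sparse": 263},
]

total_orig = sum(s["total"] for s in stats)
total_kept = sum(s["kept"] for s in stats)
total_deleted = sum(s["deleted_blank"] + s["deleted_sparse"] for s in stats)

# When no deletion fails, every image is either kept or deleted
consistent = all(
    s["total"] == s["kept"] + s["deleted_blank"] + s["deleted_sparse"] for s in stats
)

reduction_pct = (1 - total_kept / total_orig) * 100
print("%d -> %d, deleted %d, reduction %.1f%%" % (total_orig, total_kept, total_deleted, reduction_pct))
```

The reduction rate is computed on image counts, not on bytes, so the saving in disk space may differ from this percentage.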

VI. Full Code

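# -*- coding: utf-8 -*-
"""
streamline_frames.py
====================
What it does:
  1. Scans every subdirectory (containing frame images) under the given root
  2. Detects and deletes invalid/blank images (file too small, or content nearly a single color)
  3. Thins frames from the source to the target frame rate
     (30fps -> 2fps by default, keeping 1 of every 15 frames)

Usage:
  python streamline_frames.py [--root ROOT_DIR] [--fps-orig 30] [--fps-keep 2]
                              [--min-size 5120] [--var-threshold 50] [--dry-run]
"""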

import re
import sys
import argparse
import time
from pathlib import Path

try:
    from PIL import Image
    import numpy as np
except ImportError:
    print("[ERROR] Missing dependencies. Run: pip install Pillow numpy")
    sys.exit(1)


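def is_blank_image(path, var_threshold):
    """Return True if the image is nearly a single color (no content)."""
    try:
        with Image.open(path) as img:
            gray = img.convert("L")                   # 8-bit grayscale
            arr = np.array(gray, dtype=np.float32)
            variance = float(np.var(arr))             # population variance of all pixels
            return variance < var_threshold
    except Exception as e:
        # An unreadable file is treated as invalid, so the caller deletes it
        print("  [WARN] Cannot read image %s: %s; treating it as invalid" % (path.name, e))
        return True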


def parse_frame_number(filename):
    """Extract the frame number from a file name (first run of digits)."""
    stem = Path(filename).stem
    m = re.search(r'(\d+)', stem)
    return int(m.group(1)) if m else None


def process_directory(dir_path, fps_orig, fps_keep, min_size, var_threshold, dry_run):
    """Process one subdirectory and return a dict of statistics."""

    jpg_files = sorted(
        [f for f in dir_path.iterdir()
         if f.is_file() and f.suffix.lower() in ('.jpg', '.jpeg', '.png')],
        key=lambda f: f.name
    )

    if not jpg_files:
        return {"dir": dir_path.name, "total": 0, "kept": 0,
                "deleted_blank": 0, "deleted_sparse": 0, "errors": 0}

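    # Integer step; clamped to 1 so a target rate above the source rate
    # keeps every frame instead of causing a modulo-by-zero error
    step = max(1, fps_orig // fps_keep)
    total = len(jpg_files)
    deleted_blank = 0
    deleted_sparse = 0
    kept = 0
    errors = 0

    sep = "=" * 60
    print("\n" + sep)
    print("Directory: %s  (%d images)" % (dir_path.name, total))
    print("Strategy: keep 1 of every %d frames (%dfps -> %dfps) and remove invalid images" % (step, fps_orig, fps_keep))
    print(sep)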

    for f in jpg_files:
        frame_num = parse_frame_number(f.name)
        file_size = f.stat().st_size

        # Step 1: file too small -> delete
        if file_size < min_size:
            print("  [DEL-SMALL]  %s  (%d B)" % (f.name, file_size))
            if not dry_run:
                try:
                    f.unlink()
                    deleted_blank += 1
                except Exception as e:
                    print("  [ERROR] Delete failed: %s" % e)
                    errors += 1
            else:
                deleted_blank += 1
            continue

        # Step 2: content nearly a single color -> delete
        if is_blank_image(f, var_threshold):
            print("  [DEL-BLANK]  %s  (variance too low, likely no content)" % f.name)
            if not dry_run:
                try:
                    f.unlink()
                    deleted_blank += 1
                except Exception as e:
                    print("  [ERROR] Delete failed: %s" % e)
                    errors += 1
            else:
                deleted_blank += 1
            continue

        # Step 3: frame-rate thinning
        if frame_num is None:
            # No frame number in the name: keep the file to be safe
            kept += 1
            continue

        if frame_num % step == 0:
            kept += 1
        else:
            if not dry_run:
                try:
                    f.unlink()
                    deleted_sparse += 1
                except Exception as e:
                    print("  [ERROR] Delete failed: %s" % e)
                    errors += 1
            else:
                deleted_sparse += 1

    print("\n  [OK]  Kept: %d" % kept)
    print("  [DEL] Invalid images deleted: %d" % deleted_blank)
    print("  [DEL] Excess frames deleted:  %d" % deleted_sparse)
    if errors:
        print("  [ERR] Failed deletions:       %d" % errors)

    return {
        "dir": dir_path.name,
        "total": total,
        "kept": kept,
        "deleted_blank": deleted_blank,
        "deleted_sparse": deleted_sparse,
        "errors": errors,
    }


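def main():
    parser = argparse.ArgumentParser(description="Video frame thinning & invalid-image cleanup tool")
    parser.add_argument("--root", type=str,
                        default=str(Path(__file__).parent),
                        help="root directory of the images (default: the script's directory)")
    parser.add_argument("--fps-orig", type=int, default=30, help="source frame rate (default 30)")
    parser.add_argument("--fps-keep", type=int, default=2,  help="target frame rate (default 2)")
    parser.add_argument("--min-size", type=int, default=5120,
                        help="minimum file size in bytes; smaller files are invalid (default 5120=5KB)")
    parser.add_argument("--var-threshold", type=float, default=50.0,
                        help="grayscale variance threshold; below it the image is blank (default 50)")
    parser.add_argument("--dry-run", action="store_true",
                        help="count only, do not delete any file")
    args = parser.parse_args()

    # Reject frame rates that would make the sampling step undefined
    if args.fps_orig <= 0 or args.fps_keep <= 0:
        parser.error("--fps-orig and --fps-keep must be positive integers")

    root = Path(args.root)
    if not root.is_dir():
        print("[ERROR] Root directory does not exist: %s" % root)
        sys.exit(1)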

    skip_dirs = {"标注要求及说明", "labelImg", "Xanylabeling", ".workbuddy", "__pycache__"}

    target_dirs = []
    for d in root.iterdir():
        if not d.is_dir() or d.name in skip_dirs:
            continue
        try:
            has_img = any(f.suffix.lower() in ('.jpg', '.jpeg', '.png')
                          for f in d.iterdir() if f.is_file())
            if has_img:
                target_dirs.append(d)
        except PermissionError:
            pass

    if not target_dirs:
        print("[INFO] No subdirectory containing images was found; exiting.")
        return

    sep = "#" * 60
    print("\n" + sep)
    print("  Video frame thinning & invalid-image cleanup tool")
    print("  Root directory: %s" % root)
    print("  Subdirectories: %d" % len(target_dirs))
    print("  Source frame rate: %d fps  ->  target frame rate: %d fps" % (args.fps_orig, args.fps_keep))
    print("  Invalid if: file < %d B or grayscale variance < %.1f" % (args.min_size, args.var_threshold))
    if args.dry_run:
        print("  [!] DRY-RUN mode: count only, nothing is deleted")
    print(sep)

    t0 = time.time()
    all_stats = []
    for d in sorted(target_dirs):
        stats = process_directory(
            dir_path=d,
            fps_orig=args.fps_orig,
            fps_keep=args.fps_keep,
            min_size=args.min_size,
            var_threshold=args.var_threshold,
            dry_run=args.dry_run,
        )
        all_stats.append(stats)

    elapsed = time.time() - t0

    print("\n" + sep)
    print("  Done - summary report")
    print(sep)
    total_orig   = sum(s["total"]           for s in all_stats)
    total_kept   = sum(s["kept"]            for s in all_stats)
    total_blank  = sum(s["deleted_blank"]   for s in all_stats)
    total_sparse = sum(s["deleted_sparse"]  for s in all_stats)
    total_del    = total_blank + total_sparse

    print("  %-35s %6s %6s %7s %7s" % ("Directory", "Total", "Kept", "Invalid", "Excess"))
    print("  " + "-" * 65)
    for s in all_stats:
        print("  %-35s %6d %6d %7d %7d" % (
            s["dir"], s["total"], s["kept"], s["deleted_blank"], s["deleted_sparse"]))
    print("  " + "-" * 65)
    print("  %-35s %6d %6d %7d %7d" % (
        "Total", total_orig, total_kept, total_blank, total_sparse))
    print("")
    print("  Original total: %d  ->  after thinning: %d" % (total_orig, total_kept))
    print("  Deleted: %d (invalid %d + excess %d)" % (total_del, total_blank, total_sparse))
    if total_orig > 0:
        # Reduction is measured in image count, not bytes
        print("  Reduction: %.1f%%  Elapsed: %.1fs" % ((1 - total_kept/total_orig)*100, elapsed))
    print(sep + "\n")


if __name__ == "__main__":
    main()

Posted 2026-04-10 21:30 by Dapenson
