OJ Platform Remote Judging Subsystem Development (6): Judging Result Standardization and Security Hardening

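The previous weeks covered Worker concurrency control and system observability. This week focuses on standardizing judging results (completing the full set of judging statuses and refining the MLE and OLE detection logic) and, on the security side, formally integrating seccomp.

From an ACM contestant's point of view, accurate verdicts are the most basic and most important requirement of an OJ system. If a Runtime Error is misreported as Wrong Answer, the contestant will spend a lot of time debugging in the wrong direction. Likewise, if Memory Limit Exceeded is not identified correctly, the contestant cannot tell whether the problem lies in the algorithm's logic or in its space complexity.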

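1. Judging Result Standardization

1.1 Complete Status Definitions

The project implements 12 judging statuses: 4 intermediate statuses and 8 terminal statuses (System Error among them, as the catch-all):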

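Status	Type	Trigger
Pending	Intermediate	Submission created, not yet enqueued
Queueing	Intermediate	Message enqueued, waiting for a Worker to consume it
Compiling	Intermediate	Worker is compiling the code
Running	Intermediate	Test cases are being executed
Accepted	Terminal	Output of every test case matches the expected output
Wrong Answer	Terminal	Output does not match the expected output
Compile Error	Terminal	Compile command exit code ≠ 0
Runtime Error	Terminal	Run command exit code ≠ 0
Time Limit Exceeded	Terminal	Running time over the limit
Memory Limit Exceeded	Terminal	Memory usage over the limit (OOM / exit code 137 / MemoryKB over the limit)
Output Limit Exceeded	Terminal	stdout size over the limit
System Error	Terminal	Sandbox failure, unsupported language, or other system-level error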
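The table above can be mirrored in code as a small status type. The following is a minimal sketch, assuming a string-backed type: the constant names follow the `domain.Status...` identifiers used later in this post where they appear, while the string values and the `IsTerminal` helper are hypothetical and not taken from the project.

```go
package main

// SubmissionStatus is a hypothetical string-backed status type; the real
// domain package may use a different underlying representation.
type SubmissionStatus string

const (
	// Intermediate statuses.
	StatusPending   SubmissionStatus = "Pending"
	StatusQueueing  SubmissionStatus = "Queueing"
	StatusCompiling SubmissionStatus = "Compiling"
	StatusRunning   SubmissionStatus = "Running"

	// Terminal statuses.
	StatusAccepted            SubmissionStatus = "Accepted"
	StatusWrongAnswer         SubmissionStatus = "Wrong Answer"
	StatusCompileError        SubmissionStatus = "Compile Error"
	StatusRuntimeError        SubmissionStatus = "Runtime Error"
	StatusTimeLimitExceeded   SubmissionStatus = "Time Limit Exceeded"
	StatusMemoryLimitExceeded SubmissionStatus = "Memory Limit Exceeded"
	StatusOutputLimitExceeded SubmissionStatus = "Output Limit Exceeded"
	StatusSystemError         SubmissionStatus = "System Error"
)

// IsTerminal reports whether s is one of the 8 terminal statuses.
// Unknown values are treated as non-terminal.
func (s SubmissionStatus) IsTerminal() bool {
	switch s {
	case StatusAccepted, StatusWrongAnswer, StatusCompileError,
		StatusRuntimeError, StatusTimeLimitExceeded,
		StatusMemoryLimitExceeded, StatusOutputLimitExceeded,
		StatusSystemError:
		return true
	}
	return false
}
```

A helper like this could, for example, let a polling client stop once `IsTerminal()` returns true.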

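1.2 Evaluation Order and Priority

The order of the checks directly affects the accuracy of the verdict. After analysis, the following priority chain was adopted: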

// compareCase maps one test case's execution result to a verdict.
// The checks run in priority order; the first one that matches wins.
func (s *Service) compareCase(res sandbox.ExecResult, expected string,
    req domain.JudgeRequest) domain.SubmissionStatus {
    // 1. Time limit (highest priority: a timed-out program may not have produced its full output)
    if res.TimedOut || int(res.Runtime.Milliseconds()) > req.TimeLimitMs {
        return domain.StatusTimeLimitExceeded
    }
    // 2. Memory limit (OOM kill / exit code 137, i.e. 128 + SIGKILL)
    if res.OOMKilled || res.ExitCode == 137 {
        return domain.StatusMemoryLimitExceeded
    }
    // 3. Memory limit (measured MemoryKB over the limit)
    if req.MemoryLimitMB > 0 && res.MemoryKB > req.MemoryLimitMB*1024 {
        return domain.StatusMemoryLimitExceeded
    }
    // 4. Output limit
    if req.OutputLimitKB > 0 && len(res.Stdout) > req.OutputLimitKB*1024 {
        return domain.StatusOutputLimitExceeded
    }
    // 5. Runtime error
    if res.ExitCode != 0 {
        return domain.StatusRuntimeError
    }
    // 6. Output comparison
    if normalizeOutput(res.Stdout) != normalizeOutput(expected) {
        return domain.StatusWrongAnswer
    }
    // 7. Everything passed
    return domain.StatusAccepted
}

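The reasoning behind this order:

TLE has the highest priority: if the program was killed for timing out, its output may be incomplete or missing, so the output should not be compared.
MLE comes next: a program that was OOM-killed cannot produce its full output either.
OLE follows MLE: exceeding the output limit also means part of the output may have been truncated.
RE comes after the resource checks: an abnormal exit (such as a segmentation fault) and a resource-limit violation are treated as mutually exclusive, and since a resource kill also leaves a non-zero exit code, resources are checked first and errors second.
WA comes last: only after every system-level problem has been ruled out can mismatched output be judged a wrong answer.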
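To make the effect of the ordering concrete, here is a small self-contained sketch of the same chain. It is a simplification, not the project's code: the `execResult` and `limits` types and the short verdict strings are hypothetical stand-ins for `sandbox.ExecResult`, `domain.JudgeRequest`, and the `domain.Status...` constants.

```go
package main

import "strings"

// execResult and limits are hypothetical, trimmed-down stand-ins for the
// sandbox result and the judge request used in this post.
type execResult struct {
	TimedOut  bool
	OOMKilled bool
	ExitCode  int
	RuntimeMs int
	MemoryKB  int
	Stdout    string
}

type limits struct {
	TimeLimitMs   int
	MemoryLimitMB int
	OutputLimitKB int
}

// normalize applies the same two-step rule as normalizeOutput in section 1.3.
func normalize(s string) string {
	return strings.TrimSpace(strings.ReplaceAll(s, "\r\n", "\n"))
}

// classify evaluates the checks in priority order; the first match wins.
func classify(res execResult, expected string, lim limits) string {
	switch {
	case res.TimedOut || res.RuntimeMs > lim.TimeLimitMs:
		return "TLE"
	case res.OOMKilled || res.ExitCode == 137:
		return "MLE"
	case lim.MemoryLimitMB > 0 && res.MemoryKB > lim.MemoryLimitMB*1024:
		return "MLE"
	case lim.OutputLimitKB > 0 && len(res.Stdout) > lim.OutputLimitKB*1024:
		return "OLE"
	case res.ExitCode != 0:
		return "RE"
	case normalize(res.Stdout) != normalize(expected):
		return "WA"
	}
	return "AC"
}
```

With this sketch, a process that timed out and was then killed (`TimedOut: true, ExitCode: 137`) is classified as TLE rather than MLE, and a segfault exit code such as 139 with wrong output is classified as RE rather than WA, which is the behavior the ordering is meant to guarantee.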

1.3 Output Comparison Rules

func normalizeOutput(s string) string {
    return strings.TrimSpace(strings.ReplaceAll(s, "\r\n", "\n"))
}

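Only two transformations are applied:

\r\n → \n: unify the line-ending difference between Windows and Linux
TrimSpace: strip leading and trailing whitespace

Whitespace is not collapsed. OJ judging requires an exact match, so a difference in the number of consecutive spaces means the output format differs. This normalization is consistent with the judging behavior of mainstream OJ platforms.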
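A few concrete comparisons show where the rule is lenient and where it is strict. The sketch below repeats `normalizeOutput` as given above; `sameOutput` is a hypothetical helper added only for illustration.

```go
package main

import "strings"

// normalizeOutput is the function from this section, repeated so the
// example is self-contained.
func normalizeOutput(s string) string {
	return strings.TrimSpace(strings.ReplaceAll(s, "\r\n", "\n"))
}

// sameOutput is a hypothetical helper: it reports whether two outputs are
// equal after normalization.
func sameOutput(actual, expected string) bool {
	return normalizeOutput(actual) == normalizeOutput(expected)
}
```

Under this rule `sameOutput("3\r\n", "3")` and `sameOutput("  3\n\n", "3")` hold, because only line endings and the outer whitespace differ. `sameOutput("1 2\n", "1  2\n")` does not hold, since interior spaces are never collapsed. Note that `sameOutput("1 \n2\n", "1\n2\n")` does not hold either: TrimSpace only touches the two ends of the whole output, so a trailing space on an interior line still counts as a difference.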

1.4 Test Case Detail Structure

type SubmissionCaseResult struct {
    SubmissionID  uint64
    CaseNo        int
    Status        SubmissionStatus
    RuntimeMs     int
    MemoryKB      int
    StdoutBytes   int
    StderrBytes   int
    Signal        int
    StdoutPreview string    // truncated for display
    StderrPreview string
}

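Each test case carries its full runtime information and is returned through the GET /api/submissions/:id/cases endpoint. For contestants debugging their code, the per-case time, memory usage, and error output are very valuable feedback.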
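The post only says that the preview fields are truncated for display; how the truncation is done is not shown. One possible approach, sketched below under that caveat, is to cap the preview at a byte budget while backing up to a UTF-8 rune boundary so that multi-byte characters are never split. `truncatePreview` and its `maxBytes` parameter are hypothetical names.

```go
package main

import "unicode/utf8"

// truncatePreview returns at most maxBytes bytes of s. If the cut would
// fall inside a multi-byte UTF-8 character, it backs up to the start of
// that character so the preview stays valid UTF-8 for valid input.
// This is an illustrative helper, not the project's implementation.
func truncatePreview(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// s[cut] is the first byte that would be dropped; move left until it
	// is the first byte of a rune, so everything before it is complete.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}
```

In a design like this, StdoutBytes and StderrBytes would still record the full length, so a reader can tell from the response that the preview was shortened.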

2. Docker Integration Tests

2.1 Test Scenario Design

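The following scenarios exercise the judging statuses one at a time in a real Docker environment, with each test run separately; MLE and System Error are not among the scenarios listed here.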

C++17 correct code → Accepted:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerAccepted

C++17 wrong answer → Wrong Answer (submits a-b code where the a+b answer is expected):

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerWrongAnswer

C++17 syntax error → Compile Error:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerCompileError

C++17 null pointer dereference → Runtime Error:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerRuntimeError

C++17 sleep(3s) with a 300ms limit → Time Limit Exceeded:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerTimeLimitExceeded

Python 3.11 correct code → Accepted:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerPythonAccepted

Go 1.22 correct code → Accepted:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerGoAccepted

Python 3.11 printing 4096B with a 1KB limit → Output Limit Exceeded:

go test -v -count=1 -timeout 30s ./internal/judger/ -run TestJudgeWithDockerOutputLimitExceeded

All 8 Docker integration tests pass.

2.2 Full Test Run Results

$ go test ./internal/... -count=1 -timeout 120s
ok  remote_judge/internal/api              0.698s
ok  remote_judge/internal/judger          16.898s
ok  remote_judge/internal/sandbox          0.486s
ok  remote_judge/internal/service          0.525s
ok  remote_judge/internal/worker           0.704s
ok  remote_judge/internal/transport/grpcclient  0.154s

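All 6 packages pass. The judger package takes noticeably longer (16s+) because it contains the Docker integration tests, which need real time to create and run containers.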

3. Security Hardening

3.1 Layered Security Model

The security measures adopted by the project cover the main layers of the Docker security model:

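Layer	Measure	Docker flag
Linux Capability	Drop all capabilities	`--cap-drop ALL`
Seccomp	Custom denylist profile	`--security-opt seccomp=profile.json`
Network isolation	Networking disabled	`--network none`
Filesystem	Working directory on tmpfs	`--tmpfs /workspace:...`
Process limit	At most 64 processes	`--pids-limit 64`
User privileges	Non-root user	`--user 1000:1000`
Privilege escalation protection	Block setuid-style privilege escalation	`--security-opt no-new-privileges`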
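As a rough illustration of how these flags might be assembled on the Go side, the sketch below builds the argument list for `docker run`. The function name `sandboxArgs` and its parameters are hypothetical. Because the tmpfs options are elided in the table, the tmpfs spec is passed in by the caller rather than guessed here.

```go
package main

// sandboxArgs returns the arguments for a hardened `docker run` call.
// image, seccompPath and tmpfsSpec are supplied by the caller; this helper
// is an illustrative sketch, not the project's actual sandbox code.
func sandboxArgs(image, seccompPath, tmpfsSpec string) []string {
	return []string{
		"run",
		"--cap-drop", "ALL", // drop every Linux capability
		"--security-opt", "seccomp=" + seccompPath, // custom denylist profile
		"--security-opt", "no-new-privileges", // block setuid-style escalation
		"--network", "none", // no network access
		"--tmpfs", tmpfsSpec, // working directory on tmpfs
		"--pids-limit", "64", // at most 64 processes
		"--user", "1000:1000", // run as a non-root user
		image,
	}
}
```

The resulting slice could then be handed to `exec.Command("docker", args...)`, with the command to run inside the container appended after the image name.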

3.2 Seccomp Denylist

{
    "defaultAction": "SCMP_ACT_ALLOW",
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {
            "names": ["mount", "umount2", "pivot_root",
                      "reboot", "swapon", "swapoff",
                      "kexec_load", "bpf", "perf_event_open"],
            "action": "SCMP_ACT_ERRNO"
        }
    ]
}

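The blocked system calls cover three classes of risk: filesystem tampering (mount/umount2/pivot_root), system control (reboot/swapon/swapoff/kexec_load), and abuse of kernel interfaces (bpf for loading eBPF programs into the kernel, perf_event_open for reading kernel data).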
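Since the profile is a plain JSON file, a small check can guard against a blocked syscall being dropped by an accidental edit. The sketch below parses a profile of the shape shown above and collects the names whose rule action is SCMP_ACT_ERRNO. The `seccompProfile` type and the `deniedSyscalls` function are hypothetical, and the struct models only the fields used in this post.

```go
package main

import "encoding/json"

// seccompProfile models only the fields that appear in the profile above.
type seccompProfile struct {
	DefaultAction string   `json:"defaultAction"`
	Architectures []string `json:"architectures"`
	Syscalls      []struct {
		Names  []string `json:"names"`
		Action string   `json:"action"`
	} `json:"syscalls"`
}

// deniedSyscalls parses a profile and returns the set of syscall names
// whose rule action is SCMP_ACT_ERRNO. It is an illustrative helper.
func deniedSyscalls(raw []byte) (map[string]bool, error) {
	var p seccompProfile
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	denied := make(map[string]bool)
	for _, rule := range p.Syscalls {
		if rule.Action != "SCMP_ACT_ERRNO" {
			continue
		}
		for _, name := range rule.Names {
			denied[name] = true
		}
	}
	return denied, nil
}
```

A unit test could read profile.json, call `deniedSyscalls`, and assert that each of the nine names listed above is present.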

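The reasons for choosing a denylist over an allowlist were discussed in detail in week03. In a judging scenario, the C++ compiler and the Go runtime rely on a large number of system calls, so an allowlist is too costly to maintain and any omission can crash otherwise correct programs.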

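4. Weekly Summary

Completed Work

12 judging statuses fully implemented (8 terminal: Accepted/WA/CE/RE/TLE/MLE/OLE/System Error, plus 4 intermediate)
Judging priority chain (TLE→MLE→OLE→RE→WA→AC) designed and verified
Output normalization rules (line-ending unification + TrimSpace)
All 8 integration tests pass in a real Docker environment
Seccomp denylist + layered security hardening
Benchmark performance verification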

Reference Materials Consulted

Docker Seccomp Security Profiles: https://docs.docker.com/engine/security/seccomp/
Docker Container Security (layered security model): https://docs.docker.com/engine/security/
OI Wiki classification of judging statuses: https://oi-wiki.org/intro/judge/
Docker Runtime Privilege and Capabilities: https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities

posted @ 2026-06-08 03:15 宋佳奇
