【Swallow】Detecting Vulnerabilities using Patch-Enhanced Vulnerability Signatures

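Glossary

Term: meaning
vulnerability: a security flaw in code
patched function: the function after the security patch has been applied
signature: the set of features used to represent a function or a vulnerability
sanitizing check: a validity check on a value before it is used
data dependency: one statement uses a value defined by another statement
control dependency: whether one statement executes is decided by another statement
program dependence graph: PDG
abstract syntax tree: AST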

Abstract

Recurring vulnerabilities widely exist and remain undetected in real-world systems, which are often resulted from reused code base or shared code logic. However, the potentially small differences between vulnerable functions and their patched functions as well as the possibly large differences between vulnerable functions and target functions to be detected bring challenges to clone-based and function matching-based approaches to identify these recurring vulnerabilities, i.e., causing high false positives and false negatives.

Simply put, the task is to detect the difference between a patch and a vulnerability.

In this paper, we propose a novel approach to detect recurring vulnerabilities with low false positives and low false negatives. We first use our novel program slicing to extract vulnerability and patch signatures from vulnerable function and its patched function at syntactic and semantic levels. Then a target function is identified as potentially vulnerable if it matches the vulnerability signature but does not match the patch signature.
We implement our approach in a tool named MVP. Our evaluation on ten open-source systems has shown that, i) MVP significantly outperformed state-of-the-art clone-based and function matching-based recurring vulnerability detection approaches; ii) MVP detected recurring vulnerabilities that cannot be detected by general-purpose vulnerability detection approaches, i.e., two learning-based approaches and two commercial tools; and iii) MVP has detected 97 new vulnerabilities with 23 CVE identifiers assigned.

Detection works by comparing signatures.

1 Introduction

Due to reusing code base or sharing code logic (e.g., similar processing logic for similar objects in their different usages) in software systems, recurring vulnerabilities which share the similar characteristics with each other widely exist but remain undetected in real-world programs [46, 50, 75]. Therefore, recurring vulnerability detection has gained wide popularity, especially with the increased availability of vulnerabilities. The scope of this paper is to detect recurring vulnerabilities; i.e., given a vulnerability that behaves in a very specific way in a program, we detect whether other programs may have this specific behavior. Differently, general-purpose vulnerability detection techniques (e.g., [1, 2, 41, 44, 77]) leverage the general behaviors of a large fraction of vulnerabilities to find specific instances of these general behaviors in programs.

This explains how shared code gives rise to recurring vulnerabilities.

Existing Approaches

A general idea to detect recurring vulnerabilities is to match the source code of a target system with known vulnerabilities. There are two families of approaches:

  • Clone-based approaches consider the recurring vulnerability detection problem as a code clone detection problem; i.e., they extract token- or syntax-level signature from a known vulnerability, and identify code clones to the signature as potentially vulnerable.
  • Function matching-based approaches directly use vulnerable functions in a known vulnerability as the signature and detect code clones to those vulnerable functions. They do not consider any vulnerability characteristics as they are not designed particularly for recurring vulnerability detection.

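These are the two known kinds of detection. The former cannot detect vulnerable code once it has been modified, which causes false negatives; the latter cannot tell a vulnerability from its patch, which causes false positives.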

Challenges

In summary, there are two main challenges in detecting recurring vulnerabilities with both low false positives and low false negatives. The first challenge is how to distinguish already patched vulnerabilities to reduce false positives. The second challenge is how to precisely generate the signature of a known vulnerability to reduce both false positives and false negatives.

Our Approaches

To address the two challenges, we propose a novel recurring vulnerability detection approach, named MVP (Matching Vulnerabilities with Patches). Specifically, to address the first challenge, we not only generate a vulnerability signature but also a patch signature to capture how a vulnerability is caused and fixed. We leverage the vulnerability signature to search for potentially vulnerable functions, and use the patch signature to distinguish whether they are already patched or not.
To address the second challenge, we propose a novel slicing method to extract only vulnerability-related and patch-related statements to generate vulnerability and patch signatures at both syntactic level and semantic level. Besides, we apply statement abstraction and entropy-based statement selection to further improve the accuracy of MVP.

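MVP assigns a signature to both the vulnerability and the patch. The signatures are defined through code slicing, which makes the two easier to tell apart.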

Contribution

The main contributions of our work are:

  • We proposed and implemented a novel recurring vulnerability detection approach by leveraging vulnerability and patch signatures through our novel slicing technique.
  • We conducted intensive evaluation to compare MVP with four categories of state-of-the-art approaches. MVP significantly outperformed them in accuracy.
  • We found 97 new vulnerabilities in ten open-source systems with 23 CVE identifiers assigned.

2 Motivation

Problems

We investigate the similarity among vulnerable function (V), patched function (P) and target function (T) to illustrate the problems of existing approaches. P is the result of applying a security patch on V; and T is a vulnerable function in a target system under detection. We use Sim(f1, f2) to denote the similarity score between function f1 and f2.

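Computing the similarity between vulnerable and patched functions shows that, for more than 90% of the vulnerabilities, Sim(V, P) is at least 70%: the code is almost identical, which leads to many false positives. On the other hand, Sim(V, T) can be low, so existing techniques fail to detect the target, which leads to false negatives.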
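To make Sim(f1, f2) concrete, here is a small sketch. These notes do not record which similarity measure the paper uses, so the code assumes a Jaccard index over token sets; the `tokens` and `sim` helpers and the three snippets V, P and T are invented for illustration.

```python
import re

def tokens(code):
    """Split source text into a set of identifier, number and punctuation tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

def sim(f1, f2):
    """Jaccard similarity of the two token sets (an assumed stand-in for Sim)."""
    a, b = tokens(f1), tokens(f2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# V: vulnerable function, P: the same function with a bounds check added,
# T: a target with the same flaw but different names and extra statements.
V = "int get(int *buf, int i) { return buf[i]; }"
P = "int get(int *buf, int i) { if (i >= LEN) return -1; return buf[i]; }"
T = "long fetch(long *arr, long k) { long v = arr[k]; log(v); return v; }"

print("Sim(V, P) =", round(sim(V, P), 2))
print("Sim(V, T) =", round(sim(V, T), 2))
```

With these toy inputs the patched function scores closer to V than the vulnerable target does, which is exactly the situation that hurts pure similarity matching.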

A Motivating Example

MVP builds data flow of the local variable vdev_id as the semantic signature. With the help of it, it can detect semantic-equivalent vulnerabilities whose syntax is slightly changed. The detailed signature extraction process will be discussed in § 3.2.


3 Methodology

3.1 Definition

  • The Extracting Function Signature step (§ 3.2) takes a target system as an input, and generates a signature for each function in the target system.
  • The Extracting Vulnerability and Patch Signatures step (§ 3.3) takes a security patch as an input, and generates a vulnerability signature and a patch signature to reflect a vulnerability from the perspective of how it is caused and how it is fixed.
  • The final Detecting Vulnerability step (§ 3.4) determines whether each function in the target system is potentially vulnerable by matching its signature with the vulnerability and patch signatures.

Function signature
Given a C/C++ function f, we define its signature as a tuple (fsyn, fsem), where fsyn is the set of hash values of all statements in the function, and fsem is a set of 3-tuples (h1, h2, type) such that h1 and h2 denote the hash values of two statements (i.e., h1, h2 ∈ fsyn), and type ∈ {data, control} denotes that the statement whose hash value is h1 has a data or control dependency on the statement whose hash value is h2.
fsyn captures the statements of a target function as the syntactic signature. fsem captures data and control dependencies among statements in the function as the semantic signature. They provide complementary information about a function, which helps to improve the matching accuracy.
In the remainder of this paper, we assume that each vulnerability is within one function. We use (fv, pv) to denote the pair of a vulnerable function fv and the patched function pv after fixing the vulnerability in fv.

The function signature is defined as a pair: the statements themselves, and the dependencies between those statements.
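The definition can be sketched in a few lines. This is only an illustration under assumptions: the hash is taken over the statement text with MD5 (the notes do not record the hash function), and the dependencies are passed in by hand instead of being read from a program dependence graph. `stmt_hash` and `function_signature` are hypothetical names.

```python
import hashlib

def stmt_hash(stmt):
    """Hash the text of one statement; MD5 is an assumption of this sketch."""
    return hashlib.md5(stmt.encode("utf-8")).hexdigest()

def function_signature(stmts, deps):
    """Build (fsyn, fsem) for one function.

    stmts: list of statement strings.
    deps:  list of (i, j, kind), meaning statement i depends on statement j,
           with kind being "data" or "control".
    """
    hashes = [stmt_hash(s) for s in stmts]
    fsyn = set(hashes)
    fsem = {(hashes[i], hashes[j], kind) for i, j, kind in deps}
    return fsyn, fsem

# Toy function body; the dependencies are written by hand for this example.
stmts = ["x = read()", "if (x > 0)", "use(x)"]
deps = [(1, 0, "data"), (2, 0, "data"), (2, 1, "control")]
fsyn, fsem = function_signature(stmts, deps)
print(len(fsyn), "statement hashes,", len(fsem), "dependency tuples")
```

Because the hash depends only on the statement text, two functions that contain the same (abstracted) statement share that element of fsyn wherever the statement appears.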

Function Patch
Given a pair of functions (fv, pv), the function patch Pv consists of one or more hunks. A hunk is the basic unit of a patch and consists of context lines, deleted lines and/or added lines. Deleted lines are lines present in fv but missing from pv, while added lines are lines missing from fv but present in pv. The first and last 3 lines in a hunk and the lines between deleted and/or added lines are context lines. Given a function pair (fv, pv) and the patch Pv, we further define Sdel as the statements in fv but missing from pv, Sadd as the statements missing from fv but present in pv, Svul as all statements in fv, and Spat as all statements in pv.

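A hunk is thus the set of added lines, deleted lines, and shared context lines between the vulnerable code and the patched code.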
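The four statement sets can be illustrated with a rough sketch. It assumes every non-blank line is one statement and compares lines as plain strings, ignoring position and duplicates, which a real diff would not do. The helper `patch_sets` and the two code snippets are made up.

```python
def patch_sets(fv_src, pv_src):
    """Return (Sdel, Sadd, Svul, Spat) as lists of stripped, non-blank lines."""
    svul = [line.strip() for line in fv_src.splitlines() if line.strip()]
    spat = [line.strip() for line in pv_src.splitlines() if line.strip()]
    vul_set, pat_set = set(svul), set(spat)
    sdel = [s for s in svul if s not in pat_set]  # in fv, missing from pv
    sadd = [s for s in spat if s not in vul_set]  # in pv, missing from fv
    return sdel, sadd, svul, spat

fv = """
len = get_len(pkt);
memcpy(dst, pkt, len);
return 0;
"""
pv = """
len = get_len(pkt);
if (len > MAX_LEN)
return -1;
memcpy(dst, pkt, len);
return 0;
"""
sdel, sadd, svul, spat = patch_sets(fv, pv)
print("Sdel:", sdel)
print("Sadd:", sadd)
```

This toy patch only adds lines, so Sdel is empty. That is the case in which the vulnerability cannot be described by deleted statements alone.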

3.2 Extracting Function Signature

  1. Parsing and Analyzing Function
    Use a parser to generate a code property graph, which merges the abstract syntax tree (AST), control flow graph, and program dependence graph (PDG) into a joint data structure.
  2. Abstracting and Normalizing Function
    Perform abstraction on each function before extracting its signature, replacing parameters and variables with normalized symbols, to avoid false negatives.
  3. Generating Function Signature

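Analyzing the source code yields a code property graph for every function. Variables, constants, strings and the like are then replaced with normalized symbols, and the resulting statements, together with the data and control dependencies, form the function signature.
Put simply, the source code is abstracted and pruned, and the most compact information that remains serves as the function signature.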
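A minimal sketch of the abstraction and normalization step follows. It works on plain text with regular expressions, whereas the paper works on a code property graph. The placeholder symbols PARAM, VARIABLE and STRING, the MD5 hash, and the `abstract_statement` helper are all assumptions of this example.

```python
import hashlib
import re

def abstract_statement(stmt, params, local_vars):
    """Replace string literals, parameters and local variables with fixed symbols."""
    stmt = re.sub(r'"(?:\\.|[^"\\])*"', "STRING", stmt)

    def rename(match):
        name = match.group(0)
        if name in params:
            return "PARAM"
        if name in local_vars:
            return "VARIABLE"
        return name  # function names, keywords and types are left alone

    stmt = re.sub(r"[A-Za-z_]\w*", rename, stmt)
    return re.sub(r"\s+", "", stmt)  # normalize by dropping all whitespace

def stmt_hash(stmt):
    """Hash the abstracted statement text (MD5 is an assumed choice)."""
    return hashlib.md5(stmt.encode("utf-8")).hexdigest()

# Two statements that differ only in naming and spacing.
a = abstract_statement("n = strlen(src) + 1;", params={"src"}, local_vars={"n"})
b = abstract_statement("len=strlen( input )+1;", params={"input"}, local_vars={"len"})
print(a)
print(stmt_hash(a) == stmt_hash(b))
```

After abstraction both statements collapse to the same text and therefore the same hash, which is how renamed variables stop causing missed matches.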

3.3 Extracting Vulnerability and Patch Signatures

  1. Identifying Code Changes
    Locate the changed statements to find the changed functions.
  2. Computing Slices to Changed Code
    Slice along data and control dependencies, starting from the changed statements.
    A novel slicing method is introduced to capture a vulnerability: normal backward slicing, but forward slicing customized by statement type (assignment, conditional, return, or other statement).
  3. Generating Vulnerability and Patch Signatures

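For the vulnerability, the deleted/added statements serve as the statement-level (syntactic) signature, and the semantic signature is then computed from them. The patch signature follows the same steps but keeps only the statements that exist solely in the patch; in other words, it removes redundancy.
In addition, some statements could be part of a signature but are not necessary, so an entropy-based method is used to remove further redundancy.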
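Backward and forward slicing over a dependence graph can be sketched as plain reachability. This is a simplification: it does not reproduce the paper's forward slicing rules for each statement type, and the toy graph, statement numbers and `slice_pdg` helper are invented for the example.

```python
from collections import deque

def slice_pdg(edges, criterion, direction="backward", kinds=("data", "control")):
    """Collect the statements reachable from `criterion` along dependence edges.

    edges: iterable of (src, dst, kind), meaning dst depends on src.
    backward: walk from dst to src (what the criterion depends on).
    forward:  walk from src to dst (what depends on the criterion).
    """
    step = {}
    for src, dst, kind in edges:
        if kind not in kinds:
            continue
        key, val = (dst, src) if direction == "backward" else (src, dst)
        step.setdefault(key, set()).add(val)
    seen, queue = set(criterion), deque(criterion)
    while queue:
        node = queue.popleft()
        for nxt in step.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Toy dependence graph for:
#   1: len = get_len(pkt)    2: if (len > MAX_LEN)
#   3: return -1             4: memcpy(dst, pkt, len)
edges = [(1, 2, "data"), (2, 3, "control"), (1, 4, "data")]

# Slice from the added check (statement 2) in both directions.
backward = slice_pdg(edges, {2}, "backward")
forward = slice_pdg(edges, {2}, "forward")
print(sorted(backward | forward))
```

Restricting `kinds` to `("data",)` gives a data-only slice, which is one simple way to keep a forward slice from pulling in every statement guarded by a condition.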

3.4 Detecting Vulnerability through Matching

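The decision criterion is how well the statements (syntax) and the dependencies (semantics) match, with thresholds deciding what counts as a match.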
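The matching rule, "matches the vulnerability signature but does not match the patch signature", can be sketched as follows. The threshold values, the way the four ratios are combined, and the names `ratio` and `is_potentially_vulnerable` are assumptions of this sketch and are not taken from the paper; single letters stand in for statement hashes.

```python
def ratio(part, whole, if_empty):
    """Fraction of `part` found in `whole`; `if_empty` is used when part is empty."""
    return len(part & whole) / len(part) if part else if_empty

def is_potentially_vulnerable(target, vuln, patch, t_vuln=0.8, t_patch=0.2):
    """Each argument is a (syn, sem) pair of sets. Thresholds are illustrative."""
    t_syn, t_sem = target
    v_syn, v_sem = vuln
    p_syn, p_sem = patch
    # The target must cover most of the vulnerability signature...
    matches_vuln = (ratio(v_syn, t_syn, 1.0) >= t_vuln
                    and ratio(v_sem, t_sem, 1.0) >= t_vuln)
    # ...and contain little of the patch signature.
    matches_patch = (ratio(p_syn, t_syn, 0.0) > t_patch
                     or ratio(p_sem, t_sem, 0.0) > t_patch)
    return matches_vuln and not matches_patch

vuln = ({"a", "b", "c"}, {("b", "a", "data")})
patch = ({"g"}, {("g", "a", "data")})  # "g" stands for the added guard

unpatched = ({"a", "b", "c", "x"}, {("b", "a", "data")})
patched = ({"a", "b", "c", "g"}, {("b", "a", "data"), ("g", "a", "data")})
unrelated = ({"x", "y"}, set())

for name, target in [("unpatched", unpatched), ("patched", patched), ("unrelated", unrelated)]:
    print(name, is_potentially_vulnerable(target, vuln, patch))
```

Only the unpatched target is reported: the patched one is filtered out by the patch signature, and the unrelated one never matches the vulnerability signature.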


4 Evaluation

4.1 Evaluation Setup

Research Questions
  • Accuracy, scalability, and threshold sensitivity of MVP on recurring vulnerabilities.
  • How useful is the adoption of statement abstraction and statement information?
  • Performance on general-purpose vulnerabilities.
Dataset

4.2 Accuracy Evaluation

Ground Truth
We manually analyzed the potentially vulnerable functions detected by each approach and confirmed whether they were true positives.

Comparison with ReDeBug and VUDDY

  • False Positive Analysis for MVP
    • First, calling context is missing, as we do not use inter-procedural analysis when we extract signatures at the semantic level.

    • Second, semantic equivalence is not modeled, so a vulnerability that was fixed in a different but equivalent way is still reported.

    • Third, the extracted vulnerability or patch signature is sometimes unable to capture the characteristics of a vulnerability, because vulnerabilities have various root causes.

  • Comparison with ReDeBug and VUDDY
    • First, ReDeBug leverages each of the hunks in a changed function separately to match potentially vulnerable functions, and thus it suffers from local matching, especially when a hunk only changes blank lines, comments, header information, macros, or structs.

    • Second, ReDeBug uses a sliding window (of 4 lines of code by default) to match potentially vulnerable functions, which may cause false positives.

    • Third, ReDeBug does not use semantic information, which may cause false negatives.

    • For VUDDY, apart from missing calling context (causing 8 false positives), another major reason is abstraction.

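ReDeBug's three weaknesses all seem to come down to the same thing: it does not use semantic relations, so it ends up matching locally.
As for VUDDY, it looks as though overly aggressive redundancy removal loses information.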

  • False Negative Analysis
    • For MVP, the reason is that it works at the function level rather than the hunk level, which can bring noise into the extracted signatures.

    • Moreover, MVP does not apply abstraction to data types and function calls, so the signature is not general enough to capture some vulnerable cases.

    • For ReDeBug, it applies exact matching and does not apply abstraction.

    • For VUDDY, it also uses exact matching.

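For MVP, false negatives arise because the signature is not abstract enough. The other two care too much about exactness, so that code blocks that match the signature but do not produce a vulnerability are also mistaken for vulnerabilities.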

Comparison with SourcererCC and CCAligner

This comparison indicates that code clone detection alone, without considering any vulnerability characteristics, is not suitable for recurring vulnerability detection.

Similarity of Vulnerable and Target Function

MVP can detect a target function whether or not it is similar to the vulnerable function.

4.3 Scalability Evaluation

In summary, MVP is slower than ReDeBug and VUDDY, but it still scales to large systems.

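The authors argue that MVP's higher precision saves manual review time, but shouldn't review time depend on the number of functions reported? Higher precision does not mean that no manual checking is needed after detection. I find this puzzling.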

4.4 Threshold Sensitivity Analysis

4.5 Contribution of Statement Abstraction and Statement Information

4.7 Limitations

  • First, we focus on detecting recurring vulnerabilities.

  • Second, MVP uses Joern to generate code property graph.

  • Third, we cannot detect vulnerabilities whose patches are out of functions.

  • Besides, our accuracy evaluation has revealed some root causes that are not well handled.


5 Related Work

6 Conclusions

Notes

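  • Why not record the cause of each vulnerability and then test for it directly? (re: signature)
  • How is the hash value of a statement generated? From its memory address? And how is a dependency measured? (re: hash & dependency)
  • What if code fragments were given signatures at the time they are written?
  • Why is Vsyn computed this way? Aren't del and add symmetric? It seems that the deleted statements are the cause (though not the whole cause, since some patches only add statements), while the added statements are the fix. (re: syn)
  • Removing redundancy does not necessarily improve precision: the feature that gets removed may be the most discriminative one. Is the statement farthest away necessarily the least relevant? Even if it is, discriminative power need not be proportional to relevance. (re: information)
  • The same question again: does relevance equal discriminative power? The selected statements should be the most representative ones. (re: match)
  • Is it possible that, even when the template matches, the actual cause of the vulnerability is different?
  • What is this inter-procedural analysis?
  • I did not understand "root cause".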
posted @ 2022-08-25 15:41  Tabshh