# Nanjing University "Software Analysis": Data Flow Analysis

@(Static Analysis)

## Contents

1. Overview of data flow analysis

2. Applications of data flow analysis

• Reaching Definitions Analysis(may analysis)
• Live Variables Analysis(may analysis)
• Available Expressions Analysis(must analysis)

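## Data Flow Analysis

### Key Concepts

may analysis: the output may be true. It over-approximates to stay sound, so false positives are acceptable.
must analysis: the output must be true. It under-approximates, aiming for completeness, so false negatives are acceptable.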

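### Prerequisites

#### Input and Output States

Input and output states: the program states before and after a statement executes.

• Each execution of an IR statement transforms an input state into a new output state.
• The input (output) state is associated with the program point before (after) the statement.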

In each data-flow analysis application, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed for that point.

Data-flow analysis is to find a solution to a set of safe-approximation-directed constraints on the IN[s]’s and OUT[s]’s, for all statements s.

• constraints based on semantics of statements (transfer functions)
• constraints based on the flows of control

#### Notations for Transfer Function’s Constraints

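Transfer function: maps an input state to an output state according to a predefined rule.

• Forward analysis: $OUT[s] = f_s(IN[s])$

• Backward analysis: $IN[s] = f_s(OUT[s])$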

#### Notations for Control Flow’s Constraints

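• Within a basic block: $IN[s_{i+1}] = OUT[s_i]$ for $i = 1, \dots, n-1$
• Between basic blocks (forward and backward cases): forward, $IN[B] = \bigwedge_{P \in pred(B)} OUT[P]$; backward, $OUT[B] = \bigwedge_{S \in succ(B)} IN[S]$, where $\bigwedge$ is the meet operator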
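Inside a basic block, the statement-level transfer functions chain together into one block-level function. A minimal sketch for the forward case, assuming states are sets of facts; the names `compose_forward`, `f_s1`, `f_s2` and the facts `d0`, `d1`, `d2`, `d9` are hypothetical and only serve the illustration:

```python
from functools import reduce


def compose_forward(stmt_fns):
    """Build the block transfer function by applying f_s1 first, f_sn last."""
    def block_fn(in_state):
        return reduce(lambda state, f: f(state), stmt_fns, in_state)
    return block_fn


# Hypothetical statement-level transfer functions over sets of facts.
def f_s1(state):
    # s1 kills fact d0 and generates fact d1
    return (state - {"d0"}) | {"d1"}


def f_s2(state):
    # s2 only generates fact d2
    return state | {"d2"}


f_B = compose_forward([f_s1, f_s2])
out_B = f_B({"d0", "d9"})
print(sorted(out_B))
```

For a backward analysis the same idea applies with the statement list reversed, since information flows from the last statement to the first.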

## Data Flow Analysis Methods

### Reaching Definitions Analysis

A definition d at program point p reaches a point q if there is a path
from p to q such that d is not “killed” along that path.

Reaching definitions can be used to detect possibly undefined
variables. For example, introduce a dummy definition for each variable v at
the entry of the CFG; if the dummy definition of v reaches a point
p where v is used, then v may be used before definition (as
undefined reaches v).
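The dummy-definition trick can be sketched as follows. This is an illustration under assumptions, not the course's implementation: definitions are modeled as `(variable, label)` pairs, the dummy definitions are generated by a hypothetical block `B0` placed right after the entry, and the small CFG (`B1: x = 1`, `B2: y = 2` on one branch only, `B3` uses `x` and `y`) is made up.

```python
def reaching_in(blocks, preds, gen, kill):
    """Iterate OUT[B] = gen_B | (IN[B] - kill_B) until nothing changes."""
    out = {b: set() for b in blocks}
    out["entry"] = set()  # boundary condition
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_


UNDEF = "undef"  # label of the dummy definitions
all_defs = {("x", UNDEF), ("y", UNDEF), ("x", "D1"), ("y", "D2")}


def kill_for(var, own):
    """All other definitions of the same variable."""
    return {d for d in all_defs if d[0] == var and d != own}


blocks = ["B0", "B1", "B2", "B3"]
preds = {"B0": ["entry"], "B1": ["B0"], "B2": ["B1"], "B3": ["B1", "B2"]}
gen = {"B0": {("x", UNDEF), ("y", UNDEF)}, "B1": {("x", "D1")},
       "B2": {("y", "D2")}, "B3": set()}
kill = {"B0": set(), "B1": kill_for("x", ("x", "D1")),
        "B2": kill_for("y", ("y", "D2")), "B3": set()}
uses = {"B3": {"x", "y"}}

in_ = reaching_in(blocks, preds, gen, kill)
maybe_undefined = {v for v in uses["B3"] if (v, UNDEF) in in_["B3"]}
print(maybe_undefined)
```

Here `y` is flagged because the path `B1 -> B3` skips the only real definition of `y`, while `x` is defined on every path.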

#### Formula Analysis

$$D: v = x\ op\ y$$

This statement “generates” a definition D of variable v and “kills” all the other definitions in the program that define variable v, while leaving the remaining incoming definitions unaffected.
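The gen/kill behavior described above corresponds to the transfer function OUT = gen ∪ (IN - kill). A small sketch, assuming a made-up three-definition program (`D1: x = 1`, `D2: y = 2`, `D3: x = y + 1`); the helper name `transfer` is hypothetical:

```python
def transfer(in_defs, gen, kill):
    """OUT = gen | (IN - kill) for a single statement or block."""
    return gen | (in_defs - kill)


# Hypothetical program: D1: x = 1, D2: y = 2, D3: x = y + 1
defs_of = {"x": {"D1", "D3"}, "y": {"D2"}}

# Statement D3 defines x: it generates D3 and kills every other definition of x.
gen_d3 = {"D3"}
kill_d3 = defs_of["x"] - gen_d3

out_d3 = transfer({"D1", "D2"}, gen_d3, kill_d3)
print(sorted(out_d3))
```

D1 is removed because D3 overwrites `x`, while D2 passes through untouched since D3 does not define `y`.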

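#### Algorithm

Iterative algorithm: initialize OUT[B] to the empty set for every basic block, then repeatedly recompute IN[B] as the union of OUT[P] over all predecessors P and OUT[B] = $gen_B \cup (IN[B] - kill_B)$, until no OUT changes.

Boundary condition: initialize OUT[entry] to the empty set.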
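A minimal sketch of the iterative algorithm for reaching definitions, assuming sets stand in for bit vectors. The CFG below (`B1: D1, D2`, a loop between `B2: D3` and `B3: D4`) and the function name `reaching_definitions` are hypothetical and chosen only to show a back edge:

```python
def reaching_definitions(blocks, preds, gen, kill):
    out = {b: set() for b in blocks}  # OUT[B] starts empty
    out["entry"] = set()              # boundary condition
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:                    # repeat until no OUT changes
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT[P] over all predecessors P
            in_[b] = set().union(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out


# Hypothetical CFG: entry -> B1 -> B2 -> B3 -> B2 (back edge)
# B1: D1: x = 1; D2: y = 2    B2: D3: x = x + y    B3: D4: y = 0
blocks = ["B1", "B2", "B3"]
preds = {"B1": ["entry"], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"D1", "D2"}, "B2": {"D3"}, "B3": {"D4"}}
kill = {"B1": {"D3", "D4"}, "B2": {"D1"}, "B3": {"D2"}}

in_, out = reaching_definitions(blocks, preds, gen, kill)
for b in blocks:
    print(b, sorted(in_[b]), sorted(out[b]))
```

On this example the loop needs three passes: the second pass picks up D3 and D4 flowing back into B2 along the back edge, and the third pass confirms that nothing changes.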

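##### Why does the iteration terminate?

$gen_B$ and $kill_B$ are fixed; only IN[B] changes, and it only ever grows, so OUT[B] can only gain definitions and never lose them. The bit vector of definitions has finite length, so the iteration must eventually stop.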
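One way to spell out the monotonicity step, writing $IN^{k}[B]$ and $OUT^{k}[B]$ for the values after the $k$-th round (this superscript notation is introduced here for illustration and is not part of the original notes). All OUT sets start empty, and IN[B] is a union of OUT sets, so by induction on $k$:

$$IN^{k}[B] \subseteq IN^{k+1}[B] \;\Rightarrow\; gen_B \cup (IN^{k}[B] - kill_B) \subseteq gen_B \cup (IN^{k+1}[B] - kill_B) \;\Rightarrow\; OUT^{k}[B] \subseteq OUT^{k+1}[B]$$

Every OUT[B] is a subset of the finite set of all definitions, and each round that does not stop must add at least one definition to some OUT[B], so the number of rounds is bounded by roughly (number of definitions) × (number of blocks).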

### Live Variables Analysis

Live variables analysis tells whether the value of variable v at
program point p could be used along some path in CFG starting at p.
If so, v is live at p; otherwise, v is dead at p.

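(This feels somewhat similar to taint analysis: mark a taint, propagate it if it is used along the path, and clear it on redefinition.)

Keep iterating as long as any IN changes.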
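A minimal sketch of the backward iteration for live variables, using IN[B] = use_B ∪ (OUT[B] - def_B) and OUT[B] as the union of IN[S] over successors. The CFG and the names `live_variables`, `use`, `defs` are hypothetical; `use[B]` is assumed to hold the variables read in B before any redefinition in B:

```python
def live_variables(blocks, succs, use, defs):
    in_ = {b: set() for b in blocks}   # IN[B] starts empty
    out = {b: set() for b in blocks}
    changed = True
    while changed:                     # keep going while any IN changes
        changed = False
        for b in reversed(blocks):     # backward analysis: visit in reverse
            # A block without successors gets OUT = {}, matching IN[exit] = {}
            out[b] = set().union(*(in_[s] for s in succs[b]))
            new_in = use[b] | (out[b] - defs[b])
            if new_in != in_[b]:
                in_[b] = new_in
                changed = True
    return in_, out


# Hypothetical CFG: B1 -> B2, B2 -> B3, B3 -> B2, B2 -> B4
# B1: a = 1; b = 2   B2: c = a + b   B3: a = c - 1   B4: return b
blocks = ["B1", "B2", "B3", "B4"]
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
use = {"B1": set(), "B2": {"a", "b"}, "B3": {"c"}, "B4": {"b"}}
defs = {"B1": {"a", "b"}, "B2": {"c"}, "B3": {"a"}, "B4": set()}

in_, out = live_variables(blocks, succs, use, defs)
for b in blocks:
    print(b, sorted(in_[b]), sorted(out[b]))
```

Visiting blocks in reverse order is only a convenience that tends to reduce the number of passes; any order reaches the same fixed point.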

### Available Expressions Analysis

An expression x op y is available at program point p if (1) all paths from the entry to p must pass through the evaluation of x op y, and (2) after the last evaluation of x op y, there is no redefinition of x or y.

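#### Transfer Functions

In the example a = x op y, a is redefined, so a + b is killed; what remains is OUT = { x op y }. In general, the statement kills every expression in IN that involves a and generates x op y.

For safety of the analysis, it may report an expression as unavailable even if it is truly available (must analysis -> under-approximation).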
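A minimal sketch of available expressions as a forward must analysis: OUT[B] starts as the set of all expressions, and IN[B] is the intersection of OUT[P] over predecessors. The diamond-shaped CFG, the string encoding of expressions, and the name `available_expressions` are assumptions for illustration; the sketch also assumes every block has at least one predecessor:

```python
def available_expressions(blocks, preds, gen, kill, universe):
    out = {b: set(universe) for b in blocks}  # optimistic start: everything
    out["entry"] = set()                      # boundary condition
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Must analysis: keep only what holds on every incoming path
            in_[b] = set.intersection(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out


# Hypothetical diamond CFG: B1 -> B2 -> B4 and B1 -> B3 -> B4
# B1: x = a + b   B2: y = a * c   B3: c = 2   B4: z = a + b
universe = {"a+b", "a*c"}
blocks = ["B1", "B2", "B3", "B4"]
preds = {"B1": ["entry"], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"a+b"}, "B2": {"a*c"}, "B3": set(), "B4": {"a+b"}}
kill = {"B1": set(), "B2": set(), "B3": {"a*c"}, "B4": set()}

in_, out = available_expressions(blocks, preds, gen, kill, universe)
print(sorted(in_["B4"]))
```

At the entry of B4, `a+b` is available because both branches preserve it, while `a*c` is dropped because the path through B3 never evaluates it after redefining `c`.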

## Comparison of the Three Analyses

| | Reaching Definitions | Live Variables | Available Expressions |
| --- | --- | --- | --- |
| Domain | Set of definitions | Set of variables | Set of expressions |
| Direction | forward | backward | forward |
| May/Must | May | May | Must |
| Boundary | OUT[entry]=$\emptyset$ | IN[exit]=$\emptyset$ | OUT[entry]=$\emptyset$ |
| Initialization | OUT[B]=$\emptyset$ | IN[B]=$\emptyset$ | OUT[B]=$U$ (all expressions) |
| Transfer function | OUT=gen $\cup$ (IN - kill) | IN=use $\cup$ (OUT - def) | OUT=gen $\cup$ (IN - kill) |
| Meet | $\cup$ | $\cup$ | $\cap$ |

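## Summary

Reaching Definitions: a definition is reaching as long as at least one path exists from the assignment to point p; the result is not necessarily accurate. (May)
Live Variables: v is live as long as at least one path from point p to Exit uses variable v; the result is not necessarily accurate. (May)
Available Expressions: an expression is available only if every path from Entry to point p evaluates it (with no later redefinition of its operands); the result is guaranteed to be correct. (Must)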

posted @ 2021-10-09 17:14  twosmi1e