Using SVF to analyze an open-source project

For a study task I needed to build the CallGraph of a project locally with SVF, and ran into a few problems along the way.

wLLVM

See https://blog.csdn.net/weixin_47778392/article/details/141107768

SVF build fails in the terminal with `exited with code 1`

The fix is to upgrade CMake. For the upgrade steps, see https://blog.csdn.net/qq_42176274/article/details/144149612

Analyzing a simple demo with SVF

Write a C demo that contains one direct call and one indirect call:

#include <stdio.h>

void foo() {
    printf("foo\n");
}

void bar() {
    printf("bar\n");
}

int main() {
    void (*fp)() = bar;
    fp();           // indirect call
    foo();          // direct call
    return 0;
}

Generate the bitcode. When compiling, make sure the optnone attribute is disabled:

clang -O0 -g -emit-llvm -c demo.c -o demo.bc -Xclang -disable-O0-optnone

For a larger project, use wllvm to export the bitcode. Taking httpd as an example:

CC=wllvm CFLAGS="-O0 -g -save-temps=obj -fno-discard-value-names -w -Xclang -disable-O0-optnone" ./configure
make
extract-bc -l llvm-link httpd

Build the call graph with Andersen's pointer analysis: wpa -ander -dump-callgraph demo.bc

callgraph_initial.dot

digraph "Call Graph" {
	label="Call Graph";

	Node0x55d3e7dd2620 [shape=record,shape=box,label="{CallGraphNode ID: 0 \{fun: foo\}|{<s0>1}}"];
	Node0x55d3e7dd2620:s0 -> Node0x55d3e7da6830[color=black];
	Node0x55d3e7da4ab0 [shape=record,shape=box,label="{CallGraphNode ID: 1 \{fun: bar\}|{<s0>2}}"];
	Node0x55d3e7da4ab0:s0 -> Node0x55d3e7da6830[color=black];
	Node0x55d3e7da4bf0 [shape=record,shape=box,label="{CallGraphNode ID: 2 \{fun: main\}|{<s0>3}}"];
	Node0x55d3e7da4bf0:s0 -> Node0x55d3e7dd2620[color=black];
	Node0x55d3e7da6830 [shape=record,shape=Mrecord,label="{CallGraphNode ID: 3 \{fun: printf\}}"];
	Node0x55d3e7da6970 [shape=record,shape=Mrecord,label="{CallGraphNode ID: 4 \{fun: llvm.dbg.declare\}}"];
}

callgraph_final.dot

digraph "Call Graph" {
	label="Call Graph";

	Node0x55d3e7dd2620 [shape=record,shape=box,label="{CallGraphNode ID: 0 \{fun: foo\}|{<s0>1}}"];
	Node0x55d3e7dd2620:s0 -> Node0x55d3e7da6830[color=black];
	Node0x55d3e7da4ab0 [shape=record,shape=box,label="{CallGraphNode ID: 1 \{fun: bar\}|{<s0>2}}"];
	Node0x55d3e7da4ab0:s0 -> Node0x55d3e7da6830[color=black];
	Node0x55d3e7da4bf0 [shape=record,shape=box,label="{CallGraphNode ID: 2 \{fun: main\}|{<s0>3|<s1>4}}"];
	Node0x55d3e7da4bf0:s0 -> Node0x55d3e7dd2620[color=black];
	Node0x55d3e7da4bf0:s1 -> Node0x55d3e7da4ab0[color=red];
	Node0x55d3e7da6830 [shape=record,shape=Mrecord,label="{CallGraphNode ID: 3 \{fun: printf\}}"];
	Node0x55d3e7da6970 [shape=record,shape=Mrecord,label="{CallGraphNode ID: 4 \{fun: llvm.dbg.declare\}}"];
}

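Comparing the two graphs, the pointer analysis has resolved the indirect call through the function pointer and added it to the call graph (the red main -> bar edge in callgraph_final.dot). The next step is to write a simple LLVM pass that finds the indirect calls in the bitcode, so that SVF's call graph can be checked against it: was each function pointer resolved, and to which functions?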
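
To compare the two dumps without reading them by eye, the resolved edges can be pulled out of the .dot text. The following is a minimal sketch, not part of SVF: the function name redEdges is hypothetical, and it assumes that red edges are the ones added by pointer analysis, which is inferred only from the two dumps above.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (caller, callee)

// Collects every edge drawn with color=red from a call graph dump.
// Assumption: red marks edges added by pointer analysis, as inferred from
// the callgraph_initial.dot / callgraph_final.dot pair shown above.
std::vector<Edge> redEdges(const std::string &dot) {
  // Node line: Node0x... [..., label="{CallGraphNode ID: 2 \{fun: main\}|...}"];
  static const std::regex nodeRe(
      R"re((Node0x[0-9a-f]+) \[.*fun: ([^\\}]+))re");
  // Edge line: Node0x...:s1 -> Node0x...[color=red];
  static const std::regex edgeRe(
      R"re((Node0x[0-9a-f]+):s\d+ -> (Node0x[0-9a-f]+)\[color=red\])re");

  std::map<std::string, std::string> nameOf;  // node id -> function name
  std::vector<Edge> rawEdges;                 // red edges, still as node ids
  std::istringstream in(dot);
  std::string line;
  std::smatch m;
  while (std::getline(in, line)) {
    if (std::regex_search(line, m, edgeRe))
      rawEdges.emplace_back(m[1].str(), m[2].str());
    else if (std::regex_search(line, m, nodeRe))
      nameOf[m[1].str()] = m[2].str();
  }

  // Resolve ids to names only after the whole text is read, because a node
  // may be declared after the edges that point to it.
  std::vector<Edge> result;
  for (const Edge &e : rawEdges)
    result.emplace_back(nameOf[e.first], nameOf[e.second]);
  return result;
}
```

Fed the contents of callgraph_final.dot, this should return the single pair (main, bar). Like the dump itself, it only gives function-level edges, not call-site locations.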

Writing an `IndirectCallPass`

For a CallBase, if getCalledFunction() == nullptr, it can be treated as an indirect call.


#include "llvm/IR/PassManager.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Passes/PassBuilder.h"
#include <fstream>

using namespace llvm;

namespace {
struct IndirectCallPass : public PassInfoMixin<IndirectCallPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    static std::ofstream outfile("indirect_calls.jsonl", std::ios::app);
    unsigned callsite_idx = 0;
    for (auto &BB : F) {
      for (auto &I : BB) {
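        if (auto *callInst = dyn_cast<CallBase>(&I)) {
          auto *calledFunc = callInst->getCalledFunction();
          // Skip intrinsics such as llvm.dbg.declare; they get no index.
          if (calledFunc && calledFunc->isIntrinsic()) {
            continue;
          }
          // A null callee means the target is not a directly named Function
          // (a call through a function pointer, but also inline asm).
          if (!calledFunc) {
            outfile << F.getName().str() << "," << callsite_idx << "\n";
          }
          callsite_idx++;
        }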
      }
    }
    return PreservedAnalyses::all();
  }
};
} // namespace

// Register the pass so that the opt tool can load it
llvm::PassPluginLibraryInfo getIndirectCallPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "find-indirect-calls", LLVM_VERSION_STRING,
          [](PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &FPM,
                   ArrayRef<PassBuilder::PipelineElement>) {
                  if (Name == "find-indirect-calls") {
                    FPM.addPass(IndirectCallPass());
                    return true;
                  }
                  return false;
                });
          }};
}

extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return getIndirectCallPassPluginInfo();
}

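Run the pass through the opt tool (-disable-output keeps opt from emitting the transformed bitcode, which is not needed here):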

opt -load-pass-plugin=./IndirectCallPass.so -passes="find-indirect-calls" -disable-output ./httpd.bc

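The call graph dumped by SVF's Andersen analysis carries no call-site location information; it only says which functions call which functions. The dumped ICFG, for now, also reflects only the state before the analysis. So it is better to link against SVF as a library and query it directly.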

=== Reading the SVF source for callgraph and indirect calls

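Looking at SVF's CallGraph.h:

class CallGraphEdge: multiple calls from function A to function B are merged into a single call edge. Each edge holds a set of direct calls and a set of indirect calls (CallInstSet indirectcalls), and each edge also has a CallSiteID csId (a slightly odd design). It provides the method getIndirectCalls().
class CallGraph: used by several pointer analyses. Each call graph has a member CallEdgeMap indirectCallMap that maps a CallICFGNode on the graph to the functions it may call indirectly. The methods hasIndCSCallees and getIndCSCallees test for and fetch those callees, and getIndCallSitesInvokingCallee returns the call sites that indirectly invoke a given callee.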
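
To make the edge-merging idea concrete, here is a toy model written with the standard library only. It is not SVF code: ToyEdge, ToyCallGraph, the integer call-site ids, and addDirectCall/addIndirectCall are hypothetical, and only the two query names are borrowed from the header for readability.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

using CallSiteId = int;  // stand-in for a real call-site handle

// One edge per (caller, callee) pair; all calls between them are merged.
struct ToyEdge {
  std::set<CallSiteId> directCalls;
  std::set<CallSiteId> indirectCalls;
};

class ToyCallGraph {
 public:
  void addDirectCall(CallSiteId cs, const std::string &caller,
                     const std::string &callee) {
    edges_[{caller, callee}].directCalls.insert(cs);
  }

  void addIndirectCall(CallSiteId cs, const std::string &caller,
                       const std::string &callee) {
    edges_[{caller, callee}].indirectCalls.insert(cs);
    indCallees_[cs].insert(callee);  // per-call-site view of the same fact
  }

  bool hasIndCSCallees(CallSiteId cs) const {
    return indCallees_.count(cs) != 0;
  }

  // Precondition: hasIndCSCallees(cs) is true, otherwise at() throws.
  const std::set<std::string> &getIndCSCallees(CallSiteId cs) const {
    return indCallees_.at(cs);
  }

  std::size_t numEdges() const { return edges_.size(); }

 private:
  std::map<std::pair<std::string, std::string>, ToyEdge> edges_;
  std::map<CallSiteId, std::set<std::string>> indCallees_;
};
```

The point of the sketch is that the edge view loses nothing about which functions are connected, but recovering the callees of one specific call site needs the separate per-call-site map, which is the role indirectCallMap appears to play.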

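This looks like enough for now. The next step is to try it on the official SVF-example project: collect every indirect call site together with its pointer-analysis result (the functions it may point to).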

=== Using SVF inside a project to get the possible targets of indirect calls

For the demo.c above, add the following to SVF-example:

  CallGraph::CallEdgeMap &indCallMap = callgraph->getIndCallMap();
  std::ofstream outfile("indirect_calls");
  for (CallGraph::CallEdgeMap::iterator it = indCallMap.begin(), eit = indCallMap.end(); it != eit; ++it) {
    const CallICFGNode *cs = it->first;
    const CallGraph::FunctionSet &callees = it->second;
    for (const FunObjVar *callee : callees) {
        std::string calleeName = callee ? callee->getName() : "<unknown>";
        outfile << cs->getSourceLoc() << " : " << calleeName << std::endl;
    }
  }
  outfile.close();

The output should look like:

CallICFGNode: { "ln": 25, "cl": 5, "fl": "demo.c" } : baz
...
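
Since each output line pairs one call site with one callee, a call site with several possible targets shows up on several lines. The sketch below groups them back per call site. It assumes the exact line format shown in the sample above (other SVF versions may print getSourceLoc() differently), and the names CallSite and groupCallees are hypothetical.

```cpp
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <tuple>

struct CallSite {
  std::string file;
  int line;
  int col;
  bool operator<(const CallSite &o) const {
    return std::tie(file, line, col) < std::tie(o.file, o.line, o.col);
  }
};

// Groups "<source location> : <callee>" lines by call site.
// Assumption: lines look like
//   CallICFGNode: { "ln": 25, "cl": 5, "fl": "demo.c" } : baz
std::map<CallSite, std::set<std::string>> groupCallees(
    const std::string &text) {
  static const std::regex lineRe(
      R"re("ln": (\d+), "cl": (\d+), "fl": "([^"]+)" \} : (\S+))re");

  std::map<CallSite, std::set<std::string>> result;
  std::istringstream in(text);
  std::string line;
  std::smatch m;
  while (std::getline(in, line)) {
    if (!std::regex_search(line, m, lineRe)) continue;  // skip other lines
    CallSite cs{m[3].str(), std::stoi(m[1].str()), std::stoi(m[2].str())};
    result[cs].insert(m[4].str());
  }
  return result;
}
```

A call site whose set has more than one entry is one where the analysis could not narrow the pointer down to a single function.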

I then changed the demo to store the function pointers in an array, and found that Andersen's results are not precise enough.
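
The modified demo is not shown in this post, so the following is only a hypothetical reconstruction of the kind of code that triggers the imprecision. It records calls in a string instead of printing, purely to make the runtime behavior easy to check.

```cpp
#include <string>

static std::string trace;  // records which functions actually ran

void foo() { trace += "foo;"; }
void bar() { trace += "bar;"; }

using Handler = void (*)();

// Both function pointers live in the same array object.
static Handler handlers[] = {foo, bar};

void callSecond() {
  // At run time this can only reach bar. An analysis that models the whole
  // array as one abstract object cannot tell element 0 from element 1 and
  // would report {foo, bar} as possible targets of this call.
  handlers[1]();
}
```

My understanding is that SVF's default Andersen configuration does not distinguish array elements, which would explain the extra targets, but that is an assumption worth checking against the SVF version in use.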

Running SVF-example on the httpd project gives:

...
CallICFGNode: { "ln": 301, "cl": 34, "fl": "config.c" } : merge_core_dir_configs
...

Checking the source, this location does correspond to a call through a function pointer:

conf_vector[i] = (*df)(p, base_vector[i], new_vector[i]);

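Next, run the LLVM pass from earlier to list where every indirect call occurs, and check Andersen's results against that list (are any function pointers missed?):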

config.c,85,977
config.c,88,963
config.c,93,1014
config.c,98,945
config.c,102,996
config.c,160,960
config.c,165,928
config.c,169,870
...

The LLVM pass also picks up some less obvious cases, for example this macro invocation:

AP_IMPLEMENT_HOOK_RUN_ALL(int, header_parser,
                          (request_rec *r), (r), OK, DECLINED)

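This macro expands into a function that contains an indirect call. Andersen's analysis did not report any callees for it, and I do not yet know whether that is because of the macro or because of the array.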
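
For intuition about what such a macro generates, here is a heavily simplified stand-in. It is not httpd's or APR's actual macro: TOY_IMPLEMENT_HOOK_RUN_ALL, the toy_ prefixed functions, the request_rec stub, the two sample hooks, and the use of std::vector are all hypothetical, and the OK/DECLINED values are chosen only for this sketch.

```cpp
#include <vector>

constexpr int OK = 0;         // values chosen for this sketch only
constexpr int DECLINED = -1;

struct request_rec { int visited = 0; };  // stub, not httpd's struct

// Generates a registration function and a "run all" function for one hook.
// The run function calls every registered pointer in order and stops at the
// first result that is neither OK nor DECLINED.
#define TOY_IMPLEMENT_HOOK_RUN_ALL(name)                              \
  using name##_fn = int (*)(request_rec *);                           \
  static std::vector<name##_fn> name##_hooks;                         \
  void toy_hook_##name(name##_fn fn) { name##_hooks.push_back(fn); }  \
  int toy_run_##name(request_rec *r) {                                \
    for (name##_fn fn : name##_hooks) {                               \
      int rv = fn(r); /* indirect call through a table entry */       \
      if (rv != OK && rv != DECLINED) return rv;                      \
    }                                                                 \
    return OK;                                                        \
  }

TOY_IMPLEMENT_HOOK_RUN_ALL(header_parser)

// Two sample hooks to register.
int count_hook(request_rec *r) { ++r->visited; return DECLINED; }
int fail_hook(request_rec *) { return 500; }
```

In this shape the call target is loaded from a table that is filled in by separate registration calls, so resolving it requires the analysis to track pointers stored into and read back from that table, which is the same situation as the array demo.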

===

posted @ 2025-07-15 19:23 sysss
