Literature study: Discovering causal graphs with cycles and latent confounders: An exact branch-and-bound approach

International Journal of Approximate Reasoning

Volume 117, February 2020, Pages 29-49

Translation: Discovering causal graphs that contain cycles and latent confounders, using an exact branch-and-bound approach.

Where does this work connect with SAT?

Abstract

Understanding causal relationships is a central challenge in many research endeavours.

Translation: Understanding causal relationships is a core challenge across many areas of research.

Recent research has shown the importance of accounting for feedback (cycles) and latent confounding variables, as they are prominently present in many data analysis settings.

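Translation: Recent research shows that feedback (cycles) and latent confounding variables must be taken into account, because they feature prominently in many data analysis settings.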

However, allowing for cycles and latent confounders makes the structure learning task especially challenging.

Translation: However, allowing cycles and latent confounders makes structure learning especially hard.

The constraint-based approach is able to learn causal graphs even over such general search spaces, but to obtain high accuracy, the conflicting (in)dependence information in sample data needs to be resolved optimally.

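Translation: The constraint-based approach can learn causal graphs even over such general search spaces, but high accuracy requires resolving the conflicting (in)dependence information in the sample data optimally.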

In this work, we develop a new practical algorithmic approach to solve this computationally challenging combinatorial optimization problem.

Translation: This work develops a new, practical algorithmic approach to this computationally hard combinatorial optimization problem.

While recent advances in exact algorithmic approaches for constraint-based causal discovery build upon off-the-shelf declarative optimization solvers, we propose a first specialized branch-and-bound style exact search algorithm.

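Translation: Whereas recent exact approaches to constraint-based causal discovery build on off-the-shelf declarative optimization solvers, the authors propose the first specialized exact search algorithm in the branch-and-bound style.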

Our problem-oriented approach enables directly incorporating domain knowledge for developing a wider range of specialized search techniques for the problem, including problem-specific propagators and reasoning rules, and branching heuristics together with linear programming based bounding techniques, as well as directly incorporating different constraints on the search space, such as sparsity and acyclicity constraints.

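Translation: The problem-oriented approach lets domain knowledge be built in directly, which supports a wider range of specialized search techniques: problem-specific propagators and reasoning rules, branching heuristics combined with linear-programming-based bounding, and direct constraints on the search space such as sparsity and acyclicity.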

We empirically evaluate our implementation of the approach, showing that it outperforms the current state of the art in exact constraint-based causal discovery on real-world instances.

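Translation: An empirical evaluation of the implementation shows that it outperforms the current state of the art in exact constraint-based causal discovery on real-world instances.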

Keywords

Graphical models, Structure learning, Causal discovery, Branch and bound, Optimization

1. Introduction

Discovering causal relations from sample data when allowing for latent confounding variables and feedback (that is, cycles) is a very challenging task in the field of graphical models and structure discovery. Although many features of causal structures can in principle be determined even from passive observation [36], [47], determining which structural features can be identified from finite sample data has proven difficult.

For general search spaces (allowing latent confounders and/or cycles), the constraint-based causal discovery approach is still applicable [47], [36]. Constraint-based learning algorithms combine (in)dependence constraints from statistical tests to find determined features of the underlying causal graph structure. However, most such approaches, including the classical PC, CCD and FCI algorithms, scale up in the number of variables by selecting independence tests based on earlier test results [47], [41]. Such greedy strategies can lead to non-optimal accuracy in practice, as early mistakes in independence testing guide the search towards inaccurate solutions [8], [22].

On the other hand, for restricted settings without latent confounders and cycles, that is, for Bayesian networks, exact score-based structure discovery algorithms have been developed [60], [2], [3]. A central motivation in developing efficient exact algorithms is that they output a guaranteed optimal solution without making compromises or approximations in their computation. Such provably globally optimal graphs have been shown to exhibit better accuracy [29]. However, much less progress has been made for exact discovery algorithms for more general search spaces that allow for latent confounders and cycles.

In the context of constraint-based discovery, it has been shown that better accuracy can be obtained when a predetermined, large set of tests is conducted before the actual search, and conflicting test results are then resolved in an optimal way via exact methods [22], [28], [6]. However, the general search space with latent confounders and cycles induces a combinatorial optimization problem over a drastically larger search space compared to more restricted settings such as Bayesian network structures (DAGs). Furthermore, the objective functions considered are computationally more complicated to evaluate. Thus, improving the running time performance and scalability of (exact) algorithms for the more general search spaces without trading off accuracy is a major challenge.
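The optimal conflict resolution described above can be written as a weighted optimization over candidate graphs. The following is a sketch only; the symbols are assumptions introduced here, since this excerpt does not define them: $\mathcal{G}$ is the class of admissible causal graphs, $K$ is the set of (in)dependence constraints obtained from the statistical tests, $w(k) \ge 0$ is the weight attached to constraint $k$, and $G \not\models k$ means that the d-separation properties of $G$ contradict $k$.

```latex
G^{*} \in \operatorname*{arg\,min}_{G \in \mathcal{G}}
  \sum_{k \in K \,:\, G \not\models k} w(k)
```

Under this reading, an optimal graph is one that disagrees with the test results as little as possible, measured by the total weight of the constraints it violates.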

In this work, we take on the challenge of improving the scalability of practical exact algorithms for the general search space of causal graphs allowing for latent confounding variables and cycles. Recently, there has been noticeable interest in developing algorithmic solutions to this general problem setting and its variants [55], [56], [23], [22], [28], [6], [61], [24]. The first exact approach to the problem we focus on here was proposed in [22], based on declaratively encoding the underlying optimization task as answer set programming (ASP) and applying an ASP solver to obtain provably optimal solutions to the problem. This approach was further refined as a maximum satisfiability (MaxSAT) based approach in [25], where domain-specific techniques were integrated into a MaxSAT solver to the extent possible, with the solver performing the search starting from a declarative encoding of the problem. This resulted in the Dseptor system, which currently represents the state of the art in terms of running time performance for the problem at hand. All in all, this line of work has so far focused on using declarative solving techniques, relying in terms of efficiency on generic off-the-shelf declarative methods such as Boolean satisfiability (SAT) [4] solvers and their extensions to Boolean optimization. While declarative methods offer flexibility and remove the implementation-level burden of developing optimized search algorithms for the underlying combinatorial optimization tasks, in this work we explore the alternative of developing domain-specific search algorithms instead of directly relying on a declarative solver to perform the search.

In this paper we propose a first specialized branch-and-bound style exact search algorithm for optimal causal graphs, allowing the presence of both cycles and latent confounding variables. Our problem-oriented view enables directly incorporating domain knowledge for a wider range of specialized search techniques, including problem-specific propagators, branching heuristics, and bounding techniques, as well as directly incorporating restrictions on the search space, such as sparsity and acyclicity constraints. In particular, we develop a branch-and-bound approach to directly search over the general search space, together with several different performance-improving search techniques. These include (i) a problem-specific branching heuristic, (ii) lower bounding techniques applicable during search based on problem-specific unsatisfiable cores and linear programming relaxations, (iii) optimized algorithms for evaluating the objective function of the problem (over exponentially many independence and dependence constraints) during search under partial solutions, and (iv) inference rules, with correctness proofs, for detecting which edges are irrelevant in terms of d-connectivity under a current partial solution. We provide an open-source implementation of the approach, bcause, and empirically evaluate its performance on problem instances obtained from real-world datasets from several perspectives: (i) the marginal contribution of the different proposed search techniques, (ii) the impact of the scoring function used for obtaining constraint weights on the efficiency of the approach, and (iii) the efficiency of the approach with respect to the current state of the art. In particular, we show that the proposed approach compares favourably with the current state of the art in exact constraint-based causal discovery on real-world data sets with respect to running time performance.
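To make the branch-and-bound idea concrete, here is a minimal toy sketch. It is not the bcause algorithm. As a loudly stated simplification, it searches over undirected graphs and uses plain path reachability as a stand-in for d-connection, with no conditioning sets, cycles, or latent confounders. All function names and the constraint format `(x, y, dependent, weight)` are hypothetical. What it does illustrate is branching on one edge at a time and pruning with a lower bound computed from a partial solution.

```python
from itertools import combinations

def connected(n, edges, x, y):
    """Return True if x and y are joined by a path that uses only `edges`."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return True
        for u in adj[v] - seen:
            seen.add(u)
            stack.append(u)
    return False

def lower_bound(n, present, undecided, constraints):
    """Total weight of constraints violated by every completion of the partial graph."""
    optimistic = present | undecided  # treat every undecided edge as present
    cost = 0
    for x, y, dependent, w in constraints:
        if dependent and not connected(n, optimistic, x, y):
            cost += w  # unreachable even if all undecided edges are added
        elif not dependent and connected(n, present, x, y):
            cost += w  # already reachable through edges decided as present
    return cost

def branch_and_bound(n, constraints):
    """Return (minimum violated weight, edge set) for the toy problem."""
    all_edges = list(combinations(range(n), 2))
    best = {"cost": float("inf"), "edges": None}

    def search(i, present):
        undecided = set(all_edges[i:])
        bound = lower_bound(n, present, undecided, constraints)
        if bound >= best["cost"]:
            return  # prune: no completion can beat the incumbent
        if i == len(all_edges):
            # All edges decided, so the bound is the exact cost.
            best["cost"], best["edges"] = bound, set(present)
            return
        edge = all_edges[i]
        search(i + 1, present | {edge})  # branch: edge present
        search(i + 1, present)           # branch: edge absent

    search(0, frozenset())
    return best["cost"], best["edges"]

# Toy instance: 0 and 1 dependent (weight 5), 1 and 2 dependent (weight 3),
# 0 and 2 independent (weight 4). The three constraints conflict, because
# reachability is transitive, so one of them has to be given up.
cost, edges = branch_and_bound(
    3, [(0, 1, True, 5), (1, 2, True, 3), (0, 2, False, 4)]
)
```

The bound is valid in this toy setting because reachability is monotone in the edge set: adding edges can only connect more pairs. The real problem needs the more careful partial-solution reasoning that items (iii) and (iv) above refer to.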

This article considerably extends a preliminary version published at the PGM 2018 conference [40]. In particular, in this article we describe more effective, earlier unpublished techniques for efficient evaluation of the objective function and formalize further inference rules which allow for disregarding undecided edges under partial solutions during search, thereby further speeding up the overall search for an optimal causal graph. We have now implemented these new techniques in a new release version of the bcause system. Empirical results presented here have been obtained using this new version; compared to the version presented at PGM 2018, the additional techniques presented in this article have resulted in non-negligible running time improvements (up to a 10x speed-up, and a 2x speed-up on average). We have also considerably extended the empirical evaluation of the approach with earlier unpublished results: we present empirical data on the marginal contributions of the various search techniques implemented in bcause to the overall efficiency of the approach in practice, as well as a running time comparison with the earlier state-of-the-art Dseptor system [25]. In addition to these new technical contributions, we have considerably extended the discussion and included various examples for improved readability and self-containment.

The rest of this article is organized as follows. We begin by detailing the necessary background on causal discovery, including causal graphs with latent variables and cycles, the combinatorial optimization task of finding an optimal causal graph, and approaches for obtaining well-defined objective function coefficients in terms of weights on the independence and dependence constraints (Section 2). We then continue with detailing the proposed branch-and-bound approach to optimal causal graphs and several efficiency-improving search techniques for the approach (Section 3). We present results from an extensive empirical evaluation of the approach (Section 4). Before conclusions, we discuss the connections of our contributions to related work (Section 5).

2. Constraint-based causal discovery

posted on 2020-08-15 16:31 海阔凭鱼跃越
