# A/B Test Audience Targeting with HTE Models 1 - Causal Tree

### Papers

Athey, S., and Imbens, G. 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences.

Rzepakowski, P. and Jaroszewicz, S., 2012. Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), pp.303-327

### Background

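Estimating a treatment effect is harder than an ordinary prediction problem because the ground truth can never be observed directly. At any given moment a person either takes the medicine or does not, so you can never know whether the blood pressure of someone who took it would have dropped anyway without it, or whether the blood pressure of someone who did not take it would have dropped had they taken it.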

\begin{align} & \{(X_i, Y_i, T_i): X_i \in X\} \\ & \text{where } X \text{ is the feature vector, } Y \text{ is the response, and } T \text{ is the A/B test group}\\ & T_i \in \{0,1\} \\ & Y_i = \begin{cases} Y_i(0) & \quad T_i = 0\\ Y_i(1) & \quad T_i = 1 \end{cases}\\ & CATE: \tau(x) = E(Y_i(1)-Y_i(0) \mid X_i=x) \end{align}
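To make the notation concrete, here is a small simulation sketch. The data-generating process (a binary feature `x`, a true effect of 0.5 when `x = 0` and 2.0 when `x = 1`, Gaussian noise) is invented purely for illustration and is not from the papers. It shows that each unit reveals only one of its two potential outcomes, and that under random assignment the difference in group means within a segment still recovers the CATE approximately.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate an A/B test whose true effect depends on a binary feature x.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                # feature X_i
        t = rng.randint(0, 1)                # random A/B assignment T_i
        y0 = rng.gauss(0.0, 1.0)             # potential outcome Y_i(0)
        y1 = y0 + (2.0 if x == 1 else 0.5)   # potential outcome Y_i(1)
        y = y1 if t == 1 else y0             # only one of the two is observed
        rows.append((x, y, t))
    return rows

def cate_estimate(rows, x_value):
    """Difference in mean Y between treated and control units with X = x_value."""
    treated = [y for x, y, t in rows if x == x_value and t == 1]
    control = [y for x, y, t in rows if x == x_value and t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate()
tau_x0 = cate_estimate(rows, 0)  # should be close to the true effect 0.5
tau_x1 = cate_estimate(rows, 1)  # should be close to the true effect 2.0
```

The estimate works only because `t` is assigned independently of `y0` and `y1`; with non-random assignment the same difference in means would generally be biased.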

### Model

\begin{align} &S_l = \{(X_i, Y_i,T_i): X_i \in X_l\} \quad \text{samples falling in leaf } l\\ &\hat{\mu}_t(S_l) = \frac{1}{N_{l,t}}\sum_{T_i=t,\, i \in S_l}Y_i \quad \text{mean of } Y \text{ in group } t \text{ within the leaf} \\ &\hat{\tau}(S_l) = \hat{\mu}_1(S_l) -\hat{\mu}_0(S_l) \quad \text{leaf CATE}\\ &F(S_l) = N_l \cdot \hat{\tau}^2(S_l)\\ & \text{objective (cost function)}: \max \sum_{l=1}^L F(S_l) \end{align}
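The criterion above can be sketched in a few lines of Python. This is a minimal illustration for a single numeric feature, not the implementation from the paper: the function names, the `min_per_arm` guard, and the toy data are assumptions made for the example.

```python
def leaf_tau(rows):
    """Leaf CATE: mean of Y in the treated group minus mean in the control group.

    rows is a list of (x, y, t) tuples.
    """
    y1 = [y for _, y, t in rows if t == 1]
    y0 = [y for _, y, t in rows if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def leaf_score(rows):
    """F(S_l) = N_l * tau_hat(S_l)^2."""
    return len(rows) * leaf_tau(rows) ** 2

def best_split(rows, min_per_arm=1):
    """Scan thresholds on one feature and keep the one maximising sum of F.

    Returns (threshold, score); threshold is None if no split beats the
    unsplit node. min_per_arm is a hypothetical guard so that every child
    has both treated and control samples.
    """
    best = (None, leaf_score(rows))
    for thr in sorted({x for x, _, _ in rows})[:-1]:
        left = [r for r in rows if r[0] <= thr]
        right = [r for r in rows if r[0] > thr]
        arms_ok = all(
            sum(1 for r in part if r[2] == t) >= min_per_arm
            for part in (left, right) for t in (0, 1)
        )
        if not arms_ok:
            continue
        score = leaf_score(left) + leaf_score(right)
        if score > best[1]:
            best = (thr, score)
    return best

# Toy data: no effect for x <= 2, an effect of 2.0 for x > 2.
toy = [(1, 1.0, 1), (1, 1.0, 0), (2, 1.0, 1), (2, 1.0, 0),
       (3, 3.0, 1), (3, 1.0, 0), (4, 3.0, 1), (4, 1.0, 0)]
split = best_split(toy)  # (2, 16.0)
```

On the toy data the unsplit node scores 8.0 (overall effect 1.0 on 8 samples), while splitting at `x <= 2` scores 0 + 4 * 2.0² = 16.0, so the criterion favours the split that separates the two effect sizes.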

### Model optimization

• Use cross-validation to choose the tree depth
• Use parameters such as min_leaf (the minimum sample size of a leaf) and min_split_gain to stop growth

• Honest approach
• Variance Penalty

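The honest approach splits the training sample into two parts, train and est: train is used to build the tree, and est is used to produce the estimate in each leaf.
The variance penalty adds the within-leaf variance directly to the cost function. Here $S_{l,t}$ denotes the samples of group $t$ in leaf $l$ and $p$ is the proportion of samples assigned to treatment. The final cost function is:

$F(S_l) = N_l \cdot \hat{\tau}^2(S_l) - N_l\left(\frac{Var(S_{l,1})}{p} + \frac{Var(S_{l,0})}{1-p}\right)$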
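Both ideas can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's implementation: the helper names, the 50/50 split, and the use of the sample variance for $Var(\cdot)$ are choices made for the example, and the split threshold is assumed to have been chosen beforehand on the train part.

```python
import random
from statistics import mean, variance

def leaf_tau(rows):
    """Leaf CATE from (x, y, t) rows: treated mean minus control mean."""
    y1 = [y for _, y, t in rows if t == 1]
    y0 = [y for _, y, t in rows if t == 0]
    return mean(y1) - mean(y0)

def penalized_score(rows, p):
    """N_l * tau^2 - N_l * (Var(S_l1) / p + Var(S_l0) / (1 - p)).

    Assumption: Var is the sample variance, so each group in the leaf
    needs at least two observations.
    """
    y1 = [y for _, y, t in rows if t == 1]
    y0 = [y for _, y, t in rows if t == 0]
    n = len(rows)
    penalty = variance(y1) / p + variance(y0) / (1 - p)
    return n * leaf_tau(rows) ** 2 - n * penalty

def honest_split(rows, est_fraction=0.5, seed=0):
    """Shuffle and split the sample into a train part and an est part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - est_fraction))
    return shuffled[:cut], shuffled[cut:]

def honest_leaf_estimates(est_rows, threshold):
    """Re-estimate both leaf effects on est for a threshold chosen on train."""
    left = [r for r in est_rows if r[0] <= threshold]
    right = [r for r in est_rows if r[0] > threshold]
    return leaf_tau(left), leaf_tau(right)

# A noisy leaf: effect 2.0, but sample variance 2.0 in each group.
leaf = [(0, 3.0, 1), (0, 5.0, 1), (0, 1.0, 0), (0, 3.0, 0)]
score = penalized_score(leaf, p=0.5)
```

For this leaf the unpenalized score is 4 * 2.0² = 16, while the penalty is 4 * (2/0.5 + 2/0.5) = 32, so the penalized score becomes negative: a leaf whose apparent effect is small relative to its noise is no longer attractive.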

### Discrete outcome - uplift model

\begin{align} KL(P||Q) & = \sum_i{p_i \log\frac{p_i}{q_i}} \\ ED(P,Q) &=\sum_i{(p_i - q_i)^2}\\ \chi^2(P,Q) &= \sum_i{\frac{(p_i-q_i)^2}{q_i}} \end{align}
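As a sketch of how these divergences could drive a split, take $P$ as the outcome distribution in the treatment group and $Q$ as the one in the control group, and score a split by how much the sample-weighted divergence in the children exceeds the divergence in the parent. This is a simplified version of the gain in Rzepakowski and Jaroszewicz: it omits the paper's normalization term, the function names are invented for the example, and it assumes every $q_i > 0$.

```python
import math

def kl(p, q):
    """KL(P||Q); terms with p_i = 0 contribute 0. Assumes every q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def euclidean(p, q):
    """Squared Euclidean distance between two distributions."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def chi2(p, q):
    """Chi-squared divergence. Assumes every q_i > 0."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def split_gain(divergence, parent, children):
    """Weighted child divergence minus parent divergence (unnormalized).

    parent is a (P, Q) pair; children is a list of (n, P, Q) with n the
    number of samples in that child.
    """
    total = sum(n for n, _, _ in children)
    weighted = sum(n / total * divergence(p, q) for n, p, q in children)
    return weighted - divergence(*parent)

# Binary outcome: 50% conversion under treatment, 25% under control.
p_treat, q_ctrl = [0.5, 0.5], [0.25, 0.75]
d_kl = kl(p_treat, q_ctrl)
d_ed = euclidean(p_treat, q_ctrl)
d_chi2 = chi2(p_treat, q_ctrl)

# A split that uncovers opposite effects hidden in a parent with no net uplift.
gain = split_gain(
    euclidean,
    parent=([0.5, 0.5], [0.5, 0.5]),
    children=[(50, [0.8, 0.2], [0.2, 0.8]), (50, [0.2, 0.8], [0.8, 0.2])],
)
```

In the last example the parent shows no difference between treatment and control, yet each child has a squared distance of 0.72, so the gain is 0.72: the split exposes heterogeneity that the aggregate hides.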

posted @ 2019-10-21 10:22  风雨中的小七