Definition of IM

The IM problem is characterized by the triple \((\mathcal{G}, \mathcal{C}, \mathcal{D})\), where \(\mathcal{G}\) is a directed graph encoding the topology of the social network, \(\mathcal{C}\) is the collection of feasible seed sets, and \(\mathcal{D}\) is the underlying diffusion model. Specifically, \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where \(\mathcal{V} = \{1, 2, \ldots, n\}\) and \(\mathcal{E}\) are the node and edge sets of \(\mathcal{G}\), with cardinalities \(n = |\mathcal{V}|\) and \(m = |\mathcal{E}|\), respectively. The collection of feasible seed sets \(\mathcal{C}\) is determined by a cardinality constraint on the sets and possibly some combinatorial constraints (e.g., matroid constraints) that rule out some subsets of \(\mathcal{V}\). This implies that \(\mathcal{C} \subseteq \{\mathcal{S} \subseteq \mathcal{V} : |\mathcal{S}| \leq K\}\), for some \(K \leq n\). The diffusion model \(\mathcal{D}\) specifies the stochastic process under which influence is propagated across the social network once a seed set \(\mathcal{S} \in \mathcal{C}\) is selected. Without loss of generality, we assume that all stochasticity in \(\mathcal{D}\) is encoded in a random vector \(\boldsymbol{w}\), referred to as the diffusion random vector. Note that throughout this paper, we denote vectors in boldface. We assume that each diffusion has a corresponding \(\boldsymbol{w}\) sampled independently from an underlying probability distribution \(\mathbb{P}\) specific to the diffusion model. For the widely used IC and LT models, \(\boldsymbol{w}\) is an \(m\)-dimensional binary vector encoding edge activations for all the edges in \(\mathcal{E}\), and \(\mathbb{P}\) is parametrized by \(m\) influence probabilities, one for each edge. Once \(\boldsymbol{w}\) is sampled, we use \(\mathcal{D}(\boldsymbol{w})\) to refer to the particular realization of the diffusion model \(\mathcal{D}\). Note that by definition, \(\mathcal{D}(\boldsymbol{w})\) is deterministic, conditioned on \(\boldsymbol{w}\).
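To make the role of the diffusion random vector concrete, here is a minimal Python sketch for the IC model only. It assumes that each edge is activated independently with its own influence probability, and that a node is influenced when it can be reached from the seed set through activated edges. The function names `sample_w` and `influenced`, and the toy graph, are hypothetical and are not taken from the paper.

```python
import random
from collections import deque

def sample_w(probs, rng):
    """Draw one diffusion random vector w (IC assumption):
    w[e] = 1 if edge e is activated, else 0, independently per edge."""
    return [1 if rng.random() < p else 0 for p in probs]

def influenced(edges, w, seeds):
    """Deterministic realization D(w): return the set of nodes reachable
    from the seed set using only edges whose entry in w is 1."""
    adj = {}
    for (u, v), live in zip(edges, w):
        if live:
            adj.setdefault(u, []).append(v)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Hypothetical toy instance: 4 nodes, m = 3 directed edges.
edges = [(1, 2), (2, 3), (1, 4)]
probs = [0.5, 0.5, 0.5]      # one influence probability per edge
rng = random.Random(0)
w = sample_w(probs, rng)     # all randomness lives in w
print(w, influenced(edges, w, {1}))
```

Once `w` is fixed, calling `influenced` twice with the same seed set returns the same node set, which mirrors the statement that \(\mathcal{D}(\boldsymbol{w})\) is deterministic conditioned on \(\boldsymbol{w}\).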

Given the above definitions, an IM attempt can be described as follows: the marketer first chooses a seed set \(\mathcal{S} \in \mathcal{C}\), and then nature independently samples a diffusion random vector \(\boldsymbol{w} \sim \mathbb{P}\). Note that the influenced nodes in the diffusion are completely determined by \(\mathcal{S}\) and \(\mathcal{D}(\boldsymbol{w})\). We use the indicator \(\mathbb{1}(\mathcal{S}, v, \mathcal{D}(\boldsymbol{w})) \in \{0, 1\}\) to denote whether the node \(v\) is influenced under the seed set \(\mathcal{S}\) and the particular realization \(\mathcal{D}(\boldsymbol{w})\). For a given \((\mathcal{G}, \mathcal{D})\), once a seed set \(\mathcal{S} \in \mathcal{C}\) is chosen, for each node \(v \in \mathcal{V}\), we use \(F(\mathcal{S}, v)\) to denote the probability that \(v\) is influenced under the seed set \(\mathcal{S}\), i.e.,

\begin{equation}
F(\mathcal{S}, v) = \mathbb{E} [\mathbb{1} (\mathcal{S}, v, \mathcal{D}(\boldsymbol{w}))| \mathcal{S}]
\end{equation}

where the expectation is over all possible realizations \(\mathcal{D}(\boldsymbol{w})\). We denote by \(F(\mathcal{S}) = \sum_{v \in \mathcal{V}} F(\mathcal{S}, v)\) the expected number of nodes that are influenced when the seed set \(\mathcal{S}\) is chosen. The aim of the IM problem is to maximize \(F(\mathcal{S})\) subject to the constraint \(\mathcal{S} \in \mathcal{C}\), i.e., to find \(\mathcal{S}^* \in \arg\max_{\mathcal{S} \in \mathcal{C}} F(\mathcal{S})\). Although IM is an NP-hard problem in general, under common diffusion models such as IC and LT, the objective function \(F(\mathcal{S})\) is monotone and submodular, and thus a near-optimal solution can be computed in polynomial time using a greedy algorithm (Nemhauser et al., 1978).
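The greedy approach can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes the IC model, uses only a cardinality constraint \(|\mathcal{S}| \leq K\) (no matroid constraints), and replaces the exact \(F(\mathcal{S})\) with a Monte Carlo average over sampled diffusion vectors. The names `spread`, `estimate_F`, and `greedy`, and the toy graph, are hypothetical.

```python
import random
from collections import deque

def spread(edges, w, seeds):
    """Number of nodes influenced under realization D(w): nodes reachable
    from the seeds using only edges e with w[e] true."""
    adj = {}
    for (u, v), live in zip(edges, w):
        if live:
            adj.setdefault(u, []).append(v)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

def estimate_F(edges, probs, seeds, n_samples, rng):
    """Monte Carlo estimate of F(S): average spread over sampled w (IC assumption)."""
    total = 0
    for _ in range(n_samples):
        w = [rng.random() < p for p in probs]
        total += spread(edges, w, seeds)
    return total / n_samples

def greedy(nodes, edges, probs, K, n_samples=200, seed=0):
    """Add, K times, the node whose inclusion gives the largest estimated F."""
    rng = random.Random(seed)
    S = set()
    for _ in range(K):
        best, best_val = None, -1.0
        for v in nodes:
            if v in S:
                continue
            val = estimate_F(edges, probs, S | {v}, n_samples, rng)
            if val > best_val:
                best, best_val = v, val
        if best is None:   # no candidates left
            break
        S.add(best)
    return S

# Hypothetical toy instance with deterministic edges (probability 1.0).
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (4, 5)]
probs = [1.0, 1.0, 1.0]
print(greedy(nodes, edges, probs, K=2))
```

With exact evaluations of a monotone submodular \(F\) and a cardinality constraint, the classical greedy guarantee is a \((1 - 1/e)\) fraction of the optimum; with sampled estimates as in this sketch, that guarantee holds only approximately and depends on the number of samples.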

Model-Independent Online Learning for Influence Maximization

posted @ 2023-05-10 18:34  X1OO