TIM Original Introduction

% In social networks, a cascade model captures the word-of-mouth effect whereby users adopt certain products, take up opinions, or receive information due to the influence of their friends. Given a social network \(G\) with \(n\) nodes and \(m\) edges, a positive integer \(k\), and a cascade model \(C\), the influence maximization (IM) problem \cite{kempe2003maximizing} asks for \(k\) nodes in \(G\) that can infect the largest number of nodes under cascade model \(C\). IM has important applications in viral marketing, a marketing strategy in which a company gives its product for free to a few influential users in a social network, in the hope that they will recommend the product to their friends.

On social networking platforms, cascade models capture the “word-of-mouth” dynamics by which users adopt specific products, take up certain viewpoints, or become aware of particular information through their social connections. Given a social network \(G\) with \(n\) nodes and \(m\) edges, a positive integer \(k\), and a cascade model \(C\), the influence maximization (IM) problem, introduced by Kempe et al.~\cite{kempe2003maximizing}, seeks \(k\) nodes in \(G\) that “influence” the largest expected number of nodes under cascade model \(C\). IM is particularly relevant to viral marketing, a strategy in which a company offers its product for free to a select group of influential users in a social network, expecting them to endorse the product to their social circles and thereby expand its reach and adoption.
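For concreteness, the IM objective above can be written as follows, in the standard formulation of Kempe et al.; the node set \(V\) of \(G\) and the spread function \(\sigma_C(S)\), i.e., the expected number of nodes activated when the seed set \(S\) is seeded under cascade model \(C\), are notation introduced here for illustration:

\[
S^{*} \in \mathop{\mathrm{arg\,max}}_{S \subseteq V,\ |S| = k} \sigma_C(S).
\]

Because the cascade is random, \(\sigma_C(S)\) is an expectation, so in practice it is typically estimated by simulation or sampling rather than computed exactly.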

Viral marketing is one of the key applications of influence maximization. In viral marketing, an item that a marketer wants to promote diffuses through social networks via “word-of-mouth” communication. From a marketing perspective, influence maximization addresses how to obtain the maximum profit from all users in a social network. However, influence maximization is not always the most effective strategy for viral marketing, because some items are useful only to targeted users. These targeted users may be only a few people with a common interest in a given item.
% For example, consider a marketer who is asked to promote a cosmetic product for women through viral marketing. For the cosmetic product, the specific users are female users who are likely to use it and male users who wish to purchase it as a gift for female users. In this case, the marketer does not need to be concerned about the other users, because the cosmetic product is not useful to them. Instead, a better strategy is to focus on maximizing the number of influenced specific users, but influence maximization cannot distinguish them from the other users. The only way of handling such targets with influence maximization is to build a homogeneous graph with the targets and run influence maximization on that graph. However, the result of this approach is likely to be inaccurate, because some users who are not targets can strongly influence the targets.
For example, consider an online advertising platform called “AdMax” that serves targeted advertisements to users across various websites and mobile apps. AdMax aims to maximize the impact of these advertisements by showing them to the right audience at the right time. Each targeted advertisement task can be framed as an instance of the targeted influence maximization (TIM) problem, which seeks seed users whose influence reaches as many of the advertisement's target users as possible. In practice, an advertiser or social platform frequently receives requests from different advertisement instances, each of which needs its own seed users. However, solving an influence maximization problem from scratch for each new TIM instance is computationally intensive, so an efficient and effective method for TIM is needed.
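To make the cost of a single TIM evaluation concrete, the following sketch estimates the targeted spread of a seed set, i.e., the expected number of target users it activates, by Monte Carlo simulation. It assumes the independent cascade (IC) model, which the text above does not fix, and a hypothetical graph format in which \texttt{graph[u]} is a list of \texttt{(v, p)} pairs giving each out-neighbor and its activation probability; the names \texttt{simulate\_ic} and \texttt{targeted\_spread} are illustrative. Each evaluation of the objective runs \texttt{num\_sims} cascades, which is why solving TIM from scratch for every advertisement instance is costly.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """One independent-cascade (IC) run; returns the set of activated users.

    Assumption: graph[u] is a list of (v, p) pairs, meaning u activates v
    with probability p the first time u becomes active.
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # Each edge is tried at most once, because u is dequeued only once.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def targeted_spread(graph, seeds, targets, num_sims=1000, rng_seed=0):
    """Monte Carlo estimate of the expected number of activated target users."""
    rng = random.Random(rng_seed)
    target_set = set(targets)
    hits = 0
    for _ in range(num_sims):
        hits += len(simulate_ic(graph, seeds, rng) & target_set)
    return hits / num_sims


if __name__ == "__main__":
    g = {"a": [("b", 1.0), ("c", 0.5)], "b": [("d", 1.0)]}
    print(targeted_spread(g, {"a"}, targets={"b", "d"}))  # -> 2.0
```

In the toy graph, edges with probability \(1.0\) always fire, so both targets are reached in every simulation and the estimate is exactly \(2.0\); with fractional probabilities the estimate fluctuates around the true expectation.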
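% Assume that each advertisement instance corresponds to a specific group of users. An advertiser or social platform frequently receives requests from different advertisement instances and must find the corresponding key users for each of them.
% An efficient and effective method is essential for the targeted influence maximization problem.
% At present, a large body of work on TIM and its variants has been explored. ... Besides solving TIM in specific scenarios, these methods all adopt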

% Social influence among individuals plays an immense role in decision making and information acquisition, and the rise of online social networks has enabled it to spread at a tremendous scale. Understanding, predicting, and controlling social influence and its diffusion have become a large field of research called computational social influence [11]. Among the most actively studied algorithmic problems in this field is the influence maximization problem [35, 36], initially motivated by viral marketing [21]. Conceptually, influence maximization involves identifying a small number of seed individuals in the network who can maximize the spread of influence. Kempe, Kleinberg, and Tardos [35] formulated influence maximization in 2003 as a combinatorial optimization problem on graphs, and their framework has been broadly accepted in the research community, including the database community [17, 19, 23, 24].
% Following this line of work, variants of the IM problem have been investigated recently, such as targeted IM [18], [24].

% Most existing influence maximization solutions assume that the activation probabilities are known beforehand. However, in many real-world social networks, this information is not observable. Solutions have been proposed to estimate the activation probabilities from a set of cascades, which consist of logged actions by network users in the past [4, 13, 23, 27]. However, because the observations are collected independently of the learning algorithms, the logging mechanism may introduce bias [11, 21]. To combat biases in offline influence estimation, Aral and Walker [1] proposed randomized online experiments for offline data collection, but such experiments are usually very expensive to carry out. This motivates the study of online influence maximization [9, 10, 32, 33, 38], in which seed nodes are purposely selected by a learning agent to improve the quality of its influence estimation and influencer selection on the fly. The foundation of this line of solutions is the combinatorial bandit [9], in which a set of arms is pulled together at each round and the outcome is revealed only as a whole over the set of pulled arms. Mapped back to the influence maximization problem, each node in the network is an arm, and at each round the reward received for the selected seed set is the number of nodes it activates.
% Most existing influence maximization solutions are not the most effective strategy for viral marketing, because some items are useful only to specific users. These specific users can be a few people with a common interest in a given item, some or all people in a community, or some or all users in a class. To meet the requirements of targeted advertising, the targeted IM problem has been studied in [18], [24]. Both works aim to find a set of seed users whose social influence can cover the maximum number of “target” users.

% Paragraph 2, source 2
% However, all of the proposed techniques suffer from efficiency issues. Their models require offline training of the propagation probability w.r.t. different topics, which does not scale with the graph size or the number of topics. The most recent work [3] was reported to handle only a graph with 4 million vertices and 10 topics. In addition, the proposed solutions are all heuristic, and none of them provides a theoretical guarantee on the quality of the results.
% Paragraph 2, source 3
% There are numerous applications in which the submodular meta-learning framework can be applied to find a personalized solution for each task while significantly reducing the computational load. In general, most recommendation tasks can be cast as instances of this setting [6–8]. Consider the task of recommending a set of items, e.g., products, locations, or ads, to a set of users. One approach is to find the subset of items with the highest score over all previously visited users and recommend that subset to a new user. This approach leads to reasonable performance at test time; however, it does not provide a user-specific solution for a new user. Another approach is to find the whole subset at test time when the new user arrives. In contrast to the previous approach, this scheme yields a user-specific solution, but at the cost of running a computationally expensive algorithm to select all the elements at test time.
% Paragraph 2, source 4
% Due to the NP-hardness of TIM, we focus on solving it approximately with theoretical bounds. Leveraging the monotonicity and submodularity of influence functions, a naive greedy algorithm [27] can provide a \((1 - 1/e)\)-approximate solution for TIM. However, the greedy algorithm requires \(O(k \cdot |U|)\) influence function evaluations for each update, where \(|U|\) is the number of users in the network.

% Most existing targeted influence maximization solutions are not efficient enough for viral marketing. In the real world, many users want to promote many items for various purposes through online social networks. Since IM query processing can be a breakthrough for promoting items effectively for these users, the number of potential users of IM query processing can be very large, which makes efficiency a critical issue. Due to the NP-hardness of IM query processing, we focus on processing it approximately with theoretical bounds. Leveraging the monotonicity and submodularity of influence functions, a naive greedy algorithm [27] can provide a \((1 - 1/e)\)-approximate solution for TIM. However, the greedy algorithm requires \(O(k \cdot |U|)\) influence function evaluations for each update, where \(|U|\) is the number of users in the network, so it is inefficient for processing IM queries. In contrast to influence maximization, the target nodes we want to influence are known when an IM query is given. Hence, an efficient method for an IM query should use preprocessed data to quickly identify the nodes that strongly influence the query's targets. Since existing influence maximization methods do not exploit the nature of query processing, a new efficient method for IM query processing must be designed with query processing in mind.

% However, all of the proposed techniques suffer from efficiency issues. Their models require offline training of the propagation probability w.r.t. different topics, which does not scale with the graph size or the number of topics.
% Guo et al.~\cite{} study the online query problem and propose an online local cascade algorithm, a proxy-based approach that maintains only the shortest paths from each user to the target user. However, the proposed solutions are all heuristic, and none of them provides a theoretical guarantee on the quality of the results.
% Because online sampling cannot meet real-time processing requirements, Li et al.~\cite{} devise disk-based index structures that push the sampling procedure from online to offline. The idea is to build a sufficient number of \hl{RR} sets for each topic (e.g., music and books) offline. Then, given an online query, the method selects the RR sets of the query topics and merges them to compute the result. Moreover, they introduce an incremental index structure to further reduce the I/O cost. \hl{However, compared with the expensive computational overhead of the iterative greedy algorithm, sampling reverse reachable sets offline does not significantly speed up answering queries online.}

% ``Adaptive Budget Allocation for Maximizing Influence of Advertisements''
% In [Alon et al., 2012], Alon et al. called this model the source-side influence model to distinguish it from another model they called the target-side influence model. Since we consider only the source-side influence model, we simply call it the bipartite influence model. In the original definition, the costs for buying slots are not considered, i.e., \(c(v, i) = 1\) for all \(v \in V\) and \(i \in [b(v)]\). Thus our definition is more general than the original one.

Following this line of work, TIM and its variants have been investigated~\cite{chen2015online,li2015real,song2016targeted,li2018holistic,cai2020target}. Their models require offline training of the propagation probabilities w.r.t. different topics. Even with offline sampling, the greedy algorithm must still be run online to solve the resulting maximum coverage problem under a cardinality constraint. For each TIM instance, the classical greedy algorithm~\cite{nemhauser1978analysis} leverages the monotonicity and submodularity of the influence function to provide a \((1-1/e)\)-approximate solution.
\iffalse % However, the greedy algorithm requires \(O(k \cdot |U|)\) (\(|U|\) is the number of users in the network) influence function evaluations for each update.
\fi
However, the greedy algorithm requires \(O(k|U|)\) function evaluations, where \(|U|\) is the number of users in the network, and makes \(k\) passes over the ground set.
This is too slow for online advertising systems, which must respond quickly to each new advertisement task.
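The cost of the greedy step can be illustrated on the maximum coverage problem that remains after offline sampling. In this sketch, each sampled reverse reachable (RR) set is assumed to be a set of users, and a seed set's score is the number of RR sets it covers; the function name \texttt{greedy\_max\_coverage} and this input format are hypothetical, not the exact procedure of the cited works. The counter \texttt{evaluations} records one marginal-gain computation per candidate per round, so it never exceeds \(k|U|\).

```python
def greedy_max_coverage(rr_sets, users, k):
    """Classical greedy for maximum coverage under a cardinality constraint.

    rr_sets: list of sets of users (assumed to be sampled offline).
    Returns (seeds, number_of_covered_rr_sets, number_of_gain_evaluations).
    """
    seeds = []
    covered = set()          # indices of RR sets already covered by the seeds
    candidates = set(users)
    evaluations = 0
    for _ in range(min(k, len(candidates))):   # k passes over the ground set
        best_user, best_gain = None, -1
        for u in sorted(candidates):           # sorted only for deterministic ties
            evaluations += 1
            # Marginal gain of u: uncovered RR sets that contain u.
            gain = sum(1 for i, rr in enumerate(rr_sets)
                       if i not in covered and u in rr)
            if gain > best_gain:
                best_user, best_gain = u, gain
        seeds.append(best_user)
        candidates.remove(best_user)
        covered |= {i for i, rr in enumerate(rr_sets) if best_user in rr}
    return seeds, len(covered), evaluations


if __name__ == "__main__":
    rr = [{1, 2}, {2, 3}, {3}, {4}]
    print(greedy_max_coverage(rr, [1, 2, 3, 4], k=2))  # -> ([2, 3], 3, 7)
```

Each round rescans all remaining candidates, which is the \(k\)-pass behavior that makes a full greedy run too slow to repeat for every incoming request.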

% Recent progress in AI: Many applications in artificial intelligence necessitate exploiting prior data and experience to enhance quality and efficiency on new tasks. This motivates us to extend the methodology of MAML to TIM. Our goal is to pre-train a good initial seed set that can be quickly adjusted to perform well over a wide range of new TIM instances.
Many applications in artificial intelligence require exploiting prior data and experience to improve quality and efficiency on new tasks~\cite{thrun2012learning,finn2017model,bengio1990learning}. This motivates us to extend the meta-learning methodology to TIM. Our goal is to pre-train a good initial seed set that can be quickly fine-tuned to perform well over a wide range of new TIM instances. A pre-training and fine-tuning framework can find a personalized solution for each TIM instance while significantly reducing the computational load. For instance, suppose we can afford to run only \(\alpha k\) rounds of greedy selection at test time, which costs \(O(\alpha k |U|)\) evaluations, where \(\alpha \in (0,1)\) is small. A natural solution is to find an appropriate set of \((1-\alpha)k\) seed users in the pre-training phase and add the remaining \(\alpha k\) seed users at test time, when a new TIM instance arrives.
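One way to write the pre-training objective described above, assuming the previously visited TIM instances are summarized by monotone set functions \(f_1, \ldots, f_m\) over the user set \(U\) (symbols introduced here for illustration; the formal problem is stated in Section 3), is

\[
\max_{S \subseteq U,\ |S| \le (1-\alpha)k} \;
\frac{1}{m} \sum_{i=1}^{m}
\max_{S_i \subseteq U,\ |S_i| \le \alpha k} f_i(S \cup S_i).
\]

The shared set \(S\) is chosen offline, and only the instance-specific set \(S_i\) is computed at test time. With greedy selection, the test-time cost therefore drops from \(O(k|U|)\) to \(O(\alpha k |U|)\) evaluations; for example, \(\alpha = 0.1\) cuts the number of test-time greedy rounds by a factor of \(10\).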

% To solve TIM, one approach is to find the subset of users that have the highest score over all previously visited TIM instances and \hl{recommend} that subset to a \hl{new user}. This approach leads to reasonable performance at \hl{test time}; however, it does not provide an instance-specific solution for a new instance. Another approach is to find the whole subset at \hl{test time} when the TIM instance arrives. In contrast to the previous approach, this scheme yields an instance-specific solution, but at the cost of running a computationally expensive algorithm to select all the seeds at \hl{test time}.

In our pre-training and fine-tuning framework, the seed users for a TIM instance are selected in two parts. In the first part, performed offline, a set of users is selected according to prior experience; these are the users most influential for the previously visited TIM instances. In the second part, performed at test time, a set of users personalized to the incoming TIM instance is selected; these seed users are computed specifically from the features of the incoming targets. In this manner, the computation for each incoming instance reduces to the second part, which typically constitutes a small portion of the final seed set. The first part can be recomputed offline at a lower frequency: in a real advertisement system, it can be computed once every hour, while the second part is computed for each incoming TIM instance.
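The two-part procedure can be illustrated with a plain greedy baseline: offline, greedily pick \((1-\alpha)k\) users that maximize the average objective over previously visited instances; at test time, greedily extend that set by \(\alpha k\) users for the incoming instance. This is a hypothetical sketch rather than the deterministic or probabilistic algorithm analyzed in this paper, and it assumes each instance is summarized by a coverage-style objective in which every user reaches a known set of targets; the names \texttt{coverage}, \texttt{greedy\_extend}, \texttt{pretrain}, and \texttt{finetune} are illustrative.

```python
def coverage(seeds, influence):
    """Hypothetical per-instance objective: number of distinct targets reached.

    influence maps each user to the set of this instance's targets that the
    user is assumed to reach (e.g., estimated offline from RR sets).
    """
    covered = set()
    for u in seeds:
        covered |= influence.get(u, set())
    return len(covered)


def greedy_extend(base, users, budget, objective):
    """Greedily add up to `budget` users to `base`, maximizing `objective`."""
    seeds = list(base)
    for _ in range(budget):
        candidates = [u for u in users if u not in seeds]
        if not candidates:
            break
        # max() keeps the first candidate among ties, so results are deterministic.
        best = max(candidates, key=lambda u: objective(seeds + [u]))
        seeds.append(best)
    return seeds


def pretrain(users, past_instances, n_pre):
    """Offline part: pick n_pre users that do well on average over past instances."""
    def average_coverage(seeds):
        return sum(coverage(seeds, inst) for inst in past_instances) / len(past_instances)

    return greedy_extend([], users, n_pre, average_coverage)


def finetune(pre_seeds, users, new_instance, n_test):
    """Test-time part: extend the pre-trained set for the incoming instance."""
    return greedy_extend(pre_seeds, users, n_test,
                         lambda seeds: coverage(seeds, new_instance))


if __name__ == "__main__":
    users = ["a", "b", "c", "d"]
    past = [{"a": {1, 2}, "b": {3}}, {"a": {1}, "c": {2}}]
    pre = pretrain(users, past, n_pre=1)
    print(pre)                                                      # -> ['a']
    print(finetune(pre, users, {"a": {1}, "d": {5, 6}}, n_test=1))  # -> ['a', 'd']
```

Under the hourly schedule mentioned above, \texttt{pretrain} would be rerun periodically while \texttt{finetune} runs once per incoming instance, so each request pays for only \texttt{n\_test} greedy rounds.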
Our contributions. This paper makes the following contributions:

\begin{itemize}
% \item We propose a novel \hl{two-stage submodular framework} where each task is equivalent to maximizing a set function under some cardinality constraint. Our framework aims at using prior data, i.e., previously visited queries, to train a proper initial seed set that can be quickly adapted to a new query at a low computational cost to obtain a query-specific seed set.
\item We propose a novel pre-training and fine-tuning framework for the pre-trained TIM problem, in which each instance amounts to maximizing a set function under a budget constraint. Our framework uses previously visited instances to pre-train an initial seed set that can be quickly adapted to a new instance at a low computational cost, yielding an instance-specific seed set.
\item We present computationally efficient deterministic and probabilistic algorithms to solve the resulting pre-trained TIM problem. We prove that the solution obtained by the deterministic algorithm is at least \(1/2\)-optimal and that the solution of the probabilistic algorithm is \((1-1/e - o(1))\)-optimal in expectation.
\item We evaluate our algorithms on four real datasets. The experimental results confirm our theoretical findings: choosing a large portion of the solution in the pre-training phase and a small portion adaptively at test time yields a solution very close to the one obtained by choosing the entire solution at test time, when a new instance is revealed.
\end{itemize}
% To meet the requirements of targeted advertising, the targeted IM problem has been studied in [18], [24]. Both works aim to find a set of seed users whose social influence can cover the maximum number of “target” users, but they define targets differently. Li et al. [18] define the target users w.r.t. a keyword query, i.e., users who mention some keywords in the query, and develop a weighted reverse influence set sampling technique for the keyword-based targeted IM problem. Song et al. [24] define a targeted influence spread that is constrained to an event location and a deadline for the event, i.e., users are targeted if they have a certain probability of checking in at the event location before the deadline. However, none of these methods investigates how users propagate their influence to other users via spatial interactions together with social influence. Motivated by [11], we argue that a user may influence spatially close users with similar interests with a certain probability. However, it may not be practical to assess user-to-user spatial influence using only spatial distance and interests.

% Recently, the amount of information propagated in online social networks such as Facebook and Twitter has been increasing steadily. To use online social networks as a marketing platform, there has been much research on how to exploit the propagation of influence for viral marketing. One of these research problems is influence maximization (IMAX), which aims to find \(k\) seed users that maximize the spread of influence among users in a social network. Kempe et al. [1] proved it to be NP-hard. Since they proposed a greedy algorithm for the problem, many researchers have proposed various heuristic methods.

% Viral marketing is one of the key applications of influence maximization. In viral marketing, an item that a marketer wants to promote is diffused into social networks by “word-of-mouth” communication. From the perspective of marketing, influence maximization addresses how to obtain the maximum profit from all the users in a social network through viral marketing. However, influence maximization is not always the most effective strategy for viral marketing, because some items are useful only to specific users. These specific users can be a few people with a common interest in a given item, some or all people in a community, or some or all users in a class; there is no limit on who can be a specific user. \hl{For example, consider a marketer who is asked to promote a cosmetic product for women through viral marketing. For the cosmetic product, the specific users are female users who are likely to use it and male users who wish to purchase it as a gift for female users.} In this case, the marketer does not need to be concerned about the other users, because the cosmetic product is not useful to them. Instead, a better strategy is to focus on maximizing the number of influenced specific users, but influence maximization cannot distinguish them from the other users. The only way of handling such targets with influence maximization is to build a homogeneous graph with the targets and run influence maximization on that graph. However, the result of this approach is likely to be inaccurate, because some users who are not targets can strongly influence the targets.

% Based on the motivation for target-aware viral marketing, an earlier study focuses on specific targets in influence maximization [2]. In [2], each user has several predefined labels before query processing, and a query contains some labels to specify the targets whom a marketer wants to influence. However, predefining labels for each user before query processing is not flexible, since a query for targets who do not share any existing label cannot be formulated. In this case, we would have to add a new label covering those targets to formulate the query. In addition, if we use a preprocessed structure to compute results quickly, we must update the structure whenever a new label is added, and the cost of updating is likely to be high. Another line of research can be applied to influence a specific part of a social network. Lu and Lakshmanan [3] devise a variant of influence maximization that separates being influenced from adopting an item, for profit maximization. In their problem, if a user is influenced regarding an item, the user adopts it with some probability. \hl{Thus, by setting the adoption probability of every non-target user to \(0\), their problem can handle maximizing influence on specific targets.} However, it requires checking all users one hundred times when we have one hundred items associated with different sets of targets, which is clearly inefficient when there are many such items. As these two problems show, no existing problem both maximizes influence on specific targets and has the flexibility to handle multiple items without additional cost.

% The IMAX query problem deserves researchers' attention for two reasons. One is the suitability of IMAX query processing for target-aware viral marketing. As explained, since the influence maximization problem cannot distinguish targets from the other users, it is not suitable for target-aware viral marketing. In the IMAX query problem, however, we can specify targets explicitly as a set and focus on maximizing influence on those targets. The formulation of the IMAX query problem is sufficient for modeling general-purpose target-aware viral marketing.

% The other is efficiency. In the real world, many users want to promote many items for various purposes using online social networks. Since IMAX query processing can be a breakthrough for promoting items effectively for those users, the number of potential users of IMAX query processing can be very large, which makes efficiency a critical issue. However, IMAX query processing is NP-hard, like the influence maximization problem. Since the submodularity of the influence maximization problem is preserved in IMAX query processing, several techniques used for influence maximization can still be applied to IMAX query processing. However, they are inefficient for processing IMAX queries. In contrast to influence maximization, the target nodes we want to influence are known when an IMAX query is given. Hence, an efficient method for an IMAX query should use preprocessed data to quickly identify the nodes that strongly influence the query's targets. Since existing methods for the influence maximization problem do not exploit the nature of query processing, a new efficient method for IMAX query processing must be designed with query processing in mind.

% In this paper, we propose a new efficient expectation model for the influence spread of a seed set based on independent maximum influence paths (IMIP) among users. We also show that the new objective function of the new expectation model is submodular. Based on the new expectation model, we present a method to efficiently process an IMAX query. The method consists of identifying local regions containing nodes that influence the target nodes of a query and approximating optimal seeds from the local regions as the result of the query. Identifying such local regions helps to reduce the processing time, when the number of targets in an IMAX query is small compared to the number of all nodes. To approximate optimal seeds, we use a greedy method based on the marginal gain to the new objective function. In addition, we present a method to incrementally update the marginal gain of each user to accelerate the greedy method.

% Our contributions. This paper makes the following contributions:

% \begin{itemize}
% \item We identify the limitations of existing research on maximizing influence on specific targets. To address them, we formulate an influence maximization problem as query processing without predefined labels. We prove that the problem is NP-hard and that the objective function of the IMAX query problem is submodular. Based on the submodularity of the objective function, we present a greedy algorithm for IMAX query processing and show that it has a \((1-1/e)\) approximation ratio.
% \item We propose a new efficient expectation model for influence spread of a seed set. We show that the new objective function of the expectation model is submodular. Based on the new expectation model, we propose a greedy-based approximation method to process an IMAX query with efficient incremental updating of the marginal gain of each user. We also propose an effective method to reduce the number of candidates for optimal seeds by identifying users who strongly influence targets from preprocessed data.
% \item We experimentally demonstrate that our identifying local influencing regions technique is very powerful and the proposed method is at least an order of magnitude faster than the comparison methods in most cases with high accuracy. Identifying local influencing regions makes the basic greedy algorithm about 6 times faster in the experiments.
% \end{itemize}

% The rest of this paper is organized as follows. In Section 2, we review related work. We formulate the IMAX query problem under the IC model in Section 3 and show its NP-hardness and the submodularity of its objective function. Since the exact computation of influence spread is very expensive, we develop a new expectation model for the influence spread in Section 4.1. Then, we devise an efficient algorithm based on the expectation model to process the IMAX query in Section 4.2. We demonstrate the effectiveness and efficiency of the proposed method through various experiments in Section 5. We conclude and outline future work in Section 6.

The rest of this paper is organized as follows. In Section 2, we review related work. We formulate the pre-trained targeted influence maximization problem in Section 3 and show that its objective function is not submodular. In Section 4, we design greedy procedures with both deterministic and probabilistic orders and provide theoretical guarantees. We demonstrate the effectiveness and efficiency of the proposed method through various experiments in Section 5. We conclude in Section 6.

posted @ 2023-08-31 20:59 X1OO