Similarities of Influencers across Different Social Media Platforms by Using Four Centrality Measures

Searching for influencers in a social network is important because marketers can use this information for word-of-mouth (WOM) advertising, an important marketing technique. The Literature Review provides detailed information about WOM advertising. There are many ways to search for influencers, and they are often based on network centrality measures. This paper investigates whether each centrality measure produces similar results across different social media platforms (e.g. Facebook, Twitter, Instagram). The social network data used in this research come from Huawei Company. This research uses four centrality measures and three set-similarity methods to analyze the data. As a result, this paper draws a conclusion on the binary question "Does it provide similar results or not?". Since companies and applications may have different standards and definitions of similarity, readers are encouraged to also check the similarity data provided in this paper.


Introduction
With the development of science and technology, social media (social networks) are becoming essential to people's daily lives. According to data from the Pew Research Center, about 81% of Americans use YouTube and 69% use Facebook, which indicates very high social media usage [1]. Social media not only affects people's lives but also brings challenges and opportunities to all walks of life. One of these is marketing strategy. When a salesperson wants to promote a product, he or she wants to reach the maximum number of people with the minimum amount of resources. It is therefore essential to find the influencers in a social network, since they can help with advertising by using their influence through the network. This marketing technique is called word-of-mouth (WOM).
This raises the problem of searching for influencers in a social network. Various methods and models can help with this search; they are introduced in the Literature Review section. Usually, centrality measures are used to find social media influencers.
The following four centrality measures are investigated in this paper:
• Degree Centrality
• Closeness Centrality
• Betweenness Centrality
• Eigenvector Centrality
Some researchers have compared and contrasted these four centrality measures in many respects, for example their performance, consistency and correlations. This paper, however, examines the similarity of influencers when a specific centrality measure is applied to different social media platforms. The significance of this research is that its results may save time for market analysts: if a centrality measure can be shown to produce similar influencers across different platforms, marketers can analyze just one platform instead of several, since all platforms would contain similar influencers. This research is quantitative.

Literature review
2.1. Importance of social network influencers
WOM marketing means that customers promote products to their friends through dialogue or conversation [2]. In other words, customers advertise products to other potential customers. According to an article in Harvard Business Review, the truly valuable customers are those who bring in new customers [3]. WOM is thus an important marketing technique, and marketers should therefore try to find the influencers in a social network. Social network influencers are the right individuals to start WOM marketing: since they have a large influence on a social network, they can advertise products more effectively and reach a wider range of potential customers. As one paper puts it: "To succeed today, you need to connect with people who are at the center of the conversation… Specifically, you should make sure you are reaching the decision makers who are influential in others' decisions. Influentials are well connected, they have ties to a significantly larger number of groups than the average American." [4].

How to find social network influencers
There are many methods to discover influencers in a social network. The following are some of them, described in [5]:
1. Centrality Measures (specific information is introduced below in section 2.3).
2. Link Topology Ranking Measures.
Some centrality measures (all except eigenvector centrality) neglect the influence of neighbouring nodes. For example, if a node (node a) connects to a highly influential node (node b), then the importance of node a should increase as well. Link topology ranking measures can solve this kind of problem. Popular algorithms in this category include the Hyperlink-Induced Topic Search (HITS) algorithm and the PageRank algorithm for web search. Another useful model for finding social influencers is the diffusion model [5]. This kind of model describes how information is transmitted among the nodes of a social network; for instance, it can model the spread of a virus in a pandemic or the spread of a message within a social network. This paper concentrates on the first of these approaches: centrality measures.

Centrality measurement
Many problems can be modelled as networks, such as human brains, city traffic and relationships among people. A typical network consists of two essential elements. The first is nodes: a node can represent a biological neuron in the human brain, a bus station in a city traffic network or an individual in a social network. The second is edges, which connect nodes: an edge can represent signal transmission between neurons, a road connecting two bus stations or a relationship between two individuals in a social network. Because network structures vary, each node has a different influence on the whole network [6]. A centrality measure quantifies the importance of a node in a network. The following four sections briefly introduce four types of centrality measures.

Degree centrality.
Degree Centrality is measured by counting the number of nodes connected to a specific node [7]. If the graph is directed, in-degree centrality and out-degree centrality can also be defined. The in-degree centrality of a node is the number of edges coming into that node, and its out-degree centrality is the number of edges going out of that node.
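Let $a_{kj} = 1$ if there is an edge between node $k$ and node $j$ (from $k$ to $j$ in a directed graph) and $a_{kj} = 0$ otherwise, and let $N$ be the number of nodes. If the graph is undirected, the Degree Centrality of node $k$ is:

$$C_D(k) = \sum_{j=1}^{N} a_{kj}$$

If the graph is directed, the in-degree centrality of node $k$ is:

$$C_D^{\text{in}}(k) = \sum_{j=1}^{N} a_{jk}$$

and the out-degree centrality of node $k$ is:

$$C_D^{\text{out}}(k) = \sum_{j=1}^{N} a_{kj}$$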
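The three degree formulas amount to row and column sums of the adjacency matrix. A minimal Python sketch, assuming the network is stored as a 0/1 adjacency matrix `a` (a list of rows) where `a[k][j] == 1` means an edge from node k to node j; the function names are illustrative, not taken from the study:

```python
from typing import List

# Assumed representation: a[k][j] == 1 if there is an edge from node k to node j.
Matrix = List[List[int]]


def degree_centrality(a: Matrix, k: int) -> int:
    """Undirected graph: number of nodes connected to node k (row sum of a symmetric matrix)."""
    return sum(a[k])


def in_degree_centrality(a: Matrix, k: int) -> int:
    """Directed graph: number of edges coming into node k (column sum)."""
    return sum(row[k] for row in a)


def out_degree_centrality(a: Matrix, k: int) -> int:
    """Directed graph: number of edges going out of node k (row sum)."""
    return sum(a[k])


# Directed toy graph with edges 0 -> 1, 0 -> 2 and 2 -> 1.
directed = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]
print(in_degree_centrality(directed, 1))   # 2
print(out_degree_centrality(directed, 0))  # 2
```

For an undirected graph the matrix is symmetric, so the in-degree, out-degree and degree of a node all coincide.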

Closeness centrality.
Closeness centrality is based on the sum of the distances from one node to all other nodes, where the distance between two nodes is the length of the shortest path between them. A node with a smaller sum of distances can communicate with other nodes more easily, since information does not have to travel far to reach them [7]. Therefore, a smaller sum of distances means higher closeness centrality. The Closeness Centrality of node $k$ is:

$$C_C(k) = \frac{1}{\sum_{j \neq k} d_{kj}}$$

where $d_{kj}$ is the shortest distance between node $k$ and node $j$.
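A minimal Python sketch of closeness centrality as the reciprocal of the sum of shortest distances (smaller sums give higher scores). It assumes an unweighted, connected, undirected graph stored as a dictionary from each node to its neighbours, and computes the distances $d_{kj}$ with breadth-first search; it applies no extra normalization (some definitions multiply by $N - 1$), and the function names are hypothetical:

```python
from collections import deque
from typing import Dict, List

# Assumed representation: node -> list of neighbouring nodes (undirected, unweighted).
Graph = Dict[int, List[int]]


def shortest_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(graph: Graph, k: int) -> float:
    """C_C(k) = 1 / sum_j d_kj; assumes the graph is connected."""
    total = sum(shortest_distances(graph, k).values())  # d_kk = 0 adds nothing
    return 1.0 / total if total > 0 else 0.0


# Path graph 0 - 1 - 2: the middle node has the smallest distance sum.
path = {0: [1], 1: [0, 2], 2: [1]}
print(closeness_centrality(path, 1))  # 0.5
print(closeness_centrality(path, 0))  # 0.3333333333333333
```

On a disconnected graph, this sketch would silently sum only over reachable nodes, which is why connectivity is assumed.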

Betweenness centrality
Betweenness centrality measures the importance of a node as a bridge [7]. If a graph is connected, a shortest path exists between any two nodes. The betweenness centrality of a node is calculated by checking how often the node acts as a bridge on the shortest paths between two other nodes. If a node is needed many times when other pairs of nodes communicate via shortest paths, it has higher betweenness centrality.
The Betweenness Centrality of node $k$ is:

$$C_B(k) = \sum_{s \neq k \neq t} \frac{\sigma_{st}(k)}{\sigma_{st}}$$

where $\sigma_{st}(k)$ is the number of shortest paths between node $s$ and node $t$ that pass through node $k$, and $\sigma_{st}$ is the number of shortest paths between node $s$ and node $t$.
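The paper does not state how betweenness scores were computed. The sketch below swaps in Brandes' algorithm, a standard way to evaluate the sum of $\sigma_{st}(k)/\sigma_{st}$ without enumerating paths explicitly. It assumes an unweighted, undirected graph stored as an adjacency-list dictionary, and halves the totals so that each unordered pair $\{s, t\}$ is counted once; the function name is hypothetical:

```python
from collections import deque
from typing import Dict, List

# Assumed representation: node -> list of neighbouring nodes (undirected, unweighted).
Graph = Dict[int, List[int]]


def betweenness_centrality(graph: Graph) -> Dict[int, float]:
    """Brandes' algorithm: C_B(k) = sum over pairs s, t of sigma_st(k) / sigma_st."""
    cb = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair {s, t} was counted from both endpoints, so halve.
    return {v: c / 2.0 for v, c in cb.items()}


# 4-cycle: every pair of opposite nodes has two shortest paths, one through each other node.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness_centrality(square))  # {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
```

The fractional scores in the example come from the $\sigma_{st}(k)/\sigma_{st}$ ratio: node 0 lies on one of the two shortest paths between nodes 1 and 3.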

Eigenvector centrality.
Eigenvector centrality is an improvement on degree centrality. Degree centrality only considers the number of neighbour nodes connected to a particular node, not the importance of those neighbours [8]. For example, suppose node $i$ is connected only to node $j$, and node $j$ is connected to 1000 other nodes. From the point of view of degree centrality, node $i$ has very low centrality. However, since node $i$ has a valuable neighbour, it should be considered an important node. Eigenvector centrality evaluates the importance of a node better in such cases. The Eigenvector Centrality of node $k$ is:

$$x_k = \frac{1}{\lambda} \sum_{j=1}^{N} a_{kj} x_j$$

where $x_j$ is the eigenvector centrality of node $j$, $a_{kj} = 1$ if nodes $k$ and $j$ are connected and $0$ otherwise, $N$ is the number of nodes, and $\lambda$ is a constant, conventionally the largest eigenvalue of the adjacency matrix.

The authors of [5] used centrality measures to select initial sets of individuals (influencers) and then monitored the level of message diffusion achieved with each centrality measure. They found that the performance of centrality measures depends on the structure of the network.
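Eigenvector centrality gives each node a score proportional to the sum of its neighbours' scores, and such scores are commonly found by power iteration. The paper does not say how its scores were computed, so the following is only a minimal sketch, assuming a connected, undirected graph stored as an adjacency-list dictionary. It multiplies by the adjacency matrix plus the identity ($A + I$) rather than $A$: this has the same eigenvectors but avoids the oscillation that plain power iteration shows on bipartite graphs such as stars. Function and parameter names are hypothetical:

```python
import math
from typing import Dict, List

# Assumed representation: node -> list of neighbouring nodes (connected, undirected).
Graph = Dict[int, List[int]]


def eigenvector_centrality(graph: Graph, max_iter: int = 1000, tol: float = 1e-12) -> Dict[int, float]:
    """Power iteration for x_k = (1/lambda) * sum_j a_kj x_j, using (A + I) for stability."""
    x = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbours' scores.
        nxt = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        # Rescale to unit Euclidean length so the scores stay bounded.
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        converged = max(abs(nxt[v] - x[v]) for v in graph) < tol
        x = nxt
        if converged:
            break
    return x


# Star graph: node 0 is linked to nodes 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = eigenvector_centrality(star)
print(round(scores[0], 4), round(scores[1], 4))  # 0.7071 0.4082
```

In the star, each leaf has degree 1, but its single neighbour is the hub, so its score is still substantial relative to the hub's. This is the effect the node $i$ / node $j$ example describes.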

Correlation analysis of centrality measures.
Koschützki and Schreiber investigated the correlations between different centrality measures on two biological networks, a protein-protein interaction (PPI) network and a transcriptional regulation (TR) network [9]. They found that the correlation between degree centrality and eigenvector centrality is higher than the correlations between the other centrality measures.

Research type
This is a quantitative study. The research question is: does each centrality measure produce similar influencers across different social media platforms, such as Facebook, Twitter and Instagram? Since this paper investigates several social media platforms, data about user relationships from each platform were required. The controlled variable was the set of individuals in the database: the databases representing different social media platforms had to contain the same group of people. For example, if $D_T$ and $D_F$ are the databases from two platforms, then $D_T$ and $D_F$ should contain identical individuals.

Data collection
An existing dataset from Kaggle was used.
The dataset was collected by crawling Huawei's pages on the social media platforms Facebook, Twitter and Instagram. The web crawler mainly focused on Facebook posts, Twitter tweets and Instagram posts. Huawei uses this dataset to strengthen its business position, since social media analytics helps the organization understand its target audience. The dataset was used to create the following three networks:
• $G_T$: the Twitter network
• $G_F$: the Facebook network
• $G_I$: the Instagram network
Let $M_T$, $M_F$ and $M_I$ be the 1000 × 1000 relationship matrices of the three platforms.

How to construct $G_T$:
1. Add 1000 nodes to $G_T$.
2. If $M_T[i][j] = 1$, person $i$ and person $j$ know each other, so an edge is added between node $i$ and node $j$ in $G_T$.

How to construct $G_F$:
1. Add 1000 nodes to $G_F$.
2. If $M_F[i][j] = 1$, person $i$ and person $j$ know each other, so an edge is added between node $i$ and node $j$ in $G_F$.

How to construct $G_I$:
1. Add 1000 nodes to $G_I$.
2. If $M_I[i][j] = 1$, person $i$ and person $j$ know each other, so an edge is added between node $i$ and node $j$ in $G_I$.
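The two construction steps, applied once per platform, can be sketched in Python as follows. This assumes each relationship matrix ($M_T$, $M_F$, $M_I$) has already been loaded as a list of 0/1 rows (loading the Kaggle data is not shown). Self-pairs ($i = j$) are skipped, which is an assumption because the text does not mention them. The function `build_graph` and the toy matrix are hypothetical:

```python
from typing import Dict, List, Set

Matrix = List[List[int]]
Graph = Dict[int, Set[int]]  # node -> set of neighbours (undirected)


def build_graph(m: Matrix) -> Graph:
    """Build an undirected graph from a 0/1 relationship matrix."""
    n = len(m)
    # Step 1: add one node per person (n = 1000 for each platform in this study).
    graph: Graph = {i: set() for i in range(n)}
    # Step 2: if m[i][j] == 1, persons i and j know each other, so link them.
    for i in range(n):
        for j in range(n):
            if i != j and m[i][j] == 1:
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Hypothetical 4-person matrix: 0 knows 1, 1 knows 2, and person 3 has no ties.
m_toy = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
g_toy = build_graph(m_toy)
print(g_toy)  # {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
```

Calling `build_graph` once per loaded matrix (for example on hypothetical variables `m_t`, `m_f`, `m_i`) would yield $G_T$, $G_F$ and $G_I$ over the same 1000 node ids, which preserves the controlled variable of identical individuals.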

Calculate centrality measures and find influencers.
All four centrality measures were calculated for every node in each social network. Degree centrality illustrates the procedure:
1. Calculate the degree centrality of each node in the three social networks.
2. For each social network, select the 100 nodes with the highest degree centrality values.
This yields three sets of nodes, named $S_{DT}$, $S_{DF}$ and $S_{DI}$. The first letter of the subscript denotes the centrality measure (D for degree centrality) and the second letter denotes the social media platform. Repeating the two steps for the other three centrality measures produces the following sets:
• Degree Centrality: $S_{DT}$, $S_{DF}$, $S_{DI}$
• Closeness Centrality: $S_{CT}$, $S_{CF}$, $S_{CI}$
• Betweenness Centrality: $S_{BT}$, $S_{BF}$, $S_{BI}$
• Eigenvector Centrality: $S_{ET}$, $S_{EF}$, $S_{EI}$
In step 2, the experiment treated the top 100 nodes as influencers. The analysis was also repeated with the top 200 and top 300 nodes as influencers.
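Step 2 and the naming scheme can be sketched in Python as below. This assumes each platform's scores for one measure are held in a dictionary from node id to centrality value. The tie-breaking rule (smaller node id first) is an assumption, since the method does not say how ties at the cut-off were handled. The helper names are hypothetical:

```python
from typing import Dict, Set, Tuple


def top_k_influencers(scores: Dict[int, float], k: int) -> Set[int]:
    """Return the k nodes with the highest centrality values.

    Ties are broken in favour of the smaller node id; this rule is an assumption.
    """
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:k])


def influencer_sets(
    centrality: Dict[Tuple[str, str], Dict[int, float]], k: int
) -> Dict[Tuple[str, str], Set[int]]:
    """Map (measure, platform) keys, e.g. ("D", "T") for S_DT, to top-k influencer sets."""
    return {key: top_k_influencers(scores, k) for key, scores in centrality.items()}


# Toy degree scores for one platform; nodes 2 and 3 tie at 0.5.
degree_t = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.5}
print(top_k_influencers(degree_t, 2))  # {1, 2}
```

In the study's setting, `influencer_sets` would be called with k = 100, 200 and 300 on a hypothetical dictionary holding the twelve score tables (four measures × three platforms), giving the twelve sets listed above for each cut-off.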

Calculate similarities of influencers.
For each centrality measure, the similarity of influencers across the social media platforms was calculated. For example, for degree centrality, similarities were calculated for the pairs ($S_{DT}$, $S_{DF}$), ($S_{DT}$, $S_{DI}$) and ($S_{DF}$, $S_{DI}$). The process was then repeated for the other three centrality measures. The following three methods were used to calculate the similarity between two sets:
• Common elements method: for two sets $S_1$ and $S_2$ with $|S_1| = |S_2|$, count the number of elements that belong to both sets, $|S_1 \cap S_2|$. More common elements means that the two sets are more similar.
• Jaccard Index: for two sets $A$ and $B$, the Jaccard Index is:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
• Sørensen-Dice coefficient: for two sets $A$ and $B$, the Sørensen-Dice coefficient is:
$$DSC(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$
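The three similarity methods map directly onto Python set operations. In this sketch, `pairwise_similarity` compares every pair of platforms for one centrality measure. The sets must be non-empty, and the node sets in the example are hypothetical, chosen so that two top-100 sets share 10 nodes:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def common_elements(s1: Set[int], s2: Set[int]) -> int:
    """Number of elements in both sets, |S1 ∩ S2|."""
    return len(s1 & s2)


def jaccard_index(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B| (assumes A ∪ B is non-empty)."""
    return len(a & b) / len(a | b)


def sorensen_dice(a: Set[int], b: Set[int]) -> float:
    """2 |A ∩ B| / (|A| + |B|) (assumes the sets are non-empty)."""
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_similarity(
    sets_by_platform: Dict[str, Set[int]]
) -> Dict[Tuple[str, str], Tuple[int, float, float]]:
    """For one centrality measure, compare every pair of platforms with all three methods."""
    results = {}
    for p, q in combinations(sorted(sets_by_platform), 2):
        a, b = sets_by_platform[p], sets_by_platform[q]
        results[(p, q)] = (common_elements(a, b), jaccard_index(a, b), sorensen_dice(a, b))
    return results


s_bt = set(range(100))      # hypothetical top-100 set
s_bi = set(range(90, 190))  # hypothetical set sharing 10 nodes with s_bt
print(common_elements(s_bt, s_bi))          # 10
print(round(jaccard_index(s_bt, s_bi), 4))  # 0.0526
print(sorensen_dice(s_bt, s_bi))            # 0.1
```

For sets of equal size, all three methods rank pairs in the same order because each is an increasing function of the overlap. They differ only in scale, which is why the Jaccard Index is always the smallest of the two ratios.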

Methodology justification
The experiment also selected the top 200 and top 300 nodes with the highest centrality values for the similarity analysis. The experiment was designed this way because companies and organizations may have different standards for what counts as a social media influencer, so reporting several cut-offs gives them more data to evaluate. In each table, three pairwise comparisons were made among the three social media platforms.

Table 1 presents similarity data obtained by choosing the top 100 nodes with the highest Degree centrality values as influencers.

Table 3 presents similarity data obtained by choosing the top 300 nodes with the highest Degree centrality values as influencers.

Table 4 presents similarity data obtained by choosing the top 100 nodes with the highest Betweenness centrality values as influencers.

Table 4. Similarity data of top 100 influencers using Betweenness centrality.

| Betweenness centrality | # of common elements | Jaccard Index | Sørensen-Dice coefficient |
|---|---|---|---|
| $S_{BT}$ vs $S_{BI}$ | 10 | 5.26% | 10% |

Table 5 presents similarity data obtained by choosing the top 200 nodes with the highest Betweenness centrality values as influencers.

Table 6 presents similarity data obtained by choosing the top 300 nodes with the highest Betweenness centrality values as influencers.

Table 7 presents similarity data obtained by choosing the top 100 nodes with the highest Closeness centrality values as influencers.

Table 9 presents similarity data obtained by choosing the top 300 nodes with the highest Closeness centrality values as influencers.
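As a worked check of the $S_{BT}$ vs $S_{BI}$ row of Table 4: both sets contain 100 nodes and share 10, so $|S_{BT} \cup S_{BI}| = 100 + 100 - 10 = 190$, and the two indices follow from the common-element count:

$$
J(S_{BT}, S_{BI}) = \frac{10}{190} \approx 5.26\%, \qquad
DSC(S_{BT}, S_{BI}) = \frac{2 \times 10}{100 + 100} = 10\%
$$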

Discussion
The results indicate two key findings. First, no centrality measure produces similar influencers across different social media platforms: no matter how many individuals were chosen as influencers, the Jaccard indices were consistently below 20%. Second, as the number of selected influencers increases, the similarity also increases, as the data in section 4 show. Whereas other researchers have studied the performance, consistency and correlation of centrality measures, this research provides data on the similarity of influencers when a given centrality measure is applied to different social media platforms.
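As a hedged reference point for the second finding, consider a baseline that assumes random selection (this is not a result of the experiment). Suppose two platforms' top-$k$ sets were drawn independently and uniformly at random from the same $N = 1000$ people. Each member of one set would then fall in the other with probability $k/N$, so the expected number of common elements would be:

$$
\mathbb{E}\left[|A \cap B|\right] = k \cdot \frac{k}{N} = \frac{k^2}{N}, \qquad k = 100,\ 200,\ 300 \;\Rightarrow\; 10,\ 40,\ 90
$$

Substituting these expected overlaps into $J = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$ gives roughly $\frac{10}{190} \approx 5.3\%$, $\frac{40}{360} \approx 11.1\%$ and $\frac{90}{510} \approx 17.6\%$. Overlap therefore tends to grow with $k$ even without any shared influencer structure. Comparing the observed indices against these values may help judge how strongly the platforms actually agree.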
The experimental results showed that none of the four centrality measures produces similar influencers across different social media platforms. There are two possible reasons. First, people may have different preferences for different social media; for example, an individual who spends more time on Facebook than on Twitter may exert more influence on Facebook. Second, the dataset may not be large enough: it only contains relationships among 1000 people, and the results may differ if the same methodology is applied to a larger dataset.
For marketers, this research indicates that it is essential to analyze each social media platform, since different platforms do not contain similar influencers. However, as mentioned before, companies and organizations may have different definitions of similarity, so it is also recommended to check the results presented in the previous section.
For future researchers, a first improvement would be to collect more social media data and apply the same methodology to the larger dataset. In addition, there are other approaches to finding social media influencers, such as link topology ranking measures and diffusion models. Future researchers can apply the same methodology to those approaches to check whether they generate similar influencers across different social media platforms.

Conclusion
This research provides experimental results on the similarity of the influencers identified by four centrality measures and offers constructive advice for marketers. Based on the experimental results, none of the four centrality measures (degree, closeness, betweenness and eigenvector centrality) generates similar influencers across different social media platforms. The significance of this research is to show marketers that different social media platforms need to be analyzed individually, since they contain different influencers.