This book addresses the challenges of social network and social media analysis in terms of prediction and inference. The chapters collected here tackle these issues by proposing new analysis methods and by examining mining methods for the vast amount of social content produced. Social Networks (SNs) have become an integral part of our lives; they are used for leisure, business, government, medical, and educational purposes and have attracted billions of users. The challenges that stem from this wide adoption of SNs are vast. These include generating realistic social network topologies, awareness of user activities, topic and trend generation, estimation of user attributes from their social content, and behavior detection. This text has applications to widely used platforms such as Twitter and Facebook and appeals to students, researchers, and professionals in the field.
Prediction and Inference from Social Networks and Social Media (Lecture Notes in Social Networks)
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory.[1] It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks,[2][3] meme spread,[4] information circulation,[5] friendship and acquaintance networks, peer learner networks,[6][7][8] business networks, knowledge networks,[9][10] difficult working relationships,[11] collaboration graphs, kinship, disease transmission, and sexual relationships.[12][13] These networks are often visualized through sociograms in which nodes are represented as points and ties are represented as lines. These visualizations provide a means of qualitatively assessing networks by varying the visual representation of their nodes and edges to reflect attributes of interest.[14]
Social network analysis has its theoretical roots in the work of early sociologists such as Georg Simmel and Émile Durkheim, who wrote about the importance of studying patterns of relationships that connect social actors. Social scientists have used the concept of "social networks" since early in the 20th century to connote complex sets of relationships between members of social systems at all scales, from interpersonal to international.[26]
Visual representation of social networks is important for understanding network data and conveying the results of the analysis.[49] Numerous methods of visualization for data produced by social network analysis have been presented.[50][51][52] Many analytic software packages include modules for network visualization. The data are explored by displaying nodes and ties in various layouts and by assigning colors, sizes, and other advanced properties to nodes. Visual representations of networks may be a powerful method for conveying complex information, but care should be taken in interpreting node and graph properties from visual displays alone, as they may misrepresent structural properties better captured through quantitative analyses.[53]
Signed graphs can be used to illustrate good and bad relationships between humans. A positive edge between two nodes denotes a positive relationship (friendship, alliance, dating) and a negative edge between two nodes denotes a negative relationship (hatred, anger). Signed social network graphs can be used to predict the future evolution of the graph. In signed social networks, there is the concept of "balanced" and "unbalanced" cycles. A balanced cycle is defined as a cycle where the product of all the signs is positive. According to balance theory, balanced graphs represent a group of people who are unlikely to change their opinions of the other people in the group. Unbalanced graphs represent a group of people who are very likely to change their opinions of the people in their group. For example, a group of three people (A, B, and C) where A and B have a positive relationship, B and C have a positive relationship, but C and A have a negative relationship is an unbalanced cycle. This group is very likely to morph into a balanced cycle, such as one where B only has a good relationship with A, and both A and B have a negative relationship with C. By using the concept of balanced and unbalanced cycles, the evolution of signed social network graphs can be predicted.[54]
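The balance rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_balanced_cycle` and the encoding of edge signs as +1 and -1 are assumptions made here, not part of the cited work.

```python
from math import prod


def is_balanced_cycle(signs):
    """Return True when the product of the edge signs (+1 or -1) is positive."""
    if any(s not in (1, -1) for s in signs):
        raise ValueError("each sign must be +1 or -1")
    return prod(signs) > 0


# Edge signs listed in the order A-B, B-C, C-A, following the example in the text.
unbalanced_triad = [+1, +1, -1]  # A-B positive, B-C positive, C-A negative
balanced_triad = [+1, -1, -1]    # A-B positive, B-C negative, C-A negative

print(is_balanced_cycle(unbalanced_triad))  # False
print(is_balanced_cycle(balanced_triad))    # True
```

The same check applies to cycles of any length, since only the parity of the number of negative edges matters.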
Variables used to calculate an individual's social networking potential (SNP) include but are not limited to: participation in social networking activities, group memberships, leadership roles, recognition, publication/editing/contributing to non-electronic media, publication/editing/contributing to electronic media (websites, blogs), and frequency of past distribution of information within their network. The acronym "SNP" and some of the first algorithms developed to quantify an individual's social networking potential were described in the white paper "Advertising Research is Changing" (Gerstley, 2003). See also viral marketing.[58]
Social network analysis is used extensively in a wide range of applications and disciplines. Some common network analysis applications include data aggregation and mining, network propagation modeling, network modeling and sampling, user attribute and behavior analysis, community-maintained resource support, location-based interaction analysis, social sharing and filtering, recommender systems development, and link prediction and entity resolution.[61] In the private sector, businesses use social network analysis to support activities such as customer interaction and analysis, information system development analysis,[62] marketing, and business intelligence needs (see social media analytics). Some public sector uses include development of leader engagement strategies, analysis of individual and group engagement and media use, and community-based problem solving.
Social network analysis is also used in intelligence, counter-intelligence and law enforcement activities. This technique allows analysts to map covert organizations such as an espionage ring, an organized crime family or a street gang. The National Security Agency (NSA) uses its electronic surveillance programs to generate the data needed to perform this type of analysis on terrorist cells and other networks deemed relevant to national security. The NSA looks up to three nodes deep during this network analysis.[63] After the initial mapping of the social network is complete, analysis is performed to determine the structure of the network and determine, for example, the leaders within the network.[64] This allows military or law enforcement assets to launch capture-or-kill decapitation attacks on the high-value targets in leadership positions to disrupt the functioning of the network. The NSA has been performing social network analysis on call detail records (CDRs), also known as metadata, since shortly after the September 11 attacks.[65][66]
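One way to read "three nodes deep" is as a breadth-first search that stops three hops from a starting node. The sketch below illustrates that reading only; the function name `within_hops`, the adjacency-dictionary layout, and the toy contact chain are assumptions made here and do not describe any agency's actual tooling.

```python
from collections import deque


def within_hops(adjacency, start, max_hops=3):
    """Map every node reachable within max_hops of start to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical chain of contacts: a - b - c - d - e
contacts = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

print(within_hops(contacts, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Node `e` is four hops from `a`, so it falls outside the three-hop limit.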
Large textual corpora can be turned into networks and then analysed with the methods of social network analysis. In these networks, the nodes are social actors, and the links are actions. The extraction of these networks can be automated by using parsers. The resulting networks, which can contain thousands of nodes, are then analysed by using tools from network theory to identify the key actors, the key communities or parties, and general properties such as robustness or structural stability of the overall network, or centrality of certain nodes.[67] This automates the approach introduced by Quantitative Narrative Analysis,[68] whereby subject-verb-object triplets are identified with pairs of actors linked by an action, or pairs formed by actor-object.[69]
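As a rough sketch of the step after parsing, the snippet below turns subject-verb-object triplets into a weighted edge list and ranks actors by how many actions they take part in. The triplets are made up for illustration, the helper names `build_actor_network` and `weighted_degree` are hypothetical, and the parsing step itself is assumed to have happened elsewhere.

```python
from collections import Counter


def build_actor_network(triplets):
    """Count how often each (subject, verb, object) action occurs."""
    edges = Counter()
    for subject, verb, obj in triplets:
        edges[(subject, verb, obj)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights touching each actor, as subject or object."""
    degree = Counter()
    for (subject, _verb, obj), weight in edges.items():
        degree[subject] += weight
        degree[obj] += weight
    return degree


# Hypothetical triplets, standing in for parser output.
triplets = [
    ("Alice", "supports", "Bob"),
    ("Bob", "opposes", "Carol"),
    ("Alice", "supports", "Bob"),
    ("Carol", "meets", "Alice"),
    ("Alice", "meets", "Dave"),
]

edges = build_actor_network(triplets)
degree = weighted_degree(edges)
print(degree.most_common(1))  # [('Alice', 4)]
```

Weighted degree is only the simplest notion of a "key actor"; the centrality and community measures mentioned in the text would be computed on the same edge list.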
Another concept that has emerged from this connection between social network theory and the Internet is netocracy: several authors have studied the correlation between the extended use of online social networks and changes in social power dynamics.[72]
Social network analysis has been applied to social media as a tool to understand behavior between individuals or organizations through their linkages on social media websites such as Twitter and Facebook.[73]
In the economics community, a plausible and widely supported belief is that actors strategically choose their relations to optimize their network positions in an incentive-guided fashion [16]. Similarly to stochastic actor-oriented models (SAOM), strategic network formation models assume that actors aim at maximizing payoff functions that depend on their position in the network and on the topology. The objective is to explain why certain network architectures emerge when actors strive for centrality while links are costly. The literature on this topic is broad [17,18,19,20,21,22,23,24,25,26,27] (see refs. 28,29,30,31 for extensive surveys), yet there is no common agreement on the specific centrality metrics [32]. Among the seminal works on strategic network formation, Bala and Goyal [18] use degree centrality, while the connections model introduced by Jackson and Wolinsky [17] is related to closeness centrality, as shown in ref. 22. Others [19,20,21] propose models where actors strive for structural holes, which are missing connections between certain pairs of agents and thus brokerage opportunities. Burt showed that his brokerage constraint measure, defined in ref. 33, is tightly related to betweenness centrality [34]. According to Coleman [35], triangulated structures provide cohesive support to the agents. Davis [36] also showed empirically that transitivity, often termed network (or triadic) closure or clustering [15,37], is a prevalent effect in many human social networks as the result of social selection based on, e.g., homophily [38].
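To make two of the centrality notions named above concrete, here is a minimal sketch of degree and closeness centrality on a small undirected graph. It uses the common textbook normalizations (degree divided by n - 1, and n - 1 divided by the sum of shortest-path distances) and assumes a connected graph; it is not the payoff function of any of the cited models, and the star-shaped example graph is invented for illustration.

```python
from collections import deque


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) divided by the sum of shortest-path distances from each node.

    Assumes an undirected, connected graph.
    """
    n = len(adjacency)
    closeness = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        closeness[source] = (n - 1) / total if total else 0.0
    return closeness


# Hypothetical star network: one hub linked to three peripheral actors.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}

print(degree_centrality(star)["hub"])     # 1.0
print(closeness_centrality(star)["hub"])  # 1.0
print(closeness_centrality(star)["a"])    # 0.6
```

In the star, the hub maximizes both measures, which is the kind of position an actor striving for centrality would seek if links were free.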
As discussed in the introduction, agents may privilege social support. Formalized in the network setting, agents benefit from being surrounded by closed triads, in other words when a friend of a friend is a friend [43]. In graph theory, the mean probability that two nodes that are network neighbors of the same other node will themselves be neighbors is referred to as the clustering coefficient [5]. Although it might be hard for agents to compute such a probability, they can estimate, for each friend \(k\), the number of common friends \(l\). Similarly to the approach in ref. 23, we define the clustering of agent \(i\) as