Social Analytic Engine

KEG, Tsinghua University


Social Analytic Engine (SAE) is a novel engine for mining dynamic networks. Unlike traditional social network analysis projects, which focus on static networks, SAE is designed specifically for dynamic networks. Its analysis results can benefit many social network applications. We empirically study and verify the engine on a variety of real social networks, and the experimental results demonstrate the effectiveness of its core technologies.

Random Factor Graph Model: it seamlessly incorporates the analysis results produced by the other components to model and predict dynamics in the social network. On several typical social network prediction tasks, the proposed model achieves much better prediction performance than traditional methods. It also scales well.
Distributed Topic Model: we implement a distributed LDA (Latent Dirichlet Allocation) to model the content of the given network by extracting the hidden topics of its corpus.

Social Influence: it aims to quantify the strength of influence between users from different angles (topics) in a large social network.
Structural Hole Detection: it recognizes structural hole spanners who control the information flow in the social network.
Social Tie Mining: it aims to reveal the fundamental factors behind the formation of different types of social relationships.

Macro Characterization: it computes macro-level properties of the given network, such as density, diameter, degree distribution, and community partition.
Micro Characterization: it computes micro-level properties of the given network for specific nodes, such as centrality, homophily, reciprocity, prestige, and reachability.
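
To make a few of the properties listed above concrete, here is a minimal sketch in plain Python. It is not SAE's implementation: the function names, the edge-list representation, and the choice of a directed graph without self-loops are assumptions made for illustration. It covers density and degree distribution (macro) plus reciprocity and in-degree centrality (the last often used as a simple prestige measure).

```python
from collections import Counter


def density(nodes, edges):
    """Directed density: observed edges over the n * (n - 1) possible ones."""
    n = len(nodes)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0


def out_degree_distribution(nodes, edges):
    """Map each out-degree value to the number of nodes that have it."""
    out_deg = Counter(src for src, _ in set(edges))
    return Counter(out_deg.get(v, 0) for v in nodes)


def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def in_degree_centrality(nodes, edges):
    """In-degree of each node, normalized by the n - 1 possible senders."""
    n = len(nodes)
    in_deg = Counter(dst for _, dst in set(edges))
    return {v: (in_deg.get(v, 0) / (n - 1) if n > 1 else 0.0) for v in nodes}


# Hypothetical toy network: a <-> b, a -> c, c -> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]

print(density(nodes, edges))                        # 4 of 12 possible edges
print(dict(out_degree_distribution(nodes, edges)))  # one node with 2, two with 1, one with 0
print(reciprocity(edges))                           # 2 of 4 edges are reciprocated
print(in_degree_centrality(nodes, edges))           # every node has in-degree 1
```

Properties such as diameter, community partition, and homophily need shortest-path search, clustering, or node attributes, so they are left out of this sketch.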

Dynamic Network Storage: to store and manipulate dynamic networks efficiently, it stores only the changes between the networks at adjacent time stamps. We use memory mapping to automatically load the network data needed for a computation into memory and to swap the least frequently used data out to disk.
Distributed Computation Engine: it provides a distributed platform to support computation scheduling on different machines and fast message passing between machines.
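
The change-only storage idea described under Dynamic Network Storage can be sketched as follows. This is an illustrative toy, not SAE's storage layer: the class name `DeltaNetworkStore`, its methods, and the in-memory edge sets are all assumptions, and the memory-mapping and disk-swapping parts are not modeled. It keeps one base edge set plus, for each later time stamp, only the edges added and removed since the previous one.

```python
class DeltaNetworkStore:
    """Toy delta store: a base snapshot plus per-time-stamp edge changes."""

    def __init__(self, initial_edges=()):
        self._base = frozenset(initial_edges)
        self._deltas = []  # one (added, removed) pair per later time stamp

    def append_snapshot(self, edges):
        """Record the next time stamp by storing only its difference
        from the most recent snapshot."""
        current = self.snapshot(len(self._deltas))
        new = frozenset(edges)
        self._deltas.append((new - current, current - new))

    def snapshot(self, t):
        """Rebuild the full edge set at time stamp t by replaying deltas."""
        if not 0 <= t <= len(self._deltas):
            raise IndexError(f"time stamp {t} out of range")
        edges = set(self._base)
        for added, removed in self._deltas[:t]:
            edges -= removed
            edges |= added
        return frozenset(edges)

    def stored_edge_records(self):
        """Number of edge records actually kept (base plus all deltas)."""
        return len(self._base) + sum(len(a) + len(r) for a, r in self._deltas)


# Hypothetical three-step evolution of a small network.
store = DeltaNetworkStore([(1, 2), (2, 3)])
store.append_snapshot([(1, 2), (2, 3), (3, 4)])  # edge (3, 4) appears
store.append_snapshot([(1, 2), (3, 4)])          # edge (2, 3) disappears

print(sorted(store.snapshot(0)))
print(sorted(store.snapshot(2)))
print(store.stored_edge_records())  # 4 records, versus 7 for three full snapshots
```

Replaying deltas makes reading a late snapshot slower than reading an early one. A real system would likely add periodic full checkpoints to bound that cost, though the source does not say how SAE handles this.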

Coauthor: it contains 1,629,217 Users, 2,623,832 Relationships and 2,174,141 Messages.
Twitter: it contains 112,044 Users, 468,238 Relationships and 2,409,768 Messages.
Weibo: it contains 1,787,443 Users, 423,347,905 Relationships and 1,038,775,431 Messages.
Slashdot: it contains 93,133 Users, 964,562 Relationships and 8,714,700 Messages.
Email: it contains 151 Users, 3,572 Relationships and 136,329 Messages.
Mobile: it contains 106 Users, 5,436 Relationships and 16,807 Messages.