x

## 留言板

Link prediction in microblog retweet network based on maximum entropy model

## Link prediction in microblog retweet network based on maximum entropy model

Li Yong-Jun, Yin Chao, Yu Hui, Liu Zun
PDF

(PLEASE TRANSLATE TO ENGLISH

BY GOOGLE TRANSLATE IF NEEDED.)

• #### Abstract

Microblog is a social media platform, based on the follower-followee relationship, that enables users to share real-time information, by which the information propagation is characterized as rapid, explosive, and immediate. The research on the information propagation and retweet prediction is very important for public sentiment analysis and product promotion. A majority of existing works adopt several traditional prediction methods to predict the future information retweet based on the features extracted from existing retweet behaviors, which are hard to reconcile accuracy, complexity, robustness and feature extensiveness. To overcome the above mentioned shortcomings in existing works, we propose in this paper a link prediction algorithm based on maximum entropy model to predict retweet behavior on microblog. In our proposed approach, firstly we abstract the retweet prediction problem to a link prediction problem. Then we analyze the retweet behaviors on microblog and determine the factors influencing the retweet behavior. We extract the features from the retweet behaviors based on these factors in the next step. Now based on these features, the retweet behavior could be predicted by the proposed approach. However, information redundancy and other issues may exist among these features. These issues will cause an increase in computational complexity or a decrease in computational accuracy. To solve the above problems, we selecte the features dominating the retweet behavior with feature selection methods such as Information Gain, IG-CHI. The proposed model requires no further independent assumption in features or intrinsic constraints, and omits the processing in relation to features, which is usually the prerequisite of other prediction methods. We take the Sina Weibo retweet records in a time span from 2009 to 2012 as an example to test the effectiveness and efficiency of our link prediction algorithm. Results show that: 1) the proposed algorithm has incomparable advantages in running time; 2) as for the predicted result, the proposed algorithm is better than other algorithms in performance evaluations; 3) the proposed algorithm runs stably for different sizes of training sets and feature sets; 4) the accuracy of the predicted results remains stable based on the selected features. The proposed approach avoids the independent restriction among features and shows better accuracy than other similar methods, thus it has reference values for resolving other prediction problems in complex networks.

#### Authors and contacts

###### Corresponding author: Li Yong-Jun, lyj@nwpu.edu.cn
• Funds: Project supported by the Shaanxi Provincial Natural Science Foundation, China (Grant Nos. 2014JM2-6104, 2015JM6290).

 [1] Watts D J, Strogatz S H 1998 Nature 393 440 [2] Barabsi A L, Albert R 1999 Science 286 509 [3] Pastor S R, Vespignani A 2001 Phys. Rev. Lett. 86 3200 [4] Wu T F, Zhou C L, Wang X H, Huang X X, Chen Z Q, Wang R B 2014 Acta Phys. Sin. 63 240501 (in Chinese) [吴腾飞, 周昌乐, 王小华, 黄孝喜, 谌志群, 王荣波 2014 物理学报 63 240501] [5] Wang J L, Liu F A, Zhu Z F 2015 Acta Phys. Sin. 64 050501 (in Chinese) [王金龙, 刘方爱, 朱振方 2015 物理学报 64 050501] [6] Wang Y Z, Zheng B H 2014 Proceedings of 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Beijing, China, Aug. 17-20, 2014, p285 [7] Zhao X Q, Tajima K 2014 Proceedings of 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies Warsaw, Poland, Aug. 11-14, 2014, p282 [8] Ding H Y, Wu J 2015 Proceedings of 2015 IEEE International Conference on Multimedia Big Data Beijing, China, Apr. 20-22, 2015, p56 [9] Luo Z L, Wang Y, Wu X T 2012 Proceedings of the 13th International Conference on Web Information System Engineering Paphos, Cyprus, Nov. 28-30, 2012, p777 [10] Yang Z, Guo J Y, Cai K K, Tang J, Li J Z, Zhang L, Su Z 2010 Proceedings of the 19th ACM conference on information and knowledge management Toronto, Canada, Oct. 26-30, 2010, p1633 [11] Peng H K, Zhu J, Piao D Z, Yan R, Zhang Y 2011 Proceedings of IEEE 11th International Conference on Data Mining Workshops Vancouver, Canada, Dec. 11, 2011, p336 [12] Zhao H D, Liu G, Shi C, Wu B 2014 Proceedings of 2014 IEEE International Conference on Data Mining Workshop Shenzhen, China, Dec. 14, 2014, p952 [13] Hou W, Huang Y, Zhang K 2015 Proceedings of IEEE 14th International Conference on Cognitive Informatics Cognitive Computing Beijing, China, Jul. 6-8, 2015, p255 [14] Huang D X, Zhou J, Mu D J, Yang F S 2014 Proceedings of 7th International Symposium on Computational Intelligence and Design Hangzhou, China, Dec. 13-14, 2014, p30 [15] Wu Y, Hu Y, He X H, Deng K 2014 Chin. Phys. B 23 060101 [16] Wang F, Wang H Y, Xu K 2012 Proceedings of IEEE ICDCS Workshop on Peer-to-Peer Computing and Online Social Networking Macau, China, Jun. 18-21 2012, p133 [17] Zhang L M, Pei J, Jia Y, Zhou B, Wang X 2014 Proceedings of 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Beijing, China, Aug. 17-20, 2014, p208 [18] Wu Z X, Liao J X, Zhang L J 2013 Proceedings of 5m th IEEE Conference on Broadband Network $Multimedia Technology Guilin, China, Nov. 17-19, 2013, p119 [19] Suh B, Hong L C, Pirolli P, Chi E D H 2010 Proceedings of The 2010 IEEE International Conference on Privacy, Security, Risk and Trust Minneapolis, USA, Aug. 20-22, 2010, p177 [20] Xu Z H, Yang Q 2012 Proceedings of 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Istanbul, Turkey, Aug. 26-29, 2012, p46 [21] Leicht E A, Holme P, Newman M E J 2006 Phys. Rev. E 73 026120 [22] Liben-Nowell D, Kleinberg J 2007 J. Am. Soc. Inf. Sci. Tec. 58 1019 [23] L L Y, Zhou T 2011 Physica A 390 1150 [24] Bai M, Hu K, Tang Y 2011 Chin. Phys. B 20 128902 [25] Brin S, Page L 1998 Comput. Netw. ISDN Syst. 30 107 [26] Lorrain F, White H C 1971 J. Math. Soc. 1 49 [27] Liu W P, L L Y 2010 Europhys. Lett. 89 58007 [28] Berger A L, Pietra S, Pietra V 1996 Comput. Linguist. 22 39 [29] Byrd R H, Nocedal J, Schnabel R B 1994 Math. Program.: Series A and B 63 4 #### Cited By •  [1] Watts D J, Strogatz S H 1998 Nature 393 440 [2] Barabsi A L, Albert R 1999 Science 286 509 [3] Pastor S R, Vespignani A 2001 Phys. Rev. Lett. 86 3200 [4] Wu T F, Zhou C L, Wang X H, Huang X X, Chen Z Q, Wang R B 2014 Acta Phys. Sin. 63 240501 (in Chinese) [吴腾飞, 周昌乐, 王小华, 黄孝喜, 谌志群, 王荣波 2014 物理学报 63 240501] [5] Wang J L, Liu F A, Zhu Z F 2015 Acta Phys. Sin. 64 050501 (in Chinese) [王金龙, 刘方爱, 朱振方 2015 物理学报 64 050501] [6] Wang Y Z, Zheng B H 2014 Proceedings of 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Beijing, China, Aug. 17-20, 2014, p285 [7] Zhao X Q, Tajima K 2014 Proceedings of 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies Warsaw, Poland, Aug. 11-14, 2014, p282 [8] Ding H Y, Wu J 2015 Proceedings of 2015 IEEE International Conference on Multimedia Big Data Beijing, China, Apr. 20-22, 2015, p56 [9] Luo Z L, Wang Y, Wu X T 2012 Proceedings of the 13th International Conference on Web Information System Engineering Paphos, Cyprus, Nov. 28-30, 2012, p777 [10] Yang Z, Guo J Y, Cai K K, Tang J, Li J Z, Zhang L, Su Z 2010 Proceedings of the 19th ACM conference on information and knowledge management Toronto, Canada, Oct. 26-30, 2010, p1633 [11] Peng H K, Zhu J, Piao D Z, Yan R, Zhang Y 2011 Proceedings of IEEE 11th International Conference on Data Mining Workshops Vancouver, Canada, Dec. 11, 2011, p336 [12] Zhao H D, Liu G, Shi C, Wu B 2014 Proceedings of 2014 IEEE International Conference on Data Mining Workshop Shenzhen, China, Dec. 14, 2014, p952 [13] Hou W, Huang Y, Zhang K 2015 Proceedings of IEEE 14th International Conference on Cognitive Informatics Cognitive Computing Beijing, China, Jul. 6-8, 2015, p255 [14] Huang D X, Zhou J, Mu D J, Yang F S 2014 Proceedings of 7th International Symposium on Computational Intelligence and Design Hangzhou, China, Dec. 13-14, 2014, p30 [15] Wu Y, Hu Y, He X H, Deng K 2014 Chin. Phys. B 23 060101 [16] Wang F, Wang H Y, Xu K 2012 Proceedings of IEEE ICDCS Workshop on Peer-to-Peer Computing and Online Social Networking Macau, China, Jun. 18-21 2012, p133 [17] Zhang L M, Pei J, Jia Y, Zhou B, Wang X 2014 Proceedings of 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Beijing, China, Aug. 17-20, 2014, p208 [18] Wu Z X, Liao J X, Zhang L J 2013 Proceedings of 5m th IEEE Conference on Broadband Network$ Multimedia Technology Guilin, China, Nov. 17-19, 2013, p119 [19] Suh B, Hong L C, Pirolli P, Chi E D H 2010 Proceedings of The 2010 IEEE International Conference on Privacy, Security, Risk and Trust Minneapolis, USA, Aug. 20-22, 2010, p177 [20] Xu Z H, Yang Q 2012 Proceedings of 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining Istanbul, Turkey, Aug. 26-29, 2012, p46 [21] Leicht E A, Holme P, Newman M E J 2006 Phys. Rev. E 73 026120 [22] Liben-Nowell D, Kleinberg J 2007 J. Am. Soc. Inf. Sci. Tec. 58 1019 [23] L L Y, Zhou T 2011 Physica A 390 1150 [24] Bai M, Hu K, Tang Y 2011 Chin. Phys. B 20 128902 [25] Brin S, Page L 1998 Comput. Netw. ISDN Syst. 30 107 [26] Lorrain F, White H C 1971 J. Math. Soc. 1 49 [27] Liu W P, L L Y 2010 Europhys. Lett. 89 58007 [28] Berger A L, Pietra S, Pietra V 1996 Comput. Linguist. 22 39 [29] Byrd R H, Nocedal J, Schnabel R B 1994 Math. Program.: Series A and B 63 4
•  [1] Wu Teng-Fei, Zhou Chang-Le, Wang Xiao-Hua, Huang Xiao-Xi, Chen Zhi-Qun, Wang Rong-Bo. Microblog propagation network model based on mean-field theory. Acta Physica Sinica, 2014, 63(24): 240501. doi: 10.7498/aps.63.240501 [2] Tan Suo-Yi, Qi Ming-Ze, Wu Jun, Lu Xin. Link predictability of complex network from spectrum perspective. Acta Physica Sinica, 2020, 69(8): 088901. doi: 10.7498/aps.69.20191817 [3] Wang Ya-Qi, Wang Jing, Yang Hai-Bin. An evolution model of microblog user relationship networks based on complex network theory. Acta Physica Sinica, 2014, 63(20): 208902. doi: 10.7498/aps.63.208902 [4] Wang Bing-Hong, Zhou Tao, Wang Wen-Xu, Li Ji, Jiang Pin-Qun. Growing complex network model with acceleratingly increasing number of nodes. Acta Physica Sinica, 2006, 55(8): 4051-4057. doi: 10.7498/aps.55.4051 [5] Guo Jin-Li. The bilateral power-law distribution model of supply chain networks. Acta Physica Sinica, 2006, 55(8): 3916-3921. doi: 10.7498/aps.55.3916 [6] Kong Jiang-Tao, Huang Jian, Gong Jian-Xing, Li Er-Yu. Evaluation methods of node importance in undirected weighted networks based on complex network dynamics models. Acta Physica Sinica, 2018, 67(9): 098901. doi: 10.7498/aps.67.20172295 [7] Hu Qing-Cheng, Zhang Yong, Xu Xin-Hui, Xing Chun-Xiao, Chen Chi, Chen Xin-Hua. A new approach for influence maximization in complex networks. Acta Physica Sinica, 2015, 64(19): 190101. doi: 10.7498/aps.64.190101 [8] Chen Dan, Shi Dan-Dan, Pan Gui-Jun. Correlation between the electrical transport performance and the communicability sequence entropy in complex networks. Acta Physica Sinica, 2019, 68(11): 118901. doi: 10.7498/aps.68.20190230 [9] Xing Chang-Ming, Liu Fang-Ai. Research on the deterministic complex network model based on the Sierpinski network. Acta Physica Sinica, 2010, 59(3): 1608-1614. doi: 10.7498/aps.59.1608 [10] Ouyang Min, Fei Qi, Yu Ming-Hui. Estimation and improvement of disaster spreading models based on complex network. Acta Physica Sinica, 2008, 57(11): 6763-6770. doi: 10.7498/aps.57.6763 [11] Yuan Ming. A cascading failure model of complex network with hierarchy structure. Acta Physica Sinica, 2014, 63(22): 220501. doi: 10.7498/aps.63.220501 [12] Han Zhong-Ming, Chen Yan, Li Meng-Qi, Liu Wen, Yang Wei-Jie. An efficient node influence metric based on triangle in complex networks. Acta Physica Sinica, 2016, 65(16): 168901. doi: 10.7498/aps.65.168901 [13] Liang Yi, Wang Xing-Yuan. Pinning chaotic synchronization in complex networks on maximum eigenvalue of low order matrix. Acta Physica Sinica, 2012, 61(3): 038901. doi: 10.7498/aps.61.038901 [14] Liu Shu-Xin, Ji Xin-Sheng, Liu Cai-Xia, Guo Hong. A complex network evolution model for network growth promoted by information transmission. Acta Physica Sinica, 2014, 63(15): 158902. doi: 10.7498/aps.63.158902 [15] Song Qing-Song, Feng Zu-Ren, Li Ren-Hou. Multiple clusters echo state network for chaotic time series prediction. Acta Physica Sinica, 2009, 58(7): 5057-5064. doi: 10.7498/aps.58.5057 [16] Li Yong-Jun, Liu Zun, Yu Hui. Advisor-advisee relationship identification based on maximum entropy model. Acta Physica Sinica, 2013, 62(16): 168902. doi: 10.7498/aps.62.168902 [17] Cai Meng, Du Hai-Feng, Marcus W Feldman. A new network structure entropy based on maximum flow. Acta Physica Sinica, 2014, 63(6): 060504. doi: 10.7498/aps.63.060504 [18] Song Yu-Rong, Jiang Guo-Ping. Epidemic-spreading model for networks with different anti-attack abilities of nodes and nonuniform transmission of edges. Acta Physica Sinica, 2010, 59(11): 7546-7551. doi: 10.7498/aps.59.7546 [19] Deng Qi-Xiang, Jia Zhen, Xie Meng-Shu, Chen Yan-Fei. Study of directed networks-based Email virus propagation model and its concussion attractor. Acta Physica Sinica, 2013, 62(2): 020203. doi: 10.7498/aps.62.020203 [20] Xiong Fei, Liu Yun, Si Xia-Meng, Ding Fei. Network model with synchronously increasing nodes and edges based on Web 2.0. Acta Physica Sinica, 2010, 59(10): 6889-6895. doi: 10.7498/aps.59.6889
•  Citation:
##### Metrics
• Abstract views:  1150
• Cited By: 0
##### Publishing process
• Received Date:  23 June 2015
• Accepted Date:  20 October 2015
• Published Online:  20 January 2016

## Link prediction in microblog retweet network based on maximum entropy model

###### Corresponding author: Li Yong-Jun, lyj@nwpu.edu.cn;
• 1. School of Computer, Northwestern Polytechnical University, Xi'an 710072, China
Fund Project:  Project supported by the Shaanxi Provincial Natural Science Foundation, China (Grant Nos. 2014JM2-6104, 2015JM6290).

Abstract: Microblog is a social media platform, based on the follower-followee relationship, that enables users to share real-time information, by which the information propagation is characterized as rapid, explosive, and immediate. The research on the information propagation and retweet prediction is very important for public sentiment analysis and product promotion. A majority of existing works adopt several traditional prediction methods to predict the future information retweet based on the features extracted from existing retweet behaviors, which are hard to reconcile accuracy, complexity, robustness and feature extensiveness. To overcome the above mentioned shortcomings in existing works, we propose in this paper a link prediction algorithm based on maximum entropy model to predict retweet behavior on microblog. In our proposed approach, firstly we abstract the retweet prediction problem to a link prediction problem. Then we analyze the retweet behaviors on microblog and determine the factors influencing the retweet behavior. We extract the features from the retweet behaviors based on these factors in the next step. Now based on these features, the retweet behavior could be predicted by the proposed approach. However, information redundancy and other issues may exist among these features. These issues will cause an increase in computational complexity or a decrease in computational accuracy. To solve the above problems, we selecte the features dominating the retweet behavior with feature selection methods such as Information Gain, IG-CHI. The proposed model requires no further independent assumption in features or intrinsic constraints, and omits the processing in relation to features, which is usually the prerequisite of other prediction methods. We take the Sina Weibo retweet records in a time span from 2009 to 2012 as an example to test the effectiveness and efficiency of our link prediction algorithm. Results show that: 1) the proposed algorithm has incomparable advantages in running time; 2) as for the predicted result, the proposed algorithm is better than other algorithms in performance evaluations; 3) the proposed algorithm runs stably for different sizes of training sets and feature sets; 4) the accuracy of the predicted results remains stable based on the selected features. The proposed approach avoids the independent restriction among features and shows better accuracy than other similar methods, thus it has reference values for resolving other prediction problems in complex networks.

Reference (29)

/