-
量化复杂网络之间的结构相似性是网络科学中一个基本且具有挑战性的问题, 在医学、社会学等多个领域发挥了至关重要的作用. 传统的网络比较方法通常基于简单的结构特征, 例如节点度分布、最短路径长度等, 这些方法可能无法充分捕捉网络的全局结构信息, 导致得到的网络相似性不精准. 本文提出了一种基于高阶信息的网络相似性比较方法, 该方法同时考虑了网络的全局结构和局部结构. 具体而言, 通过构建网络节点的高阶聚类系数分布和节点间距离分布, 并利用基于这两个分布的Jensen-Shannon散度来量化网络之间的相似性. 实验结果表明, 相较于其他基线方法, 本文提出的方法不仅能高效地比较不同网络的相似性, 且在对真实网络进行扰动的过程中也表现出鲁棒性.Quantifying structural similarity between complex networks presents a fundamental and formidable challenge in network science, which plays a crucial role in various fields, such as bioinformatics, social science, and economics, and serves as an effective method for network classification, temporal network evolution, network generated model evaluation, etc. Traditional network comparison methods often rely on simplistic structural properties such as node degree and network distance. However, these methods only consider the local or global aspect of a network, leading to inaccuracies in network similarity assessments. In this study, we introduce a network similarity comparison method based on the high-order structure. This innovative approach takes into account the global and the local structure of a network, resulting in a more comprehensive and accurate quantification of the network difference. Specifically, we construct distributions of higher-order clustering coefficient and distance between nodes in a network. The Jensen-Shannon divergence, based on these two distributions, is used to quantitatively measure the similarity between two networks, offering a more refined and robust measure of network similarity. To validate the effectiveness of our proposed method, we conduct a series of comprehensive experiments on the artificial and the real-world network, spanning various domains and applications. By meticulously fine-tuning the parameters related to three different artificial network generation models, we systematically compare the performances of our method under various parameter settings in the same network. In addition, we generate four different network models with varying levels of randomization, creating a diverse set of test cases to evaluate the robustness and adaptability of the method. In artificial networks, we rigorously compare our proposed method with other baseline techniques, consistently demonstrating its superior accuracy and stability through experimental results; in real networks, we select datasets from diverse domains and confirm the reliability of our method by conducting extensive similarity assessments between real networks and their perturbed reconstructed counterparts. Furthermore, in real networks, the rigorous comparison between our method and null models underscores its robustness and stability across a broad spectrum of scenarios and applications. Finally, a meticulous sensitivity analysis of the parameters reveals that our method exhibits remarkable performance consistency across networks of different types, scales, and complexities.
[1] Gursoy A, Keskin O, Nussinov R 2008 Biochem. Soc. Trans. 36 1398Google Scholar
[2] Cheng X, Scherpen J M A 2021 Annu. Rev. Control Robot. Auton. Syst. 4 425Google Scholar
[3] Dorogovtsev S N, Mendes J F F 2002 Adv. Phys. 51 1079Google Scholar
[4] Goh K I, Cusick M E, Valle D, Childs B, Vidal M, Barabási A L 2007 Proc. Natl. Acad. Sci. USA 104 8685Google Scholar
[5] Liu C, Ma Y F, Zhao J, Nussinov R, Zhang Y C, Cheng F X, Zhang Z K 2020 Phys. Rep. 846 1Google Scholar
[6] Woolley S M, Posada D, Crandall K A 2008 PLoS One 3 e1913Google Scholar
[7] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D U 2006 Phys. Rep. 424 175Google Scholar
[8] Orsini C, Dankulov M M, Colomer-de-Simón P, Jamakovic A, Mahadevan P, Vahdat A, Krioukov D 2015 Nat. Commun. 6 8627Google Scholar
[9] Tantardini M, Ieva F, Tajoli L, Piccardi C 2019 Sci. Rep. 9 17557Google Scholar
[10] 周漩, 张凤鸣, 李克武, 惠晓滨, 吴虎胜 2012 物理学报 61 190201Google Scholar
Zhou X, Zhang F M, Li K W, Hui X B, Wu H S 2012 Acta Phys. Sin. 61 190201Google Scholar
[11] 刘建国, 任卓明, 郭强, 汪秉宏 2013 物理学报 62 178901Google Scholar
Liu J G, Ren Z M, Guo Q, Wang B H 2013 Acta Phys. Sin. 62 178901Google Scholar
[12] Bracken C P, Scott H S, Goodall G J A 2016 Nat. Rev. Genet. 17 719Google Scholar
[13] Pržulj N 2007 Bioinformatics 23 e177Google Scholar
[14] 荣辉桂, 火生旭, 胡春华, 莫进侠 2014 通信学报 35 2
Rong H G, Huo S X, Hu C H, Mo J X 2014 J. Commun. 35 2
[15] Zemlyachenko V N, Korneenko N M, Tyshkevich R I 1985 J. Sov. Math. 29 1426Google Scholar
[16] Grohe M, Schweitzer P 2020 Commun. ACM 63 128Google Scholar
[17] Caetano T S, McAuley J J, Cheng L, Le Q V, Smola A J 2009 IEEE Trans. Pattern Anal. Mach. Intell. 31 1048Google Scholar
[18] Klau G W 2009 BMC Bioinf. 10 S59Google Scholar
[19] Lischka J, Karl H 2009 Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures Barcelona, 17 August, 2009 p81
[20] 杨博, 刘大有, 金弟, 马海宾 2009 软件学报 20 54Google Scholar
Yang B, Liu D Y, Jin D, Ma H B 2009 J. Softw. 20 54Google Scholar
[21] Aliakbary S, Motallebi S, Rashidian S, Habibi J, Movaghar A 2015 Chaos 25 023111Google Scholar
[22] 刘旭, 易东云 2011 自动化学报 37 1520
Liu X, Yi D Y 2011 Acta Anat. Sin. 37 1520
[23] Nascimento M C, De Carvalho A C 2011 Eur. J. Oper. Res. 211 221Google Scholar
[24] Wilson R C, Zhu P 2008 Pattern Recognit 41 2833Google Scholar
[25] Wang Z P, Zhan X X, Liu C, Zhang Z K 2022 iScience 25 104446Google Scholar
[26] 汪小帆, 刘亚冰 2009 电子科技大学学报 38 537
Wang X F, Liu Y B 2009 J. Univ. Electron. Sci. Technol. China. 38 537
[27] 吕琳媛 2010 电子科技大学学报 39 651
Lv L Y 2010 J. Univ. Electron. Sci. Technol. China. 39 651
[28] Koutra D, Vogelstein J T, Faloutsos C 2013 Proceedings of the 2013 SIAM International Conference on Data Mining ( SDM) Austin, May, 2013 p162
[29] De Domenico M, Biamonte J 2016 Phys. Rev. X 6 041062Google Scholar
[30] Schieber T A, Carpi L, Díaz-Guilera A, Pardalos P M, Masoller C, Ravetti M G 2017 Nat. Commun. 8 13928Google Scholar
[31] Chen D, Shi D D, Qin M, Xu S M, Pan G J 2018 Phys. Rev. E 98 012319Google Scholar
[32] Liu Q, Dong Z, Wang E 2018 Sci. Rep. 8 5134Google Scholar
[33] 邓小龙, 王柏, 吴斌, 杨胜琦 2012 计算机研究与发展 49 725
Deng X L, Wang B, Wu B, Yang S Q 2012 J. Comput. Res. Dev. 49 725
[34] Menéndez M L, Pardo J A, Pardo L 1997 J. Franklin Inst. 334 307Google Scholar
[35] Fronczak A, Hołyst J A, Jedynak M, Sienkiewicz J 2002 Physica A 316 688Google Scholar
[36] 王林, 戴冠中 2005 科技导报 23 62
Wang L, Dai G Z 2005 Sci. & Tech. Rev. 23 62
[37] Zager L A, Verghese G C 2008 Appl. Math. Lett. 21 86Google Scholar
[38] Sarajlić A, Malod-Dognin N, Yaveroğlu Ö N, Pržulj N 2016 Sci. Rep. 6 35098Google Scholar
[39] Wang L, Egorova E K, Mokryakov A V 2018 J. Comput. Syst. Sci. Int. 57 109Google Scholar
[40] Holme P, Saramäki J 2012 Phys. Rep. 519 97Google Scholar
-
图 1 网络高阶聚类系数计算示意图 (a)节点$ v_1 $及它的邻居形成的网络; (b)去除节点$ v_1 $后的网络; (c)图1(b)所示网络中节点$ v_1 $的邻居之间的距离矩阵; (d)节点$ v_1 $的高阶聚类系数分布
Fig. 1. Illustration of the calculation of the higher-order clustering coefficient: (a) A network formed by node $ v_1 $ and its neighbors; (b) network after removing node $ v_1 $; (c) distance matrix between neighbors of node $ v_1 $ in the network shown in panel (b); (d) the higher-order clustering coefficient distribution of node $ v_1 $.
图 2 基于高阶信息的网络比较方法计算流程示意图 (a)给定两个拥有 11个节点的网络G和$ G' $, 其中G有 14条边, $ G' $有12条边; (b)如何计算基于高阶信息的网络相似性的示例, 包含了节点高阶聚类系数分布和节点距离分布; (c)网络相似值的计算, 其中$ \beta = 0.5 $
Fig. 2. Schematic diagram of calculation flow of network comparison method based on high-order information: (a) Given two networks G and $ G' $ with 11 nodes, G has 14 edges and $ G' $ has 12 edges; (b) an illustration of how to compute the network similarity based on higher-order information, including the distribution of node higher-order clustering coefficients and node distance distribution; (c) calculation of the network similarity value $ D_{{\mathrm{HC}}} $, where $ \beta = 0.5 $.
图 3 人工合成网络下(WS, BA)的参数敏感性分析 (a)不同参数β下$ N = 1000 $的WS网络与$ N=[1500, 5000] $, 间隔为500, 重连概率$ p=0.3 $的WS网络之间的相似性; (b)不同参数β下$ N = 1000 $的BA网络与$ N=[1500, 5000] $, 间隔为500的BA网络之间的相似性, 其中每个BA网络每一步加边数$ m=5 $; (c)不同参数γ下WS网络之间的相似性, 参数与(a)图一样; (d)不同参数γ下BA网络之间的相似性, 参数与(b)图一样. 所有的结果均基于100次实验的平均值
Fig. 3. Parameter sensitivity analysis of synthetic networks generated by the WS and BA model: (a) Similarity between the WS network of $ N = 1000 $ and the WS networks of $ N=[1500, 5000] $ with the interval is 500 under different parameters β, where the probability of rewiring $ p=0.3 $; (b) similarity between the BA network of $ N = 1000 $ and the BA networks of $ N=[1500, 5000] $ with an interval of 500 under different parameters β, where each BA network adds edges at each step with number of $ m=5 $; (c) similarity between WS networks under different parameters γ, the parameters are the same as with those in panel (a); (d) similarity between BA networks under different parameters γ, the parameters are the same as those in panel (b). All results are based on an average of 100 realizations.
图 4 四种相似性方法在人工合成网络上的效果评估(网络规模均为$ N=1000 $) (a)—(d)不同重连概率p下ER模型生成的每对网络的相似性, 其中相似性方法分别为$ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $以及$ D_{{{M}}} $; (e)—(h)不同重连概率p, 平均度为10下WS模型生成的每对网络的相似性, 其中相似性方法分别为$ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $以及$ D_{{{M}}} $; (i)—(l)不同加边数$ m\in\{2, 3, 4, 5, 6\} $下BA模型生成的每对网络的相似性, 其中相似性方法分别为$ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $以及$ D_{{{M}}} $. 所有的结果均基于100次实验的平均值
Fig. 4. Effectiveness of four similarity methods in comparing synthetic networks. The network size is set to $ N=1000 $: (a)–(d) Similarity between each pair of networks generated by the ER model under different rewiring probabilities p, where the network comparing methods are $ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $ and $ D_{{{M}}} $; (e)–(h) similarity between each pair of networks generated by the WS model with different rewiring probabilities p and an average degree of 10, where the network comparing methods are $ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $ and $ D_{{{M}}} $; (i)–(l) similarity between each pair of networks generated by the BA model under different edge numbers $ m\in\{2, 3, 4, 5, 6\} $ added at each time step, where the similarity methods are $ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $ and $ D_{{{M}}} $. All results are based on an average of 100 realizations.
图 5 分别使用$ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $和$ D_{{{M}}} $这4种方法对4种人工合成网络(K-regular, WSC, WSK和BA)进行相互比较. 所有的结果均基于100次实验的平均值
Fig. 5. Comparison of the four synthetic networks, i.e., K-regular, WSC, WSK, and BA, by using four methods of $ D_{{\mathrm{HC}}} $, $ D_{{\mathrm{SP}}} $, $ D_{{{C}}} $ and $ D_{{{M}}} $. All results are based on an average of 100 realizations.
图 6 真实网络与其零模型生成的网络相似性. 考虑了具有不同k值(1.0, 2.0和2.5)的$ Dk $零模型, 图中的值表示$ D_{{\mathrm{HC}}} $的值的大小. 所有的结果均基于100次实验的平均值
Fig. 6. Similarity between real networks and their null-models. We considered the $ Dk $ null model with different values k (1.0, 2.0, and 2.5), and the values in the figure indicate the value of $ D_{{\mathrm{HC}}} $. All results are based on an average of 100 realizations.
图 7 原始真实网络和经过扰动后生成的网络之间的相似性, 其中f的负值对应于给定比例的边的随机删除过程, 正值表示随机增边的过程. 所有的结果均基于100次实验的平均值.
Fig. 7. Similarity between the original real network and the network after perturbation, where negative values of f correspond to the deletion of $ |f| $ fraction of edges and positive values of f indicate the addition of f fraction of edges. All results are based on an average of 100 realizations.
表 1 真实网络的拓扑结构性质, 其中N为节点数, $ |E| $为边数, $ {\mathrm{Ad}} $为平均度, $ {\mathrm{Avl }}$为平均路径长度, $ {\mathrm{Ld}} $为网络密度, C为聚类系数, d为直径
Table 1. Topology properties of the real networks, where N is the number of nodes, $ |E| $ is the number of edges, ${\mathrm{ Ad }}$ is the average degree, $ {\mathrm{Avl }}$ is the average path length, $ {\mathrm{Ld }}$ is the network link density, and C is the clustering coefficient, and d is the diameter.
Networks N $ |E| $ $ {\mathrm{Ad}} $ $ {\mathrm{Avl}} $ $ {\mathrm{Ld}} $ C d Chesapeake 39 170 8.72 1.83 0.2294 0.450 3 Windsurfers 43 336 15.63 1.69 0.3721 0.653 3 Contiguous 49 107 4.37 4.16 0.0910 0.497 11 Jazz 198 2742 27.69 2.24 0.1406 0.617 6 Infectious 410 2765 13.49 3.63 0.0330 0.456 9 Metabolic 453 2025 8.94 2.68 0.0198 0.646 7 Rovira 1133 5451 9.62 3.61 0.0085 0.220 8 Petster 1858 12534 13.49 3.45 0.0073 0.141 14 Yeast 1870 2203 2.44 6.81 0.0013 0.067 19 Irvine 1899 59835 14.57 3.06 0.0079 0.109 8 Petsterc 2426 16631 13.71 3.59 0.0057 0.538 10 Pgp 10680 24316 4.55 7.49 0.0004 0.266 24 -
[1] Gursoy A, Keskin O, Nussinov R 2008 Biochem. Soc. Trans. 36 1398Google Scholar
[2] Cheng X, Scherpen J M A 2021 Annu. Rev. Control Robot. Auton. Syst. 4 425Google Scholar
[3] Dorogovtsev S N, Mendes J F F 2002 Adv. Phys. 51 1079Google Scholar
[4] Goh K I, Cusick M E, Valle D, Childs B, Vidal M, Barabási A L 2007 Proc. Natl. Acad. Sci. USA 104 8685Google Scholar
[5] Liu C, Ma Y F, Zhao J, Nussinov R, Zhang Y C, Cheng F X, Zhang Z K 2020 Phys. Rep. 846 1Google Scholar
[6] Woolley S M, Posada D, Crandall K A 2008 PLoS One 3 e1913Google Scholar
[7] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D U 2006 Phys. Rep. 424 175Google Scholar
[8] Orsini C, Dankulov M M, Colomer-de-Simón P, Jamakovic A, Mahadevan P, Vahdat A, Krioukov D 2015 Nat. Commun. 6 8627Google Scholar
[9] Tantardini M, Ieva F, Tajoli L, Piccardi C 2019 Sci. Rep. 9 17557Google Scholar
[10] 周漩, 张凤鸣, 李克武, 惠晓滨, 吴虎胜 2012 物理学报 61 190201Google Scholar
Zhou X, Zhang F M, Li K W, Hui X B, Wu H S 2012 Acta Phys. Sin. 61 190201Google Scholar
[11] 刘建国, 任卓明, 郭强, 汪秉宏 2013 物理学报 62 178901Google Scholar
Liu J G, Ren Z M, Guo Q, Wang B H 2013 Acta Phys. Sin. 62 178901Google Scholar
[12] Bracken C P, Scott H S, Goodall G J A 2016 Nat. Rev. Genet. 17 719Google Scholar
[13] Pržulj N 2007 Bioinformatics 23 e177Google Scholar
[14] 荣辉桂, 火生旭, 胡春华, 莫进侠 2014 通信学报 35 2
Rong H G, Huo S X, Hu C H, Mo J X 2014 J. Commun. 35 2
[15] Zemlyachenko V N, Korneenko N M, Tyshkevich R I 1985 J. Sov. Math. 29 1426Google Scholar
[16] Grohe M, Schweitzer P 2020 Commun. ACM 63 128Google Scholar
[17] Caetano T S, McAuley J J, Cheng L, Le Q V, Smola A J 2009 IEEE Trans. Pattern Anal. Mach. Intell. 31 1048Google Scholar
[18] Klau G W 2009 BMC Bioinf. 10 S59Google Scholar
[19] Lischka J, Karl H 2009 Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures Barcelona, 17 August, 2009 p81
[20] 杨博, 刘大有, 金弟, 马海宾 2009 软件学报 20 54Google Scholar
Yang B, Liu D Y, Jin D, Ma H B 2009 J. Softw. 20 54Google Scholar
[21] Aliakbary S, Motallebi S, Rashidian S, Habibi J, Movaghar A 2015 Chaos 25 023111Google Scholar
[22] 刘旭, 易东云 2011 自动化学报 37 1520
Liu X, Yi D Y 2011 Acta Anat. Sin. 37 1520
[23] Nascimento M C, De Carvalho A C 2011 Eur. J. Oper. Res. 211 221Google Scholar
[24] Wilson R C, Zhu P 2008 Pattern Recognit 41 2833Google Scholar
[25] Wang Z P, Zhan X X, Liu C, Zhang Z K 2022 iScience 25 104446Google Scholar
[26] 汪小帆, 刘亚冰 2009 电子科技大学学报 38 537
Wang X F, Liu Y B 2009 J. Univ. Electron. Sci. Technol. China. 38 537
[27] 吕琳媛 2010 电子科技大学学报 39 651
Lv L Y 2010 J. Univ. Electron. Sci. Technol. China. 39 651
[28] Koutra D, Vogelstein J T, Faloutsos C 2013 Proceedings of the 2013 SIAM International Conference on Data Mining ( SDM) Austin, May, 2013 p162
[29] De Domenico M, Biamonte J 2016 Phys. Rev. X 6 041062Google Scholar
[30] Schieber T A, Carpi L, Díaz-Guilera A, Pardalos P M, Masoller C, Ravetti M G 2017 Nat. Commun. 8 13928Google Scholar
[31] Chen D, Shi D D, Qin M, Xu S M, Pan G J 2018 Phys. Rev. E 98 012319Google Scholar
[32] Liu Q, Dong Z, Wang E 2018 Sci. Rep. 8 5134Google Scholar
[33] 邓小龙, 王柏, 吴斌, 杨胜琦 2012 计算机研究与发展 49 725
Deng X L, Wang B, Wu B, Yang S Q 2012 J. Comput. Res. Dev. 49 725
[34] Menéndez M L, Pardo J A, Pardo L 1997 J. Franklin Inst. 334 307Google Scholar
[35] Fronczak A, Hołyst J A, Jedynak M, Sienkiewicz J 2002 Physica A 316 688Google Scholar
[36] 王林, 戴冠中 2005 科技导报 23 62
Wang L, Dai G Z 2005 Sci. & Tech. Rev. 23 62
[37] Zager L A, Verghese G C 2008 Appl. Math. Lett. 21 86Google Scholar
[38] Sarajlić A, Malod-Dognin N, Yaveroğlu Ö N, Pržulj N 2016 Sci. Rep. 6 35098Google Scholar
[39] Wang L, Egorova E K, Mokryakov A V 2018 J. Comput. Syst. Sci. Int. 57 109Google Scholar
[40] Holme P, Saramäki J 2012 Phys. Rep. 519 97Google Scholar
计量
- 文章访问数: 2737
- PDF下载量: 123
- 被引次数: 0