-
排名聚合将多个排名列表聚合成一个综合排名列表, 可应用于推荐系统、链路预测、元搜索、提案评选等. 当前已有工作从不同角度对不同排名聚合算法进行了综述、比较, 但存在算法种类较少、数据统计特性不清晰、评价指标不够合理等局限性. 不同排名聚合算法在提出时均声称优于已有算法, 但是用于比较的方法不同, 测试的数据不同, 应用的场景不同, 因此何种算法最能适应某一任务在很多情况下仍不甚清楚. 本文基于Mallows模型, 提出一套生成统计特性可控的不同类型的排名列表的算法, 使用一个可应用于不同类型排名列表的通用评价指标, 介绍9种排名聚合算法以及它们在聚合少量长列表时的表现. 结果发现启发式方法虽然简单, 但是在排名列表相似度较高、列表相对简单的情况下, 能够接近甚至超过一些优化类方法的结果; 列表中平局数量的增长会降低聚合排名的一致性并增加波动; 列表数量的增加对聚合效果的影响呈现非单调性. 整体而言, 基于距离优化的分支定界方法(FAST)优于其他各类算法, 在不同类型的排名列表中表现非常稳定, 能够很好地完成少量长列表的排名聚合.Rank aggregation aims to combine multiple rank lists into a single one, which has wide applications in recommender systems, link prediction, metasearch, proposal selection, and so on. Some existing studies have summarized and compared different rank aggregation algorithms. However, most of them cover only a few algorithms, the data used to test algorithms do not have a clear statistical property, and the metric used to quantify the aggregated results has certain limitations. Moreover, different algorithms all claim to be superior to existing ones when proposed, the baseline algorithms, the testing samples, and the application scenario are all different from case to case. Therefore, it is still unclear which algorithm is better for a particular task. Here we review nine rank aggregation algorithms and compare their performances in aggregating a small number of long rank lists. We assume an algorithm to generate different types of rank lists with known statistical properties and cause a more reliable metric to quantify the aggregation results. We find that despite the simplicity of heuristic algorithms, they work pretty well when the rank lists are full and have high similarities. In some cases, they can reach or even surpass the optimization-based algorithms in performance. The number of ties in the list will reduce the quality of the consensus rank and increase fluctuations. The quality of aggregated rank changes non-monotonically with the number of rank lists that need to be combined. Overall, the algorithm FAST outperforms all others in three different rank types, which can sufficiently complete the task of aggregating a small number of long rank lists.
-
Keywords:
- rank aggregation /
- incomplete rank list /
- Mallows model /
- rank-biased overlap
[1] Liao H, Mariani M S, Medo M, Zhang Y C, Zhou M Y 2017 Phys. Rep. 689 154
[2] 刘建国, 任卓明, 郭强, 汪秉宏 2013 物理学报 62 178901Google Scholar
Liu J G, Ren Z M, Guo Q, Wang B H 2013 Acta Phys. Sin. 62 178901Google Scholar
[3] Pujari M, Kanawati R 2012 Proceedings of the 21st International Conference on World Wide Web Lyon, France, April 16−20, 2012 p11
[4] Tabourier L, Libert A S, Lambiotte R 2016 EPJ Data Sci. 5 1Google Scholar
[5] Snell J L, Kemeny J G 1962 Mathematical Models in the Social Sciences (Boston: Introduction to Higher Mathematics) pp3−23
[6] Davenport A J, Kalagnanam J 2004 Conference on 19th National Conference on Artificial Intelligence San Jose, USA, July 25−29, 2004 p697
[7] Amodio S, D’ambrosio A, Siciliano R 2016 Eur. J. Oper. Res. 249 667Google Scholar
[8] Meila M, Phadnis K, Patterson A, Bilmes J 2012 arXiv: 1206.5265 [cs.LG]
[9] Baskin J P, Krishnamurthi S 2009 Proceedings of the Third ACM Conference on Recommender Systems New York, USA, October 23−25, 2009 p337
[10] Lü L Y, Medo M, Yeung C H, Zhang Y C, Zhang Z K, Zhou T 2012 Phys. Rep. 519 1Google Scholar
[11] Dwork C, Kumar R, Naor M 2001 Proceedings of the 10th International Conference on World Wide Web Hong Kong, May 1−5, 2001 p613
[12] Cook W D, Raviv T A L, Richardson A J 2010 Accounting Perspectives 9 217Google Scholar
[13] Cook W D, Golany B, Penn M 2007 Comput. Oper. Res. 34 954Google Scholar
[14] 郭崇慧, 李敏谦 2018 数据分析与知识发现 2 10Google Scholar
Guo C H, Li M Q 2018 Data Analysis and Knowledge Discovery 2 10Google Scholar
[15] Jia T, Wang D, Szymanski B K 2017 Nat. Hum. Behav. 1 0078Google Scholar
[16] 张海霞, 吕振, 张传亭 2018 电子科技大学学报 47 112Google Scholar
Zhang H X, LÜ Z, Zhang C T 2018 Journal of University of Electronic Science and Technology of China 47 112Google Scholar
[17] 贾韬, 夏锋 2019 大数据 04 38Google Scholar
Jia T, Xia F 2019 Big Data Res. 04 38Google Scholar
[18] Wang X, Ran Y, Jia T 2020 Chaos: An Interdisciplinary Journal of Nonlinear Science 30 013101
[19] 刘文, 王永滨 2011 物理学报 60 070301Google Scholar
Liu W, Wang Y B 2011 Acta Phys. Sin. 60 070301Google Scholar
[20] 韩忠明, 吴杨, 谭旭升, 段大高, 杨伟杰 2015 物理学报 64 58902Google Scholar
Han Z M, Wu Y, Tan X S, Duan D G, Yang W J 2015 Acta Phys. Sin. 64 58902Google Scholar
[21] Borda J C de 1781 Histoire de l'Academie Royale des Sciences 657
[22] Langville A N, Meyer C D 2012 Who's# 1?: The Science of Rating and Ranking (Princeton: Princeton University Press) pp159−231
[23] Cook W D 2006 Eur. J. Oper. Res. 172 369Google Scholar
[24] Lin S 2010 Wiley Interdiscip Rev. Comput. Stat. 2 555Google Scholar
[25] Ali A, Meila M 2012 Math. Soc. Sci. 64 28Google Scholar
[26] Schalekamp F, Zuylen A 2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments New York, USA, January 3, 2009 p38
[27] Brancotte B, Yang B, Blin G, Cohen B S, Denise A, Hamel S 2015 Proceedings of the VLDB Endowment 8 1202Google Scholar
[28] Fagin R, Kumar R, Sivakumar D 2003 SIAM J. Discrete Math. 17 134Google Scholar
[29] Cohen-boulakia S, Denise A, Hamel S 2011 International Conference on Scientific and Statistical Database Management Portland, USA, July 20−22, 2011 p73
[30] Xiao Y, Deng Y, Wu J 2017 Nav. Res. Logist. 64 556Google Scholar
[31] Fagin R, Kumar R, Mahdian M 2004 Proceedings of the Twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems Paris, France, June 14−16, 2004 p47
[32] Li X, Wang X, Xiao G 2017 Brief. Bionform. 20 178
[33] [34] Deng K, Han S, Li K J 2014 J. Am. Stat. Assoc. 109 1023Google Scholar
[35] Liu Y T, Liu T Y, Qin T 2007 Proceedings of the 16th international conference on World Wide Web Banff, Alberta, Canada, May 8−12, 2007 p481
[36] Freund Y, Iyer R, Schapire R E 2003 J. Mach. Learn. Res. 4 933
[37] Ailon N, Charikar M, Newman A 2008 J. ACM 55 23
[38] Van Zuylen A, Williamson D P 2009 Math. Oper. Res. 34 594Google Scholar
[39] Kendall M G 1948 Rank correlation methods (London: Griffin)
[40] Diaconis P, Graham R L 1977 J. R. Stat. Soc. B 262
[41] Fagin R, Kumar R, Mahdian M, Sivakumar D, Vee E 2006 SIAM J. Discrete. Math. 20 628Google Scholar
[42] Fagin R, Kumar R, Sivakumar D 2003 Proceedings of the 2003 ACM SIGMOD International Conference on Management of data San Diego, California January 9−12, 2003 p301
[43] Brin S, Page L 1998 Comput. Networks ISDN Syst. 30 107Google Scholar
[44] Adali S, Hill B, Magdon-Ismail M 2007 J. Digital Information Management(JDIM) 5 292
[45] Emond E J, Mason D W 2002 J. Multi-Crit. Decis. Anal. 11 17Google Scholar
[46] Ailon N 2010 Algorithmica 57 284Google Scholar
[47] Lin S, Ding J 2009 Biometrics 65 9Google Scholar
[48] Heiser W J, D’ambrosio A (edited by Lausen B, Dirk V P ) 2013 Algorithms from and for Nature and Life (New York: Springer) pp19−31
[49] Pedings K E, Langville A N, Yamamoto Y 2012 Optim. Eng. 13 349Google Scholar
[50] Bar-Ilan J, Mat-Hassan M, Levene M 2006 Comput. Networks 50 1448Google Scholar
[51] Lin Z W, Yi L, Guo X L 2017 arXiv: 1704.08464 [cs.AI]
[52] Ekstrom C T, Gerds T A, Jensen A K 2018 Biostatistics 20 582
[53] Kumar R, Vassilvitskii S 2010 Proceedings of the 19th International Conference on World Wide Web Raleigh, North Carolina, USA, April 26−30, 2010 p571
[54] Sakai T, Nicola F 2014 Metrics, Statistics, Tests in: Bridging between Information Retrieval and Databases (Heidelberg: Springer) pp116−163
[55] Webber W, Moffat A, Zobel J 2010 ACM T. Inform. Syst. 28 1
[56] Mallows C L 1957 Biometrika 44 141Google Scholar
[57] Critchlow D E, Fligner M A, Verducci J S 1991 J. Math. Psychol. 35 294Google Scholar
[58] Irurozki E, Calvo B, Lozano J A 2016 J. Stat Softw. 71 1575
[59] Fligner M A, Verducci J S 1986 J. R. Stat. Soc. B 359
[60] Smith B B 1950 J. R. Stat. Soc. B 12 41
[61] [62] Thurstone L L 1927 Psychol. Rev. 34 273Google Scholar
-
-
[1] Liao H, Mariani M S, Medo M, Zhang Y C, Zhou M Y 2017 Phys. Rep. 689 154
[2] 刘建国, 任卓明, 郭强, 汪秉宏 2013 物理学报 62 178901Google Scholar
Liu J G, Ren Z M, Guo Q, Wang B H 2013 Acta Phys. Sin. 62 178901Google Scholar
[3] Pujari M, Kanawati R 2012 Proceedings of the 21st International Conference on World Wide Web Lyon, France, April 16−20, 2012 p11
[4] Tabourier L, Libert A S, Lambiotte R 2016 EPJ Data Sci. 5 1Google Scholar
[5] Snell J L, Kemeny J G 1962 Mathematical Models in the Social Sciences (Boston: Introduction to Higher Mathematics) pp3−23
[6] Davenport A J, Kalagnanam J 2004 Conference on 19th National Conference on Artificial Intelligence San Jose, USA, July 25−29, 2004 p697
[7] Amodio S, D’ambrosio A, Siciliano R 2016 Eur. J. Oper. Res. 249 667Google Scholar
[8] Meila M, Phadnis K, Patterson A, Bilmes J 2012 arXiv: 1206.5265 [cs.LG]
[9] Baskin J P, Krishnamurthi S 2009 Proceedings of the Third ACM Conference on Recommender Systems New York, USA, October 23−25, 2009 p337
[10] Lü L Y, Medo M, Yeung C H, Zhang Y C, Zhang Z K, Zhou T 2012 Phys. Rep. 519 1Google Scholar
[11] Dwork C, Kumar R, Naor M 2001 Proceedings of the 10th International Conference on World Wide Web Hong Kong, May 1−5, 2001 p613
[12] Cook W D, Raviv T A L, Richardson A J 2010 Accounting Perspectives 9 217Google Scholar
[13] Cook W D, Golany B, Penn M 2007 Comput. Oper. Res. 34 954Google Scholar
[14] 郭崇慧, 李敏谦 2018 数据分析与知识发现 2 10Google Scholar
Guo C H, Li M Q 2018 Data Analysis and Knowledge Discovery 2 10Google Scholar
[15] Jia T, Wang D, Szymanski B K 2017 Nat. Hum. Behav. 1 0078Google Scholar
[16] 张海霞, 吕振, 张传亭 2018 电子科技大学学报 47 112Google Scholar
Zhang H X, LÜ Z, Zhang C T 2018 Journal of University of Electronic Science and Technology of China 47 112Google Scholar
[17] 贾韬, 夏锋 2019 大数据 04 38Google Scholar
Jia T, Xia F 2019 Big Data Res. 04 38Google Scholar
[18] Wang X, Ran Y, Jia T 2020 Chaos: An Interdisciplinary Journal of Nonlinear Science 30 013101
[19] 刘文, 王永滨 2011 物理学报 60 070301Google Scholar
Liu W, Wang Y B 2011 Acta Phys. Sin. 60 070301Google Scholar
[20] 韩忠明, 吴杨, 谭旭升, 段大高, 杨伟杰 2015 物理学报 64 58902Google Scholar
Han Z M, Wu Y, Tan X S, Duan D G, Yang W J 2015 Acta Phys. Sin. 64 58902Google Scholar
[21] Borda J C de 1781 Histoire de l'Academie Royale des Sciences 657
[22] Langville A N, Meyer C D 2012 Who's# 1?: The Science of Rating and Ranking (Princeton: Princeton University Press) pp159−231
[23] Cook W D 2006 Eur. J. Oper. Res. 172 369Google Scholar
[24] Lin S 2010 Wiley Interdiscip Rev. Comput. Stat. 2 555Google Scholar
[25] Ali A, Meila M 2012 Math. Soc. Sci. 64 28Google Scholar
[26] Schalekamp F, Zuylen A 2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments New York, USA, January 3, 2009 p38
[27] Brancotte B, Yang B, Blin G, Cohen B S, Denise A, Hamel S 2015 Proceedings of the VLDB Endowment 8 1202Google Scholar
[28] Fagin R, Kumar R, Sivakumar D 2003 SIAM J. Discrete Math. 17 134Google Scholar
[29] Cohen-boulakia S, Denise A, Hamel S 2011 International Conference on Scientific and Statistical Database Management Portland, USA, July 20−22, 2011 p73
[30] Xiao Y, Deng Y, Wu J 2017 Nav. Res. Logist. 64 556Google Scholar
[31] Fagin R, Kumar R, Mahdian M 2004 Proceedings of the Twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems Paris, France, June 14−16, 2004 p47
[32] Li X, Wang X, Xiao G 2017 Brief. Bionform. 20 178
[33] [34] Deng K, Han S, Li K J 2014 J. Am. Stat. Assoc. 109 1023Google Scholar
[35] Liu Y T, Liu T Y, Qin T 2007 Proceedings of the 16th international conference on World Wide Web Banff, Alberta, Canada, May 8−12, 2007 p481
[36] Freund Y, Iyer R, Schapire R E 2003 J. Mach. Learn. Res. 4 933
[37] Ailon N, Charikar M, Newman A 2008 J. ACM 55 23
[38] Van Zuylen A, Williamson D P 2009 Math. Oper. Res. 34 594Google Scholar
[39] Kendall M G 1948 Rank correlation methods (London: Griffin)
[40] Diaconis P, Graham R L 1977 J. R. Stat. Soc. B 262
[41] Fagin R, Kumar R, Mahdian M, Sivakumar D, Vee E 2006 SIAM J. Discrete. Math. 20 628Google Scholar
[42] Fagin R, Kumar R, Sivakumar D 2003 Proceedings of the 2003 ACM SIGMOD International Conference on Management of data San Diego, California January 9−12, 2003 p301
[43] Brin S, Page L 1998 Comput. Networks ISDN Syst. 30 107Google Scholar
[44] Adali S, Hill B, Magdon-Ismail M 2007 J. Digital Information Management(JDIM) 5 292
[45] Emond E J, Mason D W 2002 J. Multi-Crit. Decis. Anal. 11 17Google Scholar
[46] Ailon N 2010 Algorithmica 57 284Google Scholar
[47] Lin S, Ding J 2009 Biometrics 65 9Google Scholar
[48] Heiser W J, D’ambrosio A (edited by Lausen B, Dirk V P ) 2013 Algorithms from and for Nature and Life (New York: Springer) pp19−31
[49] Pedings K E, Langville A N, Yamamoto Y 2012 Optim. Eng. 13 349Google Scholar
[50] Bar-Ilan J, Mat-Hassan M, Levene M 2006 Comput. Networks 50 1448Google Scholar
[51] Lin Z W, Yi L, Guo X L 2017 arXiv: 1704.08464 [cs.AI]
[52] Ekstrom C T, Gerds T A, Jensen A K 2018 Biostatistics 20 582
[53] Kumar R, Vassilvitskii S 2010 Proceedings of the 19th International Conference on World Wide Web Raleigh, North Carolina, USA, April 26−30, 2010 p571
[54] Sakai T, Nicola F 2014 Metrics, Statistics, Tests in: Bridging between Information Retrieval and Databases (Heidelberg: Springer) pp116−163
[55] Webber W, Moffat A, Zobel J 2010 ACM T. Inform. Syst. 28 1
[56] Mallows C L 1957 Biometrika 44 141Google Scholar
[57] Critchlow D E, Fligner M A, Verducci J S 1991 J. Math. Psychol. 35 294Google Scholar
[58] Irurozki E, Calvo B, Lozano J A 2016 J. Stat Softw. 71 1575
[59] Fligner M A, Verducci J S 1986 J. R. Stat. Soc. B 359
[60] Smith B B 1950 J. R. Stat. Soc. B 12 41
[61] [62] Thurstone L L 1927 Psychol. Rev. 34 273Google Scholar
计量
- 文章访问数: 7495
- PDF下载量: 162
- 被引次数: 0