Computational optimization of correlation functions in the distillation method of lattice quantum chromodynamics

Zhang Renqiang; Jiang Xiangyu; Yu Jiongchi; Zeng Chong; Gong Ming; Xu Shun

doi:10.7498/aps.70.20210030

Abstract

Lattice quantum chromodynamics (lattice QCD) is a theory based on quantum chromodynamics that is widely used in calculations involving the strong interaction. As a research method that can give accurate and reliable theoretical results, it has played an increasingly important role in recent years as computing power has improved. The distillation method is an important numerical method for calculating hadron correlation functions in lattice QCD, and it can improve the signal-to-noise ratio of the computed physical quantities. Distillation approximates the full propagator by replacing the Laplacian operator with an outer product of its eigenvectors. In this way, the construction of operators is independent of the costly propagator inversion. The eigenvector system and the perambulators can be reused across different physics projects, so these data need not be computed repeatedly. The method is also convenient for computing the disconnected part of correlation functions. However, constructing correlation functions still involves large, high-dimensional data sets, because the computational cost is proportional to the cube of the number of eigenvectors, so the computational efficiency needs further improvement. In this work, a program is developed to construct correlation functions of quark bilinear operators with the distillation method, and its performance bottlenecks are addressed with multi-level optimization using MPI (Message Passing Interface, https://www.open-mpi.org), OpenMP (Open Multi-Processing) and SIMD (Single Instruction Multiple Data). Because the computation on each timeslice is independent, the program distributes timeslices among MPI processes. Various tests show that the design can support large-scale computation: in the strong-scaling test, the parallel efficiency with 512 processes still reaches about 70%, which greatly improves the ability to calculate correlation functions. The correctness of the results has also been checked by computing pseudoscalar correlators of charmonium. Three different $0^{-+}$ operators were adopted for the variational analysis, and their effective-mass plateaus were compared with the effective mass obtained from the traditional method with a point source. The results of the distillation method are consistent with those of the traditional method. After the variational analysis, three states are obtained, which indicates that the variational analysis is effective and that the correlation functions obtained from the distillation method are reasonable.
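
For orientation, the objects named in the abstract can be written out in a notation that is common in the distillation literature. This is only a sketch under stated assumptions: the symbols $V$, $v^{(k)}$, $M$ and $\Gamma^A$ are not defined in the abstract itself, Dirac and color indices are suppressed, and overall signs depend on convention.

```latex
\begin{aligned}
\Box(t) &= V(t)\,V^\dagger(t) = \sum_{k=1}^{N_{\rm v}} v^{(k)}(t)\,v^{(k)\dagger}(t), \\
\tau(t',t) &= V^\dagger(t')\,M^{-1}(t',t)\,V(t), \\
\Phi^A(t) &= V^\dagger(t)\,\Gamma^A(t)\,V(t), \\
C^{AB}(t',t) &= \mathrm{Tr}\left[\Phi^B(t')\,\tau(t',t)\,\Phi^A(t)\,\tau(t,t')\right].
\end{aligned}
```

Here $v^{(k)}(t)$ is assumed to be the $k$-th of the $N_{\rm v}$ lowest eigenvectors of the gauge-covariant Laplacian on timeslice $t$, $M$ the Dirac matrix, $\tau$ the perambulator, and $\Phi^A$ the operator matrix element for a quark bilinear with structure $\Gamma^A$. Since $\tau$ and $\Phi$ are $N_{\rm v}\times N_{\rm v}$ matrices in eigenvector space, evaluating the trace through matrix products scales as $N_{\rm v}^3$, in line with the cubic cost quoted above.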

Keywords:

Authors and contacts

1. Theoretical Physics Division, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
2. Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China
3. College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
4. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
5. School of Physical Science, University of Chinese Academy of Sciences, Beijing 100049, China

Corresponding author: Xu Shun, xushun@sccas.cn

Funds: Project supported by the National Key R&D Program of China (Grant No. 2017YFB0203203) and the National Natural Science Foundation of China (Grant Nos. 11775229, 11935017).


Fig. 1. Procedure for computing correlation functions with the distillation method.

Fig. 2. Flowchart for computing correlation functions with a reduced operation count. ${\boldsymbol T}^A$ and ${\boldsymbol T}^B$ are two intermediate quantities. Introducing them lowers the computational cost from $\propto N_{\rm op}^2\times N_{\rm v}^4$ to $\propto N_{\rm op}\times N_{\rm v}^3$, a substantial reduction.
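
The effect of the intermediate quantities in Fig. 2 can be illustrated with a small pure-Python sketch. It assumes, hypothetically, that the correlator has the form of a trace over four $N_{\rm v}\times N_{\rm v}$ matrices and that ${\boldsymbol T}^A$ and ${\boldsymbol T}^B$ are products of an operator matrix with a perambulator; the names `phi_a`, `tau_fwd`, `phi_b` and `tau_bwd` are illustrative and do not come from the program described in the paper.

```python
import random


def matmul(a, b):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_naive(phi_a, tau_fwd, phi_b, tau_bwd):
    """Tr[phi_a tau_fwd phi_b tau_bwd] as an explicit four-index sum, O(n^4)."""
    n = len(phi_a)
    total = 0j
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    total += (phi_a[i][j] * tau_fwd[j][k]
                              * phi_b[k][l] * tau_bwd[l][i])
    return total


def trace_with_intermediates(phi_a, tau_fwd, phi_b, tau_bwd):
    """Same trace via two intermediate matrices (assumed roles of T^A, T^B)."""
    t_a = matmul(phi_a, tau_fwd)   # O(n^3)
    t_b = matmul(phi_b, tau_bwd)   # O(n^3)
    n = len(t_a)
    # Tr[T^A T^B] needs only a double sum, O(n^2).
    return sum(t_a[i][j] * t_b[j][i] for i in range(n) for j in range(n))


def random_matrix(n, rng):
    """Random complex n x n matrix, used here only as stand-in data."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
             for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    mats = [random_matrix(6, rng) for _ in range(4)]
    print(abs(trace_naive(*mats) - trace_with_intermediates(*mats)))
```

With several operators, each intermediate matrix is built once at $N_{\rm v}^3$ cost and then reused in every operator pair, where only the cheap double sum remains; that reuse is what the caption's scaling reflects under these assumptions.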


Fig. 3. Parallelization by segmenting the data along the time direction. Depending on the relative size of $N_{\rm p}$ and $N_{\rm t}$, and owing to the structure of the data, τ and Φ are partitioned differently in the two cases.
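
A time-direction partition of this kind can be sketched as a simple block distribution. The sketch assumes that $N_{\rm t}$ is the number of timeslices and covers only the case $N_{\rm p}\le N_{\rm t}$; the caption does not spell out how the case $N_{\rm p} > N_{\rm t}$ is handled, so the hypothetical helper `partition_timeslices` rejects it rather than guessing.

```python
def partition_timeslices(n_t, n_p):
    """Assign contiguous blocks of timeslices to n_p ranks.

    Returns a list of range objects, one per rank. The first
    (n_t % n_p) ranks receive one extra timeslice so that block
    sizes differ by at most one.
    """
    if n_t < 1 or n_p < 1:
        raise ValueError("n_t and n_p must be positive")
    if n_p > n_t:
        # Not covered by this sketch; the source treats it as a separate case.
        raise ValueError("this sketch only covers n_p <= n_t")
    base, extra = divmod(n_t, n_p)
    blocks = []
    start = 0
    for rank in range(n_p):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for rank, block in enumerate(partition_timeslices(10, 4)):
        print(rank, list(block))
```

In an MPI program each rank would evaluate only its own block, using the rank and communicator size it obtains at startup; because the timeslices are independent, no communication is needed until the partial results are gathered.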


Fig. 4. Time spent in each stage of the program with and without SIMD optimization. I/O denotes the time of the first and second steps in Fig. 1, Calc.prepare the third step, Calc.result the fourth step, Others the fifth step, and Init the program initialization. In the legend, SIMD means that AVX-based SIMD computation was enabled, and Complex means that the program called the standard-library complex arithmetic functions directly (without SIMD). The results for 16 MPI processes were obtained with hyper-threading.

Fig. 5. Performance with and without SIMD optimization. In the legend, SIMD means that AVX-based SIMD computation was enabled, and Complex means that the program called the standard-library complex arithmetic functions directly (without SIMD). With SIMD enabled, the result for 16 hyper-threaded processes was excluded from the data fit.

Fig. 6. Time cost with and without OpenMP optimization. Legend entries are the same as in Fig. 4. Serial denotes the serial version, i.e., with neither OpenMP multithreading nor MPI multiprocessing enabled.

Fig. 7. Strong-scaling test of the MPI parallelization. The computing time decreases in proportion as the number of MPI processes increases. Legend entries are the same as in Fig. 4.

Fig. 8. Strong-scaling test of the MPI parallelization: parallel efficiency for different numbers of MPI processes, relative to the run with 16 processes.
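
A relative efficiency of this kind is commonly defined as the ratio of the ideal time to the measured time, with the smallest run as the baseline. The sketch below assumes that definition; the caption does not state the exact formula used, and the timings in the example are made-up placeholders, not measurements from the paper.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Parallel efficiency of a run on n processes relative to a baseline.

    t_ref: wall time of the baseline run on n_ref processes
    t_n:   wall time of the run on n processes (same total problem size)
    Ideal strong scaling gives t_n = t_ref * n_ref / n, i.e. efficiency 1.
    """
    if min(t_ref, t_n) <= 0 or min(n_ref, n) <= 0:
        raise ValueError("times and process counts must be positive")
    return (t_ref * n_ref) / (t_n * n)


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(strong_scaling_efficiency(100.0, 16, 50.0, 32))
    print(strong_scaling_efficiency(100.0, 16, 4.0, 512))
```

Using the 16-process run as the baseline, rather than a serial run, measures only how well the program scales beyond that point.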


Fig. 9. Weak-scaling test of the MPI parallelization. $N_{\rm p}$ is the number of parallel processes and $N_{\rm v}$ is the number of eigenvectors. Legend entries are the same as in Fig. 4.

Fig. 10. Results before the variational analysis. At small times the effective masses of the three operators behave very differently, which shows that their projections onto the different states differ and implies that a variational analysis should be effective. At large times the effective masses of the three operators approach the same plateau, indicating that they share the same quantum numbers and can be used together in the variational analysis. The label "traditional method" marks the effective mass of the first operator obtained with the traditional method, which serves as a reference for the distillation method.
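
The plateau behaviour described in the caption can be reproduced with a minimal sketch of the logarithmic effective mass, $m_{\rm eff}(t)=\ln[C(t)/C(t+1)]$. This is a common definition and an assumption here: the caption does not say which form (for example, a cosh-type form for periodic lattices) was actually used, and the two-state correlator below is synthetic data with made-up masses.

```python
import math


def effective_mass(corr):
    """Logarithmic effective mass m_eff(t) = ln(C(t) / C(t+1)).

    corr is a sequence of positive correlator values C(0), C(1), ...
    The result has one entry fewer than corr.
    """
    return [math.log(corr[t] / corr[t + 1]) for t in range(len(corr) - 1)]


def two_state_correlator(n_t, m0, m1, a0=1.0, a1=0.5):
    """Synthetic correlator a0*exp(-m0*t) + a1*exp(-m1*t), for illustration."""
    return [a0 * math.exp(-m0 * t) + a1 * math.exp(-m1 * t)
            for t in range(n_t)]


if __name__ == "__main__":
    corr = two_state_correlator(30, m0=0.5, m1=1.2)
    for t, m in enumerate(effective_mass(corr)):
        print(t, round(m, 6))
```

At small $t$ the excited-state term pulls the effective mass above the ground-state value, and at large $t$ it settles onto a plateau at `m0`, which is the qualitative pattern the caption describes.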


Fig. 11. Results after the variational analysis.
