Hybrid parallel optimization of the density matrix renormalization group method

陈富州; 程晨; 罗洪刚

doi:10.7498/aps.68.20190586

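Summary

The density matrix renormalization group (DMRG) method achieves high accuracy when solving for the ground state of one-dimensional strongly correlated lattice models; when it is applied to two-dimensional or quasi-two-dimensional problems, reaching similar accuracy usually requires a large amount of computation and storage. This paper proposes a new hybrid parallel strategy for DMRG that exploits the computing performance of the central processing unit (CPU) and the graphics processing unit (GPU) at the same time. For the most time-consuming part, the diagonalization of the Hamiltonian, the data are stored in a distributed manner and a load balancing strategy between CPU and GPU is given. Taking the fermionic Hubbard model as an example, we test the performance of the hybrid parallel program for different numbers of DMRG kept states and give the corresponding performance benchmarks. When the method is applied to a 4-leg ladder, the charge density stripes commonly seen in high-temperature superconductors are observed; in this case the number of kept states reaches 10⁴ and the GPU memory used is less than 12 GB.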

Abstract

The density matrix renormalization group (DMRG) is a numerical method that solves for the ground state of one-dimensional strongly correlated lattice models with very high accuracy, but it incurs a high computational and memory cost when applied to two- and quasi-two-dimensional problems. For these applications the number of DMRG kept states generally has to be very large to reach reliable accuracy, which leads to numerous matrix and vector operations and to prohibitively long run times without proper parallelization. However, because of its sequential nature, parallelizing the DMRG algorithm is usually not straightforward. In this work, we propose a new hybrid parallelization strategy for the DMRG method that exploits the computing capability of both the central processing unit (CPU) and the graphics processing unit (GPU) of a computer. To keep as many DMRG states as possible within limited GPU memory, we adopt the four-block formulation of the Hamiltonian rather than the two-block formulation. The latter, which was used in an earlier pioneering work on hybrid parallelization of the DMRG algorithm, consumes much more memory, so only a small number of DMRG kept states is feasible. Our parallel strategy focuses on the diagonalization of the Hamiltonian, which is the most time-consuming part of the whole DMRG procedure. We implement a hybrid parallelization of the diagonalization method in which the data required for diagonalization are distributed over both host and GPU memory, and the data exchange between the two is negligible under our data partitioning scheme. The matrix operations performed when the Hamiltonian acts on a wave function are likewise shared between the CPU and the GPU, with their distribution determined by a load balancing strategy. Taking the fermionic Hubbard model as an example, we examine the running performance of the hybrid parallelization strategy for different numbers of DMRG kept states and provide the corresponding performance benchmarks.
On a 4-leg ladder, we exploit the conserved quantities associated with the U(1) symmetry of the model and a task scheduling based on good quantum numbers to further reduce the GPU memory cost. We obtain a moderate speedup from the hybrid parallelization over a wide range of DMRG kept states. In our example, a highly accurate ground-state energy is obtained by extrapolating the results for different numbers of kept states, and we find charge stripes, which are commonly observed experimentally in high-temperature superconductors. In this case we keep 10⁴ DMRG states and the GPU memory cost is less than 12 GB.
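
The load balancing between CPU and GPU mentioned in the abstract can be pictured with a small scheduling sketch. The abstract does not spell out the actual strategy, so what follows is only an illustration under stated assumptions: each task is one dense matrix product whose cost is estimated as 2mkn floating-point operations, the two devices have fixed throughputs (`cpu_rate` and `gpu_rate`, hypothetical values), and tasks are assigned greedily, largest first, to whichever device would finish them earlier. All function names are invented for this sketch.

```python
from typing import List, Tuple


def gemm_flops(m: int, k: int, n: int) -> int:
    """Estimated floating-point operations of an (m x k) @ (k x n) product."""
    return 2 * m * k * n


def split_tasks(
    costs: List[float], cpu_rate: float, gpu_rate: float
) -> Tuple[List[int], List[int], float, float]:
    """Greedy largest-first assignment of tasks to two devices.

    costs    : estimated cost of each task (e.g. from gemm_flops)
    cpu_rate : assumed CPU throughput in cost units per unit time (hypothetical)
    gpu_rate : assumed GPU throughput in cost units per unit time (hypothetical)
    Returns the task indices given to the CPU, those given to the GPU,
    and the estimated finish time of each device.
    """
    # Visit the most expensive tasks first so that small tasks fill the gaps.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    cpu_tasks: List[int] = []
    gpu_tasks: List[int] = []
    t_cpu = 0.0
    t_gpu = 0.0
    for i in order:
        # Finish time of each device if task i were appended to its queue.
        finish_cpu = t_cpu + costs[i] / cpu_rate
        finish_gpu = t_gpu + costs[i] / gpu_rate
        if finish_gpu <= finish_cpu:
            gpu_tasks.append(i)
            t_gpu = finish_gpu
        else:
            cpu_tasks.append(i)
            t_cpu = finish_cpu
    return cpu_tasks, gpu_tasks, t_cpu, t_gpu


if __name__ == "__main__":
    # Four square matrix products of decreasing size; the 1:4 rate ratio is
    # an arbitrary placeholder, not a measured value.
    costs = [float(gemm_flops(s, s, s)) for s in (400, 300, 200, 100)]
    cpu_tasks, gpu_tasks, t_cpu, t_gpu = split_tasks(costs, 1.0, 4.0)
    print("CPU tasks:", cpu_tasks, "GPU tasks:", gpu_tasks)
```

In this toy setting the two largest products go to the faster device and the two smallest to the slower one. A production scheduler would also have to account for where the operands already reside, since the abstract stresses that data exchange between host and GPU memory is kept negligible.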

Authors and contacts

1.
School of Physical Science and Technology, Lanzhou University, Lanzhou 730000, China

2.
Beijing Computational Science Research Center, Beijing 100084, China

Corresponding author: Luo Hong-Gang, luohg@lzu.edu.cn

Funds: Project supported by the National Natural Science Foundation of China (Grant Nos. 11674139, 11834005) and the Program for Changjiang Scholars and Innovative Research Team in University, China (Grant No. IRT-16R35).

Fig. 1. The four sub-blocks of the superblock.

Fig. 2. Performance of applying the Hamiltonian to the wave function on the CPU: (a) floating-point performance of the matrix multiplications; (b) performance of applying the Hamiltonian to the wave function, together with the maximum matrix size in the matrix multiplications.
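
Why applying the Hamiltonian to the wave function reduces to matrix multiplications can be seen from a standard identity: a term of the form A ⊗ B acting on a wave function stored as a matrix Ψ equals A Ψ Bᵀ, so the large Kronecker product never has to be built. The sketch below checks this for a single two-factor term. It is a simplified illustration, not the four-block scheme of the paper, and it uses plain Python lists to stay dependency-free, whereas a real code would call an optimized GEMM routine on the CPU or GPU. All function names are invented for this sketch.

```python
def matmul(X, Y):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def apply_term(A, B, Psi):
    """Apply the operator A (x) B to a wave function stored as Psi[i][j].

    Uses (A (x) B) vec(Psi) = vec(A Psi B^T) with row-major vec, so only
    two small matrix products are needed.
    """
    return matmul(matmul(A, Psi), transpose(B))


def kron(A, B):
    """Explicit Kronecker product, built here only to verify the identity."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    Psi = [[1, 2], [3, 4]]

    fast = apply_term(A, B, Psi)

    # Reference result: dense Kronecker matrix times the flattened vector.
    flat = [x for row in Psi for x in row]
    slow = [sum(k * v for k, v in zip(row, flat)) for row in kron(A, B)]

    print(fast, slow)
```

For sub-block dimension D, the explicit Kronecker matrix has D⁴ entries while the two products cost O(D³) operations, which is why the matrix multiplication performance largely sets the overall cost of this step.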

Fig. 3. Fraction of the total computation time spent on diagonalizing the Hamiltonian and on applying the Hamiltonian to the wave function.

Fig. 4. GPU memory required for temporary data and for sub-block operators.

Fig. 5. Performance of the hybrid parallel strategy: (a) speedup; (b) GPU memory occupied by the vectors in the Davidson method; (c) performance of applying the Hamiltonian to the wave function, $H\left|{\psi}\right\rangle$.

Fig. 6. Ground-state energy as a function of the truncation error. The straight line is a linear extrapolation of the ground-state energy to zero truncation error.
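
A linear extrapolation to zero truncation error amounts to an ordinary least-squares line fit whose intercept is the extrapolated energy. The sketch below shows the arithmetic only. The input numbers are synthetic placeholders generated from an exactly linear relation; they are not data from the paper, and the function name is invented for this sketch.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic placeholder values (NOT results from the paper):
    # truncation errors and energies lying exactly on E = -1.0 + 2.0 * eps.
    truncation_errors = [1e-5, 2e-5, 4e-5]
    energies = [-1.0 + 2.0 * eps for eps in truncation_errors]

    slope, e_extrapolated = linear_fit(truncation_errors, energies)
    # The intercept is the energy estimate at zero truncation error.
    print("extrapolated energy:", e_extrapolated)
```

In practice each point would come from a separate DMRG run with a different number of kept states, and the scatter of the points about the line gives a rough indication of how reliable the extrapolated value is.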

Fig. 7. Ground-state charge density profile of the 16 × 4 Hubbard ladder at U = 8.0. Charge density stripes are clearly visible.
