This study uses machine learning (ML) models to predict the thermodynamic stability of rare-earth compounds, providing data support for the design of advanced materials and the discovery of new rare-earth compounds. The work is based on a dataset of 280,569 compounds whose formation energies were calculated by density functional theory (DFT). A set of 145 feature descriptors is constructed, covering stoichiometric properties, statistical properties of the elements, electronic-structure properties, and properties of ionic compounds. Two ML models, random forest (RF) and neural network (NN), are each applied to both the classification and the regression task. Min-max scaling is used to preprocess the data, 5-fold cross-validation is used to improve the reliability of the models, and an ensemble learning architecture is constructed to address the limitations of a single model. In the classification task, both RF and NN perform well: with 5-fold cross-validation, the accuracy reaches approximately 0.97 and the F1 score is around 0.98, so compounds can be reliably classified as stable or unstable. In the regression task, the mean absolute errors (MAEs) of the formation energies predicted by the RF and NN models are 0.055 eV/atom and 0.071 eV/atom, respectively, which suggests that the models can replace full DFT calculations to a certain extent. To test the models on systems outside the test set, six representative compositions covering binary, ternary, and quaternary systems are selected from the Materials Project database. The prediction errors of all six compositions are within 0.5 eV/atom, with error percentages below 25%, indicating that the models retain predictive ability outside the test set. When the trained models are used to predict the binary phase diagrams of the rare-earth systems La-Al and Ce-H, the convex hull phase diagrams constructed with the ensemble learning architecture, which combines the predictions of the RF and NN models, are highly consistent with those constructed from the Open Quantum Materials Database (OQMD). The models also identify several metastable phases that are absent from multiple databases. The convex hull distances of the predicted phases are mostly less than 0.1 eV/atom, and none exceeds 0.2 eV/atom. In conclusion, ML models are used successfully to predict the thermodynamic stability of rare-earth compounds. The models perform strongly in both classification and regression, and the ensemble learning architecture improves on the individual models, providing a promising tool for discovering new rare-earth compounds and designing advanced materials.
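As an illustration of the preprocessing and validation steps named in the abstract, the following is a minimal pure-Python sketch of min-max scaling applied inside 5-fold cross-validation and scored by MAE. The function names, the toy data, and the mean-value placeholder model are assumptions made for illustration; they are not the RF or NN models of this study, and the contiguous (unshuffled) folds are a simplification.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def kfold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs built from contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def fit_min_max(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Learn column-wise minima and maxima from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: Sequence[Row], mins: Row, maxs: Row) -> List[Row]:
    """Scale features so the training range maps to [0, 1]; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 10 samples, 2 descriptors, formation energies in eV/atom.
X = [[float(i), 10.0 * i] for i in range(10)]
y = [-0.1 * i for i in range(10)]

fold_maes = []
for train, test in kfold_indices(len(X), k=5):
    mins, maxs = fit_min_max([X[i] for i in train])           # fit on the training fold only
    X_test = apply_min_max([X[i] for i in test], mins, maxs)  # reuse the training statistics
    # Placeholder model (an assumption): always predict the mean training target.
    mean_target = sum(y[i] for i in train) / len(train)
    predictions = [mean_target for _ in X_test]
    fold_maes.append(mean_absolute_error([y[i] for i in test], predictions))

print(f"5-fold MAE of the placeholder model: {sum(fold_maes) / len(fold_maes):.3f} eV/atom")
```

Fitting the scaler on the training fold alone keeps test-fold information out of the preprocessing step, so the cross-validated score is not optimistically biased.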
Keywords:
- thermodynamic stability
- rare-earth compounds
- machine learning
- ensemble learning
Figure 1. (a) Prevalence of elements in the dataset; (b) histogram of the distribution of rare-earth elements in the dataset; (c) and (d) histograms of the distribution of rare-earth elements in the ICSD-labeled dataset.
Figure 2. (a) Formation energy distribution of the dataset; (b) distribution of the distance to the convex hull for materials in the dataset; (c) formation energy distribution of the ICSD-labeled dataset; (d) distribution of the distance to the convex hull for materials in the ICSD-labeled dataset.
Figure 5. Classification results for compound stability: confusion matrices of the (a) RF and (d) NN models; receiver operating characteristic (ROC) curves of the (b) RF and (e) NN models; precision-recall (P-R) curves of the (c) RF and (f) NN models.
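The quantities shown in Figure 5 can be sketched in a few lines of pure Python. The example below is an illustration only: the labels and scores are hypothetical, treating "stable" as the positive class (label 1) is an assumption, and the function names are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion-matrix entries, taking 1 (stable) as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division; returns 0.0 when the denominator is zero (a convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and false-positive rate from the counts."""
    precision = _ratio(c["tp"], c["tp"] + c["fp"])
    recall = _ratio(c["tp"], c["tp"] + c["fn"])  # also the true-positive rate
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], sum(c.values())),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),
    }


def curve_points(
    y_true: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (FPR, recall, precision) per threshold.

    An ROC curve plots recall against FPR; a P-R curve plots precision against recall.
    """
    points = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        m = classification_metrics(confusion_counts(y_true, y_pred))
        points.append((m["fpr"], m["recall"], m["precision"]))
    return points


# Hypothetical labels and model scores (not data from this study).
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

counts = confusion_counts(y_true, [1 if s >= 0.5 else 0 for s in scores])
metrics = classification_metrics(counts)
print(counts)
print({name: round(value, 3) for name, value in metrics.items()})
print(curve_points(y_true, scores, thresholds=[0.25, 0.5, 0.75]))
```

Sweeping the threshold over all observed scores, rather than the three values used here, would trace the full ROC and P-R curves.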
Figure 6. Convex hull phase diagrams of the (a) La-Al and (b) Ce-H binary systems predicted by the ensemble learning architecture. The black solid line marks the convex hull boundary, green dots mark stable compositions (distance to the convex hull equal to 0 eV/atom), and red dots mark metastable compositions (distance to the convex hull less than 0.2 eV/atom).
Table 1. Formation energies of the compositions predicted by the ML model and calculated by DFT.

| Composition | ML/(eV/atom) | DFT/(eV/atom) | Error percentage/% |
| --- | --- | --- | --- |
| EuH2 | -0.58 | -0.687 | 15.6 |
| Tb2O3 | -3.52 | -3.982 | 11.6 |
| CeSi | -0.58 | -0.749 | 22.6 |
| NdVO3 | -3.14 | -3.221 | 2.5 |
| PrH3O3 | -1.97 | -2.199 | 10.4 |
| LaP3H3O10 | -2.22 | -1.942 | 14.3 |
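The error percentages in Table 1 are consistent with a relative error of the form |E_ML - E_DFT| / |E_DFT| × 100. The source does not state the formula, so the short check below should be read as an inference that happens to reproduce the tabulated values; the function names are illustrative.

```python
# (composition, ML formation energy, DFT formation energy) in eV/atom, from Table 1.
TABLE_1 = [
    ("EuH2", -0.58, -0.687),
    ("Tb2O3", -3.52, -3.982),
    ("CeSi", -0.58, -0.749),
    ("NdVO3", -3.14, -3.221),
    ("PrH3O3", -1.97, -2.199),
    ("LaP3H3O10", -2.22, -1.942),
]


def absolute_error(ml: float, dft: float) -> float:
    """Absolute deviation of the ML prediction from the DFT value, in eV/atom."""
    return abs(ml - dft)


def error_percentage(ml: float, dft: float) -> float:
    """Relative deviation with respect to the DFT value, in percent (assumed formula)."""
    return abs(ml - dft) / abs(dft) * 100.0


for name, ml, dft in TABLE_1:
    print(f"{name:<10s} {absolute_error(ml, dft):.3f} eV/atom  {error_percentage(ml, dft):5.1f} %")
```

With this definition, all six absolute errors stay below 0.5 eV/atom and all percentages stay below 25%, in line with the abstract.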
Table 2. Formation energy (Ef) and distance to the convex hull (Ehull) of the predicted compositions.

| Composition | Ef/(eV/atom) | Ehull/(eV/atom) |
| --- | --- | --- |
| Ce2H3 | -0.531 | 0.0038 |
| Ce3H8 | -0.525 | 0.0082 |
| CeH5 | -0.143 | 0.1882 |
| La5Al9 | -0.411 | 0.0736 |
| La7Al10 | -0.414 | 0.0419 |
| La4Al5 | -0.407 | 0.0316 |
| La2Al5 | -0.375 | 0.0945 |
| La9Al4 | -0.244 | 0.008 |
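The distance to the convex hull reported in Table 2 can be illustrated for a binary system as follows. This is a minimal sketch under stated assumptions: the phases and energies are hypothetical toy values for a generic A-B system (not the La-Al or Ce-H data), the pure elements are fixed at Ef = 0, and the function names and the "unstable" label for Ehull of 0.2 eV/atom or more are illustrative. The stable and metastable thresholds follow the Figure 6 caption.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (atomic fraction x of element B, formation energy in eV/atom)


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull (Andrew's monotone chain) with both pure elements at Ef = 0."""
    compounds = {p for p in points if 0.0 < p[0] < 1.0}
    candidates = sorted(compounds | {(0.0, 0.0), (1.0, 0.0)})
    hull: List[Point] = []
    for p in candidates:
        # Drop the last vertex while it lies on or above the line to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(x: float, hull: List[Point]) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e1 + (e2 - e1) * (x - x1) / (x2 - x1)
    raise ValueError("x must lie in [0, 1]")


def energy_above_hull(point: Point, hull: List[Point]) -> float:
    """Distance to the convex hull (Ehull) in eV/atom, clamped at zero."""
    x, ef = point
    return max(0.0, ef - hull_energy(x, hull))


def classify(e_hull: float, tol: float = 1e-9) -> str:
    """Stable at Ehull = 0, metastable below 0.2 eV/atom, otherwise unstable."""
    if e_hull <= tol:
        return "stable"
    return "metastable" if e_hull < 0.2 else "unstable"


# Hypothetical phases of a generic A-B system (not values from this study).
phases: Dict[str, Point] = {
    "A3B": (0.25, -0.30),
    "AB": (0.50, -0.50),
    "A2B3": (0.60, -0.05),
    "AB3": (0.75, -0.20),
}

hull = lower_hull(list(phases.values()))
e_hull = {name: energy_above_hull(p, hull) for name, p in phases.items()}
labels = {name: classify(value) for name, value in e_hull.items()}
for name in phases:
    print(f"{name:<5s} Ehull = {e_hull[name]:.3f} eV/atom  {labels[name]}")
```

In a ternary or higher-order system the hull is a surface in composition space rather than a line, so a general-purpose phase-diagram tool would be needed in place of this one-dimensional interpolation.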