The emergence of large language models has significantly advanced scientific research. Language models represented by ChatGPT and reasoning models represented by DeepSeek R1 have brought notable changes to the research paradigm. Although these models are general-purpose, they show strong generalization in the battery field, particularly in solid-state battery research. In this study, we systematically screened 5,309,268 articles published in key journals up to 2024 and identified 124,021 battery-related papers. We also searched 17,559,750 patent applications and granted patents from the European Patent Office and the United States Patent and Trademark Office up to 2024, from which we selected 125,716 battery-related patents. Using this collection of literature and patents, we conducted extensive experiments to evaluate the knowledge base, in-context learning, instruction-following, and structured-output capabilities of language models. The multi-dimensional evaluations and analyses led to the following findings. First, the model screened literature on inorganic solid-state electrolytes with high accuracy, comparable to that of a doctoral student in the field.
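Screening accuracy of this kind comes from comparing the model's relevance decisions with expert labels on the same papers. The following is a minimal Python sketch of such an evaluation, assuming the model is prompted to answer in a JSON form `{"relevant": true|false}`; that schema and the names `parse_label` and `evaluate_screening` are hypothetical illustrations, not the evaluation code used in this study.

```python
import json
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def parse_label(raw: str) -> bool:
    """Read a structured answer of the assumed form {"relevant": true|false}."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("relevant"), bool):
        raise ValueError(f"malformed model output: {raw!r}")
    return data["relevant"]


def evaluate_screening(model_labels: list[bool], expert_labels: list[bool]) -> ScreeningMetrics:
    """Compare binary relevance decisions (True = relevant) with expert labels."""
    if not model_labels or len(model_labels) != len(expert_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(model_labels, expert_labels))
    tp = sum(m and e for m, e in pairs)          # both say relevant
    fp = sum(m and not e for m, e in pairs)      # model says relevant, expert does not
    fn = sum((not m) and e for m, e in pairs)    # model misses a relevant paper
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return ScreeningMetrics((tp + tn) / len(pairs), precision, recall, f1)
```

Rejecting malformed answers in `parse_label`, rather than silently treating them as "not relevant", keeps structured-output failures visible as a separate error class instead of letting them bias the accuracy figures.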
In a test on 10,604 data entries, the model also recognized literature on in-situ polymerization/solidification technology well, although its accuracy for this emerging technology was slightly lower than for solid-state electrolytes, and further fine-tuning is needed to improve it. Second, in a test on 10,604 data entries, the model extracted inorganic ionic conductivity data with reliable accuracy. Third, on solid-state lithium battery patents filed over the past 20 years by four companies in South Korea and Japan, the model proved effective at analyzing historical patent trends and performing comparative analyses; personalized literature reports that the model generated from the latest publications were also highly accurate. Fourth, by applying iterative strategies, we enabled DeepSeek to reflect on its own reasoning and give more comprehensive answers. These results indicate that language models are strong at content summarization and trend analysis. However, the model may occasionally produce numerical hallucinations, and when processing the vast volume of battery-related data, its engineering applications still leave room for optimization. Based on these characteristics and test results, we used the DeepSeek V3-0324 model to extract data on inorganic solid electrolyte materials, including 5,970 ionic conductivity entries, 387 diffusion coefficient entries, and 3,094 migration barrier entries. The dataset also contains more than 1,000 entries on chemical, electrochemical, and mechanical properties, covering nearly all physical, chemical, and electrochemical properties relevant to inorganic solid electrolytes.
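One practical guard against numerical hallucinations is to check every extracted number against the passage it came from. The sketch below assumes a hypothetical output format, a JSON array of objects with `material`, `value` (copied verbatim from the text) and `unit` fields. It normalizes each value to S/cm, flags numbers that do not occur in the source as standalone tokens, and applies an assumed plausibility window of 1e-12 to 1 S/cm. The field names, unit table, and window are illustrative assumptions, not the extraction pipeline used in this study.

```python
import json
import re
from dataclasses import dataclass

# Assumed conversion factors to S/cm (1 S/m = 0.01 S/cm).
TO_S_PER_CM = {
    "S/cm": 1.0,
    "mS/cm": 1e-3,
    "\u00b5S/cm": 1e-6,  # micro sign
    "\u03bcS/cm": 1e-6,  # Greek small letter mu
    "uS/cm": 1e-6,
    "S/m": 1e-2,
    "mS/m": 1e-5,
}

# Assumed plausibility window for inorganic solid electrolytes, in S/cm.
SIGMA_MIN, SIGMA_MAX = 1e-12, 1.0


@dataclass
class ConductivityRecord:
    material: str
    sigma_s_per_cm: float
    grounded: bool   # the quoted number occurs as a standalone token in the source
    plausible: bool  # the converted value lies inside the assumed window


def is_grounded(value_text: str, source_text: str) -> bool:
    """True if value_text appears in source_text, not embedded in a longer number or formula."""
    pattern = rf"(?<![\w.]){re.escape(value_text)}(?!\d|\.\d)"
    return re.search(pattern, source_text) is not None


def validate_extraction(raw_json: str, source_text: str) -> list[ConductivityRecord]:
    """Parse a JSON array of {"material", "value", "unit"} objects and check each entry."""
    records = []
    for item in json.loads(raw_json):
        unit = item["unit"].strip()
        if unit not in TO_S_PER_CM:
            raise ValueError(f"unsupported unit: {unit!r}")
        value_text = item["value"].strip()
        sigma = float(value_text) * TO_S_PER_CM[unit]
        records.append(ConductivityRecord(
            material=item["material"],
            sigma_s_per_cm=sigma,
            grounded=is_grounded(value_text, source_text),
            plausible=SIGMA_MIN <= sigma <= SIGMA_MAX,
        ))
    return records
```

A flagged record is not necessarily wrong (the paper may print 1.2 × 10^-3 while the model returns 1.2e-3), so ungrounded or implausible entries are better routed to manual review than discarded.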
This indicates that the application of large language models in scientific research has shifted from assisting research to actively advancing it. The datasets presented in this paper can be accessed at https://cmpdc.iphy.ac.cn/literature/SSE.html (DOI: https://doi.org/10.57760/sciencedb.j00213.00172).
Keywords:
- Solid-State Battery
- Artificial Intelligence
- Natural Language Processing
- Materials Database