A Method for Constructing High-Quality Text Mining Datasets in Materials Science

Liu Yue; Liu Da-Hui; Ge Xian-Yuan; Yang Zheng-Wei; Ma Shu-Chang; Zou Zhe-Yi; Shi Si-Qi

doi:10.7498/aps.72.20222316

Abstract

The scientific literature holds a large body of historical data and empirical knowledge, stored as text, that is valuable for materials design and development. Text mining can explore and exploit this information efficiently, but the difficulty of acquiring high-quality textual data has hindered its wider application in materials science. Herein, we analyze the quality issues of textual data in the materials domain, together with the related research, from the two perspectives of data quality and data quantity, and we propose a pipeline for constructing high-quality datasets for text mining in materials science. The pipeline uses a traceable automatic literature acquisition scheme so that every piece of textual data can be traced to its source. The literature is then preprocessed in a way driven by the downstream task, which improves the quality of the pre-annotation corpus. On this basis, we define a general annotation scheme, derived from the materials science tetrahedron and applicable across material systems, for high-quality labeling of the corpus. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is built to expand the quantity of textual data. Experiments on datasets covering different material systems show that the method effectively improves the prediction accuracy of downstream text mining models; on the named entity recognition task for NASICON-type solid electrolyte materials, the F1-score reaches 84%. This study provides theoretical guidance and a practical solution for applying text mining more deeply in materials science, and is expected to advance materials design and development driven bidirectionally by data and knowledge.

Authors and contacts

1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
2. School of Materials Science and Engineering, Shanghai University, Shanghai 200444, China
3. Materials Genome Institute, Shanghai University, Shanghai 200444, China
4. Shanghai Engineering Research Center of Intelligent Computing System, Shanghai 200444, China
5. School of Materials Science and Engineering, Xiangtan University, Xiangtan 411105, China

Corresponding author: Shi Si-Qi, sqshi@shu.edu.cn

Funds: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFB3802101), and the National Natural Science Foundation of China (Grant Nos. 92270124, 52073169, 52102313).

References

[1]	Gupta T, Zaki M, Krishnan N M A, Mausam 2022 npj Comput. Mater. 8 102
[2]	Olivetti E A, Cole J M, Kim E, Kononova O, Ceder G, Han T Y J, Hiszpanski A M 2020 Appl. Phys. Rev. 7 041317
[3]	Venugopal V, Sahoo S, Zaki M, Agarwal M, Gosvami N N, Krishnan N M A 2021 Patterns 2 100290
[4]	Kononova O, He T, Huo H, Trewartha A, Olivetti E A, Ceder G 2021 iScience 24 102155
[5]	Kim E, Huang K, Saunders A, McCallum A, Ceder G, Olivetti E 2017 Chem. Mater. 29 9436
[6]	Mysore S, Jensen Z, Kim E, Huang K, Chang H S, Strubell E, Flanigan J, McCallum A, Olivetti E 2019 Proceedings of the 13th Linguistic Annotation Workshop Florence, Italy, August 1, 2019 p56
[7]	Tshitoyan V, Dagdelen J, Weston L, Dunn A, Rong Z, Kononova O, Persson K A, Ceder G, Jain A 2019 Nature 571 95
[8]	Vaucher A C, Zipoli F, Geluykens J, Nair V H, Schwaller P, Laino T 2020 Nat. Commun. 11 3601
[9]	Nie Z, Zheng S, Liu Y, Chen Z, Li S, Lei K, Pan F 2022 Adv. Funct. Mater. 32 2201437
[10]	Wang W R, Jiang X, Tian S H, Liu P, Dang D P, Su Y J, Lookman T, Xie J X 2022 npj Comput. Mater. 8 9
[11]	Weston L, Tshitoyan V, Dagdelen J, Kononova O, Trewartha A, Persson K A, Ceder G, Jain A 2019 J. Chem. Inf. Model. 59 3692
[12]	Friedrich A, Adel H, Tomazic F, Hingerl J, Benteau R, Maruscyk A, Lange L 2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics Seattle, Washington, July 5–10, 2020 p1255
[13]	He T, Sun W, Huo H, Kononova O, Rong Z, Tshitoyan V, Botari T, Ceder G 2020 Chem. Mater. 32 7861
[14]	Beal M S, Hayden B E, Le Gall T, Lee C E, Lu X, Mirsaneh M, Mormiche C, Pasero D, Smith D C, Weld A, Yada C, Yokoishi S 2011 ACS Comb. Sci. 13 375
[15]	Rajan A C, Mishra A, Satsangi S, Vaish R, Mizuseki H, Lee K R, Singh A K 2018 Chem. Mater. 30 4031
[16]	Liu Y, Zou X X, Yang Z W, Shi S Q 2022 J. Chin. Ceram. Soc. 50 863 (in Chinese)
[17]	Zhao K L, Jin X L, Wang Y Z 2021 J. Software 32 349 (in Chinese)
[18]	Wei J, Zou K 2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing Hong Kong, China, November 3–7, 2019 p6382
[19]	Morris J X, Lifland E, Yoo J Y, Grigsby J, Jin D, Qi Y 2020 Proceedings of the 2020 EMNLP (Systems Demonstrations) Punta Cana, Dominican Republic, November 16–20, 2020 p119
[20]	Malandrakis N, Shen M, Goyal A, Gao S, Sethi A, Metallinou A 2019 Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019) Hong Kong, China, November 4, 2019 p90
[21]	Wu X, Lü S W, Zang L J, Han J Z, Hu S L 2019 Computational Science–ICCS 2019 (Cham: Springer Nature Switzerland AG) p84
[22]	Kumar V, Choudhary A, Cho E 2021 arXiv:2003.02245 [cs.CL]
[23]	Xu X, Lei Y, Li Z 2020 IEEE Trans. Ind. Electron. 67 2326
[24]	Shinyama Y https://euske.github.io/pdfminer/ [2022-11-20]
[25]	Jessop D M, Adams S E, Willighagen E L, Hawizy L, Murray-Rust P 2011 J. Cheminf. 3 41
[26]	Hawizy L, Jessop D M, Adams N, Murray-Rust P 2011 J. Cheminf. 3 17
[27]	Swain M C, Cole J M 2016 J. Chem. Inf. Model. 56 1894
[28]	Sun C C 2009 J. Pharm. Sci. 98 1671
[29]	Armstrong S, Church K, Isabelle P, Manzi S, Tzoukermann E, Yarowsky D 1999 Natural Language Processing Using Very Large Corpora (Berlin: Springer) pp157–176
[30]	Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V 2019 arXiv:1907.11692 [cs.CL]
[31]	Chen S, Wu C, Shen L, Zhu C, Huang Y, Xi K, Maier J, Yu Y 2017 Adv. Mater. 29 1700431
[32]	Xiao R J, Li H, Chen L Q 2018 Acta Phys. Sin. 67 128801 (in Chinese)
[33]	Liu Y, Ge X Y, Yang Z W, Sun S Y, Liu D H, Avdeev M, Shi S Q 2022 J. Power Sources 545 231946

Fig. 1. Pipeline for constructing high-quality datasets for materials text mining.

Fig. 2. Schematic of data and process traceability for the literature.

Fig. 3. Workflow for annotating entities and relations.

Fig. 4. Augmentation of materials text data based on cDA-DK.

Fig. 5. Comparison of sample statistics of the two datasets: (a) distribution of the number of triplets; (b) distribution of sentence length.

Fig. 6. Confusion matrices of the NER model on different datasets: (a) Dataset 1; (b) Dataset 2.

Fig. 7. Training and validation loss curves of MatBERT-BiLSTM-CRF on different datasets: (a) Dataset 1; (b) Dataset 2; (c) Dataset 4; (d) Dataset 5.

Fig. 8. Some of the descriptors that are critical for predicting activation energy; dashed lines indicate potential descriptors that have not yet been studied [33].

Table 1. Comparison of methods for acquiring materials science text corpora.

Acquisition method	Database	Document type	Access	Number of documents	Reference
Index database API	CAplus	Papers, patents, reports	Subscription	Few	www.cas.org/support/documentation/references
	DOAJ	Papers	Partial subscription	Few	doaj.org
	PubMed Central	Papers	Open access	Relatively few	www.ncbi.nlm.nih.gov/pmc
	Science Direct	Papers	Subscription	Few	dev.elsevier.com/api_docs.html
	Scopus	Abstracts	Open access	Relatively few	dev.elsevier.com/api_docs.html
	Springer Nature	Papers, books	Subscription	Few	dev.springernature.com/
Web crawler	Web pages	Papers, patents, reports, books	Open access	Many	requests.readthedocs.io, crummy.com/software/BeautifulSoup
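
Whichever acquisition route in Table 1 is used, the pipeline requires that each piece of text remain traceable to its source. The following is a minimal sketch of what a per-document provenance record could look like. It is not the authors' implementation: the `ProvenanceRecord` fields, the helper names, and the example DOI and URL are all assumptions made for illustration, and no network access is performed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal metadata for tracing a text sample to its source."""
    doi: str
    source_url: str
    acquisition_method: str  # e.g. "index database API" or "web crawler"
    retrieved_at: str        # ISO 8601 timestamp (UTC)
    sha256: str              # fingerprint of the raw bytes as downloaded


def make_record(doi, source_url, acquisition_method, raw_bytes, retrieved_at=None):
    """Build a provenance record for one downloaded document."""
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc).isoformat()
    digest = hashlib.sha256(raw_bytes).hexdigest()
    return ProvenanceRecord(doi, source_url, acquisition_method, retrieved_at, digest)


def is_unmodified(record, raw_bytes):
    """Check that stored bytes still match the fingerprint taken at download time."""
    return hashlib.sha256(raw_bytes).hexdigest() == record.sha256


# Illustrative values only; the DOI and URL are placeholders.
page = b"<html><body>Example article text</body></html>"
record = make_record(
    doi="10.0000/example",
    source_url="https://example.org/article",
    acquisition_method="web crawler",
    raw_bytes=page,
    retrieved_at="2022-11-20T00:00:00+00:00",
)
print(json.dumps(asdict(record), indent=2))
```

Storing the hash alongside the URL and timestamp means that later processing steps can verify they are working on the same bytes that were originally retrieved.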

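Table 2. Common natural language processing tools in chemistry and materials science.

Name	Scope	Open source	Version iteration	Functional completeness	Difficulty	User-friendliness
OSCAR4 [25]	Chemical reactions and biochemistry	Yes	Fast	Medium	Moderate	Medium
ChemicalTagger [26]	Chemical synthesis actions and conditions	Yes	Slow	Medium	Moderate	Medium
ChemDataExtractor [27]	General chemistry and materials science	Yes	Fast	High	Easy	High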

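Table 3. Comparison of entity label definitions in previous materials text mining research.

Source	Goal	Number of labels	Label categories	Domain	Application examples
Weston et al. [11]	Link the latest research results in materials science with the historical literature	7	Inorganic material, phase structure, descriptor, property, application, synthesis method, characterization method	Inorganic materials	Target material retrieval, literature search and summarization, meta-information analysis
He et al. [13]	Mine precursor information from the literature on inorganic solid-state synthesis reactions	3	Material, synthesis precursor, target compound	Inorganic solid-state synthesis reactions	Data mining of solid-state synthesis precursors, meta-information analysis
Friedrich et al. [12]	Annotate information related to SOFC experiments in scientific publications	4 (SOFC), 17 (SOFC-slot)	Experiment, material, value, application, etc.	Battery materials	Building a scientific corpus on SOFCs and using it for several experiment information extraction tasks
Wang et al. [10]	Automatically mine from the literature the high-quality, reliable data required by data-driven materials design models	6	Element, alloy named entity, composition content, property descriptor, property value, other	Alloy materials	Prediction of the $\gamma'$ phase solvus temperature of Co-based single-crystal superalloys
Nie et al. [9]	Build a semantic representation framework to explore potential cathode materials for lithium-ion batteries	3	Inorganic material, lithium-ion battery cathode material, property descriptor	Battery materials	Design and optimization of new lithium-ion battery cathode materials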

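Table 4. The definition of materials entity types in the general domain.

Entity label	Definition	Examples
Composition	Content related to chemical formulas; content describing amounts or concentrations within a material, etc.	NaCl, CaCl₂; Na concentration, Electrons charge carriers.
Structure	Crystal structures; phases; names used to characterize crystal structure, etc.	Fcc, Phase; Bottleneck, Channel, Path.
Property	Measurable quantities that carry units; qualitative properties or phenomena exhibited by a material; nouns describing the physical/chemical behavior or physical/chemical mechanisms of a material, etc.	Conductivity, Activation, Radius; Ferroelectric, Metallic; Phase transition, Ionic reaction.
Processing	Material synthesis techniques or processing routes; material modification methods, etc.	Solid state reaction, Annealing; Doping.
Characterization	Any experiment, theory, model, or formula used to characterize a material, etc.	XRD, STM, Photoluminescence, DFT; Bethe-Salpeter equation.
Application	Any high-level application; any specific device or system, etc.	Cathode, Photovoltaics; Battery Management System.
Feature	Special descriptions of sample type or shape.	Single crystal, Bulk, nanotube, Quantum dot.
Condition	The environment or external conditions of the material.	980 °C, 1000 MPa.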

Table 5. The definition of materials relation types in the general domain.

Relation label (A to B)	Definition	Entity type pairs that may hold this relation
Cause-Effect	A affects B	Property-Property, Composition-Structure, Structure-Property, ...
Component-Whole	A is a part of B	Composition-Composition, ...
Feature-Of	A is a feature of B	Feature-Composition, Feature-Application, ...
Located-Of	A occupies the position B	Composition-Structure, ...
Instance-Of	A is an instance of B	Composition-Composition, Structure-Structure, Property-Property, ...
Condition-On	The condition of A is B	Processing-Condition, ...
Method-Of	A is characterized by method B	Property-Characterization, ...
Other	A and B have a relation other than those listed above	-
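
The entity and relation labels in Tables 4 and 5 together define what an annotated triple looks like. The sketch below shows one way such triples could be represented and sanity-checked in code. It is an illustration only: the `Triple` class and `check_triple` function are hypothetical, and because Table 5 leaves its lists of entity type pairs open-ended ("..."), a pair that is not listed is reported as "unlisted" rather than rejected.

```python
from dataclasses import dataclass

# The eight entity labels of Table 4.
ENTITY_TYPES = {
    "Composition", "Structure", "Property", "Processing",
    "Characterization", "Application", "Feature", "Condition",
}

# Only the (A type, B type) pairs that Table 5 lists explicitly.
LISTED_PAIRS = {
    "Cause-Effect": {("Property", "Property"), ("Composition", "Structure"),
                     ("Structure", "Property")},
    "Component-Whole": {("Composition", "Composition")},
    "Feature-Of": {("Feature", "Composition"), ("Feature", "Application")},
    "Located-Of": {("Composition", "Structure")},
    "Instance-Of": {("Composition", "Composition"), ("Structure", "Structure"),
                    ("Property", "Property")},
    "Condition-On": {("Processing", "Condition")},
    "Method-Of": {("Property", "Characterization")},
    "Other": set(),
}


@dataclass(frozen=True)
class Triple:
    """One annotated relation: entity A, relation label, entity B."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


def check_triple(triple):
    """Return 'invalid', 'listed' or 'unlisted' for an annotated triple."""
    if (triple.relation not in LISTED_PAIRS
            or triple.head_type not in ENTITY_TYPES
            or triple.tail_type not in ENTITY_TYPES):
        return "invalid"  # unknown relation or entity label
    if (triple.head_type, triple.tail_type) in LISTED_PAIRS[triple.relation]:
        return "listed"
    return "unlisted"  # well-formed, but not among the pairs shown in Table 5


# Illustrative triple, not taken from the annotated dataset.
example = Triple("activation energy", "Property", "Cause-Effect",
                 "ionic conductivity", "Property")
print(check_triple(example))
```

A check of this kind could be run over annotation exports to flag label typos early, while leaving unusual but legitimate pairs for a human reviewer.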

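Table 6. Comparison of common tools for text annotation.

Annotation tool	Supported tasks	Text requirements	Role and permission management	Difficulty	User-friendliness	Extensibility	Reference
Label Studio	Multimodal annotation	Strict	Incomplete	Moderate	Medium	Medium	labelstud.io
Brat	Relation annotation	Moderate	Complete	Moderate	Medium	Low	github.com/nlplab/brat
Doccano	Text classification	Strict	Fairly complete	Moderate	Low	Low	github.com/doccano
EasyData	Entity and relation annotation	Moderate	Complete	Easy	High	High	ai.baidu.com/easydata/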

Algorithm 1  Data augmentation method cDA-DK
Input: original dataset $D_{\mathrm{train}} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$; pretrained language model $P_{\mathrm{DistilRoBERTa}}$; materials domain dictionary $C = \{w_1, w_2, \dots, w_m\}$
Output: augmented dataset $D_{\mathrm{synthetic}}$
1: begin
2: for $w_i \in C$ do
3: 　　add $w_i$ to the vocabulary of $P_{\mathrm{DistilRoBERTa}}$ and train its word embedding
4: fine-tune $P_{\mathrm{DistilRoBERTa}}$ on the downstream text data augmentation task to obtain $F_{\mathrm{DistilRoBERTa}}$
5: initialize $D_{\mathrm{synthetic}} = \{\}$
6: for $(x_i, y_i) \in D_{\mathrm{train}}$ do
7: 　　$(\hat{x}_i, \hat{y}_i) = F_{\mathrm{DistilRoBERTa}}(x_i, y_i)$ // generate a new sample
8: 　　$D_{\mathrm{synthetic}} = D_{\mathrm{synthetic}} \cup \{(\hat{x}_i, \hat{y}_i)\}$ // add the generated sample to the augmented dataset
9: end
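
The control flow of Algorithm 1 can be sketched in plain Python as follows. This is a structural illustration under strong assumptions, not the cDA-DK model itself: the fine-tuned DistilRoBERTa generator of steps 4 and 7 is replaced by a hypothetical toy generator that substitutes tokens from a lookup table, and step 3 only extends a vocabulary list without training any embeddings. All function names are invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[str], List[str]]  # (tokens x_i, BIO labels y_i)
Generator = Callable[[List[str], List[str]], Sample]


def extend_vocabulary(vocabulary: List[str], domain_dictionary: List[str]) -> List[str]:
    """Steps 2-3 (simplified): append dictionary words missing from the vocabulary.

    Training the embeddings of the new words is omitted here.
    """
    seen = set(vocabulary)
    extended = list(vocabulary)
    for word in domain_dictionary:
        if word not in seen:
            extended.append(word)
            seen.add(word)
    return extended


def augment(train_set: List[Sample], generate: Generator) -> List[Sample]:
    """Steps 5-8: generate one synthetic sample per original sample."""
    synthetic: List[Sample] = []
    for tokens, labels in train_set:
        new_tokens, new_labels = generate(tokens, labels)
        if len(new_tokens) != len(new_labels):
            raise ValueError("generated tokens and labels must stay aligned")
        synthetic.append((new_tokens, new_labels))
    return synthetic


def make_toy_generator(substitutions: Dict[str, str]) -> Generator:
    """Stand-in for the fine-tuned model: swaps tokens, keeps labels unchanged."""
    def generate(tokens: List[str], labels: List[str]) -> Sample:
        return [substitutions.get(t, t) for t in tokens], list(labels)
    return generate


vocab = extend_vocabulary(["the", "energy"], ["NASICON", "energy"])
train = [(
    ["The", "ionic", "conductivity", "decreases", "with",
     "increasing", "activation", "energy", "."],
    ["O", "B-Property", "I-Property", "O", "O",
     "O", "B-Property", "I-Property", "O"],
)]
toy = make_toy_generator({"ionic": "electrode", "activation": "electric"})
synthetic = augment(train, toy)
print(" ".join(synthetic[0][0]))
```

In a real setting the `generate` callable would wrap a conditional language model; passing it in as a parameter keeps the loop independent of any particular model library.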

Table 7. Comparison of the NASICON dataset with the CoNLL-2004 dataset.

Dataset	Samples	Entity types	Entities	Relation types	Relations
CoNLL-2004	1,441	4	5,347	5	2,020
NASICON	2,434	8	4,857	8	2,297

Table 8. Comparison of samples before and after augmentation of NASICON dataset.

Dataset	Samples	Entities	Relations	Example
Original dataset	2434	4857	2297	The (O) ionic (B-Property) conductivity (I-Property) decreases (O) with (O) increasing (O) activation (B-Property) energy (I-Property) . (O)
cDA-DK augmented dataset	4846	9714	4594	The (O) electrode (B-Property) conductivity (I-Property) decreases (O) with (O) increasing (O) electric (B-Property) energy (I-Property) . (O)
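
The examples in Table 8 use a token-level BIO scheme, in which `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a small illustration of how such a line could be decoded back into entity spans, here is a sketch; the function names are hypothetical, and the choice to ignore an `I-` tag that does not continue an open entity of the same type is an assumption of this sketch rather than something the source specifies.

```python
import re

# Matches "token (LABEL)" pairs as printed in Table 8.
PAIR_RE = re.compile(r"(\S+) \(([^()\s]+)\)")


def parse_annotated(text):
    """Split an annotated line into (token, label) pairs."""
    return PAIR_RE.findall(text)


def bio_to_entities(pairs):
    """Group BIO-labelled tokens into (entity text, entity type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in pairs:
        if label.startswith("B-"):
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag with no matching open entity: close any span.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


original = ("The (O) ionic (B-Property) conductivity (I-Property) decreases (O) "
            "with (O) increasing (O) activation (B-Property) energy (I-Property) . (O)")
print(bio_to_entities(parse_annotated(original)))
```

Applied to the original sample in Table 8, this yields two `Property` entities, "ionic conductivity" and "activation energy".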

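Table 9. The details of experimental datasets.

Dataset name	Application domain	Alias	Samples	Corpus size	Source
NASICON entity recognition dataset	NASICON-type solid electrolytes	Dataset 1	2,434	55 papers	Annotated by domain experts
		Dataset 2	2,434	-	Data augmentation
		Dataset 3	305	35 papers	Annotated by non-experts
Matscholar [11]	Inorganic materials	Dataset 4	5,459	800 abstracts	Annotated by domain experts
Matscholar [11]	Inorganic materials	Dataset 5	5,459	-	Data augmentation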

Table 10. The results of NER model on various materials datasets.

Dataset	Material category	Samples	Precision	Recall	F1-score
Dataset 1	NASICON-type solid electrolytes	2,434	0.78	0.83	0.80
Dataset 2		2,434	0.68	0.72	0.70
Dataset 2+3		2,739	0.83	0.85	0.84
Dataset 4	Inorganic materials	5,459	0.86	0.90	0.88
Dataset 5	Inorganic materials	5,459	0.75	0.78	0.77
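
The F1-score column in Table 10 is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. The short sketch below recomputes it from the rounded precision and recall values in the table. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit; the sketch therefore only shows the agreement to within 0.01 and makes no claim about how the reported scores were averaged over entity types.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, precision, recall, reported F1) copied from Table 10.
ROWS = [
    ("Dataset 1", 0.78, 0.83, 0.80),
    ("Dataset 2", 0.68, 0.72, 0.70),
    ("Dataset 2+3", 0.83, 0.85, 0.84),
    ("Dataset 4", 0.86, 0.90, 0.88),
    ("Dataset 5", 0.75, 0.78, 0.77),
]

for name, precision, recall, reported in ROWS:
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported:.2f}")
```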

