
JIANG Haikun, WANG Jinhong. Discussion on the Importance of the Features for the Judgement of Earthquake Sequence Types Applicable to Machine Learning[J]. Journal of Seismological Research, 2023, 46(02): 155-172. [doi:10.20015/j.cnki.ISSN1000-0666.2023.0034]

Discussion on the Importance of the Features for the Judgement of Earthquake Sequence Types Applicable to Machine Learning

Journal of Seismological Research [ISSN: 1000-0666 / CN: 53-1062/P]

Volume:
46
Issue:
2023, No. 02
Pages:
155-172
Publication date:
2023-03-10

Article Info

Title:
Discussion on the Importance of the Features for the Judgement of Earthquake Sequence Types Applicable to Machine Learning
Author(s):
JIANG Haikun¹, WANG Jinhong²
(1. China Earthquake Networks Center, Beijing 100045, China; 2. Institute of Earthquake Forecasting, China Earthquake Administration, Beijing 100036, China)
Keywords:
earthquake sequence type; machine learning; feature; mutual information
CLC number:
P315.7
DOI:
10.20015/j.cnki.ISSN1000-0666.2023.0034
Abstract:
Based on the earthquake catalog, the earthquake sequence catalog, and historical focal mechanism data for the Chinese mainland and adjacent regions from 1970 to 2021, and drawing on previous research and on practice in post-earthquake trend prediction, we constructed a feature sample dataset, derived from seismic observation data, for judging earthquake sequence types by machine learning. Following the classification of earthquake sequences, three sample labels were set: multiple-mainshock (multiplet) type, mainshock-aftershock type, and isolated-earthquake type. We preliminarily proposed 44 candidate features for machine-learning-based judgement of sequence type, including mainshock and focal-mechanism-related parameters, historical earthquake sequence types, sequence-decay and G-R-relation-related parameters, and magnitude- and frequency-related parameters. Starting from these 44 candidates, more features can be derived by varying parameters such as the threshold magnitude and the statistical time window. The importance of each feature for sequence classification was evaluated from the association between features and labels, measured by their mutual information. Overall, magnitude-related parameters, G-R-relation and sequence-decay-related parameters, historical earthquake sequence types, and focal-mechanism-related parameters all contribute to sequence classification; among them, the mutual information between the magnitude-related features and the labels is markedly larger, and their ranking is stable. Filling in missing feature values not only increases the number of samples available for model training and testing but also markedly strengthens the association between features and sequence types, suggesting that appropriate data preprocessing may, to some extent, improve the ability of features to classify sequences. Adding interaction features derived from the original data is an important way to expand the number of available features: non-independent features show a stronger association with sequence labels after interaction processing. This suggests that feature selection should be based on a comprehensive evaluation of the final model's predictive performance, and that the independence of feature parameters should not be overemphasized.
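The abstract ranks features by the mutual information between each feature and the sequence-type label, and reports that interaction features can show a stronger association with the label than their components. The following is a minimal Python sketch of that idea, assuming discrete (already binned) features and a simple count-based (plug-in) estimator. The estimator, the feature names `a`, `b`, `a_x_b`, and the toy data are hypothetical illustrations, not the authors' dataset or implementation.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(feature; label) in bits for two discrete sequences."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (feature name, MI in bits) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data (not from the paper): two binary features whose
# combination, but neither one alone, determines the sequence-type label.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
labels = ["isolated", "mainshock-aftershock", "mainshock-aftershock", "isolated"]

features = {
    "a": a,
    "b": b,
    "a_x_b": list(zip(a, b)),  # interaction feature: the (a, b) pair as one category
}

if __name__ == "__main__":
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f} bits")
```

Run as a script, this prints `a_x_b: 1.000 bits`, then `a: 0.000 bits` and `b: 0.000 bits`: neither feature alone carries information about the label, while their interaction carries one bit. Continuous features such as magnitudes would first need binning, or a nearest-neighbour estimator such as scikit-learn's `sklearn.feature_selection.mutual_info_classif`; which estimator the paper used is not stated in this abstract.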


Memo:
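Received: 2022-09-19.
Funding: Open Fund of the State Key Laboratory of Earthquake Dynamics (LED2022B05).
About the first author: JIANG Haikun (born 1964), Research Professor, Ph.D., works mainly on aftershock statistics, aftershock mechanisms, and aftershock forecasting. E-mail: jianghaikun@seis.ac.cn.
Last update: 2023-03-10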