[1] JIANG Haikun, WANG Jinhong. Discussion on the Importance of the Features for the Judgement of Earthquake Sequence Types Applicable to Machine Learning[J]. Journal of Seismological Research, 2023, 46(02): 155-172. [doi:10.20015/j.cnki.ISSN1000-0666.2023.0034]
Journal of Seismological Research [ISSN 1000-0666/CN 53-1062/P]
Volume:
46
Issue:
2023 02
Page number:
155-172
Column:
Publication date:
2023-03-10
- Title:
-
Discussion on the Importance of the Features for the Judgement of Earthquake Sequence Types Applicable to Machine Learning
- Author(s):
-
JIANG Haikun1; WANG Jinhong2
-
(1. China Earthquake Networks Center, Beijing 100045, China) (2. Institute of Earthquake Forecasting, China Earthquake Administration, Beijing 100036, China)
-
- Keywords:
-
earthquake sequence type; machine learning; feature; mutual information
- CLC:
-
P315.7
- DOI:
-
10.20015/j.cnki.ISSN1000-0666.2023.0034
- Abstract:
-
Based on the catalog and focal mechanisms of earthquakes in the Chinese mainland since 1970, and drawing on previous research and practice in estimating aftershock activity tendency, a feature sample dataset for machine-learning judgement of earthquake sequence types has been constructed. Three labels (multiplet mainshocks type, mainshock-aftershock type, and isolated earthquake type) have been assigned according to the earthquake sequences. Forty-four candidate features that can be used in machine learning for sequence type judgement are preliminarily proposed, including mainshock and focal-mechanism-related parameters, historical earthquake sequence types, sequence decay and G-R relationship-related parameters, and magnitude- and frequency-related parameters. Starting from these 44 candidate features, more features can be derived by varying the threshold magnitude or the statistical period. The importance of each feature, i.e. its contribution to sequence classification, is evaluated from the mutual information between features and labels. In summary, the magnitude-related parameters, G-R relationship, sequence-decay-related parameters, historical earthquake sequence type, and focal-mechanism-related parameters all contribute to sequence classification. In particular, the mutual information between magnitude-related parameters and labels is clearly large and its ranking is stable. Our results show that filling in missing features not only increases the number of samples available for model training and testing, but also significantly improves the correlation between features and labels, which means that appropriate preprocessing of features may improve sequence classification to a certain extent. Adding interaction features built from the original data is one important way to expand the number of available features. In this paper, independent features show a stronger correlation with sequence labels after interaction processing, which suggests that feature selection should be based on the estimated performance of the final model and that feature independence should not be overemphasized.
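The mutual-information ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes features have already been discretized into bins, and the feature names (`magnitude_bin`, `noise`, `a`, `b`) and all toy values are hypothetical. It computes the empirical mutual information between each feature and the sequence-type label, ranks the features, and shows how an interaction feature can carry label information that its components lack individually.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, labels):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = {name: mutual_information(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy labels: M = multiplet, A = mainshock-aftershock, I = isolated.
labels = ["M", "A", "I", "M", "A", "I"]
features = {
    "magnitude_bin": [2, 1, 0, 2, 1, 0],  # hypothetical, perfectly informative
    "noise": [0, 0, 0, 1, 1, 1],          # hypothetical, independent of the label
}
ranking = rank_features(features, labels)
print(ranking)

# Interaction example: a and b are each uninformative about y = a XOR b,
# but the paired feature (a, b) determines y completely.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [0, 1, 1, 0]
print(mutual_information(a, y), mutual_information(list(zip(a, b)), y))
```

Continuous features such as magnitude differences would need binning (or a nearest-neighbour estimator) before this count-based estimate applies; the choice of bins is itself an assumption that affects the ranking.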