-
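Modern seismology can be traced to the rigorous theory of elasticity established and developed in the early 19th century; the invention of the seismograph in the late 19th century then began to supply observational data for probing the Earth's interior. Since the 1970s, as the spatial and temporal density of seismic observations has grown, computing resources have improved, and the era of seismic big data has arrived, research has broadened and deepened in seismic wave propagation, source physics, earthquake prediction, seismicity, seismotectonics and other fields, and nearly 200 years of development have brought great progress. Nevertheless, the complexity of Earth structure, the indirect nature of geophysical observations and the complexity of numerical tools pose major challenges to geophysical research. Combining data-driven and model-driven approaches helps to tackle hard problems in seismology and to accelerate the generation of new knowledge. In this context machine learning is a particularly suitable choice (Bergen et al., 2019; Kong et al., 2019; Reichstein et al., 2019). Its use in seismology keeps expanding, and in recent years the number of published papers that apply machine learning to seismological research has risen sharply (Figure 1).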
Figure 1. Yearly number of scientific papers on machine learning in seismology (Scopus database)
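Machine learning (ML) methods learn from data on the basis of probabilistic models. They can extract complex patterns and useful features from data streams and have the potential to infer new characteristics and mechanisms, which improves the ability to predict a system's behavior; ML is thus a practical tool that complements earlier data-analysis methods. The two main categories of ML applications in seismology are supervised learning and unsupervised learning (Figure 2). Supervised learning builds predictive models from labeled features and is further divided into classification or regression according to whether the output is discrete or continuous. For example, a supervised classifier can automatically extract features from waveform records that have been labeled as earthquake events and use them to detect new events. Unsupervised learning is suited to uncovering hidden information: it groups objects by similarity (clustering) or reduces the dimensionality of the input data (dimensionality reduction), and is commonly used in data mining, pattern recognition and image processing. ML also includes semi-supervised learning, which learns from labeled and unlabeled data sets together (Murphy, 2012; Goodfellow et al., 2016; Draelos et al., 2018). Each algorithm has its own strengths and weaknesses.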
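ML methods have achieved initial success in microseismic detection (Giacco et al., 2009; Mousavi et al., 2016; Chen, 2020), earthquake classification and location (Sick, 2015; Li et al., 2018; Perol et al., 2018; Titos et al., 2018), earthquake prediction (Asim et al., 2017; Rouet-Leduc et al., 2017; Mousavi and Beroza, 2020), earthquake early warning (Li et al., 2016), seismic exploration (Xia et al., 2018), slow-slip event detection (Provost et al., 2017), tomography (Araya-Polo et al., 2018) and strong ground motion prediction (Trugman and Shearer, 2018). Even so, how to obtain, from label-poor data and under existing spatio-temporal constraints, the patterns that best couple with geophysical processes remains an open problem. This paper introduces several commonly used supervised and unsupervised learning algorithms, outlines the current state of their application in seismological research, and discusses future directions for ML.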
-
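Supervised learning uses a labeled training set to learn the relationship between data features and labels, and then infers labels for new, unlabeled data in order to make predictions. It is currently one of the more widely applied families of ML methods, having developed from early linear regression and logistic regression (LR), through the naive Bayes algorithm and the support-vector machine (SVM), to random forest (RF) and neural networks (NN). Among these, neural networks, SVM and RF have already shown practical value in solid Earth geophysics.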
-
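Logistic regression, proposed by the statistician Cox (1958), classifies a dependent variable with two or more categories. A regression model converts the predicted value of a sample's attributes into a probability in [0, 1], which gives the relative likelihood that the sample belongs to a given class. It is in essence a generalized linear model: the regression parameters are estimated by maximum likelihood, the optimum is found with a classical numerical optimizer such as gradient descent, and the estimated class probabilities are then used to predict the class of each sample (Press and Wilson, 1978; Hosmer and Lemeshow, 1989; LeCessie, 1992; Kleinbaum and Klein, 2010). For multi-class problems LR generalizes to Softmax regression. Given a training set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ with feature vectors $x^{(i)} \in \mathbb{R}^{n+1}$ and class labels $y^{(i)} \in \{1, 2, \ldots, k\}$, the Softmax regression model is (Böhning, 1992; Bishop, 2006):
${h_\theta }({x^{(i)}}) = \left[ {\begin{array}{*{20}{c}} {p({y^{(i)}} = 1|{x^{(i)}};\theta)}\\ {p({y^{(i)}} = 2|{x^{(i)}};\theta)}\\ \vdots \\ {p({y^{(i)}} = k|{x^{(i)}};\theta)} \end{array}} \right] = \dfrac{1}{{\displaystyle\sum\nolimits_{j = 1}^k {{{\rm{e}}^{\theta _j^{\rm{T}}{x^{(i)}}}}} }}\left[ {\begin{array}{*{20}{c}} {{{\rm{e}}^{\theta _1^{\rm{T}}{x^{(i)}}}}}\\ {{{\rm{e}}^{\theta _2^{\rm{T}}{x^{(i)}}}}}\\ \vdots \\ {{{\rm{e}}^{\theta _k^{\rm{T}}{x^{(i)}}}}} \end{array}} \right]$
(1)
where $\theta = {\left[ {\theta _1^{\rm{T}},\theta _2^{\rm{T}}, \ldots,\theta _k^{\rm{T}}} \right]^{\rm{T}}}$ are the model parameters. The Softmax loss function is built from the negative log-likelihood of θ plus a weight-decay term $\dfrac{\lambda }{2}\displaystyle\sum\nolimits_{i = 1}^k {\displaystyle\sum\nolimits_{j = 0}^n } {\theta _{ij}^2}$ (λ > 0), and is then minimized (Böhning, 1992; Bishop, 2006):
$J(\theta) = - \dfrac{1}{m}\left[ {\displaystyle\sum\nolimits_{i = 1}^m {\displaystyle\sum\nolimits_{j = 1}^k {1\{ {y^{(i)}} = j\} \log \dfrac{{{{\rm{e}}^{\theta _j^{\rm{T}}{x^{(i)}}}}}}{{\displaystyle\sum\nolimits_{l = 1}^k {{{\rm{e}}^{\theta _l^{\rm{T}}{x^{(i)}}}}} }}} } } \right] + \dfrac{\lambda }{2}\displaystyle\sum\nolimits_{i = 1}^k {\displaystyle\sum\nolimits_{j = 0}^n {\theta _{ij}^2} }$
(2)
The gradient of the loss with respect to $\theta_j$ is:
$\dfrac{{\partial J(\theta)}}{{\partial {\theta _j}}} = - \dfrac{1}{m}\displaystyle\sum\nolimits_{i = 1}^m {\left[ {{x^{(i)}}\left( {1\{ {y^{(i)}} = j\} - p({y^{(i)}} = j|{x^{(i)}};\theta)} \right)} \right]} + \lambda {\theta _j}$
(3)
where 1{•} is the indicator function, equal to 1 when its argument is true and 0 otherwise. With λ > 0 the weight-decay term makes J(θ) a smooth, strictly convex function of θ, so classical numerical optimizers such as gradient descent are guaranteed to converge to the global optimum. With gradient descent, the update at iteration t+1 is:
$\theta _j^{t + 1} = \theta _j^t - \alpha \dfrac{{\partial J(\theta)}}{{\partial {\theta _j}}}\;\;\;\left({j = 1,2, \ldots,k} \right)$
(4)
where $\alpha$ is the learning rate, which controls the step size. The LR classifier is a relatively simple ML method, and the correctness of its results depends on how strongly the feature set is associated with the classes. When the feature space is built, choosing features that correlate strongly with the classes improves the sensitivity of the classifier. Seismic wave polarization (Kaur et al., 2013) and frequency content (Rabin, 2016) are two commonly used feature attributes. Reynen and Audet (2017) extracted polarization and frequency attributes from seismic network records in southern California to build a feature space, used an LR classifier to separate records of natural earthquakes, blasts and noise, and then associated the classification probabilities of different signals across stations, which extended the method from event classification to earthquake detection.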
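In general, a logistic regression model is less complex than learning methods such as artificial neural networks and support-vector machines with nonlinear kernels, so it is often used as a baseline for assessing other classifiers (Yilmaz, 2010; Mousavi et al., 2016; Maceda et al., 2018). Xu et al. (2012) compared bivariate statistics, logistic regression, artificial neural network and SVM models for landslide susceptibility assessment in the Fujiang River basin, Pingwu County, Sichuan Province, and found logistic regression to be relatively robust to the choice of hyperparameters. The example shows that landslide susceptibility models are modulated by many factors, including earthquakes, topography, geology and environment, and it adds to the body of work comparing the performance of different learning methods.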
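The Softmax model, its regularized gradient and the gradient-descent update of equations (1), (3) and (4) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the toy feature values, the two class names, the learning rate and the weight-decay value are assumptions made here and do not come from any of the cited studies. Classes are indexed from 0 rather than 1 to follow Python conventions.

```python
import math

def softmax(scores):
    """Turn raw class scores theta_j^T x into probabilities (equation 1)."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(theta, x):
    """theta holds k weight vectors; x is a feature vector with a leading 1 (bias)."""
    return softmax([sum(w * xi for w, xi in zip(theta_j, x)) for theta_j in theta])

def gradient_step(theta, X, y, alpha=0.5, lam=1e-3):
    """One gradient-descent update (equations 3 and 4) with weight decay lam."""
    m, k, n = len(X), len(theta), len(X[0])
    # Start from the weight-decay part of the gradient, lam * theta_j.
    grad = [[lam * theta[j][d] for d in range(n)] for j in range(k)]
    for x, label in zip(X, y):
        p = predict_proba(theta, x)
        for j in range(k):
            err = (1.0 if label == j else 0.0) - p[j]
            for d in range(n):
                grad[j][d] -= err * x[d] / m
    return [[theta[j][d] - alpha * grad[j][d] for d in range(n)] for j in range(k)]

# Hypothetical toy features per waveform: [bias, frequency attribute, polarization attribute].
X = [[1.0, 0.2, 0.9], [1.0, 0.3, 0.8], [1.0, 0.9, 0.2], [1.0, 0.8, 0.1]]
y = [0, 0, 1, 1]  # 0 = earthquake, 1 = noise (illustrative labels only)
theta = [[0.0] * 3 for _ in range(2)]
for _ in range(500):
    theta = gradient_step(theta, X, y)
predictions = [max(range(2), key=lambda j: predict_proba(theta, x)[j]) for x in X]
```

Because the regularized loss is convex, the result does not depend on the all-zero starting point; only the step size and number of iterations affect how closely the optimum is approached.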
-
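Support-vector machine classification is a supervised learning algorithm. When the input samples are linearly separable, or only a few samples cannot be separated, SVM divides the data with a hyperplane obtained by hard-margin maximization or by soft-margin maximization with slack variables. When the samples are not linearly separable, a separating hyperplane cannot be drawn directly, and a kernel function is introduced to handle the nonlinear classification problem (Cortes and Vapnik, 1995; Vapnik, 1995). SVM was originally designed for the classical two-class problem: given samples of two classes, find the optimal separating surface (hyperplane) that maximizes the margin (Vapnik, 1998; Cristianini and Shaw-Taylor, 2000). The training points closest to the hyperplane are called support vectors, and the sum of the distances from two support vectors of opposite classes to the hyperplane is called the margin. Finding the maximum margin can be cast as an inequality-constrained optimization problem and solved with Lagrange multipliers. For training samples $x_i$ (i = 1, 2, …, N) with class labels $y_i = \pm 1$, the objective can be written as (Vapnik, 1995; Cristianini and Shaw-Taylor, 2000):
${\min _{{\omega },b,\xi }}\dfrac{1}{2}{\left| {\left| {\omega } \right|} \right|^2} + C\sum\nolimits_{i = 1}^N {{\xi _i}} $
(5)
subject to:
${y_i}\left({{{\omega }^{\rm{T}}}{{x}_i} + b} \right) \ge 1 - {\xi _i},\;i = 1,2, \ldots,N$
(6)
${\xi _i} \ge 0,\;\;i = 1,2, \ldots,N$
(7)
where ω and b are the normal vector and intercept of the hyperplane and determine its position; ω has the same dimension as x. The $\xi_i$ are non-negative slack variables that relax the constraints, and C is the penalty weight on the $\xi_i$. Using Lagrange duality, the primal problem is converted to its dual, in which the following expression is maximized (Vapnik, 1995; Cristianini and Shaw-Taylor, 2000):
$L\left(\alpha \right) = \sum\nolimits_{i = 1}^N {{\alpha _i}} - \dfrac{1}{2}\sum\nolimits_{i = 1}^N {\sum\nolimits_{j = 1}^N {{\alpha _i}} } {\alpha _j}{y_i}{y_j}{x}_i^{\rm{T}}{{x}_j}$
(8)
subject to:
$\sum\nolimits_{i = 1}^N {{y_i}} {\alpha _i} = 0,\;\;0 \le {\alpha _i} \le C$
(9)
where ${\alpha _i}$ (i = 1, 2, …, N) are the Lagrange multipliers. If the samples are not linearly separable in the original space, they can be mapped into a higher-dimensional feature space in which they become linearly separable. Let Φ(x) denote the image of the original data x in that space, and introduce the kernel function (Vapnik, 1995; Cristianini and Shaw-Taylor, 2000; Cracknell and Reading, 2013; Shahnas et al., 2018):
$K\left({{{x}_i},{{x}_j}} \right) = \varPhi {\left({{{x}_i}} \right)^{\rm{T}}}\varPhi \left({{{x}_j}} \right)$
(10)
The kernel function $K(x_i, x_j)$ evaluates the inner product in the high-dimensional feature space directly from the original inputs, so data that are not linearly separable in the input space can be separated without computing Φ explicitly. The main kernels are (Vapnik, 1995; Cristianini and Shaw-Taylor, 2000; Cracknell and Reading, 2013; Shahnas et al., 2018):
Linear kernel:
$K\left({{{x}_i},{{x}_j}} \right) = {x}_i^{\rm{T}}{{x}_j}$
(11)
Polynomial kernel:
$K\left({{{x}_i},{{x}_j}} \right) = {\left({\gamma {x}_i^{\rm{T}}{{x}_j} + r} \right)^d},\gamma > 0$
(12)
Radial basis function kernel:
$K\left({{{x}_i},{{x}_j}} \right) = \exp \left({ - \gamma {{\left| {\left| {{{x}_i} - {{x}_j}} \right|} \right|}^2}} \right)$
(13)
Sigmoid kernel:
$K\left({{{x}_i},{{x}_j}} \right) = \tanh \left({\gamma {x}_i^{\rm{T}}{{x}_j} + r} \right)$
(14)
where γ, r and d are kernel parameters that must be set by the user.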
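Because the SVM decision hyperplane is determined by only a small number of support vectors, the algorithm depends less on the amount of data. SVM has been extended from the original two-class setting to multi-class problems (Masotti et al., 2006; Giacco et al., 2009; Malfante et al., 2018; Tang et al., 2020). For example, Giacco et al. (2009) trained a multilayer perceptron (MLP) and an SVM, both supervised, to discriminate landslide, explosion and volcanic microtremor signals, and found that the SVM discriminated each signal class slightly better than a two-layer MLP. SVM training is a convex quadratic optimization, so the optimal separating surface can be found; the method has been widely applied to continuous waveform detection (Ruano et al., 2014), rapid magnitude estimation (Reddy and Nair, 2013; Ochoa et al., 2018), focal depth determination (Gutierrez et al., 2018) and seismic risk assessment (Cheng et al., 2014).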
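As a small illustration, the four kernels of equations (11) to (14) and the dual objective of equation (8) can be written directly in standard-library Python. The default parameter values and the two-point example are assumptions chosen here for readability; a practical SVM would also need a quadratic-programming solver to maximize the dual under the constraints of equation (9), which this sketch does not include.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(xi, xj):
    return dot(xi, xj)                                   # equation (11)

def polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=2):
    return (gamma * dot(xi, xj) + r) ** d                # equation (12)

def rbf_kernel(xi, xj, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)                    # equation (13)

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)            # equation (14)

def dual_objective(alpha, X, y, kernel):
    """L(alpha) of equation (8), with x_i^T x_j replaced by K(x_i, x_j)."""
    n = len(X)
    pair_sum = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
                   for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * pair_sum

# Two hypothetical samples of opposite class.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, -1]
value = dual_objective([1.0, 1.0], X, y, linear_kernel)
```

Swapping `linear_kernel` for `rbf_kernel` in the last call is all that is needed to evaluate the same dual in the implicit feature space, which is the practical meaning of the kernel substitution.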
-
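Random forest, proposed by Breiman (2001), is an algorithm based on ensemble learning and random subspaces. Its basic unit is the decision tree, and it handles both classification and regression. A decision tree is a logic-based supervised ML method: from unordered, unstructured data it derives classification rules in tree form, building the classifier by top-down recursion (Safavian and Landgrebe, 1991). This simple, fast method has difficulty representing complex concepts, which limits the performance of a single tree. An RF contains many trained decision trees (the forest) and learns complex structures and relationships by voting among them; this overcomes the limitation of a single tree and gives strong performance on high-dimensional classification and regression problems (Ho, 1998; Cracknell and Reading, 2013; Fernández-Delgado et al., 2014).
Over the past decade or so, RF has been applied to seismic event classification, inference of fault physical properties and ground motion prediction. Rubin et al. (2012) tried 12 ML algorithms, including RF, to identify avalanches in seismic waveform records from the mountains outside Davos, Switzerland; seven of the classifiers, among them RF, Bayesian, artificial neural network and SVM, reached accuracies above 90%. Hibert et al. (2017) used waveform records from the Piton de la Fournaise volcano and trained an RF classifier on 60 features drawn from four attribute groups (waveform, spectrum, pseudo-spectrogram and polarization) to distinguish volcano-tectonic earthquakes from rockfalls. Their evaluation indicated that the RF classifier reaches a correct identification rate of 99% when the training set exceeds 300 samples, and above 90% even when samples are scarce. Trugman and Shearer (2018) used RF to learn the relationship between peak ground acceleration and dynamic stress drop in the San Francisco Bay Area, and on that basis built a nonparametric ground motion prediction equation for local small-to-moderate earthquakes. Rouet-Leduc et al. (2017) extracted 1 000 statistical features from continuous acoustic-emission time series of a laboratory fault and built a random forest model to predict the time to failure, which opened a new direction for fault physics.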
-
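An artificial neural network (ANN) is a network of a very large number of neurons connected in parallel according to some topology; it can process and store imprecise and fuzzy data (Patyra and Kwon, 1993; Kros et al., 2006). An artificial neuron is a simplified model of a biological neuron: a single neuron is a feed-forward nonlinear element with multiple inputs and one output. Neurons in an intermediate layer form a weighted linear combination of the signals from the previous layer, apply a nonlinear function and pass the result to the next layer; different connection schemes yield different networks. By topology, ANNs are divided into feedback and feed-forward networks, and the most representative feed-forward network is the error back-propagation (BP) neural network (Rumelhart et al., 1986; Rumelhart et al., 1988; Werbos, 1990; Werbos, 1994). A BP network is a three-layer feed-forward network composed of an input layer, a hidden layer and an output layer (Haykin, 1998; Duda et al., 2000). Each neuron processes its input x with an activation function. The most commonly used activation function is the Sigmoid function, which can be written in exponential form as (Haykin, 1998; Duda et al., 2000):
$S\left(x \right) = \dfrac{1}{{1 + {{\rm{e}}^{ - x}}}}$
(15)
For a three-layer BP network with one hidden layer, let the numbers of units in the input, hidden and output layers be n, l and m. The input is $X = [x_1, x_2, \ldots, x_n]^{\rm T}$, the activation thresholds of the hidden neurons are $\theta = [\theta_1, \theta_2, \ldots, \theta_l]^{\rm T}$, the activation thresholds of the output neurons are $b = [b_1, b_2, \ldots, b_m]^{\rm T}$, and the weight matrix connecting the input and hidden layers is:
${{\omega }^{\rm{h}}} = \left[ {\begin{array}{*{20}{c}} {\omega _{11}^{\rm{h}}}&{\omega _{12}^{\rm{h}}}&{\omega _{13}^{\rm{h}}}& \cdots &{\omega _{1n}^{\rm{h}}}\\ {\omega _{21}^{\rm{h}}}&{\omega _{22}^{\rm{h}}}&{\omega _{23}^{\rm{h}}}& \cdots &{\omega _{2n}^{\rm{h}}}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {\omega _{l1}^{\rm{h}}}&{\omega _{l2}^{\rm{h}}}&{\omega _{l3}^{\rm{h}}}& \cdots &{\omega _{ln}^{\rm{h}}} \end{array}} \right]$
(16)
The weight matrix connecting the hidden and output layers is:
${{\omega }^{\rm{o}}} = \left[ {\begin{array}{*{20}{c}} {\omega _{11}^{\rm{o}}}&{\omega _{12}^{\rm{o}}}&{\omega _{13}^{\rm{o}}}& \cdots &{\omega _{1l}^{\rm{o}}}\\ {\omega _{21}^{\rm{o}}}&{\omega _{22}^{\rm{o}}}&{\omega _{23}^{\rm{o}}}& \cdots &{\omega _{2l}^{\rm{o}}}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {\omega _{m1}^{\rm{o}}}&{\omega _{m2}^{\rm{o}}}&{\omega _{m3}^{\rm{o}}}& \cdots &{\omega _{ml}^{\rm{o}}} \end{array}} \right]$
(17)
The hidden-layer output H and the output-layer output Z of the network are then (Haykin, 1998; Duda et al., 2000):
${H} = S({{\omega }^{\rm{h}}} \cdot {X} + {\theta })$
(18)
${Z} = S({{\omega }^{\rm{o}}} \cdot {H} + {b})$
(19)
The n input nodes correspond to n features extracted from the seismic waveform, and the m output units correspond to classes. Learning has two stages. The first is forward propagation, in which the signal enters the network and passes through the hidden layer to the output. The second is back propagation, in which the output-layer error is propagated backward layer by layer to obtain the error of each hidden layer, and the weights of the preceding layers are corrected accordingly. Minimizing the error brings the network output progressively closer to the expected value. Commonly used optimizers currently include adaptive moment estimation (Adam) and stochastic gradient descent (SGD) (Van der Baan and Jutten, 2000; Poulton, 2002; Kingma and Ba, 2015).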
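The forward pass of equations (18) and (19) can be sketched as follows, using the Sigmoid activation of equation (15). The layer sizes (n = 3, l = 2, m = 1) and all weight and threshold values are hypothetical numbers chosen only to make the arithmetic easy to follow; the back-propagation stage is not shown.

```python
import math

def sigmoid(v):
    """Equation (15)."""
    return 1.0 / (1.0 + math.exp(-v))

def layer(weights, inputs, thresholds):
    """Apply S(W . inputs + thresholds), one row of W per neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + t)
            for row, t in zip(weights, thresholds)]

def forward(X, w_h, theta, w_o, b):
    H = layer(w_h, X, theta)  # hidden-layer output, equation (18)
    Z = layer(w_o, H, b)      # output-layer output, equation (19)
    return H, Z

# Hypothetical network: 3 input features, 2 hidden neurons, 1 output unit.
X = [0.5, -1.0, 2.0]
w_h = [[0.1, 0.2, 0.3],   # l x n matrix, as in equation (16)
       [0.0, 0.0, 0.0]]
theta = [0.0, 0.0]
w_o = [[1.0, -1.0]]       # m x l matrix, as in equation (17)
b = [0.0]
H, Z = forward(X, w_h, theta, w_o, b)
```

The second hidden neuron has all-zero weights, so its output is exactly S(0) = 0.5, which gives a quick sanity check on the implementation.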
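ANNs were introduced into seismology as early as the 1990s, for seismic event classification (Dowla et al., 1990; Dysart and Pulli, 1990; Fedorenko et al., 1999; Ursino et al., 2001; Esposito et al., 2006; Ait Laasri et al., 2013), seismic event detection (Wang and Teng, 1995; Zhao and Takano, 1999; Gentili and Michelini, 2006), earthquake early warning (Böse et al., 2008; Kong et al., 2016), earthquake prediction (Sharma and Arora, 2005; Asencio-Cortés et al., 2017), peak ground acceleration estimation (García et al., 2006; Alavi and Gandomi, 2011) and velocity model inversion (Moya and Irikura, 2010). As neural network architectures in computer science have steadily improved, ANN-based applications have attracted growing attention. Murat and Rudman (1992) used an ANN to identify seismic first arrivals against background noise: four amplitude-based waveform attributes were fed through a hidden layer of 10 nodes to output a scalar indicating whether the arrival is genuine (1 for a true first arrival, 0 for a false one). They found that a feature vector combining the four attributes captured the complex mapping between input and output better than any single attribute. Paitz et al. (2018) used an ANN to identify time series suitable for noise imaging; they found that the identification accuracy decreased as the number of training samples was reduced, and showed that the ANN outperformed manual inspection. Asencio-Cortés et al. (2017) applied an ANN to predicting the magnitude of moderate-to-large earthquakes within a seven-day window and obtained better predictive performance than K-nearest neighbor (KNN), SVM, naive Bayes and decision tree algorithms. For earthquake early warning, Kong et al. (2016) used an ANN to distinguish earthquake signals from human-activity noise in the built-in accelerometers of smartphones, and proposed a new approach to early warning based on a smartphone platform.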
-
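Unsupervised learning looks for hidden structure in unlabeled data and describes the complex patterns and relationships it contains; applied at large scale, it can reveal hidden features in the data. Its most common applications are clustering and dimensionality reduction, which differ in emphasis: clustering groups objects by the similarity of their intrinsic attributes, whereas dimensionality reduction uses a mathematical transformation to map the original high-dimensional data into a low-dimensional subspace and analyzes its distribution there. Clustering algorithms can be broadly divided into hierarchical, partitional, density-based, grid-based and other methods; dimensionality reduction can be linear or nonlinear. Several unsupervised techniques commonly used in seismology are introduced here.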
-
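Cluster analysis is a common data-mining technique that turns a set of samples into a set of classes. Data in the same class have similar attribute values, while data in different classes differ markedly, which helps reveal the regularities and patterns hidden in complex phenomena. The most commonly used techniques are partition-based clustering algorithms; in addition, the rapid development of ANNs has produced an intelligent clustering method, the self-organizing map (SOM).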
-
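For a given data set, partition clustering computes the probability that each sample belongs to each cluster, uses these probabilities as weights to estimate the mean and variance of each cluster and hence the maximum likelihood of the clustering, and iterates the likelihood computation until it converges to an optimum, at which point each sample is assigned to its cluster (Jain et al., 1999; McLachlan and Krishnan, 2007; Kalyani, 2013). Partition clustering is simple in principle and converges quickly, but the quality of the result depends strongly on the choice of initial cluster centers. Representative algorithms include k-means, k-modes, fuzzy clustering and graph clustering (Xu and Tian, 2015). ML rose to prominence by solving image-recognition problems efficiently, but identifying hidden features in complex seismic wavefields is harder, and supervised algorithms are constrained by their dependence on labels; classical partition clustering is therefore a flexible choice when labels are lacking. Galvis et al. (2017) applied similarity-based seismic attribute fusion followed by k-means clustering to explosive-source seismic data from Colombia and identified body waves, fundamental-mode surface waves and higher-mode surface waves. Chen (2020) used fuzzy clustering to partition microseismic records into waveform and non-waveform samples and took the first sample index of the waveform group as the phase arrival time; within the fuzzy clustering framework, combining three features (mean amplitude, power, and the short-term to long-term average energy ratio, STA/LTA) gave better picks than any single feature. Xia et al. (2018) proposed an intelligent surface-wave identification method for raw single-shot records. It takes the three salient attributes of surface waves (low frequency, low velocity and strong amplitude) as features and uses k-means clustering to separate body waves, surface waves and noise, and it remained stable in tests on data containing noise and bad traces.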
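A minimal k-means sketch (Lloyd's algorithm) makes the assign-then-update loop of partition clustering concrete. The feature values below are hypothetical numbers standing in for per-trace attributes such as frequency and apparent velocity; they are not taken from Galvis et al. (2017) or Xia et al. (2018). Initial centers are passed in explicitly, which also makes the dependence on initialization mentioned above easy to experiment with.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Alternate nearest-center assignment and centroid update until labels stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical per-trace features: [dominant frequency (Hz), apparent velocity (km/s)].
traces = [[8.0, 0.4], [9.0, 0.5], [10.0, 0.45],
          [40.0, 3.0], [42.0, 3.2], [38.0, 2.9]]
labels, centers = kmeans(traces, centers=[traces[0], traces[3]])
```

In practice the features would be standardized first, since unscaled attributes with large numeric ranges dominate the Euclidean distance.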
-
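The self-organizing map is an ANN-based clustering method. By adaptively changing the network parameters and structure, it maps all points of a high-dimensional source space onto a low-dimensional target space and so provides a visual classification (Kohonen, 1982; Kohonen and Somervuo, 2002). In the low-dimensional topological map, each neuron corresponds to one category. Maurer et al. (1992) first introduced SOM into the analysis of seismic data, and applications followed for natural earthquake data (Musil and Plešinger, 1996; Tarvainen, 1999; Plešinger et al., 2000; Essenreiter et al., 2001; Esposito et al., 2008; Köhler et al., 2009; Mojarab et al., 2014; Spampinato et al., 2019) and active-source data (Klose, 2006; De Matos et al., 2007; Köhler et al., 2010; Esposito et al., 2013). Köhler et al. (2010) used SOM to detect rockfall and volcanic events in continuous waveforms, which opened the way to more detailed cluster analysis. Sick et al. (2015) combined principal component analysis (PCA) (Jollife, 1986) with nearest-neighbor SOM classification to identify the type (quarry blast or natural earthquake) and depth of seismic events in the Atacama Desert of northern Chile. PCA was used to extract 80 principal components from the waveform spectral feature space to form the feature vectors, and the data were randomly split into training and test sets in 1 000 rounds of two-fold cross-validation (Kohavi, 1995); the resulting SOM model identified events of different types and depths with relatively high accuracy (80.5%). Roden et al. (2015) used PCA to select, from a large number of seismic attributes, those that best reflect changes in geological conditions, and trained an SOM on them to identify different geological bodies. The trained SOM maps attribute values onto a two-dimensional color image that directly displays subtle geological features and anomalous zones, which improves the ability to characterize geological bodies from multiple seismic attributes.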
-
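PCA is a very typical dimensionality-reduction method in machine learning. It uses a linear transformation to convert a set of correlated variables into a set of linearly uncorrelated composite variables (principal components), keeps the components that adequately represent the information in the original data and discards those that carry little information; this increases the sampling density of the samples and makes learning easier (Jollife, 1986). Section 2.1.2 of this paper has already described applications of the combined PCA and SOM approach to identifying seismic event type and depth and to identifying geological bodies. Dictionary learning (DL) is another dimensionality-reduction method. It originates in compressed sensing and seeks a special set of sparse elements in the original data that can represent the original signal adequately as a linear combination; it has been widely used for image denoising, classification and imaging (Engan et al., 1999; Elad and Aharon, 2006; Bobin et al., 2008, 2013; Sadeghi et al., 2013; Beckouche and Ma, 2014; Akhtar and Mian, 2018). DL provides a very flexible framework that builds a sparse representation adaptively from the seismic data themselves, and its most common use in seismology is noise removal (Hinton and Salakhutdinov, 2006; Zhu et al., 2015; Chen et al., 2016; Zhu et al., 2017; Wang and Ma, 2019).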
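To show what "keeping the leading principal component" means in practice, the sketch below centers a small data set, forms its covariance matrix and finds the dominant eigenvector by power iteration. Power iteration is a choice made here to stay within the standard library (a numerical library's eigendecomposition or SVD would normally be used), and the four data points are hypothetical.

```python
import math

def first_principal_component(data, iterations=200):
    """Return the leading principal direction and the projection of each sample on it."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # fixed start vector keeps the sketch deterministic
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Hypothetical two-attribute samples lying close to the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction, scores = first_principal_component(data)
```

Replacing each two-dimensional sample by its single entry in `scores` is the dimensionality reduction: almost all of the variance of these points lies along `direction`.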
-
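A traditional artificial neural network has a shallow structure. With growing computing power and the arrival of big data, the concept of the deep neural network (DNN) was introduced. Hinton and Salakhutdinov (2006) proposed building deep networks from many hidden layers so as to extract abstract attributes and features, which helps with classification, regression, visualization and high-dimensional modeling. Deep neural networks are a breakthrough in machine learning and opened the era of deep learning. The convolutional neural network (CNN) is a commonly used deep network. Its design principles of sparse interaction, weight sharing and downsampling reduce the burden of feature construction and speed up computation, and it has been widely applied in image processing (Lecun et al., 1998; Krizhevsky et al., 2012; Martínez-Alvarez et al., 2015; He et al., 2016), perception tasks (Sermanet et al., 2013; Zhao et al., 2015), semantic understanding and speech recognition (Abdel-Hamid et al., 2014). CNN-based deep networks have shown superior performance in high-level tasks such as automatically extracting abstract spatio-temporal features, optimizing classification and predicting the behavior of complex systems.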
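The two CNN ingredients named above, weight sharing and downsampling, can be illustrated on a one-dimensional trace with a few lines of standard-library Python. The waveform samples and the two-tap kernel are hypothetical; in a real CNN the kernel weights are learned rather than fixed by hand.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: the same kernel weights are reused at every window (weight sharing)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size=2):
    """Non-overlapping max pooling (downsampling); a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

# Hypothetical trace with a short pulse, and a hand-picked kernel that responds to rising amplitude.
waveform = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]
feature_map = max_pool(relu(conv1d(waveform, edge_kernel)))
```

The kernel has only two parameters regardless of the trace length, and pooling halves the length of the feature map, which is why such layers scale well to long waveforms.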
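Growing computing power and the arrival of big data have driven the surge in deep learning. Classic CNN architectures include LeNet, AlexNet, ZF Net, GoogLeNet, VGGNet, ResNet and DenseNet. Building on the CNN, more complex deep architectures have appeared, such as the deep convolutional neural network (DCNN), the fully convolutional network (FCN) and the U-shaped convolutional network (U-Net). Krizhevsky et al. (2012) trained a DCNN with 650 000 neurons to classify 1 200 000 images into 1 000 classes and won the 2012 ImageNet Large Scale Visual Recognition Challenge with it, which opened the era of DCNN applications in computer vision. Long et al. (2015) then proposed the FCN for semantic image segmentation, which allows a DCNN to perform pixel-level semantic segmentation without fully connected layers. Ronneberger et al. (2015) proposed U-Net for segmenting biomedical images; it can be trained end to end with very little data and yields accurate segmentation. As deep learning has developed rapidly, more and more seismologists use it to explore the essential properties of very large data sets, to capture representations of abstract relationships and to carry out complex prediction tasks.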
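Using a deep network to recognize a target in an image and delineate its boundary closely parallels the problem of identifying the signal type in a time series and finding the phase onset, so deep learning has proved a natural fit for seismic phase picking. Yu et al. (于子叶等, 2018) trained a 17-layer Inception deep network to pick P- and S-wave arrivals of local earthquakes. Training combined several measures to improve generalization: adding noise to the labeled data, regularizing the convolutional-layer outputs, dropout, and a training scheme in which the noise level is stepped. The method outputs phase arrival times directly and is better than traditional picking methods in noise tolerance and stability. Ross et al. (2018) trained a CNN classifier on more than 4.8 million southern California seismograms with manually labeled P-wave arrival times and polarities, and achieved automatic, high-precision measurement of P-wave arrival times and first-motion polarities. Because the convolutional filtering structure of a CNN removes the need for domain-specific, complex feature extraction and model building, seismograms can be fed to the network directly. Zhu and Beroza (2019) were the first to use U-Net to identify P- and S-wave onsets in southern California; they determined arrival times from the output probability distributions with very good results, which provides a new approach to automatic phase picking, especially for S waves. Zhao et al. (赵明等, 2019) developed a U-shaped network algorithm (Unet_cea), trained and tested on 89 344 samples of varying magnitude and signal-to-noise ratio recorded from Wenchuan aftershocks and by the Capital Circle seismic network, and achieved automatic identification and arrival-time picking of Pg and Sg phases. The U-shaped network clearly outperformed STA/LTA and kurtosis-based automatic pickers in hit rate, root-mean-square error and other metrics. Zhou et al. (2019) designed a CNN and a recurrent neural network (RNN) connected in series to pick P- and S-wave onsets in continuous waveforms. An 8-layer CNN first separates earthquake signals from noise samples, and the identified signals are then passed to a two-layer bidirectional RNN that extracts the P and S arrival times, with errors of only -0.03 ± 0.48 s and 0.03 ± 0.56 s, respectively. Liu et al. (刘芳等, 2020) trained an improved U-Net model on data from the Alibaba aftershock-detection AI competition (Fang et al., 2017) and Hi-net (Okada et al., 2004; Obara et al., 2005), constrained the results with array data, and accurately identified earthquakes and extracted arrival times in continuous waveforms.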
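For context on the traditional baseline that these deep-learning pickers are compared against, the following is a minimal sketch of one simple STA/LTA variant: the ratio of short-term to long-term average signal energy, with a threshold trigger. The window lengths, threshold and synthetic trace are assumptions made here for illustration; operational pickers use many variants (recursive averages, different window placement, de-triggering rules) that this sketch does not cover.

```python
def sta_lta(trace, n_sta, n_lta):
    """Short-term over long-term average energy; the first n_lta samples are left at 0."""
    energy = [x * x for x in trace]
    ratio = [0.0] * len(trace)
    for i in range(n_lta, len(trace)):
        sta = sum(energy[i - n_sta + 1:i + 1]) / n_sta  # window ending at sample i
        lta = sum(energy[i - n_lta:i]) / n_lta          # window preceding sample i
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio

def first_trigger(ratio, threshold):
    """Index of the first sample whose ratio reaches the threshold, or None."""
    for i, r in enumerate(ratio):
        if r >= threshold:
            return i
    return None

# Synthetic trace: 20 samples of low-amplitude noise followed by a ten-times larger signal.
trace = [0.1, -0.1] * 10 + [1.0, -1.0] * 5
ratio = sta_lta(trace, n_sta=2, n_lta=10)
onset = first_trigger(ratio, threshold=5.0)
```

On this noise-free synthetic example the trigger falls exactly on the first signal sample (index 20); on real data the pick depends strongly on the window lengths and threshold, which is one of the sensitivities the learned pickers aim to reduce.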
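Deep learning has shown good prospects for discriminating event types and detecting events, with some notable advances. Titos et al. (2018) evaluated five ML algorithms for classifying volcano-seismic events. They first used linear predictive coding (LPC) to compress raw waveforms of seven classes in different frequency bands from the Volcán de Fuego area, Mexico, and built feature vectors; they then trained five classifiers, a stacked denoising autoencoder (SDA), a deep belief network (DBN), SVM, MLP and RF, and found that SDA and DBN gave better precision, recall and F1 score than the traditional SVM, MLP and RF methods. Dokht et al. (2019) trained two CNNs on data from 4 900 earthquakes in western Canada, one on waveform records and one on wavelet energy spectra of first-arrival phases; the former separates earthquake signals from background noise and the latter identifies the phase type and estimates the arrival time, and both reached an average accuracy close to 99%. Mousavi et al. (2019) proposed a waveform recognition technique based on a convolutional autoencoder, which learns features of seismic signals without supervision and uses them to discriminate local from teleseismic events and to determine P-wave first-motion polarity; for event-type identification its performance was comparable to supervised algorithms. Perol et al. (2018) treated earthquakes from different regions as different classes and proposed a CNN for event detection and location; with it they detected more than 17 times as many events in Oklahoma as were in the original catalog. However, classifying by source region dilutes the number of training samples available for each subregion, and the resulting location accuracy was below 80%. Wu et al. (2018) designed a cascade convolutional neural network (CCNN) named DeepDetect to detect earthquakes of different durations; the accuracy of its detections reached 63.8%, far above the 5.5% obtained with template matching.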
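Since classifier comparisons such as the one above are reported in terms of precision, recall and F1 score, a short sketch of how these metrics are computed for one class may be helpful. The labels below are hypothetical and are not results from any cited study.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical catalog labels and classifier output for six windows.
y_true = ["quake", "quake", "quake", "noise", "noise", "noise"]
y_pred = ["quake", "quake", "noise", "quake", "noise", "noise"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="quake")
```

Here one earthquake is missed and one noise window is misclassified, so precision, recall and F1 are all 2/3. For multi-class problems such as the seven volcano-seismic classes, the same computation is repeated per class and then averaged.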
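DNNs support both supervised and unsupervised learning structures and have excellent feature-learning capability. DNNs have also produced a wealth of results in seismic signal denoising (Zhu et al., 2019), source parameter determination (Kriegerowski et al., 2019), tomography (Araya-Polo et al., 2018; Xi and Huang (奚先和黄江清), 2018, 2020), earthquake prediction based on CNN (Geng et al., 2019; Lomax et al., 2019) and RNN models (Asim et al., 2017; Mousavi and Beroza, 2020), seismic lithology prediction (Zhang et al., 2018), and earthquake early warning and hazard assessment (Li et al., 2016).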
-
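Although the complexity of the Earth's deep structure and the exponential growth of observational data pose severe challenges to the analysis of earthquake phenomena, the pursuit of seismological theory and its applications has never stopped. Advances in artificial intelligence and machine learning have strengthened our ability to analyze and interpret complex, nonlinear physical data sets, and the continued development of an "intelligent" seismology grounded in classical seismological theory provides a powerful tool for revealing the physics of Earth structure and earthquake sources.
Emerging ML techniques have achieved notable results in feature extraction and automated processing of rapidly growing data, recognition of typical signal patterns and trend prediction. However, constrained by algorithm choice, data availability and physical interpretability, progress has been uneven across fields. A variety of supervised and unsupervised ML models have made considerable progress in seismic event type identification, phase picking, earthquake location, tomography, fusion and visual analysis of multiple seismic attributes, interpretation of geological anomalies, signal denoising, earthquake forecasting, earthquake early warning and hazard assessment (Hinton and Salakhutdinov, 2006; Asencio-Cortés et al., 2017; Trugman and Shearer, 2018; Dokht et al., 2019; Chen, 2020; Mousavi and Beroza, 2020). The most commonly used models are the neural network architectures that grew out of the ANN, and the most frequent application is the automated processing of seismic waveform time series. If the mature theory and rich experience of seismology guide the practical use of ML, and the power of ML algorithms is in turn used to advance understanding of earthquake causes and mechanisms and to generate new concepts, this complementary combination can be expected to promote intelligent development in many fields.
ML-based research still faces problems and challenges that need to be resolved. Compared with supervised learning, unsupervised techniques do largely escape the constraint of labeled samples, but by design they require some key parameters to be preset from subjective experience or trial results; how to set these parameters objectively needs further study. Joint inversion of seismic, geoelectric, geomagnetic, gravity and other geophysical observations usually allows an integrated interpretation of deep structure (Zhang et al. (张正一等), 2018; Afonso et al., 2019; Contreras-Reyes et al., 2019). In ML, however, different data types differ fundamentally in nature and distribution, so building ML architectures for integrated databases that achieve joint inversion across disciplines is also a major challenge. Deep learning, with its advanced capabilities for learning from data, modeling, inference and prediction, has attracted rapid and broad attention across geophysics and has become a popular choice among ML applications. With further gains in computer processing speed and storage, the accumulation of usable data in the big-data era and continuing advances in the discipline, deep learning has even broader prospects for future application.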
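Acknowledgments
The Peking University course "Guidance on Paper Writing in Geophysical Research" (《地球物理研究的论文写作指导》) provided guidance for completing this review. Professor Zhao Li of Peking University and Research Professor Fang Lihua of the Institute of Geophysics, China Earthquake Administration, offered many suggestions and comments for revision, and the reviewers also provided valuable comments. We thank them all.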
Machine learning and its application in seismology
-
Abstract: Understanding and predicting multi-scale, high-dimensional and nonlinear seismological phenomena is an inherently challenging scientific task. The ever-growing volume of observational data weakens the coupling between data collection and data interpretation, and makes interpretation more abstract and uncertain. The era of big data, however, has also brought machine learning, an artificial-intelligence technology. Its strong capability for extracting implicit relations and handling complex tasks has driven researchers to apply it in an ever wider range of fields. In this article, we introduce machine learning algorithms commonly used in seismology and their applications, and discuss future directions for integrating artificial intelligence with seismic data.
Key words: seismology / machine learning / feature extraction / deep learning / neural network
[1] Abdel-Hamid O, Mohamed A R, Jiang H, et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2014, 22(10): 1533-1545. doi: 10.1109/TASLP.2014.2339736 [2] Afonso J C, Salajegheh F, Szwillus W, et al. A global reference model of the lithosphere and upper mantle from joint inversion and analysis of multiple data sets[J]. Geophysical Journal International, 2019, 217(3): 1602-1628. doi: 10.1093/gji/ggz094 [3] Ait Laasri E H, Akhouayri E S, Agliz D, et al. Seismic signal classification using multi-layer perceptron neural network[J]. International Journal of Computer Applications, 2013, 79(15): 35-43. doi: 10.5120/13821-1950 [4] Akhtar N, Mian A. Nonparametric coupled Bayesian dictionary andclassifier learning for hyperspectral classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(9): 4038-4050. doi: 10.1109/TNNLS.2017.2742528 [5] Alavi A H, Gandomi A H. Prediction of principal groundmotion parameters using a hybrid method coupling artificial neural networks and simulated annealing[J]. Computers and Structures, 2011, 89(23): 2176-2194. [6] Araya-Polo M, Jennings J, Adler A, et al. Deep-learning tomography[J]. The Leading Edge, 2018, 37(1): 58-66. doi: 10.1190/tle37010058.1 [7] Asencio-Cortés G, Martínez-Álvarez F, Troncoso A, et al. Medium-large earthquake magnitude prediction in Tokyo with artificial neural networks[J]. Neural Computing and Applications, 2017, 28: 1043-1055. [8] Asim K M, Martinezalvarez F, Basit A W, et al. Earthquake magnitude prediction in Hindukush region using machine learning techniques[J]. Natural Hazards, 2017, 85(1): 471-486. doi: 10.1007/s11069-016-2579-3 [9] Beckouche S, Ma J. Simultaneous dictionary learning and denoising for seismic data[J]. Geophysics, 2014, 79: A27-A31. doi: 10.1190/geo2013-0382.1 [10] Bergen K J, Johnson P A, de Hoop M V, et al. Machine learning for data-driven discovery in solid Earth geoscience[J]. 
Science, 2019, 363(6433): eaau0323. doi: 10.1126/science.aau0323 [11] Bishop C M. Pattern Recognition and Machine Learning[M]. New York: Springer, 2006: 205-213. [12] Bobin J, Moudden Y, Starck J L, et al. SZ and CMB reconstruction using generalized morphological component analysis[J]. Statistical Methodology, 2008, 5: 307-317. doi: 10.1016/j.stamet.2007.10.003 [13] Bobin J, Starck J-L, Sureau F, et al. Sparse component separation for accurate cosmic microwave background estimation[J]. Astronomy and Astrophysics, 2013, 550, A73. doi: 10.1051/0004-6361/201219781 [14] Böhning D. Multinomial logistic regression algorithm[J]. Annals of the Institute of Statistical Mathematics, 1992, 44(1): 197-200. doi: 10.1007/BF00048682 [15] Böse M, Wenzel F, Erdik M. PreSEIS: a neural network-based approach to earthquake early warning for finite faults[J]. Bulletin of the Seismological Society of America, 2008, 98(1): 366-382. doi: 10.1785/0120070002 [16] Breiman L. Random forests[J]. Machine Learning, 2001, 45(1): 5-32. doi: 10.1023/A:1010933404324 [17] Chen Y K. Automatic microseismic event picking via unsupervised machine learning[J]. Geophysical Journal International, 2020, 222(3): 1750-1764. doi: 10.1093/gji/ggaa186 [18] Chen Y K, Ma J, Fomel S. Double-sparsity dictionary for seismic noise attenuation[J]. Geophysics, 2016, 81(2): V103-V116. doi: 10.1190/geo2014-0525.1 [19] Cheng M Y, Wu Y W, Syu R F. Seismic assessment of bridge diagnostic in Taiwan using the evolutionary support vector machine inference model ESIM[J]. Applied Artificial Intelligence, 2014, 28(5): 449-469. doi: 10.1080/08839514.2014.905818 [20] Contreras-Reyes E, Muñoz-Linford P, Cortés-Rivas V, et al. Structure of the collision zone between the Nazca Ridge and the Peruvian convergent margin: Geodynamic and seismotectonic implications[J]. Tectonics, 2019, 38(9): 3416-3435. doi: 10.1029/2019TC005637 [21] Cortes C, Vapnik V. Support-vector networks[J]. Machine Learning, 1995, 20: 273-297. [22] Cox D R. 
The Regression Analysis of Binary Sequences[J]. Journal of the Royal Statistical Society,Series B (Methodological), 1958, 21(1): 215-232. [23] Cracknell M J, Reading A M. The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines[J]. Geophysics, 2013, 78: WB113-WB126. doi: 10.1190/geo2012-0411.1 [24] Cristianini N, Shaw-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods[M]. New York: Cambridge University Press, 2000. [25] De Matos M, Osorio P, Johann P. Unsupervised seismic facies analysis using wavelet transform and self-organizing maps[J]. Geophysics, 2007, 72: 9-21. [26] Dokht R M, Kao H, Visser R, et al. Seismic event and phase detection using time-frequency representation and convolutional neural networks[J]. Seismological Research Letters, 2019, 90(2A): 481-490. doi: 10.1785/0220180308 [27] Dowla F U, Taylor S R, Anderson R W. Seismic discrimination with artificial neural networks: preliminary results with regional spectral data[J]. Bulletin of the Seismological Society of America, 1990, 80(5): 1346-1373. [28] Draelos T J, Peterson M G, Knox H A, et al. Dynamic tuning of seismic signal detector trigger levels for local networks[J]. Bulletin of the Seismological Society of America, 2018, 108: 1346-1354. doi: 10.1785/0120170200 [29] Duda R O, Hart P E, Stork D G. Pattern Classification[M]. Hoboken, New Jersey, USA:Wiley Interscience, 2000. [30] Dysart P S, Pulli J J. Regional seismic event classification at the NORESS array: Seismological measurements and the use of trained neural networks[J]. Bulletin of the Seismological Society of America, 1990, 80(6B): 1910-1933. [31] Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries[J]. IEEE Transactions on Image Processing, 2006, 15: 3736-3745. doi: 10.1109/TIP.2006.881969 [32] Engan K, Aase S O, Hakon Husoy J. 
Method of optimal directions for frame design[C]// 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP99 (Cat. No.99CH36258), Phoenix, AZ, USA, 1999, 5: 2443-2446. [33] Esposito A M, D'Auria L, Giudicepietro F, et al. Neural analysis of seismic data: Applications to the monitoring of Mt. Vesuvius[J]. Annals of Geophysics, 2013, 56(4), S0446. [34] Esposito A M, Giudicepietro F, D’Auria L, et al. Unsupervised neural analysis of very-long-period events at Stromboli volcano using the self-organizing maps[J]. Bulletin of the Seismological Society of America, 2008, 98(5): 2449-2459. doi: 10.1785/0120070110 [35] Esposito A M, Giudicepietro F, Scarpetta S, et al. Automatic discrimination among landslide, explosion-quake, and microtremor seismic signals at Stromboli volcano using neural networks[J]. Bulletin of the Seismological Society of America, 2006, 96(4A): 1230-1240. doi: 10.1785/0120050097 [36] Essenreiter R, Karrenbach M, Treitel S. Identification and classification of multiple reflections with self-organizing maps[J]. Geophysical Prospecting, 2001, 49(3): 341-352. doi: 10.1046/j.1365-2478.2001.00261.x [37] Fang L H, Wu Z L, Song K. SeismOlympics[J]. Seismological Research Letters, 2017, 88(6):1429-1430. doi: 10.1785/0220170134 [38] Fedorenko Y V, Husebye E S, Ruud B O. Explosion site recognition; neural net discriminator using single three-component stations[J]. Physics of the Earth and Planetary Interiors, 1999, 113(1): 131-142. [39] Fernández-Delgado M, Cernadas E, Barro S, et al. Do we need hundreds of classifiers to solve real world classification problems[J]. Journal of Machine Learning Research, 2014, 15(1): 3133-3181. [40] Galvis I S, Villa Y, Duarte C, et al. Seismic attribute selection and clustering to detect and classify surface waves in multicomponent seismic data by using k-means algorithm[J]. The Leading Edge, 2017, 36: 239-248. doi: 10.1190/tle36030239.1 [41] García S R, Romo M P, Mayoral J M. 
Estimation of peak ground accelerations for Mexican subduction zone earthquakes using neural networks[J]. Geofísica Internacional, 2006, 46(1): 51-63. [42] Geng Y, Su L, Jia Y, et al. Seismic events prediction using deep temporal convolution networks[J]. Journal of Electrical and Computer Engineering, 2019, 2019: 1-14. [43] Gentili S, Michelini A. Automatic picking of P- and S-phases using a neural tree[J]. Journal of Seismology, 2006, 10(1): 39-63. doi: 10.1007/s10950-006-2296-6 [44] Giacco F, Esposito A M, Scarpetta S, et al. Support vector machines and MLP for automatic classification of seismic signals at stromboli volcano[C] // Neural Nets WIRN09 - Proceedings of the 19th Italian Workshop on Neural Nets. Vietri sul Mare, Salerno, Italy: IOS Press, 2009: 116-123. [45] Goodfellow I, Bengio Y, Courville A, et al. Deep Learning[M]. Cambridge, Massachusetts: MIT Press, 2016. [46] Gutierrez L H, Vasquez L F, Jimenez C A. Fast determination of earthquake depth using seismic records of a single station, implementing machine learning techniques[J]. Revista Ingenieria E Investigacion, 2018, 38(2): 97-103. [47] Haykin S. Neural Networks: A Comprehensive Foundation (2nd Edition)[M]. Upper Saddle River: Prentice Hall, 1998. [48] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, 2016: 770-778. [49] Hibert C, Provost F, Malet J-P, et al. Automatic identification of rockfalls and volcano-tectonic earthquakes at the Piton de la Fournaise volcano using a Random Forest algorithm[J]. Journal of Volcanology and Geothermal Research, 2017, 340: 130-142. doi: 10.1016/j.jvolgeores.2017.04.015 [50] Hinton G, Salakhutdinov R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507. doi: 10.1126/science.1127647 [51] Ho T K. The random subspace method for constructing decision forests[J]. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 832–844. doi: 10.1109/34.709601 [52] Hosmer D W, Lemeshow S. Applied Logistic Regression[M]. Hoboken, New Jersey, USA: John Wiley and Sons, Inc., 1989. [53] Jain A K, Murty M N, Flynn P J. Data clustering: A review[J]. ACM Computing Surveys, 1999, 31(3): 264−323. doi: 10.1145/331499.331504 [54] Jollife I. Principal Component Analysis[M]. New York: Springer, 1986. [55] Kalyani P. Approaches to partition medical data using clustering algorithms[J]. International Journal of Computer Applications, 2013, 49(23): 7-10. [56] Kaur K, Wadhwa M, Park E K. Detection and identification of seismic P-waves using artificial neural networks[C]// The 2013 International Joint Conference on Neural Networks (IJCNN). Dallas, TX, USA, 2013: 1-6. [57] Kingma D P, Ba J. Adam: A method for stochastic optimization[C]// Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA. 2015. [58] Kleinbaum D G, Klein M. Logistic Regression: A Self Learning Text[M]. New York: Springer, 2010. [59] Klose C. Self-organizing maps for geoscientific data analysis: geological interpretation of multidimensional geophysical data[J]. Computers and Geosciences, 2006, 10(3): 265-277. doi: 10.1007/s10596-006-9022-x [60] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection[C] // The 14th International Joint Conference on Artificial Intelligence. Montreal: Morgan Kaufmann Publishers Inc, 1995, 14: 1137-1143. [61] Köhler A, Ohrnberger M, Scherbaum F. Unsupervised feature selection and general pattern discovery using Self-Organizing Maps for gaining insights into the nature of seismic wavefields[J]. Computers and Geosciences, 2009, 35(9): 1757-1767. doi: 10.1016/j.cageo.2009.02.004 [62] Köhler A, Ohrnberger M, Scherbaum F. Unsupervised pattern recognition in continuous seismic wavefield records using Self-Organizing Maps[J]. 
Geophysical Journal International, 2010, 182(3): 1619-1630, doi: 10.1111/j.1365-246X.2010.04709.x. [63] Kohonen T. Self-organized formation of topologically correct feature maps[J]. Biological Cybernetics, 1982, 43(1): 59-69. doi: 10.1007/BF00337288 [64] Kohonen T, Somervuo P. How to make large self-organizing maps for nonvectorial data[J]. Neural Networks, 2002, 15(114): 945-952. [65] Kong Q K, Allen R M, Schreier L, et al. MyShake: A smartphone seismic network for earthquake early warning and beyond[J]. Science Advances, 2016, 2: e1501055. doi: 10.1126/sciadv.1501055 [66] Kong Q K, Trugman D T, Ross Z E, et al. Machine learning in seismology: turning data into insights[J]. Seismological Research Letters, 2019, 90(1): 3-14. doi: 10.1785/0220180259 [67] Kriegerowski M, Petersen G M, Vasyura-Bathke H, et al. A deep convolutional neural network for localization of clustered earthquakes based on multistation full waveforms[J]. Seismological Research Letters, 2019, 90(2A): 510-516. doi: 10.1785/0220180320 [68] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, 2012: 1097-1105. [69] Kros J F, Lin M, Brown M L. Effects of the neural network s-Sigmoid function on KDD in the presence of imprecise data[J]. Computers and Operations Research, 2006, 33(11): 3136-3149. doi: 10.1016/j.cor.2005.01.024 [70] LeCessie S, Van Houwelingen J C. Ridge estimators in logistic regression[J]. Applied Statistics, 1992, 41(1): 191-201. doi: 10.2307/2347628 [71] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[C]// Proceedings of the IEEE, 1998, 86(11): 2278-2324. [72] Li Z, Meier M A, Hauksson E, et al. Machine learning seismic wave discrimination: Application to earthquake early warning[J]. Geophysical Research Letters, 2018, 45: 4773-4779. 
doi: 10.1029/2018GL077870 [73] Li Z H, Tian K, Wang F S, et al. Home damage estimation after disasters using crowdsourcing ideas and convolutional neural networks[C]// 5th International Conference on Measurement, Instrumentation and Automation (ICMIA 2016). Shenzhen, 2016: 857-860. [74] 刘芳, 蒋一然, 宁杰远, 等. 结合台阵策略的震相拾取深度学习方法[J]. 科学通报, 2020, 65(11): 1016-1026. doi: 10.1360/TB-2019-0608 Liu F, Jiang Y R, Ning J Y, et al. An array-assisted deep learning approach to seismic phase-picking[J]. Chinese Science Bulletin, 2020, 65(11): 1016-1026 (in Chinese). doi: 10.1360/TB-2019-0608 [75] Lomax A, Michelini A, Jozinovic D. An investigation of rapid earthquake characterization using single-station waveforms and a convolutional neural network[J]. Seismological Research Letters, 2019, 90(2A): 517-529. doi: 10.1785/0220180311 [76] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE Computer Society, 2015: 3431-3440. [77] Maceda L, Llovido J, Satuito A. Categorization of earthquake-related tweets using machine learning approaches[C]// 2018 International Symposium on Computer, Consumer and Control (IS3C). Taichung, Taiwan, 2018: 229-232. [78] Malfante M, Mura M D, Metaxian J, et al. Machine learning for volcano-seismic signals: Challenges and perspectives[J]. IEEE Signal Processing Magazine, 2018, 35(2): 20-30. doi: 10.1109/MSP.2017.2779166 [79] Martínez-Alvarez J J, Garrigós J, Toledo J, et al. A scalable CNN architecture and its application to short exposure stellar images processing on a HPRC[J]. Neurocomputing, 2015, 151: 91-100. doi: 10.1016/j.neucom.2014.09.071 [80] Masotti M, Falsaperla S, Langer H, et al. Application of support vector machine to the classification of volcanic tremor at Etna, Italy[J]. Geophysical Research Letters, 2006, 33: L20304. doi: 10.1029/2006GL027441 [81] Maurer W, Dowla F, Jarpe S.
Seismic event interpretation using self-organizing neural networks[J]. The International Society for Optical Engineering (SPIE), 1992, 1709: 950-958. [82] McLachlan G J, Krishnan T. The EM Algorithm and Extensions, Second Edition[M]. Hoboken, New Jersey, USA: John Wiley and Sons, Inc., 2007: 77-103. [83] Mojarab M, Memarian H, Zare M, et al. Modeling of the seismotectonic provinces of Iran using the self-organizing map algorithm[J]. Computers and Geosciences, 2014, 67: 150-162. doi: 10.1016/j.cageo.2013.12.007 [84] Mousavi S M, Beroza G C. A machine-learning approach for earthquake magnitude estimation[J]. Geophysical Research Letters, 2020, 47: e2019GL085976. [85] Mousavi S M, Horton S P, Langston C A, et al. Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression[J]. Geophysical Journal International, 2016, 207(1): 29-46. doi: 10.1093/gji/ggw258 [86] Mousavi S M, Zhu W Q, Ellsworth W, et al. Unsupervised clustering of seismic signals using deep convolutional autoencoders[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(11): 1693-1697. doi: 10.1109/LGRS.2019.2909218 [87] Moya A, Irikura K. Inversion of a velocity model using artificial neural networks[J]. Computers and Geosciences, 2010, 36(12): 1474-1483. doi: 10.1016/j.cageo.2009.08.010 [88] Murat M E, Rudman A J. Automated first arrival picking: a neural network approach[J]. Geophysical Prospecting, 1992, 40(6): 587-604. doi: 10.1111/j.1365-2478.1992.tb00543.x [89] Murphy K P. Machine Learning: A Probabilistic Perspective[M]. Cambridge, Massachusetts: MIT Press, 2012. [90] Musil M, Plešinger A. Discrimination between local microearthquakes and quarry blasts by multi-layer perceptrons and Kohonen maps[J]. Bulletin of the Seismological Society of America, 1996, 86(4): 1077-1090. [91] Obara K, Kasahara K, Hori S, et al. 
A densely distributed high-sensitivity seismograph network in Japan: Hi-net by National Research Institute for Earth Science and Disaster Prevention[J]. Review of Scientific Instruments, 2005, 76(2): 021301. doi: 10.1063/1.1854197 [92] Ochoa L H, Niño L F, Vargas C A. Fast magnitude determination using a single seismological station record implementing machine learning techniques[J]. Geodesy and Geodynamics, 2018, 9: 34-41. doi: 10.1016/j.geog.2017.03.010 [93] Okada Y, Kasahara K, Hori S, et al. Recent progress of seismic observation networks in Japan—Hi-net, F-net, K-NET and KiK-net—[J]. Earth, Planets and Space, 2004, 56: xv-xxviii. doi: 10.1186/BF03353076 [94] Paitz P, Gokhberg A, Fichtner A. A neural network for noise correlation classification[J]. Geophysical Journal International, 2018, 212(2): 1468-1474. doi: 10.1093/gji/ggx495 [95] Patyra M J, Kwon T M. Processing of incomplete fuzzy data using artificial neural networks[C]// Proceedings of the Second IEEE International Conference on Fuzzy Systems. San Francisco, CA, USA, 1993, 1: 429-434. [96] Perol T, Gharbi M, Denolle M. Convolutional neural network for earthquake detection and location[J]. Science Advances, 2018, 4(2): e1700578. doi: 10.1126/sciadv.1700578 [97] Plešinger A, Růžek B, Boušková A. Statistical interpretation of WEBNET seismograms by artificial neural nets[J]. Studia Geophysica et Geodaetica, 2000, 44(2): 251-271. doi: 10.1023/A:1022119011057 [98] Press S J, Wilson S. Choosing between logistic regression and discriminant analysis[J]. Journal of the American Statistical Association, 1978, 73: 699-705. doi: 10.1080/01621459.1978.10480080 [99] Provost F, Hibert C, Malet J P. Automatic classification of endogenous landslide seismicity using the Random Forest supervised classifier[J]. Geophysical Research Letters, 2017, 44: 113-120. doi: 10.1002/2016GL070709 [100] Poulton M M. Neural networks as an intelligence amplification tool: A review of applications[J]. Geophysics, 2002, 67: 979-993.
doi: 10.1190/1.1484539 [101] Rabin N, Bregman Y, Lindenbaum O, et al. Earthquake-explosion discrimination using diffusion maps[J]. Geophysical Journal International, 2016, 207(3): 1484-1492. doi: 10.1093/gji/ggw348 [102] Reddy R, Nair R R. The efficacy of support vector machines (SVM) in robust determination of earthquake early warning magnitudes in central Japan[J]. Journal of Earth System Science, 2013, 122: 1423-1434. doi: 10.1007/s12040-013-0346-3 [103] Reichstein M, Camps-Valls G, Stevens B, et al. Deep learning and process understanding for data-driven Earth system science[J]. Nature, 2019, 566(7743): 195-204. doi: 10.1038/s41586-019-0912-1 [104] Reynen A, Audet P. Supervised machine learning on a network scale: Application to seismic event classification and detection[J]. Geophysical Journal International, 2017, 210(3): 1394-1409. doi: 10.1093/gji/ggx238 [105] Roden R, Smith T, Sacrey D. Geologic pattern recognition from seismic attributes: Principal component analysis and self-organizing maps[J]. Interpretation, 2015, 3(4): SAE59-SAE83. doi: 10.1190/INT-2015-0037.1 [106] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer, 2015: 234-241. [107] Ross Z E, Meier M, Hauksson E. P-wave arrival picking and first-motion polarity determination with deep learning[J]. Journal of Geophysical Research, 2018, 123(6): 5120-5129. [108] Rouet-Leduc B, Hulbert C, Lubbers N, et al. Machine Learning Predicts Laboratory Earthquakes[J]. Geophysical Research Letters, 2017, 44: 9276-9282. doi: 10.1002/2017GL074677 [109] Ruano A E, Madureira G, Barros O, et al. Seismic detection using support vector machines[J]. Neurocomputing, 2014, 135: 273-283. doi: 10.1016/j.neucom.2013.12.020 [110] Rubin M J, Camp T, Herwijnen A V, et al. 
Automatically detecting avalanche events in passive seismic data[C]// 2012 11th International Conference on Machine Learning and Applications. Boca Raton, FL: IEEE, 2012: 13-20. [111] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323: 533-536. doi: 10.1038/323533a0 [112] Rumelhart D E, Hinton G E, Williams R J. Learning internal representations by error propagation[J]. Readings in Cognitive Science, 1988, 323(6088): 399-421. [113] Sadeghi M, Babaie-Zadeh M, Jutten C. Dictionary learning for sparse representation: A novel approach[J]. IEEE Signal Processing Letters, 2013, 20(12): 1195-1198. doi: 10.1109/LSP.2013.2285218 [114] Safavian S R, Landgrebe D. A survey of decision tree classifier methodology[J]. IEEE Transactions on Systems, Man and Cybernetics, 1991, 21(3): 660-674. doi: 10.1109/21.97458 [115] Sermanet P, Kavukcuoglu K, Chintala S, et al. Pedestrian detection with unsupervised multi-stage feature learning[C]// 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR: IEEE, 2013: 3626-3633. [116] Shahnas M H, Yuen D A, Pysklywec R N. Inverse Problems in Geodynamics Using Machine Learning Algorithms[J]. Journal of Geophysical Research: Solid Earth, 2018, 123: 296-310. doi: 10.1002/2017JB014846 [117] Sharma M L, Arora M K. Prediction of seismicity cycles in the Himalayas using artificial neural networks[J]. Acta Geophysica Polonica, 2005, 53(3): 299-309. [118] Sick B, Guggenmos M, Joswig M. Chances and limits of single-station seismic event clustering by unsupervised pattern recognition[J]. Geophysical Journal International, 2015, 201(3): 1801-1813. doi: 10.1093/gji/ggv126 [119] Spampinato S, Langer H, Messina A, et al. Short-term detection of volcanic unrest at Mt. Etna by means of a multi-station warning system[J]. Scientific Reports, 2019, 9: 6506. doi: 10.1038/s41598-019-42930-3 [120] Tang L, Zhang M, Wen L.
Support vector machine classification of seismic events in the Tianshan orogenic belt[J]. Journal of Geophysical Research: Solid Earth, 2020, 125: e2019JB018132. [121] Tarvainen M. Recognizing explosion sites with a self-organizing network for unsupervised learning[J]. Physics of the Earth and Planetary Interiors, 1999, 113(1-4): 143-154. doi: 10.1016/S0031-9201(99)00019-9 [122] Titos M, Bueno A, García L, et al. A deep neural networks approach to automatic recognition systems for volcano-seismic events[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(5): 1533-1544. doi: 10.1109/JSTARS.2018.2803198 [123] Trugman D T, Shearer P M. Strong correlation between stress drop and peak ground acceleration for recent M1-4 earthquakes in the San Francisco bay area[J]. Bulletin of the Seismological Society of America, 2018, 108(2): 929-945. doi: 10.1785/0120170245 [124] Ursino A, Langer H, Scarfì L, et al. Discrimination of quarry blasts from tectonic microearthquakes in the Hyblean plateau (southeastern Sicily)[J]. Annals of Geophysics, 2001, 44(4): 703-722. [125] Van der Baan M, Jutten C. Neural networks in geophysical applications[J]. Geophysics, 2000, 65(4): 1032-1047. doi: 10.1190/1.1444797 [126] Vapnik V. The Nature of Statistical Learning Theory[M]. New York: Springer, 1995. [127] Vapnik V. Statistical Learning Theory[M]. New York: John Wiley, 1998. [128] Wang X J, Ma J W. Adaptive dictionary learning for blind seismic data denoising[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 99: 1-5. [129] Wang J, Teng T L. Artificial neural network-based seismic detector[J]. Bulletin of the Seismological Society of America, 1995, 85(1): 308-319. [130] Werbos P J. Backpropagation through time: what it does and how to do it[J]. Proceedings of the IEEE, 1990, 78(10): 1550-1560. [131] Werbos P J. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting[M]. New York, USA: John Wiley, 1994.
[132] Wu Y, Lin Y Z, Zhou Z, et al. DeepDetect: A cascaded region-based densely connected network for seismic event detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 57(1): 62-75. [133] 奚先, 黄江清. 复杂散射波场的深度学习反演成像法[J]. 地球物理学进展, 2018, 33(6): 2483-2489. Xi X, Huang J Q. Deep learning inversion imaging method for scattered wavefield[J]. Progress in Geophysics, 2018, 33(6): 2483-2489 (in Chinese). [134] 奚先, 黄江清. 基于卷积神经网络的地震偏移剖面中散射体的定位和成像[J]. 地球物理学报, 2020, 63(2): 687-714. Xi X, Huang J Q. Location and imaging of scatterers in seismic migration profiles based on convolution neural network[J]. Chinese Journal of Geophysics, 2020, 63(2): 687-714 (in Chinese). [135] Xia K Y, Hilterman F, Hu H. Unsupervised machine learning algorithm for detecting and outlining surface waves on seismic shot gathers[J]. Journal of Applied Geophysics, 2018, 157: 73-86. [136] Xu D, Tian Y. A comprehensive survey of clustering algorithms[J]. Annals of Data Science, 2015, 2: 165-193. doi: 10.1007/s40745-015-0040-1 [137] Xu C, Xu X W, Dai F C, et al. Comparison of different models for susceptibility mapping of earthquake triggered landslides related with the 2008 Wenchuan earthquake in China[J]. Computers and Geosciences, 2012, 46: 317-329. doi: 10.1016/j.cageo.2012.01.002 [138] Yilmaz I. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine[J]. Environmental Earth Sciences, 2010, 61(4): 821-836. doi: 10.1007/s12665-009-0394-9 [139] 于子叶, 储日升, 盛敏汉. 深度神经网络拾取地震P波和S波到时[J]. 地球物理学报, 2018, 61(12): 4873-4886. Yu Z Y, Chu R S, Sheng M H. Pick onset time of P and S phase by deep neural network[J]. Chinese Journal of Geophysics, 2018, 61(12): 4873-4886 (in Chinese). [140] 张正一, 范建柯, 白永良, 等. 中国海—西太平洋地区典型剖面的重-磁-震联合反演研究[J]. 地球物理学报, 2018, 61(7): 2871-2891. Zhang Z Y, Fan J K, Bai Y L, et al.
Joint inversion of gravity-magnetic-seismic data of a typical profile in the China Sea-Western Pacific area[J]. Chinese Journal of Geophysics, 2018, 61(7): 2871-2891 (in Chinese). [141] Zhang G Y, Wang Z Z, Chen Y K. Deep learning for seismic lithology prediction[J]. Geophysical Journal International, 2018, 215(2): 1368-1387. [142] 赵明, 陈石, 房立华, 等. 基于U形卷积神经网络的震相识别与到时拾取方法研究[J]. 地球物理学报, 2019, 62(8): 3034-3042. Zhao M, Chen S, Fang L H, et al. Earthquake phase arrival auto-picking based on U-shaped convolutional neural network[J]. Chinese Journal of Geophysics, 2019, 62(8): 3034-3042 (in Chinese). [143] Zhao R, Ouyang W L, Li H S, et al. Saliency detection by multi-context deep learning[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, 2015: 1265-1274. [144] Zhao Y, Takano K. An artificial neural network-based seismic detector[J]. Bulletin of the Seismological Society of America, 1999, 77: 670-680. [145] Zhou Y J, Yue H, Kong Q K, et al. Hybrid event detection and phase-picking algorithm using convolutional and recurrent neural networks[J]. Seismological Research Letters, 2019, 90: 1079-1087. doi: 10.1785/0220180319 [146] Zhu W Q, Beroza G C. PhaseNet: A deep-neural-network-based seismic arrival-time picking method[J]. Geophysical Journal International, 2019, 216(1): 261-273. [147] Zhu L C, Liu E T, McClellan J H. Seismic data denoising through multiscale and sparsity-promoting dictionary learning[J]. Geophysics, 2015, 80(6): WD45-WD57. doi: 10.1190/geo2015-0047.1 [148] Zhu L C, Liu E T, McClellan J H. Joint seismic data denoising and interpolation with double-sparsity dictionary learning[J]. Journal of Geophysics and Engineering, 2017, 14(4): 802-810. doi: 10.1088/1742-2140/aa6491 [149] Zhu W, Mousavi S M, Beroza G C. Seismic signal denoising and decomposition using deep neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(11): 9476-9488. doi: 10.1109/TGRS.2019.2926772