Neural network approach to the prediction of seismic events based on low-frequency signal monitoring of the Kuril-Kamchatka and Japanese regions

Very-low-frequency/low-frequency (VLF/LF) sub-ionospheric radiowave monitoring has been widely used in recent years to analyze earthquake preparatory processes. The connection between earthquakes with M ≥5.5 and nighttime disturbances of signal amplitude and phase has been established. Thus, it is possible to use nighttime anomalies of VLF/LF signals as earthquake precursors. Here, we propose a method for estimation of the VLF/LF signal sensitivity to seismic processes using a neural network approach. We apply the error back-propagation technique based on a three-layer perceptron to predict a seismic event. The back-propagation technique involves two main stages to solve the problem; namely, network training, and recognition (the prediction itself). To train a neural network, we first create a so-called ‘training set’. The ‘teacher’ specifies the correspondence between the chosen input and the output data. In the present case, a representative database includes both the LF data received over three years of monitoring at the station in Petropavlovsk-Kamchatsky (2005-2007), and the seismicity parameters of the Kuril-Kamchatka and Japanese regions. At the first stage, the neural network established the relationship between the characteristic features of the LF signal (the mean and dispersion of the phase and the amplitude at nighttime for a few days before a seismic event) and the corresponding level of correlation with a seismic event, or the absence of a seismic event. For the second stage, the trained neural network was applied to predict seismic events from the LF data using twelve time intervals in 2004, 2005, 2006 and 2007. The results of the prediction are discussed.


Introduction
Low-frequency (LF) signals (range, 10-50 kHz) propagate between the Earth and the ionosphere as in a spherical waveguide. The bottom boundary of the waveguide is the Earth, and the top boundary is the lowest part of the ionosphere. The propagation of low-frequency signals is determined on the one hand by the electrical conductivity of the Earth surface, and on the other hand by the conductivity of the lower ionosphere and upper atmosphere. Analysis of the behavior of the amplitude and phase of signals from very-low-frequency (VLF)/LF transmitters has shown that they can be used as precursors of earthquakes. A nighttime disturbance of the signal amplitude and phase on long paths was observed before several strong earthquakes, as described by Gokhberg et al. [1987, 1989] and Gufeld et al. [1992] over two decades ago. More recently, changes in the position of the minima of the daily phase and amplitude variations during sunset and sunrise for a few days before strong earthquakes in Japan were described [Hayakawa et al. 1996, Molchanov and Hayakawa 1998].
The usefulness of the sub-ionospheric VLF/LF signal propagation method for the detection of seismo-ionospheric perturbations from observations of ground stations has been demonstrated recently in Japan [Maekawa et al. 2006, Muto et al. 2009, Hayakawa et al. 2010], Italy [Biagi et al. 2007, 2008] and Russia [Rozhnoi et al. 2006, 2007a, 2009]. This method was used for the analysis of both ground-based transmitter signals detected onboard the DEMETER satellite above seismic regions, and ground observations [Muto et al. 2008, Rozhnoi et al. 2007b, Solovieva et al. 2009, Rozhnoi et al. 2012].
A statistical analysis was carried out by Rozhnoi et al. [2004] to determine the sensitivity threshold of LF signals to the magnitude of an earthquake, and to identify the probable periods in which anomalies caused by seismic activity are observed. They showed that the sensitivity of the LF signal to seismic processes becomes apparent for M ≥5.5. The signal anomalies for earthquakes with such magnitudes were observed in 20% to 25% of cases.
In this study, we propose a method for the estimation of the LF signal sensitivity to seismic processes using a neural network approach. The trained neural network is applied in forecast mode for the automatic detection of abnormal changes in the signal that relate to seismic activity above a certain threshold.

Methods
Historically, the first successful applications of the neural network method were implemented for pattern-recognition problems; namely, the recognition of printed text, image compression, and image recognition in the field of computer vision. Eventually, the properties of neural networks proved to be useful in other areas of knowledge. The essential difference between traditional computing and neurocomputing is that neural networks can produce their own rules from incomplete and noisy data. When it is hard to find a traditional algorithm for the solution to a problem, the ability of a neural network to extract the 'rules of exit', to effectively solve nonlinear tasks, and to perform interpolation and extrapolation of an available database can be helpful for many tasks in geophysics. An excellent review of neural network paradigms and a detailed analysis of their application to various geophysical problems were given by Poulton [2002].
Neural network methods have been used in geoelectrics for inversion of electromagnetic data in three-dimensional geoelectric structures [Spichak and Popova 2000]. Interpretation of multidimensional geophysical data generated during a geological exploration was carried out using the neural network method known as self-organizing mapping [Klose 2006]. A neural network application for seismic data processing was developed for first-break picking and trace editing [McCormack et al. 1993]. Neural network technology has been used in various fields of geophysics for parameter estimation, filtering, classification and prediction. A back-propagated and associative neural network was applied to predict magnitudes of earthquakes from seismic network signals, electric preseismic signals, and average magnitudes of previous earthquakes [Dutta 2011]. The variation of the geomagnetic field declination, horizontal component, hourly relative humidity, ground temperature, rain rate per day, and mean number of rainy hours per day were used to predict the magnitude of an earthquake two days before its occurrence by means of a neural network [Suratgar et al. 2008].
We applied the back-propagation technique [Rumelhart and McClelland 1986], based on the three-layer perceptron, to estimate the sensitivity of the VLF/LF signal to seismic processes, as illustrated in Figure 1. This type of neural network is known as 'supervised'. It involves two main stages of solving a problem: the training of the network, and the recognition (the prediction itself). In the supervised scheme of teaching, the network is taught the relationship between input and output pairs, the collection of which is called the training set.
To train the neural network, we created a teaching database that included both the catalog of seismic events from 2005 to 2007 and the corresponding data (amplitude and phase of the LF signals), measured in monitoring mode at the receiving station in Petropavlovsk-Kamchatsky from the Japanese transmitter JJY (see Figure 2). The seismic events were excluded from the database for the days when the index of the magnetic field activity, Dst, and the flux of relativistic electrons exceeded the given thresholds. The optimal properties for the formation of the teaching database were derived after many experiments on teaching and testing of the neural networks. As a result, the training samples included the features calculated from the amplitudes and phases of the signals, which were measured for five days before 40 seismic events of M ≥5.5 that occurred at a depth (H) of less than 150 km. The ratio of the radius (R) of the zone in which precursors are displayed to the distance (D) of the epicenter from the axis of the propagation path between the transmitter and receiver (R/D) was selected as >0.7, because the preparation zone intersected the sensitivity zone in this case. Thus, the preparation activity can influence the LF signal propagation. The radius of the zone of the precursor display (in km) was calculated using the relationship $R = 10^{0.43M}$, where M is the magnitude of an earthquake [Dobrovolsky et al. 1979].
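The selection criteria above can be expressed compactly in code. The following Python sketch is illustrative only: the function names and argument layout are hypothetical, R is taken in kilometers as in Dobrovolsky et al. [1979], and an epicenter lying exactly on the path axis (D = 0) is assumed to pass the ratio test.

```python
import math


def precursor_radius_km(magnitude):
    """Radius R (km) of the precursor zone, R = 10**(0.43 * M)."""
    return 10.0 ** (0.43 * magnitude)


def passes_selection(magnitude, depth_km, distance_km,
                     min_magnitude=5.5, max_depth_km=150.0, min_ratio=0.7):
    """Return True if an earthquake meets the criteria quoted in the text:
    M >= 5.5, depth H < 150 km, and R/D > 0.7."""
    if magnitude < min_magnitude or depth_km >= max_depth_km:
        return False
    # Assumption: D = 0 (epicenter on the path axis) counts as R/D = infinity.
    if distance_km == 0:
        ratio = math.inf
    else:
        ratio = precursor_radius_km(magnitude) / distance_km
    return ratio > min_ratio
```

For M = 5.5 the radius is about 232 km, so under these criteria such an event qualifies only if its epicenter lies within roughly 330 km of the path axis.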
The teaching database also included 40 examples of a lack of seismic events, because the neural network had to learn to distinguish between the seismic events of M ≥5.5 and their absence. Thus, the total teaching database contained 80 examples. Each example contained the input and output data for the teaching of the neural network. Below, we refer to the seismic events of M ≥5.5 as the 'seismic event', and to the rest as a 'lack of the seismic event'.
The mean and dispersion values of the phases and amplitudes in the nighttime for five days before the seismic event (or the lack of it) were used as the input data. Thus, the number of input neurons in the three-layer perceptron was 20 (four features for each of the five days). The mean and dispersion values of the amplitude are marked in Figure 1 as $M_{A,k}$ and $D_{A,k}$, and those of the phase as $M_{Ph,k}$ and $D_{Ph,k}$, respectively. The index k refers to the number of days before the seismic event (or lack of it) and varies from t-1 to t-5, where t is the number of the day of a seismic event (or lack of it). The input signal X consisted of these values. The corresponding correlation level with the seismic event (where the correlation value was 1) or with a lack of the seismic event (where the correlation value was 0) was used as the output data U. Thus, the number of output neurons in the three-layer perceptron was equal to unity. The level of correlation is marked as $C_t$ in Figure 1. After the preparation of the input-output pairs, the neural network was trained using the 80 examples of the teaching database.
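The construction of the 20-element input signal X can be sketched in Python as follows. This is a minimal illustration under stated assumptions: 'dispersion' is taken to mean the variance, the ordering of the features within X is a guess (the text does not specify it), and the function name and argument layout are hypothetical.

```python
import numpy as np


def build_input_vector(amplitude_nights, phase_nights):
    """Build the input signal X from nighttime samples.

    amplitude_nights, phase_nights: sequences of five 1-D arrays holding the
    nighttime amplitude and phase samples for days t-1, t-2, ..., t-5.
    Returns a vector of 20 values: for each day, the mean and variance of the
    amplitude followed by the mean and variance of the phase (assumed order).
    """
    if len(amplitude_nights) != 5 or len(phase_nights) != 5:
        raise ValueError("expected nighttime data for exactly five days")
    features = []
    for amp, ph in zip(amplitude_nights, phase_nights):
        amp = np.asarray(amp, dtype=float)
        ph = np.asarray(ph, dtype=float)
        features.extend([amp.mean(), amp.var(), ph.mean(), ph.var()])
    return np.array(features)
```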
The input signal X, which is represented by the means and dispersions of the phases and amplitudes in the nighttime for five days, propagated forward from one layer to the next. Thereby, each neuron i of the next layer received the total signal from all of the neurons j of the previous layer:
$$u_i^l = G\left(\sum_j W_{ij}^l x_j\right) \qquad (1)$$
where $u_i^l$ is the output signal of neuron i of layer l, G is the neuron response function (e.g., hyperbolic tangent), $W_{ij}^l$ are the weight connections between the neurons of layers l-1 and l, and $x_j$ is the value of neuron j of layer l-1.
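The forward propagation of Equation 1 through a three-layer perceptron can be sketched in Python as follows. The function name, the weight shapes and the use of the hyperbolic tangent in both layers are assumptions made for illustration; the text names the hyperbolic tangent only as an example of the response function G.

```python
import numpy as np


def forward(x, W1, W2, response=np.tanh):
    """Propagate the input signal x through a three-layer perceptron.

    W1: weights between the input and hidden layers, shape (n_hidden, n_input).
    W2: weights between the hidden and output layers, shape (n_output, n_hidden).
    Each layer applies u_i = G(sum_j W_ij * x_j), with G given by `response`.
    Returns the hidden-layer and output-layer signals.
    """
    x = np.asarray(x, dtype=float)
    hidden = response(W1 @ x)
    output = response(W2 @ hidden)
    return hidden, output
```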
At the training stage, we must obtain the output signal $u_i$ on the third layer that minimizes the total standard error:
$$E = \sum_p \sum_i \left(u_i - u_i^t\right)^2 \qquad (2)$$
The summation was carried out for each training example p over all of the neurons i of the output layer. The 'target' value $u_i^t$ represented the sample value of the correlation coefficient for the corresponding training example. The real value $u_i$ represented the value of the output neuron formed as a result of signal propagation (Equation 1).
Weight connections $W^1$ and $W^2$ between the layers of the neural network were the parameters that defined the value of the error (Equation 2). Therefore, the essence of the teaching process was the search for the weight connections $W_{ij}$ that minimize the error. The weights were assigned random values within a certain range at the beginning of the teaching procedure. The teaching procedure was based on the gradient descent technique of minimizing the error (Equation 2) between the target values of the outputs specified by the 'teacher' and those produced by the neural network:
$$\Delta W_{ij}(n) = -\alpha \frac{\partial E}{\partial W_{ij}} + \beta\, \Delta W_{ij}(n-1) \qquad (3)$$
where $\Delta W_{ij}(n)$ is the increment of the weight connection at step n, $\Delta W_{ij}(n-1)$ is its increment at the previous step, and $\alpha$ and $\beta$ are internal parameters of the neural network. This procedure was fulfilled for all of the teaching examples, and finished when it reached the [...]
The procedure of the recognition (prediction) uses the interpolation and extrapolation properties of the neural network. Unlike the training procedure, which requires many steps of the iteration process, the prediction involves only one passage of the recognizable signal from input to output and uses the weight connections specified during the teaching process. The recognition procedure is therefore performed quickly. The final result formed at the output can be treated as a correlation level with the seismic event, or lack of it.
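The teaching procedure, gradient descent on the squared error of Equation 2 with the previous weight increment carried over as in Equation 3, can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. The hidden-layer size, the values of the two internal parameters (here `alpha` and `beta`), the fixed number of epochs, the batch form of the update and the logistic output function (chosen so that the output lies between zero and unity) are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, targets, n_hidden=8, alpha=0.0005, beta=0.5, n_epochs=500, seed=0):
    """Teach a three-layer perceptron by gradient descent with momentum.

    X: array (n_examples, n_input); targets: array (n_examples,) of 0 or 1.
    Returns the weights W1, W2 and the error E recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    # Weights start from random values within a small range (assumed +-0.1).
    W1 = rng.uniform(-0.1, 0.1, (n_hidden, X.shape[1]))
    W2 = rng.uniform(-0.1, 0.1, (1, n_hidden))
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    errors = []
    for _ in range(n_epochs):
        # Forward pass for all examples at once.
        H = np.tanh(X @ W1.T)              # hidden signals, (n_examples, n_hidden)
        U = sigmoid(H @ W2.T)[:, 0]        # output signals, (n_examples,)
        diff = U - targets
        errors.append(float(np.sum(diff ** 2)))
        # Backward pass: derivatives of E with respect to the weights.
        delta_out = 2.0 * diff * U * (1.0 - U)
        grad_W2 = delta_out[None, :] @ H
        delta_hid = (delta_out[:, None] * W2) * (1.0 - H ** 2)
        grad_W1 = delta_hid.T @ X
        # Weight increments: gradient step plus a fraction of the previous step.
        dW2 = -alpha * grad_W2 + beta * dW2
        dW1 = -alpha * grad_W1 + beta * dW1
        W2 = W2 + dW2
        W1 = W1 + dW1
    return W1, W2, errors
```

In practice the loop would stop on a convergence criterion rather than after a fixed number of epochs; the fixed count is used here only to keep the sketch short.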
The procedure of the recognition (prediction) was performed as follows. We chose the five days before the day of the seismic event and formed the input for the neural network. Its dimension was equal to the number of input neurons of the trained neural network (there were 20 input neurons in our case). We obtained the value of the correlation level for the day of the seismic event (sixth day) at the network output after one passage of the input signal from the input to the output of the trained neural network. This value can vary from zero to unity. If the correlation level is more than 0.5, a seismic event of M ≥5.5 is possible.
We can consider not only the single day of a seismic event, but also a time interval that includes the day of the seismic event, since this lets us determine the behavior of the correlation level a few days before the seismic event. To do so, we formed the input vector from the five days before the first day of the time interval and recognized the corresponding correlation level for the first day of the interval using the trained neural network. Afterwards, we formed the next input vector from the five days before the second day of the interval, and recognized the corresponding correlation level for the second day of the interval. Thus, we used a sliding window of five days. The recognition (prediction) procedure was performed step by step with a shift of one day for all of the days of the chosen time interval.
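The sliding-window recognition described above can be sketched in Python as follows. The layout of `daily_features` (one row of four nighttime features per day), the integer day indexing and the `predict` callable that stands in for the trained network are hypothetical choices made for illustration.

```python
import numpy as np


def sliding_correlation_levels(daily_features, predict, first_day, last_day, window=5):
    """Recognize the correlation level for every day of a time interval.

    daily_features: array (n_days, 4), one row per day holding the nighttime
        mean and dispersion of the amplitude and of the phase.
    predict: callable mapping a vector of window * 4 values to a correlation level.
    first_day, last_day: row indices bounding the interval (inclusive).
    Returns a dict mapping each day index to its recognized correlation level.
    """
    daily_features = np.asarray(daily_features, dtype=float)
    if first_day < window:
        raise ValueError("not enough days before the start of the interval")
    levels = {}
    for day in range(first_day, last_day + 1):
        # Rows for days t-1, t-2, ..., t-5, with the most recent day first.
        rows = daily_features[day - window:day][::-1]
        levels[day] = float(predict(rows.ravel()))
    return levels
```

Each step uses only the five days preceding the day being recognized, so the window shifts by one day at a time across the interval.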

Results
We chose 12 time intervals in 2004, 2005, 2006 and 2007 to predict a seismic event from the LF data. The corresponding seismic events were not used in the training set for the teaching of the neural network; therefore, they can be used to test the prediction. The time intervals varied from 6 days to 8 days, including the days of the seismic events. Table 1 shows the corresponding seismic events of M ≥5.5. The data in Table 1 were taken from an earthquake catalog (http://earthquake.usgs.gov/earthquakes/eqarchives/epic/). Table 1 contains the following parameters: the year, month, day and time of the seismic event, the latitude and longitude of the earthquake, the depth of the earthquake, the magnitude M, the distance D from the epicenter of the earthquake to the axis of the 'transmitter-receiver' line, and the ratio R/D.
We suggest that the neural network indicates changes in the LF signal due to an earthquake of M ≥5.5 if the values of the correlation coefficient recognized by the network are >0.5 for several days in a row before the earthquake. For six of the 12 time intervals (see Table 1 for the seismic events numbered from 1 to 6), the neural network detected changes in the LF signal for several days in a row (2-3 days) before an earthquake of M ≥5.5 and on the day itself. These examples are shown in Figure 3.
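The decision rule above, correlation coefficients >0.5 for several days in a row before the earthquake, can be sketched in Python as follows. The function names and the minimum run length `min_run=2` are assumptions for illustration; the text specifies only 'several days in a row'.

```python
def longest_run_above(levels, threshold=0.5):
    """Length of the longest run of consecutive values above the threshold."""
    longest = current = 0
    for level in levels:
        current = current + 1 if level > threshold else 0
        longest = max(longest, current)
    return longest


def indicates_event(levels, event_index, threshold=0.5, min_run=2):
    """Return True if the days before the event contain a run of at least
    min_run consecutive correlation levels above the threshold.

    levels: daily correlation levels for the time interval, in date order.
    event_index: position of the earthquake day within `levels`.
    """
    days_before = levels[:event_index]
    return longest_run_above(days_before, threshold) >= min_run
```

Setting `min_run=1` would also count intervals in which the threshold is exceeded on a single day before the earthquake.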
For the next three time intervals (see Table 1 for the seismic events numbered from 7 to 9), the neural network detected changes in the signal that indicated an earthquake on the second or third day before the earthquake, but not on the day itself. These results can also be considered as positive. For the remaining three time intervals (see Table 1 for the seismic events numbered from 10 to 12), there were no correlations between the seismic events of M ≥5.5 and changes in the signal.
We discuss the results of the prediction using the example presented in Figure 3a. The magnitudes of the earthquakes that occurred during the period of analysis are shown in the upper section of Figure 3a. If a magnitude is ≥5.5, the following parameters are given for the corresponding column: magnitude M, depth H, distance D from the epicenter of the earthquake to the axis of the transmitter-receiver line, and the ratio R/D. The dashed line represents the threshold at which M ≥5.5.
The results of the prediction for January 6-11, 2007, are shown in the bottom section of Figure 3a. These results are formed as the output (single neuron) of the previously trained neural network. This output represents the correlation coefficient. In this way, we find the degree of the correlation with a seismic event of M ≥5.5. The recognition (prediction) procedure is performed step by step with a shift of one day, up to the day of the seismic event of M ≥5.5. The dashed line in the bottom section of Figure 3a represents the threshold value of the correlation coefficient of 0.5.
One can see from the bottom section of Figure 3a that the correlation coefficients are more than 0.5 for two days in a row before the earthquake and on the day of the seismic event itself. Such behavior of the correlation coefficient indicates an earthquake of M ≥5.5.

Discussion and conclusions
For nine of the twelve time intervals, the neural network successfully recognized changes in the LF signal that indicated an earthquake of M ≥5.5 a few days before the earthquake. These results confirm that short-term prediction of seismic events based on changes in the LF signal is possible. The mean value and dispersion calculated from the amplitudes and phases of signals for the night period can be considered as indicators of seismic events.
The LF signal sensitivity to seismic processes is seen for seismic events of M ≥5.5 that occur at a depth of <150 km. In addition, the ratio R/D should be >0.7.
As an outlook, we suggest the use of several wave paths as input data of a neural network, as well as the additional output parameters, such as a magnitude value and a forecast probability.

Figure 1. Three-layer perceptron of the back-propagation neural network applied to the prediction of seismic events. The input signal represents the mean and dispersion values of the phases ($M_{Ph,t-n}$ and $D_{Ph,t-n}$, respectively) and amplitudes ($M_{A,t-n}$ and $D_{A,t-n}$, respectively) in the nighttime for five days before the seismic event (n varies from 1 to 5). The corresponding level of correlation with the seismic event, $C_t$, is used as the output data.

Figure 3. The neural network prediction. (a) Result of prediction for January 6-11, 2007. (b) Result of prediction for July 16-23, 2004. (c) Result of prediction for September 8-14, 2004. Each column in the upper section represents the magnitude of the earthquakes that occurred on certain days of the time period considered. If M ≥5.5, the corresponding column is marked with the following parameters: magnitude M, depth H, distance D from the epicenter of the earthquake to the axis of the 'transmitter-receiver' line, and the ratio R/D. Dashed line, threshold at which M ≥5.5. The corresponding values of the correlation coefficients are presented in the bottom section: dashed line, threshold of the correlation coefficient of 0.5.