EXTRACTING BOREHOLE STRAIN PRECURSORS ASSOCIATED WITH THE LUSHAN EARTHQUAKE THROUGH PRINCIPAL COMPONENT ANALYSIS

A YRY-4 borehole strainmeter installed at the Guza Station recorded anomalous changes in borehole strain data preceding the Lushan earthquake of April 20, 2013 (UTC) (Mw = 7.0). To identify earthquake-related abnormal strain changes, we apply principal component analysis (PCA) for the first time to analyse the borehole strain data from the Guza Station. The first principal component eigenvalues and eigenvectors demonstrate that the anomalous days are mainly concentrated within two time periods: 1) October 25-December 30, 2012, and 2) April 15-19, 2013. A combined eigenvalue and eigenvector analysis reveals that the abnormal days exhibit a clustered distribution that is aggregated in the same location for both periods, indicating a strong correlation between the two anomalies. We tentatively infer that a similar process contributed to the formation of both anomalies and that these two anomalies are both earthquake precursors associated with the Lushan earthquake. These findings also indicate that the PCA approach has potential for the extraction of earthquake precursor anomalies.

INTRODUCTION
The high-resolution records from borehole strainmeters provide an opportunity to investigate strain changes prior to earthquakes. Johnston et al. [1994, 2006] studied two large earthquakes in California with high-resolution continuous strain data. Qiu et al. [1998] found that the Douhe and Zhaogezhuang stress-monitoring stations, which are located along a supposed fault zone, observed significant tensile pulses of ground stress normal to the fault zone before the great M7.8 Tangshan, China, earthquake in 1976, and they considered such surficial observations to reflect movements of the crust associated with earthquakes. Using data from the beginning of May 2008, Ouyang et al. [2009] observed that five RZB borehole strainmeters situated in the Chongqing section of the Three Gorges Reservoir region, 400-500 km southeast of the Wenchuan earthquake epicentre, simultaneously recorded correlated anomalies, including a slow earthquake, a high rate of compressional strain, and high-frequency vibration and strain transients, which were relatively coherent at all of the sites.

Borehole strain data are affected by several other factors, and various data processing methods, such as wavelet analysis [Qiu et al., 2011], time-frequency spectrum analysis [Qi et al., 2011], and statistical analysis, are needed to clarify the association of such strain signals with crustal activity by revealing the characteristics of the subsurface physical mechanisms that are involved in strain change processes. Principal component analysis (PCA) is a statistical method that is widely employed to reveal relevant information in confusing datasets [Gómez et al., 2005]. PCA, which is a non-parametric method that does not utilize deviations from previously described strain changes to determine data anomalies, has the potential to describe the spatial distribution of earthquake-related strain changes.

This method has been successfully applied in previous research to many types of earthquake precursor data. Gotoh et al. [2003] and Hattori et al. [2004, 2013] found that eigenvalues and eigenvectors are likely to be correlated with large earthquakes by using PCA to investigate ultra-low frequency (ULF) geomagnetic data. Similarly, Telesca et al. [2004] discovered earthquake precursor patterns in the daily variations of the principal components of eigenvalues by using PCA to investigate ULF geoelectrical data. Furthermore, Lin [2011] used PCA and image processing to identify ionospheric total electron content anomalies before a strong earthquake. C. Goltz [2001] used PCA to study the seismicity of the Southern California area and discussed the promising results in view of their implications, potential applications and possible precursory qualities. Fang Y et al. [2015] identified possible short- and medium-term earthquake precursors before the Lushan earthquake by using PCA to investigate cross-fault data.

In this work, we apply PCA for the first time to time-series borehole strain data that were recorded at the Guza Station. The aim of this study is to detect anomalies in the borehole strain that preceded the Lushan earthquake and to investigate any correlation among the anomalies.

OBSERVATION AND DATA
At 08:02 on April 20, 2013, an Ms 7.0 earthquake occurred in Lushan County, Ya'an, Sichuan. The epicentre was located at 30.277° N and 102.937° E. According to the data published by the China Earthquake Networks Center of the China Earthquake Administration, the earthquake magnitude was Ms 7.0 and the focal depth was approximately 13 km. The Lushan earthquake was a high-angle thrust event [Zeng X. F. et al., 2013]. The Guza Station is located at the southwestern extent of the Longmenshan fault zone. At the Guza Station, the deformation observation instrumentation includes a very-long-period vertical pendulum tiltmeter, a YRY-4 borehole strainmeter, a DSQ water pipe inclinometer, an SS-Y body piercing extensometer, and a DZW digitalized gravity meter. The YRY-4 borehole strainmeter was installed in October 2006. Continuous recordings have been collected from this strainmeter at a sampling rate of one sample per minute since December 1, 2006. A clear seasonal variation can be seen in the strain data, which indicates that the borehole strain data from the Guza Station have a normal background [Qiu et al., 2011].
The distance between the epicentre of the Lushan earthquake and the Guza Station is 72 km (Figure 1). The Lushan earthquake had a wide range of influence, and the physical phenomena taking place during earthquake preparation are very complicated; consequently, we cannot exclude a priori the possibility that the YRY-4 borehole strainmeter installed at the Guza Station recorded strain anomalies during earthquake preparation.
FIGURE 1. A location map showing the Guza Station and the Lushan earthquake epicentre.

The YRY-4 borehole strainmeter contains four horizontally emplaced sensors that measure changes in the borehole diameter. The four sensors are oriented at 45-degree intervals within a cylindrical case. With one redundant measurement, a simple relationship among the four measurements can be obtained straightforwardly:

$$S_1 + S_3 = S_2 + S_4,$$

which is the self-consistency equation of the YRY-4 borehole strainmeter. This equation can be employed to estimate the credibility of the data. There are only three independent variables under plane strain conditions at or near the Earth's surface. We can therefore derive various strains from the Guza recordings. The formulas used are as follows:

$$S_{13} = S_1 - S_3, \qquad S_{24} = S_2 - S_4, \qquad S_a = (S_1 + S_2 + S_3 + S_4)/2, \qquad (1)$$

where $S_a$ represents the areal strain, and $S_{13}$ and $S_{24}$ represent the two independent shear strains [Qiu et al., 2013a].
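The self-consistency check and the strain conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conversion takes the shear strains as the differences S1 - S3 and S2 - S4 and the areal strain as (S1 + S2 + S3 + S4)/2, the function names are hypothetical, and using a correlation coefficient as the credibility score is our choice rather than something specified in the text.

```python
import numpy as np

def self_consistency_correlation(s1, s2, s3, s4):
    """Correlation between S1 + S3 and S2 + S4 (hypothetical credibility score).

    A value close to 1 suggests that the four gauges satisfy S1 + S3 = S2 + S4.
    """
    a = np.asarray(s1, dtype=float) + np.asarray(s3, dtype=float)
    b = np.asarray(s2, dtype=float) + np.asarray(s4, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

def strain_conversion(s1, s2, s3, s4):
    """Convert four gauge readings to two shear strains and the areal strain.

    Assumed form: S13 = S1 - S3, S24 = S2 - S4, Sa = (S1 + S2 + S3 + S4) / 2.
    """
    s1, s2, s3, s4 = (np.asarray(x, dtype=float) for x in (s1, s2, s3, s4))
    s13 = s1 - s3
    s24 = s2 - s4
    sa = (s1 + s2 + s3 + s4) / 2.0
    return s13, s24, sa

# Synthetic one-day record (1440 samples) that satisfies the
# self-consistency equation up to a small measurement noise.
rng = np.random.default_rng(0)
s1, s2, s3 = rng.normal(size=(3, 1440))
s4 = s1 + s3 - s2 + 1e-3 * rng.normal(size=1440)

r = self_consistency_correlation(s1, s2, s3, s4)
s13, s24, sa = strain_conversion(s1, s2, s3, s4)
```

With real recordings, a low correlation on a given day would be a reason to inspect that day before any further processing.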

PRINCIPAL COMPONENT ANALYSIS
PCA is a widely used technique in data analysis. It is computationally inexpensive, can be applied to ordered and unordered attributes, and can handle both sparse and skewed data. PCA is a non-parametric method that is capable of extracting relevant information from complex data sets [Gómez et al., 2005], and it often reveals relationships that were not previously suspected, thereby allowing interpretations that would not ordinarily result.
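Mathematically, the data are presented in a matrix $Y$ of $m$ rows and $n$ columns:

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix}, \qquad (2)$$

where $m$ is the dimension of each sample (the number of variables) and $n$ is the number of samples. First, we calculate the covariance matrix $C_Y$ ($m \times m$) of the dataset $Y$ ($m \times n$). The element $c_{pq}$ of $C_Y$ can be calculated using the following formula:

$$c_{pq} = \frac{1}{n-1} \sum_{i=1}^{n} (y_{pi} - \bar{y}_p)(y_{qi} - \bar{y}_q), \qquad (3)$$

where $y_{pi}$ and $y_{qi}$ are the $i$th samples in the $p$th and $q$th rows of $Y$, respectively, and $\bar{y}_p$ and $\bar{y}_q$ are the averages of the $p$th and $q$th rows, respectively.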
We perform the eigenvalue decomposition of the covariance matrix:

$$C_Y = V \Lambda V^T, \qquad (4)$$

where $\Lambda$ is the eigenvalue matrix with $\lambda_1, \lambda_2, \ldots, \lambda_m$ ($\lambda_1 > \lambda_2 > \ldots > \lambda_m$) and $V$ is the eigenvector matrix whose columns are $v_1, v_2, \ldots, v_m$. The first principal component eigenvalue and eigenvector are $\lambda_1$ and $v_1$, respectively, which represent the principal characteristics of the signals. In this paper, the variations in the eigenvalue and eigenvector of the first principal component are investigated.
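The covariance and eigenvalue decomposition steps can be illustrated with NumPy. The sketch below assumes that the rows of the data matrix are the variables and the columns are the samples, and it relies on the default n - 1 normalisation of `numpy.cov`, which may differ from the normalisation used by the authors. The function name `pca_eig` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_eig(Y):
    """Return the covariance matrix of Y and its sorted eigen-decomposition.

    Y is an (m, n) array: m variables in rows, n samples in columns.
    Eigenvalues are returned in descending order, with the matching
    eigenvectors stored as the columns of V.
    """
    Y = np.asarray(Y, dtype=float)
    C = np.cov(Y)                         # (m, m); rows are treated as variables
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    return C, eigvals[order], eigvecs[:, order]

# Synthetic three-component record dominated by one common signal.
rng = np.random.default_rng(1)
direction = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector
common = 5.0 * rng.normal(size=1440)
Y = np.outer(direction, common) + 0.1 * rng.normal(size=(3, 1440))

C, lam, V = pca_eig(Y)
lam1, v1 = lam[0], V[:, 0]   # first principal component
```

Because the covariance matrix is symmetric, `numpy.linalg.eigh` is used rather than the general `eig`; it returns real eigenvalues and orthonormal eigenvectors. Note that the sign of each eigenvector is arbitrary.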
Because the PCA is performed on three components, each eigenvector is a three-dimensional unit vector. To present the changes in the eigenvector more intuitively, we transform the eigenvectors to the unit spherical coordinate system, as shown in Figure 2. The eigenvector can then be represented entirely by the two angles θ and φ.
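A possible conversion from a unit eigenvector to the two angles is sketched below. The angle convention is an assumption, since Figure 2 is not reproduced here: θ is taken as the polar angle from the third axis and φ as the azimuth in the plane of the first two axes, both in degrees. Because v and -v describe the same principal direction, the sketch also flips the vector so that its third component is non-negative; this sign rule and the function name are hypothetical.

```python
import numpy as np

def eigenvector_angles(v):
    """Map a 3-D eigenvector to (theta, phi) in degrees.

    Assumed convention: theta is the polar angle measured from the third
    axis, phi is the azimuth in the plane of the first two axes.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)          # enforce unit length
    if v[2] < 0:                       # resolve the sign ambiguity of eigenvectors
        v = -v
    theta = np.degrees(np.arccos(np.clip(v[2], -1.0, 1.0)))
    phi = np.degrees(np.arctan2(v[1], v[0]))
    return theta, phi

v = np.array([1.0, 2.0, 2.0]) / 3.0
theta, phi = eigenvector_angles(v)
```

With this sign rule, an eigenvector and its negative map to the same pair of angles, so day-to-day comparisons are not disturbed by the arbitrary sign returned by an eigen-solver.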

SYNTHETIC DATA
To verify the ability of PCA to extract this kind of signal, we test it on synthetic data by performing PCA on the data matrix $X = [X_{13}, X_{24}, X_a]^T$. Figure 4 shows the covariance matrices of the standard days and the anomalous days. The covariance matrices of the precursory anomaly signals show a similar distribution and a large amplitude.
As shown in Figure 5, the eigenvalues and eigenvectors effectively detect the anomalous signals. Here the eigenvectors represent the spatial distribution information of the covariance matrix, and the eigenvalues represent its amplitude information. The results for the synthetic data show that PCA is able to detect such precursory abnormal signals.

FIGURE 2. The unit spherical coordinate system of eigenvectors.

DATA PROCESSING
PCA is applied to extract the anomalous strain changes associated with the earthquake. We analyse the borehole strain data from the Guza Station that were collected from January 1, 2011, to December 31, 2013. Figure 6 shows the four-component series of borehole strain data that were recorded at the Guza Station. First, the borehole strain data are checked using the self-consistency equation, after which the four-component data are transformed into three components, namely, $S_{13}$, $S_{24}$, and $S_a$, through a strain conversion. The next step is data preprocessing, in which we remove the influence of the solid tide and the trend reflecting seasonal variation by using a harmonic analysis. The strain changes that reflect the crustal movements associated with the earthquake are regarded as short-period, high-frequency oscillation signals. The strain change that is related to crustal activity is the most dominant signal in the daily variation data.
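The text does not give the details of the harmonic analysis, so the following is only a minimal sketch of one common approach: a least-squares fit of a linear trend plus sinusoids at the periods of the main tidal constituents (M2, S2, K1 and O1), which is then subtracted from the record. The function name, the choice of constituents and the linear trend are assumptions made for illustration; a seasonal term could be added in the same way with a period of about one year.

```python
import numpy as np

# Periods (hours) of the main tidal constituents M2, S2, K1 and O1.
TIDAL_PERIODS_HOURS = (12.4206012, 12.0, 23.9344696, 25.8193417)

def remove_tide_and_trend(x, dt_minutes=1.0, periods_hours=TIDAL_PERIODS_HOURS):
    """Subtract a least-squares fit of a linear trend plus tidal sinusoids."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) * dt_minutes / 60.0          # time in hours
    columns = [np.ones_like(t), t]                      # offset and linear trend
    for period in periods_hours:
        omega = 2.0 * np.pi / period
        columns += [np.cos(omega * t), np.sin(omega * t)]
    A = np.column_stack(columns)                        # design matrix
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return x - A @ coef                                 # residual signal

# Synthetic 30-day record: offset, trend, two tidal terms and noise.
rng = np.random.default_rng(2)
t_hours = np.arange(30 * 1440) / 60.0
tide = (2.0 * np.cos(2.0 * np.pi * t_hours / 12.4206012 + 0.3)
        + 1.0 * np.sin(2.0 * np.pi * t_hours / 23.9344696))
record = 3.0 + 0.01 * t_hours + tide + 0.05 * rng.normal(size=t_hours.size)

residual = remove_tide_and_trend(record)
```

In this synthetic case the residual is essentially the added noise, which is the part of the record that the subsequent daily PCA would operate on.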
The environment surrounding the station is one of the factors affecting borehole strain data, and most of these factors have a cycle of one day. To avoid time-domain aliasing and to distinguish the anomalous days more easily, we chose to perform PCA on daily data. The three-component data for each day (1 day = 1440 points) are arranged in the form of time series data vectors:

$$S_{13} = [s_{13}(1), s_{13}(2), \ldots, s_{13}(1440)]^T, \quad S_{24} = [s_{24}(1), s_{24}(2), \ldots, s_{24}(1440)]^T, \quad S_a = [s_a(1), s_a(2), \ldots, s_a(1440)]^T, \qquad (5)$$

where $T$ indicates a transpose. Then, the data matrix $Y = [S_{13}, S_{24}, S_a]^T$ is prepared. A time-series plot of the borehole strain data after the strain conversion is shown in Figure 7, and the data after preprocessing are shown in Figure 8.
The covariance matrix $C_Y$ of the borehole strain data after the strain conversion is then computed. The eigenvalue decomposition of $C_Y$ is performed to obtain $C_Y = V \Lambda V^T$, where $\Lambda$ is the eigenvalue matrix consisting of the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ ($\lambda_1 > \lambda_2 > \lambda_3$) and $V$ is the eigenvector matrix with columns $v_1, v_2, v_3$. In the PCA approach, the eigenvector $v_1$ is chosen to maximize the variance in the data. We consider the eigenvector $v_1$ to be the most intense signal subspace [Hattori et al., 2004]. Since we previously transformed the data into three components, it is possible to perform an orthogonal expansion of the signal space into $v_2$ and $v_3$, which are the second and third principal components, respectively.
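The daily procedure can be sketched as a loop over 1440-sample windows. This is an illustration rather than the authors' code: the function name is hypothetical, the covariance uses NumPy's default n - 1 normalisation, and the synthetic data simply place a strong common signal on one day to show how the first eigenvalue responds.

```python
import numpy as np

SAMPLES_PER_DAY = 1440  # one sample per minute

def daily_first_component(s13, s24, sa, samples_per_day=SAMPLES_PER_DAY):
    """First eigenvalue and eigenvector of the 3 x 3 covariance for each day."""
    data = np.vstack([s13, s24, sa]).astype(float)      # shape (3, n_samples)
    n_days = data.shape[1] // samples_per_day
    lam = np.empty(n_days)
    vec = np.empty((n_days, 3))
    for day in range(n_days):
        Y = data[:, day * samples_per_day:(day + 1) * samples_per_day]
        w, V = np.linalg.eigh(np.cov(Y))                # ascending eigenvalues
        lam[day] = w[-1]                                # largest eigenvalue
        vec[day] = V[:, -1]                             # matching eigenvector
    return lam, vec

# Five synthetic days of noise; day index 3 carries a common burst.
rng = np.random.default_rng(3)
n_days = 5
data = rng.normal(size=(3, n_days * SAMPLES_PER_DAY))
pattern = np.array([1.0, -1.0, 0.5])
burst = 5.0 * rng.normal(size=SAMPLES_PER_DAY)
data[:, 3 * SAMPLES_PER_DAY:4 * SAMPLES_PER_DAY] += np.outer(pattern, burst)

lam, vec = daily_first_component(data[0], data[1], data[2])
```

On the day with the burst, the first eigenvalue is much larger than on the noise-only days, and the first eigenvector aligns with the direction of the added pattern.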

RESULTS AND DISCUSSIONS
Through a field investigation and a comparison of the correlations among strain data and seismographic recordings, Qiu et al. [2013b] concluded that the abnormal changes observed at the Guza Station (several days before the earthquake) should be related to the Lushan earthquake. Chi et al. [2013] also observed an abnormal strain that began five months before the Lushan earthquake and lasted for three months. Based on the results of these previous studies, we conduct the following investigation.
We apply PCA to the preprocessed data and calculate the first principal component eigenvalue and eigenvector. Figure 9 shows the daily variation in the first principal component eigenvalue λ. The earthquake occurrence is shown with a vertical dashed line. We calculate the average and the standard deviation σ of λ. The variations in λ that are shown in Figure 9 illustrate that the anomalous values are mainly concentrated in two time periods, October 25-December 30, 2012, and April 15-19, 2013, the latter of which was just several days before the earthquake occurred. These results are consistent with those of Qiu et al. [2013b] and Chi et al. [2013]. In addition, anomalous values also appeared on January 28, 2011; March 24, 2011; June 23, 2011; October 7, 2011; October 9, 2011; March 29, 2012; and August 31, 2012, before the Lushan earthquake. Figure 9 indicates that the anomaly that occurred during April 15-19, 2013, is closely related in time to the Lushan earthquake. However, the eigenvalues alone do not reveal any correlation between the two anomalous periods.
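A simple way to flag anomalous days from the eigenvalue series is sketched below. It assumes that a day is anomalous when its first eigenvalue exceeds the average by more than one standard deviation, which is our reading of the reference lines described for Figure 9; the function name, the use of the population standard deviation and the sample values are hypothetical.

```python
import numpy as np

def flag_anomalous_days(lam, k=1.0):
    """Indices of days whose eigenvalue exceeds mean + k * sigma."""
    lam = np.asarray(lam, dtype=float)
    threshold = lam.mean() + k * lam.std()   # population standard deviation
    return np.flatnonzero(lam > threshold), threshold

# Toy eigenvalue series: four quiet days and one large value.
example_lam = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
anomalous_idx, threshold = flag_anomalous_days(example_lam)
```

In the toy series the mean is 2.8 and the standard deviation is 3.6, so only the last day exceeds the threshold of 6.4.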
To study the relationship between these two anomalies, we analyse the corresponding eigenvectors. Since the number of PCA dimensions is three, the eigenvectors obtained have three dimensions in the vector space. To express the eigenvectors more intuitively, we consider a two-dimensional plot with the angles θ and φ. The eigenvector shows the basis function of a signal subspace. First, we study the eigenvector for the several days preceding the earthquake. Figure 10 illustrates the daily variations in the eigenvector for April 2013; we find that the anomalies during the period of April 15-19 show a similar distribution in both θ and φ. This indicates that these days share the same basis function of the signal subspace. The covariance matrices of a few standard days and a few anomalous days are shown in Figure 11. The covariance matrices of the standard days show random characteristics, whereas those of the anomalous days show a similar distribution. It is significant that the main anomalous behaviour that precedes the earthquake shares the same signal pattern.

FIGURE 7. A time series plot of borehole strain data after the strain conversion. $S_{13}$ and $S_{24}$ represent the two independent shear strains, and $S_a$ represents the areal strain.

FIGURE 8. A time series plot of the data after preprocessing. P1, P2 and P3 represent the preprocessed $S_{13}$, $S_{24}$ and $S_a$, respectively.
We take the eigenvector for the anomalous days preceding the earthquake as the abnormal eigenvector, and a distance-based similarity measure is applied to extract other abnormal eigenvectors using equation (6):

$$d = \sqrt{(\theta_i - \bar{\theta})^2 + (\varphi_i - \bar{\varphi})^2}, \qquad (6)$$

where $\bar{\theta}$ and $\bar{\varphi}$ are the average angles of the eigenvector for the anomalous days preceding the earthquake, and $\theta_i$ and $\varphi_i$ are the angles of the eigenvector for the days to be measured. We observe that the extracted anomalies are most reliable when 0 ≤ d ≤ 15, and we treat them as targets.
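The similarity search can be sketched as follows. The sketch assumes that the distance is the Euclidean distance in the (θ, φ) plane, with both angles in degrees, and that the reference angles are the averages over a set of reference days; it ignores the wrap-around of φ at ±180°. The function name and the sample angles are hypothetical.

```python
import numpy as np

def similar_days(theta, phi, reference_days, d_max=15.0):
    """Days whose (theta, phi) lie within d_max of the reference average.

    theta, phi: per-day angles in degrees.
    reference_days: indices of the anomalous days used as the reference.
    Assumes a Euclidean distance in the (theta, phi) plane.
    """
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    theta_ref = theta[reference_days].mean()
    phi_ref = phi[reference_days].mean()
    d = np.hypot(theta - theta_ref, phi - phi_ref)
    return np.flatnonzero(d <= d_max), d

# Toy example: days 0 and 1 are the reference; day 2 is clearly different.
theta_days = np.array([50.0, 52.0, 90.0, 48.0, 51.0])
phi_days = np.array([60.0, 62.0, -30.0, 58.0, 61.0])
targets, d = similar_days(theta_days, phi_days, reference_days=[0, 1])
```

Here the reference angles are (51, 61), so days 0, 1, 3 and 4 fall within the threshold and day 2 does not.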
Figure 12 shows the results of the detection of abnormal eigenvectors over the analysed period. A large number of abnormal eigenvectors similar to those preceding the earthquake are observed from October 2012 to January 2013. The two anomalous periods detected from the variations in the eigenvalues thus show a strong correlation in their eigenvectors. Similar eigenvector distributions do not appear on January 28, 2011; March 24, 2011; June 23, 2011; or October 7, 2011.

Here, we must mention that the results of the anomaly detection using the eigenvalue and those obtained using the eigenvector are not exactly the same. The eigenvalue and eigenvector are both obtained from the decomposition of the covariance matrix of the data, and each contains part of the information in the data.
To avoid the loss of information and to extract the data anomalies more accurately, we perform a combined eigenvalue and eigenvector analysis. Figure 13 shows the spatial distributions of eigenvalues and eigenvectors. During the periods of October 25-December 30, 2012, and April 15-19, 2013, the abnormal days show a clustered distribution (black arrow) in the same location. Other days are scattered irregularly throughout the spatial coordinate system.
A time series plot of factors that may affect the borehole strain is shown in Figure 14, which displays the pressure, temperature, and borehole water level variations during the observation period. To some extent, the borehole water level reflects changes in rainfall. The black dashed boxes correspond to the two anomalous periods. As shown in Figure 14, the pressure and temperature did not behave abnormally during the two anomalous periods. Both periods fall in the dry season of this region, and the borehole water level shows a downward trend. It can therefore basically be ruled out that the anomalies were caused by pressure, temperature, or rainfall.
Based on the results of the combined eigenvalue and eigenvector analysis, the spatial distribution of the anomaly immediately preceding the earthquake is very similar to the distribution of eigenvectors and eigenvalues for the period from October 2012 to December 2012. We infer that a similar process contributed to the formation of both anomalies and that these two anomalies are both earthquake precursors associated with the Lushan earthquake.

CONCLUSION
We have used PCA to obtain the eigenvalues and eigenvectors of the first principal component of time series borehole strain data in order to study the abnormal characteristics of the borehole strain that preceded the Lushan earthquake. PCA is capable of effectively extracting abnormal features. The application of PCA to strain-based earthquake precursors is still at an early stage, and many problems remain to be resolved. Because the principle of crustal movement is complex, it is difficult to determine the physical meaning of the eigenvalues and eigenvectors, and because different types of strain data differ in observation principle and accuracy, it is difficult to use PCA to carry out a joint analysis of them. Nevertheless, our application of PCA to the strain behaviour suggests that the technique has considerable potential in the study of earthquake precursors.
First, we use suspected precursory anomaly signals and a coseismic signal to construct synthetic signals, with the signal of a standard day replaced by a Gaussian signal. The synthetic signals are shown in Figure 3: the second and fifth days are precursory abnormal signals (black boxes), and the eighth day is the coseismic signal (red box).

The eigenvectors show a similar distribution in the precursory anomaly signals.

FIGURE 6. The four-component time series of borehole strain data recorded at the Guza Station, China, from January 1, 2011, to December 31, 2013.

FIGURE 9. The daily variations in the first principal component eigenvalue. The average value is marked by a red horizontal dashed line, and the average plus 1σ is marked by a red horizontal dotted line.

FIGURE 11. The covariance matrices of a few standard days (A) and a few anomalous days (B).

FIGURE 13. The spatial distribution of eigenvalues and eigenvectors. The x-axis and y-axis represent the φ and θ angles, respectively, and the colour indicates the changes in the eigenvalue. The black arrow denotes the suspected abnormal days, and the red arrow denotes the day of the Lushan earthquake.