The characteristics analysis of strain variation associated with Wenchuan earthquake using principal component analysis

Borehole strainmeters installed deep in bedrock can record continuous stress and strain changes and have consequently become an important tool for monitoring crustal deformation. A YRY-4 borehole strainmeter installed at the Guza Station recorded anomalous changes in borehole strain preceding the Wenchuan earthquake (Ms 8.0) of May 12, 2008 (UTC). We apply principal component analysis (PCA) to the borehole strain data from the Guza Station and calculate the eigenvalues and eigenvectors of the first principal component. A fit to the cumulative number of anomalous eigenvalues shows an acceleration beginning approximately 4 months before the earthquake (from January 2008). A combined analysis of eigenvalues and eigenvectors shows that the spatial distribution of the eigenvectors and the accelerated occurrence of eigenvalue anomalies correspond to the stress evolution of a fault from a steady state to a sub-instability state observed in rock experiments. We tentatively infer that this process may also be linked to the preparation phase of a large earthquake.

The Wenchuan earthquake, which occurred in the Longmenshan thrust zone on May 12, 2008, was the most severe earthquake disaster in China since the 1976 Tangshan earthquake [Fu et al., 2011]. Researchers have studied various precursory phenomena of the Wenchuan earthquake. Hsiao et al. [2010] studied ionospheric electron density variations and found an anomaly around noon within 5 days before the Wenchuan earthquake. Liu et al. [2011] used the S transform to show that abnormal strain signals increased before the Wenchuan earthquake and decreased after it. Using wavelet decomposition and overrun rate analysis, other researchers concluded that the abnormal changes observed at the Guza Station several months before the earthquake could be related to the Wenchuan earthquake and do not correspond to seasonal changes.
Principal component analysis (PCA) is a statistical method that is widely employed to reveal relevant information in complex datasets [Gómez et al., 2005]. It is non-parametric: it does not rely on deviations from a previously specified model of strain change to identify anomalies, and it has the potential to describe the spatial distribution of earthquake-related strain changes. Telesca et al. [2004] used PCA to investigate ultra-low-frequency (ULF) geoelectrical data and found earthquake precursor patterns in the daily variations of the principal component eigenvalues. Hattori et al. [2004] applied PCA to the horizontal NS component of ULF data and indicated that features of the eigenvalues and eigenvectors are likely to be correlated with large earthquakes. Lin [2015] examined ionospheric total electron content (TEC) around the time of the Tohoku earthquake using two-dimensional principal component analysis and detected two larger principal eigenvalues on March 11, 2011. In the present work, we apply PCA to time-series borehole strain data recorded at the Guza Station. We aim to detect anomalies in borehole strain that preceded the Wenchuan earthquake and to analyze their cause.

Observations
The Guza Station is located at the southwestern end of the Longmenshan fault zone. Its deformation instrumentation includes a very-long-period vertical pendulum tiltmeter, a YRY-4 borehole strainmeter, a DSQ water-pipe inclinometer, an SS-Y body piercing extensometer, and a DZW digital gravimeter. The YRY-4 borehole strainmeter was installed in October 2006, and continuous recordings at a sampling rate of one sample per minute have been collected since December 1, 2006. The strain data clearly show a regular seasonal pattern, which indicates that the borehole strain data from the Guza Station have a normal background.
Kaiguang Zhu et al.
At 14:28 (UTC+8) on May 12, 2008, an Ms 8.0 earthquake occurred in Wenchuan County, Sichuan. According to data published by the China Earthquake Networks Center of the China Earthquake Administration, the epicenter was located at 31.01° N, 103.42° E, and the focal depth was approximately 14 km. The Wenchuan earthquake was characterized mainly by thrust motion with a right-lateral strike-slip component [Fu et al., 2011].
The distance between the epicenter of the Wenchuan earthquake and the strainmeter at the Guza Station is 153 km (Figure 1). The Wenchuan earthquake had a wide-ranging influence, and the physical phenomena taking place during earthquake preparation are very complicated. Consequently, we cannot exclude a priori the possibility that the YRY-4 borehole strainmeter at the Guza Station recorded strain anomalies that took place during earthquake preparation.

Strain conversion
The YRY-4 borehole strainmeter contains four horizontally emplaced sensors to measure changes in the borehole diameter. The theoretical model is shown in Figure 2.
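Each gauge in the cylinder directly measures the change in borehole diameter along its azimuth that results from changes in the strain state. Although the full solution is complex, the resulting relationship between the measurement and the strain changes ($\varepsilon_1$, $\varepsilon_2$, $\varphi$) is straightforward:

$$S_i = A(\varepsilon_1 + \varepsilon_2) + B(\varepsilon_1 - \varepsilon_2)\cos 2(\theta_i - \varphi) \qquad (1)$$

where $\theta_i$ is the azimuth of gauge $i$, $\varepsilon_1$ and $\varepsilon_2$ are the principal strains, $\varphi$ is the azimuth of $\varepsilon_1$, and $A$ and $B$ are coefficients that depend on the coupling between the instrument, the borehole and the surrounding rock. Self-consistency is crucial in four-gauge borehole strainmeter (FGBS) design. According to the theoretical model (1), because the four gauges are arranged at 45° intervals ($\theta_{i+1} = \theta_i + 45^\circ$), the following relationship is obtained:

$$
\begin{aligned}
S_1 &= A(\varepsilon_1+\varepsilon_2) + B(\varepsilon_1-\varepsilon_2)\cos 2(\theta_1-\varphi)\\
S_2 &= A(\varepsilon_1+\varepsilon_2) - B(\varepsilon_1-\varepsilon_2)\sin 2(\theta_1-\varphi)\\
S_3 &= A(\varepsilon_1+\varepsilon_2) - B(\varepsilon_1-\varepsilon_2)\cos 2(\theta_1-\varphi)\\
S_4 &= A(\varepsilon_1+\varepsilon_2) + B(\varepsilon_1-\varepsilon_2)\sin 2(\theta_1-\varphi)
\end{aligned}
\qquad (2)
$$

where $S_i$ ($i = 1,2,3,4$) is the measurement obtained from each of the four gauges. Because four measurements constrain only three unknowns, the one additional measurement yields a simple relationship among the four readings that follows directly from equation (2):

$$S_1 + S_3 = S_2 + S_4 \qquad (3)$$

which is the self-consistency equation of the YRY-4 borehole strainmeter. This equation can be used to assess the credibility of the data. In plane strain problems, all non-vanishing strains lie in a plane and all out-of-plane components are identically zero, so a borehole strainmeter can observe only two-dimensional strain changes. Hence, there are only three independent variables under plane strain conditions at or near the Earth's surface [Wu et al., 2017], and three independent strain components can therefore be derived from the Guza recordings.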
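The formulas used are as follows:

$$S_a = \frac{S_1 + S_2 + S_3 + S_4}{2}, \qquad S_{13} = S_1 - S_3, \qquad S_{24} = S_2 - S_4 \qquad (4)$$

where $S_a$ represents the areal strain, and $S_{13}$ and $S_{24}$ represent the two independent shear strains [Qiu et al., 2013].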
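To illustrate equations (1) to (4), the sketch below evaluates the forward model for four gauges 45° apart, checks the self-consistency relation, and applies the strain conversion. The coupling coefficients `A` and `B`, the gauge-1 azimuth `theta1` and the strain values are hypothetical placeholders, not Guza calibration values. The ratio `k = (S1 + S3) / (S2 + S4)` is included as an assumed way to express self-consistency (values near 1 suggest credible data); all function names are hypothetical.

```python
import numpy as np


def gauge_readings(eps1, eps2, phi, theta1=0.0, A=1.0, B=1.0):
    """Forward model of equation (1) for four gauges spaced 45 degrees apart.

    eps1, eps2 : principal strains
    phi        : azimuth of eps1 (radians)
    theta1     : azimuth of gauge 1 (radians); hypothetical
    A, B       : coupling coefficients; placeholder values, not calibrations
    """
    thetas = theta1 + np.deg2rad([0.0, 45.0, 90.0, 135.0])
    return A * (eps1 + eps2) + B * (eps1 - eps2) * np.cos(2.0 * (thetas - phi))


def self_consistency(s):
    """Return the residual (S1 + S3) - (S2 + S4) of equation (3) and the ratio k."""
    s1, s2, s3, s4 = s
    residual = (s1 + s3) - (s2 + s4)
    k = (s1 + s3) / (s2 + s4)  # assumed metric: close to 1 for credible data
    return residual, k


def convert_strain(s):
    """Equation (4): map gauge readings S1..S4 to (S13, S24, Sa).

    s : array-like of shape (4,) or (4, T), rows ordered S1, S2, S3, S4.
    """
    s1, s2, s3, s4 = np.asarray(s, dtype=float)
    s13 = s1 - s3                    # first independent shear strain
    s24 = s2 - s4                    # second independent shear strain
    sa = (s1 + s2 + s3 + s4) / 2.0   # areal strain, equal to 2A(eps1 + eps2)
    return s13, s24, sa


# Example with made-up strains (dimensionless).
s = gauge_readings(eps1=3e-8, eps2=-1e-8, phi=np.deg2rad(30.0), A=0.8, B=1.2)
residual, k = self_consistency(s)
s13, s24, sa = convert_strain(s)
```

Because `convert_strain` unpacks along the first axis, the same function works for a single sample or for whole four-row time series.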

Principal component analysis
PCA is a widely used technique in data analysis. It is computationally inexpensive, can be applied to ordered and unordered attributes, and can handle both sparse and skewed data. PCA is a non-parametric method that can extract relevant information from complex datasets [Gómez et al., 2005], and it often reveals relationships that were not previously suspected, allowing interpretations that would not otherwise be apparent.
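Mathematically, the data are arranged in a matrix of $N$ rows and $M$ columns:

$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1M} \\ y_{21} & y_{22} & \cdots & y_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N1} & y_{N2} & \cdots & y_{NM} \end{pmatrix}$$

where $N$ is the number of samples and $M$ is the dimension of each sample.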
First, we calculate the $M \times M$ covariance matrix $C$ of the $N \times M$ dataset $Y$. The element $c_{jk}$ of $C$ is calculated as

$$c_{jk} = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_{ij} - \bar{y}_j\right)\left(y_{ik} - \bar{y}_k\right)$$

where $y_{ij}$ and $y_{ik}$ are the $j$th and $k$th columns of the $i$th row of data, respectively, and $\bar{y}_j$ and $\bar{y}_k$ are the averages of the $j$th and $k$th columns of data, respectively. Here $N$ is the number of samples.
We apply eigenvalue decomposition to the covariance matrix:

$$C = U \Lambda U^{\mathsf{T}}$$

where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_M)$ is the eigenvalue matrix with $\lambda_1 > \lambda_2 > \cdots > \lambda_M$, and $U$ is the eigenvector matrix whose columns are $u_1, u_2, \ldots, u_M$. The first principal component eigenvalue and eigenvector are $\lambda_1$ and $u_1$, respectively; they represent the principal characteristics of the signals, while the other eigenvalues carry information about other influencing factors and noise. In this paper, we investigate the variations in the eigenvalue and eigenvector of the first principal component.
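The covariance and eigen-decomposition steps can be sketched in NumPy as follows. This is a minimal illustration assuming the $1/(N-1)$ normalization; `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order, so they are re-sorted to satisfy $\lambda_1 > \lambda_2 > \cdots$. The function name `pca_first_component` is hypothetical.

```python
import numpy as np


def pca_first_component(Y):
    """Return (lambda1, u1, all eigenvalues sorted descending) for an N x M matrix Y."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    centered = Y - Y.mean(axis=0)            # subtract the column means
    C = centered.T @ centered / (n - 1)      # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]        # re-sort into descending order
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]              # columns are u1, u2, ..., uM
    return eigvals[0], eigvecs[:, 0], eigvals


# Synthetic data dominated by one direction, (1, 2, 0) / sqrt(5).
rng = np.random.default_rng(0)
t = rng.normal(size=1000)
Y = np.column_stack([t, 2.0 * t, 0.01 * rng.normal(size=1000)])
lam1, u1, eigvals = pca_first_component(Y)
```

Note that `u1` is only determined up to sign, so it may come out close to either $(1, 2, 0)/\sqrt{5}$ or its negative.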
We apply PCA to the borehole strain data after strain conversion ($S_{13}$, $S_{24}$, $S_a$), so the covariance matrix is $3 \times 3$ and each eigenvector is a three-dimensional unit vector. To present the changes in the eigenvector more intuitively, we transform the eigenvectors to the unit spherical coordinate system, as shown in Figure 3. In this system, an eigenvector is represented exclusively by two angles, a polar angle $\alpha$ and an azimuthal angle $\beta$.
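The conversion to the unit sphere can be sketched as below. The axis assignment (x, y, z) = ($S_{13}$, $S_{24}$, $S_a$), the names $\alpha$ (polar angle from the z axis) and $\beta$ (azimuth in the x-y plane), and the sign convention are all assumptions, not necessarily those used for Figure 3. Because $u$ and $-u$ are the same eigenvector, some sign convention is needed before angles from different days can be compared; here the z component is made non-negative. The function name is hypothetical.

```python
import numpy as np


def eigenvector_to_angles(u):
    """Convert a 3-D eigenvector into (alpha, beta) in degrees.

    Assumed convention (hypothetical): u = (x, y, z) holds the loadings on
    (S13, S24, Sa); alpha is the polar angle from the z axis and beta is the
    azimuth in the x-y plane measured from the x axis.
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)         # guard against small normalization errors
    if u[2] < 0:                      # u and -u describe the same eigenvector,
        u = -u                        # so choose the sign with z >= 0
    alpha = np.degrees(np.arccos(np.clip(u[2], -1.0, 1.0)))
    beta = np.degrees(np.arctan2(u[1], u[0]))
    return alpha, beta


alpha, beta = eigenvector_to_angles([1.0, 1.0, 0.0])
# alpha is 90 degrees and beta is 45 degrees, up to floating-point rounding
```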

Data processing
PCA is applied to extract the anomalies of strain changes associated with the earthquake. We analyze the borehole strain data collected at the Guza Station from May 1, 2007, to December 31, 2009. Figure 4 shows the four-component borehole strain series recorded at the Guza Station. At 19:10 on September 12, 2007, an Ms 7.9 earthquake occurred in Indonesia; its co-seismic strain changes were also recorded by the borehole strainmeter at the Guza Station.
First, the borehole strain data from the Guza Station are checked using the self-consistency equation, after which the four-component borehole strain data are transformed into three components, namely, $S_{13}$, $S_{24}$, and $S_a$, through strain conversion. A time-series plot of the borehole strain data after strain conversion is shown in Figure 5.
The next step is data preprocessing, in which we use harmonic analysis to remove the influence of the solid tide (periods of 24, 12, 8, and 6 hours) and the trend reflecting seasonal variation (period of one year). The strain changes that reflect crustal movements associated with the earthquake are regarded as short-period, high-frequency signals. A time-series plot of the borehole strain data after preprocessing is shown in Figure 6. Strain changes related to crustal activity are the dominant signal in the daily variation data.
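Harmonic analysis of this kind can be sketched as a linear least-squares fit of a constant plus cosine and sine terms at the stated periods, whose fitted sum is then subtracted. This is an illustration under assumptions: the periods follow the text (24, 12, 8 and 6 hours, plus one year taken here as 365.25 days), whereas a full tidal analysis would use the actual tidal constituents (for example, M2 at about 12.42 hours). The function name is hypothetical.

```python
import numpy as np

# Periods from the text, in hours; one year is taken as 365.25 days.
PERIODS_H = (24.0, 12.0, 8.0, 6.0, 365.25 * 24.0)


def remove_harmonics(t_hours, x, periods=PERIODS_H):
    """Fit a constant plus sinusoids at `periods` by least squares and remove them."""
    t = np.asarray(t_hours, dtype=float)
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(t)]
    for p in periods:
        w = 2.0 * np.pi / p
        columns += [np.cos(w * t), np.sin(w * t)]
    G = np.column_stack(columns)                   # design matrix
    coef, *_ = np.linalg.lstsq(G, x, rcond=None)   # least-squares amplitudes
    return x - G @ coef                            # residual short-period signal


# Synthetic check: a series made only of the modelled terms leaves ~zero residual.
t = np.arange(0.0, 24.0 * 800, 0.5)                # 800 days at half-hour steps
x = (3.0 + 2.0 * np.cos(2 * np.pi * t / 24.0) + np.sin(2 * np.pi * t / 12.0)
     + 5.0 * np.cos(2 * np.pi * t / PERIODS_H[-1]))
residual = remove_harmonics(t, x)
```

Anything not captured by the fitted harmonics, such as an earthquake-related transient, stays in the residual.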
The environment surrounding the station is one of the factors affecting borehole strain data, and most of these factors have a one-day cycle. To avoid time-domain aliasing and to identify anomalous days more easily, we perform PCA on daily data: for each day, the three-component data (1440 points per day at one sample per minute) are arranged as time-series data vectors.
Figure 6. A time-series plot of the data after preprocessing. P1, P2 and P3 represent the preprocessed $S_{13}$, $S_{24}$ and $S_a$, respectively.
We then apply PCA to P1, P2 and P3. In the PCA approach, the eigenvector $u_1$ is chosen to maximize the variance of the data, and we regard $u_1$ as spanning the most intense signal subspace. Because the data were previously transformed into three components, the remaining signal space can be expanded orthogonally into $u_2$ and $u_3$, the eigenvectors of the second and third principal components, respectively.
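The daily procedure can be sketched as follows, assuming the preprocessed series P1, P2 and P3 are stored as the columns of a NumPy array sampled once per minute and starting at midnight; any trailing partial day is ignored. `np.cov` normalizes by $N-1$ by default, and the function name is hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 1440  # one sample per minute


def daily_first_components(P):
    """Apply PCA to each whole day of a (T, 3) array with columns P1, P2, P3.

    Returns lambda1 for each day, shape (n_days,), and u1 for each day,
    shape (n_days, 3). A trailing partial day is ignored.
    """
    P = np.asarray(P, dtype=float)
    n_days = P.shape[0] // SAMPLES_PER_DAY
    lam1 = np.empty(n_days)
    u1 = np.empty((n_days, P.shape[1]))
    for d in range(n_days):
        Y = P[d * SAMPLES_PER_DAY:(d + 1) * SAMPLES_PER_DAY]  # N = 1440 rows
        C = np.cov(Y, rowvar=False)                             # 3 x 3 covariance
        vals, vecs = np.linalg.eigh(C)                          # ascending order
        lam1[d] = vals[-1]                                      # largest eigenvalue
        u1[d] = vecs[:, -1]                                     # its unit eigenvector
    return lam1, u1


# Synthetic example: three days of noise, with the second day ten times larger.
rng = np.random.default_rng(1)
P = rng.normal(size=(3 * SAMPLES_PER_DAY + 100, 3))
P[SAMPLES_PER_DAY:2 * SAMPLES_PER_DAY] *= 10.0
lam1, u1 = daily_first_components(P)
```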

Results and discussion
We apply PCA to the preprocessed data and calculate the first principal component eigenvalue and eigenvector. Figure 7 shows the daily variation in the first principal component eigenvalue $\lambda_1$. The black arrows mark the days on which the earthquakes occurred. To identify anomalous values, we calculate the mean and the standard deviation $\sigma$ of all $\lambda_1$ values, and define anomalous values as those that exceed the mean by more than $1\sigma$.
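The anomaly criterion can be written as a short NumPy function. It assumes the population standard deviation (`np.std` with its default `ddof=0`), since the text does not say which estimator is used; the function name is hypothetical. The cumulative number of anomalies plotted in Figure 8 then corresponds to `np.cumsum(flags)`.

```python
import numpy as np


def anomalous_days(lam1, k=1.0):
    """Flag values of lambda1 that exceed mean + k * std (k = 1 in the text)."""
    lam1 = np.asarray(lam1, dtype=float)
    threshold = lam1.mean() + k * lam1.std()  # population std (ddof=0)
    return lam1 > threshold, threshold


flags, threshold = anomalous_days([1.0, 1.0, 1.0, 1.0, 10.0])
cumulative = np.cumsum(flags)
# threshold is approximately 6.4; only the last value is flagged
```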
The variations in $\lambda_1$ shown in Figure 7 illustrate that there are few anomalous values before January 2008; the anomalous value on September 12, 2007 was caused by the Indonesia earthquake. After January 2008, the number of anomalous eigenvalues began to increase, and this growth continued until a few months after the earthquake.
To express the variation in the number of anomalous eigenvalues more intuitively, we calculate the cumulative number of anomalous eigenvalues, $N(t)$, and fit it with a sigmoidal function. Figure 8 illustrates the temporal behavior of $N(t)$, denoted here as N (eigenvalue anomalies). It shows a sigmoidal temporal behavior with a lower concavity before the earthquake, followed by a sigmoidal behavior with the opposite concavity after it. We indeed notice an acceleration that begins approximately 4 months before the earthquake (from January 2008), and a short period of quiescence just before the earthquake. This process can be linked to the preparation phase of a large earthquake [Kei., 2011]. After such a large earthquake, the rearrangement of stresses in the crust commonly leads to a large number of subsequent anomalous events [McCloskey et al., 2005; Parsons et al., 2002]. As shown in Figure 8, the number of anomalous eigenvalues rose steeply after the earthquake.
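The text does not give the exact sigmoidal function, so the sketch below assumes a logistic form $N(t) = a + b/(1 + e^{-k(t - t_0)})$, where $t_0$ is the midpoint and $k$ the steepness (all names hypothetical). Because the model is linear in $a$ and $b$ once $t_0$ and $k$ are fixed, a grid search over $(t_0, k)$ with a least-squares solve for $(a, b)$ avoids a nonlinear optimizer. Since the text describes separate sigmoidal behaviors before and after the earthquake, such a fit would be applied to each segment separately.

```python
import numpy as np


def fit_logistic(t, n, t0_grid, k_grid):
    """Fit n(t) ~ a + b / (1 + exp(-k (t - t0))) by grid search over (t0, k)."""
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)
    best = None
    for t0 in t0_grid:
        for k in k_grid:
            z = np.clip(-k * (t - t0), -500.0, 500.0)     # avoid exp overflow
            s = 1.0 / (1.0 + np.exp(z))
            G = np.column_stack([np.ones_like(t), s])
            coef, *_ = np.linalg.lstsq(G, n, rcond=None)  # linear in a and b
            sse = float(np.sum((n - G @ coef) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], t0, k)
    _, a, b, t0, k = best
    return a, b, t0, k


# Synthetic cumulative count: midpoint at day 120, steepness 0.1 per day.
days = np.arange(200.0)
counts = 5.0 + 40.0 / (1.0 + np.exp(-0.1 * (days - 120.0)))
a, b, t0, k = fit_logistic(days, counts,
                           t0_grid=np.arange(0.0, 200.0),
                           k_grid=[0.02, 0.05, 0.1, 0.2])
```

The grid resolution limits the precision of $t_0$ and $k$; a finer grid, or a local nonlinear refinement starting from the best grid point, would be a natural extension.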
Because the eigenvalue and eigenvector are both obtained from the decomposition of the covariance matrix of the data, and each contains part of the information in the data, we perform a combined eigenvalue and eigenvector analysis. In Figure 9, the x-axis and y-axis represent the two spherical-coordinate angles $\alpha$ and $\beta$ of the eigenvector, respectively, and the color indicates the eigenvalue.
Figure 10 uses boxplots to show the changes in the eigenvector before the Wenchuan earthquake. Because the angles $\alpha$ and $\beta$ uniquely represent an eigenvector, their variations reflect changes in the eigenvector. These changes are shown in Figure 9 and Figure . We infer that this process may also be linked to the preparation phase of a large earthquake.

Conclusions
We have used the PCA technique to study the anomalous characteristics of borehole strain data preceding the Wenchuan earthquake. The eigenvalues and eigenvectors of the first principal component of the time-series borehole strain data are used to analyze the characteristics of strain variation before the Wenchuan earthquake. The results indicate that the borehole strainmeter at the Guza Station recorded the preparation phase of the Wenchuan earthquake and that PCA can effectively extract the features of crustal strain changes. Crustal movement is complex, and when PCA is applied to investigate strain earthquake precursors, the physical meaning of the eigenvalues and eigenvectors is difficult to determine. In future work, we aim to establish this physical meaning. If PCA can be used to describe strain behavior in this way, the technique may have great potential in the study of earthquake precursors.