Incorporation of Multivariate Statistical Distribution of Magnitude-Distance and Monte-Carlo Simulation in Probabilistic Seismic Hazard Analysis

The classical seismic hazard analysis is based on two independent simpli ed assumptions including the statistical distribution of mag- nitude (usually Gutenberg-Richter 1958) and the distance distribution (equal probability in each point of a given source). However, the interaction between the two distributions is rarely discussed in past researches. Therefore, a joint M-R distribution has been implemented in this paper in order to shed light into these simpli ed assumptions. The Tehran metropolis is considered as the case study since it lo- cates in a highly active seismic region. Three seismological datasets were used in this study, i.e. the observed dataset, the simulated dataset based on the Han and Choi 2008 methodology, and the simulated dataset based on the EqHaz software platform. Then, the clas- sical seismic hazard analysis results are compared with the results obtained based on the joint M-R distribution. The results show that the classical seismic hazard analysis is always conservative when compared with the results based on the simulated data.


INTRODUCTION
Seismic Hazard Analysis (SHA) is an important and timely issue in the performance-based earthquake engineering (McGuire, 2004), i.e. especially in high seismic regions such as Iran plateau (Berberian, 2005). The main aim of SHA is to forecast ground shaking characteristics such as Peak Ground Acceleration (PGA), Peak Ground Velocity (PGV), spectral ordinates and so forth [Baker, 2008]. Gutenberg-Richter is one of the most important empirical-observation relationships within the classical SHA (Richter, 1958;McGuire, 2004). Several modifications were proposed in the literature in order to enhance this relationship [Cornell, 1968;Rosenblueth and Esteva, 1966;Isacks and Oliver, 1964;Bazzurro and Cornell, 1999]. Various Gutenberg-Richter distributions are depending on the treatment of the Maximum Magnitude (M max ) [Kagan, 2007]. The correlation between magnitude and distance is, obviously, not present in this relationship. The hypothesis of this research is that if whether the joint distribution of the magnitude and distance will result in different conclusions in the case of seismic hazard when compared to the classical approach? Therefore, a joint distribution of M-R has been employed in this paper in order to assess this hypothesis. The Tehran metropolis is considered as the case study in this research since this capital is located in the highly active seismic region and the SHA results in this area will be informative for engineering and research purposes. For this aim, two catalogues are gathered including (1) the observed events which are collected from the previous studies [Shahvar and Zare, 2013;Berberian, 1994], and (2) a simulated set of events based on the Han and Choi approach [Han and Choi, 2008]. All the observed and simulated events are within the 220 km rectangular area of the capital centre. The foreshocks and aftershocks are eliminated based on the Gardner and Knopoff approach [Gardner and Knopoff, 1974]. Finally, 273 observed data are available for further investigations. The whole area is divided into 11 seismic sources, and then, the seismic characteristics are calculated for each seismic region, i.e. M max , β (seismicity rate) and λ m0 (the rate of earthquakes with magnitudes higher than m 0 ) [Wells and Coppersmith, 1994;Kijko and Sellevoll, 1992]. The seismic characteristics are compared with the available findings in the literature [Tavakoli and Ghafory-Ashtiany, 1999]. The classical PSHA has been performed based on Equation (1), and the results are compared with the available SHA results in the region [Gholipour, et al., 2008].
where IM is the intensity measure, λ(IM>x) is the rate of IM>x, M is the moment magnitude, m min is the lower bound for the moment magnitude, λ(M i >m min ) is the rate of occurrence of earthquakes greater than m min from the source, P(IM>x|m,r) comes from a given Ground Motion Prediction Equation (GMPE), f(M) and f(R) are, respectively, the probability density functions for magnitude and distance. A set of events is also generated by employing the Han and Choi approach (Han and Choi, 2008) for different return periods, i.e. 10 4 , 10 5 and 10 6 years. This approach uses a random generation procedure based on the occurrence rate parameter which will be described in details in the following sections. The other seismic characteristics, such as M max and λ m0 , are kept identical to the observed catalogue. Hence, the generated catalogue has a kind of inherent epistemic uncertainty which makes it different from the observed data. The PSHA approach is, then, performed again for the simulated cases and the results are compared with the classical hazard curve as shown in Figure 1. As seen in Figure 1, both of the hazard curves are close together except in the middle part which corresponds to the moderate return period values, i.e. between 10 years and 3500 years return periods which is a relatively wide range of interest in practice. As seen in Figure 1, the hazard curve based on the observed data is always higher than the hazard curve based on the simulated events which indicate that the hazard curve based on the observed catalogue is still on the safe side. It is worth mentioning that this difference is not meaningful when Equation (1) being employed, however, the difference rise up when a joint distribution between magnitude and distance is taken into consideration. This difference is the main focus of this paper and will be discussed with details in the next sections. It is worth mentioning that the author's findings show that the SHA results based on using f m (M).f r ( R) meaningfully differ from the results found on the implementation of f m,r (M, R) within Equation (1).

SEISMIC CATALOGUE
The seismic catalogue was collected based on three data resources including: (1) the historical data [Berberian, 1994]; (2) The data provided by [Shahvar and Zare, 2013] in which contains the moment magnitude; and (3) The BHRC data available at its website (see Data and resources).
Tehran metropolis is located in 35.69N and 51.42E (see Data and Resources section). Initially, the seismic data were gathered in the region 37N to 43.3N and 49.7E to 53.1E in which provides a rectangular area with the dimension equal to 150 km around Tehran metropolis. The possible active faults, as well as the distribution of the collected seismic data, are shown in Figure 2 (see also Data and Resources section). Furthermore, it was found that a set of active faults are located in the borders of the considered region. Therefore, the part was extended to 33.8N to 37.5N and 49.2E to 53.6E which provides a rectangular area with the dimension equal to 220 km (see Figure 2).
The active faults were identified consisting of four linear and seven areal seismic sources as seen in Figure   transformed into the moment magnitude by using the Shoja-Taheri relationship [Shoja-Taheri and Naserieh, 2007]. Additionally, the distance between the event epicentres to the centre of the given region is calculated by using the Haversine relationship [Sinnott, 1984].

SEISMIC DE-CLUSTERING
Three different methodologies were implemented to eliminate foreshocks and aftershocks in the gathered earthquake database. The methodologies consist of (1) Gardner and Knopoff method [Gardner and Knopoff, 1974]; (2) Grounthal method [Van Stiphout et al., 2012]; and (3) Uhrhammer method [Uhrhammer, 1986]. The three mentioned methods use a window scheme in order to recognize foreshocks and aftershocks. Since the results based on the three pre-mentioned methodologies were quite close together, the authors decided to use the Gardner and Knopoff method [Gardner and Knopoff, 1974] for further investigations in this study. In the case of each seismic source, the Gardner and Knopoff method uses a linear regression between time and magnitude as well as a linear regression between magnitude and distance. The events which are below the regression lines are considered as foreshocks and aftershocks and were eliminated from the seismic database. However, the Gardner and Knopoff method is limited to the magnitudes up to 6.4 [Gardner and Knopoff, 1974]. Therefore, the Van Stiphout relationship [Van Stiphout et al., 2012] was implemented in the case of magnitudes higher than 6.4. Finally, after elimination of the foreshocks and aftershocks, 237 data were obtained as shown in Figure 4.

SEISMOGENIC ZONES AND MAGNITUDE-FRE-QUENCY DISTRIBUTION
The seismicity characteristics, for each seismic source, were calculated by considering historical data. The Kijko method [Kijko and Sellevoll, 1989;Kijko and Sellevoll, 1992], which has been implemented in the Kijko2001 software platform (see Data and Resources Section), was used to calculate the β and λ m0 parameters. The maximum credible earthquake is calculated by using Equation (2) [Wells and Coppersmith, 1994]. The Surface Rupture Length (SRL) uncertainty is equal to 20 percent of the fault length [Wells and Coppersmith, 1994] and at least 50 km [Biasi and Weldon, 2006]. The seismic source mechanism is obtained based on the Iranian active faults map. The results are shown in Table (1). It is worth mentioning that the current results are compared, as seen in Table (1), to the Tavakoli and Ghafory Ashtiany [1999] results as a verification procedure.

= + × log(SRL)
(2)   λ m0 is the rate of earthquakes with magnitudes higher than m0 * β is the seismicity rate obtained by the Gutenberg-Richter relationship where M is moment magnitude; SRL is the surface rupture length; a and b are constants as shown in Table (2).

SIMULATION OF CATALOGUE BY USING THE HAN AND CHOI ALGORITHM
The Han and Choi algorithm (Han and Choi, 2008) is utilised in order to generate a simulated earthquake catalogue. The algorithm procedure is shown in Figure 5. The earthquake catalogue consists of 237 events in 62 years. Therefore, to keep the seismicity rate constant, 3822 events are needed for every 1000 years.
This 62 year is a meaningful restraint in the current study, but it is unavoidable. However, the Kijko method tries to consider this lack of data in the estimation of the seismicity rate. The whole region is divided into a refined rectangular mesh with an increment of 0.1 degrees in longitude and latitude. Each cell in the mesh is defined by two indices, i.e. (i,j) in which i is the row (latitude) number and j is the column (longitude) number. Then, the annual occurrence rate of Mmin is calculated for each cell and denoted by λ 4 (i,j). The distribution of λ 4 is shown in Figure 6 in which the Frankel method [Frankel et al., 1996] was used to smooth it. The weighting matrix is written in Equation (3) based on [Frankel et al., 1996]. Then, Equation (4) is used to smooth the annual occurrence rate matrix by ten iterations [as done in Han and Choi, 2008].
For simplicity, the annual occurrence rate matrix is transformed into a set of vectors by using Equations (5) and (6). Then, the cumulative distribution function (CDF) of this vector is calculated by using Equation (7) as shown in Figure 7.
To simulate the new location of the earthquake, a random number between 0 and 1 is generated as a random CDF number. Then, the corresponding F λ is obtained based on Figure 7 (Equation 7). By using Equation (5), i and j numbers are derived which are corresponding to the row and column in the simulated annual occurrence rate matrix.
For the magnitude of the simulated catalogue, the same strategy as the location simulation is used. The only difference is that the CDF (Equation 7) of magnitude, as written in Equation (8), is replaced instead of the location CDF. The simulation process is done for three sets of catalogues consisting of 10 4 , 10 5 and 10 6 seismic events. ij = ( -1) + 4 ( ij ) = 4 (( -1) c + ) = 4 ( , )   (2).
where F M (m) is the CDF of magnitude (m); m 0 is the magnitude lower bound; λ m is the rate of earthquakes with magnitudes higher than m; λ m0 is the rate of earthquakes with magnitudes higher than m 0 .

SIMULATION OF CATALOGUE BY USING EQHAZ SOFTWARE PLATFORM
The EqHaz platform [Assatourians and Atkinson, 2013] is an open-source software which was introduced for earthquake catalogue simulation by using Monte-Carlo [Ulam, 1961] method. The input data for the EqHaz consist of (1) seismicity rate of a given fault; (2) the rate of earthquakes with M>=m; and (3) maximum magnitude. The algorithm within the EqHaz is based on the Monte-Carlo simulation method. In other words, the Monte-Carlo simulation method employs the distribution of magnitude, distance and time to regenerate new earthquake catalogues. The data based on Table (1) is used as input to the EqHaz platform, and three bins of catalogues generated including 62, 10 4 and 10 5 years.

CLASSICAL SEISMIC HAZARD ANALYSIS
The classical Probabilistic Seismic Hazard Analysis (PSHA) is performed by using Equation (1) [McGuire, 1995]. The magnitude Probability Density Function (PDF) is calculated by using Equation (9) [Cornell, 1968;Green and Hall, 1994]. The distance PDF is based on the assumption that the probability of earthquake occurrence is equal in all mesh grids. The Boore and Atkinson GMPE [Boore and Atkinson, 2008] has been used in this study. The fault mechanism is available in Table 1, and the soil shear wave velocity is obtained based on [Zare et al., 1999] and BHRC data (see Data and Resources section). The linear faults are divided into 20 km segments, and the area sources are divided into 10km X 10kmcell. Therefore, all the seismic sources are divided into 662 sections in which the distance between the cell centre and the city centre is calculated by using the Haversine relationship (Sinnott, 1984).

SEISMIC HAZARD ANALYSIS BY EMPLOYING THE MAGNITUDE-DISTANCE JOINT DISTRIBUTION
To use the joint distribution of magnitude-distance, the considered region is divided into a rectangular mesh. Each square mesh cell has a dimension equal to 10 km. The magnitude range is between M min equal to 4 and M max with an increment of 0.5 where M max is different for each seismic source (see Table 1). For each magnitude value, a Geometrical Earthquake Occurrence (GEO) matrix is assigned in which each cell contains the number of events occurred in this location. The GEO matrix dimension is the same as the area mesh as schematically written in Equation (10). (10) where NO stands for the Number of Occurrences in each cell and i is the latitude number, j is the number of longitude division, and k is the number of matrix layer that is equal to the magnitude discrete value range. For example, Equation (11) is written in the case of linear seismic source No. 1 for the magnitude between 4.75 to 5.25 and using the Han and Choi method [Han and Choi, 2008] for generation of 10 6 simulated records. (11) Due to the locations of the earthquakes that occurred in the last 62 years, there are several zero numbers in Equation (11). This fact illustrates that there is not an event in the earthquake catalogue in those cells. On the other hand, for example, 451613 earthquakes are generated in the case of Equation (11). For example, the cell which contains 2268 number value in Equation (11) corresponds to the 0.005 of the earthquake catalogue.
The M-R joint distribution PDF is obtained based on this fact that the integral of the PDF should be equal to unity. Therefore, Equation (12) is used to calculate the Event Rate Coefficient (ERC). The joint PDF is then obtained by multiplying the GEO matrix by the ERC parameter as written in Equation (13). The PSHA integral is re-written in Equation (14) by implementing the joint distribution of magnitude-distance.
where Coff.NO. is the occurrence rate coefficient, i is the number of latitude mesh, j is the number of longitude mesh, k is the number of magnitude intervals, and Number of occurrence is the occurrence rate matrix.
where F NO is the joint M-R distribution matrix.
, ( , ) where all the parameters definition are identical to Equation (1) except which is joint M-R PDF identical to F NO in Equation (13).

CLASSICAL SEISMIC HAZARD ANALYSIS
The classical PSHA is obtained based on Equation (1) and the dataset in Table ( [Gholipour et al., 2008] which was a national project. The assumptions and considered catalogue are not the same between the current study and [Gholipour et al., 2008]; however, many common points are present between them. The two curves show good agreement as seen in Figure 8a that confirm the accuracy of the current PSHA results in this study.

CLASSICAL SEISMIC HAZARD ANALYSIS BY US-ING THE SIMULATED DATA
The classical PSHA (Equation (1)) is calculated in this section by implementing the simulated data, i.e. using the Han and Choi algorithm as well as the EqHaz platform. The standard hazard curves, based on the observed data (Table 1) as well as the simulated data, 10 6 years based on the Han and Choi method, are compared in Figure 8a in which all show good agreement together. However, the hazard curve based on the simulated data is always lower than the hazard curve based on the observed data in the range of 0.05g-0.5g regarding PGA. The difference comes from the value used in Equation (9). As the simulated catalogue has significantly more data than the observed catalogue, the regression between magnitude and rate of occurrences has been changed. In other words, the seismicity is less in the case of the simulated catalogue when compared to the observed catalogue. This may be an effect of generating several large magnitude events within the simulated catalogue.
The standard hazard curves, based on the observed data (Table 1) and the simulated data (10 5 years based on EqHaz platform [Assatourians and G.M. Atkinson, 2013]), are compared in Figure 8b which are almost identical. The reason is that the EqHaz uses precisely the same seismic characteristics as the results obtained based on the observed catalogue. Therefore, it is evident that the two hazard curves are identical in Figure 8b.

CLASSICAL SEISMIC HAZARD ANALYSIS BY US-ING THE JOINT DISTRIBUTION OF MAGNITUDE-DISTANCE
In this section, the PSHA is calculated based on Equation (14) in which employs the M-R joint distribution. The hazard curves based on the Han and Choi algorithm (Han and Choi, 2008) in the case of 62, 10 4 , 10 5 and 10 6 years catalogues are shown in Figure  9a. As seen in Figure 9a, the 62 years of simulated catalogue is not reliable since the duration is too short. On the other hand, the 10 4 , 10 5 and 10 6 years catalogues result in nearly the same hazard curves. Therefore, the hazard curve based on the Han and Choi 10 6 years catalogue is used hereafter for further investigations in this study. The same data are provided in Figure 9b in the case of the EqHaz platform in which the hazard curve based on the 10 5 years EqHaz simulated catalogue will be used hereafter.
In order to distinguish the difference between Equation (1) and Equation (14), three different hazard curves are shown in Figure 10a consisting of (a) the classical PSHA identical to Figure 8a which employs Equation (1); (b) the hazard curve using Equation (14) and the 10 6 years simulated data by Han and Choi algorithm [Han and Choi, 2008]; (c) the hazard curve using Equation (14) and based on the observed data. The same data are provided in Figure 10b (1) and (14) based on the Han and Choi algorithm (Han and Choi, 2008); b) The comparison of hazard curves in the case of using Equations (1) and (14)

c) d)
As seen in Figure 10a, the classical hazard curve is upper than the joint distribution hazard curves. The reason is that the magnitude, as well as the distance distributions, are based on the observed past events, however, in the case of the joint distribution function (Equation (14)); a standard deviation is taken into consideration in each cell. This standard deviation makes it possible to consider some events in the locations that there is not any past event as well as some magnitudes that did not happen in the observed catalogue.
This justification in Figure 10b is different. As seen in Figure 10b, the classical PSHA still upper than the other two curves. However, the hazard curve based on the EqHaz platform show higher seismic hazards in high return periods when compared to the hazard curve based on the original observed catalogue. This phenomenon is vice versa in the low return period region as seen in Figure 10b. The reason is apparently coming from the difference between Equation (1) and Equation (14) which is the magnitude and distance distributions. For more elaborate with this issue, the magnitude probabilities are shown in Figure 11(a) to Figure 11(d) in four cases, i.e. the Gutenberg- Richter [1958], the observed catalogue, the Han and Choi catalogue, and the EqHaz catalogue. Figure 11(a) to 11(d) illustrate the magnitude distribution in the case of four seismic sources, i.e. line sources No. 1 and No.2 as well as area sources No.1 and No.3. The Han and Choi magnitude distribution is also nearly exponential in Figure 11(c). In general, the Han and Choi distribution is more compatible with the observed distribution when compared to the EqHaz distribution.

CONCLUSION
In this study, the multivariate statistical distribution of magnitude-distance is altered into the classical hazard analysis, and the obtained results are discussed. The classical SHA always depends on the seismicity rate parameters. For example, the classical SHA for two regions with the same seismicity rate parameters, but different earthquake catalogues will result in the same hazard curves which are seriously challengeable.
The Han and Choi, as well as the EqHaz software platform, were taken into consideration to obtain simulated earthquake catalogues. The results show that the Han and Choi method is more compatible with the observed data distribution when compared with the EqHaz results. Also, the hazard curve based on the Han and Choi algorithm is always lower than the hazard curve based on the classical SHA. This phenomenon shows that the classical SHA is still on the safe side at least within the assumptions and the case study of this paper. The main reason for this phenomenon is that, in the classical SHA, only the occurrence probability is taken into consideration in the locations in which past recorded data are available in the catalogue. However, in the case of the Han and Choi algorithm, a variation is considered around each location and magnitude. In other words, events are taken into consideration, within the Han and Choi algorithm, in the locations that there were no previous events in the observed catalogue.
It is worth mentioning that consideration of the M-R joint distribution results in meaningful different hazard curves when compared to the classical SHA hazard curves, at least within this study's limitations. More studies are obviously needed to prove this hypothesis.