Forecasting Italian seismicity through a spatio-temporal physical model : importance of considering time-dependency and reliability of the forecast

We apply here a forecasting model to the Italian region for the spatiotemporal distribution of seismicity based on a smoothing Kernel function, Coulomb stress variations, and a rate-and-state friction law. We tested the feasibility of this approach, and analyzed the importance of introducing time-dependency in forecasting future events. The change in seismicity rate as a function of time was estimated by calculating the Coulomb stress change imparted by large earthquakes. We applied our approach to the region of Italy, and used all of the cataloged earthquakes that occurred up to 2006 to generate the reference seismicity rate. For calculation of the time-dependent seismicity rate changes, we estimated the rate-and-state stress transfer imparted by all of the ML ≥ 4.0 earthquakes that occurred during 2007 and 2008. To validate the results, we first compared the reference seismicity rate with the distribution of ML ≥ 1.8 earthquakes since 2007, using both a non-declustered and a declustered catalog. A positive correlation was found, and all of the forecast earthquakes had locations within 82% and 87% of the study area with the highest seismicity rate, respectively. Furthermore, 95% of the forecast earthquakes had locations within 27% and 47% of the study area with the highest seismicity rate, respectively. For the time-dependent seismicity rate changes, the number of events with locations in the regions with a seismicity rate increase was 11% more than in the regions with a seismicity rate decrease.


Introduction
Statistical forecasting of earthquakes has proven to be of great importance over time, due to two critical aspects. First, a reliable earthquake forecast is a primary component of seismic hazard assessment as well as of any kind of seismic-risk mitigation. Therefore, forecasting can be associated not only with scientific interest, but also with social, economic and political impact [Marzocchi et al. 2003, Lombardi andMarzocchi 2009]. Secondly, earthquake forecasting facilitates the quantitative testing of any earthquake occurrence models or hypotheses ]. Statistical earthquake forecasting also differs greatly from earthquake prediction [Vere-Jones, 1995].
To estimate the probability distribution of future earthquakes, many forecasting methodologies have been proposed [Ogata 1988, Gerstenberger et al. 2005, Lombardi and Marzocchi 2009, Cocco et al. in press, Console et al. 2010]. Although there is some evidence for distance and directional dependencies of dynamic triggering earthquakes [e.g., Parsons 2002, Felzer and Brodsky 2006, Main 2006, several studies have suggested that the change in the static Coulomb failure stress (DCFS) due to an earthquake has a stronger influence on the distribution of both aftershocks and large earthquakes that might follow [e.g., Harris 1998, Stein 1999, King and Cocco 2000, Freed 2005, Steacy et al. 2005]. According to the Coulomb model, the stress perturbation due to a large earthquake not only triggers the aftershocks, which are mainly located in the area of increased DCFS, but also inhibits further earthquake occurrence in the stress-shadow areas [e.g. Ma et al. 2005]. By deriving the DCFS, it is possible to estimate qualitatively the change in seismicity rate after the occurrence of earthquakes. To quantify the impact of these stress changes on the seismicity rate, previous studies have applied the rateand-state model of Dieterich [1994] to investigate the time-dependency of earthquake occurrence [Toda et al. 2005, Catalli et al. 2008, Console et al. 2010]. In this model, a sudden increase in seismicity rate occurs where the DCFS is positive, due to the earthquake interactions; this is followed by a graduate recovery over time. The opposite is observed in the case of negative DCFS. Through this model, the evolution of the seismicity rate due to stress changes imparted by earthquakes can be estimated.
To estimate the reference seismicity rate needed in the rate-and-state model application, Toda et al. [2005] and Catalli et al. [2008] used a smoothing method that was applied to the seismicity before the first stress perturbation. Console et al. [2010] introduced a Brownian passage-time model [Matthews et al. 2002] to estimate the reference seismicity rate. The forecasting result is in all cases obtained by considering the reference seismicity rate in the seismicity rate change calculations by the rate-and-state friction model. The Brownian passage-time model requires some more information on maximum possible magnitude, earthquake recurrence time, and coefficient of variation (also known as aperiodicity) of the inter-event times on the faults being considered [Zöller and Hainzl 2007]. However, when a forecasting methodology is applied, it is desirable to impose a minimum of model assumptions, especially when limited geological and geophysical information is available.
Our approach can be separated into two steps: first, to evaluate the reference seismicity rate, we built up a timeindependent forecast using an epicenter-smoothing method [Woo 1996, Molina et al. 2001, Beauval et al. 2006] that was applied to past earthquakes. In the second step, the rateand-state friction model was applied to consider the seismicity rate change imparted by the DCFS due to recent large and moderate-sized earthquakes. By combining the results of these two steps, the time-dependent seismicity rate can be estimated. For the time-independent part, this approach required only knowledge of the epicenter and magnitude of each earthquake in a complete catalog, which records all of the events that are over a certain magnitude. Thus, this can be easily applied to most regions of the World. The time-dependent part, on the other hand, required knowledge of a certain number of free model parameters: the focal mechanisms of recent earthquakes, to calculate DCFS; the slip distribution for the main shocks, or alternatively their locations; the earthquake magnitudes; the fault plane solutions when applying empirical relationships [Wells and Coppersmith 1994]; the constitutive physical parameter of the rate-and-state friction model, Av; and the characteristic time t a . The considerable number of free parameters in such a physical model is without doubt one of its weaknesses. Catalli et al. [2008] reduced the number of a-priori assumptions in the model using physical relationships and statistical methods to estimate the parameters from information contained only in an earthquake catalog. However, Cocco et al. [in press] studied the impact of the physical model parameters and their correlations in applying such a model; they concluded that their relationships and the influence on the final results need to be taken into strong consideration. To reduce the number of a-priori assumptions in the modeling by as much as possible, we used information contained in the Italian Centroid Moment Tensor (CMT) dataset [Pondrelli et al. 2006] to constrain the focal mechanisms, we fixed Av = 0.1 bar in agreement with the literature [Toda and Stein 2003, Toda et al. 2005, Catalli et al. 2008], and we assumed t a to be the duration of aftershocks, which is defined as a magnitude-dependent parameter. This is determined by the same approach used for the declustering that is briefly described in the next section. All of the other information was taken from the selected earthquake catalog.
We tested our method for the Italian territory, which is one of the most seismically active regions in Europe. The high level of seismic hazard, combined with the high population density and the large number of vulnerable buildings, results in a high seismic risk in this region [e.g. Marzocchi 2008]. The earthquake catalog available provided us with sufficient information to build up the forecasting model for distribution of future earthquakes and to further test its performance.

The earthquake catalog
In this study, we used the catalog of Italian seismicity (Catalogo della Sismicità Italiana, CSI; http://csi.rm.ingv.it/) for a reliable list of events from 1985 to 2002, and the Italian seismic bulletin (Bollettino Sismico Italiano, BSI; http:// bollettinosismico.rm.ingv.it/) to cover the period from 2003 until the end of 2008. The BSI contains information from a sparse short-period network in Italy, the installation of which started in the early 1980's. The CSI is a later integration with data from other networks [Castello et al. 2006, Chiarabba et al. 2005]. When dealing with Italian data, it needs to be take into account that since April 16, 2005, new interactive analysis has been introduced for the BSI seismic data. Thus, any comparison of homogeneity before and after this date is difficult [for details, see Bono and Badiali 2005, Schorlemmer et al. in press].
In the CSI and in most of the BSI catalog, magnitudes are given as local magnitudes (M L ). The duration magnitude (M d ) in the BSI catalog was converted into local magnitude (M L ) using the relationship of Castello et al. [2007]: (1) We calculated the magnitudes of completeness (M c ) for the seismic catalogs over different time periods using the ZMAP program [Wiemer 2001]. Here, the M c of the CSI from 1985 to 2002 was 3.0, and for the BSI, the M c from 2002 to April 2005 was 2.5, while from May 2005 onwards it was 2.0. The period between 1985 and 2007 was considered as the «learning period», to establish the reference seismicity rate. This seismicity during the learning period is presented in Figure 1. The period from 2007 to the end of 2008 will be referred to as the forecasting period.
Furthermore, we tested both non-declustered and declustered versions of the catalog (i.e., a catalog which had been filtered for dependent events, such as foreshocks, aftershocks and swarms). Events are considered dependent when they occur within a given time and distance of previous large events, defined by the magnitude dependent distance and time window parameters given by Burkhard and Grünthal [2009] and Grünthal et al. [2009].

Methodology
We present here a forecasting approach that was based on catalog observations and physical modeling of earthquake interactions. In particular, we quantitatively analyzed the importance of introducing time-dependency into the forecasting ability of such a physical model. For this reason, we distinguished between time-independent and time-dependent steps in our approach. The results were compared to estimate the importance of the introduction of time into the forecast. It must be noted that in general, the application of a rate-and-state friction model requires knowledge of a reference seismicity rate (the estimation of which is based on the observation of seismicity in the learning period), because the expected seismicity is directly proportional to this value, as described by Dieterich [1994]. In the present study, we refer to the time-independent seismicity estimate using the more general term «reference seismicity» instead of «background seismicity», in agreement with Dieterich [1994] and the comments of Cocco et al. [in press]. In most of the rate-and-state applications [Toda and Stein 2003, Toda et al. 2005, Catalli et al. 2008, the estimated reference seismicity rate is considered as only an input into the rate-and-state friction model. Here, we also wanted to emphasize the forecasting power of the estimated reference seismicity rate, as a simple smoothed seismicity, referenceforecasting tool [Zechar and Jordan 2010], and to compare its forecasting abilities to those of the rate-and-state time-dependent model. This implies that in the present study the reference seismicity rate estimation was equivalent to the time-independent forecast.

Large-scale tectonic zonation
To derive a reliable forecast of the seismicity rate that can take into account largely differing areas tectonically, we introduced a large-scale zone model ( Figure 1). This is important mainly for estimation of the reference seismicity rate by applying an epicenter smoothing method, as is described in Section 4. The subdivision of the study area mainly followed the large-scale tectonic architecture, and it also took into account the different seismicity data characteristics.
The present study required consideration not only of the seismicity data for Italy, but also of those from the Alpine area in the north, and from the Adriatic and Dinaridic fold belts in the east. The large-scale subdivision of Italy into four sub-regions were originally introduced for catalog completeness, and they basically followed Stucchi et al. [2004]: the Po plain and adjacent mountainous areas; the main part of the Apennines; the southern Apennines and Calabria; and Sicily with the adjacent waters (Figure 1: PP, Apn, SAC and Sic, respectively). In the north, the region of the Alps was separated into: the western Alps; and the eastern Alps (Figure 1: WA and EA, respectively), since both of these areas are significantly different in their seismicity ITALIAN TIME-DEPENDENT FORECASTING

Evaluation of the time-independent seismicity rate
In the present study, we estimated the reference seismicity rate through a magnitude-and distance-dependent epicenter-smoothing method. We assumed that this smoothed seismicity is time independent for the sake of simplicity. However, the assumption of a reference seismicity is still controversial [Hainzl and Ogata 2005, Lombardi et al. 2006, Lombardi and Marzocchi 2007, even using a declustered catalog that should contain only independent events. In this respect, the choice of the time window used to estimate the smoothed reference seismicity is crucial, because long-lasting aftershock sequences can contaminate this estimation [Marsan 2003, Marsan andNalbant 2005, and references therein].
We estimated the mean annual seismicity m (M, x) at the site of interest x as a function of the magnitude M, described as [Woo 1996]: (2) where K (M, x − x i ) represents a Kernel function as a function of the magnitude M and the distance r i between the site of interest x and the epicenter of the i'th earthquakes x i , T M represents the complete catalog duration for the magnitude M, and N M represents the total number of earthquakes of magnitude M in the earthquake catalog. We adopted the approach developed by Woo [1996], where the distribution of the earthquake density was represented by the Kernel function K (M, r i ) as a function of the magnitude M and distance: (3) where PL denotes a power-law index with recommended values between 1.5 and 2.0, which correspond to a cubic or quadratic decay of seismic activity with hypocenter distance [Molina et al. 2001]. Since we found that the difference in the results was insignificent when PL was assumed to be between 1.5 and 2, in the present study we assumed PL = 2.0. H(M) is a bandwidth function that is defined as the distance between two events as an exponential function of the magnitude, which can be represented as: , where c and d are constants that can be acquired from regression in each large-scale zone. To obtain m (M, x), the Kernel function was summed over all of the events in the complete part of the earthquake catalog, and divided by the catalog duration, in agreement with Equation (2).
To accommodate the spatial variation of the seismicity, we considered different bandwidth functions for the representation of the seismicity rate in the different largescale zones defined in the previous section. The bandwidth functions were acquired through regression on the earthquakes, which occurred after the catalog completeness time in each large-scale zone (Figure 2). In the calculation for reference seismicity rate, each event in the catalog was introduced in the sum of Equation (2), considering the specific bandwidth function of the large-scale zone containing the event.

Coulomb stress-change calculations
The DCFS due to the occurrence of one or more events can be estimated through the Coulomb Failure Model (CFM) [King et al. 1994, Toda et al. 2005, Lin and Stein 2004, King and Cocco 2000], expressed as: (5) where Dx is the change in shear stress and Dv n is the change in normal stress at a given receiver fault plane, and n´ is the effective coefficient of friction. Here, n´ was generally assumed to be in the range of 0.0-0.8, since previous studies reported only modest effects on DCFS of assuming extreme values of n´ [Ma et al. 2005, Chan andStein 2009], and we had no detailed information about this parameter. Therefore, we used a fixed value of n´ = 0.4. According to the CFM, a positive stress change favors subsequent earthquake events; vice versa, a negative stress change inhibits future seismicity.
The application of the Coulomb model for the calculation of stress perturbations due to earthquake occurrences required knowledge of source parameters, such as the slip distribution and geometry of the rupturing fault. For this purpose, we adopted here a homogenous slip model with dimensions and average slip derived from the scaling laws of Wells and Coppersmith [1994] for a general focal mechanism: (6) where L is the rupture length in km, M W is the moment magnitude, W is the rupture width in km, and AD is average slip in meters. Since the scaling laws consider moment magnitude (M w ), whereas the catalog used is in local magnitude (M L ), we converted these using the relationship of Castello et al. [2007]: Generally, the receiver faults are represented as the following forms: (1) the optimally oriented fault planes in the regional stress field and the stress change caused by the main shock [King et al. 1994]; (2) the geometry of active faults in the region [Toda et al. 1998 They showed that consideration of earthquake nucleation on multiple receiver fault orientations significantly changes the predicted spatial stress-change pattern and the total number of triggered events. Thus, we introduced focal mechanisms from the INGV-Harvard European-Mediterranean Regional CMT (RCMT) catalog [Pondrelli et al. 2006] (Figure 3a) as the receiver fault references. We assumed a different receiver fault plane for each grid cell, according to the nearest available reference focal mechanism in this database (Figure 3b). This procedure is in agreement with the findings of Hainzl et al. [in press]. We estimated the DCFS on a 0.2˚× 0.2˚ grid with variable receiver faults by applying the COULOMB 3.1 code [Toda and Stein 2002].

Rate-and-state stress transfer by large earthquakes
We introduced the rate-and-state friction model [Dieterich 1994, Toda and Stein 2003, Toda et al. 2005, Catalli et al. 2008 to quantify the impact of DCFS on the seismicity rate. In this model, the time-dependent seismicity rate change R (M, x, t) due to a stress change imparted by the n'th earthquake at the location of X (DCFS n (X)) can be represented as: This relationship describes the time-dependent ITALIAN TIME-DEPENDENT FORECASTING   Èj j seismicity density for consideration of a series of earthquakes, and represents a generalization of the relationships given by Dietrich (1994), who described the rate change caused by a single stress perturbation. Here, m (M, x) is the reference seismicity rate, and R n−1 (M, x) is the timedependent estimated seismicity rate immediately before the occurrence of the n'th earthquake (i.e. R 0 = m (M, x)). DCFS n (X) is the stress disturbance caused by the n'th earthquake, and Av represents a constitutive parameter of the model, where A accounts for the direct effect of friction in the rateand-state-dependent model and v is the effective normal stress, as described by Dietrich [1994]. In the present study, Av was assumed to be 0.1 bar, in accordance with the physically reasonable ranges reported in the many srudies that have applied the method to different regions [Toda and Stein 2003, Toda et al. 2005, Catalli et al. 2008]. Here, t n represents the occurrence time of the n'th earthquake, and t na is the duration of the aftershock sequences (known also as the «characteristic time» or «relaxation time»), and it was used as the time window in the present study for the declustering (see Section 2), which is a function of magnitude. The relationship shows that the seismicity rate after the n'th earthquake is influenced by the rate just before the event, and is strongly dependent on the reference seismicity rate m (M, x).

Time-independent forecasting
We generated the reference seismicity rate, i.e. the timeindependent forecast of future seismicity, as a function of the magnitude by analyzing the earthquakes that occurred during the reference period, according to both nondeclustered and declustered versions of the catalog.
We first present the results based on the non-declusterd catalog (Figure 4). The calculation grids followed the use for the Collaboratory for the Study of Earthquake Predictability (CSEP) forecasting experiments in the Italian testing region. The highest seismicity rate was observed in central Italy and near the borders with Switzerland, Austria, and Slovenia. While the higher seismicity rate was estimated for a smaller magnitude bin, the locations of the rate peaks, however, were somewhat different in each magnitude bin. To validate our results, we compared the estimated reference seismicity rate with the distribution of earthquakes after 2007 A.D. (referred to as the forecast earthquakes) using a Molchan diagram [Molchan 1990, Molchan 1991, Zechar and Jordan 2010, as shown in Figure 5. The Molchan diagram displays the fraction of space occupied by the forecast alarm versus the fraction of failure to predict. It was constructed by considering the locations of the examination earthquakes ITALIAN TIME-DEPENDENT FORECASTING Figure 6. Distribution of reference seismicity rate acquired from the declustered catalog for different magnitude bins. Earthquakes occurring during the forecasting period are shown as blue dots.
with respect to the seismicity density rate distribution. For each event, the total area with a seismicity density rate equal to or smaller than that at the location of the earthquake was extracted and represented as a percentage of the total study area. The events were then sorted according to the percentages of area, and plotted against the event count, which was represented as the percentage of the total number of examination events. When the diagram shows a diagonal line, this suggests that there is no correlation between the forecast and the observed seismicity. When an upward arc is shown, this suggests a negative correlation, while a downward arc suggests a positive correlation, between the forecast and the observed seismicity. An optimistic result would be represented by the condition of having the lowest fraction of space occupied by alarms with the lowest percentage of failure to forecast the seismicity. When we compared the distribution of the seismicity rate obtained with the locations of the forecast earthquakes in the Molchan diagram in Figure 5, a positive correlation was evident ( Figure 5, red rectangles). All of the forecast earthquakes had locations within 82% of the study area with the highest seismicity rate. However, in more detail, we can see that 95% of the forecast earthquakes had locations within 27% of the study area with the highest seismicity rate, representing a good result for the method applied.
As mainshocks usually cause more damage than foreshocks and aftershocks, it is common practice to emphasise the forecasting of events that turn out to be mainshocks. We therefore also estimated the seismicity rate as a function of magnitude based on the declustered catalog during the reference period, to obtain a forecast for mainshocks only ( Figure 6). This forecast showed a similar seismicity rate pattern as the results for the non-declustered catalog ( Figure  4), with lower values of seismicity rate, as expected.
A comparison of the distribution of the reference seismicity rate with the locations of the forecast earthquakes in the Molchan diagram ( Figure 5, green rectangles) also suggested a positive correlation. However, the correlation was slightly worse than that for the results obtained from the non-declustered catalog ( Figure 5, red rectangles). For the declustered catalog, all of the forecast earthquakes had locations within 87% of the study area with the highest seismicity rate. Similarly, 95% of the forecast earthquakes had locations within 47% of the study area with the highest seismicity rate. This larger area in itself indicates a less successful forecast in the areas of high seismicity rate. A more significant indication of the forecast quality was seen with the systematic shift of the curve for the non-declustered catalog towards a smaller area.

Time-dependent forecasting
To include the ability of short-term forecasting of aftershocks or triggered earthquakes, we included timedependency in our forecast through DCFS incorporated into a rate-and-state friction model. As earthquakes with small magnitudes or which occurred far back in time cause insignificant changes in the current seismicity rate [Catalli et al. 2008], we considered only M L ≥ 4.0 earthquakes that had occurred during the time period from 2007 to the end of 2008. The calculated total change in the seismicity rate at the end of 2008 (end of the forecast period) is shown in Figure 7. Here, only 0.4% of the study area had a change of seismicity rate of more than 1%. When we compared the change in seismicity rate with the locations of the forecast earthquakes, the number of events with locations in regions with a seismicity rate increase at the time of occurrence of the event was 11% more than in regions with a seismicity rate decrease.
To validate the time-dependent forecasting, we used the Molchan diagram to compare the estimated seismicity rate change according to the rate-and-state friction model with the distribution of the forecast earthquakes ( Figure 8). This showed a positive correlation when the seismicity rate change was positive, i.e. more events take place in the regions with increased seismicity rate. In contrast, this showed a negative correlation when the seismicity rate change dropped. This suggests a marginal improvement in the forecasting accuracy with the introduction of the rate-and-state friction model, as the seismicity rate change predicted by the rate-and-state friction model was insignificant in comparison to the reference seismicity rate, which can be attributed to the rather long time interval considered for the forecast earthquakes.
The evolution of seismicity rate at two points near the M5.2 Tosco-Emiliano earthquake (Figure 7, green star) in 2008 is shown in Figure 9. After the slip of the earthquake, the DCFS at Point A is 0.02 bar (Figure 7, red triangle). This corresponded to a 26% seismicity rate increase immediately after the earthquake, an effect that decays with time. Then, 90 days after the earthquake, the increase in the seismicity rate was reduced to 11% above the background rate. A similar recovery process for the seismicty rate towards the background level was seen in areas of reduced stress ( Figure  7, Point B, blue triangle). The earthquake caused a -0.04 bar DCFS, corresponding to a 33% reduction in the seismicity rate immediately after the earthquake. This decrease in the seismicity rate was reduced to 19% after 90 days. This indicated that the impact of DCFS that is due to large earthquakes will be much more significant when applied to short-term forecasting on a local scale, such as for forecasting of aftershock sequences.

Discussion
In the present study, we estimated the reference seismicity rate for the study region through a bandwidth function that worked as a smoothing Kernel in the neighboring regions of earthquakes. We used this distribution of the seismicity rate to forecast the time-independent distribution of future earthquakes. The bandwidth function for each large-scale zone was acquired through regression of ITALIAN TIME-DEPENDENT FORECASTING   Figure 7). the earthquake catalog. To check the forecast uncertainty related to this regression, we compared the seismicity rates that corresponded to the upper and lower boundaries of the 95% confidence intervals of the bandwidth function in each zone (Figure 10a). Although a significantly higher rate was seen in regions with few earthquakes when using the upperboundary bandwidth functions, the differences were insignificant in the regions where most of the forecast earthquakes occurred. From a consideration of the Molchan diagram (Figure 10b), the three sets of bandwidth functions showed similar forecasting abilities.
To include the short-term forecasting ability for aftershocks or triggered earthquakes, we included timedependency in our forecast, through DCFS incorporated into a rate-and-state friction model. However, this provided only a marginal improvement in the earthquake forecasting ( Figure  7 and Figure 8). This result can be attributed to the misfit of source fault models estimated by scaling laws, or receiver faults estimated according to the nearest available reference focal mechanism [Pondrelli et al. 2006, represented in Figure  3a], or the long forecast period used in the present study. It is expected that this approach will have a significantly better performance when detailed slip-dislocation models are considered as source faults for the calculations of the Coulomb stress change [Hainzl et al. 2009], and when applied for short-term forecasting, such as for the distribution of aftershocks. A detailed analysis of the uncertainties of the model and its parameters will be an important part of our future studies. In the present study, however, our main aim was to reduce the number of a-priori assumptions in the modeling as much as possible, and to study the effects of introducing time dependency on the forecasting result.

Conclusions
The following main conclusions can be drawn from the present study: 1) the time-independent part of the model that was estimated using a smoothing Kernel function showed good forecasting ability; 2) the distribution of the seismicity rate based on a nondeclustered version of the catalog showed better forecasting performance than the declustered one, although both of these showed positive forecasting; 3) the forecasting abilities within the 95% confidence intervals of the bandwidth functions for the time-independent forecasting model were similar; 4) the time-dependent part of the model that was estimated by the rate-and-state friction model showed only marginal improvement to the forecasting accuracy.  Figure 2). Green dots, earthquakes that occurred during the forecasting period. (b) Molchan diagram investigating the correlation between the forecasted seismicity rate and the forecast earthquakes when different bandwidth functions are considered.