Short-term earthquake forecasting experiment before and during the L'Aquila (central Italy) seismic sequence of April 2009

In this paper, we compare the performance of several statistical models of earthquake occurrence in forecasting short-term earthquake probabilities during the 2009 L'Aquila earthquake sequence in central Italy. These models include the Proximity to Past Earthquakes (PPE) model and two versions of the Epidemic Type Aftershock Sequence (ETAS) model. We used the information gains corresponding to the Poisson and binomial scores to evaluate the performance of these models. It is shown that both ETAS models work better than the PPE model. However, in comparing the two types of ETAS models, the one with the same fixed exponent coefficient a = 2.3 for both the productivity function and the scaling factor in the spatial response function (ETAS I) performs better in forecasting the active aftershock sequence than the model with different exponent coefficients (ETAS II), when the Poisson score is adopted. ETAS II performs better when a lower magnitude threshold of 2.0 and the binomial score are used. The reason is found to be that the catalog does not contain an event of magnitude similar to the L'Aquila mainshock (Mw 6.3) in the training period (April 16, 2005 to March 15, 2009); the a-value is therefore underestimated, and thus the forecast seismicity is underestimated when the productivity function is extrapolated to high magnitudes. We also investigate the effect of the inclusion of small events in forecasting larger events. These results suggest that the training catalog used for estimating the model parameters should include earthquakes of magnitudes similar to the mainshock when forecasting seismicity during an aftershock sequence.


Introduction
On April 6, 2009 at 01:32 (UTC), a magnitude Mw 6.3 earthquake struck the Abruzzi region in central Italy. Despite its moderate size, the earthquake caused more than 300 fatalities and partially destroyed the city of L'Aquila and many surrounding villages. The epicenter of the mainshock was located a few km WSW of the city. It occurred on one of the NW-SE trending normal faults of the region. The seismic activity initially developed towards the southeast of the town, but a couple of days later it migrated towards the northwest, to the Barete and Campotosto zones (Figure 1). In fact, after the mainshock of April 6, 2009, two other strong events occurred with their own aftershocks. The first occurred about 40 hours after the mainshock, at 17:47 UTC on April 7, 2009, with a magnitude of Mw 5.6, to the southeast (in the Valle d'Aterno); the second occurred at 00:53 UTC on April 9, 2009, with a magnitude of Mw 5.4, to the north of the mainshock near Campotosto, about 16 km north of L'Aquila (Figure 1). Large aftershocks can cause great damage to building structures that have already been weakened by the mainshock. Thus, it may be important, in such circumstances, to evaluate aftershock probabilities in real time, to avoid losses from aftershocks. The principle of evaluating the probabilities of earthquake occurrence using point process models, which are mathematically formulated with conditional intensity functions, was framed by Vere-Jones (1998). Among the different models for seismicity, the Epidemic-Type Aftershock Sequence (ETAS) model, proposed by Ogata (1988), which describes the clustering of foreshocks, mainshocks, and aftershocks, has become a standard model for testing hypotheses and a starting point for short-term earthquake forecasts (Console and Murru, 2001; Console et al., 2003; Murru et al., 2009; Zhuang et al., 2004; Helmstetter and Sornette, 2003; Hainzl and Ogata, 2005; Zhuang et al., 2008; Lombardi et al., 2010; Marzocchi and Lombardi, 2009). In the last decades, this model has been greatly developed (see Helmstetter and Sornette, 2002; Ogata et al., 2003; Ogata, 2004; Zhuang et al., 2005; Console et al., 2006a,b, 2007). In particular, its space-time distribution now has several different mathematical forms, developed by different researchers such as Marzocchi and Lombardi (2008, 2009), Helmstetter et al. (2006), Werner et al. (2011), and Zhuang (2011).
During the L'Aquila sequence, the ETAS model was used to forecast aftershocks in real time after the April 6, 2009 L'Aquila earthquake (Marzocchi and Lombardi, 2009). For the month following the L'Aquila earthquake, there was a good fit between forecasts and observations. Subsequently, in 2012, two versions of the ETAS model were applied in real time to the seismic sequence of the May-June 2012 Emilia earthquakes (Marzocchi et al., 2012) to track the daily and weekly evolution of aftershock sequences. The models used were those of Falcone et al. (2010) and Lombardi and Marzocchi (2010). However, model performance was not evaluated in their work. Nanjo et al. (2012) applied the ETAS model to Japan, together with four other models, in a retrospective way, to the aftershock sequence following the March 11, 2011 magnitude (M) 9.0 Tohoku-Oki earthquake, and obtained reliable results. They found that the model of Falcone et al. (2010) was better suited to number forecasts than the other models. This ETAS model is one of the models submitted to the various testing centers of the CSEP (Collaboratory for the Study of Earthquake Predictability) in Japan (Tsuruoka et al., 2012), Italy, New Zealand, and California. In this study, the primary objectives are to compare two versions of the ETAS model developed by different research groups, and to evaluate the influence of small earthquakes on predictive performance. The first version, named ETAS I, was developed by Ogata (1998) and Console et al. (2010a,b). The second, hereafter referred to as ETAS II, is used by Zhuang et al. (2005), Ogata and Zhuang (2006), and Zhuang (2011). Although these two versions belong to the same class of ETAS model, their details, such as how the background seismicity is determined, how boundary conditions are treated, and how the parameters are estimated, are quite different. In this study, we hope to find which choices result in better forecasting performance.
To test their capability for short-term forecasting of moderate and large earthquakes in the area mentioned above, we apply these two epidemic models to the instrumental database of shallow seismicity (March 16, 2009 - June 30, 2009) collected by INGV. We use the probability gain to evaluate the forecasting performance, taking the Proximity to Past Earthquakes (PPE) model, which is a forward kernel estimation model, as the reference model. We also verify how much information is carried by small events, compared to larger events, in forecasting the occurrence of the mainshock. Two versions of the ETAS model are considered for the purpose of understanding the role of small events in forecasting large events, in order to verify the statement made by Helmstetter (2003) and Helmstetter et al. (2005) that small events are important in triggering large earthquakes.
Another aim of this experiment is to see how the forecasting of seismicity works when using models with nearly blind parameters. The formal Italian catalog (ISIDE) begins in 2005, and the forecasting period begins in 2009. Thus, there are only four years of low seismicity with which to train the model parameters. We attempt to evaluate the forecasting performance of the models when applied to such a catalog.

The PPE model
The PPE (Proximity to Past Earthquakes) model was formulated by Jackson and Kagan (1999) and named by Rhoades and Evison (2004). Essentially, it is a Poisson model with a specific method of estimating the seismicity rate, i.e., it is a smoothed-seismicity baseline model. It can play the role of a spatially varying reference model against which the performance of time-varying models can be compared. For this reason, it is adopted in this study as the reference model. The Relative Intensity (RI) model of Nanjo (2010, 2011) and the Simple Smoothed Seismicity (Triple-S) model of Zechar and Jordan (2010) also belong to this category, the only difference being how the seismicity rate is estimated. The PPE model is used as a forward likelihood method to estimate the seismicity rate function (see also Chiodi and Adelfio, 2011); i.e., it has a conditional intensity (seismicity rate) of the form given in Equation (1), where the sum over i includes all events before t (possibly including events outside the study region), t_0 is the starting time, a, d, and f are model parameters, and s(m) represents the probability density function (p.d.f.) form of the Gutenberg-Richter (G-R) magnitude-frequency relation for earthquake magnitudes larger than the magnitude threshold m_c, i.e.,

s(m) = β exp[−β(m − m_c)], m ≥ m_c, with β = b ln 10.     (2)

Given an earthquake catalog, recorded as a list of the form {(t_i, x_i, y_i, m_i) : i = 1, ..., N} from spatial region S and time interval [0, T], the log-likelihood has the standard form (see, e.g., Ogata, 1988; Daley and Vere-Jones, 2003; Zhuang et al., 2012)

log L = Σ_{i=1}^{N} log λ(t_i, x_i, y_i) − ∫_0^T ∫∫_S λ(t, x, y) dx dy dt,     (3)

where λ(t, x, y) = λ(t, x, y, m)/s(m) is the (t, x, y)-marginal conditional intensity.
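As an illustration of the smoothed-seismicity idea behind the PPE model, the following minimal Python sketch evaluates a rate of the form "uniform background plus normalized isotropic kernels centered on past epicenters". The function name, the kernel shape, and all parameter values are illustrative assumptions, not the paper's exact Equation (1).

```python
import math

def ppe_rate(x, y, events, a=1e-4, d=8.0, f=1.0):
    """Sketch of a PPE-style smoothed seismicity rate at location (x, y).

    events : list of (xi, yi) past epicenters (planar coordinates, km)
    a      : uniform background term (assumed)
    d      : smoothing (correlation) distance, km
    f      : weight of the kernel-smoothed component
    The kernel d**2 / (pi * (r**2 + d**2)**2) integrates to one over the plane,
    so each past event contributes a total weight f to the smoothed rate.
    """
    rate = a
    for xi, yi in events:
        r2 = (x - xi) ** 2 + (y - yi) ** 2
        rate += f * d ** 2 / (math.pi * (r2 + d ** 2) ** 2)
    return rate
```

With no past events the rate reduces to the background term; near a past epicenter the rate is higher than far from it, which is the only property the forecasting comparison relies on.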

Space-time ETAS models
The ETAS model is a stochastic point-process model, and is more complicated than the PPE model. It builds on two well-known empirical laws: (1) the aftershock rate generally decays according to the Omori-Utsu formula; (2) the aftershock area grows exponentially with mainshock magnitude (Utsu and Seki, 1955), as does the number of aftershocks (Kanamori and Anderson, 1975), leading to a productivity that is an exponential function of the magnitude. It also takes stationary seismicity and secondary aftershocks into account. In the ETAS model, we still assume that the magnitude distribution is separable from the other components, with the same density as specified by Equation (2). The expected number of earthquakes in the unit space-time window centered at (t, x, y), given the observations before t, can be written as

λ(t, x, y, m) = s(m) [ μ(x, y) + Σ_{i: t_i < t} ρ(t, x, y; t_i, x_i, y_i, m_i) ],     (4)

where μ(x, y) represents the spontaneous (background) seismicity rate, which is a function of spatial location but is constant in time, and ρ(t, x, y; t_i, x_i, y_i, m_i) is the contribution to the seismicity rate from the ith event occurring previously. In practice, the response function ρ is assumed to be separable and dependent on the differences in time and spatial location from the initiating event, or explicitly,

ρ(t, x, y; t_i, x_i, y_i, m_i) = κ(m_i) g(t − t_i) f(x − x_i, y − y_i; m_i),     (5)

where

κ(m) = A e^{a(m − m_c)}     (6)

is the expectation of the number of "children" (productivity) from an event of magnitude m, with parameter a describing the scaling of the aftershock productivity with the magnitude of the triggering earthquake according to e^{am},

g(u) = (p − 1) c^{p−1} (u + c)^{−p}, u > 0,     (7)

is the p.d.f. of the length of the time interval between a "child" and its "parent", where c and p are constants and u is the elapsed time since the main event (Omori-Utsu formula), and f(x, y; m) is the p.d.f.
of the relative locations between the "parent" and its "children". The spatial density distribution of aftershocks has been studied by many authors. Felzer and Brodsky (2006) examined the distance distribution of earthquakes occurring in the first five minutes after 2 ≤ M < 3 and 3 ≤ M < 4 mainshocks, and found that their magnitude M > 2 aftershocks showed a uniform power-law decay with a slope of 1.35, up to 50 km away from the mainshocks. From this they argued that the decay with distance could be explained only by dynamic triggering. Richards-Dinger et al. (2010) found a similar pattern with a decay slope of 1.24 ± 0.09 under the same conditions, although they proposed an alternative explanation for the decay. More recently, Gu et al. (2013) confirmed a power-law decay in the spatial distribution of aftershocks with an exponent of less than two. However, it must be noted that these studies considered the density distribution of the aftershocks as a whole, given the identification of a single mainshock in a sequence, whereas our models deal with the p.d.f. of the relative locations between each event and every direct offspring in the sequence, without including the effect of secondary triggering. Here we consider two forms for the spatial p.d.f., given in Equations (8) and (9). In these formulations, the constant parameter vector is θ = (A, a, c, p, D, q) for ETAS I and θ = (A, a, c, p, D, q, γ) for ETAS II. A further free parameter for both ETAS I and ETAS II is the b-value of the magnitude p.d.f. s(m) introduced in Equation (2). This parameter is determined for the entire study area independently of the other ETAS model parameters, as will be explained later in this paper. In the ETAS II model, the parameter γ is independent of a. A detailed description of the ETAS I parameters is given in Appendix B.
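To make the structure of Equations (4)-(7) concrete, the following minimal Python sketch evaluates an ETAS conditional intensity with an ETAS II-style normalized, magnitude-scaled spatial kernel. All function names and parameter values are illustrative assumptions; the paper's exact spatial forms are those of its Equations (8) and (9), which differ between ETAS I and ETAS II.

```python
import math

MC = 1.6  # threshold magnitude (the value used in this study's learning catalog)

def kappa(m, A=0.2, alpha=2.3):
    """Productivity (Equation (6)): expected number of direct offspring."""
    return A * math.exp(alpha * (m - MC))

def omori_pdf(u, c=0.02, p=1.1):
    """Normalized Omori-Utsu p.d.f. of the parent-child time lag (Equation (7))."""
    return (p - 1) * c ** (p - 1) * (u + c) ** (-p)

def spatial_pdf(dx, dy, m, D=0.01, q=2.0, gamma=1.0):
    """Normalized spatial p.d.f. with magnitude-dependent scale (ETAS II-style)."""
    sigma2 = D * math.exp(gamma * (m - MC))     # squared triggering scale
    r2 = dx * dx + dy * dy
    return (q - 1) / (math.pi * sigma2) * (1 + r2 / sigma2) ** (-q)

def etas_rate(t, x, y, history, mu=1e-3):
    """Conditional intensity lambda(t, x, y): background plus triggered terms,
    summing Equation (5) contributions over all past events (Equation (4))."""
    rate = mu
    for ti, xi, yi, mi in history:
        if ti < t:
            rate += kappa(mi) * omori_pdf(t - ti) * spatial_pdf(x - xi, y - yi, mi)
    return rate
```

Both `omori_pdf` and `spatial_pdf` integrate to one, so the magnitude of the triggering event enters the expected offspring count only through `kappa`, which is the normalization convention described for these models in the text.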
Consider that in ETAS I, the exponential term depending on magnitude in Equation (6) is divided by the right-hand-side denominator of Equation (8) in the product of Equation (5). The result is that the spatial p.d.f. is equal to one at zero distance from the parent event, regardless of its magnitude, as can be seen in Equation (23) of Appendix B. Therefore, the total number of triggered events from a triggering earthquake depends exponentially on its magnitude. In this case, the assumption a = 1.0 ln 10 has the physical meaning that the number of triggered events is proportional to the rupture area of the triggering earthquake (see Console et al., 2006b; Hainzl et al., 2008).
Among the parameters, A is the expected number of direct offspring produced by an event of the threshold magnitude, a quantifies the difference in productive efficiency between events of different magnitudes, c and p are the parameters of the Omori-Utsu formula for aftershock frequencies, D is the characteristic triggering distance, q is the spatial decay coefficient, and γ is the spatial scaling factor. It is easy to see that both models described above are branching processes with "immigrants": the background (immigrant) process is a Poisson process; once an event occurs, whether it comes from the background process or is triggered by a previous event, it triggers a non-stationary Poisson process, specified by Equation (5), as its offspring process. This model is also a type of self-excitation process (Hawkes, 1971a,b). The difference between ETAS I and ETAS II is as follows. In ETAS I, the scale factor for the productivity function is the same as the scaling factor for the spatial response function. ETAS I is used by Ogata (1998), Console et al. (2003, 2006a,b, 2010b,c), Murru et al. (2009), and Falcone et al. (2010). In this article, the conditional intensity of ETAS I given by Equation (8), with g and f normalized to integrate to one, is slightly different from its original form (see Appendix A). ETAS II was developed by Zhuang et al. (2005) and Ogata and Zhuang (2006), and is an improved version of the model in Ogata (1998). Zhuang et al. (2004) and Zhuang (2006) found that ETAS II fits the data better than ETAS I, using stochastic reconstruction and second-order residual analysis. The formulation and parameterization are slightly different from Console et al.
(2003) and Ogata (1998), but essentially the same: here we normalize the spatial response kernel, and explicitly separate s(m) from the other components. If the background seismicity rate µ(x, y) is known, the model parameters θ can be estimated by maximizing the likelihood function, which takes the same form as Equation (3). When the background seismicity rate is unknown, we can still estimate the background and clustering parameters simultaneously by iterative algorithms. We apply the following iterative procedure to both models to simultaneously estimate the model parameters and the spatially varying background seismicity rate.
(1) Assign an initial estimate of the background seismicity rate µ(x, y), obtained by smoothing the locations of all the events in the catalog.
(2) Find the maximum likelihood estimate (MLE) of the model parameters, with the assumed background seismicity rate. For ETAS I, the free parameters are (A, c, p, D, q), and a is fixed. For ETAS II, the free parameters are (A, a, c, p, D, q, γ).
(3) Compute the background probability φ_i as the ratio between the background seismicity component and the total seismicity rate for each event i.
(4) Re-estimate the background seismicity rate µ(x, y) by kernel smoothing of the event locations, with each event i weighted by its background probability φ_i.
(5) Go to Step 2, and continue until the maximum likelihood parameters no longer exhibit significant variations.
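The steps above can be sketched as follows. This toy Python implementation shows only the background-probability computation and the weighted re-smoothing; the MLE step is left abstract, and all function names, the Gaussian kernel, and the bandwidth are illustrative assumptions rather than the exact algorithm of either model.

```python
import math

def background_probabilities(events, mu_fn, triggered_fn):
    """Step 3: phi_i = mu(x_i, y_i) / lambda(t_i, x_i, y_i) for each event.

    events       : list of (t, x, y, m) tuples
    mu_fn        : current background rate estimate mu(x, y)
    triggered_fn : total triggered-rate contribution at (t, x, y)
    """
    phis = []
    for (t, x, y, m) in events:
        mu = mu_fn(x, y)
        lam = mu + triggered_fn(t, x, y, events)
        phis.append(mu / lam)
    return phis

def smoothed_background(x, y, events, phis, T, d=8.0):
    """Step 4: phi-weighted Gaussian kernel estimate of mu(x, y), divided by
    the catalog time span T so the result is a rate per unit time and area."""
    total = 0.0
    for (ti, xi, yi, mi), phi in zip(events, phis):
        r2 = (x - xi) ** 2 + (y - yi) ** 2
        total += phi * math.exp(-r2 / (2 * d * d)) / (2 * math.pi * d * d)
    return total / T
```

In the degenerate case of no triggering, every φ_i equals one and the procedure reduces to ordinary kernel smoothing of all events, which is a useful sanity check on an implementation.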

Forecasting procedure
Given the observations up to time t, to forecast whether there will be one or more earthquakes of magnitude greater than M_f in the next time interval [T_1, T_2] in region S, the probability that at least one event will occur is given by

Pr{N([T_1, T_2] × S) ≥ 1} = 1 − exp(−Λ),     (10)

and the expectation Λ of the number of events occurring in [T_1, T_2] × S is given by

Λ = ∫_{T_1}^{T_2} ∫∫_S λ(t, x, y) dx dy dt,     (11)

in principle calculated based on the observations up to time t. However, since we have no observations in the forecasting period [T_1, T_2], we use the observations up to T_1 as an approximation, and assume that no events occur between T_1 and t. As pointed out by Zhuang (2011), this procedure underestimates the number of event occurrences, because the triggering effect of earthquakes occurring within the forecasting time interval is not counted in this equation. This underestimation becomes worse during the highly active triggering periods at the beginning of aftershock sequences. Such underestimation could be avoided by using simulations. However, in this study we do not use simulations, in order to keep the comparison simple, since simulation is not implemented in the forecasting procedure of ETAS I. In this article, we adopted a fast and neat computation method for both models: as λ(t, x, y) is a decreasing function of time t between the occurrence times of subsequent events, and λ(t, x, y) increases after an event occurring in the forecasting time interval, Equation (14) underestimates the occurrence rate. In Equation (15), since λ(T_1, x, y) > λ(t, x, y) for all t in [T_1, T_2], the above approximation compensates to some degree for the underestimation caused by the assumption that, while calculating λ(t, x, y), no events occur between T_1 and t.
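Under the Poisson assumption for the count in [T_1, T_2] × S, the forecast probability and the "frozen-rate" approximation described above can be sketched in a few lines. The function names are illustrative; the frozen-rate form corresponds to evaluating the rate at T_1 and holding it constant over the forecast window.

```python
import math

def forecast_probability(expected_number):
    """Equation (10) for a Poisson count: P(at least one event) = 1 - exp(-N)."""
    return 1.0 - math.exp(-expected_number)

def expected_number_frozen(rate_at_T1, T1, T2):
    """Frozen-rate approximation of Equation (11): hold the (spatially
    integrated) rate at its value at T1 over the whole interval [T1, T2].
    Since the true rate decays between events, this partly compensates for
    ignoring triggering by events inside the forecast window."""
    return rate_at_T1 * (T2 - T1)
```

For small expected numbers the probability is close to the expected number itself, while for large expected numbers it saturates towards one, which is why probability and expected-count forecasts diverge during intense aftershock activity.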

Scoring procedures
Given forecasts from different models, it is important to know which model performs best. Many methods have been proposed for testing and evaluating the significance of forecasts (e.g., Zechar, 2010; Zhuang, 2010). In this study, we only consider the information score, also called the entropy score (e.g., Kagan and Knopoff, 1977; Vere-Jones, 1998; Harte and Vere-Jones, 2005), which is a natural way to evaluate the performance of probability forecasts. Suppose that the whole space-time-magnitude window for a forecast is divided into M cells of equal size, that the forecast gives a probability p̂_i that at least one event occurs in the ith space-time-magnitude cell, and that the reference model, usually taken as the Poisson model, gives a probability p_i. The binomial score for cell i against the reference model is then defined as the logarithm of the likelihood ratio of the forecasting and reference models,

B_i^(b) = Y_i log(p̂_i / p_i) + (1 − Y_i) log[(1 − p̂_i) / (1 − p_i)],

where Y_i = 1 if there is at least one event occurrence in the ith cell, and Y_i = 0 otherwise. Similarly, the Poisson score is defined by

B_i^(p) = Σ_k I(n_i = k) log(p̂_{i,k} / p_{i,k}),     (14)

where n_i is the number of events occurring in cell i, I(A) is the logical function, taking the value one if A is true and zero if A is false, and p̂_{i,k} and p_{i,k} are the probabilities that the ith cell has k events occurring in it, given by the forecasting model and the reference model, respectively. The quantity e^{B_i^(b)} or e^{B_i^(p)} is the probability gain for the ith cell. The total information score over all cells is defined as the sum of B_i^(b), or of B_i^(p), over all cells i (15). The information gain per unit space-time-magnitude volume is defined as

G = (1/(MΔ)) Σ_{i=1}^{M} B_i,     (16)

where Δ is the size of each cell, MΔ = V is the total volume of the space-time-magnitude range, and B_i takes the value B_i^(b) or B_i^(p). When B_i takes the value B_i^(b), G is called the binomial information gain; similarly, G is the Poisson information gain when we use B_i^(p). Such an information gain varies when the division of the space-time-magnitude range of interest changes, but converges to a limit related to the forecasting potential of the model when the volume of each cell becomes infinitesimally small (Vere-Jones, 1998).
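A minimal sketch of the per-cell scores and the information gain follows. It assumes that both the forecast and the reference specify Poisson count distributions in each cell, so that the Poisson score reduces to a closed form in the two rates (the n! terms cancel in the likelihood ratio); function names are illustrative.

```python
import math

def binomial_score(p_model, p_ref, occurred):
    """Binomial (entropy) score for one cell: the log likelihood ratio of the
    forecasting model against the reference, given occurrence/non-occurrence."""
    y = 1 if occurred else 0
    lik_model = p_model if y else (1.0 - p_model)
    lik_ref = p_ref if y else (1.0 - p_ref)
    return math.log(lik_model / lik_ref)

def poisson_score(rate_model, rate_ref, n):
    """Poisson score for one cell with n observed events, assuming Poisson
    counts under both models; the factorial terms cancel in the ratio."""
    return n * math.log(rate_model / rate_ref) - (rate_model - rate_ref)

def information_gain(scores, volume):
    """Average information gain per unit space-time-magnitude volume."""
    return sum(scores) / volume
```

When the forecast and reference coincide, both scores are exactly zero, and a positive total score means the forecast assigned more probability to what actually happened than the reference did.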

Data
The data used for this study are drawn from the Italian Seismic Instrumental and Parametric Database (ISIDE) of INGV. Figure 1 shows the epicenter distribution of 2,739 shallow earthquakes with magnitudes equal to or larger than 1.6 and depths ≤ 30 km, reported by INGV from April 16, 2005 to March 15, 2009, and of 3,007 events from March 16, 2009 to June 30, 2009 with M ≥ 2.0, in the region to be tested with all the forecast models (i.e., 12.4°E-14.2°E, 41.5°N-43.1°N). A preliminary analysis of this seismic catalog shows contamination during the first time period by man-made explosions from quarries, spatially concentrated around (42.0°N, 13.0°E), as shown in Figure 1. These 151 events are distinct, since their magnitudes are small (≤ 2.0) and they occur during local working hours (between 7 a.m. and 5 p.m., UTC time). Thus, the catalog used for the learning phase does not include the quarry blasts; it contains 2,588 events and is considered complete for M ≥ 1.6.
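The quarry-blast screening described above (small magnitude, origin time within working hours) amounts to a simple two-condition filter. The thresholds below follow the text; the function name is an illustrative assumption.

```python
def is_quarry_blast_candidate(magnitude, hour_utc, max_mag=2.0,
                              work_start=7, work_end=17):
    """Flag events matching the quarry-blast pattern described in the text:
    magnitude <= 2.0 and origin time within local working hours (UTC)."""
    return magnitude <= max_mag and work_start <= hour_utc < work_end
```

Such a rule can only flag candidates: a genuine small earthquake at noon near a quarry would also match, so in practice the spatial concentration around the quarry sites is the additional discriminant used in the text.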
Model PPE. The PPE model is fitted with two cutoff magnitudes, 1.6 (2,588 events) and 2.0 (907 events), while the target events are still of magnitude 2.0 and higher. That is, in Equation (1), the value of i goes through all the events above the cutoff magnitude, whereas in the likelihood of Equation (3), i goes through all the events of magnitude 2.0 and higher. We denote the models with cutoff magnitudes 1.6 and 2.0 by PPE1.6+ and PPE2.0+, respectively. The parameters obtained from fitting PPE1.6+ and PPE2.0+ are listed in Table 1. Comparing the likelihoods, we can see that PPE1.6+ is better than PPE2.0+, implying that using information from smaller earthquakes improves the fitting and forecasting performance. Figure 3a shows the map of the estimated seismicity rates from the PPE1.6+ model, using data from before March 15, 2009. Model ETAS I. For the learning period of this model, we use events recorded by INGV from April 16, 2005 to March 15, 2009, with M ≥ 1.6. The threshold magnitude, for both triggering and triggered events, was taken to be 1.6, and the test period is from March 16, 2009 to June 30, 2009, using events of M ≥ 2.0 (see Figure 2). The b-value is assumed to be constant over the geographical area spanned by the catalog, and is estimated independently of the other parameters. We do not use a spatially varying b-value, since the study region is too small to justify a subdivision. Instead, the b-value was determined for the whole investigated area, with the maximum likelihood method of Aki (1965), to be 1.12 ± 0.02. The standard deviation estimate is obtained following Shi and Bolt (1982).
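The b-value estimate of Aki (1965) and its standard error following Shi and Bolt (1982) can be sketched in a few lines. Function names are illustrative, and no magnitude-binning correction is applied here.

```python
import math

def b_value_aki(mags, mc):
    """Maximum likelihood b-value of Aki (1965):
    b = log10(e) / (mean(m) - mc) = 1 / (ln 10 * (mean(m) - mc))."""
    mbar = sum(mags) / len(mags)
    return 1.0 / (math.log(10) * (mbar - mc))

def b_std_shi_bolt(mags, b):
    """Standard error of b following Shi and Bolt (1982):
    sigma_b = ln(10) * b^2 * sigma(mean(m)), where sigma(mean(m))^2 is the
    variance of the sample mean of the magnitudes."""
    n = len(mags)
    mbar = sum(mags) / n
    var_mean = sum((m - mbar) ** 2 for m in mags) / (n * (n - 1))
    return math.log(10) * b * b * math.sqrt(var_mean)
```

For instance, a catalog whose mean magnitude exceeds the threshold by 1/ln 10 ≈ 0.434 yields b = 1, a convenient check on any implementation.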
Consequently, the β-value of the magnitude p.d.f. is also obtained independently of the other parameters, since it is equal to b ln 10. As described in Section 2.2, ETAS I has six parameters (A, a, c, p, D, q) that can be estimated by the maximum likelihood method of Equation (3). However, we fixed the value of a to 1.0 ln 10 to reduce the number of free parameters, as done in Console et al. (2010a,c) and Falcone et al. (2010). Thus, the effective number of free parameters for ETAS I is five. Before determining the best fit of the above-mentioned parameters, we determine the scaling factor d in Equation (24), modeling the seismicity as a pure Poisson process, so that no information about the parameters characterizing the induced seismicity is necessary at this stage. In this work, the optimal choice of d is made by trial and error. For the assessment of this parameter, we compared the independent components of the seismicity distribution of two different periods containing roughly the same number of events.
We maximized the likelihood of the seismicity contained in one half of the INGV earthquake catalog (from 2007 to 2009, M ≥ 1.6) under the time-independent Poisson model obtained from the other half of the catalog (2005-2006). Here, d is modified until the maximum likelihood is obtained for one of the subcatalogs, using the smoothed seismicity derived from the other subcatalog. The correlation distance d was found to be 8 km, which is the value used in the following analysis.
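The trial-and-error selection of d described above amounts to maximizing a cross-validated likelihood: smooth one subcatalog with a candidate d, then score the other subcatalog's epicenters under the resulting rate. The sketch below uses a Gaussian kernel (the paper's kernel and normalization may differ), and compares only the sum of log rates, since the spatial integral of the smoothed rate equals the number of training events regardless of d, up to edge effects.

```python
import math

def gaussian_rate(x, y, events, d):
    """Kernel-smoothed rate at (x, y) from one subcatalog, bandwidth d (km)."""
    return sum(math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2 * d * d))
               / (2 * math.pi * d * d) for xi, yi in events)

def cross_likelihood(train, test, d, eps=1e-12):
    """Poisson log-likelihood (up to a d-independent constant) of the 'test'
    epicenters under the rate smoothed from 'train'."""
    return sum(math.log(gaussian_rate(x, y, train, d) + eps) for x, y in test)

def best_bandwidth(train, test, candidates):
    """Trial-and-error selection of the correlation distance d."""
    return max(candidates, key=lambda d: cross_likelihood(train, test, d))
```

Too small a bandwidth leaves the test events in near-zero-rate gaps, while too large a bandwidth flattens the rate everywhere; the cross-validated likelihood is maximized in between.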
The maximum likelihood set of the five free parameters of ETAS I is found using initial values of the smoothed seismicity µ(x, y) obtained by interpolation of the gridded distribution. The initial smoothed background seismicity rate is then obtained for the whole data set. Parameter d is probably influenced by the location errors of the epicenters reported in the analyzed catalog, which are not taken into consideration in our simple algorithm. Figure 4(a) shows the map of the final smoothed estimate of the background rate, obtained after four iterations of the procedure described in Section 2.2. The maximum likelihood estimates of the parameters, obtained from the learning phase, are: A = 0.2015, D = 2.132 × 10^−5, q = 2.009, c = 2.002 × 10^−2, p = 1.105, and a = 2.30 (fixed) (see also Table 2). The stability of our maximum likelihood algorithm was tested by starting from different initial sets of parameters, and checking that the same results are always obtained. One might notice that the parameters for ETAS I are super-critical, i.e., ϱ = Aβ/(β − a) > 1, where β = b ln 10 (see, e.g., Helmstetter and Sornette, 2002; Zhuang et al., 2004; and Zhuang and Ogata, 2006, for details on the criticality of the ETAS model), unless a truncated G-R magnitude-frequency relation, or some other similar treatment, is adopted. The direct consequence of supercriticality is that the total number of events grows, on average, exponentially with time. When the forecasting interval is long, overforecasting of seismicity might occur. However, a supercritical process can still be used to forecast aftershock production within a short period after a triggering event. In this case, the same parameters can be used for simulation, since a supercritical ETAS process needs some time to enter the state of population explosion. In our tests, after each time step, the history of the process is updated by observation, but not by events simulated in previous steps. That is, events generated by the supercritical parameters are not put into the process history. With such a temporally piecewise scheme, forecasting by the ETAS model remains stable.
Model ETAS II. In the estimation of ETAS II, earthquake data from the region (7.9°E-18.7°E) and (36.5°N-47.1°N) are used to obtain the seismicity rate.
The events in this larger region relative to the learning area of ETAS I (that is, outside the L'Aquila area) are used as auxiliary events, to incorporate the triggering effect of outside events into the occurrence rate inside the area (see also Wang et al., 2010). That is, the events in this larger area are not included in the likelihood computation; however, they are considered as events that could trigger other events inside the test area. This is a point of difference from ETAS I, arising from the independent development of the two models. The target region (12.4°E-14.2°E, 41.5°N-43.1°N) is the same as in ETAS I. The learning period is April 16, 2005 - March 15, 2009. Three versions of ETAS II are considered in this study: (1) we fit the ETAS model to earthquake events of M ≥ 2.0 (named ETAS II-2.0+ in Table 2); (2) we fit the ETAS model to earthquake events of M ≥ 1.6 (named ETAS II-1.6+ in Table 2) and scale the forecast seismicity rate for events of M 2.0+ using the G-R magnitude-frequency relation; and (3) we fit the ETAS model to earthquake events of M ≥ 2.0, but with events of 1.6 ≤ M < 2.0 as auxiliary events (named ETAS II-2.0m in Table 2); that is to say, when calculating the occurrence rate with Equation (4), the sum over i is taken over all M 1.6+ events before t, while when computing the likelihood in Equation (3), i is taken only over the events of M 2.0+. These three versions are considered in order to understand the role of small events in forecasting large events, and to verify the statement made by Helmstetter (2003) and Helmstetter et al. (2005) that small events are important in triggering large earthquakes. The MLEs of the model parameters are listed in Table 2. Figures 4(b) to 4(d) show maps of the background rates estimated using ETAS II-1.6+, II-2.0+, and II-2.0m, respectively.
Daily forecasts. We assume that each forecast is for seismicity starting at 0:00 and ending at 24:00 (UTC) each day. Figure 5 gives the forecast maps of the expected intensity of events, in units of events/day/deg², for four time windows, according to ETAS II-2.0m. Figure 5(a) shows the expected daily seismicity rate forecast for March 17, 2009. Just before the occurrence of the mainshock (April 6, 2009), the density map is similar to the background seismicity rate (Figure 4d), taking into account the difference in the color scale for rate density.
After the occurrence of the mainshock, the seismicity in the L'Aquila region is dominated by aftershocks (Figure 5c). If the forecast is updated shortly after the mainshock, the forecast can be improved. In Figure 5(c), this feature has been well captured by ETAS II-2.0m. Two weeks after the mainshock, the rate of aftershocks has decayed to a low level (Figure 5d), but it is still much higher than the average level (Figure 3).

Evaluation of forecast performance
The region to be tested for ETAS I and II, for temporal and spatial occurrence, is (12.4°E-14.2°E) × (41.5°N-43.1°N). The area has been divided into cells of 0.1° × 0.1°. The test period is from March 16 to June 30, 2009, for events of M ≥ 2.0.

Predictability of the mainshock
An important feature is that a foreshock swarm, consisting of 50 earthquakes of M ≥ 2.0, occurred before the mainshock (April 2009). This raises the question of how well the mainshock could be forecast by the models considered in this study. On the day of the mainshock, PPE2.0+, PPE1.6+, ETAS I, ETAS II-1.6+, ETAS II-2.0+, and ETAS II-2.0m give an expected occurrence rate on the order of a few events per day for events of M ≥ 2.0 in the test region (Figure 6, top panel). Model ETAS I seems to give the highest estimate. However, when we consider its score for all the days before the mainshock, the probability gain becomes much lower, using PPE2.0+ as the reference model. The total conditional probability computed by ETAS I for an earthquake of magnitude M ≥ 5.0 during the week preceding the April 6 mainshock was 0.39%. This probability was about 20 times larger than the background probability, because of the occurrence of some foreshock activity. The expected instantaneous occurrence rate density increased severalfold in the few hours before the mainshock. However, this level still seems too low to justify the implementation of risk mitigation measures. We refer to Van Stiphout et al. (2010) for further discussion of risk mitigation.
It is arguable whether a reliable prediction could have been made on the basis of these foreshocks. However, it seems that we cannot obtain more information than what can be forecast by a clustering model such as the ETAS model. The same conclusion has been reached by many researchers (see, e.g., Felzer et al., 2004; Helmstetter et al., 2003; Zhuang et al., 2008; Christophersen and Smith, 2008; Marzocchi and Zhuang, 2011).

Temporal performance of the forecasts
In this subsection, we evaluate the temporal performance of the forecasts made using all six models. The results are shown in Figure 7. As discussed in Section 5.1, we first divide the target space-time range into cells of size 1 (day) × S × [2.0, ∞), where S is the test region under consideration.
The daily forecasts are given as the expected rate in each of these cells. Figure 6 shows the occurrence rates forecast by all these models, together with the observed daily numbers of earthquakes during the testing period starting on March 16, 2009. Among them, the PPE models start at a rate similar to that of the ETAS models, with some increase in the forecast rates after the mainshock due to the dramatic increase in the number of events in the aftershock sequence. All the ETAS models capture the patterns of the temporal variation of the aftershock rates, with slight differences, except that the curve of earthquake numbers forecast by ETAS I is smoother than those produced by ETAS II. A problem that arises when modeling the occurrence of relatively small events is that such events are generally missing from the catalog immediately after a large mainshock, so the completeness magnitude increases significantly after the mainshock. Consequently, one effect of missing early aftershocks is that earthquake forecast models overestimate the observed number of events during the first few hours or days following the mainshock.
Another effect of missing early aftershocks in real catalogs is the underestimation of the expected subsequent seismicity rate (e.g., Helmstetter et al., 2006). However, neither of these effects can be clearly noted in the catalog used for the L'Aquila mainshock, as can be seen in Figure 6.
Taking PPE2.0+ as the reference model, Figure 7 gives the information score of each model for each day. By comparing the performance of the various models, and also making use of Table 3, we observe the following: (1) Inclusion of smaller events improves the forecasting results. As mentioned in a previous section, PPE1.6+ and ETAS II-2.0m both work better than the versions that use only events of M ≥ 2.0.
(2) Both types of ETAS models are superior to the PPE models, with binomial or Poisson scores one to two orders of magnitude higher.
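The two scores compared here can be sketched as per-cell log-likelihoods. This is a minimal illustration; we assume the binomial event-cell probability takes the common form p = 1 − exp(−λ), with λ the forecast rate in the cell, which may differ in detail from the paper's Equation (10):

```python
import math

def poisson_log_score(rates, counts):
    """Poisson log-likelihood of the observed counts given the
    forecast rate in each cell."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1)
               for lam, n in zip(rates, counts))

def binomial_log_score(rates, counts):
    """Binomial (yes/no) log-likelihood: a cell is an 'event cell'
    with probability p = 1 - exp(-rate)."""
    total = 0.0
    for lam, n in zip(rates, counts):
        p = 1.0 - math.exp(-lam)
        total += math.log(p) if n > 0 else math.log(1.0 - p)
    return total

def information_gain(model_score, reference_score, n_cells):
    """Mean per-cell log-likelihood difference against the reference
    model (PPE2.0+ in the text)."""
    return (model_score - reference_score) / n_cells
```

Note that the two scores agree on an empty cell (both reduce to −λ via log(1 − p) = −λ) but weight multi-event cells differently, which is why a model can rank first under one score and not the other.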
(3) ETAS I is better at forecasting the number of earthquakes, while the ETAS II models are better at forecasting Yes/No for M 2.0+ or 3.0+. (4) Direct fitting to events of M ≥ 2.0 works better than using models fit to all events of M ≥ 1.6.
However, an adjusted ETAS model (II-2.0m), which fits earthquakes of M 2.0+ but accounts for the triggering effect of events with magnitudes between M 1.6 and M 2.0, yields better forecasts than its counterpart II-2.0+. Possible reasons for this are: (1) the triggering effect of small earthquakes is important and cannot be neglected, as shown by Helmstetter et al. (2005); (2) since events of magnitudes M 1.6 to M 2.0 comprise a major proportion of the data set, fitting ETAS II-1.6+ accounts for the contributions to the occurrence of these events from events of similar and larger magnitudes, but not from events smaller than M 1.6, yielding a biased estimate.

Discussion and conclusions
In Section 5.2, we saw that all the ETAS models used in this study underestimate the occurrence rate soon after the mainshock and during the aftershock sequence.One way to improve the performance of an ETAS model, especially in the case of a long aftershock sequence, is to update the maximum likelihood parameters at the end of each day, just before the computation of a new forecast for the following day.
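The daily updating scheme suggested above can be sketched as a rolling refit loop. The `fit` and `forecast` callables here are hypothetical placeholders for the model-specific steps (e.g., `fit` would return ETAS maximum-likelihood parameters):

```python
def rolling_forecast(catalog, fit, forecast, start_day, n_days):
    """Refit the model to all events before each day, then forecast
    that day -- the end-of-day updating scheme described above.
    `fit` and `forecast` are model-specific callables (hypothetical)."""
    forecasts = []
    for day in range(start_day, start_day + n_days):
        history = [ev for ev in catalog if ev["day"] < day]
        params = fit(history)              # e.g. MLE of the parameters
        forecasts.append(forecast(params, history, day))
    return forecasts

# Toy usage: 'fit' counts past events, 'forecast' returns that count.
toy_catalog = [{"day": 0}, {"day": 0}, {"day": 1}]
print(rolling_forecast(toy_catalog, len, lambda p, h, d: p, 1, 2))
# -> [2, 3]
```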
Another possible cause of the systematic underestimation of occurrence rate during an aftershock sequence could be that our ETAS models were fit to a learning data set of relatively moderate seismic activity, since there are no earthquakes of magnitude 5.0+ in the catalog before March 15, 2009.In this case, our maximum likelihood (ML) parameters, applied to larger magnitude mainshocks, are in some sense the results of an extrapolation, which can easily cause large biases.
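The size of this extrapolation bias can be illustrated with the ETAS productivity law κ(m) = A exp(a(m − m0)), using the two a estimates discussed later in this section (0.588 from the pre-mainshock fit, 0.918 from a refit to the aftershock data); A is held fixed at an illustrative value only to isolate the effect of a:

```python
import math

def productivity(A, a, m, m0=2.0):
    """ETAS productivity: expected number of direct offspring of an
    event of magnitude m, kappa(m) = A * exp(a * (m - m0))."""
    return A * math.exp(a * (m - m0))

# a = 0.588 (pre-mainshock fit) versus a = 0.918 (aftershock refit);
# A = 0.515 is illustrative, taken from the aftershock refit.
low = productivity(0.515, 0.588, 6.3)
high = productivity(0.515, 0.918, 6.3)
print(high / low)   # exp(0.330 * 4.3), roughly a factor of 4
```

Extrapolating from M ~2–4 training events to the Mw 6.3 mainshock thus amplifies even a modest error in a into a severalfold error in the forecast triggering rate.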
As to the issue of forecasting the April 6, M w 6.3 mainshock, the results reported in Section 5.1 show that the occurrence of foreshock activity some days before the mainshock raised the probability gain to a factor of about 20 using the ETAS I model. This does not appear to be a great achievement with regard to the potential use of this information for civil protection purposes.
Considering that in this case, as often happens before strong earthquakes, significant foreshock activity was observed only a few hours before the mainshock (after midnight), a suitable strategy could be to issue forecasts automatically and in near-real time whenever significant seismic activity is observed, rather than just once a day.
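Such a near-real-time strategy could rest on a triggering rule of the following kind (the thresholds here are purely illustrative, not values proposed in this study):

```python
def should_issue_forecast(recent_magnitudes, m_trigger=3.0, n_trigger=2):
    """Issue an extra, off-schedule forecast when at least n_trigger
    events of magnitude >= m_trigger occur in the monitoring window.
    Both thresholds are illustrative placeholders."""
    n = sum(1 for m in recent_magnitudes if m >= m_trigger)
    return n >= n_trigger

# A swarm with two M >= 3.0 events would trigger an immediate update:
print(should_issue_forecast([2.1, 3.2, 2.8, 3.5]))   # -> True
```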
In this study, the most notable feature is that before the mainshock all ETAS II-type models performed better than ETAS I, while in the aftershock period ETAS I obtained a much better Poisson score than all the ETAS II models. ETAS I is a subclass of ETAS II, which means that the best model of type I should not be able to outperform the best model of type II. One possible explanation is that the seismicity behavior changes between the periods before and after the mainshock. To test whether the seismicity pattern changes after the mainshock, we fit the space-time ETAS II-2.0+ model to the seismicity following the mainshock. In this case, the ML parameters are A = 0.515 event, c = 0.0143 day, a = 0.918, p = 1.20, D = 8.96 × 10⁻⁵ deg², q = 1.89, and γ = 0.508. Clearly, this a-value is much larger than the one used for forecasting in the ETAS II models (a = 0.588), which was obtained by fitting ETAS II-2.0+ to the data before March 15, 2009. This also explains why ETAS I, with its higher a-value, can produce a better Poisson score. The above comparison covers only a very active period; to make an overall comparison of the forecasts, we would need a much longer data set. During the quiescent period, however, ETAS I again begins to outperform the other models.
One might ask: what is the authors' advice regarding the two information-gain metrics? Is one preferred under certain conditions? In our opinion, the problem lies not with the gain metrics themselves. For the Poisson score, the problem is caused by the assumption that the forecast numbers of events follow a Poisson distribution, with the mean being the integral of the conditional intensity function. This is true for Poisson-type models, such as the PPE models used in this study, but the ETAS model is a clustering model with a stochastic conditional intensity, which increases once an event occurs. That is to say, the distribution of the number of events in the forecasting interval has a variance much larger than that of a Poisson distribution with the same rate. This problem has been discussed at length by Lombardi (2014). For the binomial score, no such problem exists: the zero and one probabilities are given exactly by Equation (10). In this respect, the binomial score should be preferred.
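The overdispersion argument can be checked numerically with a toy one-generation cluster process, a crude stand-in for ETAS clustering rather than the actual model:

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    """Poisson draw via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def clustered_sample(background_rate, offspring_mean, rng):
    """One-generation toy cluster process: background Poisson events,
    each triggering its own Poisson number of offspring."""
    n_bg = poisson_sample(background_rate, rng)
    return n_bg + sum(poisson_sample(offspring_mean, rng)
                      for _ in range(n_bg))

rng = random.Random(1)
counts = [clustered_sample(3.0, 1.0, rng) for _ in range(5000)]
# The count distribution is overdispersed: its variance clearly
# exceeds its mean, unlike a Poisson distribution (variance == mean).
```

For these parameters the theoretical mean is 6 while the variance is 15, so scoring such counts with a Poisson likelihood of matching mean misstates their probabilities.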
In this study, we do not consider the performance of these models in forecasting spatial locations, since the area is quite small, 1.1˚ by 1.5˚, while the location error is several kilometers. Moreover, all the models use the G-R magnitude-frequency relation for the magnitude distribution, independent of the time and location components. Thus, there are no differences among them in forecasting the magnitude distribution of future events.
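Since all models share the G-R magnitude distribution, the magnitude factor is the same exponential density for each. A sketch of that density, with placeholder b and m0 values rather than the fitted ones:

```python
import math

def gr_magnitude_density(m, b=1.0, m0=2.0):
    """Gutenberg-Richter magnitude density above threshold m0:
    f(m) = beta * exp(-beta * (m - m0)), with beta = b * ln(10).
    The b and m0 defaults are placeholders, not fitted values."""
    beta = b * math.log(10.0)
    return beta * math.exp(-beta * (m - m0))

# At the threshold the density equals beta = b * ln(10) (~2.30 for b = 1).
print(gr_magnitude_density(2.0))
```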
In summary, to achieve better forecasts, as is well known from the existing literature on the ETAS model, the training catalog should contain some earthquakes of magnitudes similar to those being forecast. If it does not, it would be better to use a typical set of parameters from a nearby region, or from a region of similar tectonic structure or equivalent seismicity level. Other lessons from this study are as follows. (1) Making use of earthquakes of lower magnitude works better than using only events larger than the target magnitude threshold. (2) Parameters should be updated during the forecast process, if possible. (3) Forecasts should be updated immediately after potential foreshock activity, whenever significant moderate earthquakes are observed.

Figure 1 .
Figure 1. Epicentral distribution of earthquakes during the learning (April 16, 2005 - March 15, 2009) and test (March 16, 2009 - June 30, 2009) phases reported by the INGV data center in the area under analysis, affected by the 2009 L'Aquila earthquake. There were 2,588 events of M L ≥ 1.6 (blue circles) during the learning period. Green dots around (42.0°N, 13.0°E) indicate 151 events recognized as quarry-blast activity, which were not considered in the analysis. Red circles indicate the 3,007 events with M L ≥ 2.0 analyzed in the testing phase. The L'Aquila sequence occurred in a silent region at the southern end of the northern Apennine extensional belt. Stars indicate the three M w ≥ 5.4 events that occurred in April 2009. Solid lines are mapped Quaternary faults. The Paganica Fault is indicated in blue. The 1915 Avezzano earthquake source is also shown on the map as a rectangular box (DISS Working Group, 2010; DISS 3.1.1).
β is linked to the b-value by β = b ln 10. Such a formulation enables maximum likelihood estimation of the seismicity rate.

4.2 Preliminary estimation results for the learning phase
PPE model. When fitting the PPE model to earthquake data, two cutoff magnitudes are used, 1.6 (2,588

Figure 2 .
Figure 2. Magnitude versus time: (a) for the learning period (April 16, 2005 to March 15, 2009; upper left panel) and (b) the test period (March 16, 2009 to June 30, 2009; upper right panel). All magnitudes are considered in the plots. The area considered spans 41.5°N-43.1°N and 12.4°E-14.2°E. Magnitude-frequency plots: (c) for the learning period (lower left) and (d) the test period (lower right). The M c is selected according to the best combination of the 95%-90% maximum curvatures where the observed data can be fit by a straight line.

Figure 5 .
Figure 5. Examples of the expected daily seismicity rate forecast by ETAS II-2.0 at 00:00 UTC on: (a) March 17, 2009; (b) April 6, 2009 (1 hour and 32 minutes before the L'Aquila mainshock, M W 6.3); (c) April 7, 2009 (the second-largest shock in the Abruzzi region occurred on April 7, 2009 at 17:47 UTC, with M W 5.6); and (d) April 20, 2009. The color scale represents the number of events of magnitude larger than 2.0 in cells of 0.1˚ × 0.1˚ per day.

Figure 6 .
Figure 6. Comparison between forecast and observed rates during the test period (March 16 - June 30, 2009) for events of M ≥ 2.0, per day. The scale of the y-axis is logarithmic, with the value 0.1 plotted as equivalent to 0. The top and bottom panels show the periods March 25 to April 9, 2009, and March 25 to July 1, 2009, respectively.

Figure 7 .
Figure 7. Temporal variations of daily information gains for each forecasting model, with PPE2.0+ as the reference model: (a) binomial information gains for earthquakes of magnitudes 2.0+, (b) binomial information gains for earthquakes of magnitudes 3.0+, (c) Poisson information gains for earthquakes of magnitudes 2.0+, and (d) Poisson information gains for earthquakes of magnitudes 3.0+.

Table 1 .
Maximum-likelihood parameters of the PPE models obtained after the last iteration (learning phase).

Table 2 .
Parameters of the ETAS models estimated by the maximum likelihood method (learning phase).

Table 3 .
Information gains relative to Model PPE2.0+ for each model. Boldface indicates the best model in each category. EC stands for "event cell" and NEC for "non-event cell".