Assessing ‘alarm-based CN’ earthquake predictions in Italy

The quantitative assessment of the performance of earthquake prediction and/or forecast models is essential for evaluating their applicability for risk reduction purposes. Here we assess the earthquake prediction performance of the CN model applied to the Italian territory. This model has been widely publicized in Italian news media, but a careful assessment of its prediction performance is still lacking. In this paper we evaluate the results obtained so far from the CN algorithm applied to the Italian territory, by adopting testing procedures that are widely used and under development in the Collaboratory for the Study of Earthquake Predictability (CSEP) network. Our results show that the CN prediction performance is comparable to that of the stationary Poisson model; that is, CN predictions do not add more to what may be expected from


Introduction
Earthquake prediction is one of the ways in which seismologists make statements about future seismic activity, usually on the basis of the observation of one or more candidate diagnostic precursors [Sykes and Jaumé 1990, Pulinets and Davidenko 2014]. A prediction consists of casting an alarm, i.e., a deterministic assertion that one target earthquake of a given magnitude will occur in a specified space-time window.
Such predictions appear to be prospective deterministic statements, but do not present the full picture. In fact, all prediction schemes proposed to date have an intrinsically probabilistic nature [Jordan et al. 2011]. To quantify this probability it is necessary to consider the possibility of raising false alarms and/or missing some target earthquakes. When an alarm is cast for a specific space-time window, we have a hit (true positive) if a target earthquake occurs; otherwise we have a false alarm (false positive). When no alarm is cast for a specific space-time window, we have a miss (false negative) if a target earthquake occurs; otherwise we have a correct (true) negative.
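The four outcomes defined above can be made concrete with a minimal sketch. The function name and the list of windows are illustrative assumptions, not part of any published CN or CSEP code.

```python
def classify_outcome(alarm: bool, target_event: bool) -> str:
    """Classify one space-time window by alarm state and observed outcome."""
    if alarm:
        return "hit" if target_event else "false alarm"
    return "miss" if target_event else "correct negative"


# Hypothetical (alarm, target_event) pairs, one per space-time window.
windows = [(True, True), (True, False), (False, True), (False, False), (False, False)]

# Tally how many windows fall into each of the four categories.
counts = {}
for alarm, event in windows:
    label = classify_outcome(alarm, event)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```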
Another way in which seismologists make statements about future seismic activity is probabilistic forecasting, which consists of estimating the probability of one or more events in well-defined magnitude-space-time windows (e.g., Marzocchi et al. [2014] for the Italian region). According to Jordan et al. [2011], probabilistic forecasting provides a more complete description of prospective earthquake information than deterministic prediction and, more importantly, it separates the hazard estimation made by scientists from the public protection role of civil authorities [Jordan et al. 2014]. Probabilistic forecasts can become predictions once a probability threshold is chosen [Zechar and Jordan 2008]; however, this threshold does not have any specific scientific meaning, and it has to be related to the kind of mitigation actions that are associated with the prediction [Marzocchi 2013]. Regardless of these possible shortcomings, there is no doubt that earthquake predictions have a natural and immediate attractiveness for lay people, probably because a prediction is much easier to understand than a probabilistic forecast [Gigerenzer et al. 2005]. In Italian, the distinction between the English terms "forecast" and "prediction" cannot be made, because only one word, "previsione", exists, and it has a strong deterministic connotation.
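The conversion of a probabilistic forecast into alarms mentioned above can be illustrated with a minimal sketch. The function name, the probabilities and the threshold value are hypothetical; the threshold carries no scientific meaning of its own and would in practice be tied to the mitigation action considered.

```python
def forecast_to_alarms(probabilities, threshold):
    """Cast an alarm in every window whose forecast probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [p >= threshold for p in probabilities]


# Hypothetical forecast probabilities for four space-time windows.
forecast = [0.001, 0.02, 0.15, 0.04]
alarms = forecast_to_alarms(forecast, threshold=0.05)
print(alarms)  # [False, False, True, False]
```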
The large impact of earthquake forecasts/predictions on society is a further motivation for carefully evaluating the performance of each model. This is the main goal of an international initiative named Collaboratory for the Study of Earthquake Predictability (CSEP) [Jordan 2006, Zechar et al. 2010, Zechar and Zhuang 2014]. In essence, CSEP promotes experiments in which prospective forecasts/predictions are compared to real observations in different testing regions. All experiments are rigidly controlled, i.e., they are replicable by anyone and modelers cannot change their forecasts/predictions in retrospect. Besides the statistical evaluation of each single model, these experiments also offer a unique opportunity to compare the relative forecasting/prediction skill of different models. As a matter of fact, claiming that one model is able to predict earthquakes may be misleading from a scientific point of view: it is much more interesting to compare the performance of the model with the skill of other competing models. In order to achieve a meaningful comparison, models have to provide forecasts/predictions in a proper common format. However, not all forecast/prediction models adopt such a format, so they are not all presently ready to be evaluated in CSEP experiments. This is the case for the algorithm under study here.
Since January 1, 1998, a group of researchers has been providing earthquake predictions in Italy through the use of the pattern-recognition-based CN algorithm (CN stands for California-Nevada, the first region of application) [Gabrielov et al. 1986, Keilis-Borok et al. 1990, Peresan et al. 1999, Peresan et al. 2005, Romashkova and Peresan 2013]. This type of prediction allows a quantitative validation of the prediction ability because the method is rigorously applied forward in time. This is certainly the most crucial aspect for the evaluation of any forecast/prediction model, because it guarantees that the results are not affected by any conscious or unconscious adjustment. In this paper we evaluate the results obtained so far from the CN algorithm applied to the Italian territory, by adopting testing procedures that are widely used and under development in CSEP.

The CN model in Italy and the target earthquakes
The CN algorithm is an alarm-based earthquake model that provides intermediate-term predictions for mainshocks. The algorithm casts an alarm when a time of increased probability (TIP) for the occurrence of target earthquakes in one specific region is identified. The identification of a TIP is based on the observation of a set of candidate seismic precursory patterns that have been identified by analyzing the past seismicity and are kept fixed [Keilis-Borok et al. 1990]. Originally, the model was set up for the California-Nevada (CN) region, but since then it has been adapted to many other regions of the world. The details of the application of the CN algorithm to the Italian territory can be found in Peresan et al. [1999, 2005] and Romashkova and Peresan [2013], and are not described here.
The CN predictions in Italy are related to three macro-zones that cover part of the Italian territory and are characterized by different minimum magnitudes M_0 for the identification of the target events. These macro-zones (Figure 1) are northern Italy (N-Italy; M_0 ≥ 5.4), central Italy (C-Italy; M_0 ≥ 5.6), and southern Italy (S-Italy; M_0 ≥ 5.6). The target earthquakes are only mainshocks, i.e., aftershocks above the magnitude M_0 are not considered target events [Peresan et al. 1999]. The forward predictions have been routinely performed every two months since January 1, 1998. More recently a new macro-zone has been added (Adriatic region), but it is not considered here because of the short time window available for testing. The reference magnitude of the target earthquakes is taken from the UCI2001 catalog [Peresan et al. 2002]. We do not perform any quality check on this catalog and take it as given. The CN predictions for Italy are available on a public website (http://www.geoscienze.units.it/esperimentodi-previsione-dei-terremoti-mt/algorithm-cn-initaly/cn-predictions-in-italy.html). Only the current prediction is released by the authors under password access, which has been given to a list of interested scientists.
In Table 1 we list the target earthquakes that occurred during the testing period (from 01/01/1998 to 08/28/2016, including the very recent Amatrice earthquake of 08/24/2016), as reported on the CN website. A comparison of this catalog with the data extracted from the NEIC catalog (National Earthquake Information Center; http://earthquake.usgs.gov/earthquakes/search/) for the same region, depth and testing period (Table 2) highlights some inconsistencies; for example, some earthquakes above M_0 occurred outside the three macro-zones, as shown in Figure 2.
Here, we do not investigate this inconsistency further; we take the catalog reported in Table 1 as given and use it for the following analysis.

Assessing the CN predictions
We test the CN predictions by using two different statistical methods: the Molchan test (MT) [Molchan 1997, Zechar and Jordan 2008] for alarm-based models, and the parimutuel gambling score (PGS) [Zechar and Zhuang 2014] for the evaluation of earthquake forecasts.
The MT and PGS methods have some remarkable differences. The MT is based on the definition of a null hypothesis that can be rejected or not according to the observed data. In contrast, the PGS does not require a null hypothesis, because the score is intended to rank the models according to their relative predictive skill.

Molchan test (MT)
The MT verifies whether the number of hits for the CN model (in the space-time domain) is consistent with the number of hits expected from a reference model (e.g., the Poisson model). When the observed number of hits is sufficiently larger than that expected from the Poisson model, we conclude that the model under test is significantly better than the reference model [Zechar and Jordan 2008]. This test is usually represented through a diagram, using different fractions of space-time occupied by alarms. In our case, the CN algorithm sets one specific value for such a variable, so the test collapses into a Bernoulli case. Specifically, h is the number of hits for the CN model, N is the total number of target events, and the parameter of the Bernoulli distribution is x, the fraction of the space-time covered by alarms (for instance, if the alarms cover the whole interval of time and half of the space, then x = 0.5). This choice of the parameter is appropriate for our specific case, while it may require much more elaborate estimations when testing other earthquake prediction models [e.g., Marzocchi et al. 2003, Molchan and Romashkova 2010].
The null hypothesis H_0 under testing is: the CN model and the Poisson model have the same predictive performance. Under H_0, a binomial distribution describes the probability P that a Poisson model obtains h or more correct predictions, given N observed target events and the fraction x of the space-time occupied by alarms; that is, x is the probability that a target earthquake falls inside an alarm if events occur as a purely random process described by the Poisson distribution. The Poisson model is chosen because it can be considered the simplest random-guess forecast strategy within each zone [Kagan 2009].
The binomial distribution reads

$$P = \sum_{n=h}^{N} \binom{N}{n}\, x^{n} (1-x)^{N-n} \qquad (1)$$

In the classical Neyman-Pearson statistical testing framework [Neyman and Pearson 1933] we can 'reject' or 'not reject' the null hypothesis, depending on whether P is smaller or larger than a pre-selected significance level. In practice, we set a significance level of 0.05 (this value is commonly used in science) and we reject the null hypothesis if P is less than 0.05; otherwise, we conclude that there is no empirical evidence supporting that the CN prediction capability is superior to the Poisson process.
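The tail probability used in the test, i.e., the chance of obtaining h or more hits out of N target events when each hit has probability x, can be computed directly. The sketch below is a minimal illustration: the function name is an assumption, and the input values are hypothetical rather than taken from Table 3.

```python
from math import comb


def molchan_p_value(h: int, n_targets: int, x: float) -> float:
    """Probability of h or more hits out of n_targets, each hit having probability x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction in [0, 1]")
    if not 0 <= h <= n_targets:
        raise ValueError("h must lie between 0 and n_targets")
    # Upper tail of the binomial distribution, summed from h to n_targets.
    return sum(
        comb(n_targets, n) * x**n * (1.0 - x) ** (n_targets - n)
        for n in range(h, n_targets + 1)
    )


# Hypothetical example: 3 hits out of 4 targets with alarms covering 30% of space-time.
p_value = molchan_p_value(h=3, n_targets=4, x=0.3)
reject_null = p_value < 0.05  # significance level used in the text
print(round(p_value, 4), reject_null)  # 0.0837 False
```

In this hypothetical case three hits out of four would still not be enough to reject the null hypothesis at the 0.05 level, which shows how demanding the test is when the number of target events is small.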

Parimutuel gambling score (PGS)
The PGS method is a useful tool to compare the prediction performance of two models: in our case, the CN and Poisson models. The PGS method treats each model as a gambler. In each space-time bin the models gamble against each other; each model bets 1 coin on the bin. Each model then receives a win proportional to the probability that it assigned to the event (earthquake or not) that occurred in the bin. In mathematical terms,

$$W_j = -1 + \frac{k\, p_j}{\sum_{i=1}^{k} p_i}$$

where W_j is the gain that the j-th model obtains in the bin, k is the total number of models and p_j is the probability of the occurred event (earthquake or not) for the j-th model. The sum of the W_j over all bins represents the skill of each model: the bigger the win, the better the model. Within each bin, the wins and losses of the k models always sum to zero.
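The per-bin gain can be sketched in a few lines. The sketch assumes that the gain takes the form W_j = -1 + k p_j / (p_1 + ... + p_k), which is the form consistent with the one-coin bet and the zero-sum property described above; the function name and the probabilities are hypothetical.

```python
def pgs_gains(probs):
    """Per-bin parimutuel gains.

    probs[j] is the probability that model j assigned to the outcome
    (earthquake or not) that actually occurred in the bin.
    """
    k = len(probs)
    total = sum(probs)
    if total == 0:
        raise ValueError("at least one model must assign nonzero probability")
    # Each model pays 1 coin; the pot of k coins is shared in proportion to probs.
    return [-1.0 + k * p / total for p in probs]


# Hypothetical bin: model A gave 0.9 to the observed outcome, model B gave 0.1.
gains = pgs_gains([0.9, 0.1])
print(gains)
```

Because the pot is only redistributed among the models, the gains in a bin cancel out, and two models that assign the same probability to the observed outcome both score zero.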
In previous works the PGS method has been used to check the performance of likelihood-based models [Taroni et al. 2014, Zechar and Zhuang 2014]. In this work, we apply the same procedure to alarm-based models. In particular, for the CN model we define p_m = 1 when a target event happens during an alarm or when no target event happens during a period without alarm (i.e., hit and correct negative, respectively). We define p_m = 0 when no target event happens during an alarm or when a target event happens during a period without alarm (i.e., false alarm and miss, respectively). To compute the probability p_m related to the Poisson model, we use the same macro-zones and the same minimum magnitudes used in the CN model. We compute the Poisson rate by using the CPTI11 catalog [Rovida et al. 2011], declustered with the Gardner and Knopoff algorithm [Gardner and Knopoff 1974], up to 12/31/1997 (the CN prediction experiment started on 1/1/1998).
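The two-model game between an alarm-based model and a stationary Poisson model can be sketched as follows. Several details are assumptions made for illustration only: the two-month bin length (taken from the update interval of the CN predictions), the Poisson rate, the alarm and event sequences, the function names, and the per-bin gain of the form -1 + k p_j / (sum of p_i) with k = 2.

```python
import math


def poisson_bin_probability(rate_per_year: float, bin_years: float) -> float:
    """Probability of at least one target event in a bin under a stationary Poisson model."""
    if rate_per_year <= 0 or bin_years <= 0:
        raise ValueError("rate and bin length must be positive")
    return 1.0 - math.exp(-rate_per_year * bin_years)


def cn_vs_poisson_score(alarms, events, rate_per_year, bin_years):
    """Cumulative parimutuel scores (alarm model, Poisson model) over time bins."""
    p_event = poisson_bin_probability(rate_per_year, bin_years)
    alarm_total = 0.0
    for alarm, event in zip(alarms, events):
        # Alarm-based model: probability 1 if its call was right, 0 otherwise.
        p_alarm = 1.0 if alarm == event else 0.0
        # Poisson model: probability it assigned to the observed outcome.
        p_poisson = p_event if event else 1.0 - p_event
        alarm_total += -1.0 + 2.0 * p_alarm / (p_alarm + p_poisson)
    # With two models the game is zero-sum, so the Poisson score is the opposite.
    return alarm_total, -alarm_total


# Hypothetical sequence: two false alarms, one correct negative, one miss.
alarms = [True, True, False, False]
events = [False, False, False, True]
alarm_score, poisson_score = cn_vs_poisson_score(
    alarms, events, rate_per_year=0.1, bin_years=2.0 / 12.0
)
print(alarm_score, poisson_score)
```

Under these assumptions a false alarm or a miss costs the alarm-based model its full one-coin stake, whereas a correct negative earns it very little, because the Poisson model also assigns a high probability to quiet bins.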

Results
The Molchan test results are summarized in Table 3, which shows that P of Equation (1) is never less than the pre-selected significance level of 0.05; this means that we cannot reject the null hypothesis for any of the three macro-zones.
The parimutuel gambling score results are shown in Table 4; they indicate that the Poisson model performs better in all three macro-zones (scoring 32.7 for the northern macro-zone, 36.7 for the central, and 27.7 for the southern). Because the wins and losses of the two models sum to zero, the scores of the CN model are the negatives of these values.

Conclusions
Considering the data available so far, the Molchan test does not show that the CN prediction performance is significantly better than that of predictions based on the stationary Poisson model. Moreover, the results of the parimutuel gambling score indicate that the Poisson model is even better than CN in predicting earthquakes, as shown by the scores obtained in each macro-zone.
This result is similar to those obtained from tests of other similar pattern recognition models performed at a global scale, once a proper null hypothesis is used [Marzocchi et al. 2003, Zechar and Zhuang 2010].
From a practical perspective, the results show that CN predictions do not add significant information that may be used to enhance societal earthquake preparedness.

Figure 1. Maps of the three Italian macro-zones related to the regionalization proposed by Peresan et al. [1999]: (a) northern region; (b) central region; (c) southern region. M_0 is the minimum magnitude for the target earthquakes in each macro-zone (for details on magnitudes see Peresan et al. [1999]).

Table 2. Event list extracted from the NEIC website (National Earthquake Information Center; http://earthquake.usgs.gov/earthquakes/search/) for earthquakes that occurred in Italy from 01/01/1998 to 08/28/2016, and related parameters.

Figure 2. (a) Location map of earthquakes (red stars) listed in Table 1. (b) Location map of earthquakes (blue stars) listed in Table 2. (c) Location map of earthquakes (magenta stars) listed in Table 2 that did not occur inside the macro-zones considered by the CN model.

Table 4. Parimutuel gambling score results (bold italic for the best model).