Comment on "Assessing CN earthquake predictions in Italy" by M. Taroni, W. Marzocchi, P. Roselli

The paper by Taroni et al. (2016) considers results of forward prediction of strong Italian earthquakes by the CN algorithm, with the declared intent of providing "a careful assessment of CN prediction performances... using standard testing procedures". Given the very limited number of target events within each region, however, the situation is non-statistical, and it is clear a priori that standard statistical methods are not effective here. The attempt to replace the standard approaches with the Pari-mutuel Gambling Score (PGS) method leads to an almost complete loss of information about predicted earthquakes, even for a large sample of target events. Therefore, the conclusions based on PGS are untenable.

The paper by Taroni et al. (2016) considers results of forward prediction of strong Italian earthquakes during the period 1998-2016, based on the CN algorithm. The declared intent of the paper is to give "a careful assessment of CN prediction performances … using standard testing procedures." This goal is hardly feasible, however, because the target earthquake data for each individual CN sub-region of Italy (see Table 3 in the paper) are very limited. Namely, the number N of target events within each region is N = 5 (M > 5.3, North), 3 (M > 5.5, Center), and 1 (M > 5.5, South). This situation is non-statistical, and it is clear a priori that standard statistical methods are not effective here.
Let us consider the best case from the statistical point of view, provided by the North region with 5 target events. Here CN shows a qualitatively good result: 4 successes out of 5 target events. Formally, this result corresponds to the observed significance level alpha = 5.8% (p-value = 0.058) under random guessing with success probability 0.357 (Table 3 in the paper). Based on these values, the authors conclude that "the model CN and the Poisson model have the same predictive performances". This conclusion calls for several comments.
1) The estimate of alpha is unstable over time because the number of target events is small. Indeed, the next target earthquake in the region will change the score 4/5 to either 5/6 or 4/6, and alpha will become 2.4% or 12.5%, respectively. In the first case, the authors would therefore reach the opposite conclusion. This instability is a consequence of the authors' choice to analyze sub-regions with very few data.
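The significance levels quoted in point 1) can be reproduced with a binomial model of random guessing, in which each target event independently falls into an alarm with probability equal to the alarm fraction (0.357 for the North region). The Python sketch below assumes exactly that model; the function name `binomial_p_value` is our own and is not taken from Taroni et al. (2016).

```python
from math import comb

def binomial_p_value(hits: int, n: int, tau: float) -> float:
    """P(X >= hits) for X ~ Binomial(n, tau): the chance that random guessing,
    with a fraction tau of space-time in alarm, scores at least `hits`
    successes out of n target events (assumed binomial model)."""
    return sum(comb(n, k) * tau**k * (1 - tau) ** (n - k) for k in range(hits, n + 1))

TAU = 0.357  # alarm fraction for the North region (Table 3 of Taroni et al.)

# Current score 4/5 and the two possible scores after the next target event.
for hits, n in [(4, 5), (5, 6), (4, 6)]:
    print(f"{hits}/{n}: alpha = {binomial_p_value(hits, n, TAU):.3f}")
```

A single additional event moves alpha from about 0.058 to either about 0.024 or about 0.125, which is the instability described above.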
2) The value of alpha is nothing more than the observed significance of the result 4/5, given that 35.7% of space-time is in alarm (Table 3); it does not represent the prediction ability of CN. In this connection, it is useful to consider the standard prediction ability index, i.e. the fraction of non-randomly predicted events, $e = \hat{n} - \tau$, where $\hat{n}$ is the hit rate and $\tau$ is the alarm rate. Setting aside the problem of the small number of target events, we get $e = 4/5 - 0.357 \approx 44\%$. It is worth noting that a value of $e$ as large as 44% is extremely high for the prediction of strong earthquakes. For example, $e \approx 20\%$ for the M8 algorithm in predicting events of magnitude 8 or larger worldwide (Molchan and Romashkova, 2010).
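A minimal sketch of the index $e$ (hit rate minus alarm rate) for the North region, together with one possible interval estimate of the kind called for in point 3): an exact (Clopper-Pearson) 95% interval for the hit rate, shifted by $\tau$. The choice of the Clopper-Pearson interval, the treatment of $\tau = 0.357$ as a fixed constant, and all function names are our assumptions, made only for illustration.

```python
from math import comb

def tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    # Lower bound: P(X >= k; p) = alpha/2, and P(X >= k) increases with p.
    lower = 0.0 if k == 0 else bisect_increasing(lambda p: tail_ge(k, n, p) - alpha / 2)
    # Upper bound: P(X <= k; p) = alpha/2; P(X <= k) decreases with p, so negate it.
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: alpha / 2 - (1 - tail_ge(k + 1, n, p)))
    return lower, upper

hits, n, tau = 4, 5, 0.357          # North region figures from the text
e = hits / n - tau                  # point estimate of e
lo, hi = clopper_pearson(hits, n)   # interval for the hit rate
print(f"e = {e:.3f}; 95% interval for e: [{lo - tau:.3f}, {hi - tau:.3f}]")
```

With only five target events the resulting interval for $e$ is very wide, which illustrates why a point estimate alone says little here.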
3) The values of alpha and $e$ are merely point estimates. To judge the predictive performance of the CN method, interval estimates are therefore necessary, especially when data are scarce.
4) As an alternative to the classical approach above, the authors also consider a gambling approach, suggested by Zhuang (2010) and recently applied by Zechar and Zhuang (2014). Their Pari-mutuel Gambling Score (PGS) method applied to earthquake forecasting has been analysed in detail by Molchan (2016). Taroni et al. (2016) adapt the PGS approach to the analysis of the alarm-based CN prediction algorithm. In this case, their conclusion about the predictive ability of the CN method is based on the summary Pari-mutuel Gambling score $W_T$ (computed according to formula (2) of their paper). To explain this quantity, we need to introduce some notation.
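Let us represent the period of CN monitoring T as a union of subintervals (bins) $\Delta_i$ of length $\Delta$. For each bin we record whether an alarm is declared (+) or not (−) in $\Delta_i$, and whether a target event occurs (+) or not (−). The result of predicting target events over the whole period T is summarized by the confusion matrix $\begin{pmatrix} n_+^+ & n_+^- \\ n_-^+ & n_-^- \end{pmatrix}$, where $n_a^b$ is the number of bins with alarm state $a$ and event state $b$; for example, $n_+^+$ counts alarm bins containing a target event and $n_+^-$ alarm bins without one. The alarm rate is $\tau = (n_+^+ + n_+^-)\Delta/T$, the numerator being the total alarm time. If $p_\Delta$ is the probability that a target event occurs during $\Delta_i$ in the random model, the summary score $W_T$ is then given by formula (2) of Taroni et al. (2016). This value is interpreted as the gain of a forecaster against random guessing. Therefore, negative values of $W_T$ for a forecaster vote in favor of random guessing: the larger the absolute value of a negative $W_T$, the stronger the advantage of random guessing.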
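To make the bookkeeping concrete, the sketch below tallies the confusion matrix, the alarm rate $\tau$ and the hit rate from per-bin alarm and event flags, keyed as $n_a^b$ with $a$ the alarm state and $b$ the event state. The ten-bin record is invented purely for illustration, and we assume at most one target event per bin.

```python
from collections import Counter

def confusion_matrix(alarms, events):
    """Count bins by (alarm state, event state), keyed in the n_a^b notation:
    a = alarm (+) or not (-), b = target event (+) or not (-)."""
    counts = Counter(zip(alarms, events))
    return {
        "n_+^+": counts[(True, True)],    # alarm and target event (hit)
        "n_+^-": counts[(True, False)],   # alarm without target event
        "n_-^+": counts[(False, True)],   # target event missed
        "n_-^-": counts[(False, False)],  # no alarm, no event
    }

def alarm_and_hit_rate(alarms, events):
    """Alarm rate tau (fraction of bins in alarm) and hit rate n_+^+ / N,
    assuming at most one target event per bin."""
    m = confusion_matrix(alarms, events)
    tau = (m["n_+^+"] + m["n_+^-"]) / len(alarms)
    n_events = m["n_+^+"] + m["n_-^+"]
    hit_rate = m["n_+^+"] / n_events if n_events else float("nan")
    return tau, hit_rate

# Hypothetical ten-bin record (True = alarm declared / target event occurred).
alarms = [True, True, False, False, True, False, False, False, False, False]
events = [False, True, False, False, False, False, False, True, False, False]
tau, hit_rate = alarm_and_hit_rate(alarms, events)
print(confusion_matrix(alarms, events), tau, hit_rate)
```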
Typically, two statistics are used to characterize the prediction ability of a method: the hit rate $\hat{n} = n_+^+/N$, where $N \approx n_+^+ + n_-^+$ is the number of target events, and the alarm rate $\tau$.
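In the Pari-mutuel Gambling method the concept of success is interpreted more broadly: a success is counted for a correct prediction of either a target event or its absence (quiescence) in a given bin $\Delta_i$. Accordingly, the total score splits as $W_T = W_T^+ + W_T^-$, where $W_T^+$ collects the contributions of bins containing target events and $W_T^-$ those of bins without target events. By (2), the gain $W_T^+$, associated with the "art of prediction" of target events, is bounded for any bin size $\Delta$ ($W_T^+ \le 3$ for each of the three sub-regions). Note that any smooth score based on $(\tau, n_+^+, n_-^+)$ is also stable as $\Delta \to 0$, because each target event is a point object.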
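The boundedness of $W_T^+$ can be illustrated numerically, but only under an explicit assumption about the payoff, since formula (2) of Taroni et al. (2016) is not reproduced here. We assume a two-player pari-mutuel rule in each bin: the forecaster stakes one unit on its call, the reference (random) model stakes $p_\Delta$ on "event" and $1 - p_\Delta$ on "no event", and the two-unit pot is shared by the winners in proportion to their stakes. Under this assumption a successful alarm earns $(1 - p_\Delta)/(1 + p_\Delta) < 1$ and a missed event costs 1, so $W_T^+ \le n_+^+ - n_-^+$. The North figures (4 hits, 1 miss) come from the text; the 18-year period and the Poisson form $p_\Delta = 1 - e^{-\lambda\Delta}$ are illustrative assumptions.

```python
import math

# Illustrative North-region figures: 4 hits and 1 miss are from the text;
# T = 18 years (1998-2016) is taken as the monitoring period (assumption).
T_YEARS, N_HIT, N_MISS = 18.0, 4, 1

def w_plus(delta: float) -> float:
    """W_T^+ under the ASSUMED two-player pari-mutuel rule (not formula (2))."""
    lam = (N_HIT + N_MISS) / T_YEARS    # rate of target events per year
    p = 1.0 - math.exp(-lam * delta)    # Poisson chance of an event in a bin
    gain_hit = 2.0 / (1.0 + p) - 1.0    # pot share minus stake = (1 - p) / (1 + p)
    return N_HIT * gain_hit - N_MISS    # each missed event loses the unit stake

deltas = [1.0, 0.5, 1 / 6, 1 / 12, 1 / 52]  # bin sizes in years
for d in deltas:
    print(f"delta = {d:.4f} yr   W_T^+ = {w_plus(d):.3f}")
```

However small the bin, $W_T^+$ in this sketch only creeps up towards $n_+^+ - n_-^+ = 3$.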
The situation with $W_T^-$ is different. Here the numbers of successes, $n_-^-$, and of failures, $n_+^-$, increase as $\Delta \to 0$. However, in the random model the absence of a target event in a small interval $\Delta$ is highly probable, and therefore the dividends from predicting it, according to (2), are limited: their total is of the order of $\lambda T$, where $\lambda$ is the rate of target events.
At the same time, the penalty for prediction errors grows without bound, because the number of alarm bins without target events, $n_+^- \approx \tau T/\Delta$, tends to infinity as $\Delta \to 0$. As a result, the total gain from predicting target events (successful alarms) and their absence (no alarm and no earthquake) is determined largely by the value of $n_+^-$ (see Tables 3 and 4 of the paper by Taroni et al. (2016)). The estimates of $W_T^\pm$ show that, under the condition $\lambda\Delta \ll \tau$ (3), the statistic $W_T$ will deliver a negative verdict on the significance of any time prediction algorithm, with an arbitrary number of target events. To be clear, (3) essentially means that the average time interval between target events is much larger than the time step $\Delta$ at which the alarm is updated, which is the case for the CN algorithm.
Note that the generation of space-time alarms is the intelligent essence of prediction algorithms like CN or M8. As we have shown, this essence is penalized to the highest degree by the PGS approach, because almost every alarm bin is interpreted as an error. Therefore, one can conclude that the results of the analysis of the CN algorithm based on the Pari-mutuel Gambling score are irrelevant to assessing its prediction performance.
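The competition between bounded dividends and unbounded penalties can be sketched numerically under the same assumed two-player pari-mutuel rule (forecaster stakes one unit per bin; the reference model stakes $p_\Delta$ on "event" and $1 - p_\Delta$ on "no event"; winners share the pot pro rata). This rule is our assumption, not a transcription of formula (2) of Taroni et al. (2016). Bin counts are expected, real-valued values, $n_+^- \approx \tau T/\Delta - n_+^+$ and $n_-^- \approx (1 - \tau)T/\Delta - n_-^+$; the North figures ($\tau = 0.357$, 4 hits, 1 miss) come from the text, while $T = 18$ years and the Poisson $p_\Delta$ are illustrative.

```python
import math

# Illustrative inputs: tau, hits and misses (North) are from the text;
# T = 18 years (1998-2016) is taken as the monitoring period (assumption).
T_YEARS, TAU, N_HIT, N_MISS = 18.0, 0.357, 4, 1

def gain(alarm: bool, event: bool, p: float) -> float:
    """Forecaster's net gain in one bin under an ASSUMED two-player pari-mutuel
    rule: stake 1 on the call; the reference stakes p on 'event' and 1 - p on
    'no event'; the pot of 2 is shared by the winners pro rata."""
    if alarm and event:
        return 2.0 / (1.0 + p) - 1.0   # (1 - p) / (1 + p)
    if not alarm and not event:
        return 2.0 / (2.0 - p) - 1.0   # p / (2 - p): small when p is small
    return -1.0                        # a wrong call loses the whole stake

def w_scores(delta: float):
    """Return (W_T^+, W_T^-) using expected, real-valued bin counts."""
    lam = (N_HIT + N_MISS) / T_YEARS
    p = 1.0 - math.exp(-lam * delta)
    n_bins = T_YEARS / delta
    n_false = TAU * n_bins - N_HIT            # n_+^-: alarm bins without events
    n_quiet = (1.0 - TAU) * n_bins - N_MISS   # n_-^-: no alarm, no event
    w_plus = N_HIT * gain(True, True, p) + N_MISS * gain(False, True, p)
    w_minus = n_quiet * gain(False, False, p) + n_false * gain(True, False, p)
    return w_plus, w_minus

deltas = [1.0, 0.5, 1 / 6, 1 / 12, 1 / 52]  # bin sizes in years
results = [w_scores(d) for d in deltas]
for d, (wp, wm) in zip(deltas, results):
    print(f"delta = {d:.4f} yr  W+ = {wp:6.2f}  W- = {wm:8.2f}  W = {wp + wm:8.2f}")
```

In this sketch the dividends for correctly predicted quiet bins approach $(1 - \tau)\lambda T/2 \approx 1.6$, while the false-alarm penalty grows like $\tau T/\Delta$, so the total $W_T$ turns strongly negative once $\Delta$ is a few months or less.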

Conclusions
A very limited amount of data is a serious obstacle to the statistical analysis of the CN prediction algorithm at the regional level in Italy. The attempt to replace the standard approaches with the Pari-mutuel Gambling method leads to an almost complete loss of information about predicted earthquakes, even for a large sample of target events. Therefore, the conclusions based on PGS are untenable. As noted by Zhuang (2016, personal communication), "It seems to me that forecasting and betting should be separated". An in-depth discussion is provided in Molchan (2016) and, much earlier, in Molchan and Romashkova (2011).