A technical note on the bias in the estimation of the b-value and its uncertainty through the Least Squares technique

We investigate conceptually, analytically, and numerically the biases in the estimation of the b-value of the Gutenberg-Richter Law and of its uncertainty made through the least squares technique. The biases are introduced by the cumulation operation for the cumulative form of the Gutenberg-Richter Law, by the logarithmic transformation, and by the measurement errors on the magnitude. We find that the least squares technique, applied to the cumulative and binned form of the Gutenberg-Richter Law, produces strong bias in the b-value and its uncertainty, whose amplitudes depend on the size of the sample. Furthermore, the logarithmic transformation produces two different endemic bends in the Log(N) versus M curve. This means that this plot might produce fake significant departures from the Gutenberg-Richter Law. The effect of the measurement errors is negligible compared to those of cumulation operation and logarithmic transformation. The results obtained show that the least squares technique should never be used to determine the slope of the Gutenberg-Richter Law and its uncertainty. Mailing address: Dr. Laura Sandri, Istituto Nazionale di Geofisica e Vulcanologia, Sezione Bologna, Via Donato Creti 12, 40128 Bologna, Italy; e-mail: sandri@bo.ingv.it


Introduction
The Gutenberg-Richter Law (from now on GR Law) (Gutenberg and Richter, 1954) is certainly one of the most remarkable and ubiquitous features of seismicity worldwide.In the most common form it reads . (1.1) In the so-called binned form of the GR Law, N represents the number of events with magnitude M, while in the cumulative form N is the number of events with magnitude larger or equal to M. The constants a and b are the coefficients of the linear relationship.
As discussed in a previous paper (Marzocchi and Sandri, 2003, from now on MS03), the scientific relevance of the GR Law is linked to its apparent ubiquity, and to the theoretical implications and meaning of its possible universality.In particular, the value of b, representing the opposite of the slope of the linear relationship in eq.(1.1), is considered very important.In fact, in spite of a «first order» validity of the GR Law with a constant b-value ≈1 observed in a variety of tectonic settings, significant spatial and temporal variations in the b-value are found (e.g., Riedel et al., 2003;Del Pezzo et al., 2004;Legrand et al., 2004;Mandal et al., 2004;Ratchkovski et al., 2004;Wyss et al., 2004;Mur-ru et al., 2005;Schorlemmer et al., 2005).These possible variations are very important from a theoretical point of view, and for seismic hazard studies.We refer to our previous paper (MS03) for a deeper discussion on these issues.
As a matter of fact, a crucial aspect in inferring variations or constancy of the b-value is represented by the method used to determine the b-value and its uncertainty, given a seismic catalog, as shown in MS03.In that paper, we discussed the maximum likelihood method to estimate the b-value and its uncertainty, showing that incorrect formulae based on this method produce a bias in the b-value and an underestimation of its uncertainty.The latter implies that we might observe a fake variation in the b-value only because we have underestimated the uncertainty.Intentionally, in MS03 we did not consider the least squares (from now on, LS) method, because it had already been recognized (Page, 1968;Bender, 1983) that this technique applied to the problem of the estimation of the b-value and of its uncertainty does not have any statistical foundation.However, the LS technique is still widely employed to estimate these two quantities, both in the binned and in the cumulative form, also in recent literature (e.g., Working Group on California Earthquake Probabilities, 2003;Gruppo di Lavoro, 2004;Lopez Pineda and Rebollar, 2005).
The main purpose of this paper is to demonstrate, by means of conceptual issues, analitical formulations and numerical simulations, that the use of the LS technique in the estimation of the b-value and of its uncertainty leads to strongly biased estimates of these two quantities.For the correct estimation of these two quantities, we refer to the appropriate formulae given in MS03 and in references therein.

Estimation of b and σ b t t through the LS technique
The LS estimation consists of applying linear regression analysis to the GR Law described by eq.(1.1) in the binned or cumulative form.The parameter b should be the same in the two formulations, while the intercept a is different in the two cases.

Cumulative form of the GR Law
Despite its widespread adoption in past and recent papers (Pacheco and Sykes, 1992;Pacheco et al., 1992;Karnik and Klima, 1993;Okal and Kirby, 1995;Scholz, 1997;Triep and Sykes, 1997;Main, 2000;Working Group on California Earthquake Probabilities, 2003;Gruppo di Lavoro, 2004;Lopez Pineda and Rebollar, 2005), the use of LS technique to estimate the bvalue on the cumulative form of the GR Law does not have any statistical motivation.The strongest assumption of regression analysis, i.e., the independence of the observations, is clearly violated.In practice, the cumulative form is an integration and, therefore, it represents a filter for the high frequency noise.As a result, the uncertainty of the slope of the GR Law is certainly strongly underestimated.In order to evaluate the bias in the regression analysis applied to the cumulative form numerically, as a function of the number of earthquakes contained in the catalog, we simulated 1000 seismic catalogs, for different catalog sizes.The magnitudes M i are obtained by binning, with a fixed bin width of 0.1, a continuous random variable distributed with a probability density function (pdf) given by equation (2.1) which is valid if the catalog contains earthquakes with a completeness magnitude equal to Mmin and with a magnitude range of at least three units (see MS03).In this way, Mi is the magnitude attached to all the synthetic seismic events with real continuous magnitude in the range Mi−0.05≤M<Mi+0.05.
Then, for each synthetic catalog, we estimated the b-value (from now on, using the notation b t for the estimated value, and b for the theoretical value) by means of the LS technique applied to the cumulative GR Law described by eq.(1.1), i.e. when N is the number of events with magnitude larger or equal to Mi.
Figure 1 reports the averages of b t calculated in 1000 synthetic catalogs as a function of the number of data, for the cases b=1 and b=2.The 95% confidence interval is attached to each average.There is a clear negative bias, that decreases with the number of data and ranges from 5% to 2%.An explanation of this bias will be given in the following when we describe the effects of the logarithmic transformation in the regression analysis of the binned magnitudes.
As mentioned above, another crucial aspect concerns the estimation of the uncertainty attached to b t , here indicated by σ t b t and estimated as the average error on b t provided by the 1000  linear regressions.In particular, we are interested in evaluating if the estimation of σ t b t 2 is an un- biased estimator of the true variance of the b estimation of b t around its central value.This can be done by simulating 1000 seismic catalogs as described above.For each catalog we calculated b t and σ t b t .Then, we compared the dispersion of b t around its average with the average of σ t b t , through the Fisher test (e.g., Kalbfleisch, 1985) (2.2) The null hypothesis we tested is that the average of σ t b t 2 is equal to the variance of b t .
The results, reported in fig.2, are very interesting.In particular, the Fisher test shows that σ t b t strongly underestimates the dispersion of b, the former being at least one order of magnitude smaller than the latter.This confirms that the uncertainty on the b-value estimated through the cumulative LS method is strongly underestimated.
In general, the potential factors which can bias the estimations made through the LS technique (both in the cumulative and binned form) are the logarithmic transformation of the number of events, and the presence of the measurement errors on the magnitude.By taking into account the criticism at the regression analysis applied to the cumulative form just reported, in the following we studied the impact of these factors only on the estimation of the b-value made by means of the regression analysis applied to the binned form.

Effects of the logarithmic transformation
The expected number of events for a magnitude M is where n is the total number of earthquakes in the catalog, ∆M is the bin width used, and f(M) is given by eq.(2.1).The expected fluctuations around this value can be written as (2.4) In this frame, the observed number of events Ni for a binned magnitude Mi can be written as Ni= =νi(1 +ξi), where ξi is a random variable with zero mean and standard deviation equal to ∆Ni/νi.By taking into account eqs.((2.3) and (2.4)), we can write . (2.5) It is easy to demonstrate that ∆Ni/νi is a strictly monotonic decreasing function of pi.This means that the expected fluctuations of the variable ξ are larger for lower pi, that is for large magnitudes.
The estimation of b in the classical linear regression analysis is (see e.g., Draper and Smith, 1981) .
(2.6) If we substitute Ni=νi(1 +ξi) we obtain (2.7)If we apply the expectation operator to both sides of eq.(2.7), it is possible to verify that the bias is zero only To summarize, the application of the classical linear regression analysis to the logarithm of real data (therefore affected by errors) produces a systematic bias on the estimated slope.This is not due to the regression analysis (because the latter always provides an unbiased estimate), but it is related to its application to this type of data.
The logarithmic transformation has also another very important effect due to the discreteness of the dependent variable N. We have just shown that the dispersion of the data around the curve increases with magnitude.Since we cannot compute Log(Ni) if Ni=0, at large magnitudes, where ν is low (for example ν≤1), the LS technique can take into account only the «positive» fluctuations around the mean value, i.e. the fluctuations that increase the expected number of events.This acts as a filter for the «negative» fluctuations, that is at large magnitudes (where it can be computed) Log(Ni) tends to be overestimated compared to Log(νi).The global effect consists in the introduction of a systematic negative bias in the estimation of b, that is b t is always lower than b.Recalling that the amplitude of the dispersion around the curve is higher for small data sets, in these cases the bias should be larger.Figure 3 reports Log(ν i) (see eq. ( 2.3)) and the averages of Log(Ni) obtained by 1000 synthetic catalogs as a function of the magnitudes M i.For each magnitude Mi, the average of Log(Ni) is computed only on those catalogs for whom Ni≥ 1.At lower magnitudes, in the left part of the graph (to the left of the intersection point between the two lines in fig.3), Log(N i) tends to underestimate Log(νi).This bias, due to the second addendum in the right side of eq.(2.7), should lead to an overestimation of the bvalue.On the contrary, at large magnitudes (to the right of the intersection point between the two lines in fig.3), there is a turnover in the trend of Log(Ni) with respect to Log(νi).Here, Log(Ni) tends to be larger than Log(νi).As mentioned above, this effect is due to the filtering of the negative fluctuations of N i at large magnitudes performed by the logarithmic transformation of a discrete variable.The consequence is an underestimation of the b-value.The results reported in fig. 3 suggest that, when the larger magnitudes are taken into account, this last bias is always predominant.As argued above, this bias is reduced by increasing the size of the data set.Instead, if the larger magnitudes are excluded (e.g., Karnik and Klima, 1993), the plot Log(Ni) versus Mi shows only one bend, and the b-value is overestimated independently on the size of the catalog.
In other words, the global effect of the logarithmic transformation of a discrete variable (the number of seismic events) introduces two bends in the GR curve.It is noteworthy that these dis-crepancies from a straight line are not linked to any physical process, but only to the mathematical transformation.Similar considerations can be made by looking at fig. 4 that reports the estimations of the b-value as a function of the size of the data set.The bias is very strong for small data set and it decreases as the size of the data set increases.The bias is higher in the binned case compared to the cumulative one (compare figs. 4 and 1).This is due to the fact that in the cumulative regression there are fewer zero values in the dependent variable.As shown in fig.2, the uncertainty of the b-value made in the binned regression is still underestimated, even though it is better than the strong underestimations obtained by the cumulative regression.

Effects of the measurement errors
If we add a measurement error at the theoretical magnitude (M), we obtain the «real» magnitude as a sum of two independent random variables (2.8) where M is the earthquake magnitude devoid of measurement errors distributed with a pdf given by eq.(2.1), and ε simulates the measurement errors distributed as a Gaussian noise.Let us consider the number of the earthquakes with a binned magnitude M i, that is N(Mi).By adding a Gaussian noise to each magnitude as in eq.(2.8), some events go out of and some come into the considered bin, as shown in fig. 5.The final number is N .The number of data which go out of the bin is (see fig. 5) where mOL is the number of data which go into the adjacent left bin, mOR is the number of data that go into the adjacent right bin, and Pε is the probability that a randomly chosen event inside the bin goes out of the bin itself when a measurement error is added.Pε is independent from Mi and it depends on σε 2 (Pε→0 when σε 2 →0).
The number of data which come in from the left bin is (see fig. 5) The number of data which come in from the right bin is (see fig. 5) (2.11)Then, we have 1.1) holds, eq.(2.12) can be rewritten as .
(2.13)For b ≈ 1 and ∆M=0.1, as in most of the practical cases, the term of eq.(2.13) inside the round brackets is ≈ 0. In these case N(Mi)≈ N , therefore the calculation of b is only slightly affected by the measurement error.At the opposite, when ∆M is large and/or b>1, the term inside the round brackets cannot be neglected   Since Pε depends on the amplitude of the measurement error (σε), eq.(2.13) explains also the proportionality of the bias with σε noted in fig.6.As regards the estimation of the uncertainty of the b-value, the results reported in fig.7 show that the addition of the measurement errors does not produce any further bias.

Fig. 2 .
Fig. 2. F test values (see eq. (2.2)) for the case of figs. 1 and 4 (see text).The plus signs represent the cumulative regression, and the squares the binned regression.The dotted line represents the critical value to reject the null hypothesis at a significance level of 0.05.Note the logarithmic scale also on the vertical axis.

Fig. 1 .
Fig. 1.Average of b t (dashed line) calculated through cumulative LS technique in 1000 synthetic catalogs, as a function of the catalog size, for the cases b=1 (left) and b=2 (right).At each average is attached the 95% confidence interval.The solid line represents the true b-value.
Average of the square of the uncertainty Variance of the estimation of the value F b −.

Fig. 3 .
Fig.3.Logarithm of the expected number of earthquakes (Log(νi); see text) according to the Gutenberg-Richter Law (solid line) and average of the logarithm of the generated number of events in each of 1000 synthetic catalogs (dotted line), plotted as a function of the magnitude, for different catalog sizes (from top to bottom: 100, 1000, 10 000 events per catalog).The intersection between the solid and dotted lines represents the point where the bending changes in sign.

Fig. 4 .Fig. 5 .
Fig. 4. Average of b t (dashed line) calculated through binned LS technique in 1000 synthetic catalogs, as a func- tion of the catalog size, for the cases b=1 (left) and b=2 (right).At each average is attached the 95% confidence interval.The solid line represents the true b-value.
Figure 6 reports the effect of the added measurement errors.As argued before, the ( ) Mi O largest difference with the case without measurement errors reported in fig. 4 is for b=2, where the term of eq.(2.13) inside the round brackets is significantly different from zero.

Fig. 6 .
Fig. 6.Averages of b t (dashed lines) calculated through binned LS technique in 1000 synthetic catalogs as a function of the catalog size, for the cases b=1 (top) and b=2 (bottom).The magnitudes have measurement errors of standard deviation σε=0.1, 0.3, 0.5 (from left to right).At each average is attached the 95% confidence interval.The solid line represents the true b-value.

Fig. 7 .
Fig. 7. F test values (see eq. (2.2)) for the case of fig.6 (see text).The dotted line represents the critical value to reject the null hypothesis at a significance level of 0.05.