Reducing the subjectivity of intensity estimates: the Fuzzy Set approach

We describe a method for the encoding and computer analysis of the macroseismic effects deduced from historical sources, allowing the complete formalization of the process of seismic intensity assessment. It makes use of a multi-criteria decision-support algorithm based on the theory of Fuzzy Sets. By analyzing the texts of the available sources for the 1919 Mugello (Ms = 6.2) and 1920 Garfagnana (Ms = 6.5) earthquakes, the observed effects are classified independently of any macroseismic scale. Each sentence reported in the sources is «decomposed» into five elementary syntactic components and represented by a set of alphanumeric codes for further processing by computer programs. This retains maximum adherence to the original sources and avoids forced interpretations and losses of information due to the need to fit each observed effect to a description of the scale. Moreover, this scheme also allows equivalent effects to be gathered by reassigning them the same code and using this new classification in further processing. This procedure could even be useful to define a new macroseismic scale on the basis of a statistical analysis of the occurrences of different effects.


Introduction
Intensity scales were originally compiled to classify the effects of earthquakes by direct observation, and were never really designed for efficient use with written sources. What is more, they were formulated and improved without a statistical analysis of real data, but only on the basis of a qualitative comparison of some frequent effects drawn from expert experience. For these reasons, much of the information available in documentary sources cannot be used to assess the intensity degree and is actually ignored. Moreover, the intensity assessment includes subjective choices based not only on the definitions of the scale but also on implicitly assumed criteria (not explicitly defined by the scale) which depend on the personal experiences and beliefs of the investigators. Thus, the same framework of effects may be evaluated differently in terms of intensity by different investigators.

Mailing address: Dr. Gianfranco Vannucci, Centro di Studi per la Geologia dell'Appennino e delle Catene Perimediterranee (CNR), Via La Pira 4, 50120 Firenze, Italy; e-mail: gfranco@ibogfs.df.unibo.it
In a recent work, of which this contribution represents an abridged version, Vannucci et al. (1999) propose formalizing the intensity assessment procedure by a computer algorithm able to trace the successive steps of the intensity assignment process. The aims were both to use all the information available in the sources and to reduce the subjectivity of intensity estimates, thus giving macroseismic experts a tool to improve the comprehension of their own decisional processes.
Since experience with historical sources has demonstrated that the association of the earth-

Table II. Encoding phase: a sample of the correspondence lists between phrase components and the two-character codes. Note that code «01» indicates the absence of the corresponding phrase component, except for the «Predicate» column, which must always be present (otherwise the sentence does not make any sense).

The object «houses» (code 62) can be made equivalent to «buildings» (code 63), or the predicate «to break» (code 41) can be made equivalent to «to crack» (code 42). The re-encoding can also be done using combinations of codes belonging to different columns, for example: «railway tracks» (object/subject) «bent» (predicate) can be made equivalent to «railway line» (object/subject) «closed» (predicate). After these «re-encoding rules» are compiled, a computer program automatically makes the changes and builds a new database of observed effects. The main advantage of this procedure is that all researchers can apply their own «rules» and change them at will without modifying the original database.
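The re-encoding step described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the two-character codes («62» houses, «63» buildings, «41» to break, «42» to crack) follow the examples in the text, but the data structures and function name are hypothetical.

```python
# Each observed effect is represented here, for simplicity, as an
# (object/subject code, predicate code) pair; the real encoding uses
# five syntactic components per sentence.
observed_effects = [
    ("62", "41"),  # «houses» «to break»
    ("63", "42"),  # «buildings» «to crack»
    ("62", "42"),  # «houses» «to crack»
]

# Re-encoding rules map a code onto an equivalent one; the original
# database is never modified, only a new one is built.
reencoding_rules = {"63": "62", "42": "41"}  # buildings ≡ houses, to crack ≡ to break


def reencode(effects, rules):
    """Build a new database of effects with equivalent codes merged."""
    return [(rules.get(obj, obj), rules.get(pred, pred)) for obj, pred in effects]


recoded = reencode(observed_effects, reencoding_rules)
# All three observations now collapse onto the same effect class ("62", "41").
```

Because `reencode` returns a new list, each researcher can apply a personal rule set and discard it at will, exactly as the text describes.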
In the selection step, all the effects that are rarely observed are discarded and not processed further. In the following computation, only the effects with at least five occurrences at different sites will be considered, but this threshold could even be increased to improve the reliability of the results. This selection guarantees that the computed empirical membership functions are less biased by possible anomalous cases and also reduces the danger of «overfit» (see discussion below).
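The selection threshold can be sketched as a simple filter on the number of distinct sites per effect. The effect codes and site names below are made up for illustration; only the five-site threshold comes from the text.

```python
from collections import defaultdict

# Hypothetical (effect_code, site) observations: effect "41" is reported
# at five different sites, effect "55" at only two.
records = [
    ("41", "Firenze"), ("41", "Lucca"), ("41", "Pisa"),
    ("41", "Barga"), ("41", "Castelnuovo"),
    ("41", "Firenze"),  # repeated site: counted once
    ("55", "Lucca"), ("55", "Pisa"),
]


def select_frequent(records, min_sites=5):
    """Keep only the effects observed at >= min_sites distinct sites."""
    sites_per_effect = defaultdict(set)
    for effect, site in records:
        sites_per_effect[effect].add(site)
    return {e for e, sites in sites_per_effect.items() if len(sites) >= min_sites}
```

Raising `min_sites` tightens the selection, which is the knob the text suggests for improving the reliability of the empirical membership functions.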
In the intensity evaluation step, the MCDM algorithm is applied and the «fuzzy» intensity is computed at each site. This step can be repeated independently using different membership schemes. The algorithm also admits the simultaneous use of multiple membership and weighting schemes, thus allowing the combination of different criteria based on the beliefs of different macroseismic experts. This could be useful in particularly debated cases to establish a «consensus» intensity estimate taking into account all of the different opinions.
A graphical sketch of how the intensity assessment procedure works is shown in fig. 1, where the shapes of the empirical membership functions of the effects of the Garfagnana earthquake observed in Florence are reproduced. In fig. 2 the «aggregate» decision function is obtained by taking, for every intensity, the minimum membership value among all of the functions. The intensity degree chosen by the algorithm is the one corresponding to the maximum of the decision function.
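The min-aggregation and maximum selection just described can be sketched in a few lines. The membership values and effect labels below are invented for illustration; the real functions are estimated empirically from the data.

```python
# Made-up membership values over intensity degrees VI-VIII for two effects
# observed at one site.
degrees = [6, 7, 8]
memberships = {
    "effect A": {6: 0.2, 7: 0.9, 8: 0.6},
    "effect B": {6: 0.5, 7: 0.8, 8: 0.3},
}


def fuzzy_intensity(memberships, degrees):
    """Min-aggregate the membership functions, then take the maximum."""
    # Aggregate decision function: for every degree, the minimum
    # membership value among all of the functions (as in fig. 2).
    decision = {d: min(m[d] for m in memberships.values()) for d in degrees}
    best = max(decision.values())
    # More than one degree may attain the maximum: the estimate is then
    # uncertain between two or more grades.
    return [d for d in degrees if decision[d] == best]
```

With the values above the decision function is {6: 0.2, 7: 0.8, 8: 0.3}, so the algorithm selects degree VII.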

Results and discussion
To check the efficiency and reliability of the methodology in reproducing the macroseismic expert's decisions, a number of statistical estimators can be computed. These are the coefficient of variation R² of the regression of the intensity estimated by the expert against the fuzzy intensity, and the average absolute difference r_abs between the expert (I_E) and fuzzy (I_F) intensities

r_abs = Σ_i |I_E − I_F| / N_total

where N_total is the total number of evaluated localities. The average difference r between expert and fuzzy intensities, which indicates the «offset» between the expert and fuzzy intensity estimates, is also computed:

r = Σ_i (I_E − I_F) / N_total
Furthermore, in order to evaluate the overall ability of the algorithm to determine an intensity value, table III also reports the number of univocal intensity determinations (N_single) and the number of intensity determinations (N_multiple) which are uncertain between two or more grades.
The first two rows of table III refer to the case when the intensity is computed using the empirical membership functions and weights estimated from the data of the same event. For both sets, the small values of r_abs and the high R² indicate that the algorithm satisfactorily reproduces the expert intensities (within half a degree on average). The values of r, which are positive for the Garfagnana event and slightly negative for the Mugello one, correspond to an underestimation of the «fuzzy» intensity with respect to the expert for the former earthquake and an overestimation for the latter. These differences, which both lie largely below the average residuals, might be caused by the different frequencies of the various intensities for the two earthquakes.
The third and fourth rows concern, instead, the case when the fuzzy membership functions and weights are computed using the data of both earthquakes put together. We can see that the scores do not vary very much with respect to the previous ones, but a slight improvement of the fit for the Garfagnana earthquake and a worsening for the Mugello event can be noted. The average difference r confirms the tendency of the fuzzy algorithm to overestimate the Mugello intensities, while it shows an almost perfect coincidence (on average) of the two estimates for the Garfagnana data.
Since in the previous computations the data of each earthquake are used to determine the membership functions and weights applied to the same event, it is possible that the good agreement might be due to overfit. This would mean that the algorithm fitted not only the average tendencies of the data but also their statistical fluctuations. To test this hypothesis, the fifth and sixth rows show the results when, for each earthquake, the membership functions and weights are derived from the data of the other event. The marked decrease of R² and the increase of r_abs (especially for the Mugello earthquake) clearly show that some overfit is certainly present in the previous computations. However, the fit remains quite acceptable for both earthquakes notwithstanding the complete independence of the learning and testing sets. The opposite signs of r for the two events confirm the tendency indicated by the previous cases, with a remarkable increase in the offsets (still well below the average absolute deviations). A possible explanation of this behavior could be that, even in the presence of similar effects, the expert was more confident in assigning higher degrees in the framework of a strong earthquake like the Garfagnana event than for the weaker Mugello one.

Conclusions
The method of analysis of macroseismic effects described here shows in detail the process of intensity assessment that, in many cases, is followed by the macroseismic expert without leaving any trace of the assumptions made, and that therefore sometimes could not be reproduced even by the same expert. In particular, the reliability of the different sources and the weights of the different effects, established by historical seismologists, can be taken into account explicitly.
This approach can be useful to reduce the arbitrariness of the intensity assignment process, and could actually eliminate all possible sources of mistakes insofar as the encoded data correctly interpret the text. It may even be a useful support tool for the macroseismic expert himself who, from the comparison with the algorithm results, can improve the understanding of his own choices and decisional processes (sometimes not fully rationalized).
The ability of the multi-criteria decision-making algorithm to combine different membership and weighting schemes, determined from the intensities assigned by different experts, could be useful to obtain «objective» estimates in debated cases.
However, the large amount of work needed to encode the source information in a way suitable for analysis by the computer algorithm still prevents a broad application of this method, for example to all the earthquakes of the Catalog of Strong Italian Earthquakes (CFTI3). In the future, the availability of the texts of the sources on computer media (as in CFTI3), together with the development of computer-aided techniques for the automatic encoding of texts, could significantly speed up the procedure and lead to a substantial improvement of the database. Even the application of the algorithm to the (already encoded) data of the Italian Macroseismic Bulletin of the Istituto Nazionale di Geofisica (ING) could be very interesting, in order to integrate these data in a unified macroseismic database.
This methodology, being independent of a particular macroseismic scale, could be used, with a large enough database, to define the characteristics of a new macroseismic scale more appropriate to historical testimonies.

Fig. 1. Shapes of the empirical membership functions of the effects observed in the town of Florence for the 1920 Garfagnana earthquake.

Fig. 2. Aggregate decision function obtained by taking, for every intensity, the minimum membership value among all of the membership functions.

Table III. Evaluation of the ability of the algorithm to reproduce the expert's intensities for different empirical membership functions and weights.