An algorithm for double difference joint hypocenter determination : application to the 2002 Molise ( Central Italy ) earthquake sequence

We have developed an original computer code for double difference hypocenter determination including an independent routine for the cross-correlation estimate of the time difference between two waveform segments, relative to different events recorded by the same stations. This computer code has been tested relocating a set of 26 events recorded by the seismic stations of the telemetered Italian national network operated by the INGV. The hypocentral solutions so obtained are characterized by standard deviations typically of the order of magnitude of 50 m, in comparison with the errors of a few kilometers characterizing the performance of the INGV bulletins. The proximity of hypocenters in several groups of events closely separated in time shows that our relocations are the result of an accurate analysis, rather than that of random errors. The method developed in this study is suitable for rapid and accurate hypocentral determination carried out by a permanent sparsely distributed network of stations, even before that mobile equipment installed in the area affected by new seismic activity allows higher resolution locations. Mailing address: Dr. Alessandra Giuntini, Istituto Nazionale di Geofisica e Vulcanologia, Via di Vigna Murata 605, 00143 Roma, Italy; e-mail: giuntini@ingv.it


Introduction
Hypocenter location is one of the main tools of seismological observation: its reliable estimate allows the investigation of seismogenic processes.It is well known that the precision of locations is controlled by several different factors, among which the accuracy of the arrival time reading of seismic waves, the geometric distribution of the seismograph network and, most important, by the knowledge of the propagation medium.
The purpose of this study is to develop and test a location algorithm capable of minimizing the influence of the velocity model used in the computation, and the influence of human errors in picking the arrival times of seismic waves.A velocity model is not available in detail because of the complicated crustal structure in some regions of the world, as in Italy.Luckily, the errors connected with the velocity model are systematic for the hypocenters that are closed spaced respect to each other, and the technique adopted in this study, the «Joint Hypocenter Determination» is based on this feature (Douglas, 1967).It improves the accuracy of the standard locations in areas of clustered seismicity, i.e.where the distance between the hypocenters is much shorter than that between the hypocenter centroid and the recording stations.In this case the propagation path between the sources and the stations can be assumed as if it were the same for all the events.The locations obtained by this method are characterized by a high relative accuracy, although the absolute location of the whole cluster remains uncertain, unless independent information allows the reliable estimate of the hypocenter coordinates for at least one event.In this case, we can perform the master event location.
In order to minimize the errors linked with the incorrect time reading of the seismograms, we adopt the cross-correlation technique between waveforms.This technique makes use only of time differences between arrival times for pairs of events at the same stations, rather than absolute time readings at single stations (Console and Di Giovambattista, 1987;Fehler et al., 2000;Waldhauser and Ellsworth, 2000;Rowe et al., 2002;Wolfe, 2002).The cross-correlation of digital waveforms allows a precision of the order of one sampling interval or even better.This method is called here as the «Double Difference Joint Hypocenter Determination» (DDJHD).
Because of the methodological nature of this work, in order to apply the DDJHD method, we have developed the algorithm independently of previous works, and written accordingly an entirely original computer code in Fortran language.

The location algorithm
Consider an event E i recorded by station Sj.The first arrival observed time can be written as where Hi, xi, yi, zi are the origin time and the rectangular hypocenter coordinates of the event, xj, yj are the coordinate of station Sj and f gives the travel time of the P-wave.Given a suitable guess of the hypocenter coordinates and assuming that the variations of the parameters are small enough, the non linear eq.(2.1) can be substituted by its first order linear expansion (2.2) In this equation we find four unknowns: the variations of the origin time and hypocenter coordinates of the event.εj is the station correction and in this case it is known or set to zero.If a suitable number of observations is available, it is possible to solve the eq.( 2.2) for a given event by the least squares method.The final solution is obtained by iteration of the process substituting the initial values by those obtained in the first trial and so on, until the new corrections for the parameters are small enough to be neglected.This method is applicable to any single event Ei, and for this reason it is known as the method of single event location.
In order to locate a spatially closed group of events, it is possible to make use of the JHD («Joint Hypocenter Determination») technique, by which every event is located respect to all the other events in the same group.The JHD procedure includes station corrections, which are station delays common to all the earthquakes and with this method we find the event locations and the values of all station corrections.This method offers the advantage of improving the accuracy of the relative locations, but the absolute locations remain uncertain.
In this work, following the Double-Difference method of Waldhauser and Ellsworth (2000), we have worked out a new procedure applying the JHD to the time differences between the seismograms of two events.
Consider an equation analog to (2.2) for a second earthquake E l, recorded by the same station Sj.By subtracting from the equation so obtained for El that for the event Ei, we obtain (2.3) Equation (2.3) is a linear expression of 8 unknowns.In this expression the station corrections are removed because we are considering two different events recorded at the same station.The left-hand side term contains an observed quantity: the difference of the arrival .
times.The right side term contains the guessed values of the origin and travel times and eight unknown quantities: the variations of the origin times and space coordinates for the two events.Similar equations can be written for each couple of events.Moving the guessed values of the origin and travel times from the right to the left hand side of the equations produces a double difference between the observed and guessed times.For this reason this method is called the method of the Double Difference Joint Hypocenter Determination (DDJHD).The set of equations can be solved by the least square method.It is not necessary that all the stations record all the events, but it is necessary that every pair of events is recorded by some common stations.Let be Me the number of events to be located and N the total number of observations.The eq. ( 2.3) can be written in the matrix form (2.4) where G is the coefficient matrix (derivatives of the travel times respect to the hypocentral coordinates), is the unknown vector, of length 4 * Me, and is the observation vector of length.
The least squares solution of the whole set of equations is given by (2.5) where G T denotes the transpose of G.
The solution of a set of equations as (2.4) may present some problems when the matrix G is singular or close to singularity.One example is given by the case when one or more events are recorded just by refracted waves, all from the same refractor in the plane layered medium.In this case all the derivatives of the travel times respect to the hypocentral depth have nearly the same value.Some problems of singularity may also occur when the observations are all constituted by direct waves, in the case that the station-epicenter distances are not markedly different.The problem is connected with the small linear dimensions of the source area with respect to the distance to the recording stations, Matrix representation of the linear equations, each of which is written for one single event(i)-event(l) couple at station(j), used in the joint hypocentral location (f, travel times; , time residual for each observation; j=1, S, station index; i=1, N, event index; S, total number of stations; N number of events).The last four rows express the constraint; in this case the fist one, at which is possible to give a weight, p, are the guessed coordinates of the master event., , ,  H x y z x y so that the take off angles of the rays from the hypocenters to the stations are very close to each other.To avoid this problem it is necessary to use a wide distribution of recording stations, so that the first arrivals are represented by both direct and refracted waves.Equation (2.4) is still intrinsically singular because the variation of one coordinate of all the events by the same amount does not affect the value of the right hand term (Console et al., 1992).We eliminate this problem by constraining the absolute location of the whole cluster of events, with the introduction of a piece of a priori information to the set of equations.Such information is given by the known location of a selected event (master event).A way by which the constraint is implemented can be that of introducing four new equations in (2.4) expressing the equality of the four unknowns to their guessed values (the coordinates of the master event).It is possible to give a weight to these four equations.This weight may be large enough so as the respective solution is affected very little by the other observations (fig.1).Another way to solve eq.(2.5) in case of singularity is using the minimum norm solution (simply eliminating zero singular values from the solution).Such solution would correspond to taking fixed the centre of mass of the hypocenter cluster.In the set of equations shown in fig. 1, we would have four rows, each of which representing a linear combination of the respective coordinate of all events.
The algorithm has been exploited in a Fortran computer code.Several tests have been carried out by synthetic data.These data have been produced assuming a given distribution of hypocenters and stations, a known velocity model, and introducing random variations simulating errors with a given standard deviation.These tests have shown the reliability of the algorithm and the stability of the results.A few eventual outliers, introduced for testing purpose, could be easily identified and eliminated from the data set.

The determination of time differences
As stated in the introduction, an important step to improve the location accuracy can be made by improving the precision by which the difference ∆T l,i, between the arrival times for a pair of events l,i at the same station Sj, is estimated.To this purpose, we use the cross-correlation of the respective waveforms.It is assumed that the value on the right hand side of eq. ( 2.3), is the time shift of one record with respect the other that yields the maximum of the correlation function in the time dominion.For the computation, the correlation function between to digital wave form segments is defined as When the two seismograms are identical R assumes the maximum value of 1, while the more different are the two wave forms, the closer it will be to the minimum value of zero.
The cross-correlation method relies on the similarity between seismograms of events that are spatially close (a basic rule for the applicability of the DDJHD); in fact we may expect that two earthquakes produce similar wave forms at the same station if their source mechanisms are virtually identical, and if their sources are located in such a way that their source-receiver paths cross the same crustal etherogeneities (Waldhauser and Ellsworth, 2000).Therefore, the correlation coefficient between the two seismograms represents itself a measure of the distance between the two events.

Description of the computer code
The outline of the computer code developed in this work is shown by the flow chart in fig. 2.
The code is designed so as during its execution the following options can be chosen: Velocity model -The choice of the velocity model used for the computation of the travel times can be done between a standard model and another model entered by the user.
Introduction of constraints -The coordinates of the event that constitutes the constraints can be entered either by an input file or by the keyboard.
Weight of the constraints -A given weight can be introduced to the constraints.In this case the user is asked for its magnitude.
Automatic execution -It is to be chosen whether the execution proceeds automatically or it stops at some particular points for the control of the operator.
As the set of equations is solved in rectangular coordinates, two subroutines are used for the conversion from geographical to rectangular coordinates (TRACOO), and back from rectangular to geographical coordinates (TRAINV).In the present application, for the conversion we have chosen as origin of the rectangular coordinate system a point internal to the area of the seismic sequence: 41.7°N and 14.9°E.
The main computer code makes calls to two subroutines.The first (DROMOC) calculates the theoretical time-differences (∆T i,j) Th making use itself of two subroutines, respectively DIRET for the direct waves and RIFRA for the refracted waves.The second subroutine (MQUADR) performs the solution of the overestimated set of equations by the least squares method.
The capital letters in the flow chart denote a single code block whose performance is explained in the following.
(A) Dealing with an iterative process, it is necessary to define the conditions by which the search for the minimum misfit is stopped at a given iteration.These conditions are: -the sum of the square of the residuals (difference between the theoretical (∆Ti,j) Th and the observed (∆Ti,j) is smaller than N ⋅ 0.0001 (where N is the total number of observations); -the sum of the square of the residuals makes repeated oscillations around a certain value; -the absolute value of the correction for all the coordinates is smaller than one tenth of the associated standard deviation; -the number of iterations exceeds 20.(B) In the case that some observations exhibit a residual larger than a given threshold, the location process can be repeated, either in automatic or interactive mode, removing such observations from the data set.
(C) At the end of the whole procedure, the final locations are obtained with all their statistical errors.

Seismological outline
We decided to test the method by its application to a set of data collected by the Italian A data set that appeared suitable for testing was represented by the numerous earthquakes (foreshocks, two mainshocks and aftershocks) occurred in the Molise region on October 31st, 2002 and the following days.It seemed interesting to show how an appropriate method of analysis could allow the retrieval of valuable information from the data collected by the permanent national seismological network, at the early phase of the sequence, before the mobile equipment could start the operation in the epicentral area.

Historical seismicity of the area
The site most affected by the Molise October-November 2002 earthquakes, the town of San Giuliano di Puglia (fig.3) belongs to an area of the Southern-Central Apennines characterized by nearly complete absence of recent seismicity, but surrounded by important seismogenic structures.They are recognized in the Gargano (60-100 km to the east), San Severo (30-40 km to the east), Foggia (50-80 km to the south-east), Benevento-Irpinia (40-80 km to the south), and Bojano Basin (40-50 km to the west) areas.These structures have produced historical earthquakes of large magnitude, whose effects have been felt in the area hit by the sequence considered in this study.The historical seismicity of the region is reported in fig. 3.
The information on historical seismicity comes from the Catalogo dei Forti Terremoti Italiani (CFTI3) (Boschi et al., 2000).For each event we consider the «equivalent magnitude» M e reported in the catalog.This magnitude is obtained from the distribution of the macroseismic intensity (Gasperini and Ferrari, 2000), and is considered roughly equivalent to the moment magnitude M W.
On 5 December 1456, the earthquake that is considered the strongest event (M e 7.1) of the last 1000 years in the region struck three different epicentral areas of the Southern Apennines at the same time, including that of the Bojano Basin.Another event (Me 6.6), whose epicenter was also located in the Bojano Basin, occurred in 1805.
Other important earthquakes reported in the catalogs are that of Benevento-Irpinia area (1125, M e 7), the San Severo earthquake of 30 July 1627 (Me 6.8), and the Gargano earthquakes of January 1646 (Me 6.1), and that of 1875 (Me 6.2).Lastly, another important earthquake that more recently affected this area was the November 23, 1980, Irpinia earthquake (M e 6.9).

The seismographic data set
On October 31, 2002, at 11:32 (UTC) a strong earthquake (Ml 5.4) hit a large zone of the border area between the Molise and Puglia regions.It had been preceded during the previous night by a small series of foreshocks of magnitude ranging between Ml 2.6 and 3.5, the first of which recorded at 01:25 (UTC).
After the mainshock, the National Seismological Network recorded during the same day of 31 October 2002 and on the morning of 1 November about 60 events of magnitude larger than Ml 2.2.At 16:08 (UTC) of 1 November another mainshock of Ml 5.3 occurred in the area.It was located about 12 km to the west of the previous mainshock.This earthquake was itself followed by many aftershocks, among which two events of magnitude larger than M l 4.0 during the same day.The earthquake series then decreased fairly steadily in time.
In this study we decided to analyze 26 events with magnitude MlF 2.9 between 31 October and 2 November 2002.Retrieving the waveforms collected in the INGV data base from the RSNC, we limited the selection to 21 stations located within 200 km from the epicentral area (fig.4); in this way, the first arrivals were represented by many Pg and some Pn phases.
For each event the following entries have been considered from the automatic INGV bul-    letin: origin time, preliminary location, magnitude and first onset arrival times for the various stations.The set of the 26 events analyzed is reported in table I and their epicenters are mapped in fig. 5.Each event has been identified by a label containing year, month, day, hour and minutes of its origin time with the following format: yymmddhhmm.For instance, 0210311033 mean the event occurred on October 31, 2002 at 10:33 UTC.

Application and results
As previously specified, in order to improve the location accuracy, this study used the crosscorrelation technique between the wave forms to improve the precision by which the differences ∆T l,i relative to pairs of events are determined.
The number of events analysed is 26 (table I) and the number of stations is 21.This gives a total number of waveform pairs equal to 26 * (26−1) * 210=11.550,but not all the events have been recorded by all the stations.Moreover, only the observations that exhibit a correlation coefficient larger than 0.7 have been considered.This implies that the events considered in the location process are close to each other (the ray paths are almost identical) because in this case their waveforms are similar, yielding large values of the largest correlation coefficient.The selection criteria are met by 3781 observations.Once the ∆T l,i are determined, the data are ready for our main problem: the location.Our preliminary tests showed that a number of observed ∆Tl,i are affected by large errors that denote them as outliers.In this respect, by deleting the observations characterized by a difference larger than 1.5 s from the average of the same pair of events, we obtain a faster and more stable convergence towards the least square solution.This further selection criterion produced the deletion of 181 observed ∆Tl,i.
Considering that the depth and epicentral coordinates of the master event influence the location of all the others, particular attention has been paid to its choice.In our case the master event was chosen among those located by the temporary network deployed by the INGV almost immediately after the beginning of the aftershock series.The density of the INGV temporary network allows the computation of hypocentral coordinates (depth included) with errors of the order of few hundreds of meters, for events inside the area covered by the network itself.
In light of these considerations the master event was chosen as the earthquake of October 31, 2002 at 23:56 (UTC), whose hypocentral coordinates are reported in table II.
The optional inputs used in the execution of the location program are: -a three layer velocity model (table III) for the computation of the travel times; -automatic removing of the outliers; -observations with residuals larger than twice the standard deviation are removed at every iteration (ISIGMA =2); -threshold residual equal to 0.15 s (the run is stopped if all the residuals are smaller than 0.15 s).
Analyzing the output file of the program we noted that observations exhibiting large residu-als (outliers) were removed in the first cycles.We noted also that since these first cycles the solutions are stable until the final conditions are met.The final solutions were obtained with lit-   The results are reported in table IV and the epicentral map is plotted in figs.6 and 7. Figure 8 shows the east-west vertical cross-section of the hypocenters.Of course, the standard deviations of the hypocentral coordinates for the master event are negligible, as they depend only on the weight given to the constraints.

Discussion and conclusions
Some features of the results allow us to evaluate the quality of the method.First, the geometrical spread noticeable in the automatic locations has been considerably reduced in our relocations.In particular, the large separation to the north-east of the epicenters of the events 3 and 6 (supposedly mislocated) disappeared.Figure 8 shows a direct comparison between the epicenters initially determined by the automatic procedures and their DDJHD relocations.It is evident that the relocation process has significantly reduced the geometrical spreading of the epicenters.
Looking at the epicentral maps of the relocated earthquakes (fig.6) it can be noted that some groups of events occur close to each other both in time and space.This is the case, for instance, of the triplets 1, 2 and 3 (foreshocks of the first mainshock) and 19, 20 and 21, and the doublet 10 and 11.They clearly denote a trend of the seismicity for clustering in families characterized by tight physical connection.
Another feature easily recognizable in the relocation is the separation of the epicenters in two groups (respectively to the west and to the east of longitude 14.88) having about the same number of earthquakes, and a general trend for the migration of the events from east to west.In fact, all the first 13 events but one belong to the eastern group (foreshocks and aftershocks of the first mainshock), and all the second 13 but one belong to the western group.The second mainshock (event 14) belongs to the western group.Event 12 can be considered a foreshock of the second mainshock.The average depth of the western group appears slightly smaller than that of the eastern one.
All these regularities denote that the timespace pattern of the relocations is not an effect of random errors.The only event whose location appears anomalous respect to the others is the second mainshock of 1 November (#14).The reason could be found in the difficulty of computing correct ∆T i,j time differences, due to the complexity of the waveforms recorded at the various stations for this earthquake.
It must also be noted that the relocations, obtained in this study using an algorithm recently developed, could in principle have been carried out just during the occurrence of the seismic series nearly in real time, should the computer tool have been ready at that time.This means that the data from the permanent national seismic network, processed by a suitable algorithm, are capable of achieving relevant geophysical information on the seismogenic structures activated at the early stage of a seismic sequence.This could provide seismologists with a clearer picture of the seismogenic phenomenon from the events (that are the most numerous and important) which occurred before the exploitation of mobile stations in the area, even during the first hours of activity.
Fig. 1.Matrix representation of the linear equations, each of which is written for one single event(i)-event(l) couple at station(j), used in the joint hypocentral location (f, travel times;, time residual for each observation; j=1, S, station index; i=1, N, event index; S, total number of stations; N number of events).The last four rows express the constraint; in this case the fist one, at which is possible to give a weight, p, are the guessed coordinates of the master event.x y z t master master master master , , ,

Fig. 2 .
Fig. 2. Flow chart of the computer code.

Fig. 3 .
Fig. 3. Map of the historical earthquakes that occurred in the study earea.

Fig. 4 .
Fig. 4. Map of the stations of the National Seismological Network used in this study.

Fig. 5 .
Fig. 5. Epicentral map of the events reported by the automatic INGV bulletin.The two mainshocks (#4 and #14) are denoted by stars.

Fig. 6 .
Fig. 6.Epicentral map of the 26 events relocated in this study.The origin of the rectangular coordinates is the point of geographical coordinates 14.90°N 41.67°E.The two mainshocks (#4 and #14) are denoted by stars.The event used as reference for the location (#10) is denoted by a cross.

Fig. 8 .
Fig. 8. East-west vertical cross-section of the hypocenters relocated in this study; Z is positive up.

Fig. 7 .
Fig. 7. Comparison between the locations of the events provided by the automatic seismic bulletin (crosses) and the locations obtained in this study (stars).

Table I .
Focal parameters of the 26 events located by the automatic INGV bulletin.The parameters of the two mainshocks are identified by bold phase.

Table II .
Hypocentral coordinates of the earthquake used as master event, obtained by the mobile seismic network.

Table III .
Velocity model used by the location algorithm.

Table IV .
Focal parameters of the 26 events located in this study.The parameters of the two mainshocks are identified by bold phase.