Advances on the automatic estimation of the P-wave onset time

This work describes the automatic picking of the P-phase arrivals of the 3*106 seismic registers originated during the TOMO-ETNA experiment. Air-gun shots produced by the vessel “Sarmiento de Gamboa” and contemporary passive seismicity occurring in the island are recorded by a dense network of stations deployed for the experiment. In such scenario, automatic processing is needed given: (i) the enormous amount of data, (ii) the low signal-to-noise ratio of many of the available registers and, (iii) the accuracy needed for the velocity tomography resulting from the experiment. A preliminary processing is performed with the records obtained from all stations. Raw data formats from the different types of stations are unified, eliminating defective records and reducing noise through filtering in the band of interest for the phase picking. The advanced multiband picking algorithm (AMPA) is then used to process the big database obtained and determine the travel times of the seismic phases. The approach of AMPA, based on frequency multiband denoising and enhancement of expected arrivals through optimum detectors, is detailed together with its calibration and quality assessment procedure. Examples of its usage for active and passive seismic events are presented.


Introduction
The picking of the seismic wave arrival can be defined as the detection of the instant of time when the first energy of the phase (the longitudinal P-phase) arrives at a seismometer.Such arrival is often identified by a change from the background noise, in energy, amplitude, frequency contents or wave polarization.For seismic signals containing also transversal waves (Sphases), picking the S-wave onset is more challenging because of its contamination by the P coda and other converted phases.Signals will have diverse, often low, signal-to-noise ratios (SNR) due to the seismic noise generated in their path towards the station and the low density of the materials they have to travel across.In fact, often the most relevant information about Earth structure will precisely be supplied by those waves traversing attenuative internal structures with a consequent low SNR.In volcanic environments seismic signals are often contaminated by both effects, noise and other type of volcanic events such as volcanic tremor, providing signals with high complexity [e.g.Chouet 1996Chouet , 2003;;Alguacil et al. 1999;Ibáñez et al. 2000Ibáñez et al. , 2003Ibáñez et al. , 2008;;Martínez-Arévalo et al. 2003;Almendros et al. 2007;Chouet and Matoza 2013].Taking into account seismic attenuation, the first arrival of the signal changes to a more emergent onset [e.g.Bianco et al. 1999;Martínez-Arévalo et al. 2005;Prudencio et al. 2015aPrudencio et al. , 2015bPrudencio et al. , 2015c]].
Automatic real-time processing of the seismic phases and noise reduction permits outperforming human analysis in terms of precision and consistency of the travel time estimations.Many approaches for automatic picking have been proposed in time and frequency domains like: the pioneer classical picker of Allen based on STA/LTA [Allen 1978] improved by [Baer and Kradolfer 1987], approaches based on higher order statistics like kurtosis or skewness [Saragiotis et al. 2002;Küperkoch et al. 2010;Langet et al. 2014], picking based on the waveforms predominant period [Hildyard et al. 2008], or wavelet analysis [Zhang et al. 2003] and multiband frequency analysis [Álvarez et al. 2013] as examples in the frequency domain.Autoregressive techniques based on the Akaike information criterion (AIC) like Takanami and Kitagawa [1988], pattern recognition systems like artificial neural networks (ANN) [Gentili and Michelini 2006] and polarization analysis (specially for the S-wave picking) have also been successfully implemented [Reading et al. 2001;Kurzon et al. 2014;Ross et al. 2014].Latest tendencies in automatic picking opt for combinations of several of the refereed techniques, selected according to specific needs and data available.Iterative, tandem or parallel approaches like, e.g., Nippress et al. [2010], Küperkoch et al. [2012], Álvarez et al. [2013] and Ross and Ben-Zion [2014], are often jointly applied to combine complementary strengthens.
Accurate picking of P and S phases constitutes the most important step for earthquake location, tomographic studies, and any further understanding of the Earth's crustal and upper mantle structure.In that scenario, the aim of the TOMO-ETNA experiment [Coltelli et al. 2016, in this volume;Ibáñez et al. 2016aIbáñez et al. , 2016b, in this volume] , in this volume] is to enlighten the internal structure of Mt.Etna volcano performing a seismic velocity tomography of the Island of Sicily [Díaz-Moreno et al. 2016, in this volume].In order to do so, a large seismic database combining active seismicity (air-gun shots) generated by the Spanish oceanographic vessel "Sarmiento de Gamboa" and the passive seismicity (earthquakes) occurring at the time of the experiment in the island [ Barberi et al. 2016, in this volume] has been registered by a dense seismic network of more than 200 stations of different types.Travel times of such active and passive phase arrivals, used as input for the tomography [Díaz-Moreno et al. 2016, in this volume], must be detected with the highest possible precision.An automated highly accurate picking of the phase arrivals is therefore mandatory: besides the demand of capacity to process the massive dataset produced, signals presenting low SNR after traversing attenuative regions are of special interest.
In the present manuscript we describe the signal processing involved in the automatic picking of 3•10 6 seismic arrivals registered during the TOMO-ETNA experiment.First, a description of the active and passive sources and seismic stations deployed for the experiment will be done in Section 2. Further we will detail the pre-processing of the waveforms registered before applying them the automatic picking algorithm AMPA [Álvarez et al. 2013] described in Section 4. Finally we will present our current working lines to improve the accuracy of the automatic picking performed and to increment the number of registers usable for the tomography.

Seismic stations and sources
Travel times of active and passive phases occurred during the experiment have to be automatically picked and combined as input for the seismic velocity tomography of the island resulting from the TOMO-ETNA experiment [Díaz-Moreno et al. 2016, in this volume].Benefits of both types of sources are combined.Active seismicity provides a large amount of data in a short period of time with a homogeneous and optimal distribution of sources and seismic stations.Meanwhile passive seismicity provides deeper structure information.
Active and passive seismicity occurred during the experiment has been registered by the dense seismic network plotted in Figure 1 (see Ibáñez et al. [2016aIbáñez et al. [ , 2016b] ] for details).A total of 267 seismic stations were used in this experiment.The distribution of this network is: 90 short period portable stations (CUBE; Figure 1, yellow dots), 17 broadband stations (BB; Figure 1, orange dots), 70 permanent stations provided by the Istituto Nazionale di Geofisica e Vulcanologia (INGV; Figure 1, red dots), and finally a total to 27 ocean bottom seismometers (OBS; Figure 1, green dots) were deployed.
The oceanographic vessel "Sarmiento de Gamboa" generated more than 26,200 air-gun shots along approximately 2560 km long (Figure 1, red lines).The source of acoustic pulses consisted of an array of two batteries of air-guns, each having 8 guns (Sercel, G-GUN II), see Coltelli et al. [2016, in this volume] and Ibáñez et al. [2016b, in this volume] for details.In Figure 2 we shows a simulation of frequency contents of the signature of the source signal produced by the air-guns, for calibration purposes, using the GUNDALF array modeling program [Gundalf 2012] particularized for the experiment configuration taking into account all air-gun interactions.Figure 3a shows a series of real air-gun shots registered by an OBS. Figure 3b depicts a shot recorded in time and frequency domains, being remarkable its similarity to the calibration simulation previously shown.
The quality of the records varies according to several facts such as the distance from the vessel to the seismic station, the nature of the structures, and the noise level.Figure 4 gives an example of different qualities of signals in order to indicate the degrees of difficulty to perform the picking of their onset.On its central and right panels it shows the records of a series of 100 airgun shots generated by the vessel while following the trajectory parallel to the shore depicted with a blue line in the left panel (from south-west to north-east).Seismic waves were recorded by four seismic stations are plotted.P-phase arrivals are expected to be in the timewindow that range from a theoretical maximum velocity of 8000 m/s to a minimum value of 2000 m/s (both plotted using pink lines).The yellow line depicted plots the theoretical arrival of the water wave travelling at 1500 m/s.Plots show that distance to the station is a determinant factor.For each seismometer, arrivals are more clearly detected when the vessel gets closer to it.Comparing stations with a first visual approximation, E-64 (closest to the shore) has the clearer phase arrivals, while stations E-93 and E-95 recorded more noisy signals with no clear first onset arrivals.
A total of 452 earthquakes, described in depth in   these 452 earthquakes were manually picked in several stations of the INGV permanent network and located.Such catalogue with locations, magnitudes, and P travel times of the earthquakes, has been used for comparing manual and automatic processing.

Signal pre-processing
Before estimating arrival times, the huge amount of continuous raw seismograms are automatically processed to improve their SNR and delimit time windows for phase picking.Firstly the four different formats of raw data corresponding to the four types of stations are converted into a unique readable format on which automatic processing will be applied.In this process, faulty registers with lack of GPS, extremely bad SNRs, abrupt offset in amplitude and other errors, are detected and discarded.Secondly, continuous registers are segmented into windows of analysis around the expected arrival times based on the catalogue of shooting times for the active database, and the earthquake catalogue for the passive data.Thirdly, band-pass filtering in the band of interest is done.All the stations used in this experiment allow the recording of the three components, east-west (X), north-south (Y) and vertical component (Z). Figure 6  According to the Fermat's principle, the vertical component is the one providing the best information of the wave arrival.Following this criterion (that can be observed in Figure 6), the signals registered in the vertical component (Z) have been used in this work for estimating the wave arrival associated to the air-gun AUTOMATIC P-PHASE PICKING  shot or earthquake.Most of the signals registered in the different stations used in the experiment have a low SNR.They are affected by strong background noise and/or non-stationary noise processes.For example, the three components in Figure 6 present low SNR, with a strong noisy artifact centered at about 18 Hz around 20 seconds of time.Taking into account de band of activity of the events we want to pick, and in order to emphasize only P-wave arrivals both for airgun shots and earthquakes, all the Z-components have been filtered using a band-pass linear phase filter in the band of 4 to 12 Hz.The group delay was compensated accordingly.This type of filter was opportunely chosen, based on previous simulations on which zerophase filters demonstrated a worse behavior in the presence of an impulsive arrival.Left panel of Figure 7 shows the recording of the vertical component in the same site and date than Figure 6, for the first 80 shots.Right panel shows the signals after filtering.Improve-ment in terms of SNR is visible.P-wave arrivals are difficult to identify in unfiltered signals, while they become clearly observed in the ones filtered.

The algorithm
The automatic P-phase picking algorithm used to calculate the travel times of the TOMO-ETNA database, named AMPA, is fully detailed in [Álvarez et al. 2013].In such work, the finest picking is obtained with AMPA working in tandem with the autoregressive techniques applied by Takanami and Kitagawa [1988].Due to the ineffective computational cost of the tandem approach for the TOMO-ETNA database, plain AMPA has been used.The picking algorithm is also available in an open-source toolkit with a user-friendly windows environment described and downloadable in [Romero et al. 2016].AMPA, standing for adaptive multiband  picking algorithm, is a picking strategy focused on determining the P-phase arrival time for signals strongly affected by a low SNR or processes involving non-stationary noises.In order to do so, AMPA performs two steps with the signal to be picked: (i) Multiband envelope detection and noise reduction, eliminating contributions below band-dependent envelope thresholds.
(ii) Enhancement of signal envelopes with durations corresponding to P-phase arrivals.
Given the need of reliability for certain picking applications like high-resolution tomographies, the travel times determined by AMPA have quality assessments to permit different trade-offs between high picking rates and quality of the picking, depending on the specific needs of the application.
Figure 8a pictures the path for step (i) of the AMPA picking process, denoting S v (n) the filtered window of signal (see Section 3), on which a P-phase arrival is expected.S v (n) is first divided into k bands of analysis through band-pass filtering.For each subband, envelope is detected (referred as x i , i=1,..., k for each of k bands), and a threshold quantization is applied to it.Right panel of Figure 8a shows an example on which six subband envelopes (z1, ...,z6) are plotted after threshold quantization.Envelope samples with amplitude smaller than a certain threshold are quantized to value 1.The final signal of this first AMPA step, named lztot, is the addition of all the bands' de-noised envelopes depicted in the lowest graph of right panel in Figure 8a.
Figure 8b shows the step of detection and enhancement of P-phase arrivals.The addition of all subband de-noised envelopes (lztot) is passed through a set of enhancing filters designed following the principles of communication theory for an optimum detector.Its impulse response should be the time reversal of the expected signal (see Figure 9, panel (a)).In order to increase the punishment for emerging noises, the impulse response is supplemented with a portion of negative response, resulting in the final impulse response h b (n) depicted in Figure 8, panel (b).Envelopes with a sharp rise of energy and slow decay corresponding to phase arrivals will be enhanced, while other envelope patterns will be minimized.The length of the window of the enhancement filter, L, is an important design parameter as the filter will enhance events with duration in the order of L. The duration of the event-related phase will be related to many factors like its magnitude, the signal propagation attenuation, the type of seismic source, etc.In order to preserve the detection of arrivals with different durations, different lengths L for the enhancement filter are considered.Figure 7 shows how filters with different lengths are used in parallel (with a sub-sequent half-wave rectifier to eliminate resulting negative amplitudes), defining the final characteristic function (CF in the figure) as the product of the analysis performed on each of them.
CF, the resulting output after denoising the multiband envelopes and enhancing impulsive arrivals with soft decays for lengths similar to the ones expected, will be used to find the picking time.Figure 8 shows three different CF outputs ((d), (f ) and (h)) for an example earthquake (c), an emerging noise (e) and an impulsive noise (g).Impulsive and emerging noises are eliminated thanks to the optimum detector.Finally, the P-phase arrival time will be detected as the instant of time when CF takes the maximum amplitude.

Calibration of AMPA
Several parameters of the algorithm must be calibrated depending on the events to be picked.The lower and upper frequencies of the band of analysis and the number of subbands (referred to as k along this section) need to be set.In the case of TOMO-ETNA the band (4 -12 Hz), used also in the pre-filtering stage, has been used both for active and passive seismicity.Also, the number of optimum filters and their lengths should be set depending on the lengths of events expected.Three filters with respective lengths of 200, 100 and 50 samples (for a sampling frequency of 100 Hz) have been used for the task described after experimental calibration.

Quality assessment
When arrival times are required for high resolution tomographies, the number of erroneous picks has to be as small as possible since residuals of outliers in the post-piking detection must be of the same order than residuals due to velocity anomalies [Nippress et al. 2010].AMPA makes 2 automatic quality assessments of the picking to define a range of reliability of the results needed depending on the specific application.Firstly, the amplitude of the characteristic function CF defined in this section (noted as Z max in Figure 9) will give an assessment of similarity to a reference impulsive arrival with slow decay and a defined length.Secondly an SNR in the surroundings of the arrival time picked is also assessed.Figure 10 shows the P-phase picking for an earthquake artificially contaminated with additive white Gaussian noise with different SNRs.On each panel, CF is plotted below the event.Picking time (T AMPA ) corresponds to the maximum amplitude of CF, named Z max  in the figure.It can be observed that parameter Z max decreases slightly as the SNR decreases sharply in subsequent subfigures, perceiving a robust behavior against noise in all of them.

Examples of automatic picking
Right panel of Figure 11 depicts the P-phase arrival of a series of 300 air-gun shots produced the 1st of July by the "Sarmiento de Gamboa".The shots were registered by the CUBE station E-53 located close to the shore ahead of Mt.Etna as indicated in the left panel of the figure.Waves registered are plotted in blue, overlapping a green dot to indicate the arrival times automatically picked by AMPA as described in Section 4.1.AMPA is applied to a window-frame of the waveform defined considering potential arrivals travelling at speed between 8000 m/s and 2000 m/s (see Figure 11, the pink marks representing the windows of analysis).The yellow line depicted represents the theoretical arrival of the water wave travelling at 1500 m/s.The consistency of both the picking (green line) and the vessel movement (yellow line), when moving away and approaching the shore, indicates reasonable P-phase arrival detections.
Figures 12 and 13 show the automatic P-phase picking for the example earthquakes of magnitude 3.7 and 0.6 depicted in Figures 5a and 5b.After processing the registers, applying AMPA and setting the minimum threshold of quality described above, the final subset of the registers and picking times usable for the tomography are plotted.For the earthquake of magnitude 3.7 registered in 78 stations in Figure 5a, 65 of the total initial 78 registers (see Figure 12) match the picking quality criteria and would be used as input for the tomography.For the earthquake of magnitude 0.6 shown in Figure 5b, only 15 out of the original 67 registers match the picking quality criteria and would be selected as input for the tomography.They are plot in Figure 13. Figure 14 shows a detailed zoom of the picking for a few arrivals of both example earthquakes.
With the aim of validating the automatic picking proposed, Figure 15 shows the difference in seconds among the arrival times picked by AMPA (T AMPA ), and by human experts from INGV (T MANUAL ) for a set of 3000 randomly selected earthquakes occurred during AUTOMATIC P-PHASE PICKING   the TOMO-ETNA experiment.The difference in picking time, calculated as T AMPA − T MANUAL , is in the range of hundredths of seconds for most of the pickings.
Compared to other automatic procedures, AMPA [Álvarez et al. 2013[Álvarez et al. , Romero et al. 2016] is an automatic P-phase picking method with a relative low computational cost, no complex configuration and effective performance in noisy environments.In Álvarez et al. [2013] AMPA is contrasted to the classical STA/LTA, alone and in combination with the autoregressive method proposed by Takanami and Kitagawa [1988].In such work, demanding P-phase picking tasks are performed using a database of natural earthquakes and computergenerated ones contaminated with severe noise conditions with positive results.AMPA outperforms STA/LTA in all cases.When combining AMPA with Takanami's autoregressive method both techniques are improved.They show complementary strengths that overcome the limitations of autoregressive methods adding accuracy to the original AMPA.

Ongoing work
Several improvements are under analysis to increase the accuracy of the picking and therefore augment the number of registers usable for the velocity tomography.Noise reduction techniques based on wavelet analysis are being considered.Also polarization analysis, especially to detect S-phase arrivals, is needed to take more advantage of the rich passive seismicity of the region.In addition, usage of information from neighbor stations is needed to make a joint analysis searching for coherent behaviors in noisy registers.
As a parallel approach, automatic recognition systems for seismic events [e.g.Benítez et al. 2006Benítez et al. , 2007;;Ibáñez et al. 2009;Cortés et al. 2013Cortés et al. , 2015] can be used to detect air-gun shots and earthquakes in continuous seismic registers.The number of active and passive events detected can be increased, not to depending exclusively on catalogues of events made by human experts.Tandem approximations combining automatic event detection prior to automatic P-phase picking can increase the quality of the picking and the tomography derived from it.Spanish Foreign Office and Italian Foreign Office.This paper has been partially supported by the Spanish projects TEC2015-68752-R (MINECO/FEDER), KNOWAVES and CGL2015-67130-C2-2.We would like to thank all private and public owners of the sites selected to deploy seismic stations for their kind and unselfish disposal to use their properties.This manuscript has been largely improved by the insightful comments of Dr. Antonio Rubio and an anonymous reviewer and by the editor José Morales.

Figure 1 .
Figure 1.Map of Sicily and Aeolian islands.Locations of short period seismic stations (yellow dots), INGV permanent stations (red dots), broadband stations (orange dots) and ocean bottom seismometers (green dots) are shown.The shooting lines of the vessel are depicted with red lines.

Figure 2 .Figure 3 .
Figure 2. Frequency contents of the active source simulated with the GUNDALF air-gun array modeling program [Gundalf 2012] under the conditions of the TOMO-ETNA experiment.

Figure 4 .
Figure 4. Example of a series of 100 air-gun shots registered in four seismic stations with different distances to the vessel.

Figure 5 .
Figure 5. (a) An example of an earthquake of magnitude 3.7 and focal depth of around 26 km.We plot a seismogram 60 seconds long recorded in 78 seismic stations and ordered by increasing epicentral distance.(b) Same for an earthquake of magnitude 0.6 and focal depth of 1.3 km.(a) (b) shows the signal associated to the three components recorded by the CUBE station E-108 on July 10, 2014.The movement of the vessel and the locations of the stations and Etna volcano are depicted with a blue line, a red spot, and a black spot re-spectively.Upper panel shows the seismogram registered 40 seconds after air-gun shot number 70 (i.e., zero time corresponds to the time when the shot took place).Lower panel presents the spectrograms associated to each component.The P-wave onset corresponds to a packet of energy in the frequency range of 6 to 8 Hz.

Figure 6 .
Figure 6.Example of standard noise register of three components X, Y and Z before filtering.Registers from CUBE-station E-109 depicted in time and frequency domains.

Figure 7 .
Figure 7.Comparison of a series of shots before filtering (left panel) and after filtering (right panel).

Figure 8 .
Figure 8.(a) Block diagram of the AMPA picking algorithm.Detail of the first multiband de-noising phase.(b) Block diagram of the AMPA picking algorithm.Detail of the P-arrival enhancement phase.

Figure 9 .
Figure 9. (a) Theoretical impulse response of an optimum detector for inputs with a sharp start and a smooth decay.(b) Impulse response of the optimum detector referred in (a), increasing its penalty for smooth starts.(d), (f ) and (h) are examples of characteristic functions (CF) obtained respectively for the example inputs earthquake -(c)-, emerging noise -(d)-and impulsive noise (g).

Figure 10 .
Figure 10.Examples of P-phase picking for an earthquake artificially contaminated with different levels of AWGN.AMPA sets the arrival time at the instant when CF gets its maximum amplitude.The effect of noise can be seen as Z max amplitude decreases two points comparing the clean signal (upper left panel) to the most challenging scenario with SNR = −8 dB (lowest right panel).

Figure 11 .
Figure 11.P-phase picking for shots produced on July 1st, 2014, as registered by the cube-station E-53.Arrival time (green dots), maximum and minimum arrival times at theoretical maximum and minimum speeds of 8000 m/s and 2000 m/s (pink line) and arrival of the water wave (yellow line).

Figure 12 .
Figure 12.Automatic P-phase picking using AMPA for the earthquake of magnitude 3.7 and depth 25.73 shown in Figure 5a.Stations are ordered starting from the closest to the most distant to the epicenter of the earthquake.

Figure 13 .
Figure13.Automatic P-phase picking using AMPA for the earthquake of magnitude 0.6 and depth 1.27 km shown in Figure5b.Stations are ordered starting from the closest to the most distant to the epicenter of the earthquake.

Figure 14 .
Figure 14.Detailed zoom of the picking of a few events from Figures 12 and 13.

Acknowledgements.
This paper has been partially funded by the following research projects: the European project MED-SUV funded by the European Union's Seventh Framework Program for research, technological development and demonstration under grant agreement No. 308665; the Spanish COCSABO project (COC-DI-2011-08); the European project EUROFLEETS2 (Seventh Framework Programme, grant agreement No. 312762) through transnational access to the research vessels "Sarmiento de Gamboa" operated by CSIC (Spain) and "Aegaeo" by HCMR (Greece); the Geophysical Instrument Pool Potsdam (GIPP) from GFZ (Potsdam) with the project (Seismic TOMOgraphy of ETNA volcano and Eolian Islands, Italy, using active and passive seismic data).We would like to thank the following supporting institutions: Dipartimento Regionale della Protezione Civile, Regione Siciliana; Dipartimento Azienda Regionale Foreste Demaniali, Ufficio Provinciale di Catania; Ente Parco dell'Etna; Unidad de Tecnología Marina -CSIC in Barcelona (Spain); Stato Maggiore Marina (Italian Navy General Staff ), CINCNAV (Command in Chief of the Fleet) and Marisicilia (Navy Command of Sicily); Coastal Guard of Messina and Riposto; to obtain support and navigation permissions for the oceanographic cruises:

Figure 15 .
Figure 15.Histogram of differences (in seconds) between automatic P-picks obtained with AMPA and manual picks obtained by experts from the Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Catania -Osservatorio Etneo, Catania, Italy.Comparison performed for 3000 registers obtained from stations of the INGV permanent network.