Ionospheric parameter modelling and anomaly discovery by combining the wavelet transform with autoregressive models

The paper is devoted to new mathematical tools for ionospheric parameter analysis and anomaly discovery during ionospheric perturbations. The complex structure of processes under study, their a-priori uncertainty and therefore the complex structure of registered data require a set of techniques and technologies to perform mathematical modelling, data analysis, and to make final interpretations. We suggest a technique of ionospheric parameter modelling and analysis based on combining the wavelet transform with autoregressive integrated moving average models (ARIMA models). This technique makes it possible to study ionospheric parameter changes in the time domain, make predictions about variations, and discover anomalies caused by high solar activity and lithospheric processes prior to and during strong earthquakes. The technique was tested on critical frequency foF2 and total electron content (TEC) datasets from Kamchatka (a region in the Russian Far East) and Magadan (a town in the Russian Far East). The mathematical models introduced in the paper facilitated ionospheric dynamic mode analysis and proved to be efficient for making predictions with time advance equal to 5 hours. Ionospheric anomalies were found using model error estimates, those anomalies arising during increased solar activity and strong earthquakes in Kamchatka.


Introduction
One of the most important tasks of ionospheric data analysis is connected with ionospheric state control along with anomaly discovery with subsequent interpretation of anomalies occurring during ionospheric perturbations [Afraimovich et al. 2000, Afraimovich et al. 2001, Nakamura et al. 2007, Pervak et al. 2008, Hamoudi et al. 2009, Liming et al. 2011, Mandrikova et al. 2012a, Mandrikova et al. 2012b].Ionospheric anomalies may be caused by increased solar activity; in seismically active regions they can also be encountered during the periods of high seismic activity [Mandrikova et al. 2012a[Mandrikova et al. , 2012b]].The Earth's ionosphere is the area of the atmosphere extending from 80 km to 1000 km and affecting radio wave propagation [Maruyama et al. 2011, Nakamura et al. 2009, Watthanasangmechai et al. 2012].The dynamic mode of the ionosphere is determined by the distribution of its parameters over altitude, atmospheric density, chemical composition, and spectral characteristics of solar radiation.Ionospheric studies are mostly based on analyzing environmental parameters, which can dramatically change with altitude and depend on a solar activity cycle, geomagnetic conditions, geographical coordinates and contain characteristic diurnal and seasonal changes [Afraimovich et al. 2000, Afraimovich et al. 2001, Nakamura et al. 2009, Watthanasangmechai et al. 2012, Danilov 2013, Roux 2012].Among the main parameters characterizing the ionospheric state are variations of critical frequency of F2-layer of the ionosphere (foF2) and total electron content.Ionospheric critical frequency data (foF2) are registered by vertical radio sounding using an impulse radar (ionosonde) and these data are represented as a time series.Variations of total electron content are obtained using readings of double-frequency ground GPS receivers [Afraimovich et al. 2000].Variations of foF2 and total electron content characterizing electronic concentration of the ionosphere largely depend on diurnal and seasonal variations of solar radiation, solar activity cycles, and geomagnetic activity [Afraimovich et al. 2000, Nakamura et al. 2007, Hamoudi et al. 2009, Mandrikova et al. 2012a, Mandrikova et al. 2012b].Abrupt fluctuations of the ionospheric electron density arising due to increased solar and/or geomagnetic activity lead to abnormal behavior of ionospheric critical frequency variations and total electron content variations [Afraimovich et al. 2000, Nakamura et al. 2007, Hamoudi et al. 2009, Mandrikova et al. 2012a, Mandrikova et al. 2012b].
Ionospheric state control and anomaly extraction are important tasks of ionospheric parameter analysis [Afraimovich et al. 2000, Afraimovich et al. 2001, Liu et al. 2008, Nakamura et al. 2009, Roux 2012, Watthanasangmechai et al. 2012, Danilov 2013, Ezquer et al. 2014, Zhao et al. 2014].Ionospheric parameters affect many spheres of our life and can have a negative effect on satellite systems and radio communication propagation.An ionospheric anomaly is a very complex phenomenon in time and space [Afraimovich et al. 2000, Afraimovich et al. 2001, Danilov 2013] and they mostly arise in periods of increased solar and geomagnetic activities [Afraimovich et al. 2000, Afraimovich et al. 2001, Danilov 2013, Roux 2012].In seismically active regions they can also be observed in periods of increased seismic activity [Afraimovich et al. 2000, Afraimovich et al. 2001, Nakamura et al. 2007, Nakamura et al. 2009, Watthanasangmechai et al. 2012, Klimenko et al. 2012].The registered ionospheric parameters contain anomalies manifesting themselves as a significant deviation (rise or decrease) from the background level.The main reason for anomalies in lower ionospheric layers (regions E and D) is ionization rate variation due to cor-puscular invasion.Variations in region F are caused by indirect factors -neutral composition and dynamic processes [Prolls 1995].Morphology and physics of ionospheric phenomena were described in many previous works and review papers [Prolls 1995, Rees 1995, Buonsanto 1999, Mendillo 2006, Danilov 2013].
The ionospheric structure, being variable and heterogeneous, is normally studied by variation analysis of environmental parameters registered by special equipment.Lack of knowledge on the structure of registered parameters and an insufficient number of formal models for parameter representation make the task in question rather complicated to deal with.Despite the fact that contemporary technologies of near-Earth space monitoring are developing rather steadily and rapidly, the capabilities to control and predict the ionospheric state are still rather limited [Nakamura et al. 2007, Mandrikova et al. 2012a, Mandrikova et al. 2012b].
The complex structure of ionospheric parameter variations results in the conventional techniques being inefficient for any kind of mathematical modelling or computer simulation.The conventional techniques mentioned above are mostly based on smoothing procedures and do not allow us to explore subtle features of data [Nakamura et al. 2007, Pervak et al. 2008, Hamoudi et al. 2009, Liming et al. 2011, Mandrikova et al. 2012a, Mandrikova et al. 2012b], the latter usually containing important information about the processes under study.Low sensitivity of the registered parameters is the reason why their processing and analysis have become even more complicated and time-consuming.One of the popular approaches to data simulation and analysis is the application of ARIMA models [Kay andMarple 1981, Huang et al. 2013].They make it possible to reveal steady characteristics of data and some possible ways of data prediction.Multiple practical research has confirmed the capability of this mathematical apparatus to deal with various applied problems in the area of geophysics [Box and Jenkins 1970, Basseville and Nikiforov 1993, Huang et al. 2013].Nonetheless, ARIMA methods have certain restrictions on working with particular kinds of data [Nakamura et al. 2009, Huang et al. 2013].
Wavelet-based approaches have been broadly applied to multiple areas of signal processing and related fields in the last twenty years [Al-Kasasbeh 2004, Mandrikova and Bogdanov 2007, Zhang et al. 2008, Li et al. 2009, Vargas-Bracamontes et al. 2009, Beltrán and Ponce de León 2010, Hafez et al. 2010, Mandrikova et al. 2010, Hafez and Ghamry 2011, Liming et al. 2011, Mandrikova et al. 2011, Wang et al. 2011, Gwal et al. 2012, Mandrikova et al. 2012b, Mandrikova et al. 2012c, Bakhshi et al. 2013, Castillo et al. 2013, Han and Chang 2013, Mandrikova et al. 2013a].These methods encompass some elements of the approximation theory [Donoho andJohnstone 1998, Mallat 1999] and filtering [Mitra 1998] and in many cases they help us to solve the tasks in question [Vargas-Bracamontes et al. 2009, Mandrikova et al. 2012a, Mandrikova et al. 2012b, Mandrikova et al. 2012c, Mandrikova et al. 2013a].Owing to a wide range of existing wavelet bases with compact support, wavelets provide special opportunities for detailed studies of complex data structure.Fast computational algorithms for wavelet decomposition [Mallat 1999, Percival and Walden 2000, Daubechies 2001] undoubtedly pose a substantial advantage for real-time signal processing, which is indispensable for quick data analysis.
The suggested technique of ionospheric parameter simulation and analysis is based on combining the wavelet transform with ARIMA models.The underlying idea is concerned with employing multiresolution analysis [Mallat 1999, Percival and Walden 2000, Daubechies 2001], which acts to decompose the original dataset into a finite number of multiscale components with different time resolutions.Application of the wavelet transform to foF2 data in order to extract trend and fluctuations was also suggested in the work by Roux [2012] and showed its efficiency for finding connections between ionospheric parameters and solar and geomagnetic activities.The components obtained by decomposition have a more simple structure compared to the original data.This paper is focused on ARIMA methods used for approximation.This approach has been applied by the authors since 2003 and was successfully adapted for anomaly extraction in subsoil radon.Identification of component models includes noise suppression and determination of stable data characteristics.In this paper we suggest joining multiscale component models together so that they will make a general parametric construction characterizing data variability in time and allowing us to make predictions.Using residual error estimates for final prediction we have developed a numerical algorithm for anomaly extraction.
Multicomponent time series models of ionospheric critical frequency foF2 (data were registered by the University of Cosmophysical Research and Radio Wave Propagation, Paratunka village, Kamchatka region, Russian Far East) and TEC (data were obtained from bi-frequency ground-based GPS receivers) [Afraimovich et al. 2000, Afraimovich et al. 2001, Erdoğan and Arslan 2009] in Kamchatka and Magadan were constructed.They confirmed the efficiency of the suggested technique and made it possible for scientists to analyze regular diurnal and seasonal parameter changes.Using the estimate of deviation from the background level ionospheric anomalies were extracted with a duration ranging from several dozens of minutes to several hours.These anomalies arise during ionospheric perturbations.Our detailed analysis has shown that the anomalies described above occur during the increased solar activity and strong earthquakes in Kamchatka (several seismic events of energy class k > 12.5 have been analyzed to make this conclusion).

Model construction
Consider a closed space V j =clos L 2 (R) 2 j { 2 j t−k :k∈Z) with scale j = 0 formed by a scaling function {∈L 2 R.
Using multiresolution decompositions to level m we can represent the original data the following way [Mandrikova et al. 2011[Mandrikova et al. , 2012c[Mandrikova et al. , 2013a]]: (1) where: f 2 -m t∈V -m , g2 j t∈W j ; -W j is a space of scale j formed by wavelet basis k)∈I j are decomposition coefficients representing multiscale details; -e2 j t= ∑ k e j,k W j,k are noise components; e j,k = 〈f ,W j,k 〉:( j,k)∉I j are decomposition coefficients.
Expression (1) can be used for representing a time series as a sum of multiscale components f 2 -m t, g2 j t, and e2 j t, where j=−1, -−m --.In order to suppress noise components e2 j t and extract components f 2 -m t and g2 j t characterizing stable characteristics of data the following sequence of steps is needed: (1) Reconstruction of the components f 2 -m t, g2 j t, and e2 j t, j=−1, -−m -to the original scale j=0 and calculation of the reconstructed components f 0 where n is a component number.
(2) Application of a conventional approach [Basseville and Nikiforov 1993] for determining an ARIMA model in order to approximate each reconstruction component f 0 (3) Performance of diagnostic tests for component models.If diagnostic tests for a component model confirm its adequacy, the component model reflects stable characteristics of data (according to ARIMA model theory).
(4) Combination of the extracted component models to create a general parametric construction (all other components from Expression (1) will be considered as noise components).The outcome of this combination is a parametric multicomponent model representing data changes in the time domain: where: - difference order of the n-th component; p n j is an autoregressive model order of the n-th component; h n j , i n j,k are model orders and parameters of moving average of the n-th component model; a n j,k are residual errors of the n-th component model; -M is the number of modelled components representing stable characteristics of data; - 2) is valid for any scale j.Thus, identification of the suggested model can be carried out without reconstructing the components to the original scale j =0 (see step 1 above).By changing a decomposition level (see Expression (1)) we can obtain various models of type (2) for representing a time series.Minimization of the residual error leads to the best multicomponent model of a time series.The resulting estimate can be further improved by applying various wavelets.

Data prediction and anomaly discovery
Prediction of s n j,k+q , q≥ 1 (see Expression (2)) determines the predicted value s n j,k at the moment t=k with time advance q.Value s n j,k+q can be found via model (2): The residual errors of the n-th component at scale j are determined as a difference between predicted and actual values of data at time point t=k+q: Model (2) represents regular changes in approximated data.If an anomaly has occurred, the data structure will be modified, but the absolute residual errors will rise.Anomaly extraction can be performed using a residual error estimate of component models when performing predictions.
Anomaly discovery in a component with number n at scale j will be based on the following condition: where T A j is a threshold given in advance (this threshold defines the presence of anomalies on scale j in data), U j is the window length on scale j.

Experimental research
The experimental research was conducted using critical frequency foF2 data collected in Kamchatka and Magadan.The critical frequency foF2 data contain missing values, which might impede further simulation and detailed analysis.To reduce errors it was decided to choose time periods with the smallest number of misses.Due to seasonal features of ionospheric processes, the data were first split into seasonal sets and then simulated separately.Below there is a detailed description of the critical frequency data simulation for winter and summer seasons.
The model construction was based on hourly values of foF2 critical frequency data for the period from 1968 to 2013.These data were obtained by vertical radio sounding of the ionosphere.Simulation of quiet variations of ionospheric parameters was performed using time intervals of a relatively calm geomagnetic field (intervals where the resulting K-index ∑ K did not exceed 24).These intervals did not contain any strong seismic events (intervals where there were no earthquakes with the energy class Ks ≥12, which happened no farther than 300 km from the station of the ionospheric radio sounding).
Taking into account the seasonal variability of an ionospheric process we performed separate data simulation for different seasons.Tables 1 and 2 contain time intervals for identification and model estimation.The solar activity was estimated using the value of radiation corresponding to a wavelength of f10.7.If f10.7 <100 the activity was taken as low, otherwise the activity was considered to be high.
By changing the decomposition level (see (1)) and applying various wavelet functions (Daubechies wavelets were taken) different models of type (2) were obtained for representing time changes in foF2.Model identification and diagnostics for different time series components (steps 2,3 in 2.1) have shown that the original time series foF2 and their approximating components of the 1st and 2nd decomposition levels have a complex structure and cannot be approximated by an ARIMA model.The reason is non-zero correlation of model residuals.Residual error minimization is the way of finding the best time series model foF2, which includes 3 (three) components for winter and summer seasons.These components have the following form in the wavelet domain: where: - , n = 2,3 have a more simple structure than the original time series f(t).In Figure 2 we can see the distribution estimates of the registered time series foF2 and the components f 0 1 (t), g n 0 (t), n = 2,3.The model parameter estimation is performed for each extracted component.
In Table 3 there are estimated model parameters

High solar activity winter season summer season
Table 1.Time intervals of foF2 data used for model identification.
Table 2. Time intervals of foF2 data used for model diagnostics.(4)  of these components for data with different time periods.Due to close parameters of the corresponding components a general time series model was determined for foF2 for winter and summer seasons: Figure 3 shows the simulation and prediction results for foF2 data from the Paratunka observatory.These results were found by model ( 5) (the period is 14. 02.2011-23.02.2011).Figure 4 also illustrates the simulation error estimates for foF2 data obtained by ARIMA methods and the suggested technique.These estimates were calculated using foF2 data from Paratunka for the following periods: 13.07.1969-21.08.1969, 26.07.1970-01.09.1970, 26.07.1972-22.07.1972, 31.05.1979-03.07.1979, 26.05.1983-11.06.1983, 22.05.1984-18.06.1984, 21.07-16.08.1987, 21.07.1989-16.08.1989, 30.07-21.08.1992, 30.07.1999-21.08.1999, 13.06.2000-28.06.2000, 18.06.2002-06.07.2002, 27.06-10.07.2005.Analysis of Figures 3 and 4 confirms the efficiency of the suggested technique and shows that the multicomponent model is suitable for making a prediction for ionospheric parameters with time advance equal to 5 hours.At some time points there are rises in prediction errors (Figure 3h,i,j).In order to analyze the prediction errors, our simulation results were compared with geomagnetic data (H-component of the geomagnetic field was used), which were needed for estimating the geomagnetic perturbation intensity for the analyzed time periods (the geomagnetic perturbation intensity was estimated using the technique expounded in [Mandrikova et al. 2013b]), R),a,b∈R,a≠0.The results provided in Figure 5 illustrate the fact that prediction error increase associated with an ionospheric anomaly is observed prior to a strong earthquake in Kamchatka (20.02.2011, energy class E=14.1, the anomaly is shown in Figure 5 by the dashed line).
Testing the suggested technique on TEC data has confirmed possible connection between ionospheric anomalies and seismic events in Kamchatka [Mandrikova et al. 2013c].Such effects in the ionosphere with a duration from several minutes to several hours arising in periods of increased seismic activity in different seismically active regions are also specified in a number of earlier works [Afraimovich et al. 2000, Afraimovich et al. 2001, Nakamura et al. 2007, Nakamura et al. 2009, (5) Klimenko et al. 2012, Watthanasangmechai et al. 2012].
The issues connected with the nature and origin of such effects are still to be studied more thoroughly.During the increased geomagnetic activity we can also observe a prediction error rise.Ionospheric anomalies extracted in periods of strong and moderate magnetic storms were large-scale ones and their duration was equal to one day or even longer, which is caused by significant changes of electronic concentation level relative to the background.They reached their maximum intensity during minor geomagnetic perturbations.It was also possible to observe small-scale anomalies connected with local fluctuations of the electron density of the ionosphere.The results described in the paper agree with those published before [Mendillo 2006, Roux 2012, Danilov 2013].Such a complex behavior of the ionosphere is illustrated in Figures 5, 7 and 8. Ionospheric anomalies discovered in periods of solar flares and ejection of coronal material mostly had small scales and arose against a background of weakly disturbed and quiet geomagnetic field.The morphology and physics of such a behavior are expounded in review papers [Prolls 1995, Rees 1995, Buonsanto 1999, Danilov and Laštovička 2001, Danilov 2013].The statistical analysis  of foF2 data for different years has shown a substantial connection between anomaly intensity and anomaly occurrence frequency in the ionosphere, on the one hand, and solar and geomagnetic activities, on the other hand.This conclusion agrees with works [Afraimovich et al. 2000, Afraimovich et al. 2001, Roux 2012].During the periods of increased solar activity the general error level goes up; when anomalies occur the deviation from the background level rises dramatically.We can confirm this by the results shown in Figure 6.
The joint analysis of the simulation results (data from Kamchatka and Magadan) reveals some general trend.The H-component of the geomagnetic field (Paratunka observatory) is shown in Figure 7 with the aim of analyzing the geomagnetic activity.When the geomagnetic activity is increased, the prediction error rises for data from Kamchatka and Magadan.At some time points the prediction errors for data from Kamchatka exceed significantly those for data from Magadan (this period is shown in Figure 7 by the dashed line), which is probably connected with seismic events in Kamchatka.

Conclusions
The suggested technique of ionospheric parameter simulation and analysis based on combining the wavelet transform with ARIMA models was used to approximate trend in critical frequency data from Kamchatka and Magadan.The experiments have confirmed the efficiency of the suggested technique and its advantages for studying regular changes in ionospheric parameters and making predictions with time advance equal to 5 hours.
The residual error estimates of the models were employed for discovering ionospheric anomalies during ionospheric perturbations with a duration from several dozens of minutes to several hours.The changes in ionospheric parameters have different scales and arise prior to and during strong earthquakes in Kamchatka.
The variation analysis of the parameters during the periods with different solar activity and comparison of simulation results with geomagnetic data have shown the connection between anomaly intensity and anomaly occurrence frequency, on the one hand, and solar and geomagnetic activities, on the other hand.During the periods of increased geomagnetic activity there is a significant rise of deviation from the background level due to anomaly occurrence.

Figure 1 .
Figure 1.Approximated components in the wavelet domain (highlighted in black).

Figure 4 .
Figure 4. Simulation error for foF2 data.Solid line: using the wavelet transform; dashed line: without using the wavelet transform.Solid line is E win MCM = K ∑ k=1

Figure 5 .
Figure 5. Detailed analysis of foF2 data for the period 09.02.2011-27.02.2011: prediction was performed using model (5) with time advance q =1; the arrow indicates a seismic event.

Figure 6 .
Figure 6.Prediction error estimates for foF2 data for different years.E year = K ∑ k=1

Figure 7 .
Figure 7. Simulation results for foF2 data from Kamchatka and Magadan for the period 16.01.2006-04.02.2006: prediction was made using model (5) with q =1; the arrow indicates a seismic event in Kamchatka (energy class k =12.7).

Figure 8 .
Figure 8. Simulation results for foF2 data from Kamchatka for the period 01.01.2006-26.02.2006.(a) Prediction error for the component f 2 -3 t in the sliding time window equal to 24 hours; (b) prediction error for the component g2 -3 t in the sliding window equal to 24 hours.