Development of algorithms and software for forecasting, nowcasting and variability of TEC

Department of Electrical and Electronics Engineering, Middle East Technical University, Balgat, Ankara, Turkey; TUBITAK Marmara Research Center, Information Technologies Research Institute, Gebze, Kocaeli, Turkey; Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, U.K.; Faculty of Aeronautics and Astronautics, Istanbul Technical University (İTÜ), Maslak, Istanbul, Turkey; Department of Mathematics, Istanbul Technical University, Maslak, Istanbul, Turkey; Department of Electrical and Computer Engineering, Aristotelian University of Thessaloniki, Greece; Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Kommunikation und Navigation (IKN), Neustrelitz, Germany


INTRODUCTION
Unpredictable variability of ionospheric parameters, caused by disturbances in the ionosphere-plasmapause system, limits the efficiency of HF and other communication, radar and navigation systems. It causes serious technological problems, including range errors and rapid phase and amplitude fluctuations (radio scintillations) of satellite signals.
As technology advances, the risks mentioned above and the associated financial losses will increase unless measures are taken in advance. The ionospheric plasma interacts with trans-ionospheric radio waves and modifies wave parameters such as amplitude, phase and polarization over a broad frequency range. To a first-order approximation, the travel time delay of trans-ionospheric navigation signals is directly proportional to the TEC of the ionosphere and can amount to as much as 60 m for GPS signals. Strong gradients in the horizontal TEC structure, as well as small-scale structures of the ionospheric plasma, may seriously complicate or even prevent the resolution of phase ambiguities in geodetic or surveying networks.
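As a rough illustration of the first-order proportionality between delay and TEC, the sketch below converts a TEC value into a range error. It assumes the commonly used first-order constant of 40.3 m^3/s^2 and the GPS L1 carrier frequency. The function and constant names are illustrative and are not taken from the software described in this paper.

```python
# First-order ionospheric group delay expressed as a range error in metres.
# Assumption: d = 40.3 * TEC / f^2, with TEC in electrons per m^2.
K_IONO = 40.3            # m^3/s^2, first-order ionospheric constant
TECU = 1.0e16            # electrons per m^2 in one TEC unit (TECU)
GPS_L1_HZ = 1575.42e6    # GPS L1 carrier frequency in Hz


def range_error_m(tec_tecu, freq_hz=GPS_L1_HZ):
    """Range error (m) caused by a TEC value given in TECU."""
    return K_IONO * tec_tecu * TECU / freq_hz ** 2


# About 0.16 m of range error per TECU on L1.
print(round(range_error_m(1.0), 2))
```

Because the error scales with the inverse square of the frequency, the same TEC produces a larger range error on lower carrier frequencies.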
Disturbances, which are of stochastic nature, cause variability in the parameters of electromagnetic wave propagation media, such as TEC and foF2. Monitoring, and the development of algorithms and software to treat disturbances in Earth-space and satellite-to-satellite communications, are therefore of crucial importance in the planning and operation of communication systems. Nowcasting and forecasting of TEC values may likewise prove beneficial for many scientific and technological studies and applications.
Near-Earth space processes are highly complex: they are nonlinear and time varying, with parameters open to the effects of random variations in near-Earth space such as solar activity. It is known that in such cases mathematical modeling based on first physical principles is extremely difficult, if not impossible. Therefore data driven models, such as NN based models, have been considered in connection with various near-Earth processes such as the ionospheric processes, and have been found promising in modeling them (Cander et al., 1998; E. Tulunay et al., 2000, 2001; Y.K. Tulunay et al., 2000, 2001, 2004b; Senalp et al., 2002; Vernon and Cander, 2002).
The only requirement for the success of data driven models is the availability of reliable data which can represent the characteristics of the process to be modeled.
In this paper, the NN based Middle East Technical University model (METU-NN) is introduced to forecast the 10 min TEC variations during high solar activity in the current solar cycle, for intervals ranging from 1 to 24 h in advance, by running the model at RAL using RAL data.
Forecast and nowcast of TEC values are also considered based on DLR/IKN TEC database and system.
Day-to-day and hour-to-hour variabilities of TEC are also estimated using statistical methods.
Another statistical approach which is based on the clustering technique is developed and a processing approach is demonstrated for the forecast of foF2.

FORECASTING OF TEC BY NEURAL NETWORK BASED MODELS
The Middle East Technical University Neural Networks (METU-NN) technique has been examined for forecasting 10 min values of the Total Electron Content (TEC) up to 24 h ahead during high solar activity in the current solar cycle. The network is designed to forecast TEC data evaluated from GPS measurements from 2000 to 2001 at Chilbolton (51.8 N, 1.26 W) receiving station. An additional validation was performed on an independent validation data set by producing the forecast TEC values at Hailsham (50.9 N, 0.3 E) receiving station for selected months in 2002. The TEC problem and the preparation of data are outlined; the Artificial Neural Network models, as a data-based approach for forecasting ionospheric processes, are explained; the results are given with error tables, cross correlation coefficients and scatter diagrams; and the generalized and fast learning and operation of the METU-NN are discussed in the context of the COST 271 Action studies and applications.
Neural Network models are designed and trained with significant inputs. In our approach, the basic inputs for the model are the temporal inputs and the present value, first difference, second difference and relative difference of the TEC values. In addition, the models contain intrinsic information on solar activity. The Neural Network architecture has one input layer, one hidden layer of neurons and one output layer. The Levenberg-Marquardt Backpropagation algorithm is used in training the Neural Network based models. The trained Neural Network is then used to forecast the TEC values.
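The input set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the METU-NN code: the temporal inputs are assumed to be sine and cosine encodings of the minute of day and the day of year, and the relative difference is assumed to be the first difference divided by the present value. The function names are hypothetical.

```python
import math


def build_inputs(tec, k, minute_of_day, day_of_year):
    """Return 8 illustrative inputs for time index k (requires k >= 2).

    tec is a sequence of TEC samples at a fixed cadence, e.g. 10 min.
    The trigonometric temporal encoding is an assumption of this sketch.
    """
    f0, f1, f2 = tec[k], tec[k - 1], tec[k - 2]
    d1 = f0 - f1                         # first difference
    d2 = d1 - (f1 - f2)                  # second difference
    rel = d1 / f0 if f0 != 0 else 0.0    # relative difference (assumed d1 / f(k))
    a = 2.0 * math.pi * minute_of_day / 1440.0
    b = 2.0 * math.pi * day_of_year / 366.0
    return [f0, d1, d2, rel, math.cos(a), math.sin(a), math.cos(b), math.sin(b)]


def make_pairs(tec, minutes, days, steps_ahead):
    """Pair each input vector with the target value steps_ahead samples later."""
    pairs = []
    for k in range(2, len(tec) - steps_ahead):
        inputs = build_inputs(tec, k, minutes[k], days[k])
        pairs.append((inputs, tec[k + steps_ahead]))
    return pairs
```

With 10 min data, a forecast 1 h in advance corresponds to `steps_ahead=6` in this sketch.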

Preparation of data
TEC data evaluated from GPS measurements from 2000 to 2001 at Chilbolton (51.8 N, 1.26 W) receiving station are used for training, test and validation within the development mode of the Neural Network. An additional validation was performed on an independent validation data set by producing the forecast TEC values at Hailsham (50.9 N, 0.3 E) receiving station for selected months in 2002. Table 16.I summarizes the train, test and validation time intervals selected.
The basic criterion in the selection of the train, test and validation years is to choose years corresponding to similar solar activity. In this work the current high solar activity time periods, i.e. years with high sunspot number values, are selected.

Construction of the Neural Network based model
Among the various Neural Network structures, the best configuration is found to be the one with one hidden layer. There are 8 inputs, 8 hidden neurons and 1 output in the feed-forward structure (fig. 16.1); the first input is the present value of the TEC. The Levenberg-Marquardt Backpropagation algorithm is used in training.

Results
In the operation mode, forecasts of the TEC values 1, 3, 6, 12 and 24 h in advance are performed separately for the validation data sets, minute by minute. The root mean square, normalized and absolute error values and the cross correlation coefficients are then calculated. The analyses and results of the TEC forecast in table 16.II cover the time interval between April and May 2002 for the Hailsham receiving station. The scatter diagram in fig. 16.5 shows that the deviations from a straight line are small, so the correlation coefficients are very close to unity. In other words, the Neural Network model learned the shape of the inherent nonlinearities, which demonstrates that the model has a high sensitivity. The fitted line also has a slope close to 45° and passes through the origin, so the forecasting errors are small. This indicates that the Neural Network system reaches the correct operating point and demonstrates that the model has a high accuracy. In other words, the Neural Network system reaches the global minimum.
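The error measures named above can be computed as in the following sketch. The exact definitions used for the tables are not given in the text, so two assumptions are made here: the normalized error is taken as the RMS error divided by the mean observed value, and the cross correlation coefficient is taken as the Pearson coefficient between observed and forecast values.

```python
import math


def forecast_errors(observed, forecast):
    """Return RMS, absolute, and normalized errors plus the correlation coefficient."""
    n = len(observed)
    diffs = [f - o for o, f in zip(observed, forecast)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_abs = sum(abs(d) for d in diffs) / n
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    # Assumption: normalized error = RMS error scaled by the mean observation.
    normalized = rmse / mean_o
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    corr = cov / math.sqrt(var_o * var_f)
    return {"rmse": rmse, "absolute": mean_abs, "normalized": normalized, "correlation": corr}
```

A correlation close to unity together with a small RMS error corresponds to scatter points lying close to the 45° line through the origin.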

TEC DATABASE, FORECAST AND NOWCAST
As emphasized in the previous sections, it is of crucial importance to monitor the TEC. Permanent monitoring is performed by DLR for the European and polar regions. Nowcasting and forecasting of the TEC values are also performed based on the DLR database (Stankov et al., 2001; Wehrenpfennig et al., 2001; Jakowski et al., 2002a,b; Klaehn et al., 2003).

TEC database
Since 1995 DLR/IKN has been operating a new system for regularly processing data and producing TEC maps over the European region based on GPS measurements by the International GPS Service (IGS). The 30 s data from the GPS stations of the European IGS network allow the determination of slant TEC values along numerous satellite-receiver links over the European area with high time resolution. The instrumental biases are separated from the observations by assuming a second-order polynomial approximation for TEC variations over the observing GPS ground station. Both TEC and the instrumental satellite-receiver biases are estimated simultaneously by a Kalman filter run over 24 h. The slant TEC data are then mapped to the vertical by applying a mapping function based on a single layer approximation at $h_{sp} = 400$ km. Finally, the observed TEC data are combined with a regional TEC model (Neustrelitz TEC Model, NTCM) in such a way that the map provides measured values near measuring points and model values in regions without measurements. The advantage of this procedure is that, even when the number of measurements is low, it delivers reasonable ionospheric corrections which can be provided to users to enhance the accuracy and integrity of positioning. The existing large database, containing data from all solar/geomagnetic conditions, is an optimal background for the validation of all types of ionospheric correction, especially under highly disturbed ionospheric conditions where other measurement techniques (e.g., ionosondes) are limited.
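The slant-to-vertical step can be illustrated with the widely used thin-shell obliquity factor. The text does not give the exact mapping function used in the DLR processing, so the formula below is an assumption, as are the mean Earth radius of 6371 km and the function names.

```python
import math

R_EARTH_KM = 6371.0   # mean Earth radius (assumed value)
H_SP_KM = 400.0       # single-layer height quoted in the text


def mapping_function(elevation_deg, h_sp_km=H_SP_KM):
    """Thin-shell obliquity factor M >= 1 for a given satellite elevation."""
    cos_e = math.cos(math.radians(elevation_deg))
    # Sine of the zenith angle at the ionospheric pierce point.
    sin_zp = R_EARTH_KM * cos_e / (R_EARTH_KM + h_sp_km)
    return 1.0 / math.sqrt(1.0 - sin_zp ** 2)


def vertical_tec(slant_tec, elevation_deg, h_sp_km=H_SP_KM):
    """Map a slant TEC value to the vertical at the pierce point."""
    return slant_tec / mapping_function(elevation_deg, h_sp_km)
```

In this form the factor is 1 for a satellite at zenith and approaches 3 near the horizon for a 400 km shell, which is why low-elevation links are the most sensitive to the choice of shell height.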

Nowcast
The computed European TEC maps (comparable to WAAS and ESTB ionospheric correction maps) cover a region of 32.5°N to 70°N in latitude and -20° to 60°E in longitude. The measurements have a routine time resolution of 10 min. Former verification studies using independent data sources (EISCAT, ionosondes) have shown that the absolute errors of the estimated TEC values are less than about 2-3 TECU. Furthermore, DLR/IKN has developed the software modules for deriving ionospheric grid errors in the EGNOS System Testbed (ESTB) in real time.

Forecast
In addition to the nowcast data, forecasts of TEC based on regular and reliable GPS measurements would also be very helpful for improving surveying practice. Auto- and cross-correlation procedures have recently been developed for predicting both the critical frequency and the TEC, strongly relating the short-term forecast to present and future geomagnetic activity. Preliminary results of these methods have already been tested and reported for the one-dimensional case, in which forecasting is performed at a given location based on GPS-TEC measurements and on solar and geomagnetic activity indices. If such a prediction is made at several locations in a given region, then instantaneous maps of the forecast can be constructed covering the region of interest. The short-term forecast method is capable of delivering a forecast up to 24 h ahead based on a prediction of the 'quiet-time behaviour' of TEC and a subsequent correction based on the relative deviations of the measured TEC from its median (quiet-time) values. These deviations, if large enough, are related to the perturbations induced by a geomagnetic storm developing at the same time. This method relies on the long GPS-TEC time-series data. Research and development activities continue towards the envisaged combined nowcast and forecast service shown in fig. 16.10.
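The two-step idea of a quiet-time reference followed by a correction from the current relative deviation can be sketched as below. This is only an illustration and not the DLR procedure: the quiet-time reference is assumed to be the median over previous days for each time slot, the current deviation is assumed to persist with an exponential decay, and geomagnetic indices are not used. All names and the `decay_hours` parameter are hypothetical.

```python
import math
from statistics import median


def quiet_time_reference(history):
    """Median TEC per time-of-day slot; history is a list of days of equal length."""
    slots = len(history[0])
    return [median(day[s] for day in history) for s in range(slots)]


def forecast_tec(history, current_slot, current_tec, hours_ahead,
                 slots_per_hour=1, decay_hours=12.0):
    """Quiet-time median at the target slot, corrected by the current relative deviation."""
    ref = quiet_time_reference(history)
    rel_dev = (current_tec - ref[current_slot]) / ref[current_slot]
    target = (current_slot + hours_ahead * slots_per_hour) % len(ref)
    # Assumption: the present deviation decays exponentially with lead time.
    weight = math.exp(-hours_ahead / decay_hours)
    return ref[target] * (1.0 + weight * rel_dev)
```

When the measured TEC equals its median, the sketch returns the quiet-time value unchanged; a positive deviation raises the forecast by an amount that shrinks with lead time.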

TEC VARIABILITY
Day-to-day and hour-to-hour variability is a permanent feature of the ionosphere. Therefore a statistical approach is also necessary for forecasts and predictions (Rawer, 1993; Rawer et al., 2003). Moreover, it is known that the hour-to-hour or day-to-day variability of the Total Electron Content (TEC) can be estimated from the diurnal variation of the relative deviation of the hourly daily value with respect to the corresponding monthly-median value (Kouris and Fotiadis, 2002), that is, from the expression

$$dT = \frac{T_d - T_m}{T_m}$$

where $dT$ is the relative deviation, $T_d$ stands for the hourly daily value of TEC and $T_m$ for the corresponding monthly-median value. Using Faraday rotation TEC data and GPS measurements made in Florence and Matera (Italy), respectively, we have calculated quartile and decile levels of variability at each hour/month/year. Figures 16.11 and 16.12 report upper and lower quartiles and deciles calculated at some selected hours of each month, using all available data measured during years of low and high solar activity, respectively. It can be seen that the variability in TEC is higher in months/years of high solar activity than during the corresponding months/years of low activity (Kouris et al., 2004). Moreover, the variability in TEC is higher from after midnight until before dawn than during the other hours of the day.
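The quartile and decile levels of the relative deviation can be computed for one fixed hour of one month as in the sketch below. The relative deviation is taken here as (T_d - T_m) / T_m, and the inclusive quantile method of Python's `statistics` module is an assumption, since the text does not state how the levels were interpolated.

```python
from statistics import median, quantiles


def relative_deviations(hourly_values):
    """Relative deviation of each daily value from the monthly median at one hour."""
    t_m = median(hourly_values)
    return [(t_d - t_m) / t_m for t_d in hourly_values]


def variability_bounds(hourly_values):
    """Lower/upper quartiles and deciles of the relative deviation."""
    dev = relative_deviations(hourly_values)
    q = quantiles(dev, n=4, method="inclusive")    # 3 quartile cut points
    d = quantiles(dev, n=10, method="inclusive")   # 9 decile cut points
    return {
        "lower_decile": d[0],
        "lower_quartile": q[0],
        "upper_quartile": q[2],
        "upper_decile": d[8],
    }
```

Repeating this for every hour of a month gives the diurnal variation of the variability bounds.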
Figure 16.13 reports the diurnal variations of quartiles and deciles of TEC relative deviations at low (left) and high (right) solar activity and in different seasons. The dependence of variability on season is evident. The variability in TEC exhibits higher values during winter and at nighttime, whereas in summer the variability can be assumed practically constant with time of day. It is also evident from the same figure that the variability is higher at high than at low solar activity. More precisely, the variability in TEC is usually close to that in foF2 at low solar activity, whereas at high solar activity it overlaps that of the maximum electron density.
Finally, we may state that for 90% of the time the variability in TEC does not exceed, in absolute value, about 40% of the corresponding monthly median value at any time of day and in any month, except for enhancements in winter, i.e. when very disturbed conditions occur.

CLASSIFICATION OF THE MONTHLY MEDIANS OF THE IONOSPHERIC CRITICAL FREQUENCY foF2 USING CLUSTERING TECHNIQUE
Cluster analysis is used to classify monthly medians of the ionospheric critical frequency foF2 for 13 stations in Europe during the period 1958-1998. The algorithm used agglomerates the data, consisting of 4801 samples of daily variations, into 6 sets of sizes ranging from 1334 to 431 samples, characterized mainly by R12 and season. The data from these stations were not complete; we disregarded data samples with missing values and worked with a total of 4801 samples of daily variations indexed by month, year and station. Based on our previous experience, we eliminated the longitudinal dependency by a local time shift (Mizrahi et al., 2002). We overlooked any dependency on the geomagnetic coordinates and represented the dependency on the calendar year by a dependency on R12, hence overlooking any possible dependency on atmospheric conditions (Mizrahi et al., 2002). The work was intended as a preprocessing step for obtaining a forecast curve for the ionospheric critical frequency foF2.
The clustering algorithm we use starts with a random element from the sample and finds those samples that lie in a certain neighborhood of it (here 22%) with respect to the L2 norm. We then repeat the procedure after removing the samples lying in this cluster from the data. This procedure gave a total of 27 clusters, 15 of which contain fewer than 30 elements (less than 0.62% of the data). A qualitative study of these tiny clusters showed that clusters containing fewer than 10 elements were mostly related to bad data, and they were disregarded. As a first step we studied the 12 largest clusters, containing 1296 to 34 elements.
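The procedure described above can be sketched as follows. The text does not define the 22% neighborhood precisely, so this sketch assumes it means an L2 distance of at most 22% of the L2 norm of the randomly chosen element. The function names are illustrative.

```python
import math
import random


def l2_norm(v):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def sequential_clusters(samples, radius=0.22, seed=0):
    """Pick a random sample, collect its neighborhood, remove it, and repeat."""
    rng = random.Random(seed)
    remaining = list(samples)
    clusters = []
    while remaining:
        center = rng.choice(remaining)
        # Assumption: the neighborhood is relative to the norm of the chosen sample.
        limit = radius * l2_norm(center)
        members, rest = [], []
        for s in remaining:
            dist = l2_norm([a - b for a, b in zip(s, center)])
            (members if dist <= limit else rest).append(s)
        clusters.append(members)
        remaining = rest
    clusters.sort(key=len, reverse=True)   # largest clusters first
    return clusters
```

Since the chosen element always belongs to its own neighborhood, every pass removes at least one sample and the loop terminates. The result depends on the random starting elements, which is one reason small clusters need to be inspected before they are merged or discarded.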
The representative curves were characterized by 4 parameters: width (w), baseline (b), center (c) and peak amplitude (a). Based on these parameters, we merged the 16 largest of the 27 clusters, obtaining 6 groups.
In order to study the structure of each cluster, we obtained histograms of the distribution of the stations (latitude dependency), months (seasonal dependency) and years (R12 dependency) inside each of the 6 clusters, as shown in figs. 16.14 to 16.22. From figs. 16.14 to 16.16, in which the 13 stations are ordered from low latitude to mid and high latitude, we can see that cluster 3 is linked to high latitude stations while cluster 4 contains low latitude stations.
Samples in other clusters seem to have a uniform latitude distribution. In figs. 16.17 to 16.19 (the 12 months from January to December), we can clearly see that cluster 4 represents equinox conditions, clusters 1 and 2 represent summer, while clusters 3, 5 and 6 contain winter data.

CONCLUSIONS
Day-to-day and hour-to-hour variability of TEC was also estimated by using statistical methods.
Another statistical approach based on the clustering technique was developed and a processing approach was demonstrated for the forecast of foF2.
As a result of the studies on which this paper is based, data driven and statistical tools were developed for forecasting, nowcasting and investigating the variability of TEC. The methods developed can be used for characterizing the electromagnetic wave propagation medium for the purposes of radio system planning and operation.
The construction work of the Neural Network based model is carried out in the development mode, which is composed of a «training phase or learning phase» and a «test phase» (Y.K. Tulunay et al., 2004a). Data sets of the same month but different years are used for the training and validation phases within development, as in table 16.I. For fast learning of the process with the very large input data set, the «Levenberg-Marquardt Backpropagation» algorithm is used for training in the development mode. As the training advances, the training error starts to decrease and eventually reaches zero, which corresponds to memorization. Memorization means the loss of the generalization capability of the Neural Network. To prevent memorization, the training is halted and independent validation data are used to calculate the errors. The decrease in the validation error is noted, training is restarted, and the training cycle is repeated. When the gradient of the error in the validation phase becomes near zero, a «stop training» signal is produced and the training is terminated. The model is then ready for its actual use in the operation mode for forecasting the TEC. In the operation mode the validation data are used for calculating the errors, point by point, to measure the performance of the model. The value of the TEC at the time instant k is designated by f(k). The output is f(k+h), the value of the TEC to be observed h hour(s) later than the present time, where h is 24 at most. The 8 inputs used for the Neural Network comprise the temporal inputs together with the present value f(k), the first difference, the second difference and the relative difference of the TEC.
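The halt, validate and restart cycle described above amounts to early stopping on the validation error. The sketch below shows only that control loop. The training step itself (Levenberg-Marquardt in the paper) is represented by a caller-supplied function, and the names `train_step`, `validation_error` and `tol` are hypothetical.

```python
def train_with_early_stopping(train_step, validation_error, max_cycles=1000, tol=1e-4):
    """Run training cycles until the validation error stops decreasing.

    train_step():        performs one training cycle on the training data.
    validation_error():  returns the current error on independent validation data.
    Expects max_cycles >= 1.
    """
    previous = validation_error()
    history = [previous]
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        train_step()
        current = validation_error()
        history.append(current)
        # "Stop training" signal: the validation error has flattened or begun to rise.
        if previous - current < tol:
            break
        previous = current
    return cycle, history
```

Stopping on the validation error rather than the training error is what guards against memorization: the training error can keep falling while the error on independent data no longer improves.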
Figure 16.2 exhibits the TEC values versus the order of data points in April and May 2002. Superimposed in a solid line are the 1 h advance forecast values of the TEC. Table 16.III gives the daily solar-terrestrial indices for the times of interest, i.e. 5-7 April 2002 and 18-22 April 2002. Figures 16.3 and 16.4 are enlarged portions of fig. 16.2 with expanded horizontal axes, i.e. the diurnal variations of the observed and forecast TEC values during 18-22 April 2002 and 5-7 April 2002, respectively. Figure 16.5 is the scatter diagram of the forecast and observed TEC values for April and May 2002.
Figure 16.6 exhibits the TEC values versus the order of data points in April and May 2002. Superimposed in a solid line are the 3 h advance forecast values of the TEC.

Fig. 16.3. Observed GPS-TEC results for disturbed solar-terrestrial conditions (dotted), and 1 h ahead Forecast (solid) TEC values for the enlarged portion of the time of validation period: 18-22 April 2002.
Fig. 16.4. Observed GPS-TEC results for quiet solar-terrestrial conditions (dotted), and 1 h ahead Forecast (solid) TEC values for the enlarged portion of the time of validation period: 5-7 April 2002.

Figures 16.7 and 16.8 are enlarged portions of fig. 16.6, i.e. the diurnal variations of the observed and forecast TEC values during 18-22 April 2002 and 5-7 April 2002, respectively. Figure 16.9 is the scatter diagram of the forecast and observed TEC values for April and May 2002.

Fig. 16.8.Observed GPS-TEC results for quiet solar-terrestrial conditions (dotted), and 3 h ahead Forecast (solid) TEC values for the enlarged portion of the time of validation period: 5-7 April 2002.

Fig. 16.11.Plots of quartiles and deciles of TEC variability at randomly selected hours.Data measured at years of low solar activity.

Fig. 16.12.Plots of quartiles and deciles of TEC variability at randomly selected hours.Data measured at years of high solar activity.

Fig. 16.13.Diurnal variation of quartiles and deciles of TEC variability at low (left) and high (right) solar activity and different seasons.

Fig. 16.14.Histogram of the stations from low latitude to mid and high latitude (37.9N to 67.8N), for Cluster 1 and Cluster 2.

Fig. 16.15.Histogram of the stations from low latitude to mid and high latitude (37.9N to 67.8N), for Cluster 3 and Cluster 4.
Fig. 16.16.Histogram of the stations from low latitude to mid and high latitude (37.9N to 67.8N), for Cluster 5 and Cluster 6.

Table 16.I. Selection of the time periods for the input data.