K-type geomagnetic index nowcast with data quality control

A nowcast system for operational estimation of a proxy K-type geomagnetic index is presented. The system is based on a fully automated computer procedure for real-time digital magnetogram data acquisition that includes screening of the dataset and removal of the outliers, estimation of the solar regular variation (SR) of the geomagnetic field, calculation of the index, and issuing of an alert if storm-level activity is indicated. This is a timecontrolled (rather than event-driven) system that delivers the regular output of: the index value, the estimated quality flag, and eventually, an alert. The novel features provided are first, the strict control of the data input and processing, and second, the increased frequency of production of the index (every 1 h). Such quality control and increased time resolution have been found to be of crucial importance for various applications, e.g. ionospheric monitoring, that are of particular interest to us and to users of our service. The nowcast system operability, accuracy and precision have been tested with instantaneous measurements from recent years. A statistical comparison between the nowcast and the definitive index values shows that the average root-mean-square error is smaller than 1 KU. The system is now operational at the site of the Geophysical Centre of the Royal Meteorological Institute in Dourbes (50.1oN, 4.6oE), and it is being used for alerting users when geomagnetic storms take place.


Introduction
There is ongoing demand for services that can provide real-time assessment of the (global and local) geomagnetic activity, which has been identified as being of importance to space weather research and modeling, exploration geophysics, radio communications, satellitebased positioning/navigation, etc. [Kohl et al. 1996, Datta-Barua et al. 2005, Burger et al. 2006, Stankov et al. 2006, Stankov and Jakowski 2007, Warnant et al. 2007a, Warnant et al. 2007b, Warnant et al. 2007c, Stankov et al. 2009; and the references therein].Monitoring services depend largely on a reduction of the solar and geomagnetic observations to generate reliable indices of activity.There are several internationally recognized indices that are currently in use, some of the most widely used being the K indices [Bartels et al. 1939, Chapman and Bartels 1940, Knecht 1972, Mayaud 1980, Jacobs 1987, Menvielle and Berthelier 1991, Siebert 1996].The K indices are quasilogarithmic indices that characterize the 3-hourly range in transient magnetic activity relative to the regular "quiet-day" activity for single-site locations.K-derived planetary indices that are derived from K indices measured for a planetary network of geomagnetic observatories, such as Kp or am, provide convenient measures of the global geomagnetic activity.Despite their imperfections, these planetary geomagnetic indices are much used in upper-atmosphere science, together with the solar activity indices -the solar sunspot number (Rz) and the solar 10.7-cm radio-flux index (F10.7)[Akasofu and Chapman 1972].
Originally, the K indices were scaled manually [Mayaud 1967] from analog (photographic) magnetograms with a standard recording speed of about 20 mm/h.With advances in electronic equipment and computer technologies, however, the data acquisition and processing, plus the subsequent production of the K indices, have gradually shifted to using these new (digital) technologies [Van Wijk and Nagtegaal 1977, Niblett et al. 1984, Riddick and Stuart 1984, Hopgood 1986, Wilson 1987, Menvielle 1990].Several computer-based methods for calculating the K indices have been developed along the way [Jankowski et al. 1988, Pirjola et al. 1990, Nowozynsky et al. 1991, Sucksdorff et al. 1991, Menvielle et al. 1995, and the references therein].Following thorough examinations with the help of manually scaled indices [Menvielle et al. 1995, Bitterly et al. 1996], it was shown that the method of linear elimination (also known as the Finnish Meteorological Institute (FMI)/Finnish method) delivers the best results, and also of great importance, it is directly applicable to any observatory.
While the computer-based derivation of the K indices was a major step towards higher operational efficiency and lower costs, the nowcast system, i.e. automated data acquisition and processing together with the production of the geomagnetic activity index in real time, is now mandatory for the variety of space weather services that are being developed around the World [Berthelier et al. 1996, Takahashi

Subject classification:
Geomagnetic activity index, Real-time computing, Nowcast system, Geomagnetic field variations, Instruments and techniques. et al. 2001, Andonov et al. 2004, Wing et al. 2005, Viljanen et al. 2008].However, there are several important issues related to the production of geomagnetic indices.Although the 3-h K-derived planetary geomagnetic indices have had a basic role in geomagnetism and aeronomy for a very long time, and although they still provide good representations of large-scale geomagnetic and ionospheric disturbances, they appear not to be accurate enough when representing disturbances on a smaller scale.From this perspective, the local K index (derived from the nearest magnetic station/s) fits the observations of small-scale disturbances better than global K-derived planetary indices, and so they might be considered as a better choice.Moreover, the 3-h interval between the determination of two consecutive indices is much larger than the shorter characteristic time of localized ionospheric phenomena and effects on global navigation satellite system (GNSS) applications that are of particular interest to us.
The purpose of this study is to present our experience in developing a nowcast system for the local operational K-type geomagnetic index calculation (K-LOGIC) that addresses the above-mentioned issues.K-LOGIC complements our previous and ongoing developments in space weather monitoring at the Geophysical Centre of the Royal Meteorological Institute (RMI) in Dourbes (50.1ºN,4.6ºE)[Jodogne and Stankov 2002, Stankov et al. 2004, Warnant et al. 2007a].As the proposed nowcast index does not strictly follow the calculation of the K index, as defined by Bartels et al. [1939], it is appropriate to called it a proxy K-type index.However, for convenience, it will henceforth be referred to as the K index.
This paper is organized as follows.First, the procedure for real-time K index calculation is presented in detail.Next, a description of the K-LOGIC system is given, together with a (preliminary) assessment of the system performance.Finally, all of the results are discussed in view of further developments and applications.

Algorithm for K-index calculation in real time
Following on the established principles of real-time computing [Mellichamp 1983, Heitmeyer andMandrioli 1996], the algorithm for K-index calculation in real time (Figure 1) was designed to be easily maintained and adaptable to different observatories, and to fulfill the following tasks: input data screening and cleaning, establishment of the SR variation curve, actual K-index calculation, and quality control of the data input and processing.Every single K value is assigned a quality flag, and depending on this flag and the purpose of using the index, the value can be either accepted or ignored.

Data screening
A common problem associated with digital magnetometer data acquisition is the presence of noise, data gaps and spurious non-physical data values (outliers/spikes).
Such anomalous records arise from various technicalities, the most frequent of which is the degradation in instrument precision due to external factors, e.g.problems with the electrical supply or «cultural noise».In magnetogram records from geomagnetic observatories, the gaps/outliers usually occur in isolation or in small clusters; however, there are cases when their occurrence is more frequent and for prolonged periods of time.The presence of outliers in data samples can lead to inflated error rates and/or substantial distortions of parameter and statistical estimates.
We deal with bad data by calculating the standard deviation of the latest 3-h-long data sample (1-min measurement records of the geomagnetic field components) that enters the algorithm, and only after removing the median diurnal variations.In our procedure, the number of standard deviations (sigma, v) from the local means that define a spike is set to 2.5v.The spikes are then discarded (no interpolation allowed), and their position is treated as a gap.Different limits were tested and it was found that (after comparison with post-processed K values using the FMI method) the imposed ±2.5v limit discards the majority of the bad data points while not rejecting naturally caused perturbations.A more complicated approach can include wavelet analysis that is directly applicable to magnetograms [Mendes et al. 2005].As a precaution, the maximal number of consecutive data points considered as a single spike event is set to 5 (i.e. in the time domain, given the 1-min resolution, a spike should not last longer than 5 min).In case of a longerlasting irregularity, this is treated as a sequence of spikes.Also, a hard-flag threshold for the maximal number of spikes is set to 10%.Nevertheless, these thresholds are not to be considered as the ultimate choice, as this depends on the error that can be tolerated in the final product.

Determination of the solar regular-variation curve
The establishment of the current 24-h solar regular variation (S R ) in a given geomagnetic field component is one of the key modules in the K-index calculation procedure.Obtaining the S R curve in real time by directly applying the FMI method is not possible because it requires data measurements from a variable-length time period that extends on both sides of the current moment in time.In the case of increased geomagnetic activity, the interval might extend for several hours into the future.There is a way around this obstacle: for example, to apply the method using data up to the immediate past, which obviously comes at the expense of accuracy.Alternatively, assuming that the SR curve is very slowly changing from day to day, it is possible to select the magnetically quietest days in the most recent month, and after calculating the monthly medians for each hour of the day, to derive an approximation that is suitable for use as a «quiet-day curve» [e.g.Takahashi et al. 2001].
We adopted a similar approach based on measurements from the previous 27-day period.The derivation of the SR curves will be demonstrated next, with actual measurements from the Dourbes observatory (see Figure 2).Figure 2a shows the raw 1-min-measurement values of the H component as recorded during the months of June and December, 2008.To eliminate the secular geomagnetic variation, the daily median value of the H component was calculated and then subtracted from the original records for each day, to obtain the relative deviations.Figure 2b shows the superposed 1-day plots of these relative deviations of the H component.Large perturbations are clearly visible, which are detected during disturbed magnetic conditions.However, patterns in the daily variations already appear, with generally lower values observed in the morning hours.Next, measurements from the quiet geomagnetic days only (i.e. from only those days when K<3) were selected, and the corresponding measurements were plotted after cleaning out the bad data (Figure 2c).The data dispersion is now clearly smaller and the diurnal trend is well established.The calculated median values (Figure 2d) are actually the anchor points of the S R curve we need to determine.Approximation (based on the medians) is needed if we want to obtain S R values at 1-min steps.The S R curve itself is obtained after the application of a simple polynomial approximation; however, Fourier transform, cubic spline, Chebyshev, or other approximations are possible alternatives.It is clear how the S R variations change with season, with the largest daily amplitudes observed in the summer.This requires that for a nowcast system, the established S R curve is updated on a regular basis, at least once a month.It should be noted that the S R curve patterns differ from station to station, especially when different stations are located at different magnetic latitudes.

K-index calculation
The actual K-index calculation is performed only after the data has been cleaned and the S R curve established from past records.Having determined the S R approximation values, the regular diurnal variations (represented by this approximation) are then subtracted from the corresponding 1-min measurements of the horizontal (H and D) components (Figure 2e).In this way, the calculated residuals are ready for determination of the ranges, i.e. the maximum value of the component minus its minimum value within every 3-h interval.After determination of the ranges, their conversion into a K-index value (Figure 2f ) is straightforward, given the limit-of-range-classes (LRC) table (Table 1).The K value at each step is assigned to the end of each window.A quick comparison with the definitive K (Kdef ) values is also provided (Figure 2g).We used a sliding window of 3 h, so a K-index value is obtained every 1 h, although each is based on data from the previous 3 h.The root-mean-square (RMS) error is around 1 KU, and is generally larger during geomagnetic storms.More details are presented in the performance-assessment section for the nowcast system.

Quality control
As mentioned above, our objective was to develop a nowcast service that can produce the K index for further use in higher-level products, such as alerts/warnings to be promptly dispatched to users when the K-index value suggests the presence of active/storm geomagnetic conditions.Hence, the Kdef value is not the (sole) purpose of the procedure presented here.Nevertheless, the procedure should be developed in such a way as to ensure the best possible outcome, i.e. a nowcast value as close as possible to the Kdef value.this perspective, it is necessary to have a quality control (QC) system that can immediately assess the input dataset quality and the nowcast output, i.e. the K-index value.Naturally, the immediate assessment of the K index is a challenge, because a highly reliable «close-to-definitive» value of K cannot be produced earlier than several hours (or even days) in the future.By then, however, the current nowcast value will be practically useless for the purpose of alerting the users.Therefore, we need to perform QC based on the information available up to the current moment only.
The key concept of the QC system implemented (see Table 2) is based on the expectation that a complete and sound dataset would provide the ideal platform for a top quality, reliable, closest-to-definitive value of the K index.
The QC system should reflect on both the total length of the data gaps (the shorter the gaps, or none, the better) and the time elapsed from the latest gap (the more distant the gap in the past, the better).The extent of data cleaning (treatment of the outliers, in particular) should also be included in the QC system.Here, the outliers are removed from the input dataset (note that an alternative approach is to try interpolation) and are, ultimately, treated as gaps.In this way, handling the outliers is translated into handling data gaps only, which substantially facilitates the QC system.In the nowcast algorithm, the first QC estimation follows the data screening (gap identification and removal of outliers).

K-index nowcast at a single observation site -system development and evaluation
A nowcast system (K-LOGIC) for real-time processing of geomagnetic measurements, estimation of the K index, and dissemination of the results has been developed.

System layout and development
The K-LOGIC system has a complex, modular design (see Figure 3) that is aimed at timely data acquisition, and fast production and distribution of the K-index value, while allowing for easy installation of further software developments.The key modules are the system control (including timing, quality, and communication), data acquisition (linked to the Instruments and Measurements Facility [IMF]), database data processing (input screening/cleaning, actual K-index calculation, post-processing), reference matrices (QC and LRC), communication display and dissemination.These modules are briefly outlined next.This is a time-controlled system (rather than an eventdriven system) that produces a regular output (the K-index value, data quality/processing evaluation, alerts) with a specific time resolution.The current update rate is set to 1 h; the data acquisition and pre-processing phases are started immediately after the hour mark.Theoretically, the time resolution can be increased up to 1 min (as we are using 1min-measurement records); however, such a high resolution is not needed at the moment because user applications that require such a rate have not as yet been identified.
The data used in this study come primarily from observations carried out at the geomagnetic observatory in Dourbes, Belgium (IAGA code: DOU).Variations in the fieldvector components and modulus are routinely recorded digitally.Observations are distributed via Intermagnet.Here we use 1-min-vector magnetic-field (H, D, and Z components) data as obtained directly from the instruments, i.e. not treated, which means that gaps and spikes are present in the input dataset.The precision is 1 s for time, and 0.1 nT for the field components.
Given the pre-determined time resolution of nowcast (currently set to 1 h), the data-acquisition module updates the data buffer with the measurements available since the previous run.After that, the data pre-processing starts.This involves identification of data gaps and spurious values, providing information for QC, and the updating of the buffer with clean data.The main task of the data-processing module is to compute the ranges (max-min) of the horizontal components, select the largest one, and produce the K-index value.All of the data-processing phases are monitored closely by the QC system, to immediately assess the quality of the data input and processing (based on the QC reference matrix).If the K value exceeds the threshold for a geomagnetic storm (K = 5) and the dataset used for calculating the K value is of high quality (QF >8), an alert is generated and sent to the user.These alerts represent some of the higher-level products that are generated by the system.Other products envisaged are the definitive K index and various statistical analyses that might be requested by the user.These tasks are performed in the post-processing unit.Finally, it should be noted all of the interactions with the user are not direct, but via the Communication Display and Dissemination (CDD) facility.In this way, the system is more flexible to respond adequately to the particular user needs.

System performance evaluation
The performance of the K-index nowcast has been tested in terms of: a) efficiency in identifying and handling gaps/spurious/noisy input data; and b) accuracy of the output K index achieved.The evaluation has covered various levels of geomagnetic activity, from regular to extreme.A preliminary assessment of the performance is presented here.
For evaluation purposes, the nowcast process was simulated by the feeding of original 1-min digital magnetogram records (in blocks of 60) into the operational system, and the monitoring of the production of the K index, together with the corresponding QC flag.Digital magnetometer measurements data from Dourbes are available from the year 2002 onwards, so we have had at our disposal an uninterrupted dataset for a period covering both high and low solar activity, which includes several geomagnetic storm events that are suitable for case studies.The nowcast-estimated index (Kest) was compared with the corresponding values of Kdef, to obtain the accuracy of the nowcast output.The Kdef used in the accuracy assessment was produced at the end of each year from manually cleaned magnetogram records using the method of adaptive smoothing [Jankowski et al. 1988, Nowozynski et al. 1991].we assess how the nowcast system handles the most frequently occurring technicalities.In retrospect, many of the problems would be relatively easy to correct, such as, for example, to remove spikes and/or to interpolate over gaps if their size is small.However, in real time, the task is obviously much more complicated.

K-TYPE GEOMAGNETIC INDEX NOWCAST
The data gaps are relatively easy to identify and process.Unfortunately, if a gap is longer than 3 h, or if there is no data from the previous 1 h (Figure 4a), there is not much that the system can do about this, apart from setting the quality flag to its lowest values, at 1 or 2. In this case, an index value is not produced (K is set to -1) until the data flow is recovered.The K-value production is resumed at the first opportunity; however, due to the missing data from the preceding couple of hours, the quality is initially poor.This is expected to gradually improve in time if there are no further gaps.
Figure 4b shows how the system processes the data during a rare technical problem that occurred in August, 2009, which affected the measurements for an extended period of 3 days.Since human intervention may not be immediately possible to fix a technical problem at any specific observatory, the procedure should be able to immediately assess the situation and handle it according to the objectives.In this case, there are no gaps, but the data is extremely noisy, with regular values that are frequently mixed with outliers of different magnitudes.The system succeeded in identifying all of outliers, but because of the pre-set threshold on the maximum number that can be removed from the input dataset, incorrect K values were produced.However, the QC flag immediately signaled the decrease in quality.Thus, the system ignores these values, and no alerts/warnings are sent to the users in such situations.
Again, it should be noted that the nowcast system is designed to operate autonomously, with little or no human intervention.In all cases, we rely on both the K value and the data-quality assessment to prepare alerts for ongoing disturbances/storms.The system performance has been satisfactory in all of the situations experienced so far.

Nowcast evaluation
The K-LOGIC nowcast has been evaluated during all levels of geomagnetic activity, including (major) geomagnetic storms (Figure 5).The results confirm that nowcast is sensitive to even slight magnetic disturbances, and responds adequately.The residual error during quiet and mildly disturbed geomagnetic conditions rarely exceeds 1 KU.However, the error appears to increase during major storm events, particularly in their initial phases.
We also calculated the residual error, i.e. the Kest minus the Kdef value, over the entire period of available instantaneous (raw) and definitive data (2002)(2003)(2004)(2005)(2006)(2007)(2008).The summarized results (Figure 6) show that almost 50% of the nowcast Kest values yield the correct K value, and more than 90% of the total nowcast Kest values are within 1 KU difference from the correct, Kdef value.The nowcast however, tends to overestimate low geomagnetic activity values, and in contrast, to underestimate the high end of the geomagnetic activity.
The average residual error is estimated at about 0.63 and the RMS error is about 0.95, over the entire period, i.e. both estimates are less than 1 KU.It should be noted that these errors may arise partly because Kdef is calculated once over 3 h (no overlapping of the time intervals), whereas Kest is calculated every 1 h (albeit based on the latest 3 h of data).Another contribution to the error from the use of QF = 6 as the lowest threshold for inclusion in the statistical analysis, i.e. there is a probability of input data gaps (in the 3-h intervals) as large as 25%.
The definition of the standard K index implies that the day-to-day variability of the diurnal variation should also be accounted for.As mentioned above, we assume that the diurnal variation slowly changes from day to day, effectively considering it constant over a month-long period.This is another reason behind the occurrence of large Kest-Kdef residuals that can exceeding 2 KU.There is clearly a tradeoff that might be necessary to accept when in need of real-time estimates, as it is not possible to directly apply the standard rules for K-index estimation.To further investigate the issue, we sorted the residuals with respect to season and local time.No particular seasonal or local-time distributions of the large Kest-Kdef residuals was observed (Figure 7).However; it is clear that during daytime, the nowcast tends to overestimate the K value.This overestimation is most substantial in the pre-noon hours, when the average differences reach about 0.8 KU.These overestimations persist through all of the seasons, and they can also be seen in the error histogram (Figure 6).

Discussion and conclusions
The space weather research community aims to develop reliable services for nowcasting and forecasting the ionospheric effects on the present-day technological systems (e.g. on GNSS-based applications), and a key component in these efforts is the nowcast and forecast of the geomagnetic activity.Also, a great variety of models rely on the traditional K indices as input parameters.However, the 3-h time interval between consecutive K-index computations is now  as a serious obstacle, as many phenomena are already known to have shorter time constants.As a result, these indices have been targeted for further, high timeresolution developments.We offer here a K value produced every 1 h using data from the preceding 3-h window.
As mentioned in the previous sections, a clean dataset is a pre-requisite for high-quality calculations of the K values.Therefore, the efficient handling of any eventual technicalities is very important.Some of the problems can be solved easier if there is data from an observatory that is in close proximity.We investigated the dataset from another Belgian observatory [Rasson 2007], Manhay (50.3ºN, 5.7ºE), and found that the simultaneous use of both datasets can improve the integrity of the nowcast service.The component ranges observed at these two stations were highly correlated, so if the main observatory used for nowcast (Dourbes) is experiencing a decrease in quality, the system can immediately verify (with the corresponding dataset from the other station) whether there is indeed a technical problem.Eventually, if the problem persists, it would be possible to replace the main input data with the corresponding data from the other station.This approach would guarantee the service continuity/availability.
Several methods of nowcasting and forecasting geomagnetic indices from space-based observations of the solar wind have been developed over the last decade.In particular, the methods of Andonov et al. [2004] and Kutiev et al. [2009] can be used to further improve the here-presented nowcast service and the alert module of the procedure.
Last, but not least, the QC flag (QCF) matrix can be elaborated further to include the assessment of the recent nowcast accuracy, based on comparison with the «close to definitive» K value, as soon as this value becomes available.

Figure 1 .
Figure 1.Flow chart of the algorithm for real-time calculation of the K index, with data quality control.

Figure 2 .
Figure 2. Calculations of the K index in real time demonstrated on 1-min digital magnetogram records from June, 2008 (left panels), and December, 2008 (right panels), for the Dourbes station.(a) Raw 1-min measurements of the H component.(b) H diurnal variations (relative, with daily medians removed).(c) H diurnal variations (relative, cleaned) during quiet days (K<3) only.(d) median (solid circles), mean (+) and standard deviations (vertical bars) of the diurnal variations (relative, quiet days).(e) Observed H component deviations from the SR curve (SR not shown: approximated through medians in (d)).(f ) Calculated K index.(g) Residual errors (Kres), i.e.Kest minus Kdef.

Figure 3 .
Figure 3. Flow chart of the K-index nowcast and alert system.

Figure 6 .
Figure 6.Evaluation of the K index nowcast for the period 2002-2008, for Dourbes.Left: Histogram of residual errors, as Kest minus Kdef.Right: Scatter plot of Kest versus Kdef.

Figure 7 .
Figure 7. Evaluation of the K index nowcast for the period 2002-2008, for Dourbes.Diurnal variations of the differences between Kest and Kdef for the three seasons (as indicated).The mean values for each hour are shown (+), with the corresponding standard deviations (vertical bars).

Table 2 .
Quality control flag (QCF) descriptions with respect to data input, screening and processing.