Validation strategy for satellite observations of tropospheric reactive gases

Satellite observations of tropospheric reactive gases are an integral part of the Earth observing system but require continuous validation by independent measurements. For short-lived tropospheric species, the large variability in space and time poses specific challenges for validation, which are often compounded by the scarcity of appropriate validation data. In this paper, the need for validation is discussed, previous work on validation of satellite observations is briefly reviewed, and the challenges and possible approaches for current and future validation networks are evaluated.


I. INTRODUCTION
Over the last two decades, satellite observations of tropospheric composition have become possible using nadir-viewing spectrometers operating in the UV, visible, near-infrared, and thermal infrared spectral ranges. Using measurements from instruments such as GOME, SCIAMACHY, OMI, GOME-2, IASI, TES, and MOPITT, global maps of the spatial distribution of many of the most important tropospheric reactive gases, including O3, H2O, CO, NO2, SO2, HCHO, CHOCHO and BrO, can be retrieved. These data provide, for the first time, a global observational view of tropospheric chemistry, and have led to important insights into the relevance of trace gas emission sources, the transport and chemical transformation of reactive species in the atmosphere, and the temporal and spatial scales involved. They have also been applied in studies of air pollution and of the attribution of emission sources, their strengths and their changes over time [e.g. Martin, 2008, Wagner et al., 2008, Burrows et al., 2011]. With improving spatial resolution of the sensors and increasing maturity of the retrieval algorithms, applications to chemical weather forecasting and routine air quality monitoring will soon become possible.
Like any remote sensing observation, satellite data on tropospheric species need to be validated using independent data with known and documented uncertainties, in order to understand and characterise the capabilities and uncertainties of the satellite measurement, assess data quality, and provide users with quality indicators enabling them to judge the fitness of the data for their purpose. For satellite observations of stratospheric composition, in particular for ozone, validation has been performed on a routine basis for several decades, and in many cases both the independent measurements and the methods applied are close to being mature. However, the situation for tropospheric species in general, and for the reactive trace gases in particular, is not as advanced, and many of the relevant data products in this field are in fact still poorly validated.
In this manuscript, an attempt is made to discuss the challenges and limitations of validating satellite measurements of reactive tropospheric gases, to review the achievements reached so far, to investigate possibilities to overcome the current limitations, and to formulate recommendations for a validation strategy for the current and future suite of European tropospheric space sensors. The discussion will mainly focus on data products from UV/vis nadir sounders, but the concepts are also applicable to tropospheric NIR and TIR data sets.

II. THE CHALLENGE
When trying to validate satellite observations of tropospheric reactive gases, a number of challenges become apparent that make this a more difficult task than might be expected.
The first characteristic of reactive gases is their large variability in space and time. This is the direct result of their reactivity, which leads to a short atmospheric lifetime; in combination with often strongly localised sources, this results in highly variable atmospheric fields, in both the horizontal and the vertical direction. For instance, species such as NO2 reside mainly in the boundary layer of polluted regions and are not even well mixed within this layer. A large spatial variability in combination with atmospheric transport also leads to concentrations that change rapidly in time. This poses a problem for validation, as localised validation measurements are not representative of larger areas, and time differences between satellite and validation measurements have to be small to ensure comparability. A direct result of the large variability is the presence of strong concentration gradients, which complicate validation because the exact position and time of the validation measurement have a large impact on the result. In contrast, the satellite measurements usually average over larger areas, smoothing the gradients and not reproducing the variability of the reference measurement.
The short atmospheric lifetime in combination with active photochemistry also leads to diurnal variations of the atmospheric concentrations of some species such as NO2, often enhanced by diurnal variations in emissions, for example during rush hour or because of the higher probability of thunderstorms and lightning in the afternoon. Again, this necessitates a good temporal coincidence of satellite and validation measurements.
In contrast to the stratosphere, the spatial distribution of tropospheric species is strongly influenced by the distribution of emission sources such as cities, power plants, biogenic sources, wild fires, sea ice etc., which are very inhomogeneous and often located in regions that are not easily accessible. A validation network that represents the variability of atmospheric conditions well therefore has to be widespread and has to include regions that are difficult to probe, such as rain forests, cities or the polar sea ice region.
The special characteristics of tropospheric retrievals also have important impacts on the validation needs. In most cases, the sensitivity of the retrieval varies strongly with altitude, usually with the lowest values close to the surface, where validation measurements are often located. As a result, the retrieval depends critically on a-priori data, including the vertical profiles of the species of interest and of temperature, surface reflectance and emissivity, and also the presence of clouds and aerosols. For proper validation, these input data ideally also have to be validated, so that it can be decided whether any difference observed is linked to the measurement itself or to the ancillary data used.
An additional challenge is the small signal often obtained for tropospheric species, either because their abundances are small or because it is difficult to separate the tropospheric from the stratospheric signal. In many cases, the validation measurements themselves are also not as accurate and precise for these small signals as one would like, adding the uncertainty of the validation data to that of the satellite measurement.
Considering all the above points, an ideal validation measurement for tropospheric species should provide the vertical profile of the species at different times of the day, for all seasons, at good spatial sampling and covering an area typical of a satellite observation. It should also provide good coverage of all atmospheric situations, have adequate observation statistics and sufficient accuracy (bias, precision), and also cover the quantities needed as additional input in the retrievals. Unfortunately, the typical validation measurement falls short in one or even many of these aspects, and in some cases there are nearly no independent validation data to compare with.
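The need for close spatial and temporal coincidence discussed above can be illustrated with a minimal sketch in Python. It pairs satellite and ground-based observations that lie within a maximum distance and a maximum time difference of each other. The function names, the dictionary layout of the observations, the thresholds of 20 km and 30 minutes, and all data values are assumptions chosen for illustration only; they are not taken from any particular validation study, and appropriate criteria depend on the species and its lifetime.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_coincidences(satellite_obs, ground_obs, max_km=20.0, max_minutes=30.0):
    """Return (satellite, ground) pairs that are close in both space and time.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    max_dt = timedelta(minutes=max_minutes)
    pairs = []
    for sat in satellite_obs:
        for gnd in ground_obs:
            close_in_time = abs(sat["time"] - gnd["time"]) <= max_dt
            close_in_space = great_circle_km(
                sat["lat"], sat["lon"], gnd["lat"], gnd["lon"]) <= max_km
            if close_in_time and close_in_space:
                pairs.append((sat, gnd))
    return pairs


# Hypothetical observations (columns in molec/cm2), for illustration only.
satellite_obs = [
    {"time": datetime(2013, 6, 1, 13, 30), "lat": 53.10, "lon": 8.85, "column": 6.0e15},
]
ground_obs = [
    # Close in space and time: accepted.
    {"time": datetime(2013, 6, 1, 13, 40), "lat": 53.11, "lon": 8.86, "column": 7.5e15},
    # Same site but hours later: rejected because of the time difference.
    {"time": datetime(2013, 6, 1, 17, 0), "lat": 53.11, "lon": 8.86, "column": 9.0e15},
    # Close in time but more than 100 km away: rejected because of distance.
    {"time": datetime(2013, 6, 1, 13, 35), "lat": 52.00, "lon": 8.85, "column": 3.0e15},
]

pairs = find_coincidences(satellite_obs, ground_obs)
print(len(pairs), "coincident pair(s) found")
```

For a short-lived species such as NO2, tight thresholds of this kind reduce the number of usable pairs considerably, which is one reason why good observation statistics are difficult to achieve.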

III. CURRENT VALIDATION WORK
In spite of the difficulties, many studies have been performed validating tropospheric satellite products. These studies have been documented in the literature, for example in the SCIAMACHY book [Gottwald and Bovensmann, 2011], the ACCENT-AT2 book on Remote Sensing of Tropospheric Composition from Space [Burrows et al., 2011] and the AURA validation collection [Schoeberl et al., 2008], but also in many individual articles which are too numerous to be referenced here.
Arguably the best situation exists for tropospheric ozone validation, where data from the ozone sonde network as well as lidar profiles, mainly from the ground but also from aircraft, and in-situ aircraft measurements can be applied [e.g. Verstraeten et al., 2013]. While these data provide vertical profiles at relatively high frequency and with low systematic error and good precision, they still lack coverage in the southern hemisphere and at low latitudes, in spite of specific programs like SHADOZ [Thompson et al., 2007], which in recent years have increased the number of sonde stations in the tropics.
For CO, in-situ aircraft observations are the main source of validation data, assisted by ground-based Fourier Transform Spectrometer (FTS) observations providing column data in cloud-free conditions [e.g. Emmons et al., 2009, Kerzenmacher et al., 2012]. Acquired at a limited number of stations set up initially for stratospheric monitoring, these data sets are sparse and usually do not include pollution hot spots or biomass burning areas. Also, regions with challenging retrieval conditions (dark surfaces) are not covered adequately. However, due to the relatively long lifetime of CO, coincidence criteria do not have to be very strict, and satellite data can be linked to validation observations by using backward trajectories.
For tropospheric NO2, different validation approaches have been taken. A large number of surface in-situ measurements of NO2 are taken within national and local air quality networks. By using assumptions on the vertical distribution of NO2, these data can be converted to tropospheric columns to be used for validation of satellite measurements [e.g. Ordóñez et al., 2006, Boersma et al., 2009]. While providing good statistics at least locally, such comparisons suffer from the large uncertainty introduced by the conversion from surface mixing ratios to columns, from the lack of accuracy of the instruments employed, and from sampling issues when using extremely localised roadside measurements. Tropospheric columns (and coarse vertical profiles) can be measured from the ground by Multi-Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) instruments, and these data are well suited for validation [e.g. Irie et al., 2008]. However, the number of such stations is small and many of them are located in clean air regions. Although DOAS measurements average in the vertical and, to a lesser degree, also in the horizontal direction, their measurement volume in the troposphere is still much smaller than that of current space instruments. This lack of spatial representativeness and the fairly large uncertainty of individual observations are a problem for validation unless many instruments are distributed over a larger area. The stratospheric columns that are used to infer the tropospheric column from satellite total column data can be validated using zenith-sky twilight DOAS measurements [e.g. Ionov et al., 2008, Peters et al., 2012]. The spatial variability has been addressed by airborne observations using both in-situ and DOAS remote sensing measurements [e.g. Heue et al., 2005, Martin et al., 2004], but these are very limited in number. Recently, the use of car-mounted MAX-DOAS has also been demonstrated for validation [Shaiganfar et al., 2011], providing spatial coverage at low cost. Vertical profile information is currently available mainly from dedicated aircraft flights using in-situ instruments, complemented by MAX-DOAS, which mainly resolves the lowest layers.
For other absorbers such as BrO, SO2, HCHO, CHOCHO and IO, even fewer validation data are available, and they are almost completely limited to a small number of in-situ observations on the ground and from aircraft and to a few active and passive DOAS and FTS observations [e.g. Vigouroux et al., 2009, Heue et al., 2011, Choi et al., 2012, Dix et al., 2013]. For SO2, a more extensive network of DOAS instruments has been set up in the EC project NOVAC [Galle et al., 2009] for monitoring volcanic emissions, and this could possibly be used to validate satellite data.
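The conversion of surface in-situ NO2 mixing ratios to tropospheric columns mentioned above, and the uncertainty it introduces, can be illustrated with a deliberately simple sketch. It assumes that NO2 is uniformly mixed within a boundary layer of a given height, that air density is constant within that layer, and that there is no NO2 above it. These assumptions, the function names and the numerical values are all illustrative; the studies cited above use more elaborate profile assumptions, for example profiles taken from chemical transport models.

```python
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def air_number_density_cm3(pressure_pa=101325.0, temperature_k=288.15):
    """Number density of air in molecules per cm3 from the ideal gas law."""
    density_per_m3 = pressure_pa / (BOLTZMANN_J_PER_K * temperature_k)
    return density_per_m3 * 1.0e-6  # m^-3 -> cm^-3


def surface_vmr_to_column(vmr_ppb, mixing_height_m,
                          pressure_pa=101325.0, temperature_k=288.15):
    """Tropospheric column (molec/cm2) for a well-mixed layer.

    Assumption: constant mixing ratio and constant air density up to
    mixing_height_m, and no trace gas above that height.
    """
    n_air = air_number_density_cm3(pressure_pa, temperature_k)
    height_cm = mixing_height_m * 100.0
    return vmr_ppb * 1.0e-9 * n_air * height_cm


# Hypothetical surface measurement of 10 ppb NO2.
column = surface_vmr_to_column(10.0, 1000.0)
print(f"1000 m mixing height: {column:.2e} molec/cm2")

# The same surface value with different assumed mixing heights.
for height_m in (500.0, 1500.0):
    print(f"{height_m:.0f} m mixing height: "
          f"{surface_vmr_to_column(10.0, height_m):.2e} molec/cm2")
```

In this simplified picture, the derived column scales linearly with the assumed mixing height, so the same surface value yields columns differing by a factor of three between 500 m and 1500 m. This is one way to see why the conversion step dominates the uncertainty of such comparisons.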

IV. STRATEGY FOR THE FUTURE
Any strategy for the future has to address the gaps identified in the current validation activities. The requirements for a good validation strategy are simple: continue acquiring new data, go to the right places, take the right measurements at the right time, accumulate enough data, include validation of ancillary data, and facilitate data access. Several actions can and should be taken to move in this direction.
Most importantly, continued operation of existing networks such as the DOAS, MAX-DOAS, lidar, FTS, and ozone sonde networks needs to be secured. Unfortunately, many of the stations in these networks do not have secure funding, and the number of measurements taken is currently declining, further limiting our ability to validate tropospheric satellite observations. Many of the stations are in clean air regions, as was appropriate for their original purpose of upper atmospheric observations in the context of the Network for the Detection of Atmospheric Composition Change, NDACC (previously known as Network for the Detection of Stratospheric Change, NDSC) or the Global Atmosphere Watch programme of the World Meteorological Organisation, WMO-GAW. They should therefore be complemented by stations in regions where validation of tropospheric species is needed, for example in pollution hot spots, biomass burning regions, or areas with large biogenic emissions. There is also a lack of stations in the Southern Hemisphere. In some cases, existing networks such as WMO-GAW could be augmented by additional instrumentation such as MAX-DOAS instruments to make them (more) useful for validation while making use of existing infrastructure and experience.
One interesting possibility is the establishment of a small number of end-to-end reference sites (or so-called primary sites), dedicated to the validation of tropospheric data and the intermediate steps of their production. These stations should be equipped to provide high quality measurements of all the quantities needed for validation, including the input quantities for the retrievals, under different conditions. They should be operated with validation in mind, taking year-round measurements at the right times of day and with a perspective of long-term operation. By strategically placing them on different continents (US, Europe, Asia), such stations could be used to validate both Low Earth Orbit (LEO) satellites and the new generation of geostationary (GEO) observatories to be launched in the coming years. If the same LEO satellites are validated by all stations, these validated LEO instruments could serve as transfer standards between the three GEO satellites, which do not have overlapping measurements.
While networks of stations provide good statistics and long-term validation, campaign-based validation is essential for more detailed analysis. These campaigns should take place in regions relevant to observations which are not yet validated, for example regions with large spatial gradients, pollution transport, biogenic emissions, shipping emissions, biomass burning, lightning, and bromine explosions. Sometimes, experiments of opportunity such as emission reductions for Olympic Games [e.g. Mijling et al., 2009] or changes in legislation affecting pollutant emissions [e.g. Kim et al., 2006] can also be exploited. Often, such campaigns can combine validation with other aims such as addressing a science question (e.g. TRANSBROM campaign [Krüger and Quack, 2013]), an instrument intercomparison (e.g. CINDI [Piters et al., 2012]) or a multi-platform experiment such as DISCOVER-AQ (http://discover-aq.larc.nasa.gov/), while other campaigns are fully dedicated to validation (e.g. SCIAVALUE [Fix et al., 2005]). However, it is essential to ensure that the needs of validation are addressed from the planning to the execution of measurements and finally the analysis of data. In this context, campaigns already planned by other groups can be used for validation purposes by adding instrumentation, for example for column measurements or profiling, and by ensuring that validation aspects are taken into account when designing the measurement programme.
In some cases, new developments with potential applications for validation can be supported. One example is the recent construction of NO2 sondes [Sluis et al., 2010], to be used in a similar way to O3 sondes, which can fill an important gap in the ground-based atmospheric observation system. The Pandora systems of small and flexible remote sensing instruments [Herman et al., 2009] could be used to extend MAX-DOAS networks. Recent developments of highly sensitive Cavity Enhanced or Cavity Ring Down Spectroscopy (CRDS) instrumentation promise better detection limits for a number of species, and smaller and cheaper lidar and FTS instruments could facilitate larger numbers of observations, albeit at reduced accuracy. A more radical approach could integrate a large number of small and cheap sensors, usually based on solid-state detectors, for crowd measurements with low precision but excellent sampling statistics.
A large potential for validation measurements lies in the use of existing platforms such as cars, trains, ships or commercial aircraft. By mounting small and automated instruments on these platforms, good spatial coverage and statistics can be achieved without having to cover the large costs of vehicle operation. The usability of such platforms for validation measurements is limited by the constraints imposed by the platform operators, which are often not in line with validation requirements, but as an additional data source such measurements can provide an important contribution. In addition to existing platforms, unconventional platforms can become very useful for validation measurements, for example ultralight aircraft, unmanned aircraft, zeppelins, tethered balloons, or buoys in the ocean and on sea ice. All these platforms have the potential to extend the range of validation measurements either vertically or to regions not usually accessible to other observations.
One approach to validation which is not linked to independent measurements is the use of chemical data assimilation systems to assess the consistency of a satellite data set. As the data assimilation system implements the chemical and dynamical processes in the atmosphere, it can be used to detect internal biases, spatial offsets and temporal changes in the assimilated data sets. For example, degradation in instrument performance leading to a bias in a data product can be picked up in the quality control of a data assimilation system. This has been successfully used in the past for CO observations from MOPITT and IASI in the MACC assimilation system, and can potentially be extended to other species. In addition, the use of chemical transport models in general and data assimilation systems in particular can increase the number of co-locations usable for validation, as the model effectively interpolates in time and space [e.g. Klonecki et al., 2012]. Such techniques work best for long-lived species, which are better constrained by transport and chemistry in the model; this limits the applicability of data assimilation for species such as NO2 or BrO. Finally, comparisons with model data should always be used with care, and can only complement, not replace, validation based on real observational data.
As important as collecting new data is facilitating access to existing data. This includes data sets assembled for validation purposes as well as data from campaigns and observational networks. Ideally, all these data sets should be available for validation through a unified portal, providing a uniform interface and data protocol as well as consistent metadata and formats. Such activities are currently pursued in the context of several initiatives and programs including GEOMS (Generic Earth
Finally, an essential part of a validation strategy is also to ensure the conservation of know-how. As space-borne missions and validation projects come and go, the team of validation scientists is constantly changing, and there is an element of periodic reinvention of the wheel as new generations of scientists encounter the challenges of satellite data validation. It is, therefore, important to provide consistent guidelines to validation groups, including information on validation aims, reporting strategies, documentation, fitness-for-purpose considerations, and the display and interpretation of validation results. In addition, common language and approaches based on metrology should be enforced in validation for the treatment of vertical and horizontal resolution, the treatment of time mismatches, the correct use of appropriate terminology and standards, the differentiation between type A and type B errors, and error reporting in general.
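The distinction between type A and type B errors mentioned above can be made concrete with a minimal sketch, read here in the sense commonly used in metrology: a type A evaluation derives an uncertainty from the statistical scatter of repeated observations, while a type B evaluation relies on other information such as instrument specifications or the estimated uncertainty of ancillary data. The example computes the mean satellite minus reference difference for a set of coincidences, evaluates the standard uncertainty of that mean statistically, and combines it in quadrature with separately estimated components. The function names and all numerical values are hypothetical, and the quadrature sum assumes uncorrelated components.

```python
import math
import statistics


def type_a_uncertainty(differences):
    """Standard uncertainty of the mean, from the scatter of the sample."""
    return statistics.stdev(differences) / math.sqrt(len(differences))


def combined_uncertainty(type_a, type_b_components):
    """Root-sum-of-squares combination, assuming uncorrelated components."""
    return math.sqrt(type_a ** 2 + sum(u ** 2 for u in type_b_components))


# Hypothetical satellite-minus-reference differences for six coincidences,
# in units of 1e15 molec/cm2.
differences = [0.4, -0.2, 0.1, 0.5, -0.3, 0.3]

mean_bias = statistics.mean(differences)
u_a = type_a_uncertainty(differences)

# Hypothetical type B components in the same units, e.g. an estimated
# reference calibration uncertainty and a representativeness term.
u_b_components = [0.3, 0.2]
u_c = combined_uncertainty(u_a, u_b_components)

print(f"mean bias = {mean_bias:.2f}, type A = {u_a:.2f}, combined = {u_c:.2f}")
```

Reporting the mean difference together with both kinds of uncertainty makes it easier to compare validation results from different groups, since more coincidences reduce only the statistically evaluated component.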

V. CONCLUSIONS
Validation of satellite observations of tropospheric reactive trace gases is a challenging task, both because of the intrinsic variability of the atmospheric fields to be observed and because of the peculiarities of remote sensing of the troposphere. It is an ongoing activity that needs continuous support for long-term measurements, campaigns, data analysis, and development of new capabilities. For tropospheric species in particular, it is crucial for validation measurements to have good and adaptive spatial coverage, coverage of the right quantities, and a combination of long-term observations and dedicated campaigns for specific atmospheric events. A multi-tiered approach is needed to improve on the current situation, in which far too few validation means are available for tropospheric species. This can only be overcome by substantial investment of money and time to establish a more troposphere-oriented and more comprehensive validation system. Otherwise, we will not have the infrastructure, people, and data needed for tropospheric validation of the many current and upcoming European missions such as GOME-2, IASI, Sentinel-5P, Sentinel-4, and Sentinel-5.