Some remarks about lidar data preprocessing and different implementations of the gradient method for determining the aerosol layers

The determination of atmospheric aerosol layers from lidar returns is possible through automated algorithms. This product is useful, for example, in monitoring the Boundary Layer Height (BLH) as well as volcanic plumes. Aerosol layers are usually detected using the gradient method, i.e. by finding the inflection points of the range-corrected backscattered signal. These points can be either calculated as the minima of the numerical derivative of the signal, or by zero-order Digital Wavelet Transforms. Since for low signal-to-noise ratios the numerical derivative is very prone to noise-induced fluctuations, a moving average is often performed. We demonstrate why this procedure should be avoided in the gradient calculation. Finally, an alternative approach to the Digital Wavelet Transform is proposed, giving the same results but lowering the computational times by about one order of magnitude.


Introduction
During the last years both research lidar and ceilometers networks have been set up worldwide [Flentje et al. 2010b] to monitor the vertical structure of aerosol and clouds.More than 1000 ceilometers have been operating in Europe in 2012, producing a very huge amount of data.Although the raw signal contour plot for visual inspection is an important output of these systems, some quantities are calculated by automated algorithms (cloud bottom, cloud coverage).Among them, a very important quantity is the Boundary Layer Height (BLH), often also called Mixing Layer Height or Mixing Height, used as a first approximation of atmospheric dispersion capability.
On the other hand, the retrieval of the BLH is not straightforward since there are not direct methods for its measurement.A thorough discussion about the problems related to the determination of the BLH can be found in Seibert et al. [2000].Elastic lidars allow the de-termination of the BLH under some assumptions that will be discussed later on.At the basis of this method, however, there is the assumption that the aerosol is produced at ground and dispersed by turbulence within the PBL.
The BLH is one of the most important quantities related to the Atmospheric Boundary Layer, having many theoretical and practical applications.This quantity, in fact, is important in air pollution modelling, in aviation applications and evaluation of model performances.In spite of its importance, the definition of MLH is not straightforward.According to the COST710 Action final report, "the practical determination of the MLH, and sometimes even its definition, is not trivial, and there are many associated practical and theoretical problems".
For our purposes, the following definition, from Seibert et al. [2000] may be assumed: "The mixing height is the height of the layer adjacent to the ground over which pollutants or any constituents emitted within this layer or entrained into it become vertically dispersed by convection or mechanical turbulence within a time scale of about an hour".
Because of the importance of the BLH, many parameterization have been developed and implemented in Numerical Weather Prediction models [Arya 1981].In fact, the turbulence in the PBL is a sub-grid process that cannot be explicitly resolved by the NWP, and must be somehow parameterized.The experimental determination of the BLH is then important for the evaluation of the performances of the different parameterization employed in the numerical weather forecast models [Bei et al. 2010, Hu et al. 2010].
There are a few ways to identify the mixing layer

Subject classification:
Data processing, Volcano monitoring, Volcanic effects, Atmosphere: composition and structure, Instruments and techniques.
from aerosol elastic backscatter signals.Some employ the temporal variance of the signal [Hennemuth and Lammert 2006] or a threshold on the backscatter coefficient [Melfi 1985], but the most used criterion is to search for inflection points in the backscatter coefficient profile.This method is referred to as the gradient method, proposed initially by Endlich et al. [1979], and since then widely used to detect the MLH.In fact, the other methods are not suitable for automated detection by low power systems.Moreover, while the variance method performs well under convective regimes, the detection of stable layers becomes hard, as witnessed by the fact that most works intentionally refer only to the unstable boundary layer [Hennemuth and Lammert 2006, Lammert and Bosenberg 2006, Pal et al. 2010].As a consequence, the gradient method is the most used for MLH detection, and much effort is done to make it more efficient.Using the gradient method is also possible to infer the thickness of the entrainment zone [Flamant et al. 1997].Unfortunately, however, the main drawback of this method is given by the possible presence of multiple aerosol layers, since strong gradients are found at the top of each layer.Recent works [Haeffelin et al. 2011] have demonstrated, through an in depth comparison among different algorithms, that the major uncertainty comes from the attribution of the BLH to a layer than the determination of the layers itself.The best way to automatically attribute the BLH to the right layer is still under investigation [Angelini et al. 2009, Di Giuseppe et al. 2011, Haeffelin et al. 2011, Pal et al. 2013].It is worth to sketch quickly that the results of the gradient method differ from a system to another also depending on the wavelength employed.In fact, the relative contribution of aerosols and molecules strongly depend on the laser wavelength, as discussed for example in Haeffelin et al. [2011].In any case, the importance of the detection of all the aerosol layers is a key step in every gradient-based algorithm and should then be performed at better and in the most efficient way.Furthermore, the determination of aerosol layers is also useful in monitoring volcanic plumes, although because of their sporadic nature such observations are still less systematic than estimations of the BLH.Nevertheless, the development of ceilometer networks worldwide can make this technique very powerful in monitoring the status of volcanoes [Sassen et al. 2007, Flentje et al. 2010a].

Discussion of method and results
Strictly speaking, the gradient analysis should be performed on the backscatter coefficient, which represents the volume backscatter cross section of the total aerosol amount.However, this requires a calibration of the signal over the attenuated molecular backscattering profile.This is not convenient for automated retrievals and, moreover, the low power of the ceilometers coupled to the typical use of wavelengths in the infrared region does not allow this easily.Anyway, the logarithm of the Range-Corrected Signal (RCS) is a good alternative, being roughly proportional to the total backscatter coefficient, apart from the attenuation due to extinction.
The lidar equation can be written in the form: Where B represents the background signal and C the lidar constant, which takes into account a variety of factors (power of the emitted pulses, efficiency of the collection and detection system), and b m , b a represent the backscatter coefficient of molecules and aerosols, respectively, and T the atmospheric transmissivity given by: , being a the extinction coefficient, and where i represents either the molecules or the aerosols.The term O(z) should be introduced to take into account the fact that the overlap between the lidar beam and the telescope field of view is not usually uniform in space, but is instead a function of the distance.O(z) is typically zero in the close field, and reaches 1 at long distances when the laser enters completely in the receiver's field of view, and the full overlap is achieved.This means that almost all lidars are blind at very short distances (from tens to hundreds of meters), and hence the detection of very low layers is sometimes impossible.This represents often an insurmountable problem when shallow PBLs have to be detected, like during nighttime and under very stable atmospheric conditions in general.Anyway, these problems lie outside the purposes of this paper, and O(z) can be considered as a constant.In practice, we will consider only the lidar equation after the full overlap.
The range-corrected signal is usually calculated as: , so that the lidar equation becomes: Since T m is a very smooth function, (at 855 nm, for standard atmosphere, it equals 0.987 between ground and 4 km), no influence on gradients is expected and it can be neglected.Also T a may be generally neglected.In fact, while the backscatter coefficient may show rapid variations in case of superimposed layers, the transmissivity depends on the integral of the extinction coefficient, and usually it varies smoothly.Since we are concerned in detecting the gradients of the signal, such approach actually works well.As an example, we can consider a typical optical thickness of 0.1 at 870 nm: the transmissivity is 0.82 and hence, neglecting the correction, this leads to an underestimation of about 20% at the top of the layer.Moreover, T is a monotonic function and therefore neglecting it cannot introduce large errors in the detection of the inflection points.Hence, we can neglect the transmission coefficients, and we can write, assuming an exponential distribution of the molecular density, and if C is expressed in opportune units: where N 0 is the Loschmidt constant at standard conditions (i.e. the molecular density in air), H the scale height for molecular density (~ 8 km), and represents the Rayleigh backscatter volume cross section.
For aerosol free atmosphere, b a =0 and the equation above becomes: Since ~5 10 -33 m 2 sr -1 at 855 nm and N 0 ~2.7 10 25 m -3 , the first addend on the right becomes the of the order of −15, while within the boundary layer z/H spans from 0 to 0.5.
Hence, if expressed in RCS, the lidar attenuated molecular backscattering profile is approximately a straight line, with a small angular coefficient that makes ln(RCS) a good choice for gradient detection.Another interesting feature of such quantity is that the slope of the attenuated molecular backscattering profile does not depend on the wavelength of the system.On the contrary, the aerosol contribution of course does.In presence of aerosols (b a >0 and dependent on wavelength) the RCS is still greater than in the case of pure molecular echo, at least as long as the aerosol extinction can be neglected.
Though this representation of the lidar equation is not convenient to retrieve the aerosol backscatter and attenuation, it is useful for the application of the gradient method.
In spite of the definition, however, it should be noted that the determination of the inflection points by gradient method can be determined in a variety of ways, i.e. the direct numerical differentiation, the Digital Wavelet Transform (DWT) or even through best fitting procedures with Gaussian or error functions.
For systems having a limited signal-to-noise ratio, the direct differentiation of the lnRCS may introduce unrealistic gradients.For this reason, spatial and temporal averages are introduced to improve the signal-tonoise ratio.

Signal averaging
Both temporal and spatial averaging of the lidar signal are widely used to improve the signal-to-noise ratio.The problems related to this procedure is the lost of information of small-scale events, such turbulence or microphysics.Orlanski's classification of atmospheric processes [Orlanski 1975] showed a continuous cascade of spatial and temporal scaled from planetary waves to turbulence.The processes that involve the PBL are quoted in Table 1.
From Table 1 it is clear that different averaging will point out some patterns and will completely mask the processes at smaller scales.Hence, suitable averaging windows must be carefully exploited to highlight the processes investigated.The average can be performed either using moving average or a block average.Advantages and drawbacks of these approaches will be now discussed.
The running average may be employed to improve the signal-to-noise ratio of the lidar data without reducing the resolution.This is true only apparently, since the running average may introduce artifacts such spurious maxima and saturation plateau around peaks (Figure 1).Of course, the points averaged may be weighted by suitable filters (i.e.binomial, Gaussian), but in this case the effectiveness of the smoothing in removing noise lowers considerably.
The other procedure to increase the signal-to noise ratio consists in lowering the temporal resolution performing a block average over the profiles.Since the standard deviation of the background signal (SSD) can be thought as noise in lidar measurements, to understand the effect of temporal averaging on the spatial standard deviation of the signal between 6 and 7 km has been calculated for several temporal averaging intervals, spanning from 1 to 240 min.It has been found that with a 30 min-average, the SSD falls closer than 10% to its minimum value, and after 60 min, falls below 5%.These results are shown in Figure 2.
After 200 min the SSD raises slightly, likely because of changes in the overall status of the atmosphere.
In the spatial dimension, the signal is often smoothed with range-dependent windows (from 50 to 200 m) and then the direct differentiation is performed on smoothed data.
Anyway, it is possible to demonstrate that the moving average smoothing is completely not useful for gradient calculation, since the resulting data are as noisy as in the case of no smoothing.
To demonstrate that, let us consider a function f (k) the ±n-point window centered around the k-th point of the discrete domain of f.The smoothed function f ' is obtained as an average over the considered window: At the point k+1, the smoothed function is obtained as: Hence, the difference f '(k) − f (k+1) becomes: The gradient is then expressed by the difference of two points, far 2n+1 points one from the other.The first effect of this operation is that only one point is used for the gradient calculation, strongly limiting the effectiveness of the aveaging.Furthermore, the points used lay as far as the window width, causing the values of the gradient to become meaningless.This is visible in Figure 3, where the red solid curve shows the gradient of the function smoothed over a moving average, compared to the DWT and a third algorithm that will be described hereafter.

Wavelet algorithms
Algorithms based on the use wavelets are used to overcome this problem, being them especially effective for noisy data.These methods employ the convolution ( ). n f k j 2 1 1 1 The effect of running average on close peaks.Around 5 a spurious peak ('ghost') appears although the original signal showed a minimum in this zone.These effect may affect both temporal and spatial averages, for example when small clouds are detected.product of the function and a Haar wavelet (also known as D2, the 0-th order Daubechies wavelet) and have been abundantly described in the literature [Brooks 2003, de Haij et al. 2007, Morille et al. 2007].Anyway a short description of the WCT will be given.
The Haar wavelet is defined by the equation: It is characterized by a step-like appearance (Figure 4), and the convolution of such function with the signal shows maxima in correspondence of inflection points.The advantage of this technique is that, being based on an integral rather than a derivative, it is less sensitive to noise, and hence it is widely used to analyze low power lidar profiles.The dilation of the wavelet (i.e. the size of its non null part) can be varied within the characteristic scales of the signal variations, so that the average of these multiple wavelets allow the so-called "multi scale approach" which is useful for better discriminate the noise-induced inflection points.The dilation of the specific algorithm implemented spans from 90 to 360 m.The main drawback of this method is that no determination is possible in that region closer to the ground than the half dilation of the largest wavelet.For our algorithm, in particular, no layer detection is possible below 180 m, which is definitely too high for most nocturnal layers.Anyway, it must be noted that this problem usually arises in paral-lel to the lack of overlap in the close field and in many cases it does not extend really the blind zone of a lidar.
The DWT is then compared with a threshold, and all the maxima exceeding this threshold are stored.The same is done for the numerical derivative.In order to take into account a temporal consistency of the layers, three consecutive profiles are grouped (representing hence a lapse of time of 45 min), all their inflection points are joined together and sorted by increasing height.Then, the consecutive points whose distance exceeds a season-dependent threshold are considered belonging to different layers.Such clustering allows to detect the mean height and the thickness of each layer.This latter is intended as twice the standard deviation of  the heights of the points belonging to the layer.Since clusters containing only one point are excluded, this method lets one exclude some isolated spurious layers.The average values of the DWT or the first derivative maxima relative to all the points of each layer are also computed, since those may be included among the parameters to be considered for the attribution.Anyway, the computation of the DWT is quite time consuming, since the product of the wavelet for the function (even though only the window of finite values of the wavelet can be considered) must be calculated for all the points of the function.This requires a big amount of operations, although this is not usually prohibitive for modern computers.However, if a lot of profiles must be analyzed and a multiscale approach is adopted, a quicker algorithm is certainly desirable.
Here we will illustrate a different algorithm to calculate the gradient, which is as robust as the DWT, but decreases the computation time of about one order of magnitude.It will be referred to as DBA, since it is based on the Derivative of Box Averaged data.
Let us now consider the same window sizes used for the moving average and the multiscale approach in the DWT.
If the function f is now averaged over disjoint windows (the spatial resolution is reduced by a factor of 2n+1), the difference of the function between two contiguous points becomes similar to the Wavelet transform.The scheme is presented in Figure 4: if the function f is averaged over the two windows W1 and W2 and then the difference is computed, the result is the same of the convolution of the wavelet function H defined with the same dilation.
After the calculation of the gradient with the box average, an interpolation is necessary to retrieve the same resolution of the signal.This is also necessary to compare gradients at different dilations.Both linear interpolation and nearest neighbor approximation have been used, and the results do not change appreciably.At the end, as usually done with DWT, the average of the different windowed gradients is calculated.This multiscale approach helps refining the calculation of the gradient maxima, and almost the same results of the DWT are obtained.
To evaluate the usefulness of the method, the algorithm has been implemented in Matlab environment, and the computation times of both techniques have been considered.For the dilations used (15, 18, 21, 24 points) the ratio between the average computation times was about 10.This means that 'jumping' the moving window, especially at larger scales, does not worsen the results of the wavelet transform.
The computation times of the DWT and the DBA for 8 whole days of measurements, consisting in 96 different ceilometers profiles per day have been calculated.The ratio of the mean calculation times is about 10.If the smallest dilations are discarded, this ratio raises again, since the larger the windows the less calculations are required for the block average.

Conclusions
The lidar Range-Corrected Signal (RCS) is widely employed in automated analysis for the detection of the atmospheric Boundary Layer Height (BLH), and, more in general, for detection of aerosol layers, including clouds.In this paper it has been demonstrated that ln(RCS) is a convenient quantity for the application of the gradient method, since in absence of aerosol it behaves as a straight line with small slope.The unhelpfulness of moving average has been demonstrated in gradient detection, since the cancellation of most terms makes the derivative extremely noisy and poorly meaningful.Finally, an algorithm based on the Derivative of Block-Averaged data (DBA) for aerosol layer detection is compared with the classical Digital Wavelet Transform.This algorithm is built starting from the strong analogy between the derivative of block-averaged data and the Haar wavelet transform, but allows to save about the 90% of operations when detecting the inflection points: this aspect is helpful in the automated processing of large datasets.The DBA algorithm can be applied to other kinds of analysis, wherever saving computational time is often mandatory.The ratio between the mean computation times is 10.91 for the first case and 10.56 for the latter.This is dependent on the dilations used, since the time save for larger windows is higher than for small windows.

Figure 2 .
Figure 2. Effects of time averaging on the noise.At 30 (60) min.the 90% (95%) of the lowest standard deviation on the signal is found.The SSD is the average of seven different profiles obtained throughout the day, for different days.
Figure 3. Upper panel: Signal (green) and smoothed signals using different window sizes.Bottom panel: The gradient of the signal as calculated bt DWT, Derivative on Moving Average and Derivative on Block Average.In all cases, the multi-scale approach was the same, spanning from 9 (15) to 24 points of dilation with an increment of 3 each time.Each resulting curve is the average of the six profiles obtained at different dilation, as explained for example in deHaij et al. [2007].The vertical lines represent the top of the aerosol layers detected when the functions in the bottom panel exceed a threshold at its maxima.The threshold in this case is set at 0.05.

Figure 4 .
Figure 4.The effect of the block average over the windows W1 and W2 and the DWT on the function f.

Figure 5 .
Figure5.Computation times, in seconds, for multi scale DWT and DBA, using linear interpolation (green) and 'nearest neighbor approximation' (red).The ratio between the mean computation times is 10.91 for the first case and 10.56 for the latter.This is dependent on the dilations used, since the time save for larger windows is higher than for small windows.

Table 1 .
THE APPLICATION OF GRADIENT METHOD TO LIDAR DATA Scales Orlanski [1975]processes involved in the PBL.Adapted fromOrlanski [1975].