Noise modelling and estimation of hyperspectral data from airborne imaging spectrometers

The definition of noise models suitable for hyperspectral data is slightly different depending on whether whiskbroom or push-broom are dealt with. Focussing on the latter type (e.g., VIRS-200) the noise is intrinsically non-stationary in the raw digital counts. After calibration, i.e. removing the variability effects due to different gains and offsets of detectors, the noise will exhibit stationary statistics, at least spatially. Hence, separable 3D processes correlated across track (x), along track (y) and in the wavelength (λ), modelled as auto-regressive with GG statistics have been found to be adequate. Estimation of model parameters from the true data is accomplished through robust techniques relying on linear regressions calculated on scatter-plots of local statistics. An original procedure was devised to detect areas within the scatter-plot corresponding to statistically homogeneous pixels. Results on VIRS-200 data show that the noise is heavy-tailed (tails longer than those of a Gaussian PDF) and somewhat correlated along and across track by slightly different extents. Spectral correlation has been investigated as well and found to depend both on the sparseness (spectral sampling) and on the wavelength values of the bands that have been selected.


Introduction
Airborne imaging spectrometers exhibit characteristics somewhat different from those of conventional multi-spectral scanners (Landsat TM, SPOT, etc.).In particular, noise processes may be non-Gaussian, due to non-linearity of sensors and to pre-processing of raw data, and correlated, both spatially and spectrally, because of the mechanism of opto-electronic scanning.Hence, most analysis methods, conceived to effectively operate in the presence of Additive White Gaussian Noise (AWGN), may become inadequate to the present case.By exploiting a Probability Density Function (PDF) capable of representing also non-Gaussian distributions, auto-correlated non-Gaussian noise models may be described as a generalisation of auto-regressive models.The noise parameters, specifically variance, Correlation Coefficients (CCs), and shape factor of the Generalised Gaussian (GG) PDF are straightforwardly estimated from the observed data by means of robust statistical procedures based on multivariate regressions.
Concerning exploitation of noise analysis methods in application fields, the knowledge of the parametric noise model may be used to develop data-dependent spectral decompositions, either orthogonal (Principal Components Analysis-PCA), or non-orthogonal, like Projection Pursuit (PP) (Jimenez and Landgrebe, 1999) matched to the noise.Given the analytical, as well as statistical, intractability of highly dimensional data (Jimenez and Landgrebe, 1998), a reduction in dimension of the data space may be achieved by projecting hyperspectral pixel vectors onto a sub-space orthogonal to that of the noise, like in the Noise-Adjusted PCA (NAPCA) (Lee et al., 1990).
Spectral decompositions often represent a necessary pre-processing step in object detection applications.In fact, they may be used to remove the spatially variant signal due to the scene background (background removal).Such a step simplifies the derivation of the detection algorithm in that it can be designed to operate over the residual clutter that can be considered (almost) spatially stationary.Noise reduces the effectiveness of these techniques in a manner that depends both on the statistical distribution of its intensity and on its correlation properties.In this context, noise models whose parameters are adaptively tuned to the image sequences represent a valuable tool to analyze the effects of noise on the performance expected from background removal algorithms.
The parametric noise model is also useful to estimate the theoretical entropy bounds of reversible hyperspectral data compression methods, as well as to measure the spectral information content (Aiazzi et al., 2001), i.e. the information the sensor would yield in the case of an ideally noise-free process of sensing and digitisation of spectral radiance.
Experimental results of noise modelling and estimation, carried out both on simulated noisy images and on data acquired by the Visible In-fraRed Scanner (VIRS-200), are reported and discussed to corroborate the assumptions underlying the proposed model and to validate the calibration procedure (Barducci and Pippi, 2001) undergone by the raw data.

Noise modelling
This section will focus on estimation of noise variance from true images.Unlike coherent or systematic disturbances, which may occur in some kind of sensed data, the noise is assumed to be due to a fully stochastic process.Solutions will be devised for estimating the noise typically introduced by multispectral scanners and especially by hyperspectral imaging spectrometers.Let us assume for the noise an additive signal-independent model g(i, j, k)=f (i, j, k)+n (i, j, k) (2.1) in which g (i, j, k) is the recorded intensity at pixel position (i, j) of the kth spectral band and f (i, j, k) the kth component of the spectral reflectance at (i, j).Both g(i, j, k) and f (i, j, k) are regarded as non-stationary non-Gaussian (auto)correlated stochastic processes.The term n (i, j, k) is a zero-mean Gaussian process independent of f, stationary along (i, j) but not along k and correlated, both spatially and spectrally.The variance is σ 2 n (k), and the CC's are ρx and ρy, across and along the track, respectively, and ρλ in the spectral direction.The CCs are assumed to be constant throughout.
The variance of (2.1) can be easily calculated as thanks to the independence between radiance and noise components and to the spatial stationarity of the latter.From (2.2) it appears that σ 2 n (k) can be estimated by averaging σ 2 g (i, j, k) in homogeneous areas of the kth band, where σ 2 f (i, j, k)≡0 , by definition.The homogeneity requirement is crucial because the radiance components are generally correlated with one another, besides being spatially autocorrelated.Since an area may be homogeneous in one band, but not in another, in the following the analysis will be carried out on separate bands, notwithstanding they may exhibit noise components that are dependent on one another.For sake of clarity the index k will be omitted to indicate a generic band.

Estimation of noise variance and correlation
The standard deviation of the noisy observed band g(i, j) may be stated in homogeneous areas as σg(i, j)=σ n .
(3.1) Therefore, (3.1) yields an estimate of σn, namely σ t n , as the y-intercept of the horizontal regres- sion line drawn on the scatter-plot of σ t g versus µ t g , in which the symbol ˆ denotes estimated val- ues, and calculated only on pixels belonging to homogeneous areas.Although methods based on scatter-plots were devised more than one decade ago for speckle noise assessment (Lee and Hoppel, 1989), the crucial point is how to reliably identify homogeneous areas.To overcome the drawback of a usersupervised method, an automatic procedure was developed (Aiazzi et al., 1999a), based on the fact that each homogeneous area originates a cluster of scatter-points.All these clusters are aligned along a horizontal straight line having y-intercept equal to σn.Instead, the presence of signal edges and textures originates scatter-points spread throughout the plot.
The scatter-plot relative to the whole band may be regarded as the joint Probability Density Function (PDF) of estimated local standard deviation to estimated local mean.In the absence of any signal textures, e.g., the image is made up of uniform noisy patches, by assuming that the noise is stationary, the PDF will be given by the superposition of as many unimodal distributions as there are patches.Since the noise is independent of the signal, the measured variance does not depend on the underlying mean.Thus, all the above expectations will be aligned along a horizontal line.
The presence of textured areas will modify the «flaps» of the PDF, which will still exhibit aligned modes, or possibly a watershed.The idea is thresholding the PDF to identify a number of points belonging to the most homogeneous image areas, large enough to yield a statistically consistent estimate, and small enough to avoid comprising signal textures that, even if weak, might introduce a bias by excess.Now let us calculate the space-varying (auto) covariance of unity lag along either of the coordinate directions, say i, The term C f (i, j; 1, 0) on right-hand of (3.2), is identically zero in homogeneous areas, in which σg(i, j) becomes equal to σn.Thus, (3.2) becomes Hence, ρx, and analogously ρy, may be estimated from those points, lying on the covarianceto-variance scatter-plots, corresponding to homogeneous areas.
To avoid calculating the PDF, the following procedure was devised: 1. Within a (2m+1)×(2m+1) window sliding over the image calculate the local statistics of the noisy image to estimate its space-varying ensemble statistics: -Average gr (i, j)≡µ t g (i, j) -RMS deviation from the average, σ t g (i, j), with (3.5) -Cross-deviation from the average C t g(i, j;1,0), given by (3.6) 2. Draw the scatter-plots either of σ t g (i, j) versus µ t g (i, j), to estimate σn, or of C t g(i, j; 1, 0) versus σ t 2 g (i, j)⋅σ t 2 g (i+1, j), to estimate ρx. 3. Partition the scatter-plot planes into an L×L array of rectangular blocks.
The succession attains a steady value of the parameter after a number of terms that depends on the actual percentage of homogeneous points.The size of the partition and the stop criterion are non-crucial: a 100 ×100 array of blocks and stop after processing 2÷10% of points, depending on the degree of inhomogeneity of the scene.The size of the processing window, i.e. (2m+1)× ×(2m+1), is non-crucial if the noise is white.Otherwise a size 7×7 to 1 1 ×11 is recommended, since too small a size will underestimate the covariance (Aiazzi et al., 1999b).
It is noteworthy that, unlike most methods which rely on the assumption of white noise, the scatter-plot methods, besides being easily adjustable to deal with signal-dependent noise, can accurately measure also correlated noise, and thus have been preferred in this context.

White noise
Once the most homogeneous image pixels have been made available by the scatter-plot procedure, the PDF of the noise may be estimated (Aiazzi et al., 2002) by calculating the histogram of the following amount: (4.1) only on those pixels recognized to be homogeneous.The correcting factor under square root in (4.1) is easily calculated by imposing that the PDF has variance σ 2 n , since g−g t yields an estimate biased by defect of the noise variance, due to the statistical dependence of the noise components affecting g(i, j) and gr(i, j).

Correlated noise
When the noise is spatially correlated (4.1) no longer holds, because of the statistical dependence of n(i) from the previous noise samples.Let us assume for the stationary zeromean Gaussian noise a separable first-order Markov model, uniquely defined, in one dimension, by the ρ and the σ 2 n (4.2) in which εn(i) is a white Gaussian random process having variance .From (4.2) it stems that the noise covariance of lag m is . (4.3) If (4.2) and ( 4.3) are utilized to calculate the correlation of the noise affecting g and gr on a homogeneous window, the estimated noise at pixel (i, j) can be written as in which 2m+1 is the length of the sliding window.Notice that (4.4) reduces to (4.1) when the CCs are both zero.

Generalised Gaussian PDF
A model suitable for describing non-Gaussian noise may be achieved by varying the parameters ν (shape factor) and σ (standard deviation) of the Generalised Gaussian Density (GGD), which is defined as (4.5) in which (4.6) and Γ(⋅) is the Gamma function, i.e.Γ(z)= , z> 0. Since Γ(n)=(n−1)!, when ν=1 a Laplacian Law is obtained, while ν=2 yields a Gaussian distribution.As limit cases, for ν → 0, p(x) becomes an impulse function, yet having broad and flat tails and thus nonzero σ 2 variance, whereas for ν → ∞, p(x) approach- es a uniform distribution having variance σ 2 as well.The shape parameter ν rules the exponential rate of decay: the larger the ν the flatter the PDF; the smaller the ν, the more peaked the PDF. Figure 1a shows the trend of the GG function for different values of ν.
The matching between a GGD and the empirical data can be obtained either through an original method, recently developed by the authors (Aiazzi et al., 1999c), based on fitting the entropy of the modelled source to that of the empirical data, or by the simple method introduced by Mallat in 1989 and reinvented six years later (Sharifi and Leon-Garcia, 1995).This method substantially falls into the category of moment methods, being based on matching the moments of the data set with those of the assumed distribution, in order to find out its variance and shape factor.For a GGD, the ratio of mean absolute value to standard deviation is a steadily increasing function of the ν The entropy method (Aiazzi et al., 1999c) cannot be utilized around ν=2 because the entropy function takes its maximum in the Gaussian case. Figure 1b shows the entropy function and Mallat's function plotted versus the shape factor ν. As it appears, inversion of the former is not possible around ν=2; instead its slope is greater than that of Mallat's function for ν<1.Hence, the entropy method is more accurate than Mallat's method for ν<1.

Simulated data
The procedures for information assessment described in Aiazzi et al. (2001) will be firstly validated on true image data corrupted with simulated additive zero-mean Gaussian noise, both white and spatially correlated.Zero-mean Gaussian noise with σ 2 n = 400, either with ρx= = ρy= 0, or with ρx= ρy= 0.65, was superimposed to a grey-scale 8-bit image, referred to as Peppers.The test image and its noisy versions are portrayed in fig.2a-i.Table I summarizes the results obtained.

VIRS-200 data
The data set comprises a hyperspectral image collected by the Visible InfraRed Scanner   3a,b.The CCs of the noise were estimated by means of scatter-plots of local statistics calculated on 11×11 windows.This relatively large window size is dictated by the requirement of ergodicity, which allows unbiased statistics to be estimated from a single realisation of the random process.CCs are underestimated if the window is too small, because a large correlation may not be captured by a small window.On the   (Barducci and Pippi, 2001), since the sensors in the array moved along track exhibit different offsets and gains at other side, the window size should not be too large, to prevent textured pixels from being considered in the analysis.
A 2% of points was utilized in the VIS bands; 0.6% for the NIR bands, since a larger amount of pixels would inject textures (inhomogeneities) into the calculation of the PDFs.The binary mask of selected points is shown in fig.3c,d, for the VIS and NIR band, respectively.The CCs measured are ρ t x=0.50 and ρ t y=0.70 for the VIS band; ρ t x=0.55 and ρ t y= 0.65 for the NIR band.Similar values were found for the remaining 18 bands.Spectral correlation has been investigated as well and found to depend both on the sparseness (spectral sampling) and on the wavelength values of the bands that have been selected.Results are not reported here both for sake of conciseness, and as irrelevant for intra-band noise analysis.
The noise PDFs were calculated for the two sample bands and plotted in fig.4a,b together with generalised Gaussian PDFs having same variance and shape factor as the distribution of the noise samples estimated by means of (4.4).As it appears, the degree of matching of the theoretical and empirical distributions is impressive.The VIS band exhibits a hyper-Gaussian PDF (ν = 1.57), while the NIR band is approximately Laplacian (ν=1.18).In both cases tails are heavier than those of a Gaussian PDF (ν= 2).The noise variance is slightly higher in the NIR wavelengths than in the VIS wave-lengths, as expected from theory.After calibration, also aimed at removing the variability effects due to different gains and offsets of detectors, the noise exhibits statistics stationary in space, but not spectrally.Therefore, an analysis varying with wavelength of noise variance and shape factor was carried out.Figure 5a,b highlights how the two parameters, standard deviation and shape factor, change with the wavelength.The fact that noise tails becomes heavier as the wavelength increases is peculiar of push-broom scanners and mainly depends on the de-striping procedure.Ideally, Gaussian PDFs should be obtained, same as it happens for single-detector whisk-broom spectrometers, like AVIRIS (Aiazzi et al., 2001).The non-ideal equalisation of detectors' gains makes the noise to be slightly non-stationary.Hence, the superposition of many Gaussian PDFs having variances different from one another, originates a global PDF having tails longer than those of a Gaussian density.

Conclusions and developments
A robust procedure for noise estimation in a sequence of hyperspectral bands has been described and assessed.Noise PDFs and CCs in each spectral band are automatically calculated from the data.Knowledge of the noise PDF, especially if not heavy-tailed, may help exactly circumscribe the range of noise values, with the benefit that the loss of performance expected on application tasks and due to the noise can be accurately modelled and hence predicted to a large extent.
The procedure described in this paper could be profitably utilized to assess the calibration accuracy of the data, especially for de-striping, whose performance can be measured as Gaussianity of the noise in each spectral band.Eventually, spectral correlation of the noise, which is low for sparse bands, but can significantly grow when hundredths of bands are delivered by the on-board instrument, can be measured by devising a further bivariate regression between homogeneous pixels in a couple of consecutive bands.

Fig. 1a ,
Fig. 1a,b.a) unity-variance GG function plotted for several ν's; b) entropy function and Mallat's function of a unity-variance GG PDF as a function of the shape factor ν.
200), along the coast of Tuscany, in Central Italy.The image is constituted by 20 narrow bands (2.5 nm) out of 240 recorded at different wavelengths, programmable in the range 400÷1000 nm.The size of each image is 512 pixels across track.The raw data were originally 12 bit.All the data have been radiometrically calibrated, and thus expressed as radiance values, and packed in a 16-bit word.De-striping is particularly crucial

Table I .
True and estimated noise parameters for original and simulated noisy versions, white and correlated, of the test image Peppers.each wavelength.A couple of bands, one in the visible (VIS), another in the near-infrared (NIR) wavelengths are shown in fig.