Prediction and simulation of multiple monthly stream-flow series

SUMMARY. The logarithms of monthly stream-flows are usually found to have a Normal distribution. Stream-flow series are auto-correlated up to a given time lag s. Moreover, stream-flow series of the same region are cross-correlated.
Prediction and simulation can be accomplished by a similar technique. The method has been used to reconstruct the missing data of the streams of the Emilia-Romagna Region of Italy and to produce synthetic multivariate series therefrom.

INTRODUCTION
The present paper addresses the following problems: (1) reconstruction of missing data of monthly stream-flows at measuring stations; (2) prediction of monthly stream-flows at several stations on the basis of past observations at the same stations; (3) simulation of simultaneous monthly stream-flows at several stations. All the above problems are solved by the same algorithm, based on certain results of Normal Multivariate Analysis. Section 2 recalls the mathematical theory used. Section 3 formulates the problems in terms of the theory. Section 4 discusses the estimation of the parameters required in the theory. Finally, section 5 discusses the results obtained in a case study concerning the streams of the Emilia-Romagna Region of Italy.

THE THEORY
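One makes use of the following theory. Given a Normal random vector $Y$ with expectation $m$ and covariance matrix $S$, if one partitions $Y$ as follows:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$$

such a partition induces the following partitions in $m$ and $S$:

$$m = \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$ [1]

where $S_{11}$ and $S_{22}$ are the covariance matrices of $Y_1$ and $Y_2$, and $S_{12} = S_{21}'$ is the cross covariance matrix of the vectors $Y_1$ and $Y_2$ ($'$ means transposed).
The following result holds (Anderson 1958): the conditional distribution of $Y_1$ given $Y_2 = y_2$ is Normal with the following parameters:

$$E(Y_1 \mid Y_2 = y_2) = m_1 + S_{12} S_{22}^{-1} (y_2 - m_2)$$ [2]

$$CM(Y_1 \mid Y_2 = y_2) = S_{11} - S_{12} S_{22}^{-1} S_{21}$$ [3]

where $CM$ means covariance matrix.
The conditional expectation $\hat{Y}_1 = E(Y_1 \mid Y_2 = y_2)$ is an optimal predictor for $Y_1$ in the usual sense. In fact, if $Y_1(i)$ and $\hat{Y}_1(i)$ are the $i$-th elements of $Y_1$ and $\hat{Y}_1$ respectively, one gets

$$E\,[Y_1(i) - \hat{Y}_1(i)]^2 = \min$$ [4]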
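As an illustration of the result quoted from Anderson (1958), the following minimal sketch computes the conditional mean $m_1 + S_{12} S_{22}^{-1}(y_2 - m_2)$ and the conditional covariance $S_{11} - S_{12} S_{22}^{-1} S_{21}$ of a partitioned Normal vector. The function name `conditional_normal`, the index arguments and the numerical values are assumptions made for the example; they do not come from the paper.

```python
import numpy as np

def conditional_normal(m, S, idx1, idx2, y2):
    """Mean and covariance of Y1 given Y2 = y2 when Y ~ N(m, S).

    idx1 / idx2 are the positions of Y1 / Y2 inside Y (hypothetical helper).
    """
    m = np.asarray(m, dtype=float)
    S = np.asarray(S, dtype=float)
    m1, m2 = m[idx1], m[idx2]
    S11 = S[np.ix_(idx1, idx1)]
    S12 = S[np.ix_(idx1, idx2)]
    S22 = S[np.ix_(idx2, idx2)]
    # K = S12 S22^{-1}, obtained by solving a linear system instead of
    # inverting S22 explicitly (S22 is symmetric).
    K = np.linalg.solve(S22, S12.T).T
    cond_mean = m1 + K @ (np.asarray(y2, dtype=float) - m2)
    cond_cov = S11 - K @ S12.T
    return cond_mean, cond_cov

# Illustrative numbers only: two standardized variables with correlation 0.6.
m = [0.0, 0.0]
S = [[1.0, 0.6],
     [0.6, 1.0]]
mean, cov = conditional_normal(m, S, idx1=[0], idx2=[1], y2=[1.0])
print(mean, cov)
```

In this two-variable case the predictor is 0.6 times the observed value and the residual variance is 1 - 0.6^2, which is the familiar linear-regression result.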
It is also clear that, in Bayesian terms, the above result completely specifies the distribution of the states of nature, so that a predictor which is optimal with respect to a general loss function may be determined.

The series $\{Y_i(t)\}$ are both auto-correlated and cross-correlated up to a given maximum time lag $s$. Let us consider the Normal random vector:

$$Y(t^*) = (Y_1(t^*-s), Y_1(t^*-s+1), \ldots, Y_1(t^*+s), Y_2(t^*-s), Y_2(t^*-s+1), \ldots, Y_2(t^*+s), \ldots, Y_n(t^*-s), Y_n(t^*-s+1), \ldots, Y_n(t^*+s))'$$ [5]

Evidently the vector $Y(t^*)$ contains all the information stored in the series $\{Y_i(t)\}$, $i = 1, 2, \ldots, n$, about missing data at time $t^*$. Thus if the parameters, i.e. the expectation and the covariance matrix of $Y(t^*)$, are known, an optimal reconstruction of the missing data can be obtained, according to the theory of section 2. The estimation of the parameters is discussed in section 4.

Let us note, now, that the same theory can be used to predict at time $t^*$ the values of the multiple series at time $t^* + 1$. In this case the vector $Y$ collects the past values of the series together with the values at time $t^* + 1$ [6], and one will seek the conditional expectations of $Y_1(t^*+1), Y_2(t^*+1), \ldots, Y_n(t^*+1)$.

As to simulation, this is obtained by making, at any time step, a prediction on the basis of the values already simulated and adding to the predicted values a vector $\varepsilon$ of randomly generated residuals with mean 0 and a covariance matrix specified by formula [3]. Using a random number generator one obtains a vector $\eta$ with the identity as covariance matrix. An $\varepsilon$ having a desired covariance matrix $B$ is obtained as follows: $\varepsilon = W\eta$, where $W$ is such that $WW' = B$.
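The residual-generation step can be sketched as follows. The paper only requires a matrix whose product with its own transpose equals the desired covariance matrix; the Cholesky factor used here is one possible choice and is an assumption of this example, as are the names `B`, `W`, `eta`, `eps`, `predicted` and all numerical values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Desired residual covariance (illustrative, positive definite); in the
# method this would be the conditional covariance of formula [3].
B = np.array([[0.64, 0.20],
              [0.20, 0.50]])

# Lower-triangular W with W W' = B (Cholesky factor, assumed choice).
W = np.linalg.cholesky(B)

# eta has identity covariance; eps = W eta then has covariance B.
eta = rng.standard_normal(2)
eps = W @ eta

# One simulation step: conditional prediction plus random residual.
predicted = np.array([0.3, -0.1])   # hypothetical predicted values
simulated = predicted + eps
print(simulated)
```

Any other factorization satisfying the same identity (for example one based on an eigendecomposition) would serve equally well; the Cholesky factor is simply cheap to compute.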

THE ESTIMATION OF THE PARAMETERS
The parameters one needs to know are $\{E\,Y_i(t);\ i = 1, 2, \ldots, n;\ t \in I\}$ and $\{\mathrm{cov}(Y_i(t_1), Y_j(t_2));\ i, j = 1, 2, \ldots, n;\ t_1, t_2 \in I\}$. If a set of simultaneous observations at the measuring stations exists, these parameters can be estimated on the basis of certain assumptions concerning the expectation and covariance structure of the series.
The expectation of each series can be assumed to have annual periodicity so that the parameters are optimally estimated by the 12 monthly sample means.
As to the covariance structure, in general,

$$\mathrm{cov}(Y_i(t), Y_j(t+r)) = \gamma(i, j, t, r)$$ [7]

An assumption of stationarity:

$$\mathrm{cov}(Y_i(t), Y_j(t+r)) = \gamma(i, j, r)$$ [8]

is not realistic, since it has been shown (Chow 1972, Torelli 1973) that it fails when $i = j$, i.e. in the univariate case. In this case Torelli and Chow have shown that the stationarity hypothesis holds for the standardized series

$$Z_i(t) = \frac{Y_i(t) - E\,Y_i(t)}{\sqrt{\mathrm{var}(Y_i(t))}}$$ [9]

where $E\,Y_i(t)$ and $\mathrm{var}(Y_i(t))$ have annual periodicity and are estimated by the sample means and variances (*).
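The standardization by calendar month can be sketched as below: each value has its monthly sample mean subtracted and is divided by the monthly sample standard deviation. This is a minimal illustration assuming a complete record that starts in the first month of the cycle; the function names, the use of the unbiased variance (`ddof=1`) and the synthetic data are assumptions of the example, not details given in the paper.

```python
import numpy as np

def standardize_monthly(y):
    """Standardize log-flows y with 12 periodic sample means and std devs."""
    y = np.asarray(y, dtype=float)
    month = np.arange(y.size) % 12            # calendar-month index 0..11
    means = np.array([y[month == k].mean() for k in range(12)])
    stds = np.array([y[month == k].std(ddof=1) for k in range(12)])
    z = (y - means[month]) / stds[month]
    return z, means, stds

def destandardize(z, means, stds):
    """Map standardized values back to flows (inverse of log + standardize)."""
    z = np.asarray(z, dtype=float)
    month = np.arange(z.size) % 12
    return np.exp(z * stds[month] + means[month])

# Synthetic 5-year record of log-flows with a seasonal mean and variance.
rng = np.random.default_rng(1)
t = np.arange(60)
y = (2.0 + np.sin(2 * np.pi * t / 12)
     + (0.3 + 0.1 * np.cos(2 * np.pi * t / 12)) * rng.standard_normal(60))

z, means, stds = standardize_monthly(y)
x_back = destandardize(z, means, stds)        # flows on the original scale
```

By construction each calendar month of `z` has sample mean 0 and sample variance 1, so any remaining structure lies in the auto- and cross-covariances.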
In other terms, the above transformation eliminates seasonal effects in the covariance structure of the univariate series. It has been assumed by this writer that the above transformation eliminates seasonal effects in the cross covariances of the multivariate series as well, so that:

$$\mathrm{cov}(Z_i(t), Z_j(t+r)) = \gamma(i, j, r)$$ [10]

It will be noted that in general

$$\gamma(i, j, r) \neq \gamma(i, j, -r)$$ [11]

and that

$$\gamma(i, j, r) = \gamma(j, i, -r)$$ [12]
Given the advantages of the standardization, this transformation has been adopted. The $\gamma(i, j, r)$ are then consistently estimated by the proper sample covariances. It is then possible to reconstruct, making use of the results of section 2, the value of $Z_i(t^*)$. The reconstructed values of $Z_i(t^*)$ are easily transformed into the reconstructed values of $X_i(t^*)$.
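A possible way to estimate the lagged cross covariances from standardized series, and to assemble from them the covariance matrix of a lag vector ordered station by station over the times $t^*-s, \ldots, t^*+s$, is sketched below. The function names, the divisor (the number of overlapping pairs) and the synthetic series are assumptions of this example; the paper does not specify these details.

```python
import numpy as np

def lagged_cov(zi, zj, r):
    """Sample estimate of cov(Z_i(t), Z_j(t + r)) for standardized series."""
    zi = np.asarray(zi, dtype=float)
    zj = np.asarray(zj, dtype=float)
    n = zi.size
    if r >= 0:
        a, b = zi[: n - r], zj[r:]
    else:
        a, b = zi[-r:], zj[: n + r]
    # The series are assumed already standardized (zero mean), so the
    # products are averaged without further centring.
    return float(np.mean(a * b))

def lag_vector_cov(Z, s):
    """Covariance matrix of (Z_1(t-s..t+s), ..., Z_n(t-s..t+s))'.

    Z has shape (n_stations, n_times); entry ((i, a), (j, b)) is the
    estimated covariance between station i at offset a and station j at
    offset b, i.e. the lag (b - a) cross covariance.
    """
    n, w = Z.shape[0], 2 * s + 1
    S = np.empty((n * w, n * w))
    for i in range(n):
        for a in range(w):
            for j in range(n):
                for b in range(w):
                    S[i * w + a, j * w + b] = lagged_cov(Z[i], Z[j], b - a)
    return S

# Two synthetic, cross-correlated series (illustrative only).
rng = np.random.default_rng(2)
z1 = rng.standard_normal(240)
z2 = 0.5 * z1 + rng.standard_normal(240)
Z = np.vstack([z1, z2])

g_forward = lagged_cov(z1, z2, 1)    # station 1 leading station 2 by one step
g_backward = lagged_cov(z2, z1, -1)  # same quantity with the roles swapped
S = lag_vector_cov(Z, s=1)
```

The two estimates `g_forward` and `g_backward` coincide, reflecting the symmetry relation between a lag of one sign for one ordering of the stations and the opposite lag for the reverse ordering, and the assembled matrix `S` is symmetric. The partitioned-Normal result of section 2 can then be applied to `S` with the missing entries as the predicted block.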
(*) It must be noted that the same results obtained by Torelli and Chow on the Sangamon R. at Monticello, Ill., have been found by Torelli using the same tests on a number of streams in Italy. (See for instance Torelli, in press).

A CASE STUDY: EMILIA ROMAGNA
The method described in the previous sections has been tested in a study of the streams of the Emilia-Romagna Region of Italy. This study has been carried out in the framework of the Emilia-Romagna Regional Water Plan. It is based on the stream-flow data measured at the gauging stations of the Servizio Idrografico Italiano. These stations are 83 in number. The periods of measurement at the various stations vary from 1 year to 18 years. By the method presented in this paper it is possible to obtain for all the sections a complete record from 1921 up to date. Alternatively, simulated (synthetic) multiple series may be produced, which display the same statistical properties (expectation and covariance structure) as the measured series. The reconstructed series or the simulated series constitute the documentation on which water resource planning and hydraulic design may be based.
Reported here are the reconstructions of the data at stations 1, 2 and 4 (see Fig. 1), based on observations at station 3, in various periods. The results of the reconstruction are shown in Fig. 2, 3 and 4. The solid line refers to the observed values, the broken line to the reconstructed values and the dotted line to the standard deviation of the reconstruction. The values at time $t^*$ are reconstructed using the observations at times $t^*-1$, $t^*$ and $t^*+1$ at the guide station. Fig. 5 reports the duration curves of the observed and reconstructed series. Using the observations at times $t^*-2$ and $t^*+2$ as well does not seem to bring much improvement, given the low value of the lag 2 covariances. The mean square errors of the reconstruction at stations 1, 2 and 4 are 14.03, 0.51 and 136.48, respectively. If one considers that the mean square deviations from the monthly mean are 63.09, 1.56 and 679.37, respectively, one may conclude that about 75% of the variance of the processes is accounted for (roughly 78%, 67% and 80% at the three stations).

Fig. 6 shows the predicted values at station 2 based on past observations at stations 1, 2, 3 and 4. The solid and broken lines have the same meaning as before. The poor quality of the prediction is due to the poor lag 1 correlation of the series. In fact $\gamma(i, j, 1)$ is about 0.3. This is due to the predominantly impermeable nature of the ground. However, the predictor works considerably better than the monthly means (dotted line). The covariances at lags -1, 0 and 1 between the standardized values at stations 1, 2, 3 and 4 and the standardized value at station 3 are shown in Table 1.

The Author wishes to thank IDEO. Ser. and IDROTECNECO for the permission to publish this article.