Application of an ensemble earthquake rate model in Italy, considering seismic catalogs and fault moment release

We develop an ensemble earthquake rate model that provides spatially variable time-independent (Poisson) long-term annual occurrence rates of seismic events throughout Italy, for magnitude bin of 0.1 units from M w ≥ 4.5 in spatial cells of 0.1° × 0.1°. We weighed seismic activity rates of smoothed seismicity and fault-based inputs to build our earthquake rupture forecast model, merging it into a single ensemble model. Both inputs adopt a tapered Gutenberg-Richter relation with a single b -value and a single corner magnitude estimated by earthquakes catalog. The spatial smoothed seismicity was obtained using the classical kernel smoothing method with the inclusion of magnitude dependent completeness periods applied to the Historical (CPTI15) and Instrumental seismic catalogs. For each seismogenic source provided by the Database of the Individual Seismogenic Sources (DISS), we computed the annual rate of the events above M w 4.5, assuming that the seismic moments of the earthquakes generated by each fault are distributed according to the tapered Gutenberg - Richter relation with the same parameters of the smoothed seismicity models. Comparing seismic annual rates of the catalogs with those of the seismogenic sources, we realized that there is a good agreement between these rates in Central Apennines zones, whereas the seismogenic rates are higher than those of the catalogs in the north east and south of Italy. We also tested our model against the strong Italian earthquakes ( M w 5.5+), in order to check if the total number ( N -test) and the spatial distribution ( S -test) of these events was compatible with our model, obtaining good results, i.e. high p -values in the test. The final model will be a branch of the new Italian seismic hazard map.


Introduction
Probabilistic Seismic Hazard Analysis (PSHA), quantifying the likelihood that ground shaking will exceed certain interesting engineering quantity in a given period of time, provides to society important information on construction standards for risk mitigation. The PSHA models are largely based on the methods of Cornell [1968].
For PSHA, these earthquake rate models are combined with Ground Motion Prediction Equations (GMPEs) to compute the hazard curve for a specific ground motion intensity measure over a given period of time.
In recent years, especially in the most earthquake-prone areas of the world, including, for example, California, Europe, and Italy, a growing number of seismic hazard models have been based on information from tectonics and active faulting The PSHA models currently applied in California [Field et al., 2014] combine fault-based Earthquake Rupture Forecasts (ERFs) with GMPEs to estimate the probability of exceeding a specified shaking intensity, such as PGA (Peak Ground Acceleration). However, empirical ERFs rely on many uncertain assumptions and are difficult to test observationally, due to the long recurrence times of large earthquakes. For the California region, Hiemer et al. [2013] built an alternative stochastic earthquake source model by applying the kernel density estimation technique to both past seismicity and fault moment release, with the latter estimated on the basis of slip rates on mapped active fault structures.
Subsequently, Hiemer et al. [2014] implemented this approach on a European scale for harmonizing seismic hazard, as a result of the large-scale community effort made within the European Union project SHARE [Seismic Hazard hARmonization in Europe, Woessner et al., 2015]. They used the European Database of Seismogenic Faults [EDSF; Basili et al., 2013], which includes both crustal faults and subduction zones of the Calabrian, Hellenic and Cyprus arcs. They also used an earthquake catalog that combines macroseismic data and instrumental seismological data for the period 1000-2006. For PSHA in Italy, Valentini et al. [2017] built a seismic model integrating the distributed seismicity model with the fault source model.
Similarly, to this last work, we developed a model that extends the classical kernel-smoothing method, used for spatial event distribution [Frankel, 1995] with correction for time-varying completeness magnitude [Hiemer et al., 2014]. The model also includes geological information: the annual moment rate on the mapped faults based on their deformation rates. A spatial uniform distribution of the b-value and the corner magnitude was assumed for the whole Italian country. We called our approach, that merges two different seismicity models in one single model, "ensemble model" [Marzocchi et al., 2012].
The ensemble model allows to merge, in an appropriate way, the seismic rates obtained from three different databases into a single source model in order to cover the entire Italian area, as shown in the following sections.
Finally, we tested our model using the statistical tests for the total number of forecasted events (N-test) and their spatial distribution (S-test) to verify their its compatibility with the Italian seismic catalog .
The model presented in this study was developed with the aim of being applied as a branch of the logic tree to be used for the new Italian seismic hazard map (called the MPS19 project) [Meletti et al., 2019b].

Input Data
Three different seismicity datasets are considered in order to cover much of Italy and to provide a longer time span: Italian Parametric Earthquake Catalog (CPTI15) that spans from 1000 to 2014, Instrumental seismic data (from 1981 to April 30, 2017) [Gasperini et al., 2013;Lolli et al., 2014 and2015] and Database of Individual Seismogenic Sources version 3.2.1 [DISS Working Group, 2018]. We excluded from our analysis a portion of the CPTI15 catalog, spanning from 1981 to 2014, as for the following period (1981( -April 30, 2017 we used the Instrumental seismic catalog joining them together. The latter catalog was used for its lower completeness magnitude as explained in the next section. In the analysis, only the shallow seismicity within the area called "areaCPTI15" (Figure 1 and Figure 2] was considered. A depth <= 30 km is selected for both seismic catalogs, as in this depth range there have been the most common damaging earthquakes in Italy. We excluded the Etna volcanic zone and the earthquakes of the Southern Tyrrhenian subduction as treated by two others working groups of the MPS19 project. These two models will also be branches of the logic tree to be used for updating the new hazard map.

Seismic Catalogs
In this work we combine two different catalogs to obtain a longer and a more complete catalog.
The first one is the updated and revised version of the Parametric Catalog of Italian Earthquakes [CPTI15, release 1.5, Rovida et al., 2016], released for the participants of the MPS19 project together with the Italian Macroseismic  Albini et al., 2001 andAlbini et al., 2002] and Statistical Analysis of the Completeness (SAC) [Albarello et al., 2001]. Two results are therefore obtained, depending on the chosen completeness criterion. These time-intervals of completeness have been computed for the six Italian sub-regions analyzed in the study and shown in Table S1a,b of the Supporting Material [Meletti et al., 2019a]; where it is possible to find all information regarding the completeness time intervals for each of the two sets (section S1).
The second dataset is an Instrumental catalog that covers the period 1981 -30 April 2017, and includes small events with completeness magnitude of M w 3.0 (furnished by the authors of the catalog) that allows a more uniform spatial coverage compared to the CPTI15 database. The use of smaller events can improve forecasting performance [Helmstetter et al., 2007;Schorlemmer et al., 2010]. Both the CPTI15 and the Instrumental catalog use the moment magnitude M w , some of these obtained by conversion from macroseismic intensities (CPTI15) and some others from local magnitude, M L [Instrumental catalog, Gasperini et al., 2013]. For the purposes of our PSHA, both catalogs (CPTI15 and Instrumental) were declustered with the Gardner and Knopoff method [1974] and the events relative to the Etna area were removed (the Etna volcano area was treated differently for the new Italian map of seismic hazard).
The use of alternative completeness criteria (HAC and SAC) leads to two different final catalogs, named through the manuscript with two acronyms as CAT_H, for the CPTI15 catalog according to HAC + Instrumental Catalog 3 Ensemble earthquake rate model in Italy Figure 1. Seismicity map obtained by combining the declustered CPTI15 and Instrumental catalogs. In this selection of seismic data, the CPTI15 according to HAC (1,267 events from 1000 to 1980, depth <=30 km) was considered. The CPTI15 is shown with blue and black colors (blue dots and black squares indicate earthquakes from M w 3.845+ and from M w 5.5+, respectively). The Instrumental Catalog (1981-April 30, 2017) (2,560 seismic events with M w 3.0+ and depth <=30 km) is indicated with green and red colors (green dots and red squares indicate earthquakes from M w 3.0+ and from M w 5.5+, respectively). The black polygon shows the investigated area, called the "areaCPTI15".

The Italian Seismogenic Sources Database
The Italian Seismogenic Sources Database version 3.2.1 [DISS Working Group, 2018] is a georeferenced 3D repository of tectonic faults, identified through geological, geophysical, geodetic, geomorphological, seismological data and macroseismic intensity investigations [Basili et al., 2008].
In our study, we will focus on 124 Individual Seismogenic Sources (ISS) from the DISS Database that are characterized by a full set of geometric (strike, dip, length, width and depth), kinematic (rake) and seismological parameters (single event displacement, magnitude, slip rate, recurrence interval) ( Figure 2). The following information is considered for each fault: location, dimension, depth and slip rate.
The slip rate (mm/year) is a parameter with the highest degree of uncertainty for most sources reported in DISS.
For our analysis, since there is no evidence for a preferred slip rate value (between the minimum and the maximum reported in DISS), we used the mean slip rate.

Smoothed seismicity rate model
Our model starts from a minimum magnitude M w 4.5, for magnitude bin of 0.1 units in spatial cells of 0.1° x 0.1°, because in the Italian region the earthquakes with magnitude smaller than M w 4.5 have a negligible contribution in PSHA; the only exception is the Ischia island, which is treated in a different manner in the context of MPS19 project [Meletti et al., 2019b].
To estimate the spatial distribution of seismicity we adopt a smoothed seismicity approach similar to that of Hiemer et al. [2014] but based on an isotropic Gaussian kernel [Frankel, 1995].

Ensemble earthquake rate model in Italy
The spatial density in each grid cell, R i , is defined as follows: (1) where N is the total number of events in the catalog and is the weight of the smoothed seismicity with a Gaussian kernel and the correction for the values of the different completeness magnitudes [Hiemer et al., 2014], defined by: (2) where: dist i,j is the distance between the centre of i-th cell and the j-th event; is the smoothing (or correlation) distance; the b parameter is the b-value of the Gutenberg-Richter (G-R) law [1944] and will be successively described; M c,j is the magnitude of completeness, T c is the length of the corresponding completeness period and M c-min is the lowest among the completeness magnitudes (M w 3.0 in our case). The correction in the last part of equation (2), , allows the temporal and spatial variability of the completeness magnitude in a seismic catalog to be taken into account appropriately (e.g. for longer historical catalogs), while the classical method of Frankel [1995] allows only a constant completeness magnitude (e.g. for shorter instrumental catalogs).
For each spatial cell, we normalized the R i values through the equation: ( 3) where R represents the final density of the spatial model and is the total number of cells. We also consider, for the spatial cell of 0.1° x 0.1°, the difference in km of the longitude over the latitude (N-S) span of Italy, which is a function of cos(latitude).
The final density obtained in each spatial cell by using equations (1), (2) and (3) is the bivariate PDF of the seismicity; i.e. the spatial distribution of seismicity without considering the total number of events or the magnitude distribution of events.
In order to assign a minimum value to all areas of low or null seismicity, where it is not excluded that a future earthquake may occur, we also distributed on all cells a value corresponding to 1% of the total, divided by the total number of cells (surprise coefficient), as in Kagan and Jackson [2000].
To estimate the only free parameter we use the classical Maximum Likelihood Estimation (MLE) approach. As we use a long-time catalog with time-varying completeness magnitude, we consider odd and even events as training and testing dataset (instead of the classical division in the first-half and second-half of the catalog) to have a uniform (with respect to time) distribution of events. We compute the log-likelihood of the spatial model given the events in the testing dataset, assuming a Poisson distribution for the number of events, for a set of possible smoothing distances (0 to 100 km) and then we take the maximum value obtained [Hiemer et al., 2014]. We perform two different MLEs, with odd events as training and even events as testing, and vice versa, after which we take as final value the mean value of the two values obtained. The smoothing correlation distance from CPTI15 + Instrumental catalog, according to HAC, is equal to 20.5 km (the value obtained for SAC is equal to 23 km). We use a minimum magnitude M w 3.0 for the estimation of the spatial distribution, on the contrary we use a minimum magnitude M w 4.5 for the estimation of the magnitude frequency distribution.
The rate value obtained from the spatial smoothed seismicity is scaled in magnitude by using a tapered Pareto G-R distribution [Kagan and Jackson, 2000;Kagan, 2002], which has a cumulative distribution function (CDF) ( ) equal to: where is the scalar seismic moment, measured in Newton·m (Nm), is the threshold seismic moment for the catalog completeness, is the parameter that controls the distribution in the upper ranges of M ("upper corner ( ) = 1 -moment", or "corner magnitude" for M w ) and represents the slope of the distribution [b-value = ³/₂ , formula by Bird and Kagan, 2004]. The considered tapered Pareto G-R distribution is characterized by its gradual rather than steep decrease in frequency at the upper cut-off moment and provides a better fit with both fundamental seismological principles and earthquake catalogs [Kagan, 2002;Bird and Kagan, 2004]. We prefer to use this tapered version of the G-R instead of the classical truncated G-R relation also because the maximum magnitude is one of the most difficult and problematic parameters to estimate [Zӧller and Holschneider, 2016]. In our estimation we assume that the parameters of the tapered G-R are spatially uniform, i.e. an estimation on the whole Italian region holds in each 0.1° x 0.1° spatial cell, and we also assume that the Italian seismic catalog is long enough to constrain these parameters [Akinci et al., 2018for Italy, Helmstetter et al., 2007. Because this distribution is valid for the scalar seismic moment, we converted the moment magnitude, M w , with the Hanks and Kanamori [1979] formula, in terms of Nm, to estimate the scalar seismic moment, M: = 10 1.5 + 9.05 (5) At the same time, we estimate the parameter, the corner seismic moment and the annual rate with the Weichert [1980] method, considering a tapered Pareto Gutenberg-Richter distribution, and applying this method with the maximum likelihood approach to the declustered Italian combined catalog (CPTI15 + Instrumental), starting from magnitude M w 4.5 and using a uniform temporal completeness for all the completeness sub-regions (except for "Outside zone", which has a temporal completeness too short compared to the other zones, see Table   S1a,b and Figure S1 in the Supporting Material).
We obtain, for the declustered catalog with the modified Wiechert method considering HAC, an annual seismic rate of 5.68 events with M w ≥ 4.5, a b-value equal to 0.99 ( = 0.66) and a corner magnitude equal to M w 7.3: in the Appendix A we show more mathematical details and the estimation of the uncertainty associated with these parameters. Maura Murru et al.
As far as the spatial distribution of events between HAC and SAC approach is concerned, it can be noted that even if the overall spatial distributions are quite similar, some differences are visible in particular areas. To illustrate the discrepancy in these areas, in Figure 4 we map the difference between the two models that use different completeness approaches. The color scale limits range from -1 and 1, that is between -100% and 100%. For example, in the Eastern Po Plain and Northern Sardinia, the cumulative annual rate for HAC is significantly higher, by between 40% and 80% (shown in orange and red), than the one for SAC. The opposite is the case for example in the Northern Adriatic Sea where the HAC rate is about 60% lower than the SAC rate (indicated in blue on the map in Figure 4).
The spatial difference evidenced between the two maps in Figure 4 is mainly due to the smoothing approach that we adopted [Hiemer et al., 2014]: the smoothing kernel in equation (2) weights the events according to their range of completeness. Adopting HAC or SAC can change the number of the strongest events in the catalog. As a result, the spatial distribution of the smoothed seismicity models according to HAC or SAC can change significantly in the epicentral area of these strongest events. This is a critical factor for all the smoothed seismicity models based on historical catalogs. Conversely the total number of estimated events using the two approaches, HAC or SAC (respectively 5.68 and 6.02 per year) have minor influence on the smoothed seismicity model.

Seismogenic source rate model
We assume that seismic moments (M) of all earthquakes are distributed according to the tapered Pareto distribution (see equation (4) in the text) [Kagan and Jackson, 2000, equation (10); Kagan, 2002, equation (11)] also for the seismogenic sources. In the computation of the slip rate it is not possible to distinguish the contribution to the slip rate given by mainshocks, aftershocks or foreshocks. The slip rate can be calculated from geodetic measurements or geological observations and in both cases it is linked to the cumulative effects of all earthquakes.
We assume that this cumulative effect can be associated with a complete (i.e. not declustered) seismic catalog. The parameters of the tapered Pareto distribution for all seismic sources are computed with the same Weichert method used in the previous paragraph, but considering the complete (undeclustered) catalogs. We assume that the parameters estimated for the whole Italian region hold for each singular seismogenic source. Differently from the classical PSHA studies, the corner magnitude of each seismogenic source does not depend on the fault area, but is the same for all sources, independently from their dimension. We made this strong assumption to avoid dangerous underestimation of the corner magnitude: if an earthquake brakes two or more adjacent segments the magnitude of the event will be greater than the maximum earthquake allowed for the single segment [e.g. the 2016 Kaikoura, New Zealand, earthquake breaks more than 10 adjacent segments, Ulrich et al., 2019].
The annual long-term tectonic moment rate, M ta , for each seismogenic source is given by the annual number of events with moment greater than zero(D) multiplied by average seismic moment: then D is equal to: where ∫ 0 is equal average seismic moment, is the Probability Density Function (PDF) and M ta is determined for each source by:

Ensemble earthquake rate model in Italy
where μ is the shear modulus (shear stress/shear strain), assumed to be equal to 30 GPa. The total moment rate obtained from all the seismogenic sources is 3,45E+17 Nm/yr.
In our analysis we assume, in line with the calculation of the 2013 European Seismic Hazard Map [e.g. Woessner et al., 2015], the full seismic coupling over the entire study region, i.e. that the whole tectonic and geodetic moment goes into seismic moment. This also entails that any tectonic deformation above the brittle/ductile transition results in earthquake release and not in aseismic frictional sliding. This can result in a possible overestimation of the earthquake rates in some seismogenic sources. Only some works referring to Italian certain areas (Pollino seismic gap, Southern Italy, and Altotiberina normal fault in the northern Apennines) exist with a documented aseismic movement [Cheloni et al., 2017;Gulandi et al., 2017].
In tectonic areas undergoing compression the aseismic movement is not present [Bird and Kagan, 2004;Bird et al., 2009;Howe and Bird, 2010].
To determine the value of the D, we need to solve the integral of equation (7), which represents the scalar seismic moment released by earthquakes; the resolution of which after several mathematical steps is: where is the minimum seismic moment, is the upper corner moment and Γ is the Gamma function [Kagan, 2002].
The mathematical development of this integral is described in the Supporting Material as section S4.
The obtained seismic rate value for each seismogenic source and magnitude bin is subdivided between the number of spatial cells incorporating the seismogenic source.
The cumulative undeclustered (complete) annual rate for the DISS Database, considering the average slip rate, is 4.85 events for M w 4.5+, excluding seismogenic sources that do not fall within the Italian territory. Since the process must be Poissonian for PSHA purpose, the undeclustered rates obtained for each seismogenic source for each magnitude bin starting from M w 4.5+/-0.05 (the initial value for our seismic model), were corrected using the method proposed by Marzocchi and Taroni [2014] on declustering in PSHA. Our aim here is moving from a complete (undeclustered) distribution to a declustered distribution: so we need to change the seismic annual rate and the b-value value of the distribution, because the declustered catalogs have both a lower annual rate and a lower b-value respect to the complete ones.
As suggested by Marzocchi and Taroni [2014], we modify the tapered Pareto G-R distribution by multiplying the annual rate by a factor γ that represents the percentage of mainshocks over the total number of events, and we adopt a b-value estimated by the declustered catalog. To estimate the factor γ we prefer to use the Instrumental Italian catalog, for its greater reliability in the magnitude completeness. We obtain γ = 0.576 due to 160 mainshocks out of 278 total events for the entire catalog with a magnitude 4.5 M w , using the Gardner and Knopoff [1974] declustering algorithm. We highlight that in this work we use the Marzocchi and Taroni [2014] approach, but in a different way: we move from a complete to a declustered distribution, so at the end we obtain smaller seismic annual rate for each magnitude bin.
The plot in Figure 5 represents the incremental annual rate for all seismogenic sources with respect to magnitude, both for the complete and declustered distribution. The b-value estimated by the Weichert method applied to the complete combined (CPTI15 + Instrumental) catalog is 1.05, on the contrary by applying the same method to the declustered combined catalog we obtain a b-value equal to 0.99. The corner magnitude is 7.3 for both catalogs.
Looking at Figure 5, is it possible to note that the complete distribution (blue curve) is approaching the declustered distribution (red curve) in the right part of the plot: this happens because the strongest events are almost all mainshocks, and therefore in the tails there is no difference between the complete and the declustered distributions.

Ensemble Models
We have built six different ensemble models. We named our models "ensemble models" because they consider different combinations between smoothed seismicity rates (subsection 3.1), both for historical and statistical analysis of the completeness, and annual rates obtained from the seismogenic sources version 3.

[DISS Working
Group, 2018] (subsection 3.2). Then, each one of the ensemble models is a combination of two different sub-models: the smoothed seismicity model and the seismogenic source model.
The annual rate value for each ensemble model, obtained for each geographical cell of 0.1° x 0.1° and for each magnitude bin starting from M w 4.45 (the first magnitude bin, M w 4.5, starts from M w 4.45), was determined as follows: 1) If the cell is outside any seismogenic source, only the rate obtained from the earthquake catalog, calculated as described previously in subsection 3.1, is considered.
2) If the cell is entirely within one of the seismogenic sources, also the seismogenic source rate is considered as described in subsection 3.2. This value is averaged with the smoothed seismicity rate of the same cell (subsection 3.1), considering three different weighing schemes: 30%-70%, 50%-50%, 70%-30% (the first percentage always refers to the seismogenic source rate, the second to the smoothed seismicity rate).
3) If the cell is partially within one or more seismogenic sources, the rate for the part of the cell area covered by each seismogenic sources is computed as in (2), the remaining part as in (1). Here we assume that the seismogenic sources do not intersect each other.
In our case, where we have one model that covers all the territory and another model that covers only a small part of the space, the ensemble approach is particularly useful because it provides seismic information on the territory with not mapped seismogenic sources. In this case, the information is furnished by the model that covers all the space and uses the seismic catalog through the application of a smoothing technique [Danciu et al, 2017].
We chose the 30%-70%, 50%-50%, 70%-30% combinations not knowing which of the two approaches (smoothed seismicity or seismogenic source rate) would provide better forecast. Therefore, these arbitrary choices were adopted to manage the uncertainty relative the best combination of the two models to use. An objective weighing scheme (e.g. based on the log-likelihood of the past seismic events for models that compose the ensemble, as in Marzocchi et al. [2012]) in our case is hard to apply, since one of the model that compose the ensemble do not cover the entire territory, but only the zones where seismogenic sources are available.
The ensemble approach also has the effect of smoothing the highest values of the seismogenic source rate (values concentrated in small regions) with respect to those due to the widespread smoothed seismicity.
The six ensemble models were obtained using the different percentages set for each rate contribution (Table 1). The first three models (Ensemble 1-2-3) adopted the Historical criteria (HAC), the second three (Ensemble 4-5-6) Statistical criteria (SAC). In this way we take into account the epistemic uncertainty relative to the two different criteria of completeness (HAC and SAC) and to the composition of the two types of rates due to the seismogenic sources and catalog.
Additional results, concerning models that use SAC for CPTI15, are provided in the Supporting Material. Regarding the spatial distribution of the models, in Figure 7 it is shown the Ensemble model obtained from the contribution of 50% (seismogenic source rate) and 50% (seismicity rate), considering the CAT_H catalog. Figure 7a represents the log 10 of the expected cumulative annual seismicity rate (M w ≥ 4.45) in each cell of 0.1° x 0.1°, while Figure 7b shows the same model, but in a linear scale.
To better highlight the effect of these choices, we show in Figure 8 the relative difference between the ensemble models with 70%-30% and 30%-70% (percentage referred to smoothed seismicity and rates of seismogenic sources).
The differences are visible only in the cells affected by the individual fault sources. It can be clearly noted that the 70%-30% ensemble model exhibits higher rates for most cells including the seismogenic sources. In fact, these cells (positive values highlighted in orange and red) exhibit a higher rate than those including only the smoothed seismicity. From Figure 8 it is possible to note that there is a good agreement between smoothed seismicity rates and individual source rates in Central Apennines zones (colors from yellow to light orange), whereas the seismogenic rates are higher than the ones from the catalog in North-East and Southern part of Italy (red color).
In the analysis of this comparison, it is important to remember that in our study the long-term tectonic moment rate, released annually by each individual source, is proportional to the average annual slip rate.
Regarding the magnitude frequency distribution, Figure 9 shows for the whole Italian territory the cumulative annual number of events (in a log 10 scale) for the models Ensemble 1-2-3 compared with respect to the cumulative rate of the real seismicity (1000-2017) that considers the CAT_H catalog.
The three ensemble models show a cumulative annual rate similar to the observed annual rate up to about 5.2, and then overestimate the number of events observed from 5.3 to 6.5. This difference can be justified by the different completeness approaches: the observed annual rates are obtained by summing up the rates in each individual subregion of completeness, while the models use the estimation made with the Weichert method applied to a uniform temporal completeness for all the completeness sub-regions (as explained in section 3.1). The uniform temporal completeness is obtained taking for each magnitude threshold the most recent year, then is less affected by possible incompleteness in the seismic catalog, therefore we prefer to anchor our model to this type of completeness (see Figure 1A in Appendix A for details).
Finally, we performed a sanity check to test the robustness of our ensemble models. To perform these tests, we used the mean ensemble model, obtained averaging the six ensemble model taking into account their relative weights. We checked both if the number and the spatial distribution of observed events is compatible with our model by using the Ntest and S-test ; these tests was already used to assess the performance of short and medium term earthquake forecasting models in Italy . We tested our model against the Italian earthquakes with. M w ≥ 5.5, obtaining for N-test a p-value=0.15 and for the S-test a p-value=0.99. These high p-values indicate the consistency of our model with real observations. We outline that we call this assessment a "sanity check" because the data used for the tests are the same used to build the model, then is not the gold standard of testing, the prospective test with independent data, but is still and important part for the construction of a robust seismicity model [Meletti et al., 2019b].

Discussion and conclusions
We developed a time-independent model that estimates the spatial distribution of the annual seismicity rate (M w ≥ 4.5) by combining the information contained in a seismic catalog (time, latitude, longitude, depth, magnitude) with the information from a seismogenic source database. This model, which merges various types of information, is more robust than a seismogenic source only based model and a model based only on earthquake catalog data.  [Gasperini et al., 2013]. We used both historical and instrumental catalogs considering their space and time magnitude completeness. The spatial smoothed seismicity was obtained applying to the seismic catalogs the smoothing method with completeness magnitude correction. We propose the tapered Gutenberg-Richter frequency-magnitude distribution that fits the long-term seismicity better than other distributions [Bird and Kagan, 2004], because it is computationally simple and manageable, and its parameters are less correlated than those of other distributions [Kagan, 2002]. Our declustered models assume a spatial uniform distribution of the b-value [coherently with Bird and Kagan, 2004], with a value of b=0.99 and a single value of corner magnitude ( . Relative difference in percentage between the CPTI15 historical completeness 70% (fault rate) -30% (seismicity rate) and the CPTI15 historical completeness 30% (fault rate) -70% (seismicity rate) ensemble models, compared to the historical completeness 70% (fault rate) -30% (seismicity rate) ensemble model.

Figure 9.
Cumulative annual rates on a logarithmic scale, for the three ensemble models that consider the CAT_H catalog vs annual rate obtained from the seismicity catalog adopted in this study (CPTI15 + Instrumental Catalog) (shown with red dots). The blue, brown and green lines refer to the models obtained from the contributions of 30% (fault rate) and 70% (seismicity rate), 50% (fault rate) and 50% (seismicity rate), 70% (fault rate) and 30% (seismicity rate), respectively.

Ensemble earthquake rate model in Italy
The use of a single "corner magnitude" for all zones of Italy could appear to be a rough approximation, implying the possible occurrence of very large magnitude earthquakes that might not be compatible with the geological and tectonic setting of the different areas. However, Zӧller and Holschneider [2016], showed how an estimation of a corner magnitude with a small regional catalog is difficult and always with an open bounded confidence interval (i.e. from a statistical point of view is not possible to clearly discriminate a tapered G-R and an unbounded G-R). So we prefer to aggregate all the zones to have a better statistical constraint. We just want to outline that a corner magnitude of 7.3 is not a hard bound, like the maximum magnitude, and with the tapered distribution is possible to have events with magnitude larger than 7.3.
The use of seismogenic source slip rates, which are based on long-term geological information, allows us to reduce the problems associated with the short time length of the available earthquake catalogs. On the other hand, where no seismogenic source is present, the area is not considered aseismic, but the seismic catalogs are used to estimate the smoothed seismicity.
In our models, where there is a seismogenic source, the final annual rate is given by assigning different weights to the seismogenic source rate and the smoothed seismicity rate, in order to combine the values of the seismogenic source rate (values concentrated in small regions) with those due to the widespread seismicity observed. It is important to note that the use of two different approaches for the analysis of the completeness of seismic catalog, historical and statistical [Meletti et al., 2019b], achieves spatially different results: in fact, the use of one or the other criterion creates strong spatial changes, while the total number of events of the two different types of completeness maintain similar values. Considering the three different combinations of the seismogenic source rate and seismic catalog rate, we obtained a total of six different ensemble models.
Comparing the results with different combinations of the seismogenic source rate and seismic catalog rate, we note that the seismogenic sources rates are mostly higher than those derived from the catalogs in the same cells.
A problem for a seismogenic source-based model may be due to the uncertainties of the parameters considered (assumption on the slip rate, on the aseismic component, on the exact dimension of the fault) which may become important in a probabilistic approach. In our joint model this problem has a minor impact, as it is integrated with the rates derived from historical and instrumental catalogs.
If we compare our methodology with that adopted by Hiemer et al. [2014], we can see that there are similarities between our approaches, both combining information from a seismic catalog and from the seismogenic sources. In fact, both methods only use the smoothed catalog in areas where seismogenic sources are not detected. However, Hiemer at al.
[2014] adopted a method of magnitude-dependent weighting, so that the contribution from the seismogenic source-based density linearly increases from 0 to 1 with increasing magnitude (in the magnitude range of 6.5 ≤ M w ≤ 8.0). We opted for the combination of the two contributions by a fixed weighting scheme, with values ranging from 0.3 to 0.7 regardless of magnitudes, to take account of the epistemic uncertainty associated with these arbitrary choices.
Our model also considers a new updated earthquake dataset; whose time coverage extends up to April 2017. The parameters from macroseismic data have been recalibrated and new conversion relationships were used for the instrumental magnitudes [e.g. Gasperini et al., 2013]. We expanded the Weichert [1980] method to estimate the annual rate, the b-value and the corner magnitude of the tapered Gutenberg-Richter distribution with respect to the truncated G-R distribution used in the SHARE project (www.share-eu.org). Differently from the UCERF3 model [Field et al., 2014], where both the aseismicity factor and the coupling coefficient are taking into account for most of the Californian faults, our time-independent model assumes that the seismic coupling with seismogenic sources is equal to 1. Our assumption is coherent both with the calculation of the 2013 European Seismic Hazard Map [e.g. Woessner et al., 2015] and also with the Italian seismic model of Valentini et al. [2017]. This assumption implies the hypothesis that all tectonic deformation above the brittle/ductile transition results in earthquakes release and not partially in aseismic frictional sliding. This can result in a possible overestimation of the earthquake rates in some seismogenic sources. Recently, some papers about seismic coupling were published for the Italian region by Carafa et al. [2017], Carafa et al. [2018], Nijholt et al. [2018]. The results produced by Carafa et al. [2017Carafa et al. [ , 2018 refer to a regional seismic coupling, but in the studied case we need knowledge of the seismic coupling on each single fault.
Moreover, we cannot even make use of the results produced by Nijholt et al. [2018] because they only represented a mechanical model for Southern Italy that has different boundary characteristics from the DISS sources.
Furthermore, the authors underlined that for seismic coupling they need further analysis because the implication of a non-seismogenic subduction interface is highly relevant. Finally, we performed a sanity check to test the robustness of our model against real observations, obtaining good results both for the total number and the spatial distribution of the earthquakes.
Summing up, we compiled a national seismic hazard map merging the information of the fault database with that of the seismic catalogs. Our six ensemble models will be incorporated in the logic tree of the new seismic hazard model for the Italian region [Meletti et al., 2019 b]. We included the models in the online Supplementing Material, in order to make both our study reproducible and to facilitate the comparison with other Italian seismic models