VIGIL: A Python tool for automatized probabilistic VolcanIc Gas dIspersion modeLling

Probabilistic volcanic hazard assessment is a standard methodology based on running a deter‑ ministic hazard quantification tool multiple times to explore the full range of uncertainty in the input parameters and boundary conditions, in order to probabilistically quantify the variability of outputs accounting for such uncertainties. Nowadays, different volcanic hazards are quantified by means of this approach. Among these, volcanic gas emission is particularly relevant given the threat posed to human health if concentrations and exposure times exceed certain thresholds. There are different types of gas emissions but two main scenarios can be recognized: hot buoyant gas emissions from fumaroles and the ground and dense gas emissions feeding density currents that can occur, e.g


Introduction
The injection of volcanic gases in the atmosphere can be very hazardous to life and global environment. While the impacts from large volcanic eruptions have been widely studied [e.g. Grattan and Pyatt, 1994;Parker et al., 1996;Self, 2006], the lower atmospheric boundary layer gas dispersion from persistently degassing volcanoes or particular tectonic areas also deserves attention since it can have an important effect on the regional and global environment [e.g., Williams-Jones and Rymer, 2015]. Volcanic gas hazard is mainly related to the reaching of hazardous concentration levels in the air of some species that are dangerous for human health and life for relatively long-time exposures (e.g. CO 2 , H 2 S and SO 2 ) or even a mixture of such species [e.g., IVHHN, 2005]. Figure 1a shows the range of gas hazards and the scale of their impacts [Edmonds et al., 2018]. Long-term exposure to volcanic gas, aerosol and particulate matter can produce acidification of soil, pollution of water, impinge on livestock and fisheries [Cronin and Sharp, 2002]. Volcanic gases may be passively transported by wind (e.g. fumaroles; Figure 1b) or may accumulate far from their source flowing down valleys as a gravity flow, engulfing and asphyxiating people [e.g. Costa and Macedonio, 2016;Folch et al., 2017;Figure 1c]. For these reasons, the evaluation of gas hazard should be part of any multi-hazard evaluation in volcanic environments [e.g. Selva et al., 2020]. which, after a large limnic eruption, flooded the surrounding area with gases denser than air (from Wikipedia; A tool to quantify volcanic gas hazard 3 The dispersion of gases generated from volcanoes and Earth degassing can be simulated by solving the complete set of conservation equations for mass, momentum, and energy. However, this approach is computationally expensive, since it requires three-dimensional computational fluid dynamics models [e.g. Cortis and Oldenburg, 2009;Macedonio and Costa, 2002]. In a probabilistic hazard context requiring numerous simulations, simplified gas transport models are commonly used to strongly reduce the computational cost [e.g., by using single species of gas, assuming the gas as incompressible; Costa and Macedonio, 2016]. Numerical modelling of both heavy [e.g. Burton et al. 2016;Costa et al., 2008;Folch et al., 2009Folch et al., , 2017 and light gases [e.g. Chiodini et al., 2010;Costa et al., 2005;Granieri et al., 2013;Pedone et al., 2017] accounts for topographic effects, variation of atmospheric conditions and wind directions. For a gas denser than air, the flow behaviour over complex topography is generally described using depth-averaged variables [shallow layer approach, e.g. Hankin and Britter, 1999], while for dispersion of a diluted gas passively driven by wind advection and atmospheric turbulence, simpler advection-diffusion equations are considered [e.g. Prabha and Mursch-Radlgruber, 1999] to make the problem computationally lighter [Costa and Macedonio, 2016]. Alternative methodologies consisting in simulating the spread and dispersal of gases in a numerical weather prediction model have also been proposed [e.g. Burton et al., 2016]. In this study, we use two open-source codes for modelling dilute and dense gas dispersion, respectively: DISGAS  Folch et al., 2009;Hankin and Britter, 1999;http://datasim.ov.ingv.it/models/twodee.html]. Both models are written in Fortran and need a terrain-following mass-consistent wind model (i.e. that ensures mass conservation), such as the Diagnostic Wind Model [DIAGNO v.1.1.6, hereafter referred to as DIAGNO; Douglas et al., 1990;http://datasim.ov.ingv.it/models/diagno. html]. DISGAS reproduces the passive dispersion of a gaseous pollutant governed by wind and atmospheric turbulence. It yields gas concentrations (e.g., CO 2 ) expressed as values in excess of background gas species levels in the air at heights selected by the user in a terrain-following coordinate system. The model does not account for the chemical reactivity of components and also neglects gas solubility in condensing H 2 O droplets. This, however, does not limit its application to the assessment of gas species dispersion, in light of the conservative behavior of this component [Pedone et al., 2017]. On the contrary, TWODEE simulates the heavy gas flow in a relatively calm ambient based on the shallow water equations for fluid depth, depth-averaged horizontal velocities and depth-averaged fluid density. It can be used for modelling gas dispersion near the ground over complex terrains [Folch et al., 2009;.
Computational models of volcanic gas dispersion have significantly advanced [e.g. Costa and Macedonio, 2016;Folch et al., 2009Folch et al., , 2017] and now represent a valuable tool to make quantitative gas dispersal evaluations that can be used to forecast potentially dangerous concentrations, study past events, quantify hazard and also understand the physic-chemical changes occurring in the magmatic feeding systems or in hydrothermal aquifers [i.e., Edmonds et al., 2018;Massaro et al., 2021]. Furthermore, they can be used to cross-validate measurements of gas emission rates [Massaro et al. 2022].
The use of such models in PVHA is based on multiple deterministic simulations of gas concentration in the area of interest aimed to explore not only the natural variability in input and boundary conditions (seasonal and daily wind variability, source position and gas flux at the emission sources), but also the impact of the uncertainty on other controlling factors such as, for example, the resolution of topographic data [Tierz et al., 2016;Selva et al., 2018;Massaro et al., 2021]. The underlying workflow motivated the development of VIGIL (automatized probabilistic VolcanIc Gas dIspersion modeLling), a new open-source tool written in Python and designed to control the complete workflow using DISGAS and TWODEE [Massaro et al., 2022]

The modelling strategy
The atmospheric dispersion of a gas released from a given source is governed by the starting density contrast between the gas and the environment (atmospheric air) and turbulent entrainment, the latter controlled by lateral eddies that increase the mixing with air around the edges of the plume thereby decreasing its averaged density [e.g. Costa et al., 2005;Folch et al., 2009;Hankin and Britter, 1999]. Based on the starting buoyancy of the gas phase relative to the environment, which can be quantified by the reduced gravity ( ! = # " − # & # ⁄ , where " and # are the gas and environment density, respectively), the source extension (which can be quantified by the plume radius at the source, ), the source gas flux ( ) and the wind speed ( ), which all determine the flow Richardson number at the source: two regimes are possible [Cortis and Oldenburg, 2009;Costa et al., 2013]: 1. If < 0.25, the gas transport is passive and dominated by the wind advection and diffusion (passive dispersion). We call this "light-gas" regime; 2. If > 1, the gas transport is dominated by the density contrast between the gas and the surrounding environment; in this case the gas flows over the topography until the density contrast persists. We call this "heavy-gas" regime.
Although the atmospheric gas dispersion can be generally described by solving 3D equations for the conservation of mass, momentum, and energy for each gas specie, several scenario-specific simplifications are assumed to reduce the computational time [i.e., single species of gas, incompressible fluid; Costa and Macedonio, 2016], eventually approaching the two end-member scenarios represented by the DISGAS and TWODEE models. In both cases, the momentum coupling with the atmospheric air is taken into account by considering the wind field, which is the main factor controlling the light-gas dispersion of gases and one of the controlling factors (together with density contrast) in the heavy-gas case. Details on the wind model pre-processor (DIAGNO) and the two models used for the passive light-gas and for the heavy-gas dispersion (DISGAS and TWODEE) can be found in VIGIL User's Manual.

The VIGIL workflow
VIGIL is a set of three Python scripts designed to manage the whole workflow leading to gas dispersion simulation outputs, from meteorological data retrieval and processing to graphical output production. VIGIL works both in forecast and reanalysis mode, and allows simulations over long period of times in both modes.
It is executed in the workflow as shown in Figure 2: -weather.py: depending on the simulation mode (forecast or reanalysis), it prepares the meteorological data by either retrieving forecast data from the NOAA-NCEP Global Forecast System (GFS) Numerical Weather Prediction (NWP) dataset (https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast) or reanalysis data from the ECMWF ERA5 NWP dataset [Copernicus Climate Change Service, 2017]. User-provided data from local weather stations can also be used in reanalysis mode. NWP data, which are available on a global grid with horizontal resolution of 0.25° for ERA5 and 18 miles for GFS, are interpolated onto the computational domain specified in input by the user. In reanalysis mode, the script is designed to randomly sample N days from a time interval defined by the user. If data from a local weather station are available, the script extracts these data in the time interval specified by the user from selected data files. The N days can be processed in parallel as blocks of N proc simultaneous processes specified by the user. Data are then organized in folders to be used by the meteorological processor DIAGNO.
-run_models.py: it runs DIAGNO and subsequently DISGAS or TWODEE. For each of the N simulations (days), the user can assess the gas emission sources within the computational domain by using random source locations (source locations are selected from a probability map) or by defined source locations (source locations are read from a list containing its coordinates). In both cases, the number of sources can be fixed or randomly sampled from a range defined by the user. In the same way, the associated gas fluxes can be read from a list (fixed source emission) or randomly sampled by an Empirical Cumulative Distribution Function (ECDF) (random source emission), which is built by run_models.py based on data provided by the user in the file flux.txt (see below). A combination of fixed and random emissions is also possible. The N days can also be processed in parallel as blocks of N proc simultaneous processes specified by the user.
-post_process.py: it reads the DISGAS or TWODEE outputs produced by run_models.py and converts the model outputs in concentration of other gas species (e.g., CO 2 , H 2 S) based on the gas species properties made available by the user in the file gas_properties.csv. Tracking points can be specified to track the time evolution of gas concentration at specific locations. In PVHA applications, the script merges the outputs of the N to build ECDF of the gas concentration and produces plots of the modelled concentrations at the user's specified exceedance probabilities ("probabilistic outputs" in Figure 2   To run VIGIL, the following files are mandatory: -diagno.inp: it denotes the DIAGNO input file. This file has a fixed structure, which is explained in the DIAGNO user manual. Working versions can be found in the Example folders: the user can use any working version of the diagno.inp file from the example without modifying it. Then weather.py modifies all the necessary entries. It is essential that the user does not alter the structure of the reference file manually.
-"modelname".inp: it is the control file of DISGAS and/or TWODEE. modelname should be "disgas" or "twodee" depending on the model at stack. The user is referred to the DISGAS and TWODEE user manuals for details on the structure of these files. Similar to diagno.inp, the user can take any of the files in the Example folders; run_models.py will modify these files accordingly.
-topography.grd: a file that describes the topography in Golden Surfer v.6 © GRD format following the format required by DISGAS and TWODEE.
-roughness.grd: a file that describes the terrain roughness height in Golden Surfer v.6 © GRD format following the format required by DISGAS, in which it is also possible to set a uniform value in the disgas.inp file so that the roughness.grd is not required. The surface roughness height influences the vertical wind profile. Typical values range from about 10 -5 m over an iced surface, 0.05 m over grass-covered soils up to 1 m or more over forest or urban areas.
Details on the GRD file structure are available in the DISGAS and TWODEE User Manuals.
Other optional files are required depending on the options chosen by the user. When weather data from local weather stations are used ("weather station" in Figure 2), these should be provided in separate files for each weather station stored into a "weather_stations" folder. In this case, the following input files are required: -weather_stations_list.txt: it contains the list of files with the weather station data and that are stored in the folder "weather_stations". Mandatory only when the user runs DIAGNO with local meteorological data.
-"file_name".txt: it contains the meteorological data acquired by a weather station. The file has to be included in the "weather_station" folder. Mandatory only when the user runs DIAGNO with local meteorological data.
Further optional files can be provided to control the emission sources. These are: -flux.txt: it contains a list (in a column) of possible values of the gas source emission rate (kg s -1 ) that VIGIL uses to build an ECDF from which it randomly selects a user-defined number of possible values to be attributed to each source.
-sources_input.txt: a file that contains a list of fixed sources provided by the user. It is structured as in the following: -probability_map.grd: a file that contains the probability that each cell in the hazard domain acts as a gas source location. It has the same format as the topography.grd and roughness.grd files (Golden Surfer v.6 © GRD) and is a structured as a matrix with NY rows and NX columns, where NX and NY are the number of cells in the domain along the x and y directions, respectively. NX and NY must coincide with the values indicated in modelname.inp.
Furthermore, the file gas_properties.csv is used by post_process.py when the user needs to convert the concentration of the gas species tracked by TWODEE or DISGAS into the concentration of another gas species and when a conversion of the resulting concentration from kg m -3 to ppm or vice-versa is required. This option can be useful when an abundant gas species (e.g. H 2 O) is set as the gas specie tracked in DISGAS and TWODEE and considered as carrier for less abundant non-reacting species (e.g. CO 2 ) [Massaro et al., 2021]. For this conversion, the molar ratio between the species (converted / original) and the molar weight of the new species in g mol -1 are needed.
Note that it is possible to: -provide more than one species. If more than one species is provided, all the columns with the molar ratios should be aligned to the left of the columns with the species molar weights; -provide more than one value of molar ratio, which is suggested since this parameter is affected by a significant uncertainty. In this case, the routine post_process.py randomly selects one of the molar ratio values to be used for the conversion from an ECDF built by the code based on the data provided in the gas_properties.csv file.
Finally, the file tracking_points.txt is required by post_process.py when the tracking points option is activated by the user. The file lists the locations (in UTM coordinate) and elevations (in m above the ground) of the selected locations.

Results
In this section, two examples show part of the wide range of capabilities of VIGIL that can be used for different applications. The first example is referred to DISGAS, while the second shows an application of TWODEE. In particular, we will focus on the novelty introduced by the VIGIL, that is to generate hazard maps for gas hazard assessment purposes.

DISGAS application: Solfatara crater (Campi Flegrei, Italy)
This application is focused on the dispersal of CO 2 released from Solfatara crater (Campi Flegrei, Italy) and the hazard related to the intense soil diffusive degassing and fumarolic sources which has shown large variation in the last 20 years [Chiodini et al., 2021]. In this case, we selected a scenario typical of gas fluxes of about 10 years ago, setting an emission from soil diffuse degassing at 1000 t d -1 and at 490 t d -1 from fumarolic vents [Granieri et al., 2013;Pedone et al., 2014;Cardellini et al., 2017]. Temperatures of the gases emitted from Solfatara are   [Chiodini et al., 2001] while the flux weighted temperature of diffusing soil is 66 °C [Costa et al., 2005]. Such temperatures contribute to balance, directly at the source, the increase of gas density due to the greater molecular weight (M CO2 /M air = 44/29). Moreover, typical surface wind velocities, intensity of the gas source, and size of the plume are such that < 0.25 (see Section 2). For these conditions, the passive dispersion assumption is appropriate [Costa et al., 2005].
In this example, we run DIAGNO and DISGAS on a computational domain of 8,025 km × 7,025 km × 75 km with a horizontal resolution of 25 × 25 m, while the spacing of the vertical layers ranges from 0.5 m above the ground to 75 m towards the top of the computational domain. The topography is represented by a 1 m-resolution DEM.
We use weather.py to randomly sample 1000 days from the period 1990-2020 and download meteorological data for these days from the ERA5 reanalysis dataset. Therefore, we run 1000 simulations with run_models.py using the fixed source locations of the active fumaroles, and also considering the CO 2 diffusive contribution on a gridded source [Cardellini et al., 2017], the latter reproduced by a set of equally-spaced sources (see sources_input.txt file in the example_1 folder of the Supplementary Material). Figure 3 shows some of the outputs produced by post_process.py for this test case. Here, we represent the concentration in ppm on a fixed linear scale (350-10,000 ppm) and we display the topography layer. Specifically, we show: the CO 2 concentration for a single simulation (Figure 3a) considering exceedance probabilities of 50%, 5% and 1% (Figures 3b-c-d). All plots represent the concentration field at the last time step of the simulations (+24 hours from the beginning of the emission), at 1.5 m above the ground. Figure 3 represents one of the capabilities of VIGIL, i.e. to analyze the meteorological variability in a framework of probabilistic simulations. At 50% exceedance probability (i.e., the most likely scenario; Figure 3b), the highest CO 2 concentrations are localized in the emission area. For smaller exceedance probability, we go towards less likely and more "worst-case" oriented scenarios (5% and 1% exceedance probabilities, Figures 3c-d). In these cases, the areas with the highest concentrations expand significantly and some tendency for preferential dispersion directions can be observed, which are likely related to the dominant wind pattern. The seasons play also a role, especially at mid-latitudes like in this application. The script weather.py allows the user to select specific years, months and days from the time period to sample. Therefore, it is possible to, e.g., select only days in winter by specifying December, January and February in the weather.py command.  hours from the emission start at 1.5 m above the ground.
All the input files, the VIGIL commands and the graphical outputs shown in this section are available in Supplementary Material 1. A more extensive analysis, considering the variability of the gas source and the persistency of concentrations in the air is the subject of an on-going study.

TWODEE application: the case of Mefite d' Ansanto (Italy)
Mefite d'Ansanto represents the largest natural emission of low temperature CO 2 -rich gases from non-volcanic environment ever measured on Earth [Chiodini et al., 2010]. Under low-wind conditions, the gas flows along a narrow natural channel producing a persistent invisible gas river (see Figure 5), which has already killed three people in the 1990s and many other according to historical chronicles [Chiodini et al., 2010, Costa et al., 2008.
Here, in order to reproduce the formation of the channelized gas river, we run DIAGNO and TWODEE on a computational rectangular domain of 1200 × 510 m with a horizontal resolution of 3 × 3 m. Concentration of the gas species is computed at seven levels, from 5 cm to 3 m above the ground. The topography is represented by an 8.9 m-resolution DEM modified after the 10 m-resolution DEM TINITALY/01 [Tarquini et al., 2007]. Similar to the previous example (section 4.1), we use weather.py to randomly sample 1000 days from the period 1990-2020 and download meteorological data for these days from the ERA5 reanalysis dataset. Therefore, we run 1000 simulations with run_models.py. The CO 2 is emitted from an area towards the east of the computational domain with an overall flux of 23.18 kg s -1 (2003.1 ton d -1 ); this has been approximated by 48 dense gas sources all represented by squares of 10 m sides with a constant flow rate of 0.483 kg s -1 (see sources_input.txt file in the "example_2" folder of the Supplementary Material). Figure 6 shows the outputs produced by post_process.py. Similar to the previous application, we represent the concentration in ppm on a fixed linear scale (350-10,000 ppm) displaying the topography layer. All plots are obtained by sampling the ECDF created by post_process.py from the 1000-days dataset at specific exceedance probabilities and vertical levels; furthermore, all plots refer to the concentration averaged over the whole simulation duration (2 hours). Specifically, we show: the CO 2 concentration at 0.05 m above the ground at 5% and 1% exceedance probability (Figures 6a-b); the CO 2 concentration at 2 m above the ground at 5% and 1% (Figures 5c-d) exceedance probability.  b) CO 2 concentration at 0.05 m above the ground corresponding to the 1% exceedance probability; c) CO 2 concentration at 2 m above the ground corresponding to the 5% exceedance probability; d) CO 2 concentration at 2 m above the ground corresponding to the 1% exceedance probability. DEM retrieved from TINITALY/01 dataset (Tarquini et al. 2007). Figure 6 shows several interesting aspects of the simulated process. The most evident one is the density stratification, with the outputs at 0.05 m above the ground showing significantly higher values of CO 2 concentration.
For low exceedance probabilities (e.g. 1%) values above 10,000 ppm averaged over the whole simulation duration

Discussion
The two application examples here presented show some of the potential of VIGIL. The software can in fact be used to generate hazard maps of volcanic gas exploring the meteorological variability as the first controlling factor.
The variability of other parameters characterizing the gas emission source like the number of sources, their location and emission rate can also be considered. In this case, for each of the simulations linked to the meteorological variations (e.g. for each simulation day and time interval), the emission source parameters can vary according to the user requirements as discussed in the Methods section. Therefore, another layer of uncertainty related to the source can be overlapped to the uncertainty of meteorological conditions.
Treating the source parameters uncertainty at the same level as the meteorological variability, i.e. fully combining M simulations each with a specific set of source parameters with the N simulations with varying meteorological conditions is currently not supported and will be introduced in the next development steps. This can be crucial to assess the hazard of volcanic gases in contexts where the source location can change and/or is unknown and the gas emission rates vary and/or are unknown. In contexts of relatively steady degassing, like those discussed in the application example, the current version of VIGIL is suitable to quantify the probabilistic gas hazard and, by exploring the meteorological variability sampling a sufficiently high number of days of reanalysis one can quantify the potential exposure to dangerous levels of volcanic gases concentration on long-term. VIGIL also allows sam-pling the reanalysis dataset specifying the months from which to sample the data, hence making a seasonal analysis possible. In fact, in some contexts it has been shown how the seasonal wind pattern plays a role in controlling the gas concentration levels [e.g. Granieri et al., 2013].
Recent and ongoing eruptions in the Reykjanes peninsula (Fagradalsfjall, Iceland), Soufriére St. Vincent (St. Vincent and The Grenadines) and Nyiragongo (Democratic Republic of Congo) stressed the importance of monitoring and modelling volcanic gas dispersion and accumulation during an effusive eruption (the case of Fagradalsfjall), during and after an explosive event (the Soufriére case) and possible lake overturns causing limnic eruptions, which is the cause of concern at the Lake Kivu shores close to the Nyiragongo volcano. The latter case represents a potential application of VIGIL for long-term dense gas hazard quantification using TWODEE. Furthermore, the next version of VIGIL will allow to run the complete workflow in forecast mode with the possibility to take the source uncertainty into account.

Conclusions
In this paper we presented VIGIL, a set of Python scripts designed to manage the complete workflow from gas dispersion simulations to production and visualization of hazard maps aimed at hazard analyses or sensitivity studies. In fact, VIGIL handles the pre-processing (meteorological data retrieval and processing), processing (actual dispersion simulations) and post-processing (e.g. graphical outputs), with the possibility to perform each step exploiting the multiprocessing capabilities of Python. The latter is a fundamental improvement to conduct PVHA applications, which require numerous simulations to take the variability of input parameters of the dispersion models (e.g. meteorological conditions) into account. Additionally, VIGIL represents a valuable tool for sensitivity analyses and validation studies of hazard simulation tools.
VIGIL is interfaced with DIAGNO to provide the mass-consistent wind field necessary to run two dispersion models, DISGAS and TWODEE.
The full range of functionalities is presented in the User's manual provided in the package.
For the future, we plan to implement in VIGIL also further options, such as: -the calculation of the gas persistence time above user-defined concentration and exposure time thresholds; -the implementation of full ensemble mode also for the source uncertainty; -the overlapping of multiple layers in the graphical outputs produced by post_process.py. The current version produces contour plots of gas concentration in the pure computational domain; layers with DEM and other features (e.g. towns, roads, etc.) would significantly improve the value and the information contained in the graphical outputs; -the possibility to specify the format of the outputs (e.g., netCDF), which are currently saved in Golden Surfer GRD format only; -the possibility to quantitatively compare outputs with user-provided measurements for the test case under analysis. This can be useful for validation studies.