The creation of avalanche bulletins is still a largely expert-driven, manual task. Forecasters inspect vast amounts of spatio-temporal data describing the condition of the snowpack and the local weather in the Alps. Based on their intuition and knowledge, they then assign danger levels on an ordinal scale from 1 to 5 (low to very high avalanche danger) to create the avalanche bulletin for the Swiss Alps. This labour-intensive task is carried out once or twice a day during the snow season and is vulnerable to errors and biases, since forecasters can hardly explore all of the relevant data. In this SDSC collaborative project, we explore the feasibility of using data-driven statistical models to support avalanche danger forecasting and the exploration of relevant data, with the ultimate aim of getting one step closer to an automated decision tool that supports human experts.
This blog post introduces the research project “DEAPSnow: Improving snow avalanche forecasting by data-driven automated predictions”, a collaboration between the Swiss Data Science Center (SDSC) and the Institute for Snow and Avalanche Research (SLF).
Avalanche danger bulletins are essential in Switzerland
Switzerland is dominated by the Alps. The complex orography, coupled with markedly seasonal weather patterns, makes most of the country susceptible to high avalanche danger. For this reason, the Swiss Confederation has mandated the Institute for Snow and Avalanche Research (SLF) in Davos, part of the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), to issue a daily avalanche bulletin that warns the public about the avalanche hazard during the winter season. Avalanche hazard is communicated using the five ordinal levels of the European danger level scale: 1–Low, 2–Moderate, 3–Considerable, 4–High and 5–Very High.
Timely and accurate prediction of avalanche danger is crucial not only for wintertime activities and ski stations, but also for land-use planning and the mapping of natural hazard areas. On a shorter time scale, the avalanche bulletin is an important source of information for assessing the risk to public and private infrastructure after heavy snowfall. For all these reasons, an accurate and timely avalanche danger forecast is crucial for most Alpine villages and their wintertime economic activities.
The creation of a bulletin is still a task largely driven by expert knowledge. Highly trained and skilled avalanche forecasters gather and parse a tremendous amount and variety of data. This information forms a massive, multi-faceted spatio-temporal data cube that ultimately needs to be reduced to one of five danger levels, ranging from 1 to 5, together with the elevation range and slope aspects (N, E, S, W) to which that danger level applies. The result summarizes snowpack instabilities and avalanche susceptibility in a format that both lay users and experts can read and interpret [1,2]. For instance, in Fig. 1 below, the danger level for part of the southern Prealps (the area in yellow) corresponds to moderate danger (level 2) from 1400 m upwards, on all slope aspects.
Supporting automated delineation of danger levels
As one can guess, avalanche forecasting and the creation of an avalanche bulletin are a difficult, data-intensive process with little automation of the decision making. Forecasting danger levels is complicated because assigning them requires experience and situational interpretation that go beyond the data alone. To make the task harder still, the danger level is not a quantity strictly defined in terms of physical properties of the snowpack; it is defined only qualitatively, as a set of levels on an ordinal scale. For instance, danger level 1 is defined as:
“The snowpack is well bonded and stable in general. Triggering is generally possible only from high additional loads (e.g. several skiers) in isolated areas of very steep, extreme terrain. Only small and medium natural avalanches are possible”
while danger level 5 is defined as:
“The snowpack is poorly bonded and largely unstable in general. Numerous very large and often extremely large natural avalanches can be expected, even in moderately steep terrain.”
This poses two main issues. First, the danger level has to be assigned by interpreting the situation on a given day. This raises questions about the temporal consistency of bulletins, since a change in the forecasting team, in the sensors, or in the climate and snow models used could change how danger levels are assigned. A danger level of 3 assigned in 1992 could correspond to a danger level of 4 after reanalysis in 2020. Crucially, since the danger level is the result of expert interpretation, there is no way of measuring it directly from physical parameters or of verifying it accurately post hoc. Depending on the overall conditions at the time the bulletin is created, forecasters have to take data from point measurement locations and generalize the estimated danger level over entire warning regions. This process involves some smoothing: some areas might receive an assigned danger level that differs from the one indicated by their measurements, in particular if neighboring stations point to a higher danger level. The elevation of each measurement station also has to be taken into account, since the amount of snow, and more generally the conditions favorable to avalanche formation, vary with elevation.
The second issue is that each forecaster, in order to interpret the situation as well as possible, has to parse a massive variety of data. These include manual observations, visual estimates, subjective judgments by individual external observers, meteorological data from automated weather stations, output from numerical weather prediction and snow cover models, and avalanche occurrence and snow stratigraphy data. Each of these datasets comes with its own spatio-temporal resolution, validity, accuracy and history of subjective preference by the forecaster.
The “DEAPSnow” research project aims to answer important questions that would lead to a tool able to predict the local danger level and thereby support forecasters in their task:
- Can a danger level be estimated automatically using machine learning models?
- What data is required and at what temporal resolution? How much historical data should be taken into account for each prediction?
- What family of methods is best at predicting the danger level?
- How would such a pipeline work for real-time prediction using snowpack and weather forecast data?
Creation of unique datasets enabling the use of machine learning
Ultimately, avalanche danger is a function of snowpack stability, which in turn is affected by weather and climatology. These data are collected by the Intercantonal Measurement and Information System (IMIS) network. IMIS stations collect a range of measurements every 30 minutes, which are sent to a receiving server located at the SLF centre in Davos, Graubünden. These data are fed into a numerical model (the SNOWPACK model) to estimate a large set of features related to several snow characteristics.
To approach the research questions outlined above, we compiled a large dataset containing measurements from weather stations, measurements of the snow state, and the output of a numerical model of the snowpack and the evolution of its layers. The numerical model is fit to observed snow conditions at the locations shown in Fig. 2. Measurements are dense in space, but not dense enough to represent all local weather and snow characteristics. Whether or not an avalanche occurs is dominated by a series of local processes, but the regional avalanche danger is related to larger-scale snow cover and weather characteristics, which makes it possible to use such a dataset in automated processing as well.
The avalanche bulletin is published daily at 17:00 and forecasts the danger for the next day. Note that the danger level is forecast for the so-called danger regions (black polygon boundaries in Fig. 1), whereas we attach it to point time series that represent local measurements. We compiled data from the past 22 seasons and attempt to predict the danger level on a daily basis, given the measurements and physical model outputs.
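To make the labelling step concrete, here is a minimal Python sketch of how a regional danger level could be attached to station-level daily records. It is an illustration under stated assumptions, not the project's actual pipeline: the station and region names, the record layout and the function names are all hypothetical, and the only facts taken from the text are that a bulletin forecasts the danger for the day after it is issued and that labels are defined per region while measurements are per station.

```python
from datetime import date, timedelta

# Hypothetical station -> danger-region lookup (names are made up).
STATION_REGION = {"STN_A": "region_1", "STN_B": "region_2"}


def valid_date(issue_date):
    """A bulletin issued at 17:00 forecasts the danger for the following day."""
    return issue_date + timedelta(days=1)


def label_station_days(station_rows, bulletins):
    """Attach the regional danger level to each station-day.

    station_rows: list of dicts with keys "station", "date" and any features.
    bulletins: dict mapping (region, issue_date) -> danger level (1 to 5).
    Station-days without a matching bulletin are dropped.
    """
    # Re-key the bulletins by the day they are valid for.
    by_valid_day = {
        (region, valid_date(issued)): level
        for (region, issued), level in bulletins.items()
    }
    labelled = []
    for row in station_rows:
        region = STATION_REGION.get(row["station"])
        level = by_valid_day.get((region, row["date"]))
        if level is not None:
            labelled.append({**row, "region": region, "danger_level": level})
    return labelled
```

Every station inside a region inherits the same label here, which is the simplest possible choice; it ignores the elevation and aspect information that a real bulletin also carries.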
Machine learning to the rescue
The task can be phrased as a standard supervised classification problem: given the measurements on a given day, we aim to predict the danger level forecast for that day (i.e. the one issued the previous day). In a real scenario, we do not have access to actual next-day measurements, but we do have access to simulated forecast measurements provided by a climate model and the snowpack numerical model, which are the same as those used by forecasters.
We first focused only on the prediction of dry-snow avalanches. This subset of the data can be extracted by parsing the ancillary information on the type of expected avalanches provided by the forecasters. Several preprocessing steps are needed to filter the data and select stations that measure parameters related to actual avalanche formation processes (e.g. based on elevation, amount of measured snow, etc.).
This supervised classification problem is extremely imbalanced, which reflects the distribution of danger levels actually forecast in the Alps every year. Fig. 3 shows a bar plot of the counts of danger level forecasts for all the measurement stations over 22 years. Notice that danger level 5 is not even visible in the plot, since it accounts for only 0.06% (N=236) of all the events.
Figure 3: Counts of danger level forecasts over 22 years, from 1997 to 2020
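The imbalance shown in Fig. 3 can be quantified with a few lines of Python. The sketch below computes the share of each danger level and, as one common heuristic for imbalanced classification, inverse-frequency class weights. The post does not say whether any reweighting was used in the project, so the weighting function is an assumption added for illustration, and both function names are hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of samples that carry each danger level."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {level: counts[level] / total for level in sorted(counts)}


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes below 1.
    This is a generic heuristic, not a method described in the post.
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        level: n_samples / (n_classes * count)
        for level, count in sorted(counts.items())
    }
```

With a class as rare as danger level 5, such weights become very large, so in practice they are only one of several options alongside resampling or simply reporting per-class metrics.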
We trained several models: linear regressors and classifiers, boosted decision trees, random forests, recurrent neural networks and convolutional neural networks. All the models have access to some form of historical information. The last two access the previous days' measurements directly, as sequences, while all the others access them through additional variables summarizing multi-day statistics (e.g. mean over 3 days, 7 days, etc.). It turns out that, given the vast heterogeneity of the input data, random forests perform best. We train the models on 20 years of data and use the winters of 2018/2019 and 2019/2020 as independent test sets.
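Two ingredients of this setup are easy to sketch in Python: the multi-day summary variables and the split that holds out whole winters. The code below is a minimal illustration, not the project's implementation. The function names, the record layout and the August cut-off used to assign a date to a winter are assumptions; the held-out winters are the two named in the text.

```python
from datetime import date


def rolling_mean(values, window):
    """Trailing mean over the last `window` days, current day included.

    The first days of a series use whatever history is available.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def winter_of(day):
    """Label a date with its winter, e.g. 2019-01-15 -> "2018/2019".

    Assumption: a season runs from August to the following July, so that
    no winter is cut in two by the change of calendar year.
    """
    start = day.year if day.month >= 8 else day.year - 1
    return f"{start}/{start + 1}"


# The two winters held out for testing, as stated in the text.
TEST_WINTERS = {"2018/2019", "2019/2020"}


def split_by_winter(rows):
    """Split records (dicts with a "date" key) into train and test sets."""
    train = [r for r in rows if winter_of(r["date"]) not in TEST_WINTERS]
    test = [r for r in rows if winter_of(r["date"]) in TEST_WINTERS]
    return train, test
```

Splitting by whole winters rather than by random days keeps consecutive, strongly correlated days from ending up on both sides of the split, which would make the test score look better than it is.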
These preliminary results are encouraging:
- The F-score averaged over classes indicates an agreement of >70% with the official bulletins. Although the ground-truth labels are uncertain and potentially biased in ways we cannot detect, this score is very high, and the models have been shown to be temporally and spatially consistent.
- Random forests naturally return a ranking of features based on their importance in the model. This ranking is consistent with the variables that forecasters analyse. These variables mostly relate to fresh snow and wind-driven snow accumulation over the last several days, and to several indices and profile parameters related to the stability of the snow cover.
- The errors made by the baseline model usually consist of predicting a danger level adjacent to the official forecast. That is, a “real” danger level 4 is never mistaken for a danger level 1 or 2. This suggests that the classification problem is well posed and that potentially costly errors (in terms of real-world consequences) are not being made. Fig. 4 shows the error matrix, in which the diagonal clearly dominates, as one would hope.
Figure 4: Error Matrix for the 2018/2019 and 2019/2020 seasons
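The quantities discussed in the list above (the error matrix, the F-score averaged over classes, and how far off the worst error is) can be computed as in the following Python sketch. It is a generic illustration with hypothetical function names, not the project's evaluation code, and it assumes that "averaged per-class" means an unweighted mean of the per-class F1 scores.

```python
LEVELS = [1, 2, 3, 4, 5]


def confusion_matrix(y_true, y_pred, levels=LEVELS):
    """matrix[i][j] counts days with true level levels[i] predicted as levels[j]."""
    index = {level: k for k, level in enumerate(levels)}
    matrix = [[0] * len(levels) for _ in levels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def macro_f1(matrix):
    """Unweighted mean of per-class F1 scores.

    Classes that appear neither in the labels nor in the predictions
    are skipped. Assumes at least one class is present.
    """
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                    # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp     # predicted k, true class differs
        if tp + fn + fp == 0:
            continue
        scores.append(2 * tp / (2 * tp + fn + fp))
    return sum(scores) / len(scores)


def max_level_gap(y_true, y_pred):
    """Largest absolute difference between a true and a predicted level."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))
```

Because every class counts equally in the unweighted mean, a rare level such as 5 influences the score as much as the most frequent one. A `max_level_gap` of 1 would mean that every error lands on a neighbouring danger level.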