The Wisdom of the Crowd: Using ensemble machine learning techniques as an early warning indicator for systemic banking crises

Finance master project by Gabriela Lavagna, Helena Patterson, and Robizon Razmadze ’21

Blurred chart on a computer screen
Image by Pexels from Pixabay

Editor’s note: This post is part of a series showcasing BSE master projects. The project is a required component of all Master’s programs at the Barcelona School of Economics.


We develop early warning models for systemic crisis prediction by applying machine learning techniques to quarterly macrofinancial data for 36 countries spanning 1970-2013. Machine learning models outperform logistic regression in out-of-sample predictions under a recursive window forecasting scheme. In particular, using the random forest ensemble algorithm for both feature selection and prediction substantially outperforms the logit models. We identify the key economic and financial drivers of our models within the random forest framework by extracting each feature’s Gini impurity and the corresponding information gain. Throughout the sample period, the most important predictors are credit, foreign liabilities, asset prices and foreign currency reserves.
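To make the Gini-based ranking concrete, here is a minimal sketch of how the impurity of a node, and the drop in impurity from splitting it, can be computed. It is an illustration, not the project's code: the function names, the toy labels and the "credit growth" split are all hypothetical, and the gain here is measured as the decrease in Gini impurity, which is one common way tree-based models score a split.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity when `parent` is split into `left` and `right`.

    Each child's impurity is weighted by its share of the parent's observations.
    """
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical toy data: 1 = crisis quarter, 0 = tranquil quarter.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical split on a predictor threshold, e.g. "credit growth below / above x".
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

print(gini_impurity(parent))
print(impurity_decrease(parent, left, right))
```

In a random forest, a feature that produces large impurity decreases across many trees ends up near the top of the importance ranking.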


  • The aim of the study was to develop a machine learning methodology that improves the predictive ability of systemic crisis models. We applied these algorithms to quarterly macrofinancial data for 36 countries spanning 1970-2013. The results show that machine learning algorithms predict financial crises more accurately than logistic regression models in out-of-sample predictions (obtaining an AUC score of 0.77-0.81).
  • Our goal was not only to improve the predictive power of the models, but also to select a concise set of the most relevant predictor variables. We applied the random forest ensemble algorithm for feature selection and concluded that, across the sample period, credit, foreign liabilities, asset prices and currency reserves were the most important variables in predicting systemic crises.
  • The models add value in two main ways. First, because crisis prediction relies on time series data, many earlier authors were unable to avoid the ‘looking to the future’ issue, in which a model is estimated on data that would not yet have been available when the prediction was made. We alleviated this risk by using a recursive window estimation mechanism. The main benefit of this methodology is that it would allow policymakers to observe the predictors in real time. Second, by ranking variables in order of importance, we revealed the key economic and financial drivers that policymakers should use in evaluating any pressing risk of systemic crises.
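The recursive window idea and the AUC metric mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: `expanding_window_splits`, `recursive_forecast`, `mean_gap_model` and the toy series are hypothetical names and data, and the stand-in "model" is deliberately trivial where the project used random forests and logit models.

```python
def expanding_window_splits(n_periods, min_train):
    """Yield (train_indices, test_index) pairs for a recursive (expanding) window.

    The model only ever sees periods strictly before the one it predicts.
    """
    for t in range(min_train, n_periods):
        yield list(range(t)), t


def auc_score(y_true, y_score):
    """AUC as the share of (crisis, tranquil) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def recursive_forecast(features, labels, fit_predict, min_train):
    """Collect one out-of-sample score per period, refitting on all earlier periods."""
    actuals, scores = [], []
    for train_idx, t in expanding_window_splits(len(labels), min_train):
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        scores.append(fit_predict(train_x, train_y, features[t]))
        actuals.append(labels[t])
    return actuals, scores


def mean_gap_model(train_x, train_y, x_new):
    """Hypothetical stand-in for a real classifier: distance above the training mean.

    It ignores train_y; a real model would be fitted on both inputs.
    """
    return x_new - sum(train_x) / len(train_x)


# Hypothetical toy series: one predictor and a crisis indicator per quarter.
credit_gap = [0.1, 0.2, 0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
crisis = [0, 0, 0, 1, 0, 1, 0, 1]

actuals, scores = recursive_forecast(credit_gap, crisis, mean_gap_model, min_train=3)
print(auc_score(actuals, scores))
```

Because every prediction for period t is produced from data up to t-1 only, the resulting AUC is an out-of-sample figure that does not rely on information from the future.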


About the BSE Master’s Program in Finance

COVID-19 US County Dashboard

Charlie Thompson ’14 (ITFD) has built an RStats dashboard that tracks COVID-19 cases in the United States at the county level using the latest data from The New York Times.

Check it out to see how US communities are “flattening the curve”:

Charlie Thompson ’14 is a Data Scientist at Spotify. He is an alum of the Barcelona GSE Master’s in International Trade, Finance, and Development.
