Data Science team “Non-Juvenile Asymptotics” wins 3rd prize in annual Novartis Datathon

Patrick Altmeyer, Eduard Gimenez, Simon Neumeyer and Jakob Poerschmann ’21 competed against 57 teams from 14 countries.

Screenshot of team members on videoconference
Members of the “Non-Juvenile Asymptotics” Eduard Gimenez, Patrick Altmeyer, Simon Neumeyer and Jakob Poerschmann, all Barcelona GSE Data Science Class of 2021

The Novartis Datathon is a Data Science competition taking place annually, usually in Barcelona. In 2020, the Barcelona GSE team “Non-Juvenile Asymptotics” consisting of Eduard Gimenez, Patrick Altmeyer, Simon Neumeyer and Jakob Poerschmann won third place after a fierce competition against 57 teams from 14 countries all over the globe. While the competition is usually hosted in Barcelona, the Covid-friendly version was fully remote. Nevertheless, the increased diversity of teams clearly made up for the missed out atmosphere.

This year’s challenge: predict the impact of generic drug market entry

The challenge of interest concerned predicting the impact of generic drug market entry. The risk of losing ground against cheaper drug replicates once the patent protection runs out is evident for pharmaceutical companies. The solutions developed helped solving exactly this problem, making drug development much easier to plan and calculate.

While the problem could have been tackled in various different ways, the Barcelona GSE team focused on initially developing a solid modeling framework. This represented a risky extra effort in the beginning. In fact more than half of the competition period passed without any forecast submission by the Barcelona GSE team. However, the initial effort clearly paid off: as soon as the obstacle was overcome, the “Non-Juvenile Asymptotics” were able to benchmark multiple models at rocket speed.

Fierce competition until the very last minute

The competition was a head-to-head race until the last minute. Still in first place until minutes before the final deadline, the predictions of two teams from Hungary and Spain ended up taking the lead by razor sharp margins.

Congratulations to the winners!!!

Group photo of the team outside the entrance of Universitat Pompeu Fabra
The team at Ciutadella Campus (UPF)

Connect with the team

BGSE “Just Peanuts” qualifies for Data Science Game finals in Paris

A team of Barcelona GSE Data Science students from the Class of 2017 will compete in the final round of the Data Science Game in Paris at the end of September. 

data science game

A team of Barcelona GSE Data Science students from the Class of 2017 will compete in the final round of the Data Science Game in Paris at the end of September.

Among 400 international teams from 220 universities that participated in the first round, the BGSE team is among the 20 teams who have qualified for the final. The team is called “Just Peanuts” and its members are Roger Garriga, Javier Mas, Saurav Poudel, and Jonas Paul Westermann.

In the following interview, they talk about the Data Science Game and their expectations for the final.

What is the Data Science Game?

The Data Science Game is an annual Data Science competition for university students organized by ENSAE (Paris). Teams of up to four people can participate and represent their university. There is a free-for-all qualification round online and the top 20 teams are invited to the Finale in Paris.

Why did you decide to participate?

During the course we already took part in one data science challenge as part of the Computational Machine Learning course. That was quite fun and we have been generally wanting to take part in Kaggle-like challenges throughout the year. On top of that, we of course need to represent the Barcelona GSE and put the word out about our amazing Master’s.

Can you explain the task your team had to perform in the first round of the game?

The challenge for the online qualification round was related to predicting user’s music preferences. Data was provided by Deezer, a Music streaming service based in France. The training dataset consisted of 7+ million rows each pertaining to one user-song interaction describing weather the user listened to the song (for longer that 30 seconds) or not and whether the song was suggested to the user by the streaming service as well as further variables relating to the song/user.

How/by whom was the first round judged/scored?

The online round was hosted on Kaggle, a common website for these kinds of data science prediction challenges. Scoring was done according to the ROC AUC metric (reciever operator characteristic Area under the curve).

Was it difficult to combine participating in the game with your courses and assignments in the master program?

As we started really investing time into the challenge only quite late (about two weeks before the end) we spent a lot of time during the final days. The last 120 hours before submission were probably entirely spent on the challenge which definitely cut into our normal working schedules. Especially the last weekend before the deadline was very intense and spent mostly sitting shirtless at the table of a very overheated apartment living off frozen pizza and chips.

What specifically from the master’s helped you succeed in the game?

Part of the final model we used and what also made the first miles in terms of achieving a good score was a library recommended by one of the PhD students who also give lectures in our course. But also beyond that, we used all kinds of background knowledge and experience gained from the course. A constant scheme during the challenge were problems with difference in distribution and construction of the training and testing datasets. This gave inaccurately high cross-validation results and made it difficult to assess the quality of predictions.

Another issue was simply the size of the data that meant training and parameter tuning were extremely time consuming and we needed to expand our infrastructure beyond our own laptops. For both of those problems we’ve talked about possible solutions during the Master’s and applied combinations thereof.

What will you have to do for the final round? Can you tell us about your strategy or will that give too much information to the other teams?

The final round will be a two-day hackathon-like data science challenge on-site in Paris. No information has been shared with us on details of the challenge but we are thinking it might be something related to sound processing to continue the theme from part one.

How can we follow your progress in the competition?

We will surely be writing an update after the Paris trip and probably also give some social media updates during the event.