The Challenges

From Epidemium

Challenges 1 & 2


June 6 - December 5 2017


The impact of environmental factors, including pollution and socio-economic development, on the incidence of cancer remains largely unknown. Countries that have adopted similar socio-economic development models and have been exposed to comparable environmental conditions have different cancer incidences. This underlines the need to refine our models for understanding cancer epidemiology. A better understanding of the evolution of cancers and their causal factors matters from both a medical and a public health point of view: it would help prioritize public health actions and better target prevention campaigns.

The recent, large-scale public availability of socio-economic data (economic development, mass consumption), demographic data and environmental data (pollution, exposure to carcinogens, urban planning data), together with the development of new statistical approaches (machine learning), makes it possible to envisage better predictions of cancer incidence.

By making widely available large datasets that cover the usual epidemiological themes, Epidemium calls on collective intelligence to build models that visualize and anticipate the spatio-temporal evolution of cancers. This allows everyone to contribute to knowledge of the dynamics of cancers and their determinants.


Challenge 1: See cancers
Visualization challenge
Constructing a data visualization of cancer incidence that exposes the epidemiological factors associated with its dynamics.

The challenge consists of developing data visualization tools aimed at the general public and the wider medico-scientific community. Particular attention should be paid to design and user experience. These tools will have to integrate the variables and results of the prediction challenge, highlighting how potential risk factors change over time and space.

Discover the data

Challenge 2: Foresee cancers
Prediction challenge (cancer incidence)

Developing a predictive tool for the progression of cancer in time and space, based on the known or suspected factors that determine its evolution.

The challenge consists of developing a predictive tool for cancer incidence. The data used to train the models are detailed below.

The training set will cover the period from 1950 to 2003, and the validation set will cover the period from 2003 to 2007. Data from 2007 to 2012 are still being obtained and will be integrated into the validation set.
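
As an illustration, a time-based split like this one can be sketched in Python. The record layout and the `temporal_split` helper are hypothetical. Both periods above include 2003, so the sketch assumes that 2003 belongs to the training set only, with validation covering 2004 to 2007. If the organizers define the boundary differently, adjust `train_end`.

```python
from typing import Iterable

# Hypothetical record layout: (country, year, cancer_type, value)
Record = tuple[str, int, str, float]


def temporal_split(
    records: Iterable[Record],
    train_start: int = 1950,
    train_end: int = 2003,   # assumption: 2003 is assigned to training only
    valid_end: int = 2007,
) -> tuple[list[Record], list[Record]]:
    """Split records by year (never at random), so validation uses later years."""
    train: list[Record] = []
    valid: list[Record] = []
    for rec in records:
        year = rec[1]
        if train_start <= year <= train_end:
            train.append(rec)
        elif train_end < year <= valid_end:
            valid.append(rec)
        # Years after valid_end (e.g. 2008-2012, still being collected) are skipped
        # until valid_end is raised.
    return train, valid


if __name__ == "__main__":
    # Toy data, not real incidence figures.
    data = [("FR", y, "lung", 1.0) for y in range(1950, 2013)]
    train, valid = temporal_split(data)
    print(len(train), len(valid))  # 54 4
```

Splitting by year rather than at random judges the model on years it has never seen, as it would be when forecasting. The 2007-2012 data can be added later by raising `valid_end`.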

This challenge is divided into two distinct parts:

  • worldwide prediction,
  • prediction by country.

The winning team will be the one whose predictions perform best on the validation set. A leaderboard will be updated in real time.
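
The announcement does not specify the scoring metric. The sketch below assumes that predictions are scored by root mean squared error (RMSE) against the observed validation values, and shows how a leaderboard could then rank teams. `rmse`, `leaderboard` and the team names are illustrative only.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def leaderboard(observed: list[float],
                submissions: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every team's predictions and sort them best (lowest error) first."""
    scores = [(team, rmse(observed, preds)) for team, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy validation values and submissions (hypothetical).
    observed = [10.0, 12.0, 14.0]
    submissions = {"team_a": [11.0, 12.0, 13.0], "team_b": [10.0, 12.0, 14.5]}
    for rank, (team, score) in enumerate(leaderboard(observed, submissions), start=1):
        print(rank, team, round(score, 3))
```

Recomputing the ranking after each new submission is enough to keep a leaderboard current. Other metrics, such as mean absolute error, can replace `rmse` without changing `leaderboard`.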

Discover the data

Challenge 3

Challenge 3: Prediction of cancer mortality in developing countries in time and space
Challenge 3 (FR): read this page in French.


October 2017 - March 2018


With growth in developing countries (excluding Africa), cancer has become one of the major causes of mortality, ahead even of infectious diseases, which used to be the leading cause of death on those continents. Knowing more about cancer and its root causes, and projecting its evolution in time and space, is therefore a decisive issue for both medical research and public health.

Given the particular socio-economic contexts and development models of southern countries, cancer epidemiology undoubtedly has specific features depending on the region of the world in which it occurs. Improving medical knowledge of these features remains a major challenge. Although cancer epidemiology is widely investigated in northern countries, it is still an uncharted scientific field in southern regions. Moreover, the approach to the disease in these regions is largely borrowed from the existing (northern) model, adjusted by a North-South gradient.


This challenge aims to match cancer data from developing countries (excluding Africa) with population factors thought to promote or protect against cancer. The goal is to improve cancer models, which have rarely been explored in these regions of the world.

The focus will be on the most common cancers. According to GLOBOCAN 2012, the three most common cancers worldwide are lung cancer (1.8 million new cases, 13.0% of all cancers), breast cancer (1.7 million cases, 11.9% of the total) and colorectal cancer (1.4 million cases, 9.7% of the total). These figures are global averages, so the incidence of these cancers may vary considerably among developing countries. An approach by continent, and possibly by sub-continent, would be appreciated.
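
A continental approach mostly amounts to aggregating country-level counts before computing each cancer's share. The sketch below assumes a hypothetical country-to-continent lookup (`CONTINENT`) and toy counts. It is not based on GLOBOCAN data.

```python
from collections import defaultdict

# Hypothetical lookup; a real analysis would use a complete country-to-(sub)continent table.
CONTINENT = {
    "BRA": "South America",
    "ARG": "South America",
    "IND": "Asia",
    "THA": "Asia",
}


def shares_by_continent(counts: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Aggregate (country, cancer_type) case counts by continent and return each
    cancer type's share of the continent total.

    If `counts` covers every cancer type, the shares correspond to the
    "percentage of all cancers" figures quoted above.
    """
    totals: defaultdict[str, defaultdict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (country, cancer), n in counts.items():
        totals[CONTINENT[country]][cancer] += n
    shares: dict[str, dict[str, float]] = {}
    for continent, by_type in totals.items():
        total = sum(by_type.values())
        shares[continent] = {cancer: n / total for cancer, n in by_type.items()}
    return shares


if __name__ == "__main__":
    # Toy counts, not real data.
    toy = {
        ("BRA", "lung"): 30.0,
        ("BRA", "breast"): 50.0,
        ("ARG", "lung"): 20.0,
        ("IND", "lung"): 40.0,
        ("THA", "breast"): 60.0,
    }
    for continent, continent_shares in shares_by_continent(toy).items():
        print(continent, continent_shares)
```

A sub-continental breakdown only needs a finer lookup table. The aggregation logic stays the same.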


Participants will base their analysis on three datasets:

  • A main dataset:
    • An open dataset of worldwide cancer mortality from the World Health Organization (WHO) IARC, broken down by cancer type, country, year and gender. On average, the data go back about 25 years. The countries covered include developing countries (excluding those of the African continent) and, for comparison, industrialized countries.
  • Two complementary datasets:
    • An open dataset of worldwide cancer incidence derived from the World Health Organization (WHO) IARC, broken down by cancer type, country, year and gender. The data go back at least 15 years;
    • An open dataset of population indicators produced by the World Bank: economic, social, educational and agro-environmental indicators, etc.
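
Matching the main dataset with the World Bank indicators is essentially a join on country and year. The sketch below uses hypothetical, simplified rows and field names (`country`, `year`, `deaths`, `urban_pct`). The real files use their own column names, and the IARC and World Bank country codes may need to be harmonized before joining.

```python
# Hypothetical, simplified rows; real IARC and World Bank files use other column names.
Row = dict[str, object]


def join_indicators(
    mortality_rows: list[Row],
    indicators: dict[tuple[str, int], dict[str, float]],
) -> list[Row]:
    """Inner join: attach the country-year indicators to each mortality row.

    Rows whose (country, year) has no indicator entry are dropped.
    """
    joined: list[Row] = []
    for row in mortality_rows:
        key = (row["country"], row["year"])
        if key in indicators:
            # Build a new dict so the input rows are left unchanged.
            joined.append({**row, **indicators[key]})
    return joined


if __name__ == "__main__":
    # Toy values, not real statistics.
    mortality = [
        {"country": "BRA", "year": 2000, "sex": "F", "cancer": "breast", "deaths": 120},
        {"country": "MEX", "year": 2000, "sex": "M", "cancer": "lung", "deaths": 95},
        {"country": "BRA", "year": 2001, "sex": "F", "cancer": "breast", "deaths": 130},
    ]
    world_bank = {
        ("BRA", 2000): {"urban_pct": 80.0},
        ("MEX", 2000): {"urban_pct": 75.0},
    }
    for row in join_indicators(mortality, world_bank):
        print(row)
```

An inner join gives models only rows with complete covariates. If indicator coverage is patchy, a left join that keeps rows with explicit missing values is the usual alternative.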

Discover the data

Areas of technology

  • Statistics, Machine Learning, Big Data, Time Series
  • Python, R and other languages and software, depending on the approach adopted (the R "forecast" package, TensorFlow if using neural networks, etc.)
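
Before turning to the R "forecast" package or neural networks, it helps to have a simple trend baseline for comparison. The sketch below is not taken from either tool. It fits a plain ordinary least-squares linear trend to a single series (for example, one country, cancer type and gender) and extrapolates it a few years ahead. The function names and toy values are assumptions for illustration.

```python
def fit_linear_trend(years: list[int], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of value = intercept + slope * year."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Centering on the means keeps the sums well conditioned for calendar years.
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_trend(years: list[int], values: list[float], horizon: int) -> list[tuple[int, float]]:
    """Extrapolate the fitted trend `horizon` years past the last observed year."""
    intercept, slope = fit_linear_trend(years, values)
    last = max(years)
    return [(last + h, intercept + slope * (last + h)) for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Toy series (e.g. a mortality rate per 100,000), not real data.
    years = [2000, 2001, 2002, 2003]
    rates = [10.0, 12.0, 14.0, 16.0]
    print(forecast_trend(years, rates, horizon=2))  # [(2004, 18.0), (2005, 20.0)]
```

A straight-line trend ignores saturation and abrupt changes. It is best used as a benchmark that more elaborate models (exponential smoothing, ARIMA, neural networks) should beat.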