Computational Challenge

Objective of the Challenge

The Challenge is now closed.

The Markers of Exposure Response Identification challenge aims to verify that robust and sparse gene signatures, either human-specific or species-independent, that are predictive of smoking or cessation status can be extracted from whole blood gene expression data from humans alone, or from humans and rodents. Participants will develop inductive signature models (i.e., models that can be applied to a single new sample without retraining) rather than transductive ones (i.e., models for which the training and test sets are processed together and used for retraining prior to class prediction). The models will classify subjects as smokers versus non-current smokers, and then as former smokers versus never smokers. For these 2-class prediction problems, participants will be provided with training datasets and will be free to use other publicly available gene expression dataset(s). Trained signature models will be applied directly to predict the class of fully independent, unseen blood gene expression samples (test set), and of samples from subjects who have been exposed to Reduced-Risk Products* (RRP) or who switched to RRP* after smoking conventional cigarettes (verification set, see details below).
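
The two successive 2-class predictions can be pictured as a small cascade of sparse signatures. The following is only a minimal sketch, not the challenge's scoring or any participant's method: the `Signature` class, the linear weighted-sum score, the thresholds, and the gene names `GENE_A`, `GENE_B`, and `GENE_C` are all hypothetical, and it assumes that the second classifier is applied only to samples the first one labels as non-current smokers.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    """Hypothetical sparse linear signature: only the listed genes are used."""
    weights: dict      # gene name -> weight (illustrative values only)
    threshold: float   # decision threshold on the weighted sum
    positive: str      # class returned when the score exceeds the threshold
    negative: str      # class returned otherwise

    def classify(self, expression: dict) -> str:
        # Weighted sum over the signature genes of one sample.
        score = sum(w * expression[g] for g, w in self.weights.items())
        return self.positive if score > self.threshold else self.negative


def classify_subject(expression: dict, stage1: Signature, stage2: Signature) -> str:
    """Stage 1: smoker vs non-current smoker.
    Stage 2 (assumed to run only on non-current smokers): former vs never."""
    first = stage1.classify(expression)
    if first == stage1.positive:
        return first
    return stage2.classify(expression)


# Hypothetical signatures; the weights and thresholds are made up.
smoking = Signature({"GENE_A": 1.0, "GENE_B": -0.5}, 2.0,
                    "smoker", "non-current smoker")
cessation = Signature({"GENE_C": 1.0}, 1.0,
                      "former smoker", "never smoker")

sample = {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 2.0}
print(classify_subject(sample, smoking, cessation))
```

Because each signature depends only on the expression values of one sample and on fixed parameters, a sketch of this kind is inductive in the sense described above: a new sample can be classified on its own.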

Why Participate in this Challenge?

Be part of the scientific community developing the 21st century predictive systems toxicology.

In addition, you will:

  • Gain early access to new high-quality and big data
  • Receive an independent assessment of your methods
  • Contribute to writing peer-reviewed scientific articles describing the outcome of the challenge
  • Grow your professional network by engaging with researchers from around the world

The proposed computational challenge, titled “Markers of Exposure Response Identification”, articulates scientific questions around the problem of identifying blood-based markers of exposure response.



Microarray-based phenotype prediction

Microarray-based technologies are the most widely used platforms for measuring whole-genome gene expression levels. Beyond the goal of gaining biological insights into various experimental conditions, microarray data have been used extensively to develop classification models, including the identification of gene signatures predictive of disease phenotypes, tumor subtypes, adverse drug response, and treatment outcome (1,2,3,4). In many cases, models are trained on one subset of a dataset while the remaining subset of the same dataset is used for prediction. This process can lead to model overfitting, with poor classification performance when the model is applied to new independent datasets, which indicates limited generalization. The Diagnostic Signature Challenge (DSC), organized within the sbv IMPROVER framework, was designed to assess to what extent models trained on transcriptomics data available in public repositories could predict disease phenotypes of individual subjects using entirely new, unrelated datasets. The challenge additionally evaluated whether some computational approaches performed better than others at this task within and across diseases. The outcomes of the challenge are summarized in the publication by Tarca et al. (5). One interesting finding was that most of the classification models developed were transductive (i.e., training and test sets were processed together and used to retrain models prior to class prediction) rather than inductive (i.e., the signature model can be applied to a single new sample without retraining), which may be problematic when a single sample needs to be classified. The present sbv IMPROVER computational challenge proposes a new classification problem that is restricted to inductive models.
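
The difference between the two kinds of model can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not a method used in the DSC: it uses per-gene standardization and a nearest-centroid classifier, both chosen only for brevity, and all function names and data values are hypothetical. In the inductive variant, every parameter is frozen after training, so one sample can be classified alone. In the transductive variant, the scaling statistics are recomputed from training and test data pooled together, so the result for one sample can depend on which other test samples arrive with it.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-gene mean and standard deviation computed from the given samples."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(row, scaler):
    mus, sds = scaler
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def fit_centroids(rows, labels):
    """Mean expression profile of each class."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [mean(c) for c in zip(*members)]
    return centroids


def nearest(row, centroids):
    """Label of the centroid closest to the sample (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


class InductiveModel:
    """Scaler and centroids are learned once, from training data only."""

    def __init__(self, train_rows, train_labels):
        self.scaler = fit_scaler(train_rows)
        scaled = [scale(r, self.scaler) for r in train_rows]
        self.centroids = fit_centroids(scaled, train_labels)

    def predict_one(self, row):
        # Uses frozen parameters: no other test sample is needed.
        return nearest(scale(row, self.scaler), self.centroids)


def transductive_predict(train_rows, train_labels, test_rows):
    """Scaler is refit on training and test samples pooled together."""
    scaler = fit_scaler(train_rows + test_rows)
    scaled_train = [scale(r, scaler) for r in train_rows]
    centroids = fit_centroids(scaled_train, train_labels)
    return [nearest(scale(r, scaler), centroids) for r in test_rows]


# Toy two-gene data; the values are made up for illustration.
train = [[1.0, 5.0], [1.2, 4.8], [3.0, 2.0], [3.2, 2.2]]
labels = ["non-current smoker", "non-current smoker", "smoker", "smoker"]

model = InductiveModel(train, labels)
print(model.predict_one([3.1, 2.1]))
print(transductive_predict(train, labels, [[3.1, 2.1], [1.1, 4.9]]))
```

The inductive model never needs to see the test set as a whole, which is why it remains usable when only one sample is available.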

Application to toxicogenomics?

Exposure to cigarette smoke is a major risk factor for the development of various diseases (e.g., cardiovascular and lung diseases) (6). Cigarette smoke contains thousands of constituents. Some of those constituents (or cigarette smoke-derived metabolites) pass into the blood circulation and elicit systemic effects distal from the lungs. For example, changes in gene expression in circulating peripheral blood cells are associated with several systemic immune and inflammatory-related disorders (7,8). Smoking cessation has been shown to revert some cigarette smoke-induced functional and molecular changes back to normal or to intermediate levels, depending on the subject’s smoking history (e.g., smoking duration, consumption) and cessation period (9,10,11). The identification of specific markers of response to smoking or cessation in whole blood cells is important for monitoring the exposure status of an individual subject. Therefore, blood samples collected in independent clinical studies that included conventional cigarette smokers, former smokers, and never smokers have been profiled for gene expression, and the data are provided to participants as training and test sets for the challenge.

The development of tobacco products with lower health risks, termed RRPs*, is critical to reducing the incidence of smoking-related diseases (12). Heating rather than burning tobacco products has been reported to markedly decrease the amount of harmful constituents in the aerosol (13,14). Recent investigations of the biological impact of heat-not-burn technology-based RRPs* have shown significantly reduced exposure effects related to lung inflammation and oxidative stress responses as well as vascular functions (e.g., transmigration, chemotaxis) compared with a conventional cigarette (11,15,16). In addition to conventional cigarette or cessation treatment groups, experimental study designs have included switching groups (subjects smoking conventional cigarettes who switched to an RRP*) to assess the biological impact of exposure to the RRP* in those groups and to determine whether the level of perturbation is close to the level observed after smoking or after cessation. Reduced exposure studies are designed to help us understand whether exposure to harmful and potentially harmful constituents (HPHCs) is reduced in adult smokers who use RRPs* compared with cigarettes. Blood samples collected in clinical studies conducted with an RRP*, the Tobacco Heating System (THS) 2.2, have been profiled for gene expression and will be provided to participants for verification in the context of the present sbv IMPROVER computational challenge.


Figure 1: Markers of Exposure Response Identification.

  • a. Blood samples are collected from human and mouse subjects in exposed or non-exposed groups.
  • b. Gene expression profiles (GEx) are measured using microarray-based technology.
  • c. Participants are provided with GEx and asked to develop a classification approach that identifies a gene signature capable of classifying subjects to the correct exposure group.

The challenge “Markers of Exposure Response Identification” includes two sub-challenges:



  1. Farmer, P., Bonnefoi, H., Anderle, P., Cameron, D., Wirapati, P., Becette, V., Andre, S., Piccart, M., Campone, M., Brain, E., Macgrogan, G., Petit, T., Jassem, J., Bibeau, F., Blot, E., Bogaerts, J., Aguet, M., Bergh, J., Iggo, R., and Delorenzi, M. (2009) A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer. Nature medicine 15, 68-74
  2. Gordon, G. J., Jensen, R. V., Hsiao, L. L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., Richards, W. G., Sugarbaker, D. J., and Bueno, R. (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer research 62, 4963-4967
  3. Wirapati, P., Sotiriou, C., Kunkel, S., Farmer, P., Pradervand, S., Haibe-Kains, B., Desmedt, C., Ignatiadis, M., Sengstag, T., Schutz, F., Goldstein, D. R., Piccart, M., and Delorenzi, M. (2008) Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast cancer research : BCR 10, R65
  4. Zhang, J. D., Berntenis, N., Roth, A., and Ebeling, M. (2014) Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity. The pharmacogenomics journal 14, 208-216
  5. Tarca, A. L., Lauria, M., Unger, M., Bilal, E., Boue, S., Kumar Dey, K., Hoeng, J., Koeppl, H., Martin, F., Meyer, P., Nandy, P., Norel, R., Peitsch, M., Rice, J. J., Romero, R., Stolovitzky, G., Talikka, M., Xiang, Y., Zechner, C., and IMPROVER DSC Collaborators (2013) Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge. Bioinformatics 29, 2892-2899
  6. Messner, B., and Bernhard, D. (2014) Smoking and cardiovascular disease: mechanisms of endothelial dysfunction and early atherogenesis. Arteriosclerosis, thrombosis, and vascular biology 34, 509-515
  7. Faner, R., Gonzalez, N., Cruz, T., Kalko, S. G., and Agusti, A. (2014) Systemic inflammatory response to smoking in chronic obstructive pulmonary disease: evidence of a gender effect. PloS one 9, e97491
  8. Na, H. K., Kim, M., Chang, S. S., Kim, S. Y., Park, J. Y., Chung, M. W., and Yang, M. (2015) Tobacco smoking-response genes in blood and buccal cells. Toxicology letters 232, 429-437
  9. Halvorsen, B., Lund Sagen, E., Ueland, T., Aukrust, P., and Tonstad, S. (2007) Effect of smoking cessation on markers of inflammation and endothelial cell activation among individuals with high risk for cardiovascular disease. Scandinavian journal of clinical and laboratory investigation 67, 604-611
  10. Boue, S., De Leon, H., Schlage, W. K., Peck, M. J., Weiler, H., Berges, A., Vuillaume, G., Martin, F., Friedrichs, B., Lebrun, S., Meurrens, K., Schracke, N., Moehring, M., Steffen, Y., Schueller, J., Vanscheeuwijck, P., Peitsch, M. C., and Hoeng, J. (2013) Cigarette smoke induces molecular responses in respiratory tissues of ApoE(-/-) mice that are progressively deactivated upon cessation. Toxicology 314, 112-124
  11. Phillips, B., Veljkovic, E., Peck, M. J., Buettner, A., Elamin, A., Guedj, E., Vuillaume, G., Ivanov, N. V., Martin, F., Boue, S., Schlage, W. K., Schneider, T., Titz, B., Talikka, M., Vanscheeuwijck, P., Hoeng, J., and Peitsch, M. C. (2015) A 7-month cigarette smoke inhalation study in C57BL/6 mice demonstrates reduced lung inflammation and emphysema following smoking cessation or aerosol exposure from a prototypic modified risk tobacco product. Food and chemical toxicology : an international journal published for the British Industrial Biological Research Association 80, 328-345
  12. Food and Drug Administration (2012) Modified Risk Tobacco Product Applications: Draft Guidance for Industry.
  13. Schorp, M. K., Tricker, A. R., and Dempsey, R. (2012) Reduced exposure evaluation of an Electrically Heated Cigarette Smoking System. Part 1: Non-clinical and clinical insights. Regulatory toxicology and pharmacology : RTP 64, S1-10
