Sub-challenge 2: Species translatable blood gene signature as exposure response marker

Background

Most pre-clinical in vivo studies are conducted in rodents, which raises the question of whether their results translate and apply to humans. In the previous edition of the challenge, the Species Translation Challenge (STC), we addressed questions of species translatability. Here we propose to tackle this question again in the context of gene signature identification and phenotype prediction.

Aim

To verify that robust and sparse (at most 40 genes) species-independent gene signatures and models can be obtained from human and mouse whole blood gene expression data to predict smoking exposure status (smoker vs. non-current smoker) or cessation status (former smoker vs. never smoker) in both species.

Datasets

Human and mouse blood gene expression datasets from independent studies are provided for training and testing. The test dataset includes additional samples (verification data, see below) that are used for verification purposes only and will not be considered for scoring.

The blood samples were obtained from our clinical studies or a banked repository (human samples), and from in vivo mouse studies (mouse samples):

  • Human training dataset (dset1): The Queen Ann Street Medical Center (QASMC) clinical case–control study was conducted at The Heart and Lung Centre (London, UK) in accordance with Good Clinical Practice (study description available here).
  • Human test dataset (dset2): Blood samples were obtained from a banked repository (BioServe Biotechnologies Ltd., Beltsville, MD, USA) based on well-defined inclusion criteria, and are referenced as BLD-SMK-01.

 In addition to signing the Informed Consent Form (ICF) for participation in these studies, subjects were informed about, and asked to consent to, the collection of blood samples for bio-banking and transcriptomics profiling. The blood samples for transcriptomics and the related data were anonymized: they were initially single- or double-coded, and the link between the subjects' identifiers and the unique code(s) was subsequently deleted.

  •  Mouse training dataset (dset4): A 7-month cigarette smoke inhalation study was conducted with C57BL/6 mice (1). The study design includes 5 groups; however, only 3 of them are provided for training: mice exposed to 3R4F smoke (3R4F), mice exposed to air after 2 months of 3R4F exposure (Cessation), and mice continuously exposed to air (Sham).
  •  Mouse test and verification dataset (dset5): An 8-month cigarette smoke inhalation study was conducted with Apoe-/- mice. The study design includes 5 groups: mice exposed to 3R4F smoke (3R4F), mice exposed to air after 2 months of 3R4F exposure (Cessation), mice exposed to the RRP aerosol (THS2.2), mice exposed to the RRP aerosol after 2 months of 3R4F exposure (Switch), and mice continuously exposed to air (Sham). All 5 groups are provided for testing and verification, as shown in the schema below.

 The blood sample collection and all procedures involving animals were performed in an Association for Assessment and Accreditation of Laboratory Animal Care International (AAALAC)-accredited, Agri-Food & Veterinary Authority of Singapore-licensed facility with approval from an Institutional Animal Care and Use Committee (IACUC protocol #15015), in compliance with the National Advisory Committee for Laboratory Animal Research Guidelines on the Care and Use of Animals for Scientific Purposes (NACLAR, 2004).

 The schema below provides a description of the composition of the datasets. The datasets provided for training are described in more detail in the “Data provided” sections and the Technical Document.

Note: the 3R4F (conventional reference cigarette), cessation, and sham groups in mouse studies correspond to smokers, former smokers, and never smokers in human studies, respectively.
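Because the mouse groups map onto the human classes, both species can share one label vocabulary for the two prediction steps. The Python sketch below shows one way to harmonize labels under that mapping. The group and class strings are hypothetical and may not match the labels used in the challenge files. The note above gives no human counterpart for the THS2.2 and Switch groups, so the sketch returns `None` for them.

```python
from typing import Optional

# Hypothetical label strings; the actual labels in the challenge files may differ.
MOUSE_TO_HUMAN = {
    "3R4F": "smoker",
    "Cessation": "former smoker",
    "Sham": "never smoker",
}


def harmonize_mouse_group(group: str) -> Optional[str]:
    """Map a mouse group to its human-equivalent class.

    Returns None for groups with no stated human counterpart (e.g. THS2.2, Switch).
    """
    return MOUSE_TO_HUMAN.get(group)


def step_labels(human_class: str) -> tuple[str, Optional[str]]:
    """Return (step 1 label, step 2 label) for a harmonized class.

    Step 1 separates smokers from non-current smokers; step 2 applies only to
    non-current smokers and separates former smokers from never smokers.
    """
    if human_class == "smoker":
        return ("smoker", None)
    if human_class in ("former smoker", "never smoker"):
        return ("non-current smoker", human_class)
    raise ValueError(f"unknown class: {human_class}")
```

For example, a Sham mouse maps to `("non-current smoker", "never smoker")`, the same pair a human never smoker would receive.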

Scientific questions

  • Are whole blood gene expression changes in human and mouse sufficiently informative to define unique signature models that can be applied directly, without retraining, to both human and mouse samples to predict smoking exposure (smoker vs. non-current smoker) or cessation (former smoker vs. never smoker) status?
  • How do samples from the verification set classify?

Classification models

Participants are requested to develop inductive rather than transductive signature models to predict the sample class (for details, see the “Background: Microarray-based phenotype prediction” section). Species-independent signature model(s) are to be developed to predict smoking exposure status, discriminating smokers from never smokers, and to predict cessation status, discriminating former smokers from never smokers. After training, the unique species-independent signature model/classifier(s) must be applied directly, without retraining, to the data from both species (test and verification sets) to predict the sample class. The gene signature must be sparse, with a maximum of 40 genes.
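The constraints above (inductive, species-independent, no retraining, at most 40 genes) can be met in many ways. The Python sketch below shows one possibility and is not the challenge's prescribed method. It assumes that mouse genes have already been mapped to human ortholog symbols. Each sample is z-scored using only its own values, so no information from other unlabeled samples is used (keeping the model inductive). Genes are kept only if a Welch-style t statistic separates the classes in the same direction in both species. A nearest-centroid classifier, chosen here for brevity, is then fitted on the pooled data. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance

# Hypothetical representation: gene symbol -> expression value. Mouse genes are
# ASSUMED to have been mapped to their human ortholog symbols beforehand.
Sample = dict[str, float]
Labeled = list[tuple[Sample, int]]  # (sample, label); 1 = positive class


def zscore(sample: Sample, genes: list[str]) -> list[float]:
    """Standardize one sample across `genes` using only that sample (inductive)."""
    vals = [sample[g] for g in genes]
    mu = mean(vals)
    sd = math.sqrt(pvariance(vals, mu)) or 1.0
    return [(v - mu) / sd for v in vals]


def t_scores(rows: list[list[float]], labels: list[int]) -> list[float]:
    """Welch-style t statistic per gene (class 1 minus class 0)."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        a, b = [r[j] for r in pos], [r[j] for r in neg]
        se = math.sqrt(pvariance(a) / len(a) + pvariance(b) / len(b)) + 1e-9
        scores.append((mean(a) - mean(b)) / se)
    return scores


def train_signature(human: Labeled, mouse: Labeled, max_genes: int = 40):
    """Select <= max_genes genes that separate the classes in the SAME direction
    in both species, then compute pooled nearest-centroid class centroids."""
    genes = sorted(set.intersection(*(set(s) for s, _ in human + mouse)))
    t_h, t_m = (
        t_scores([zscore(s, genes) for s, _ in d], [y for _, y in d])
        for d in (human, mouse)
    )
    # Rank by the weaker species' score; discard genes with opposite signs.
    consistent = [(min(abs(a), abs(b)), g) for g, a, b in zip(genes, t_h, t_m) if a * b > 0]
    signature = [g for _, g in sorted(consistent, reverse=True)[:max_genes]]
    centroids = {}
    for c in (0, 1):
        rows = [zscore(s, signature) for s, y in human + mouse if y == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return signature, centroids


def predict(sample: Sample, signature: list[str], centroids: dict[int, list[float]]):
    """Classify one unlabeled sample of either species, without retraining.

    Confidence is d_other / (d_pred + d_other), which lies in [0.5, 1]."""
    x = zscore(sample, signature)
    dist = {c: math.dist(x, mu) for c, mu in centroids.items()}
    label = min(dist, key=dist.get)
    d_pred, d_other = dist[label], dist[1 - label]
    total = d_pred + d_other
    return label, (d_other / total if total else 0.5)
```

The same `signature` and `centroids` are then applied unchanged to human and mouse test samples. Genes measured in only one species never enter the signature, because of the intersection step.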

Stepwise class predictions

Participants are requested to make class predictions stepwise, as follows:

Step 1: The first trained signature model will be applied to unlabeled sample data (test and verification sets) to classify samples as smoker or non-current smoker (the latter including former smokers and never smokers), with an associated confidence level.

Step 2: The second trained signature model will be applied exclusively to samples predicted as non-current smokers in Step 1, to classify them as former smoker or never smoker, with an associated confidence level.
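The two steps form a simple cascade, in which Step 2 runs only on samples that Step 1 calls non-current smokers. The Python sketch below illustrates this control flow. It assumes that each step's model is a callable returning a (label, confidence) pair; the function name, label strings, and result layout are hypothetical.

```python
from typing import Callable, Optional

# A classifier maps one sample to (predicted label, confidence in [0, 1]).
Classifier = Callable[[dict], tuple[str, float]]


def stepwise_predict(samples: dict[str, dict], step1: Classifier, step2: Classifier):
    """Apply step 1 to every sample, then step 2 only to predicted non-current smokers.

    Returns {sample_id: (step1_label, step1_conf, step2_label, step2_conf)}, where
    the step 2 fields are None for samples predicted as smokers.
    """
    results = {}
    for sid, sample in samples.items():
        label1, conf1 = step1(sample)
        label2: Optional[str] = None
        conf2: Optional[float] = None
        if label1 == "non-current smoker":
            # Step 2 never sees samples classified as smokers in step 1.
            label2, conf2 = step2(sample)
        results[sid] = (label1, conf1, label2, conf2)
    return results
```

One consequence of this design is that a smoker misclassified as non-current in Step 1 still receives a former/never call in Step 2. Errors therefore propagate from the first step to the second rather than being corrected.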

Participants are free to use two separate 2-class prediction models, one for each step, or a single 3-class prediction model.
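When a single 3-class model is used, its class probabilities still have to be reported as the two stepwise predictions. One natural mapping, assumed here rather than specified by the challenge, is P(non-current) = P(former) + P(never) for Step 1 and the conditional P(former | non-current) = P(former) / (P(former) + P(never)) for Step 2. The Python sketch below uses a hypothetical function name, breaks ties toward "smoker" and "former smoker", and normalizes its inputs so that unnormalized scores are also accepted.

```python
def three_class_to_steps(p_smoker: float, p_former: float, p_never: float):
    """Derive (step1_label, step1_conf, step2_label, step2_conf) from one
    3-class model's outputs; step 2 fields are None for predicted smokers."""
    total = p_smoker + p_former + p_never
    p_smoker, p_former, p_never = (p / total for p in (p_smoker, p_former, p_never))
    p_noncurrent = p_former + p_never
    if p_smoker >= p_noncurrent:
        return ("smoker", p_smoker, None, None)
    # Step 2 confidence is conditional on the sample being a non-current smoker.
    p_former_given_nc = p_former / p_noncurrent
    if p_former_given_nc >= 0.5:
        return ("non-current smoker", p_noncurrent, "former smoker", p_former_given_nc)
    return ("non-current smoker", p_noncurrent, "never smoker", 1 - p_former_given_nc)
```

For instance, scores of (2, 3, 5) normalize to (0.2, 0.3, 0.5). The sample is called a non-current smoker with confidence 0.8 and a never smoker with conditional confidence 0.625.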

References:

  1. Phillips, B., Veljkovic, E., Peck, M. J., Buettner, A., Elamin, A., Guedj, E., Vuillaume, G., Ivanov, N. V., Martin, F., Boue, S., Schlage, W. K., Schneider, T., Titz, B., Talikka, M., Vanscheeuwijck, P., Hoeng, J., and Peitsch, M. C. (2015) A 7-month cigarette smoke inhalation study in C57BL/6 mice demonstrates reduced lung inflammation and emphysema following smoking cessation or aerosol exposure from a prototypic modified risk tobacco product. Food and Chemical Toxicology 80, 328-345.