Data Provided

The data consist of gene expression (GEx) levels measured in whole blood samples from human (Hs) or mouse (Mm) individuals. Human samples were collected during the QASMC, ZRHR-REXC-03-EU, and ZRHR-REXC-04-JP clinical studies or obtained from a biobanked repository (BLD-SMK-01). Mouse samples were collected during the two inhalation studies described above, which used the C57BL/6 and Apoe-/- mouse strains. More detailed descriptions of the study designs, subjects, and data generation and processing are provided in the Technical document.

Total RNA isolated from blood samples was processed and hybridized on a GeneChip Human Genome U133 Plus 2.0 Array from Affymetrix. Raw data (CEL files) were processed and normalized per dataset using frozen Robust Multiarray Analysis (fRMA) (1) with custom Brainarray CDF files (2), and then quality checked. Samples that did not pass quality control criteria were excluded. The groups and the final number of subjects/samples per group in each dataset after the data quality check are defined in Table 3.

| Species | Study code | Smokers (Hs)/3R4F (Mm) | Former smokers (Hs)/Cessation (Mm) | Never smokers (Hs)/Sham (Mm) | RRP | Switch |
|---|---|---|---|---|---|---|
| Hs | dset1 (Train) | 109 | 58 | 60 | - | - |
| Hs | dset2 (Test) | X | X | X | - | - |
| Mm | dset4 (Train) | 40 | 27 | 45 | X | X |
| Mm | dset5 (Test and *Verif*) | X | X | X | *X | *X |

Table 3: Composition of training, testing, and verification datasets. The numbers indicate the number of samples available for the corresponding group (after quality control). For the data provided for class prediction, which contain samples from the testing and verification sets, the numbers are replaced by X. The scoring will consider only samples from the testing set, and not from the verification set because these samples are used for verification only.

Timelines
The test and verification datasets will be released sequentially in two subsets. After a fixed date, submission of predictions for the first data subset will close, and the second data subset will be released.

Ensure that you are logged in before trying to download the data.

Download the data

Only normalized GEx matrices (Table 4) are provided to the participants. The CEL files are not provided but they will be released to the participants after the challenge closure.

| Identifier | Sample1 | Sample2 | SampleM |
|---|---|---|---|
| Gene1 | Expr as log2 exp_ij | | |
| Gene2 | | | |
| GeneN | | | |

Table 4: Normalized GEx matrix format. Each cell contains the log2 normalized expression value of the corresponding gene and subject. Each sample is associated with one subject. M corresponds to the total number of samples provided in the training set or in the test and verification sets, and N to the number of genes.
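As an illustration of how a matrix in the Table 4 layout might be read, the following is a minimal Python sketch using only the standard library. The tab delimiter, the function names `read_gex_matrix` and `samples_as_rows`, and the toy values are assumptions made for this example; the actual file format and identifiers of the provided matrices may differ.

```python
import csv
import io


def read_gex_matrix(handle, delimiter="\t"):
    """Parse a genes-by-samples matrix laid out as in Table 4.

    Returns (sample_ids, gene_ids, values), where values[i][j] is the
    log2 expression of gene i in sample j.
    The delimiter is an assumption; adjust it to the real files.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_ids = header[1:]  # first header cell labels the identifier column
    gene_ids, values = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"row {row[0]!r} has {len(row) - 1} values, "
                f"expected {len(sample_ids)}"
            )
        gene_ids.append(row[0])
        values.append([float(x) for x in row[1:]])
    return sample_ids, gene_ids, values


def samples_as_rows(values):
    """Transpose genes-by-samples into samples-by-genes,
    giving one feature vector per sample."""
    return [list(column) for column in zip(*values)]


# Hypothetical toy matrix: 2 genes x 3 samples (values are made up).
toy = (
    "Identifier\tSample1\tSample2\tSample3\n"
    "Gene1\t7.5\t8.0\t6.25\n"
    "Gene2\t3.0\t2.5\t4.0\n"
)
sample_ids, gene_ids, values = read_gex_matrix(io.StringIO(toy))
features = samples_as_rows(values)
```

Most classification tools expect one row per sample, which is why the sketch transposes the matrix after reading it. With real files, `io.StringIO(toy)` would be replaced by an opened file handle.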

Metadata that are relevant for the sub-challenge will be provided to the participants.

References:

  1. McCall, M. N., Bolstad, B. M., and Irizarry, R. A. (2010) Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242-253
  2. Dai, M., Wang, P., Boyd, A. D., Kostov, G., Athey, B., Jones, E. G., Bunney, W. E., Myers, R. M., Speed, T. P., Akil, H., Watson, S. J., and Meng, F. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research 33, e175
