Species Translation Challenge - Scoring



Gold Standard

For each of sub-challenges 1, 2, and 3, submissions will be scored by comparing them to the “Gold Standard”, which is unseen by the participants and is based on the actual measured activation state for each phosphoprotein and stimulus (for sub-challenges 1 and 2) or on the actual gene set enrichment (for sub-challenge 3). The experimental data used to generate the Gold Standard were processed and normalized in a similar fashion to the data in the training sets provided to the participants. For sub-challenge 4, scoring will be performed as described below and in the Challenge Rules.


Scoring Methodology

To establish a fair and meaningful score that is not biased by any particular performance measure, different metrics will be used and aggregated. A high-level scoring methodology and a list of possible metrics to be used in scoring are provided here and in the Challenge Rules. The detailed scoring methods and the final selection of metrics will be determined by the Scoring Review Panel and disclosed once the scoring is completed in accordance with the Challenge Rules. Each sub-challenge is self-contained and will be scored independently.

For binary classification in sub-challenges 1, 2, and 3, the scoring will take place in the following steps:

  • Compare the predictions to the Gold Standard and compute a performance measure with a combination of two or more of the following possible metrics selected by IBM personnel and validated by the Scoring Review Panel (the selection of the metrics and the validation of those metrics will be based on the application of scientific principles):
    • Area Under Precision Recall curve (AUPR): for this metric, the order of the predictions will be given by the submitted confidence scores and the Gold Standard will be binarized.
    • Area Under Receiver Operating Characteristic curve (AUROC): for this metric, the order of the predictions will be given by the submitted confidence scores and the Gold Standard will be binarized.
    • Jaccard similarity or Matthews correlation coefficient: for this metric, both the submission confidence scores and the Gold Standard will be binarized.
    • Spearman and Pearson correlations.
    • Different measures of Accuracy.
    • Other metrics may be used upon suggestion of the Scoring Review Panel, as necessary.
  • Compute the z-score from the team prediction versus a reference distribution.
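
The two steps above can be illustrated with a minimal sketch. It is not the organizers' scoring code: the final metrics are chosen by the Scoring Review Panel, and the source does not say how the reference distribution is built. The sketch assumes a binarized Gold Standard, a confidence threshold of 0.5 for binarizing predictions, average precision as the AUPR estimate, and a reference distribution obtained by randomly shuffling the submitted confidence scores. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def auroc(scores, labels):
    """AUROC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(scores, labels):
    """AUPR estimated as average precision over the ranked predictions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / hits


def mcc(predicted, labels):
    """Matthews correlation coefficient for binarized predictions."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predicted, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predicted, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predicted, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predicted, labels))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def z_score(observed, reference):
    """Standardize an observed metric against a reference distribution."""
    return (observed - statistics.mean(reference)) / statistics.stdev(reference)


# Toy data (hypothetical): binarized Gold Standard and submitted confidences.
gold = [1, 1, 0, 1, 0, 0, 1, 0]
conf = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

observed_auroc = auroc(conf, gold)                       # 0.75
observed_aupr = aupr(conf, gold)
observed_mcc = mcc([int(c >= 0.5) for c in conf], gold)  # 0.5

# Assumed reference distribution: AUROC of randomly shuffled confidences.
rng = random.Random(0)
reference = []
for _ in range(1000):
    shuffled = conf[:]
    rng.shuffle(shuffled)
    reference.append(auroc(shuffled, gold))

z = z_score(observed_auroc, reference)
print(observed_auroc, observed_aupr, observed_mcc, z)
```

A z-score computed this way expresses how far a team's metric lies from what random predictions would achieve, in units of the reference standard deviation, which is one way to put different metrics on a common scale before aggregating them.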

For sub-challenge 4, the Submissions will be scored based on the quality of the submitted networks and on scientific merit determined from the Submission’s write-up for the network inference. Sub-challenge 4 will be scored only when there are at least five valid Submissions. The scoring will be based on the Scoring Review Panel’s scientific assessment of the quality of the Submissions, including the following criteria: a comparison of the submitted network and the “true network” (a network created with more information than given to the sub-challenge participants), the scientific rigor of the method, the justification of the methodology, the clarity of the writing, and the originality of the method, all with reference to the current knowledge base in this area as reflected in the scientific and academic literature. Overall, the scoring analysis is conducted to ascertain whether the Submissions are genuine predictions based on scientific analysis with scientific merit.


Tie Resolution

If several teams reach the same score, the Scoring Review Panel will perform a scientific review of the available information in order to distinguish the submissions, as further described in the Challenge Rules. If the tie persists, the incentives will be allocated per the Challenge Rules.


Scorers and Scoring Review Panel

A team of researchers from the IBM T.J. Watson Research Center in New York (USA) will establish a scoring methodology and perform the scoring on the blinded submissions under the review of an independent Scoring Review Panel.

The sbv IMPROVER Scoring Review Panel* consists of the following experts in the field of systems biology:

  • Dr. Leonidas Alexopoulos, National Technical University of Athens / Protatonce
  • Dr. Jim Costello, Harvard Medical School
  • Prof. Rudiyanto Gunawan, Swiss Federal Institute of Technology (ETH) Zurich
  • Prof. Torsten Schwede, Biozentrum University of Basel & Swiss Institute for Bioinformatics
  • Prof. Alfonso Valencia, Spanish National Bioinformatics Institute (INB)

The sbv IMPROVER Scoring Review Panel will review the scoring process for the Species Translation Challenge to ensure fairness and transparency.

*Additional members may be added to the Scoring Review Panel during the open phase of the challenge.




  • Blinded Scoring: Submissions will be anonymized before scoring, so that neither the scorers nor the Scoring Review Panel have access to the identity of the participating teams or the members of the teams. To help us maintain this, the Submissions must not include any information regarding the identity or affiliations of the team or the members of the team.
  • Submissions and Significance: For each sub-challenge, a minimum of 5 submissions is required. At least one of the submissions must be statistically significant in at least one metric, at a level of significance given by a false discovery rate of 0.05. This false discovery rate will take into account the number of submissions as a multiple comparison problem. If these requirements are not met for a particular sub-challenge, the challenge organizers retain the right not to declare a sub-challenge best performer in accordance with the Challenge Rules.
  • Timelines: The scoring process will start as soon as the relevant sub-challenge has been closed (timings are given in the Challenge Rules). If all conditions are met and the open period of the sub-challenge is not extended, the anonymized ranking of the participating teams will be disclosed and the best performers per sub-challenge will be informed via email on 20 September 2013.
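
The significance requirement above treats the set of submissions as a multiple comparison problem. The source does not name the correction procedure; as an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, one common way to control the false discovery rate at 0.05. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag which p-values are significant under Benjamini-Hochberg control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank

    # Every hypothesis ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


# Hypothetical p-values, one per submission, for a single metric.
p_values = [0.001, 0.04, 0.03, 0.20, 0.60]
flags = benjamini_hochberg(p_values)
print(flags)       # [True, False, False, False, False]
print(any(flags))  # True: at least one submission is significant
```

Under this assumption, the requirement is met for a metric when at least one entry of `flags` is true. Note that the threshold for each submission tightens as the number of submissions grows, which is how the number of submissions enters the calculation.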


Please contact us should you have further questions.





