The 2017 track focuses on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients.
We will be using synthetic cases created by precision oncologists at the University of Texas MD Anderson Cancer Center. Each case will describe the patient's disease (type of cancer), the relevant genetic variants (which genes), basic demographic information (age, sex), and other potential factors that may be relevant. The cases are semi-structured and require minimal natural language processing.
Participants in the track will be challenged with retrieving (1) biomedical articles, in the form of article abstracts (largely from MEDLINE/PubMed), addressing relevant treatments for the given patient, and (2) clinical trials (from ClinicalTrials.gov) for which the patient is eligible. The first set of results represents the retrieval of existing knowledge in the scientific literature, while the second represents the potential for connecting patients with experimental treatments if existing treatments have been ineffective.
| April 2017 | Document collection available for download. |
| May 2017 | Applications for participation in TREC 2017 due. |
| May 2017 | Topics available for download. |
| August 1, 2017 | Submission deadline. |
| October 2017 | Relevance judgments and individual evaluation scores released. |
| November 15–17, 2017 | TREC 2017 conference at NIST in Gaithersburg, MD, USA. |
There are two target document collections for the Precision Medicine track: scientific abstracts and clinical trials. Both XML and TXT versions are available for both sets. Note that the XML is the official collection, as it has the complete information for each abstract/trial. The TXT versions are provided for ease of processing, but no guarantees are made that all information is contained within these files.
Scientific Abstracts: A January 2017 snapshot of PubMed abstracts is used for the scientific abstracts. Additionally, abstracts obtained from AACR and ASCO proceedings are included in this category (these are more targeted toward cancer therapy, and likely to include precision medicine studies not in PubMed). The AACR and ASCO abstracts are only available as TXT files, and the file name (without extension) should be used as the ID in the submission files.
Clinical Trials: An April 2017 snapshot of ClinicalTrials.gov is used for the clinical trial descriptions.
The topics for the track consist of synthetic patient cases created by MD Anderson precision oncologists. Each topic consists of the patient's disease, genetic variants, demographic information, and potentially other relevant details. For example:
| Disease: | Acute lymphoblastic leukemia | thyroid cancer |
| Variant: | ABL1, PTPN11 | RET, BRAF |
| Demographic: | 12-year-old male | 63-year-old female |
| Other: | No relevant factors | ECOG grade of 2 |
The topics will be provided below once available:
The topics are formatted in XML:
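The official topic files are not reproduced here. Based on the fields described above, a topic might plausibly look like the following sketch (the element names and structure are illustrative assumptions, not the official schema):

```xml
<topics>
  <topic number="1">
    <disease>Acute lymphoblastic leukemia</disease>
    <variant>ABL1, PTPN11</variant>
    <demographic>12-year-old male</demographic>
    <other>No relevant factors</other>
  </topic>
</topics>
```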
Additionally, some extra topics are available, accompanied by abstract and clinical trial IDs that were found to be (at least partially) relevant.
The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs for each corpus (scientific abstracts and clinical trials), each consisting of a ranked list of up to one thousand IDs (PMIDs for scientific abstracts, NCT IDs for trials). The highest ranked results for each topic will be pooled and judged by physicians trained in medical informatics. Assessors will be instructed to judge articles as either "definitely relevant", "partially relevant," or "not relevant." See below for how these apply to the two collections. Because we plan to use a graded relevance scale, the performance of the retrieval submissions will be measured using normalized discounted cumulative gain (NDCG).
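To make the graded-relevance evaluation concrete, here is a minimal NDCG sketch (assuming the common exponential-gain formulation with gains 2/1/0 for definitely/partially/not relevant; the track's exact trec_eval configuration may differ):

```python
import math

def dcg(gains):
    """Discounted cumulative gain for a ranked list of graded gains
    (e.g. 2 = definitely relevant, 1 = partially relevant, 0 = not relevant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# A run that ranks the definitely relevant document first scores 1.0;
# burying it at the bottom of the ranking scores lower.
print(ndcg([2, 1, 0]))  # ideal ordering -> 1.0
print(ndcg([0, 1, 2]))  # reversed ordering -> less than 1.0
```

The discount term `log2(i + 2)` rewards placing highly relevant documents near the top of the ranked list, which is why ranking quality, not just recall, matters for the submissions.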
Scientific Abstracts: The goal of retrieving scientific abstracts is to identify relevant articles for the treatment of the disease under the specific conditions for the given patient. Abstracts discussing information not useful for treatment will be considered not relevant.
Definitely Relevant: The abstract provides useful treatment information for this exact patient, in terms of the disease and variant(s).
Partially Relevant: The abstract provides potentially useful treatment information for this patient but is not an exact match.
Not Relevant: The abstract does not provide any useful treatment information for this patient.
Clinical Trials: The goal of retrieving clinical trials is to identify trials for which the given patient is eligible to enroll, or would have been eligible to enroll had the trial been open. The timing and location of the trial are not factors in determining relevance, only the eligibility criteria.
Definitely Relevant: The patient is eligible for the clinical trial.
Partially Relevant: The trial covers the correct disease, but some factor likely makes the patient not strictly eligible.
Not Relevant: The patient is not eligible for the trial in any way.
As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.
The submission deadline is currently projected to be late July 2017.
The format for run submissions is standard trec_eval format. Each line of the submission file should follow the form:

TOPIC_NO 0 ID RANK SCORE RUN_NAME
where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, ID is the identifier of the retrieved document (PMID or NCT ID), RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value representing the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation). The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for documents that should be retrieved first. For example, the following would be a valid line of a run submission file:
Scientific Abstracts:

1 0 28348404 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document 28348404 at rank 1 with a score of 0.9999.

Clinical Trials:

1 0 NCT01155453 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document NCT01155453 at rank 1 with a score of 0.9999.
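The formatting rules above can be enforced programmatically before submission. The following is a small sketch (the helper name is mine, not part of trec_eval); note that it enforces the stated alphanumeric run-name rule, so an alphanumeric tag such as "myrun" is used here:

```python
def format_run_line(topic_no, doc_id, rank, score, run_name):
    """Build one line of a trec_eval-style run file:
    TOPIC_NO 0 ID RANK SCORE RUN_NAME (the '0' is required but ignored)."""
    if not 1 <= topic_no <= 30:
        raise ValueError("topic number must be in 1-30")
    if not 1 <= rank <= 1000:
        raise ValueError("rank must be in 1-1000")
    if len(run_name) > 12 or not run_name.isalnum():
        raise ValueError("run name: at most 12 alphanumeric characters")
    return f"{topic_no} 0 {doc_id} {rank} {score:.4f} {run_name}"

print(format_run_line(1, "28348404", 1, 0.9999, "myrun"))
# -> 1 0 28348404 1 0.9999 myrun
```

Writing one such line per retrieved document, sorted by topic number with scores decreasing down each ranking, yields a file trec_eval can consume directly.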