The 2017 track focuses on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients.
We will be using synthetic cases created by precision oncologists at the University of Texas MD Anderson Cancer Center. Each case will describe the patient's disease (type of cancer), the relevant genetic variants (which genes), basic demographic information (age, sex), and other potential factors that may be relevant. The cases are semi-structured and require minimal natural language processing.
Participants in the track will be challenged to retrieve (1) biomedical articles, in the form of article abstracts (largely from MEDLINE/PubMed), addressing relevant treatments for the given patient, and (2) clinical trials (from ClinicalTrials.gov) for which the patient is eligible. The first set of results represents the retrieval of existing knowledge in the scientific literature, while the second represents the potential for connecting patients with experimental treatments if existing treatments have been ineffective.
|April 2017||Document collection available for download.|
|May 2017||Applications for participation in TREC 2017 due.|
|May 2017||Topics available for download.|
|August 1, 2017||Submission deadline|
|October 2017||Relevance judgments and individual evaluation scores released.|
|November 15–17, 2017||TREC 2017 conference at NIST in Gaithersburg, MD, USA.|
There are two target document collections for the Precision Medicine track: scientific abstracts and clinical trials. Both XML and TXT versions are available for both sets. Note that the XML is the official collection, as it has the complete information for each abstract/trial. The TXT versions are provided for ease of processing, but no guarantees are made that all information is contained within these files.
Scientific Abstracts: A January 2017 snapshot of PubMed abstracts is used for the scientific abstracts. Additionally, abstracts obtained from AACR and ASCO proceedings are included in this category (these are more targeted toward cancer therapy, and likely to include precision medicine studies not in PubMed). The AACR and ASCO abstracts are only available as TXT files, and the file name (without extension) should be used as the ID in the submission files.
Clinical Trials: An April 2017 snapshot of ClinicalTrials.gov is used for the clinical trial descriptions.
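For the abstracts that are only distributed as TXT files, the submission ID is the file name without its extension. The following is a minimal Python sketch of that rule, assuming the files have been unpacked into a local directory. The function names and the file name used in the usage note are hypothetical and are not taken from the actual collection.

```python
from pathlib import Path


def doc_id_from_filename(path):
    """Return the submission ID for a TXT abstract: the file name without its extension."""
    return Path(path).stem


def collect_ids(directory):
    """Map each *.txt file found in a directory to its ID (directory layout is an assumption)."""
    return {doc_id_from_filename(p): p for p in sorted(Path(directory).glob("*.txt"))}
```

For instance, a file stored as `extra_abstracts/EXAMPLE_0001.txt` (a made-up name) would be submitted under the ID `EXAMPLE_0001`.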
The topics for the track consist of synthetic patient cases created by MD Anderson precision oncologists. Each topic consists of the disease, the genetic variants, demographic information, and potentially other information about the patient. For example:
|Disease:||Acute lymphoblastic leukemia||thyroid cancer|
|Variant:||ABL1, PTPN11||RET, BRAF|
|Demographic:||12-year-old male||63-year-old female|
|Other:||No relevant factors||ECOG grade of 2|
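Because the cases are semi-structured, the four fields in the example above map naturally onto a small record type. The following Python sketch shows one way to hold a topic in memory and to split the variant and demographic fields. The `Topic` class, the helper names, and the regular expression are assumptions made for illustration; they are based only on the two example topics and do not reflect the official XML schema.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    disease: str
    variants: List[str]
    age: Optional[int]
    sex: Optional[str]
    other: str


def parse_demographic(text):
    """Parse strings like '12-year-old male' into (age, sex); (None, None) if no match."""
    match = re.search(r"(\d+)-year-old\s+(male|female)", text, flags=re.IGNORECASE)
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2).lower()


def make_topic(disease, variant, demographic, other):
    """Build a Topic from the four raw field strings."""
    # In the examples, variants are given as comma-separated gene names.
    variants = [v.strip() for v in variant.split(",") if v.strip()]
    age, sex = parse_demographic(demographic)
    return Topic(disease.strip(), variants, age, sex, other.strip())


topic = make_topic(
    "Acute lymphoblastic leukemia", "ABL1, PTPN11", "12-year-old male", "No relevant factors"
)
```

Keeping the age and sex as separate fields makes it easier to compare a topic against trial eligibility criteria later, although real topics may need more tolerant parsing than this pattern provides.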
The topics will be provided below once available:
The topics are formatted in XML:
Additionally some extra topics are available, accompanied by abstract and clinical trial IDs that were found to be (at least partially) relevant:
The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs for each corpus (scientific abstracts and clinical trials), each consisting of a ranked list of up to one thousand IDs: PMIDs for MEDLINE abstracts, the provided IDs (part of the file name) for the extra abstracts, and NCT IDs for trials. The highest ranked results for each topic will be pooled and judged by physicians trained in medical informatics.
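Pooling of this kind can be pictured as taking, for each topic, the union of the top-ranked IDs across all submitted runs. The Python sketch below illustrates the general idea only and is not the track's actual procedure. The pool depth (`depth`), the in-memory layout of `runs`, and all IDs other than 28348404 are placeholders chosen for the example, since the text does not state how many results per run are pooled.

```python
from collections import defaultdict


def build_pools(runs, depth):
    """Union the top `depth` IDs per topic across runs.

    `runs` maps run name -> {topic number -> list of IDs in rank order}.
    `depth` is a hypothetical pool depth; the track does not specify one here.
    """
    pools = defaultdict(set)
    for ranked_by_topic in runs.values():
        for topic, ranked_ids in ranked_by_topic.items():
            pools[topic].update(ranked_ids[:depth])
    return dict(pools)


# Two toy runs for topic 1 (placeholder IDs).
runs = {
    "runA": {1: ["28348404", "11111111", "22222222"]},
    "runB": {1: ["11111111", "33333333", "28348404"]},
}
pools = build_pools(runs, depth=2)
```

With a depth of 2, the pool for topic 1 contains the three distinct IDs that appear in the top two positions of either run; documents ranked lower in every run are left unjudged.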
Assessors will be instructed to judge abstracts and clinical trials according to each of the four topic dimensions (disease, gene, demographic, and other). Each of these corresponds to 3-4 categories (e.g., a disease can be an "exact", "more general", "more specific", or "not disease" match). Please read the Relevance Guidelines for more details.
Scientific Abstracts: The goal of retrieving scientific abstracts is to identify relevant articles for the treatment, prevention, and prognosis of the disease under the specific conditions for the given patient. Abstracts discussing information not useful for these goals will not be considered relevant.
Clinical Trials: The goal of retrieving clinical trials is to identify trials for which the given patient is eligible to enroll, or would have been eligible to enroll had the trial been open. The timing and location of the trial are not factors in determining relevance, only the eligibility criteria.
As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.
The submission deadline is August 1, 2017.
The format for run submissions is standard trec_eval format. Each line of the submission file should follow the form:
where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, ID is the identifier of the retrieved document (PMID or NCT ID), RANK is the rank (1–1000) of the retrieved document, SCORE is a floating-point value representing the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation). The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for documents that should be retrieved first. For example, the following would be a valid line of a run submission file:
The above line indicates that the run named "my-run" retrieves for topic number 1 document 28348404 at rank 1 with a score of 0.9999.
Clinical Trials:
The above line indicates that the run named "my-run" retrieves for topic number 1 document 01155453 at rank 1 with a score of 0.9999.
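The submission rules described above (field order, topic and rank ranges, run name limits, sorting) can be sketched as a small Python helper that builds and checks run lines. This is an illustration under stated assumptions rather than an official tool: the single-space field separator, the four-decimal score formatting, the function names, and the run name `myrun` are all choices made for this example.

```python
import re

# RUN_NAME: at most 12 alphanumeric characters, no punctuation.
RUN_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,12}")


def format_run_line(topic_no, doc_id, rank, score, run_name):
    """Build one line: TOPIC_NO, the constant 0, ID, RANK, SCORE, RUN_NAME."""
    if not 1 <= topic_no <= 30:
        raise ValueError("TOPIC_NO must be between 1 and 30")
    if not 1 <= rank <= 1000:
        raise ValueError("RANK must be between 1 and 1000")
    if RUN_NAME_PATTERN.fullmatch(run_name) is None:
        raise ValueError("RUN_NAME must be 1-12 alphanumeric characters")
    # The second field is the required-but-ignored constant.
    # Space separation and 4 decimal places are assumptions of this sketch.
    return f"{topic_no} 0 {doc_id} {rank} {score:.4f} {run_name}"


def format_run(results, run_name):
    """Format {topic_no: [(doc_id, score), ...]} as lines sorted by topic, then by descending score."""
    lines = []
    for topic_no in sorted(results):
        # Keep at most 1000 documents per topic, highest score first.
        ranked = sorted(results[topic_no], key=lambda pair: pair[1], reverse=True)[:1000]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(format_run_line(topic_no, doc_id, rank, score, run_name))
    return lines


lines = format_run({1: [("28348404", 0.9999)]}, "myrun")
```

Validating each line before writing the file catches out-of-range topics or ranks and over-long run names early, which is cheaper than having a run rejected at submission time.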