The 2017 track focuses on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients.
We will be using synthetic cases created by precision oncologists at the University of Texas MD Anderson Cancer Center. Each case will describe the patient's disease (type of cancer), the relevant genetic variants (which genes), basic demographic information (age, sex), and other potential factors that may be relevant. The cases are semi-structured and require minimal natural language processing.
Participants in the track will be challenged with retrieving (1) biomedical articles, in the form of article abstracts (largely from MEDLINE/PubMed), addressing relevant treatments for the given patient, and (2) clinical trials (from ClinicalTrials.gov) for which the patient is eligible. The first set of results represents the retrieval of existing knowledge in the scientific literature, while the second represents the potential for connecting patients with experimental treatments if existing treatments have been ineffective.
| April 2017 | Document collection available for download. |
| May 2017 | Applications for participation in TREC 2017 due. |
| May 2017 | Topics available for download. |
| August 1, 2017 | Submission deadline. |
| October 2017 | Relevance judgments and individual evaluation scores released. |
| November 15–17, 2017 | TREC 2017 conference at NIST in Gaithersburg, MD, USA. |
There are two target document collections for the Precision Medicine track: scientific abstracts and clinical trials. Both XML and TXT versions are available for both sets. Note that the XML is the official collection, as it has the complete information for each abstract/trial. The TXT versions are provided for ease of processing, but no guarantees are made that all information is contained within these files.
Scientific Abstracts: A January 2017 snapshot of PubMed abstracts is used for the scientific abstracts. Additionally, abstracts obtained from AACR and ASCO proceedings are included in this category (these are more targeted toward cancer therapy, and likely to include precision medicine studies not in PubMed). The AACR and ASCO abstracts are only available as TXT files, and the file name (without extension) should be used as the ID in the submission files.
Clinical Trials: An April 2017 snapshot of ClinicalTrials.gov is used for the clinical trial descriptions.
The topics for the track consist of synthetic patient cases created by MD Anderson precision oncologists. Each topic consists of the patient's disease, genetic variants, demographic information, and potentially other relevant details. For example:
| Disease: | Acute lymphoblastic leukemia | thyroid cancer |
| Variant: | ABL1, PTPN11 | RET, BRAF |
| Demographic: | 12-year-old male | 63-year-old female |
| Other: | No relevant factors | ECOG grade of 2 |
The topics will be provided below once available:
The topics are formatted in XML:
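The official topic files are not reproduced here. Based on the fields described above, a topic might plausibly look like the following sketch (the element names and structure are illustrative assumptions, not the official schema):

```xml
<topics>
  <topic number="1">
    <disease>Acute lymphoblastic leukemia</disease>
    <variant>ABL1, PTPN11</variant>
    <demographic>12-year-old male</demographic>
    <other>No relevant factors</other>
  </topic>
</topics>
```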
Additionally, some extra topics are available, accompanied by abstract and clinical trial IDs that were found to be (at least partially) relevant.
The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs for each corpus (scientific abstracts and clinical trials), each consisting of a ranked list of up to one thousand IDs (PMIDs for scientific abstracts, NCT IDs for trials). The highest ranked results for each topic will be pooled and judged by physicians trained in medical informatics. Assessors will be instructed to judge articles as either "definitely relevant", "partially relevant," or "not relevant." See below for how these apply to the two collections. Because we plan to use a graded relevance scale, the performance of the retrieval submissions will be measured using normalized discounted cumulative gain (NDCG).
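To make the graded-relevance evaluation concrete, here is a minimal NDCG sketch (assuming the common exponential-gain formulation with gains 2/1/0 for definitely/partially/not relevant; the track's exact trec_eval configuration may differ):

```python
import math

def dcg(gains):
    """Discounted cumulative gain for a ranked list of graded gains
    (e.g. 2 = definitely relevant, 1 = partially relevant, 0 = not relevant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# A run that ranks the definitely relevant document first scores 1.0;
# burying it at the bottom of the ranking scores lower.
print(ndcg([2, 1, 0]))  # ideal ordering -> 1.0
print(ndcg([0, 1, 2]))  # reversed ordering -> less than 1.0
```

The discount term `log2(i + 2)` rewards placing highly relevant documents near the top of the ranked list, which is why ranking quality, not just recall, matters for the submissions.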
Scientific Abstracts: The goal of retrieving scientific abstracts is to identify relevant articles for the treatment of the disease under the specific conditions for the given patient. Abstracts discussing information not useful for treatment will be considered not relevant.
Definitely Relevant: The abstract provides useful treatment information for this exact patient, in terms of the disease and variant(s).
Partially Relevant: The abstract provides potentially useful treatment information for this patient but is not an exact match.
Not Relevant: The abstract does not provide any useful treatment information for this patient.
Clinical Trials: The goal of retrieving clinical trials is to identify trials for which the given patient is eligible to enroll, or would have been eligible to enroll had the trial been open. The timing and location of the trial are not factors in determining relevance, only the eligibility criteria.
Definitely Relevant: The patient is eligible for the clinical trial.
Partially Relevant: The trial covers the correct disease, but some factor likely makes the patient not strictly eligible.
Not Relevant: The patient is not eligible for the trial in any way.
As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.
The submission deadline is currently projected to be late July 2017.
The format for run submissions is standard trec_eval format. Each line of the submission file should follow the form:

TOPIC_NO 0 ID RANK SCORE RUN_NAME
where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, ID is the identifier of the retrieved document (PMID or NCT ID), RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value representing the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation). The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for documents that should be retrieved first. For example, the following would be a valid line of a run submission file:
Scientific Abstracts:

1 0 28348404 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document 28348404 at rank 1 with a score of 0.9999.

Clinical Trials:

1 0 NCT01155453 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document NCT01155453 at rank 1 with a score of 0.9999.
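The formatting rules above can be enforced programmatically before submission. The following is a small sketch (the helper name is mine, not part of trec_eval); note that it enforces the stated alphanumeric run-name rule, so an alphanumeric tag such as "myrun" is used here:

```python
def format_run_line(topic_no, doc_id, rank, score, run_name):
    """Build one line of a trec_eval-style run file:
    TOPIC_NO 0 ID RANK SCORE RUN_NAME (the '0' is required but ignored)."""
    if not 1 <= topic_no <= 30:
        raise ValueError("topic number must be in 1-30")
    if not 1 <= rank <= 1000:
        raise ValueError("rank must be in 1-1000")
    if len(run_name) > 12 or not run_name.isalnum():
        raise ValueError("run name: at most 12 alphanumeric characters")
    return f"{topic_no} 0 {doc_id} {rank} {score:.4f} {run_name}"

print(format_run_line(1, "28348404", 1, 0.9999, "myrun"))
# -> 1 0 28348404 1 0.9999 myrun
```

Writing one such line per retrieved document, sorted by topic number with scores decreasing down each ranking, yields a file trec_eval can consume directly.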