2016 TREC Clinical Decision Support Track

2016 Clinical Decision Support Track

Similar to the 2014 and 2015 track, the focus of the 2016 Clinical Decision Support Track will be the retrieval of biomedical articles relevant for answering generic clinical questions about medical records. Unlike previous years, actual electronic health record (EHR) patient records will be used instead of synthetic cases.

We will be using admission notes from MIMIC-III. An admission note describes a patient's chief complaint, relevant medical history, and any other information obtained during the first few hours of a patient's hospital stay, such as lab work. Specifically, MIMIC-III focuses on ICU (Intensive Care Unit) patients. Unlike the synthetic cases used in previous years, admission notes are the actual data generated by clinicians (mostly physicians, including residents, and nurses). They contain a significant number of abbreviations as well as other linguistic jargon and style.

Participants of the track will be challenged with retrieving full-text biomedical articles that answer questions related to several types of clinical information needs for a given EHR note. Each topic will consist of a note and one of three generic clinical question types, such as "What is the patient's diagnosis?" Retrieved articles will be judged relevant if they provide information of the specified type that is pertinent to the given patient. The evaluation of submissions will follow standard TREC evaluation procedures.

Tentative Schedule

Date	Note
March 2016	Document collection available for download.
April 2016	Applications for participation in TREC 2016 due.
June 2016	Topics available for download.
July 27, 2016	Submission deadline
October, 2016	Relevance judgments and individual evaluation scores released.
November 15–18, 2016	TREC 2016 conference at NIST in Gaithersburg, MD, USA.

Task Description

Documents

The target document collection for the track is the Open Access Subset of PubMed Central (PMC). PMC is an online digital database of freely available full-text biomedical literature. Because documents are constantly being added to PMC, to ensure the consistency of the collection, for the 2016 task we obtained a snapshot of the open access subset on March 28, 2016 (Note this is an updated collection from last year's track), which contained a 1.25 million articles. The full text of each article in the open access subset is represented as an NXML file (XML encoded using the NLM Journal Archiving and Interchange Tag Library).

Document Identifiers

Each article in the collection is identified by a unique number (PMCID) that will be used for run submissions. The PMCID is specified by the <article-id> element within each article's NXML file. Please note that although each article is represented by multiple identifiers (e.g., PubMed, PMC, Publisher, etc.), we are only concerned with PMCIDs for this task. The various identifier types are specified using the pub-id-type attribute of the <article-id> element. Valid values of pub-id-type that indicate a PMCID include pmc and pmcid.

For example, the PMCID of article 3148967 may be specified in the article's NXML file as follows.

<article-id pub-id-type="pmc">3148967</article-id>

To make processing the documents easier, we have also renamed each article NXML according to the article's PMCID. For example, the document for article 3148967 is named 3148967.nxml.

Obtaining the Collection

The March 28 snapshot can be downloaded from NLM here:

Each of the 4 files listed above is several GB in size. The article NXMLs in each archive are split into multiple directories to allow for easy directory listings. Please note that the directory structure was created merely as a convenience and is not meant to convey any information about the articles.

Topics

The topics for the track are EHR admission notes curated by physicians from the MIMIC-III data. Specifically, the notes are extracted from the history of present illness (HPI) section of the note, which most resembles the narrative cases used in previous tracks. The HPI describes information such as a patient's chief complaint, medical history, tests performed by a physician to diagnose the patient's condition, possibly the patient's current diagnosis, and finally, any steps taken by medical professionals to treat the patient.

There are many clinically relevant questions that can be asked of a given note. In order to simulate the actual information needs of physicians, the topics are annotated according to the three most common generic clinical question types (Ely et al., 2000) shown in the table below. Participants will be tasked with retrieving biomedical articles useful for answering generic questions of the specified type about each note.

Type	Generic Clinical Question	Number of Topics
Diagnosis	What is the patient's diagnosis?	10 to 15
Test	What tests should the patient receive?	10 to 15
Treatment	How should the patient be treated?	10 to 15

For example, for a note labeled "diagnosis" participants should retrieve PMC articles a physician would find useful for determining the diagnosis of the patient described in the report. Similarly, for a note labeled "treatment," participants should retrieve articles that suggest to a physician the best treatment plan for the condition exhibited by the patient described in the report. Finally, for "test" notes participants should retrieve articles that suggest relevant interventions that a physician might undertake in diagnosing the patient.

We will be providing three versions of the patient records. First, the EHR admission note (only the HPI section, which is the "case"). Second, a more layman-friendly "description" similar to previous tracks, which removes much of the jargon and replaces clinical abbreviation for better readability. Third, a "summary" similar to previous tracks, which is a 1-2 sentence summary of the description.

In order to make the results of the track more meaningful, we require that participants use only EHR notes, only descriptions, or only summaries for any given run submission. Participants are, of course, free to submit multiple runs so that they can experiment with the different representations. Participants will be required to indicate on the run submission form which version of the topics they used. While the note may be more challenging, it is the most realistic topic. Therefore, we will be requiring that participants must utilize the note in a subset of their submissions if they plan to submit the maximum allowed number of runs. Each participant will be allowed up to 5 runs. At most three of those runs may be description or summary versions of the topics. Thus, to submit 5 runs, at least 2 runs must use the note; to submit 4 runs, at least 1 run must use the note. Submitting more than 5 runs, or submitting more than three that do not use the EHR note will result in runs being ignored by NIST assessors.

Once the topics are closer to being ready, a set of example notes, descriptions, and summaries will be provided.

Obtaining the Topics

The topics will be provided below once available:

topics2016.xml

Topic numbers are specified using the number attribute of each <topic> element and topic types (i.e., diagnosis, test, and treatment) are specified with the type attribute. Topic descriptions are given in <description> elements and topic summaries are given in <summary> elements. Below is an example of the format.

<topics> <topic number="1" type="test"> <note>Patient admission note</note> <description>Description of topic 1</description> <summary>Summary of topic 1</summary> </topic> ... </topics>

Sample Topics

Participants are free to use previous years topics and judgements in designing their systems (though these do not contain an EHR note):

topics2014.xml
qrels-treceval-2014.txt (for standard measures)
qrels-sampleval-2014.txt (for inferred measures)
topics2015A.xml
topics2015B.xml
qrels-treceval-2015.txt (for standard measures)
qrels-sampleval-2015.txt (for inferred measures)

Additionally, the 2013 ImageCLEF medical task utilized similar cases:

case-based-topics-imageclef-2013.xml

Please take caution when using the ImageCLEF topics. They are provided here only for reference. We have reformatted them somewhat to match this track's topic format, but differences remain. In particular, the ImageCLEF topics only contain the shorter <summary> tags, and all the topics should be considered to be of type diagnosis.

Evaluation

The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs (with the requirement of using the EHR note as explained above), each consisting of a ranked list of up to one thousand PMCIDs. The highest ranked articles for each topic will be pooled and judged by medical librarians and physicians trained in medical informatics. Assessors will be instructed to judge articles as either "definitely relevant" for answering questions of the specified type about the given patient, "definitely not relevant," or "potentially relevant." The latter judgement may be used if an article is not immediately informative on its own, but the assessor believes it may be relevant in the context of a broader literature review. Because we plan to use a graded relevance scale, the performance of the retrieval submissions will be measured using normalized discounted cumulative gain (NDCG).

As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.

Submission Instructions

The submission deadline is currently projected to be late July 2016.

Submission File Format

The format for run submssions is standard trec_eval format. Each line of the submission file should follow the form:

TOPIC_NO Q0 PMCID RANK SCORE RUN_NAME

where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, PMCID is the PubMed Central identifier of the retrieved document, RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value reprenting the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation). The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for docments that should be retrieved first. For example, the following would be a valid line of a run submission file:

1 Q0 3148967 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document 3148967 at rank 1 with a score of 0.9999.

TREC Precision Medicine / Clinical Decision Support Track