Division of Health Data Science

What we do

In 2009, the HITECH (Health Information Technology for Economic and Clinical Health) Act was passed in the United States, and it has since spurred the rapid adoption of EHRs (Electronic Health Records). Every day, hospitals around the U.S., including those at Duke, generate enormous volumes of data that contain critical information about the care provided to patients. However, the tools needed to use this data to better inform care and to create technology that augments clinical decision making are still in their infancy. Examples of successful integration of machine-learning models into real-world clinical practice are even rarer. The Woo Center’s Division of Health Data Science combines machine-learning expertise with close collaborations with clinical faculty and organizations such as the Duke Institute for Health Innovation to transform data from electronic health records and other clinical data sources into actionable insights. Drawing on state-of-the-art machine learning, software engineering, and clinical expertise, the division seeks to develop and integrate novel, effective solutions that better diagnose patients, provide useful insights to clinicians, predict adverse outcomes, and ultimately improve the patient experience. By doing so, we hope to catalyze the formation of true Learning Health Systems.

Example projects:

  • Adult Decompensation (2019 DIHI RFA). Many patients who are admitted to hospitals are put on a path to recovery. However, the trajectory of this recovery can sometimes change unexpectedly into a state of decompensation, or the deterioration of a patient’s overall condition. Because a patient’s recovery process is complex and multifaceted, it can be difficult to predict when a patient will decompensate, or even to detect that deterioration is under way. If it is not detected, decompensation can lead to the activation of a Rapid Response Team (RRT) or other severe interventions. Currently, the caregiving process is reactive rather than proactive, and there is potential to improve patient outcomes by predicting when events that result from decompensation will occur. Our team is working to develop a real-time model that predicts patient decompensation hours before adverse events, in hopes of triggering advanced care early enough to prevent adverse outcomes. By leveraging the vast quantity of electronic health record data, we hope to augment the decision-making capabilities of the nurses and physicians currently delivering care to patients.

  • Evaluation Framework for Real-Time Inpatient Machine Learning Models. Machine learning models that are used as decision-support tools within the inpatient setting often make predictions at regular time intervals. For example, a machine learning model that is run on patients at Duke University Hospital to predict sepsis generates a prediction every hour about each patient’s likelihood of developing sepsis. Unlike a model that only needs to generate a single prediction, models that make sequential predictions add a layer of complexity when it comes to properly evaluating and comparing their relative effectiveness. Our team is exploring these complexities in evaluation, specifically within a clinical setting, and developing open-source tools to enable similar evaluations to take place in other settings.

  • Integration and Evaluation of Clinical Notes in Medical Predictions. Although Electronic Health Record systems contain large amounts of structured data, much of the important information about a patient’s health and current status is recorded in clinical notes. There has been some research into using the unstructured information in clinical notes to make better predictions about a patient’s status or future outcomes, but there is still a knowledge gap around the barriers to implementing a production-level system that incorporates provider notes into machine learning models. Our team is exploring the ways that notes are written, modified, and streamed to production systems, and the additional benefit that natural language processing techniques could bring to both existing and novel machine-learning-based solutions in health systems.
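
As a rough illustration of the evaluation question raised in the Evaluation Framework project above, the sketch below scores the same hourly predictions in two ways: per prediction and per hospital encounter. It is a minimal example under stated assumptions, not the team's open-source tooling. The `Encounter` structure, the function names, the alert threshold, the look-back horizon, and the toy scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Encounter:
    """Hourly risk scores for one hospital stay (hypothetical structure)."""
    scores: List[float]        # scores[h] is the prediction made at hour h
    event_hour: Optional[int]  # hour of the adverse event, or None if none occurred


def prediction_level_sensitivity(encounters, threshold, horizon):
    """Share of hourly predictions in the `horizon` hours before an event that alert."""
    hits = total = 0
    for enc in encounters:
        if enc.event_hour is None:
            continue  # sensitivity only considers encounters with an event
        start = max(0, enc.event_hour - horizon)
        for score in enc.scores[start:enc.event_hour]:
            total += 1
            hits += score >= threshold
    return hits / total if total else float("nan")


def encounter_level_sensitivity(encounters, threshold, horizon):
    """Share of event encounters with at least one alert in the look-back window."""
    hits = total = 0
    for enc in encounters:
        if enc.event_hour is None:
            continue
        start = max(0, enc.event_hour - horizon)
        total += 1
        hits += any(s >= threshold for s in enc.scores[start:enc.event_hour])
    return hits / total if total else float("nan")


# Toy data (made up): two encounters with an event at hour 4, one without an event.
encounters = [
    Encounter(scores=[0.1, 0.2, 0.3, 0.9], event_hour=4),
    Encounter(scores=[0.1, 0.1, 0.2, 0.2], event_hour=4),
    Encounter(scores=[0.1, 0.8, 0.1], event_hour=None),
]

pred_sens = prediction_level_sensitivity(encounters, threshold=0.5, horizon=4)
enc_sens = encounter_level_sensitivity(encounters, threshold=0.5, horizon=4)
print(f"prediction-level sensitivity: {pred_sens:.3f}")
print(f"encounter-level sensitivity:  {enc_sens:.3f}")
```

On this toy data the two views disagree: only one of the eight pre-event hourly predictions alerts, yet one of the two event encounters receives an alert. Which view is more appropriate depends on how the alerts would be used clinically, which is the kind of choice an evaluation framework for sequential predictions has to make explicit.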

How to participate

The Woo Center works to improve health care through educational experiences, research projects, and entrepreneurial opportunities for Duke faculty and students in collaboration with clinical and industry partners worldwide.

The following are ways in which you can participate in this work:


Students interested in gaining valuable experience in this Division are encouraged to forward their CV directly to the Lead Data Scientist. Students whose interests match an available opportunity will be contacted for an interview.

Potential collaborators

Clinical staff within the Duke Health system who have a potential project should complete the Contact form below, including their name, department, contact details, and a description of the proposed project. Because of limited resources, the Woo Center is unfortunately unable to accept all proposals, and will select projects based on their potential impact on health care and health care processes.

Meet the team

Suresh Balu | suresh.balu@duke.edu
Division Lead

Suresh Balu serves as Associate Dean for Innovation and Partnership for the School of Medicine and as Program Director for the Duke Institute for Health Innovation (DIHI). In his role as Associate Dean, Suresh is responsible for creating, implementing, and sustaining innovation and partnership initiatives for the School of Medicine, specifically those that support the strategic priorities for clinical and translational research. As Program Director for DIHI, Suresh works closely with Duke Health leadership to develop innovation frameworks and approaches across healthcare delivery, education, and research.


Michael Gao | michael.gao@duke.edu
Lead Data Scientist

Michael Gao is a data enthusiast who is interested in leveraging modern computing tools and statistical and machine learning approaches to solve problems in health care, technology, and beyond. His research interests include statistical software, Bayesian inference, machine learning, and the implementation of such methods in real-world operational settings.


Anders Dohlman | anders.dohlman@duke.edu
Graduate Fellow

Anders Dohlman received his B.A. from Wesleyan University in 2015, with a double major in Mathematics and Biology. Prior to graduate school, Anders worked with researchers at UNC, Harvard Medical School, and Mount Sinai Hospital, applying methods from bioinformatics, systems biology, and machine learning. At Duke, he studies host-microbe interactions in cancer using high-throughput sequencing data. Anders is also an IBIEM fellowship recipient and internship coordinator with the Woo Center for Big Data and Precision Health.  


Nicholas Giroux | nicholas.giroux@duke.edu
Graduate Fellow