mHealth Biomarkers

A biomarker is an objective indicator of a physical or medical state that can be observed and measured accurately and reproducibly. The World Health Organization has defined a biomarker as "any substance, structure, or process that can be measured in the body or its products and influence or predict the incidence of outcome or disease." [1, 2]

Team members

James M. Rehg (Thrust 2 Lead, Georgia Tech)
Deepak Ganesan (UMass Amherst)
Benjamin Marlin (UMass Amherst)
Mustafa al'Absi (Minnesota)
Santosh Kumar (Memphis)
Gregory Abowd (Georgia Tech)
Cho Lam (Utah)
Bonnie Spring (Northwestern)
David Wetter (Utah)

Soujanya Chatterjee (Memphis)
Addison Mayberry (UMass)
Nazir Saleheen (Memphis)
Rumanna Bari (Memphis)

MD2K applies its research to two biomedical problems: detecting smoking relapse among abstinent smokers and detecting the onset of congestion in congestive heart failure (CHF) patients. Once these events are identified, researchers focus on the risk factors for smoking lapses and CHF.

Since it began in 2014, MD2K has identified robust markers of stress, smoking, craving (eating/smoking), fatigue, eating, and TV viewing (for detecting exposure to alcohol ads).

To identify and validate these biomarkers, MD2K researchers have correlated sensor data with self-reports known as Ecological Momentary Assessments (EMAs). The biomarkers (of stress, craving, smoking, drug use, and eating) are exposed within the millions of bits of sensor data by computational models that identify informative combinations of signals.
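As a rough illustration of correlating sensor data with EMAs, the sketch below averages a sensor-derived score over a window before each self-report and computes a Pearson correlation. The data, the 120-second window, and the helper names (`mean_before`, `pearson`) are assumptions made for this example, not MD2K's actual pipeline.

```python
from math import sqrt

def mean_before(samples, t, window_s):
    """Mean of sensor values whose timestamps fall in [t - window_s, t]."""
    vals = [v for ts, v in samples if t - window_s <= ts <= t]
    return sum(vals) / len(vals) if vals else None

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: (timestamp in seconds, sensor-derived stress score)
sensor = [(0, 0.2), (60, 0.3), (600, 0.8), (660, 0.9), (1200, 0.5), (1260, 0.4)]
# Hypothetical EMAs: (timestamp in seconds, self-reported stress on a 1-5 scale)
emas = [(90, 1), (690, 5), (1290, 3)]

# Pair each EMA with the mean sensor score in the preceding 120 seconds,
# dropping EMAs that have no sensor data in their window.
pairs = [(mean_before(sensor, t, 120), rating) for t, rating in emas]
pairs = [(score, rating) for score, rating in pairs if score is not None]

rho = pearson([score for score, _ in pairs], [rating for _, rating in pairs])
```

A high `rho` on held-out participants would suggest the sensor score tracks the self-reports; real studies would also need to handle missing data and within-person effects.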

For example, a wrist sensor might detect a hand movement to the mouth, which on its own could mean anything. But when that information is combined with data from a respiratory sensor showing inhalation and exhalation, the moment a puff on a cigarette occurs becomes clear.
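The idea of combining the two sensor streams can be sketched as a simple interval overlap check. This is a minimal illustration only: the `Interval` type, the `candidate_puffs` helper, the 0.5-second overlap threshold, and the sample data are all hypothetical, and a real detector would rely on trained models rather than a fixed rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float  # seconds
    end: float    # seconds

def overlap(a, b):
    """Length in seconds of the overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def candidate_puffs(hand_to_mouth, deep_breaths, min_overlap_s=0.5):
    """Keep hand-to-mouth gestures that coincide with a deep breath."""
    return [g for g in hand_to_mouth
            if any(overlap(g, b) >= min_overlap_s for b in deep_breaths)]

# Hypothetical gesture intervals from the wrist sensor
gestures = [Interval(10.0, 12.0), Interval(40.0, 41.5), Interval(90.0, 92.0)]
# Hypothetical deep inhale/exhale intervals from the respiratory sensor
breaths = [Interval(10.5, 13.0), Interval(60.0, 62.0), Interval(91.8, 94.0)]

puffs = candidate_puffs(gestures, breaths)
```

Here only the first gesture overlaps a deep breath for long enough, so it is the only candidate puff; the other two gestures could be eating, drinking, or any other hand-to-mouth movement.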

Methodological advances resulted in novel methods for:

• Efficiently computing biomarker features from compressively sampled ECG data
• Computing biomarkers in the presence of temporally imprecise labels, using structured prediction models
• Improving lab-to-field generalizability in biomarker computation by addressing covariate shift and prior probability shift (in feature computation)
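To make the last item concrete, the sketch below shows the textbook correction for prior probability shift: class posteriors from a model trained under lab class frequencies are re-weighted by the ratio of field to lab priors and renormalized. It is not necessarily the method MD2K used, it does not cover covariate shift, and the function name and all numbers are assumptions for illustration.

```python
def adjust_for_prior_shift(p_lab, prior_lab, prior_field):
    """Re-weight class posteriors from the lab prior to the field prior.

    Assumes p(x | y) is the same in both settings and only p(y) changes.
    """
    weighted = {c: p_lab[c] * prior_field[c] / prior_lab[c] for c in p_lab}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}

# Hypothetical model output for one window, trained on balanced lab data
p_lab = {"stress": 0.6, "no_stress": 0.4}
prior_lab = {"stress": 0.5, "no_stress": 0.5}
# Hypothetical field setting where stress is much rarer
prior_field = {"stress": 0.1, "no_stress": 0.9}

p_field = adjust_for_prior_shift(p_lab, prior_lab, prior_field)
```

With these numbers, a window that looked more likely than not to be stress under lab conditions becomes an unlikely stress window once the lower field base rate is taken into account.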

Data science research on the discovery of mHealth predictors led to a new pattern mining approach for identifying significant stress episodes from the minute-level time series of stress biomarkers, and to a discovery dashboard with interactive motif discovery capabilities that facilitates visual exploration of multivariate biomarker time series. Modeling advances resulted in a new latent state model for patterns in mobile health event data. The model combines a continuous-time Hidden Markov Model (CT-HMM), which captures patterns in event data with irregular arrival times, with survival analysis, which provides interpretable models for predicting the risk of adverse events over future time intervals.
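To give a feel for what extracting episodes from a minute-level series involves, the sketch below marks runs where a stress score stays at or above a threshold for a minimum number of minutes. This is a deliberately simple stand-in, not the pattern mining approach described above; the function name, the threshold, the minimum duration, and the data are assumptions.

```python
def find_episodes(stress_prob, threshold=0.5, min_minutes=3):
    """Return (start, end) index pairs, end exclusive, for runs where the
    minute-level score stays >= threshold for at least min_minutes."""
    episodes = []
    start = None
    for i, p in enumerate(stress_prob):
        if p >= threshold:
            if start is None:
                start = i  # a new run begins
        else:
            if start is not None and i - start >= min_minutes:
                episodes.append((start, i))
            start = None  # the run (if any) has ended
    # Close a run that extends to the end of the series.
    if start is not None and len(stress_prob) - start >= min_minutes:
        episodes.append((start, len(stress_prob)))
    return episodes

# Hypothetical minute-level stress scores
series = [0.1, 0.7, 0.8, 0.2, 0.6, 0.9, 0.8, 0.7, 0.3, 0.6, 0.6, 0.6]
episodes = find_episodes(series)
```

The two-minute run at minutes 1 and 2 is discarded as too short, while the runs at minutes 4 to 7 and 9 to 11 are kept as episodes.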

[1] WHO International Programme on Chemical Safety. (2001). Biomarkers in Risk Assessment: Validity and Validation. Retrieved from

[2] Strimbu, K., & Tavel, J. A. (2010). What are Biomarkers? Current Opinion in HIV and AIDS, 5(6), 463–466.





Copyright © 2017 MD2K. MD2K is supported by the National Institutes of Health Big Data to Knowledge Initiative (Grant #1U54EB020404)

Team: Cornell Tech, GA Tech, Harvard, U. Memphis, Northwestern, Ohio State, Open mHealth, UCLA, UCSD, UCSF, UMass, U. Michigan, Utah, WVU