Team members

Mani Srivastava (Thrust 3 Lead, UCLA)
Timothy Hnat (Chief Software Architect, Memphis)
Tyson Condie (UCLA)
Simone Carini (UCSF)
Santosh Kumar (Memphis)
Syed Monowar Hossain (Memphis)
Nasir Ali (Memphis)
Nusrat Nasrin (Memphis)

Students, Post Docs
Bo-Jhang Ho (UCLA)
Matteo Interlandi (UCLA)
Addison Mayberry (UMass)

Mobile Sensor Big Data Architecture

Developing and validating any new mHealth biomarker requires research studies in lab and field settings that collect raw sensor data with appropriate labels (e.g., self-reports). A general-purpose software platform for such data collection consists of software on sensors, mobile phones, and the cloud, all of which need to work together. Each software component must be modular to enable seamless mix-and-match customization for different study needs. The software architecture for such a platform needs several attributes.

First, it must support concurrent connections to a wide variety of high-rate wearable sensors with an ability to plug-in new sensors.

Second, all three platforms must ingest a large volume of rapidly arriving data without falling behind and losing data, even though the smartphone hardware and operating system do not yet provide native support for such data.

Third, it needs to support reliable storage of a quickly-growing volume of sensor data, the archival of which is critical to the development and validation of new biomarkers.

Fourth, it is desirable to quickly analyze incoming data to monitor signal quality so that any errors in sensor attachment or placement can be promptly fixed to maximize data yield.

System Overview

There are three core users of mCerebrum and Cerebral Cortex: (1) the user who wears sensors and interacts with mCerebrum, which uploads data to Cerebral Cortex; (2) the health science researcher who conducts studies, visualizes field data, and runs population-scale analysis; and (3) the data science researcher who constructs models through machine learning, runs interactive analysis through web-based dashboards, and builds scalable data analytics across large populations.

Fifth, the smartphone and/or the cloud needs to support the sense-analyze-act pipeline for high-rate streaming sensor data. This is necessary to prompt self-reports (for collection of labels) as well as confirm/refute prompts for validation of new biomarkers in the field. Sense-analyze-act support is also needed to aid development and evaluation of sensor-triggered interventions.

Sixth, it needs seamless sharing of streaming data from multiple sensors to enable computation of multi-sensor biomarkers (e.g., stress, smoking, eating).

Seventh, the platform needs to be general-purpose and extensible to support a wide variety of sensors, biomarkers, and study designs.

Eighth, it needs to be architecturally scalable so that it can support concurrent computation of a large number of biomarkers (each of which requires complex processing) without saturating the computational capacity or depleting the battery life of the smartphone.

Ninth, the smartphone platform needs to carefully control interruptions to study participants from various sources (e.g., self-reports, ecological momentary assessments (EMA) and interventions (EMI), requests to fix sensor attachments), limiting user burden and cognitive overload while satisfying the numerous study requirements.

Tenth, the cloud platform must support concurrent data collection from hundreds, if not thousands, of smartphone instances deployed in the field and reliably offload raw sensor data, derived features and biomarkers, and self-reports.

Eleventh, the cloud platform needs to provide a dashboard to remotely monitor the quality of data collection and participant compliance so as to intervene when necessary to ensure high data-yield.

Twelfth, for mobile sensor big data analytics, the cloud platform must support export of sensor data, features, biomarkers, and self-reports for population-scale analysis, as well as offer exploratory visualization and analysis. Last, but not least, the cloud platform must support annotation of data with metadata and provenance information so as to enable comparative analysis, reproducibility, and third party research.
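
As an illustration of the metadata and provenance requirement, the following is a minimal sketch of how a stream descriptor might record what a stream contains and how it was derived. The class name, field names, and the example stream and algorithm names are hypothetical and are not taken from the mCerebrum or Cerebral Cortex code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StreamMetadata:
    """Hypothetical descriptor attached to one data stream."""
    name: str
    unit: str
    sampling_rate_hz: float
    # Provenance: the input streams and the algorithm that produced this one.
    derived_from: list = field(default_factory=list)
    algorithm: str = "raw"          # "raw" marks data straight from a sensor
    algorithm_version: str = ""


def describe(meta: StreamMetadata) -> str:
    """Serialize the descriptor to JSON so it can travel with exported data."""
    return json.dumps(asdict(meta), sort_keys=True)


# A raw sensor stream and a biomarker stream derived from two sensors.
ecg = StreamMetadata(name="ecg", unit="mV", sampling_rate_hz=64.0)
stress = StreamMetadata(
    name="stress_likelihood",
    unit="probability",
    sampling_rate_hz=1 / 60,
    derived_from=["ecg", "respiration"],
    algorithm="example-stress-model",   # illustrative name only
    algorithm_version="0.1",
)
```

Keeping the input streams and the algorithm version alongside each derived stream is one way to let a third party trace a biomarker back to the raw data and reproduce it.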

The big data computing architectures for all three platforms (sensors, smartphones, and cloud) by MD2K are aimed at meeting all of the above requirements. The publications listed below provide details of these architectures.
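
To make the ingest, signal-quality, and sense-analyze-act requirements more concrete, here is a small sketch of the control flow on a single sensor stream. It is an assumption-laden illustration, not the MD2K implementation: the class and function names are hypothetical, and the flat-signal check is only a stand-in for a real signal-quality algorithm.

```python
from collections import deque
from statistics import pstdev


class SensorBuffer:
    """Bounded buffer that counts evicted samples instead of losing them silently."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, value):
        if len(self.samples) == self.samples.maxlen:
            self.dropped += 1  # the oldest sample is about to be evicted
        self.samples.append(value)


def signal_quality(window, min_std=0.01):
    """Label a window 'poor' when it is nearly flat (e.g., a detached sensor)."""
    if len(window) < 2:
        return "poor"
    return "good" if pstdev(window) >= min_std else "poor"


def sense_analyze_act(readings, window_size=4):
    """Run the loop over a finite list of readings and return the prompts issued."""
    buffer = SensorBuffer(capacity=window_size)
    actions = []
    for count, value in enumerate(readings, start=1):
        buffer.push(value)                                   # sense
        if count % window_size == 0:
            quality = signal_quality(list(buffer.samples))   # analyze
            if quality == "poor":
                actions.append("prompt: check sensor attachment")  # act
    return actions


# One flat window (poor quality) followed by one varying window (good quality).
prompts = sense_analyze_act([0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.2, 0.9])
```

In this sketch only the flat window triggers a prompt. A real deployment would also need to weigh such prompts against the participant-burden limits described above.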


Copyright © 2018 MD2K. MD2K is supported by the National Institutes of Health Big Data to Knowledge Initiative (Grant #1U54EB020404)
Team: Cornell Tech, GA Tech, Harvard, U. Memphis, Northwestern, Ohio State, Open mHealth, UCLA, UCSD, UCSF, UMass, U. Michigan, Utah, WVU