Longitudinal data extracted from electronic health records (EHRs) present numerous opportunities for developing methodology for clinical decision support tools aimed at improving the delivery of healthcare. However, EHR data also pose many modeling challenges due to their intrinsic nature, for instance, incompleteness, missingness that is not at random, and differences in temporal scale and resolution. Though these characteristics make it difficult to directly apply existing machine learning methodology to problems of representation, identification, and prediction, we can leverage recent developments in learning and encoding architectures to create models explicitly designed to accommodate such settings. In this seminar we will cover three recently introduced approaches motivated by real-world applications in healthcare, highlighting the existing challenges, how they were addressed, and the experimental results indicating that these newly proposed models outperform existing alternatives.
Ricardo Henao, a quantitative scientist, is an Associate Professor in the Biological and Environmental Science and Engineering (BESE) Division and a member of the Smart Health Initiative (SHI) at KAUST (King Abdullah University of Science and Technology). He is also currently an Associate Professor in the Department of Biostatistics and Bioinformatics and the Department of Electrical and Computer Engineering (ECE), and a member of the Information Initiative at Duke (iiD), Duke AI Health, and the Duke Clinical Research Institute (DCRI), all at Duke University. The theme of his research is the development of novel statistical methods and machine learning algorithms primarily based on probabilistic modeling. His expertise covers several fields, including applied statistics, signal processing, pattern recognition, and machine learning. His methods research focuses on hierarchical or multilayer probabilistic models that describe complex data, such as data characterized by high dimensionality, multiple modalities, more variables than observations, noisy measurements, missing values, or time series, in terms of low-dimensional representations for the purposes of hypothesis generation and improved predictive modeling. Most of his applied work is dedicated to the analysis of biological data such as gene expression, medical imaging, clinical narrative, and electronic health records. His recent work has focused on the development of sophisticated machine learning models, including deep learning approaches, for the analysis and interpretation of clinical and biological data, with applications to predictive modeling for diverse clinical outcomes.