Selected Publications and Projects
Reconstructing Patient Trajectories Prior to Mild Cognitive Impairment (MCI) Diagnosis
Developed a pipeline to identify patients with MCI using both structured data (ICD codes) and unstructured data (clinical notes) from EHRs, reconstructing their disease progression trajectories in the years preceding MCI diagnosis. Integrated NLP-derived features from notes with structured fields such as prior diagnoses and demographics to build a comprehensive longitudinal view of patient encounters. This approach supports early signal detection, characterization of pre-diagnostic patterns, and more precise cohort definition for dementia research.

Contributions:
Project Lead
- Designed and implemented a pipeline to extract MCI cases from structured ICD codes and unstructured clinical notes.
- Integrated NLP outputs with structured EHR variables (prior diagnoses, demographics) to build longitudinal patient trajectories.
- Developed methods to detect early diagnostic signals and characterize pre-diagnostic patterns in dementia.
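
As a rough illustration of the idea (not the project's actual pipeline), the sketch below takes the earliest MCI evidence from either an ICD code or a note mention as the index date, then collects the encounters in a lookback window before it. The `Encounter` class, the keyword pattern, and the five-year window are all hypothetical choices made for this example. G31.84 is the ICD-10-CM code for mild cognitive impairment; a real note-processing step would also need to handle negation and context.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative assumptions: one ICD-10-CM code and a naive keyword pattern.
MCI_ICD10 = {"G31.84"}
MCI_PATTERN = re.compile(r"\b(mild cognitive impairment|MCI)\b", re.IGNORECASE)


@dataclass
class Encounter:
    """Hypothetical minimal encounter record."""
    patient_id: str
    when: date
    icd_codes: tuple = ()
    note: str = ""


def is_mci_evidence(enc):
    """True if the encounter carries an MCI code or an MCI mention in the note."""
    coded = bool(MCI_ICD10 & set(enc.icd_codes))
    mentioned = bool(MCI_PATTERN.search(enc.note))
    return coded or mentioned


def pre_diagnosis_trajectory(encounters, lookback_years=5):
    """Return (index_date, encounters within the lookback window before it)."""
    ordered = sorted(encounters, key=lambda e: e.when)
    index_date = next((e.when for e in ordered if is_mci_evidence(e)), None)
    if index_date is None:
        return None, []
    start = index_date - timedelta(days=365 * lookback_years)
    return index_date, [e for e in ordered if start <= e.when < index_date]
```

Using the earliest evidence from either source as the index date means a note mention can move the index earlier than the first coded diagnosis, which is one way unstructured data could sharpen cohort definition.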


Diagnostic Reversion in Dementia Care: MCI after Dementia?
Investigated diagnostic reversion, in which mild cognitive impairment (MCI) is diagnosed after a dementia diagnosis, in a cohort of 5,965 patients aged 50+ from UCSF Health EHRs. Reversion occurred in 13.7% of patients and was associated with language (higher odds in Spanish speakers), cardiovascular risk, and comorbidity burden, while older age predicted lower odds. Predictors were identified using Group LASSO-regularized logistic regression and random forests. Results suggest reversion is common and socially patterned, reflecting potential misdiagnosis, clinical uncertainty, or variation in presentation across care settings.

Full manuscript now available on medRxiv (currently under review).

Contributions:
Project Lead
- Designed and executed EHR-based study on diagnostic reversion in dementia.
- Applied machine learning and statistical models to identify key predictors.
- Integrated clinical, demographic, and comorbidity data for analysis.
- Interpreted findings and prepared results for publication.
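
To make the outcome concrete, here is a minimal sketch of one way a reversion flag could be derived from dated diagnoses. It assumes, for illustration only, that reversion means any MCI diagnosis dated after a patient's first dementia diagnosis; the study's exact operational definition may differ, and the function names and input layout are hypothetical.

```python
from datetime import date


def has_reversion(diagnoses):
    """diagnoses: iterable of (date, label) pairs, label in {"dementia", "mci"}.

    Assumed definition: an MCI diagnosis dated after the first dementia diagnosis.
    """
    diagnoses = list(diagnoses)
    dementia_dates = [d for d, label in diagnoses if label == "dementia"]
    if not dementia_dates:
        return False
    first_dementia = min(dementia_dates)
    return any(label == "mci" and d > first_dementia for d, label in diagnoses)


def reversion_rate(cohort):
    """cohort: dict mapping patient id to that patient's diagnosis list."""
    flags = [has_reversion(dx) for dx in cohort.values()]
    return sum(flags) / len(flags) if flags else 0.0
```

The resulting binary flag is the kind of outcome that a regularized logistic regression or a random forest could then be fit against.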

Association of Cholesterol Values with Cognitive Outcomes Over a 14-year Period
The impact of cholesterol on late-life cognition remains uncertain. In 13,258 adults aged 50+ from the Health and Retirement Study (2006–2020), higher HDL-C was associated with slightly better memory scores, while higher non–HDL-C was linked to slightly poorer scores. Neither measure predicted memory decline over time, suggesting cholesterol has only a small influence on memory variation in later life.

Contributions:
- Designed and executed statistical analysis plan using linear mixed models.
- Curated and prepared longitudinal cholesterol and cognition data.
- Engineered baseline and time-updated cholesterol variables.
- Built models, generated figures, and summarized results.
- Interpreted findings and co-led manuscript writing.
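
For readers unfamiliar with linear mixed models, a generic specification of the kind that could underlie such an analysis is sketched below. This is an assumed textbook form, not the manuscript's exact model: the covariate vector $\mathbf{x}_i$, the random slope, and the use of a single baseline cholesterol measure $\text{Chol}_i$ are illustrative choices.

```latex
\begin{aligned}
\text{Memory}_{ij} &= \beta_0 + \beta_1\,\text{Chol}_i + \beta_2\,t_{ij}
  + \beta_3\,(\text{Chol}_i \times t_{ij}) + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
  + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij},\\
(b_{0i}, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this form, $\beta_1$ captures the association between cholesterol and memory level for person $i$ at visit $j$, while $\beta_3$ captures whether cholesterol modifies the rate of change over time $t_{ij}$. A finding of level differences without differences in decline would correspond to a nonzero $\beta_1$ with $\beta_3$ close to zero.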

News Filtering System
To mitigate bias and misinformation in news articles, we developed a news filtering model that summarizes published material while preserving its substantive content. Users enter a news topic and receive a one-paragraph summary of content related to that topic. The summary is produced by applying clustering and extractive summarization techniques.

Contributions:
Algorithm Lead
- Co-led the implementation of algorithms for least-biased extractive summarization of news articles.
- Used multi-core CPUs on Google Cloud Platform to speed up model evaluation and computation.
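
The sketch below shows the general shape of topic-driven extractive summarization. It is a deliberately simple stand-in, not the project's algorithm: a keyword filter takes the place of the clustering step, sentences are ranked by average word frequency, and the stopword list and function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "on",
             "for", "that", "this", "with", "as", "by", "it", "was", "at"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(articles, topic, max_sentences=3):
    """Pick the highest-scoring topic-related sentences, kept in source order."""
    topic_terms = set(tokenize(topic))
    sentences = []
    for article in articles:
        parts = re.split(r"(?<=[.!?])\s+", article)
        sentences.extend(s.strip() for s in parts if s.strip())
    # Keep sentences that share a term with the topic, dropping exact duplicates.
    relevant = list(dict.fromkeys(
        s for s in sentences if topic_terms & set(tokenize(s))
    ))
    freqs = Counter(w for s in relevant for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    top = set(sorted(relevant, key=score, reverse=True)[:max_sentences])
    return " ".join(s for s in relevant if s in top)
```

Because the output is built only from sentences that already appear in the articles, the summary cannot introduce new claims, which is the main appeal of extractive over generative summarization for this use case.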

Gredient Mobile Application
Gredient was a mobile application designed to safeguard the health and improve the shopping experience of the 32 million American consumers with food allergies by helping them easily check the contents of ingredient labels. The Gredient iOS app is built on an optical character recognition model hosted in a serverless web architecture that can scale to serve millions of users.

Contributions:
Algorithm and Data Architecture Lead
- Spearheaded the algorithm development for optical character recognition (OCR) of ingredients in images of food product ingredient labels.
- Implemented and optimized the OCR model to inform customers about the ingredient safety of food products.
- Co-led the back-end design of the serverless architecture with AWS Lambda, DynamoDB, SageMaker and Rekognition.
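
As an illustration of the step that follows OCR, the sketch below checks already-extracted label text against a user's allergens. It assumes the OCR stage has returned plain text; the `ALLERGEN_TERMS` table is a tiny hypothetical vocabulary for demonstration, not the app's actual data, and a production list would need to be far more complete.

```python
import re

# Hypothetical, deliberately small mapping from allergen to label terms.
ALLERGEN_TERMS = {
    "milk": {"milk", "whey", "casein"},
    "peanut": {"peanut", "peanuts"},
    "egg": {"egg", "eggs"},
    "soy": {"soy", "soya", "soybean"},
}


def find_allergens(ocr_text, user_allergens):
    """Map each of the user's allergens to the matching terms found on the label."""
    words = set(re.findall(r"[a-z]+", ocr_text.lower()))
    hits = {}
    for allergen in user_allergens:
        matched = ALLERGEN_TERMS.get(allergen, set()) & words
        if matched:
            hits[allergen] = sorted(matched)
    return hits
```

Matching on derived ingredient names such as whey, rather than only on the allergen name itself, reflects why a lookup table is needed at all: labels often list derivatives instead of the source food.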