Resume
Work Experience
October 2021 - Present
Graduate Research Assistant
BCHSI - UCSF
Leveraged real-world evidence from longitudinal EHR data, claims, and clinical notes to model disease progression, identify early diagnostic signals, and quantify risk factors in neurodegenerative diseases. Applied statistical modeling (Cox regression, causal mediation), machine learning (Naïve Bayes, logistic regression, random forests, SVMs), and NLP (BERT, cTAKES) to build interpretable, end-to-end pipelines for patient phenotyping, trajectory modeling, and risk stratification. Designed workflows encompassing study design, temporal modeling, feature engineering, quality assurance, and model validation, ensuring clinical relevance and reproducibility. Collaborated with clinicians and data engineers to integrate multi-modal data sources and deliver scalable, actionable insights for healthcare research.
April 2021 - Sept 2021
Staff Research Associate
Memory and Aging Center - UCSF
Developed, trained, and deployed a convolutional neural network (CNN) for automated segmentation of post-mortem brain tissue images, reducing manual annotation time by up to 80% in neuropathology studies. Implemented the pipeline on AWS (EC2, S3, JupyterHub) for scalable, cloud-based image analysis, enabling large-scale processing across male and female aging cohorts. Designed image preprocessing, model training, and evaluation workflows to ensure accuracy, reproducibility, and adaptability for downstream biomarker discovery and drug target validation in Alzheimer’s disease research.
Aug 2020 - Jan 2021
Computer Systems Engineer
Lawrence Berkeley National Lab
Developed an end-to-end deep learning imaging pipeline using U-Net segmentation to detect neurons in immunofluorescent brain tissue, reducing manual analysis time and improving reproducibility. Designed and implemented image preprocessing, augmentation, model training, and evaluation workflows, enabling scalable application across multiple research groups. Led the deployment and user adoption of the tool in two laboratories, supporting high-throughput, reproducible image analysis for neuroscience research.
June 2019 - Aug 2020
Graduate Research Assistant
Lawrence Berkeley National Lab
Designed and optimized deep learning-based image analysis frameworks for materials science applications, focusing on 3D imaging of fibrous materials. Leveraged convolutional neural networks (CNNs) and high-performance computing (multi-core CPUs, GPUs) to accelerate large-scale imaging pipelines and improve scalability for high-throughput experiments. Developed data preprocessing, volumetric segmentation, and classification workflows that reduced processing time by over 50%, enabling faster experimental cycles and more reproducible results.
Jan 2019 - May 2019
Undergraduate Research Intern
Lawrence Berkeley National Lab
Developed image analysis models for automated cell segmentation in Pap smear microscopy images, prototyping a tool to assist pathologists by reducing the inspection area by ~70%. Designed image preprocessing, segmentation, and evaluation pipelines to accelerate pathology review, improve workflow efficiency, and demonstrate the potential for scalable, automated screening in cytology.
Education
Sept 2021 - Aug 2025
Doctorate in Biological and Medical Informatics
University of California, San Francisco
Advisors
Marina Sirota, PhD, and M. Maria Glymour, ScD
May 2019 - Aug 2020
Master of Information and Data Science
University of California, Berkeley
Aug 2016 - Dec 2018
Bachelor of Arts in Applied Mathematics: Data Science
University of California, Berkeley
Skills & Expertise
- Programming & Data Analysis: Python (pandas, NumPy, scikit-learn, SciPy), R, SQL, PySpark, Bash
- Machine Learning & AI: Random Forests, Support Vector Machines (SVMs), Logistic Regression, Naïve Bayes, Transformers, Clustering, CNNs, U-Net, BERT (fine-tuned), TensorFlow, PyTorch, Keras
- Natural Language Processing (NLP): Named Entity Recognition (NER), Text Classification, Sentiment Analysis, Summarization, cTAKES, spaCy, Hugging Face Transformers, OCR (AWS Rekognition, Tesseract), Tokenization
- Statistical Modeling & Epidemiology: Cox proportional hazards models, Causal mediation analysis, Linear and logistic regression, Longitudinal mixed models, Splines, Multiple imputation
- Data Modalities: Structured & unstructured EHR, Clinical notes, Radiology reports, Biomarker time series, Neuroimaging, Microscopy and 3D imaging
- Cloud & Infrastructure: AWS (EC2, S3, Lambda, SageMaker, DynamoDB), Google Cloud Platform, Docker, JupyterHub
- Tools & Visualization: Git, GitHub, matplotlib, seaborn
- Research Strengths: Longitudinal data analysis, Disease trajectory modeling, Diagnostic prediction, Real-world evidence (RWE), Interpretable ML, Reproducible pipelines
- Collaborative Experience: Cross-functional work with clinicians, neuropsychologists, imaging scientists, and epidemiologists across academic and clinical research settings